Monday, August 15, 2022

UTF-8 Support

Introduction
In the past, utf8 was the usual character encoding in MySQL. Now it is utf8mb4 (the server default since MySQL 8.0).
In short, CREATE TABLE should use utf8mb4 as the CHARACTER SET and utf8mb4_general_ci as the COLLATE.

utf8 vs. utf8mb4
The explanation is as follows:
The core reason for the separation of utf8 and utf8mb4 is that MySQL's "utf8" is different from proper UTF-8 encoding. That's the case because "utf8" doesn't offer full Unicode support, which can lead to data loss or even security issues. Its failure to fully support Unicode is the real kicker - the UTF-8 encoding needs up to four bytes per character, while the "utf8" encoding offered by MySQL only supports three. See the issue on that front? In other words, if we want to store a smiley emoji, we cannot do it - it's not that MySQL will store it in a format of "???" or similar, but it won't store it at all and will respond with an error message like the following:
Incorrect string value: '\x77\xD0' for column 'demo_column' at row 1

With this error message, MySQL is saying "well, I don't recognize the characters that this smiley is made out of. Sorry, nothing I can do here" - at this point, you might be wondering what is being done to overcome such a problem. 
...
That workaround is called "utf8mb4". utf8mb4 is pretty much the same as its older counterpart - utf8 - it's just that the encoding uses one to four bytes per character which essentially means that it's able to support a wider variety of symbols and characters.
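
To make the three-byte limit concrete, here is a minimal sketch. The table names emoji_utf8 and emoji_utf8mb4 are hypothetical and exist only for this illustration; the sketch also assumes strict SQL mode is enabled (with strict mode off, MySQL would typically issue a warning and truncate the value instead of raising an error).

```sql
-- Hypothetical throwaway tables, for illustration only.
SET NAMES utf8mb4;  -- make sure the connection itself can carry 4-byte characters

CREATE TABLE emoji_utf8 (
  demo_column VARCHAR(10)
) CHARACTER SET utf8;      -- MySQL's "utf8": at most 3 bytes per character

CREATE TABLE emoji_utf8mb4 (
  demo_column VARCHAR(10)
) CHARACTER SET utf8mb4;   -- 1 to 4 bytes per character

-- Expected to fail in strict mode with "Incorrect string value ...",
-- because the emoji needs 4 bytes in UTF-8:
INSERT INTO emoji_utf8 (demo_column) VALUES ('😀');

-- Expected to succeed, because utf8mb4 can hold 4-byte characters:
INSERT INTO emoji_utf8mb4 (demo_column) VALUES ('😀');
```

An existing table can usually be migrated with ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4, although index length limits may need checking first.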
Collation
Collation, i.e. the rules for sorting and comparing text, can be set at three levels:
1. Database level
2. Table level
3. Column level
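
The three levels can be sketched as follows. The names collation_demo, people, first_name and username are made up for this example; a setting at a narrower level overrides the wider one.

```sql
-- 1. Database level: default for tables created in this database
CREATE DATABASE collation_demo
  CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
USE collation_demo;

-- 2. Table level: overrides the database default for this table
-- 3. Column level: overrides the table default for one column
CREATE TABLE people (
  first_name VARCHAR(100),              -- inherits the table collation
  username   VARCHAR(100)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_bin  -- column-level override
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- The Collation column of the output shows what each column ended up with
SHOW FULL COLUMNS FROM people;
```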

utf8_general_ci collation
If we use utf8 as the character encoding, the default collation is utf8_general_ci.

utf8mb4_general_ci collation
If we use utf8mb4 as the character encoding, the default collation is utf8mb4_general_ci (from MySQL 8.0 onward the default is utf8mb4_0900_ai_ci).
The explanation is as follows. The ci suffix means "case insensitive", and it applies to sorting and comparison operations.
- utf8mb4_general_ci is geared towards a more "general" use of MySQL and utf8. This collation is widely regarded to take "shortcuts" when comparing and sorting data, which may result in sorting errors in some cases, in order to improve speed.
One of the problems encountered when using utf8mb4_general_ci is a unique constraint error. The explanation is as follows:
That is, "Fred" and "freD" are considered equal at the database level. If you have a unique constraint on a field, it would be illegal to try to insert both "aa" and "AA" into the same column, since they compare as equal (and, hence, non-unique) with the default collation. If you want case-sensitive comparisons on a particular column or table, change the column or table to use the utf8_bin collation.
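
A minimal sketch of that behavior, using hypothetical tables users_ci and users_cs. Note that the quote names utf8_bin; for a utf8mb4 column the counterpart used here is utf8mb4_bin.

```sql
-- Case-insensitive collation: 'Fred' and 'freD' collide on the unique key
CREATE TABLE users_ci (
  name VARCHAR(100),
  UNIQUE KEY uq_name (name)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

INSERT INTO users_ci (name) VALUES ('Fred');
-- Expected to fail with a "Duplicate entry" error, since the values compare as equal:
INSERT INTO users_ci (name) VALUES ('freD');

-- Binary collation on the column: values are compared by code point,
-- so differently cased strings are distinct
CREATE TABLE users_cs (
  name VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
  UNIQUE KEY uq_name (name)
);

INSERT INTO users_cs (name) VALUES ('Fred');
INSERT INTO users_cs (name) VALUES ('freD');  -- expected to succeed
```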

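We do it like this:
-- Create the database with utf8mb4 and a case-insensitive collation
CREATE DATABASE demo_db CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
-- Switch to the database that was just created
USE demo_db;

-- Identifiers are quoted with backticks in MySQL, not single quotes
CREATE TABLE demo_tbl (
  `archtype_field` VARCHAR(100) DEFAULT NULL
) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;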
utf8mb4_unicode_ci collation
The explanation is as follows. The ci suffix means "case insensitive", and it applies to sorting and comparison operations.
- utf8mb4_unicode_ci is geared towards "advanced" users - that is, it's a set of collations that is based on Unicode and we can rest assured that our data will be dealt with properly if this collation is in use.
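
One way to see the difference between the two collations is to compare the German "ß" under each. This is only a sketch: the column aliases are made up, and the expectation comes from the MySQL manual, which documents that ß compares equal to s under utf8mb4_general_ci and equal to ss under utf8mb4_unicode_ci.

```sql
SET NAMES utf8mb4;  -- string literals must be utf8mb4 for these collations to apply

SELECT
  'ß' = 's'  COLLATE utf8mb4_general_ci AS general_eq_s,
  'ß' = 'ss' COLLATE utf8mb4_general_ci AS general_eq_ss,
  'ß' = 's'  COLLATE utf8mb4_unicode_ci AS unicode_eq_s,
  'ß' = 'ss' COLLATE utf8mb4_unicode_ci AS unicode_eq_ss;
```

Running the query against your own server version is the safest way to confirm which comparisons return 1.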

