UTF-8 in PHP

How PHP supports UTF-8

Chunki Jun @ Modern PUG

Why?

PHP: The Right Way

Why?

PHP Best Practices

UTF-8 @ PHP Best Practices

  1. PHP
    mb_*
  2. Database (MySQL)
    utf8mb4
  3. Browser
    mb_http_output(), ...

1. PHP

Multibyte String Functions

substr() ➨ mb_substr()

strpos() ➨ mb_strpos()

strlen() ➨ mb_strlen()

But not every string function has an mb_* counterpart!

http://php.net/ref.mbstring
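
A minimal sketch of the difference between the byte-oriented functions and their mb_* counterparts. It assumes the mbstring extension is loaded and the file is saved as UTF-8; the sample word and variable names are only for illustration.

```php
<?php
// Sketch: byte-oriented vs. multibyte-aware string functions.
// Assumes mbstring is loaded and this file is saved as UTF-8.
$word = "한글"; // two characters, three bytes each in UTF-8

$byteLength = strlen($word);              // counts bytes
$charLength = mb_strlen($word, "UTF-8");  // counts characters

$firstByte = substr($word, 0, 1);             // cuts inside a character
$firstChar = mb_substr($word, 0, 1, "UTF-8"); // returns the whole character

$bytePos = strpos($word, "글");                // offset in bytes
$charPos = mb_strpos($word, "글", 0, "UTF-8"); // offset in characters

echo "$byteLength bytes, $charLength characters\n";
```

The result of substr() here is a lone lead byte, which is not valid UTF-8 on its own; that is the kind of breakage the mb_* functions avoid.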

1. PHP

set encoding as UTF-8 explicitly

at script


mb_internal_encoding("UTF-8");
							

But if it is not set explicitly, the internal encoding falls back to these php.ini settings:

  • default_charset
  • mbstring.internal_encoding - deprecated (PHP 5.6)
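
One way to avoid depending on php.ini is to pin both settings at the top of the script. This is a sketch, assuming PHP 5.6 or later with mbstring loaded; whether ini_set() is allowed depends on the hosting setup.

```php
<?php
// Sketch: pin the encoding in the script instead of relying on php.ini.
// Assumes PHP >= 5.6 with the mbstring extension loaded.
ini_set("default_charset", "UTF-8"); // default for htmlentities() and friends
mb_internal_encoding("UTF-8");       // default for the mb_* functions

// Called without an argument, mb_internal_encoding() returns the current value.
$current = mb_internal_encoding();
echo $current, "\n";
```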

1. PHP

set encoding as UTF-8 explicitly

at functions


// example at http://php.net/function.htmlentities
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
							

But when the encoding argument is omitted, functions fall back to default_charset


string htmlentities ( string $string
	[, int $flags = ENT_COMPAT | ENT_HTML401
	[, string $encoding = ini_get("default_charset")
	[, bool $double_encode = true ]]] )
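
A small sketch of why the encoding argument matters. The same UTF-8 bytes are escaped once with the correct charset and once with a wrong one; the sample string is an assumption made up for this example.

```php
<?php
// Sketch: pass the encoding explicitly instead of relying on default_charset.
$str = "caf\xC3\xA9 <b>"; // "café <b>" written out as UTF-8 bytes

// Correct charset: the two bytes of "é" are read as one character.
$good = htmlentities($str, ENT_QUOTES, "UTF-8");

// Wrong charset: each byte of "é" is treated as a separate character.
$bad = htmlentities($str, ENT_QUOTES, "ISO-8859-1");

echo $good, "\n", $bad, "\n";
```

If default_charset on a server happens to be something other than UTF-8, omitting the third argument gives the second kind of result.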
							

2. Database (MySQL)

Use utf8mb4 (MySQL >= 5.5.3)

The utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters.

Should I use utf8mb4?

Plane  Detail                             Result
BMP    Basic Multilingual Plane           utf8 == utf8mb4
SMP    Supplementary Multilingual Plane   utf8 != utf8mb4

BMP

Ḁ ゐ ⼊ 갊

SMP

😀 🀄 🃍 💩

http://en.wikipedia.org/wiki/Plane_(Unicode)
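
The difference comes down to byte length: MySQL's utf8 stores at most three bytes per character, while SMP characters need four bytes in UTF-8. A quick sketch to check this from PHP, assuming the file is saved as UTF-8:

```php
<?php
// Sketch: UTF-8 byte length of a BMP character vs. an SMP character.
// Assumes this file is saved as UTF-8.
$bmp = "갊"; // from the BMP examples above
$smp = "😀"; // from the SMP examples above

$bmpBytes = strlen($bmp); // strlen() counts bytes, not characters
$smpBytes = strlen($smp);

echo "BMP: $bmpBytes bytes, SMP: $smpBytes bytes\n";
```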

3. Browser

  • mb_http_output()
    mb_http_output('UTF-8');

    Not needed if the files are already saved as UTF-8

  • Declare the charset (Content-Type header, <meta charset>)
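
A sketch of declaring the charset to the browser, both in the HTTP header and in the HTML. content_type_header() is a hypothetical helper written for this example, not a PHP built-in.

```php
<?php
// Sketch: tell the browser the response is UTF-8.
// content_type_header() is a hypothetical helper, not part of PHP.
function content_type_header(string $charset = "UTF-8"): string
{
    return "Content-Type: text/html; charset=" . $charset;
}

// Headers can only be sent before any output.
if (!headers_sent()) {
    header(content_type_header());
}

echo '<meta charset="UTF-8">', "\n";
```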

Conclusion

  • Check php.ini: default_charset, mbstring.internal_encoding
  • Use utf8mb4 if you use 💩
  • Save files as UTF-8

Conclusion

UTF-8 in PHP is just fine.

Resources

Thank you