Programming: Converting Latin to Unicode (UTF-8)
Converting from Latin to UTF-8 (and back) in your code
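In PHP, use utf8_encode() to convert a string from ISO-8859-1 to UTF-8, and utf8_decode() to convert back from UTF-8 to ISO-8859-1 (PHP.net). Both functions are deprecated as of PHP 8.2; mb_convert_encoding() is the suggested replacement.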
If you need to convert to or from other character sets, look at iconv().
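For comparison, converting between ISO-8859-1 and UTF-8 can be sketched in Python 3 as below. The helper names latin1_to_utf8 and utf8_to_latin1 are hypothetical, and replacing characters that do not exist in ISO-8859-1 with "?" is an assumption chosen to resemble PHP's utf8_decode(), not an exact port of it.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    # Every byte 0x00-0xFF is a valid ISO-8859-1 character, so this decode never fails.
    return data.decode("iso-8859-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 (hypothetical helper)."""
    # Malformed UTF-8 becomes U+FFFD; any character outside ISO-8859-1
    # (including U+FFFD) is then written as b"?".
    text = data.decode("utf-8", errors="replace")
    return text.encode("iso-8859-1", errors="replace")


latin1 = b"Andr\xe9e"                       # 'Andrée' in ISO-8859-1
utf8 = latin1_to_utf8(latin1)               # b'Andr\xc3\xa9e'
print(utf8_to_latin1(utf8) == latin1)       # True
print(utf8_to_latin1("€".encode("utf-8")))  # b'?' (no euro sign in ISO-8859-1)
```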
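Make sure not to save your PHP files as UTF-8 with a BOM (byte order mark). PHP sends the BOM bytes to the browser as ordinary output, so your browser might show them as stray characters (often ï»¿) on your pages, for example between the output of included PHP files.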
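If a file has already been saved with a BOM, you can strip it before use. A minimal Python 3 sketch, assuming the file content is UTF-8; strip_utf8_bom is a hypothetical helper name:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + b"<?php echo 'hi';"
print(strip_utf8_bom(with_bom))  # b"<?php echo 'hi';"
```

To fix a file on disk, read it with open(path, "rb"), pass the bytes through the function, and write the result back in "wb" mode. When you only need the text, Python's "utf-8-sig" codec skips a leading BOM while decoding.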
In older PHP versions, some native functions such as strtolower(), strtoupper() and ucfirst() might not work correctly with UTF-8 strings. Possible solutions: convert to Latin first, or add the following line to your code:
In Perl, use the Encode module; from_to() converts $data in place:
use Encode qw( from_to is_utf8 );
from_to($data, "iso-8859-1", "utf8");
You can use the following routine to check whether a string is valid UTF-8:
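One way to perform this check, sketched in Python 3 rather than Perl (is_valid_utf8 is a hypothetical helper name), is to attempt a strict UTF-8 decode and catch the error:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        # The default errors="strict" raises on any malformed sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"Andr\xc3\xa9e"))  # True  ('Andrée' in UTF-8)
print(is_valid_utf8(b"Andr\xe9e"))      # False ('Andrée' in ISO-8859-1)
```

Plain ASCII is always valid UTF-8, so a True result does not prove the data was meant as UTF-8; it only shows that it can be read as UTF-8.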
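In Python 2, to convert a Latin-1 string to UTF-8:
# -*- coding: iso-8859-1 -*-
source_encoding = "iso-8859-1"
string = "Names with international characters like 'Andrée'"
string = unicode(string, source_encoding)  # decode the Latin-1 bytes to unicode
string = string.encode('utf-8')  # encode the unicode text as UTF-8 bytes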
To convert back to your locale's character set:
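In Python 3, str objects hold Unicode text, and UTF-8 is the default encoding for source files and for str.encode() and bytes.decode(). The default encoding of open() still depends on the locale unless UTF-8 mode is enabled, so passing encoding="utf-8" explicitly is safer.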
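A short Python 3 sketch of these defaults: encode() and decode() use UTF-8 when no encoding is given, and converting Latin-1 bytes means decoding with the source encoding first and then encoding as UTF-8. The sample word is only an illustration.

```python
text = "Andrée"                           # str holds Unicode code points

utf8_bytes = text.encode()                # no argument: UTF-8 is the default codec
latin1_bytes = text.encode("iso-8859-1")  # one byte per character

print(len(text), len(utf8_bytes), len(latin1_bytes))  # 6 7 6

# Latin-1 bytes -> Unicode -> UTF-8 bytes
converted = latin1_bytes.decode("iso-8859-1").encode("utf-8")
print(converted == utf8_bytes)            # True

# And back: UTF-8 bytes -> Unicode -> Latin-1 bytes
print(utf8_bytes.decode().encode("iso-8859-1") == latin1_bytes)  # True
```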
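In C#, use the Encoding class from the System.Text namespace:
using System.Text;
Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
byte[] latin1Bytes = latin1.GetBytes("Latin to UTF-8: Andrée");
byte[] utf8Bytes = Encoding.Convert(latin1, Encoding.UTF8, latin1Bytes); // UTF-8 encoded bytes
string utf8Converted = Encoding.UTF8.GetString(utf8Bytes);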
MySQL uses character sets at every level. There are connection settings such as character_set_connection and collation_connection, and you can specify a character set at the database, table and column level. To convert a character set inside a MySQL query, use CONVERT():
SELECT CONVERT(latin1field USING utf8)
If table joins become slow after you convert the character set of tables or columns, make sure that all ID columns used in the joins have the same character set and collation (COLLATE setting); when they differ, MySQL may not be able to use the indexes on those columns.
To avoid character set problems, it is sometimes easier to convert special characters to plain-ASCII HTML entities (for example, é becomes &eacute; or &#233;), especially if you edit HTML files by hand.
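This conversion can also be done in code. A minimal Python 3 sketch, assuming named entities from the standard library's html.entities.codepoint2name table (HTML 4 names) with a numeric character reference as fallback; to_html_entities is a hypothetical helper name:

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity; ASCII is left as-is."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                            # plain ASCII stays unchanged
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")   # named entity, e.g. &eacute;
        else:
            out.append(f"&#{code};")                  # numeric reference, e.g. &#10003;
    return "".join(out)


print(to_html_entities("Andrée"))  # Andr&eacute;e
```

If numeric references are enough, text.encode("ascii", "xmlcharrefreplace") does the same in one call and produces &#233; instead of &eacute;. Note that neither approach escapes ASCII markup characters such as & and <; use html.escape() for those.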
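To convert whole files, use the iconv command-line tool. It writes the converted text to standard output, so redirect it to a new file:
iconv -f ISO-8859-1 -t UTF-8 filename.txt > filename.utf8.txt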
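The same file conversion can be sketched in Python 3, assuming the input really is ISO-8859-1 and fits in memory; convert_file and the file names are hypothetical:

```python
import tempfile
from pathlib import Path


def convert_file(src: Path, dst: Path,
                 from_enc: str = "iso-8859-1", to_enc: str = "utf-8") -> None:
    """Re-encode a file, similar to: iconv -f ISO-8859-1 -t UTF-8 src > dst"""
    # Work on raw bytes so line endings are copied unchanged.
    text = src.read_bytes().decode(from_enc)
    dst.write_bytes(text.encode(to_enc))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "filename.txt"
    dst = Path(tmp) / "filename.utf8.txt"
    src.write_bytes(b"Andr\xe9e\n")  # 'Andrée' saved as ISO-8859-1
    convert_file(src, dst)
    print(dst.read_bytes())          # b'Andr\xc3\xa9e\n'
```

Unlike iconv, this sketch never fails on the input side, because every byte is a valid ISO-8859-1 character; if the file is actually in another encoding, the output will be readable UTF-8 but with the wrong characters.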
Most good text editors support Unicode. In UltraEdit, for example, use File → Conversions → 'ASCII to UTF-8' or 'ASCII to Unicode (16-Bit)'.
Thanks to software developers who sent me corrections and updates!