Character Set & Unicode Tools and Conversions
Convert character to number (Unicode code point)
This tool shows Unicode details about any character (letter), including decimal/hex code point and HTML/URL encode syntax.
Convert number to character
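The two conversions above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the tool; the function name describe is made up for this example.

```python
from urllib.parse import quote


def describe(ch: str) -> dict:
    """Return code point details for a single character (hypothetical helper)."""
    cp = ord(ch)  # character -> number (Unicode code point)
    return {
        "decimal": cp,
        "hex": f"U+{cp:04X}",
        "html": f"&#{cp};",   # numeric HTML entity
        "url": quote(ch),     # percent-encoded UTF-8 bytes
    }


print(describe("é"))
print(chr(233))  # number -> character
```

ord() and chr() are inverses of each other, so chr(ord(ch)) always gives back the original character.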
Unicode and UTF-8
Unicode is a standard that assigns a unique number (code point) to the characters and symbols of the world's writing systems, so that computers can store and display them consistently. Code points can be stored using several encodings: the most popular is UTF-8; other examples are UTF-16 and UTF-7. UTF-8 is a variable-length encoding that uses one to four bytes per character, and all basic Latin character codes are identical to ASCII. The Unicode website gives the following definition: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."
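To make the variable-length behaviour concrete, here is a small Python sketch that prints how many bytes UTF-8 needs for a few sample characters. The helper name utf8_hex and the sample characters are chosen for this example only.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")


for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {utf8_hex(ch)}")

# Basic Latin characters produce the same bytes in UTF-8 and in ASCII.
print("A".encode("utf-8") == "A".encode("ascii"))
```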
Setting a charset in programming, servers & other files
Apache .htaccess file
You can use .htaccess to set a default character set for all your documents. Apache's default character set is ISO-8859-1. Apache will use this character set in the HTTP header it sends back to the browser after a request.
To set a default charset for your whole site, add the following line to your .htaccess file:
AddDefaultCharset UTF-8
To serve just your .html documents as UTF-8, add one of the following lines:
AddCharset UTF-8 .html
AddType 'text/html; charset=UTF-8' html
AddCharset specifies just the charset; AddType specifies both the MIME type and the charset in one line.
You can also limit the setting to specific files or directories with Files, FilesMatch, Directory, etc.
# ForceType also sets the MIME type, so match only HTML files here
<FilesMatch "\.(htm|html)$">
    ForceType 'text/html; charset=UTF-8'
</FilesMatch>
<Files "index.php">
    ForceType 'text/html; charset=UTF-8'
</Files>
You can also create a new extension (index.utf8 is served as a Unicode UTF-8 document, index.html as ISO-8859-1):
AddCharset UTF-8 .utf8
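To check which charset ends up in the HTTP header, you can read the charset parameter from the Content-Type value that the server sends. A minimal Python sketch, assuming you already have the header value as a string (the function name charset_from_content_type is made up for this example):

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # None when no charset parameter is present


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

If the function returns None, the server did not declare a charset and the browser has to fall back to other hints.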
In PHP, use the header() function to send an HTTP header:
header("Content-Type: text/html; charset=UTF-8");
You must call this function before any output is sent to the browser.
In Python 2, declare the character set in the first or second line of your source file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
In Python 3, UTF-8 is the default character set for source files.
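The source file default does not necessarily apply to the files your program reads and writes; depending on the Python version and platform, open() may use a different default. A cautious sketch is to pass the encoding explicitly. The file name sample.txt and the sample text are made up for this example.

```python
import os
import tempfile

text = "naïve café €"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write and read with an explicit encoding instead of relying on a default.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "rb") as f:
    raw = f.read()  # the bytes actually stored on disk

with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(raw == text.encode("utf-8"))
print(round_trip == text)
```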
In HTML, set a META tag. There is a short version (introduced in HTML5) and a long version (also compatible with earlier HTML versions, like XHTML):
<meta charset="UTF-8">
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
You can use the short version unless you are targeting old browsers like IE6/IE7. Both versions will work in HTML5. The long version will overrule the short one and HTTP headers will overrule both.
Add the META tag in the <head>-section of your HTML document. Browsers might ignore this statement if your document has a BOM-header (see below).
In XML, declare the encoding in the first line of the document:
<?xml version='1.0' encoding='utf-8'?>
The BOM-header or Byte Order Mark is the character U+FEFF ("zero-width no-break space"), saved at the beginning of a text document to tell editors, browsers and other programs how the file is encoded. In UTF-8 it is stored as the bytes EF BB BF in hex (239 187 191 in decimal); UTF-16 and UTF-32 files store the same character with their own byte sequences. Many editors will automatically add a BOM-header once you specify that the encoding is UTF-8. Some editors also have alternatives for the BOM-header, for example "UTF-8 Cookie", where the editor remembers that the document is UTF-8 by setting a cookie on your system.
BOM-headers can cause problems with some scripting languages such as PHP (you may see some strange characters, the BOM-header itself, flashing for a fraction of a second before a page is loaded).
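If a BOM-header causes trouble, you can detect and remove it before the file is processed. A minimal Python sketch, assuming the file content is already available as bytes (the function name strip_utf8_bom is made up for this example):

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "héllo".encode("utf-8")

print(list(codecs.BOM_UTF8))                   # the three BOM bytes in decimal
print(strip_utf8_bom(with_bom).decode("utf-8"))
print(with_bom.decode("utf-8-sig"))            # this codec drops the BOM while decoding
print(repr(with_bom.decode("utf-8")))          # plain utf-8 keeps U+FEFF as a character
```

Python's "utf-8-sig" codec does the same job while decoding, so it is often enough to open such files with encoding="utf-8-sig".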