Setting a charset in an Apache .htaccess file

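You can use .htaccess to set a default character set for all your documents. Apache sends this character set to the browser in the Content-Type header of its HTTP response. Apache's built-in default character set is ISO-8859-1, which it uses when AddDefaultCharset is set to On; with the directive's default setting, Off, Apache adds no charset at all.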

To set a default charset for your whole site, add the following line to your .htaccess file:

AddDefaultCharset UTF-8
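
Once a directive like this is active, you can check which charset Apache actually sends by looking at the Content-Type response header (for example in your browser's developer tools). The Python sketch below extracts the charset parameter from such a header value using the standard library's email.message.Message parser; the helper name charset_from_content_type is hypothetical, and the sketch assumes you already have the header string.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None.

    The standard library's MIME parser handles quoting and spacing and
    returns the charset name in lower case.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

The first call returns "utf-8"; the second returns None because no charset was sent, in which case the browser has to guess.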

To serve only your .html documents as UTF-8, add one of the following lines:

AddCharset UTF-8 .html

or:

AddType 'text/html; charset=UTF-8' html

AddCharset specifies only the charset; AddType specifies both the MIME type and the charset in one line.
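
A rough Python model of what the AddCharset line does: the MIME type still comes from the file extension, and the charset is appended only for extensions listed in a table. This is a simplified sketch, not Apache's mod_mime code; the table CHARSET_BY_EXTENSION and the helper content_type_for are hypothetical, and Python's mimetypes module stands in for Apache's mime.types file.

```python
import mimetypes
import os.path

# Hypothetical table mimicking "AddCharset UTF-8 .html".
CHARSET_BY_EXTENSION = {".html": "UTF-8"}


def content_type_for(filename):
    """Build the Content-Type value an AddCharset-style rule would produce."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        return None  # unknown extension: no type, so nothing to attach a charset to
    ext = os.path.splitext(filename)[1].lower()
    charset = CHARSET_BY_EXTENSION.get(ext)
    return f"{mime_type}; charset={charset}" if charset else mime_type


print(content_type_for("index.html"))  # text/html; charset=UTF-8
print(content_type_for("notes.txt"))   # text/plain
```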

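You can also limit these directives to certain files with <Files> or <FilesMatch> (and, in the main server configuration, <Directory>, which is not allowed in .htaccess). Because ForceType overrides the MIME type, only match files that really are HTML; forcing .css or .js files to text/html would serve stylesheets and scripts with the wrong MIME type, which browsers may refuse to use.

<FilesMatch "\.(htm|html)$">
ForceType 'text/html; charset=UTF-8'
</FilesMatch>
<Files "index.php">
ForceType 'text/html; charset=UTF-8'
</Files>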
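
Before relying on a <FilesMatch> pattern, you can test which file names it selects. This Python sketch compares a broad pattern that also catches .css and .js files with an HTML-only one. Python's re module stands in for Apache's PCRE regular expressions, which behave the same on simple patterns like these; the helper name matches is hypothetical.

```python
import re

# Apache compares <FilesMatch> patterns with the file name, not the full URL.
BROAD = re.compile(r"\.(htm|html|css|js)$")
HTML_ONLY = re.compile(r"\.(htm|html)$")


def matches(pattern, filename):
    """True if a <FilesMatch>-style pattern would select this file name."""
    return pattern.search(filename) is not None


for name in ["index.html", "old.htm", "style.css", "app.js", "index.php"]:
    print(f"{name}: broad={matches(BROAD, name)} html_only={matches(HTML_ONLY, name)}")
```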

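You can also create a new extension for UTF-8 documents:

AddCharset UTF-8 .utf8

Apache reads every extension in a file name, so a file such as index.utf8.html is then served as a UTF-8 text/html document, while index.html keeps the default charset.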
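
Apache's mod_mime can combine several extensions in one file name, taking the MIME type from one and the charset from another; the order of the extensions does not matter, and if two extensions set the same kind of information, the rightmost one wins. The Python sketch below models that rule with two small hypothetical lookup tables; it is a simplification, not Apache's code.

```python
# Hypothetical tables standing in for mime.types and "AddCharset UTF-8 .utf8".
TYPE_BY_EXT = {"html": "text/html", "htm": "text/html", "txt": "text/plain"}
CHARSET_BY_EXT = {"utf8": "UTF-8"}


def resolve(filename, default_charset=None):
    """Look up every extension after the first dot; later (rightmost) ones win."""
    mime_type = None
    charset = default_charset
    for ext in filename.lower().split(".")[1:]:
        if ext in TYPE_BY_EXT:
            mime_type = TYPE_BY_EXT[ext]
        if ext in CHARSET_BY_EXT:
            charset = CHARSET_BY_EXT[ext]
    if mime_type is None:
        return None  # no MIME type known, so no Content-Type to attach a charset to
    return f"{mime_type}; charset={charset}" if charset else mime_type


print(resolve("index.utf8.html"))            # text/html; charset=UTF-8
print(resolve("index.html", "ISO-8859-1"))   # text/html; charset=ISO-8859-1
```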

Please note that forcing a charset on scripting languages such as PHP does not always work, for example when the script defines its own Content-Type or when PHP runs in CGI mode. See below for how to set the charset in your scripts.

Setting a character set in programming languages

PHP

Use the header() function to send an HTTP header:

header("Content-Type: text/html; charset=UTF-8");

You must call this function before any output is sent to the browser!
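
The same rule applies outside PHP. As an illustration in Python rather than PHP, this hypothetical minimal WSGI application declares UTF-8 in its Content-Type header, sets its headers through start_response() before returning any body bytes, and encodes the body in the charset it declares.

```python
def app(environ, start_response):
    """Minimal WSGI app that sends its charset in the Content-Type header."""
    # Encode the body with the same charset that the header announces.
    body = "<p>Grüße</p>".encode("utf-8")
    # Status and headers must be set before any body data is returned.
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=UTF-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

To try it locally, pass app to wsgiref.simple_server.make_server('', 8000, app) and call serve_forever() on the returned server.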

HTML

Set the Content-Type META tag:

<meta http-equiv="content-type" content="text/html; charset=UTF-8">

Put this META tag in the <head> section of your HTML document. A charset sent in the HTTP header (for example from your Apache configuration or .htaccess) takes precedence over it, and some tools, such as Google Cached pages, ignore the META tag in that case. Browsers will also ignore it if your document starts with a BOM (see below).
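
To check which charset a page declares in its markup, a small parser can look for the META tag. This Python sketch uses the standard library's html.parser; MetaCharsetFinder and meta_charset are hypothetical names. It also accepts the shorter HTML5 form <meta charset="UTF-8">, and it deliberately ignores BOMs and HTTP headers, even though browsers give those priority.

```python
from email.message import Message
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lower-cases attribute names; valueless attributes give None.
        attr_map = {name: value or "" for name, value in attrs}
        if "charset" in attr_map:
            self.charset = attr_map["charset"].strip().lower()
        elif attr_map.get("http-equiv", "").lower() == "content-type":
            content = attr_map.get("content", "")
            if content:
                msg = Message()
                msg["Content-Type"] = content
                self.charset = msg.get_content_charset()


def meta_charset(html_text):
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset
```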

XML

Declare the encoding in the first line of the XML document:

<?xml version='1.0' encoding='utf-8'?>
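
XML libraries read and write this declaration for you. The Python sketch below (it needs Python 3.8 or later for the xml_declaration argument) writes an element with an explicit UTF-8 declaration, then parses a document declared as ISO-8859-1 to show that the parser picks its decoder from the declaration.

```python
import xml.etree.ElementTree as ET

# Serialize with an XML declaration that names the encoding.
root = ET.Element("greeting")
root.text = "café"
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data.decode("utf-8"))

# When given bytes, the parser reads the declaration to choose the decoder.
latin1_doc = "<?xml version='1.0' encoding='iso-8859-1'?><greeting>café</greeting>"
print(ET.fromstring(latin1_doc.encode("iso-8859-1")).text)
```

The first print shows the same declaration line as above, followed by <greeting>café</greeting>; the second prints café even though the bytes were ISO-8859-1.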

BOM-mark

The BOM-header or Byte Order Mark is the character U+FEFF ("zero width no-break space") saved at the very beginning of a text document to tell editors, browsers and other programs that the file is Unicode encoded (UTF-8, UTF-16 or UTF-32). In UTF-8 the BOM is the three bytes EF BB BF in hex (239 187 191 in decimal); in UTF-16 and UTF-32 it is written with different bytes that also reveal the byte order. Many editors automatically add a BOM once you specify that the encoding is UTF-8. Some editors also offer alternatives to the BOM, for example a "UTF-8 Cookie", where the editor remembers that the document is UTF-8 by setting a cookie on your system.
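
The byte values above can be checked with Python's codecs module, which defines constants for each BOM. The sketch below (detect_bom is a hypothetical helper) reports which BOM, if any, a file's bytes start with. The UTF-32 entries are tested first because the UTF-32-LE BOM begins with the same two bytes as the UTF-16-LE one.

```python
import codecs

# Longest BOMs first, so UTF-32-LE (FF FE 00 00) is not mistaken for UTF-16-LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding name, BOM length) if data starts with a BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


print("\ufeff".encode("utf-8"))            # b'\xef\xbb\xbf'
print(detect_bom(b"\xef\xbb\xbf<html>"))   # ('utf-8', 3)
```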

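BOM-headers can cause problems with some scripting languages such as PHP. PHP sends the BOM bytes to the browser as ordinary output, so you may see some strange characters (the BOM) flash for a fraction of a second before the page is loaded; and because output has already started, later header() calls can fail.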
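
If an editor has already saved a BOM into your PHP files, you can re-save them without one, or remove it with a small script. This Python sketch is one hypothetical way to do that. It handles only the UTF-8 BOM and rewrites the file in place, so keep a backup.

```python
import codecs
from pathlib import Path


def without_utf8_bom(data):
    """Return the bytes with a leading UTF-8 BOM (EF BB BF) removed, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_utf8_bom(path):
    """Rewrite the file at path without its UTF-8 BOM; return True if one was removed."""
    file = Path(path)
    data = file.read_bytes()
    cleaned = without_utf8_bom(data)
    if len(cleaned) == len(data):
        return False
    file.write_bytes(cleaned)
    return True
```

For example, looping over Path("htdocs").rglob("*.php") and calling strip_utf8_bom() on each result would clean every PHP file below a (hypothetical) htdocs folder. When reading text in Python, the "utf-8-sig" codec also skips a leading BOM.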

Converting between charsets

Find this info on my other site Unicode Tools.
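
The basic idea of any conversion is to decode the bytes with the charset they were written in, then encode the resulting text with the target charset. A minimal Python sketch (convert is a hypothetical helper name):

```python
def convert(data, from_charset, to_charset, errors="strict"):
    """Re-encode bytes: decode with the source charset, encode with the target.

    Decoding is always strict. With errors="strict", characters the target
    charset cannot represent raise UnicodeEncodeError; errors="replace"
    writes "?" for them instead.
    """
    text = data.decode(from_charset)
    return text.encode(to_charset, errors)


latin1 = "café".encode("iso-8859-1")
print(latin1)                                   # b'caf\xe9'
print(convert(latin1, "iso-8859-1", "utf-8"))   # b'caf\xc3\xa9'
```

Converting in the other direction can lose characters: the euro sign, for instance, exists in UTF-8 but not in ISO-8859-1.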