Setting a charset in an Apache .htaccess file
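You can use .htaccess to set a default character set for your documents. Apache adds this charset to the Content-Type HTTP header it sends back to the browser after a request, but only for responses of type text/html or text/plain. By default (AddDefaultCharset Off) Apache adds no charset at all. If you switch the directive On without naming a charset, Apache uses its built-in default, ISO-8859-1.
To set a default charset for your whole site, add the following line to your .htaccess file: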
AddDefaultCharset UTF-8
To serve just your .html documents as UTF-8, add either of the following lines:
AddCharset UTF-8 .html
AddType 'text/html; charset=UTF-8' html
AddCharset specifies just the charset. AddType specifies both the MIME type and the charset in one line.
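You can also limit the setting to certain files with Files, FilesMatch and similar sections (Directory sections work too, but only in the main server configuration, not in .htaccess):
<FilesMatch "\.(htm|html)$">
ForceType 'text/html; charset=UTF-8'
</FilesMatch>
<Files "index.php">
ForceType 'text/html; charset=UTF-8'
</Files>
ForceType sets the MIME type as well as the charset, so only use 'text/html' for files that really are HTML. Browsers may refuse to apply a stylesheet (and in some cases a script) that is served as text/html. For CSS and JavaScript files, set only the charset, for example with AddCharset UTF-8 .css .js.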
You can also create a new extension (index.utf8 is then served as a UTF-8 document, while index.html keeps the default charset, ISO-8859-1 in this example):
AddCharset UTF-8 .utf8
Please note that forcing scripting languages (such as PHP) to use another charset this way does not always work, for example when the Content-Type is set inside the PHP script or when PHP runs in CGI mode. See below for how to fix your scripts.
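Whichever directive you use, it is worth checking which charset actually ends up in the Content-Type header. The sketch below uses Python's standard library and assumes you already have the header value (for example from curl -I or your browser's developer tools). The function name charset_from_content_type is made up for this example.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lowercased, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns None when there is no charset parameter.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

For a live page, the response returned by urllib.request.urlopen(url) offers the same method: response.headers.get_content_charset().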
Setting a character set in programming languages
PHP
Use the header function to send an HTTP header:
header("Content-Type: text/html; charset=UTF-8");
You must call this function before any output is sent to the browser, including any whitespace or BOM before the opening <?php tag!
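The same rule applies outside PHP: send the Content-Type header with its charset before the body, and encode the body in the charset you declare. As an illustration (not PHP), here is a minimal Python WSGI application. The function names and the sample text are made up, and the start_response stand-in is simplified for demonstration. WSGI makes the ordering explicit, because start_response receives the headers before any body bytes are returned.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 and really sends UTF-8 bytes."""
    body = "<p>Grüße aus Köln</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    # The headers go to the server before any part of the body is returned.
    start_response("200 OK", headers)
    return [body]


# Call the app directly and capture what it would hand to the server.
sent = {}


def capture_start_response(status, headers):
    sent["status"] = status
    sent["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
body_chunks = app(environ, capture_start_response)
print(sent["headers"]["Content-Type"])  # text/html; charset=UTF-8
```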
HTML
Set the Content-Type META tag:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
Add this META tag to the <head> section of your HTML document. Some programs (such as Google's cached pages) ignore this META tag if you have also specified a charset in your Apache configuration (or .htaccess). Browsers may also ignore it if your document starts with a BOM header (see below).
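To see which charset a page declares in its META tag (for example when debugging a page that displays wrongly), you can scan the markup with Python's built-in HTML parser. This is a simplified sketch, not how browsers actually detect the encoding: the names MetaCharsetFinder and find_meta_charset are made up, and only the first META tag that declares a charset is used. It also accepts the shorter HTML5 form <meta charset="UTF-8">.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetFinder(HTMLParser):
    """Remembers the charset of the first <meta> tag that declares one."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() == "content-type":
            # Form used in this article: content="text/html; charset=UTF-8"
            for part in attributes.get("content", "").split(";"):
                key, _, val = part.strip().partition("=")
                if key.lower() == "charset" and val:
                    self.charset = val.strip().lower()
                    break
        elif attributes.get("charset"):
            # Shorter HTML5 form: <meta charset="UTF-8">
            self.charset = attributes["charset"].strip().lower()


def find_meta_charset(html: str) -> Optional[str]:
    finder = MetaCharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


page = '<head><meta http-equiv="content-type" content="text/html; charset=UTF-8"></head>'
print(find_meta_charset(page))  # utf-8
```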
XML
In the first line of the XML document: <?xml version='1.0' encoding='utf-8'?>
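Most XML libraries write this declaration for you. As a sketch using Python's standard library (the element name greeting and its text are made up), ElementTree emits the same first line when asked for a UTF-8 declaration, and a parser reads it back:

```python
import xml.etree.ElementTree as ET

root = ET.Element("greeting")
root.text = "Grüße"

# encoding="utf-8" with xml_declaration=True writes the declaration as the
# first line and encodes the text as UTF-8 bytes (xml_declaration: Python 3.8+).
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data.decode("utf-8").splitlines()[0])  # <?xml version='1.0' encoding='utf-8'?>

# The parser reads the declaration and decodes the bytes accordingly.
parsed = ET.fromstring(data)
print(parsed.text)  # Grüße
```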
BOM-mark
The BOM header, or Byte Order Mark, is the character U+FEFF ("zero-width no-break space"), saved at the very beginning of a text document to tell editors, browsers and other programs that the file is encoded in UTF-8 (or UTF-16 or UTF-32). In UTF-8 it is stored as the three bytes EF BB BF in hex (239 187 191 in decimal). Many editors automatically add a BOM once you specify that the encoding is UTF-8. Some editors also offer alternatives to the BOM, for example a "UTF-8 Cookie", where the editor remembers that the document is UTF-8 by setting a cookie on your system.
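BOM headers can cause problems with some scripting languages such as PHP. PHP sends the BOM to the browser as ordinary output before your script runs, so you may see some strange characters (the BOM itself) flash for a fraction of a second before the page loads, and header() calls in that script can fail because output has already started.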
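If your PHP pages show stray characters or "headers already sent" warnings, check whether the file starts with the three BOM bytes and remove them (or save the file in your editor as UTF-8 without BOM). A minimal Python sketch; the helper names strip_utf8_bom and strip_bom_from_file are hypothetical:

```python
import codecs

# codecs.BOM_UTF8 is the UTF-8 BOM: b"\xef\xbb\xbf" (239 187 191 in decimal).


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (unchanged if there is none)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_from_file(path: str) -> bool:
    """Rewrite the file at path without its UTF-8 BOM; True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False
    with open(path, "wb") as f:
        f.write(cleaned)
    return True


php = codecs.BOM_UTF8 + b"<?php header('Content-Type: text/html; charset=UTF-8'); ?>"
print(list(php[:3]))                # [239, 187, 191]
print(strip_utf8_bom(php)[:5])      # b'<?php'
# Decoding with the "utf-8-sig" codec also drops a leading BOM.
print(php.decode("utf-8-sig")[:5])  # <?php
```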
Converting between charsets
You can find this information on my other site, Unicode Tools.
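For a quick illustration: converting text from ISO-8859-1 to UTF-8 means decoding the bytes with the old charset and encoding the result with the new one. A Python sketch (the function names are hypothetical). It works on bytes so line endings are kept. Decoding as ISO-8859-1 never fails, because every byte maps to a character, so a wrong source charset produces garbled text rather than an error. After converting, update the declared charset (header, META tag or XML declaration) to match.

```python
from pathlib import Path


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode bytes as ISO-8859-1 (Latin-1) and re-encode them as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Write a UTF-8 copy of an ISO-8859-1 file, byte for byte otherwise."""
    Path(dst).write_bytes(latin1_to_utf8(Path(src).read_bytes()))


latin1 = "café".encode("iso-8859-1")
print(latin1)                  # b'caf\xe9'
print(latin1_to_utf8(latin1))  # b'caf\xc3\xa9'
```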