Scattered "Â" Characters in My WP Pages



skeezix
07-15-2008, 01:48 PM
All of my WordPress Pages and posts contain "Â" characters scattered throughout. They usually occur between a word and the space that follows it. I didn't put them there.

I am using IE6 and WP 2.5.1. These characters are NOT in the MySQL DB, so I assume the browser is putting them in??

The characters do not display, nor do they print if I print a page. I only see them when I do a View Source in IE, which BTW opens in HTML-Kit.

The question is: why do they not display, since they are legitimate characters (Alt+0194), and what are they doing there??

wysiwyg
07-16-2008, 03:30 AM
Briefly, UTF-8 uses two bytes to represent the characters from 128 to 2047 (U+0080 to U+07FF), i.e. everything just beyond 7-bit ASCII. Every character from 128 to 191 (U+0080 to U+00BF) starts with the byte 0xC2. A non-breaking space (character 160, U+00A0) is 0xC2 0xA0. This is important because in ISO-8859-1, which is what your browser is interpreting the page as, a non-breaking space is the single byte 0xA0. 0xC2 in ISO-8859-1 happens to be "Â", so when the page is encoded in UTF-8 but read as ISO-8859-1, you get "Â " instead of just a space.
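A minimal Python sketch of the byte arithmetic above, using only the standard `str.encode`/`bytes.decode` codecs: encode a non-breaking space as UTF-8, then decode those bytes as ISO-8859-1 the way the browser did.

```python
# Non-breaking space: code point U+00A0 (decimal 160).
nbsp = "\u00a0"

# UTF-8 stores U+0080..U+07FF in two bytes; U+0080..U+00BF all begin with 0xC2.
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes)  # b'\xc2\xa0'
assert all(chr(cp).encode("utf-8")[0] == 0xC2 for cp in range(0x80, 0xC0))

# ISO-8859-1 stores the same character as the single byte 0xA0.
latin1_bytes = nbsp.encode("iso-8859-1")
print(latin1_bytes)  # b'\xa0'

# Decoding the UTF-8 bytes as ISO-8859-1 yields two characters:
# 0xC2 -> "Â" and 0xA0 -> a non-breaking space.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))  # 'Â\xa0'
```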

There is a documented bug in Internet Explorer (http://support.microsoft.com/default.aspx/kb/928847) that causes it to ignore an explicitly declared character set and fall back to a default (here ISO-8859-1) if the meta tag is not located in the first 256 bytes of the page. If your Content-Type meta tag is not the first thing in your head tag, move it there. If your Content-Type isn't declared as UTF-8, then either change the declaration or generate the page in the charset it does declare.
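If you want to check where the declaration actually lands in the served HTML, a rough sketch follows. It assumes the 256-byte limit quoted above (not verified here), and `find_charset_meta` and `charset_declared_early` are hypothetical helper names; a regex is only a heuristic, not a real HTML parser.

```python
import re
from typing import Optional, Tuple

# Assumption: 256 is the byte limit quoted in the post above, not a value
# verified here; change it if your reading of the KB article differs.
IE_CHARSET_WINDOW = 256

# Heuristic match for a <meta> tag that declares a charset, for example
# <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_-]+)", re.IGNORECASE
)


def find_charset_meta(html: bytes) -> Optional[Tuple[int, str]]:
    """Return (byte offset of the <meta> tag, lower-cased charset) or None."""
    m = META_CHARSET.search(html)
    if m is None:
        return None
    return m.start(), m.group(1).decode("ascii").lower()


def charset_declared_early(html: bytes, window: int = IE_CHARSET_WINDOW) -> bool:
    """True if the charset value ends within the first `window` bytes."""
    m = META_CHARSET.search(html)
    return m is not None and m.end() <= window


meta = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
early = b"<html><head>" + meta + b"<title>Blog</title></head>"
late = b"<html><head><title>" + b"x" * 300 + b"</title>" + meta + b"</head>"

print(find_charset_meta(early))       # (12, 'utf-8')
print(charset_declared_early(early))  # True
print(charset_declared_early(late))   # False
```

Feed it the raw bytes the server sends (not what the editor shows), since a theme or plugin may emit output before the meta tag.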

skeezix
07-16-2008, 06:22 AM
Thank you for your explanation. Odd that not every space is preceded by the "Â", though. I can't find any particular pattern either; it just appears to be random, and only in the text fields I create directly in the <body>.
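The pattern follows from the encoding itself: an ordinary space (U+0020) is the single byte 0x20 in both UTF-8 and ISO-8859-1, so only non-breaking spaces (U+00A0) pick up the "Â". A small Python sketch with a made-up sample string:

```python
# Hypothetical sample: the first gap is an ordinary space (U+0020),
# the second a non-breaking space (U+00A0).
text = "Hello world\u00a0again"

# An ordinary space is the single byte 0x20 in both encodings,
# so it passes through the mis-decoding unchanged.
assert " ".encode("utf-8") == " ".encode("iso-8859-1") == b"\x20"

garbled = text.encode("utf-8").decode("iso-8859-1")
print(repr(garbled))  # 'Hello worldÂ\xa0again'

# "Â" appears once per non-breaking space and nowhere else.
nbsp_count = text.count("\u00a0")
print(nbsp_count, garbled.count("Â"))  # 1 1
```

Both kinds of space look identical on screen, which is why the placement seems random.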

The first line after <head> is:


<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

so I guess there's not much more I can do. (I assume "UTF" equals "utf".)
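A quick sketch for checking both points in that tag with Python's standard library: charset names are case-insensitive, so `UTF-8` and `utf-8` resolve to the same codec; and the quotes around each attribute value have to balance, because an unquoted value ends at the first space. `MetaCollector` and `meta_contents` are hypothetical helper names.

```python
import codecs
from html.parser import HTMLParser

# Charset names are case-insensitive: both spellings resolve to the same codec.
print(codecs.lookup("UTF-8").name, codecs.lookup("utf-8").name)  # utf-8 utf-8


class MetaCollector(HTMLParser):
    """Records the attributes of every <meta> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.metas = []

    # HTMLParser routes self-closing tags such as <meta ... /> through
    # handle_startendtag, which by default calls handle_starttag.
    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def meta_contents(html: str) -> list:
    """Return the parsed value of each <meta> tag's content attribute."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return [m.get("content") for m in parser.metas]


quoted = '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
unbalanced = '<meta http-equiv="Content-Type" content=text/html; charset=UTF-8" />'
print(meta_contents(quoted))      # ['text/html; charset=UTF-8']
print(meta_contents(unbalanced))  # content stops at the first space
```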

The pages contain other random instances of extended-character-set characters too, but fortunately the browser does not display them, at least not IE6.
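Other non-ASCII characters get mangled the same way; for example "café" turns into "cafÃ©" and "©" into "Â©". A hedged Python sketch that reproduces the mix-up and reverses one round of it; `garble` and `repair` are hypothetical names, and `repair` only makes sense for text that was mis-decoded exactly once (it will raise on text that was never garbled).

```python
def garble(text: str) -> str:
    """Reproduce the mistake: UTF-8 bytes read back as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo one round of it: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


for original in ["café", "naïve", "© 2008", "10\u00a0km"]:
    mangled = garble(original)
    print(repr(original), "->", repr(mangled))
    assert repair(mangled) == original
```

Repairing text after the fact is a fallback; serving the page with a charset declaration the browser actually honors avoids the problem in the first place.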