When you download XML text from the Web, you may find “garbage characters” at the start of your XML string. For example, I encountered this result when I downloaded an XML string using the WebClient.DownloadString method:
ï»¿<Root><Item>Hello, World</Item></Root>
What you are likely seeing is a Byte Order Mark (BOM), which is a Unicode character that indicates the endian-ness (byte order) of a text file or stream. The BOM is optional and will appear at the start of the text stream, if at all. The BOM may also indicate in which of the several Unicode representations the text is encoded.
The most common BOMs you may see are:
ï»¿ = EF BB BF in hex = UTF-8
þÿ = FE FF in hex = UTF-16 (Big Endian). This is the character U+FEFF (decimal 65279, Zero Width No-Break Space), which is a Unicode code point rather than an ASCII code.
ÿþ = FF FE in hex = UTF-16 (Little Endian)
□□þÿ = 00 00 FE FF in hex = UTF-32 (Big Endian)
ÿþ□□ = FF FE 00 00 in hex = UTF-32 (Little Endian)
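The byte signatures listed above can be checked directly against the raw bytes of a download. Here is a minimal sketch; the class name BomSniffer and the method DetectBom are hypothetical names chosen for illustration, and the returned labels simply mirror the list above.

```csharp
using System;

static class BomSniffer
{
    // Returns a label for the BOM found at the start of the data, or "none".
    // Hypothetical helper for illustration; not part of the .NET Framework.
    public static string DetectBom(byte[] data)
    {
        if (data == null) return "none";

        // Check the 4-byte marks first: the UTF-32 (Little Endian) BOM
        // begins with the same two bytes as the UTF-16 (Little Endian) BOM.
        // Note: FF FE 00 00 could also be UTF-16 LE text that starts with
        // a NUL character; this sketch assumes UTF-32 in that case.
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 (Big Endian)";
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 (Little Endian)";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 (Big Endian)";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 (Little Endian)";
        return "none";
    }

    // True when data begins with exactly the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

To use it, fetch the raw bytes (for example with WebClient.DownloadData) instead of a string, and pass them to BomSniffer.DetectBom before choosing how to decode them.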
If you try to parse an XML string with a BOM using an XmlTextReader, for example, you will see an error message such as:
Data at the root level is invalid. Line 1, position 1.
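A small sketch of how this failure can be reproduced: it assumes the BOM has ended up inside the string itself (as the character U+FEFF) and that the reader is built over a StringReader. When XmlTextReader reads from a byte stream instead, it can recognize the BOM on its own, so the error may not appear there. The class name BomFailureDemo and the method TryParse are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

static class BomFailureDemo
{
    // Reads the whole document and returns "parsed" on success,
    // or the parser's error message on failure.
    // Hypothetical helper for illustration only.
    public static string TryParse(string xml)
    {
        try
        {
            using (var reader = new XmlTextReader(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    // Walk every node so that any parse error surfaces.
                }
            }
            return "parsed";
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

Calling BomFailureDemo.TryParse("\uFEFF<Root><Item>Hello, World</Item></Root>") should return the parser's error message, while the same string without the leading \uFEFF should return "parsed".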
Here is some simple code to strip the BOM from an XML string:
// Drop everything before the first '<', which removes any leading BOM.
int index = xml.IndexOf('<');
if (index > 0)
    xml = xml.Substring(index);
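If you would rather target the BOM itself than search for the first '<', here is a rough sketch of two alternatives. StripBom removes a leading U+FEFF (character code 65279) or one of the mis-decoded forms that appear when BOM bytes are read as Latin-1 text. Decode avoids the problem earlier by letting StreamReader detect and consume the BOM while decoding raw bytes. The class and method names are hypothetical, and UTF-8 as the fallback encoding is an assumption.

```csharp
using System;
using System.IO;
using System.Text;

static class BomCleaner
{
    // How the UTF-8 and UTF-16 BOM bytes look when decoded as Latin-1:
    // EF BB BF, FE FF and FF FE respectively.
    private static readonly string[] MisdecodedBoms =
    {
        "\u00EF\u00BB\u00BF",
        "\u00FE\u00FF",
        "\u00FF\u00FE"
    };

    // Removes a single leading BOM from an already-decoded string.
    public static string StripBom(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        // Compare the char directly: a culture-sensitive StartsWith
        // can treat U+FEFF as ignorable and match any string.
        if (text[0] == '\uFEFF') return text.Substring(1);

        foreach (string bom in MisdecodedBoms)
        {
            if (text.StartsWith(bom, StringComparison.Ordinal))
                return text.Substring(bom.Length);
        }
        return text;
    }

    // Decodes raw bytes, letting StreamReader detect and skip any BOM.
    // UTF-8 is only the fallback when no BOM is present (an assumption).
    public static string Decode(byte[] data)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, xml = BomCleaner.StripBom(xml) can replace the snippet above, or you can download with WebClient.DownloadData and pass the bytes to BomCleaner.Decode so the BOM never reaches the string.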
Could you provide some sample code that fails? I was able to process xml having BOM using XmlTextReader (.Net 2.0).
Ah, nice! Very ingenious way of “stripping” out the BOM character! Works great – thanks for that!
Good snippet for avoiding unwanted characters. But I am looking forward to finding some dynamic code to do the same, say by searching for the unwanted character by its ASCII code (65279) and replacing it.
You single-handedly stopped a furious 3-hour debugging session. Mad props for this post. I will never forget ASCII 65279 as long as I live.