How ï»¿ and 65279 and Other Byte Order Marks (BOM) Can Mess Up Your XML : C# 411

When you download XML text from the Web, you may find “garbage characters” at the start of your XML string. For example, I encountered this result when I downloaded an XML string using the WebClient.DownloadString method:

ï»¿<Root><Item>Hello, World</Item></Root>

What you are likely seeing is a Byte Order Mark (BOM), the Unicode character U+FEFF, which indicates the endianness (byte order) of a text file or stream. The BOM is optional and, if present, appears at the very start of the text stream. Its byte pattern may also indicate which of the several Unicode encodings the text uses.
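To make this concrete, here is a minimal sketch that decodes the same bytes two ways. The class and member names (BomDemo, Sample, AsUtf8, AsLatin1) are made up for illustration, and the sketch assumes the garbage characters come from reading the UTF-8 BOM bytes with a single-byte encoding such as ISO-8859-1, which is one plausible cause rather than a confirmed diagnosis of the download above.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // The three UTF-8 BOM bytes followed by a tiny XML document, "<R/>".
    public static readonly byte[] Sample =
        { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'R', (byte)'/', (byte)'>' };

    // Decoding as UTF-8 keeps the BOM as a single character, U+FEFF (decimal 65279).
    public static string AsUtf8()
    {
        return Encoding.UTF8.GetString(Sample);
    }

    // Decoding as ISO-8859-1 turns each BOM byte into its own visible character.
    public static string AsLatin1()
    {
        return Encoding.GetEncoding("ISO-8859-1").GetString(Sample);
    }
}
```

In the UTF-8 case the BOM is invisible in most editors and debuggers, which is why checking the numeric value of the first character is often the quickest way to spot it.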

The most common BOMs, shown here as their bytes appear when misread with a single-byte encoding such as Windows-1252, are:

ï»¿ = EF BB BF in hex = UTF-8

þÿ = Unicode code point 65279 (U+FEFF, Zero Width No-Break Space) = FE FF in hex = UTF-16 (Big Endian)

ÿþ = FF FE in hex = UTF-16 (Little Endian)

□□þÿ = 00 00 FE FF in hex = UTF-32 (Big Endian)

ÿþ□□ = FF FE 00 00 in hex = UTF-32 (Little Endian)
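The table above can be turned into a small lookup. This is a rough sketch rather than a complete encoding detector, and BomSniffer, Detect and StartsWith are hypothetical names chosen for this example. It works on the raw bytes, so it assumes you have the downloaded data as a byte array (for example from WebClient.DownloadData) rather than as an already decoded string.

```csharp
using System;

public static class BomSniffer
{
    // Returns a label for the BOM at the start of the data, or null if none matches.
    // UTF-32 is checked before UTF-16 because FF FE 00 00 also begins with FF FE.
    public static string Detect(byte[] data)
    {
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 (Big Endian)";
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 (Little Endian)";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 (Big Endian)";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 (Little Endian)";
        return null;
    }

    // True when data begins with exactly the given prefix bytes.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

One caveat with this ordering: the bytes FF FE 00 00 could in principle also be UTF-16 (Little Endian) text whose first character is NUL, so the UTF-32 answer is a best guess, not a certainty.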

If you try to parse an XML string that begins with a BOM, using an XmlTextReader for example, you may see an error message such as:

Data at the root level is invalid. Line 1, position 1.

Here is some simple code to strip the BOM from an XML string:

// Drop everything before the first '<', which removes a leading BOM.
int index = xml.IndexOf('<');
if (index > 0)
    xml = xml.Substring(index);
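If you would rather remove only the BOM itself and leave any other leading text alone, you can target the character by its code point. The following is a minimal sketch under a couple of assumptions: BomStripper and Strip are hypothetical names, and the mis-decoded case covers only the UTF-8 BOM read with a single-byte encoding (the ï»¿ sequence shown earlier).

```csharp
using System;

public static class BomStripper
{
    // U+FEFF is the BOM code point (decimal 65279).
    private const char Bom = '\uFEFF';

    // The UTF-8 BOM bytes EF BB BF as they appear when misread byte by byte.
    private const string MisdecodedUtf8Bom = "\u00EF\u00BB\u00BF";

    // Removes a leading BOM, in either form, and leaves the rest of the string untouched.
    public static string Strip(string xml)
    {
        if (string.IsNullOrEmpty(xml)) return xml;

        // Ordinal comparison matters here: culture-sensitive comparisons
        // can treat zero-width characters as ignorable.
        if (xml.StartsWith(MisdecodedUtf8Bom, StringComparison.Ordinal))
            xml = xml.Substring(MisdecodedUtf8Bom.Length);

        return xml.TrimStart(Bom);
    }
}
```

Calling BomStripper.Strip(xml) before handing the string to your XML parser leaves strings without a BOM unchanged, so it should be safe to apply unconditionally.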

Comments

4 Responses to “How ï»¿ and 65279 and Other Byte Order Marks (BOM) Can Mess Up Your XML”

Tibor on April 27th, 2010 at 4:33 am
Could you provide some sample code that fails? I was able to process XML that has a BOM using XmlTextReader (.NET 2.0).
Todd on May 20th, 2010 at 7:05 pm
Ah, nice! Very ingenious way of “stripping” out the BOM character! Works great – thanks for that!
Krish4.Net on November 9th, 2010 at 5:07 pm
Good snippet for avoiding unwanted characters. But looking forward to finding some dynamic code to do the same, let's say by searching for that unwanted character with the ASCII code (65279) and replacing those values.
Travis Wilson on August 9th, 2011 at 2:14 pm
You single-handedly stopped a furious 3-hour debugging session. Mad props for this post. I will never forget ASCII 65279 as long as I live.
