Compsoft Flexible Specialists

Compsoft plc


Thursday, September 10, 2009

Save XDocument to string as UTF-8

I love LINQ to XML, I'll admit it. It has changed the way I work with XML forever, in a great way!

I did, however, run into my first issue with it, one that I felt should have been simpler than it turned out to be. I had created my uber XDocument containing all the data in the world, including an XDeclaration specifying UTF-8 as the encoding.

However, a plain ToString() dropped the declaration altogether, and whenever I saved to a StringBuilder it would always come out with:

<?xml version="1.0" encoding="utf-16" standalone="yes"?>

Trying all sorts of combinations got me nowhere. The reason behind this is that strings in .NET are stored internally as UTF-16. When saving, the XDocument picks up the encoding of the destination it is writing to and puts that into the declaration, ignoring the encoding given in the XDeclaration.
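To make the problem concrete, here is a minimal sketch that should reproduce it. The class name Utf16Repro and the single root element are made up for illustration; only XDocument, XDeclaration and StringWriter come from the framework.

```csharp
using System.IO;
using System.Xml.Linq;

public static class Utf16Repro
{
    // Builds a document that declares UTF-8, saves it to a StringWriter,
    // and returns the resulting text so the declaration can be inspected.
    public static string SaveToString()
    {
        XDocument document = new XDocument(
            new XDeclaration("1.0", "utf-8", "yes"),
            new XElement("root"));

        using (StringWriter writer = new StringWriter())
        {
            // StringWriter reports UTF-16 as its encoding, and that is
            // what ends up in the declaration.
            document.Save(writer);
            return writer.ToString();
        }
    }
}
```

Calling Utf16Repro.SaveToString() should give a declaration with encoding="utf-16", even though the XDeclaration asked for utf-8.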

In the end I had to go via a MemoryStream to trick XDocument into giving me the correct declaration:


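// Requires: using System.IO; using System.Text; using System.Xml; using System.Xml.Linq;
private static string GetDocumentAsString(XDocument document)
{
    // Crazy way to get UTF-8 encoding out to a string: write to a byte
    // stream through a UTF-8 writer, then read the bytes back as text.
    using (MemoryStream ms = new MemoryStream())
    using (XmlWriter xw = new XmlTextWriter(ms, Encoding.UTF8))
    {
        document.Save(xw);
        xw.Flush();

        // Rewind before reading; the reader detects and skips the UTF-8 byte order mark.
        ms.Seek(0, SeekOrigin.Begin);
        StreamReader sr = new StreamReader(ms); // the stream is disposed by the using blocks
        return sr.ReadToEnd();
    }
}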

This now produces:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
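Since the declaration follows whatever encoding the destination writer reports, another option that may be worth trying is a StringWriter that claims to be UTF-8. This is only a sketch of that idea, not what I used above: Utf8StringWriter, XDocumentText and ToUtf8DeclaredString are hypothetical names, and the only framework behaviour relied on is that StringWriter.Encoding is virtual and can be overridden.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class XDocumentText
{
    // Saves the document through the UTF-8-reporting writer so that the
    // declaration says utf-8; no MemoryStream round trip is involved.
    public static string ToUtf8DeclaredString(XDocument document)
    {
        using (Utf8StringWriter writer = new Utf8StringWriter())
        {
            document.Save(writer);
            return writer.ToString();
        }
    }
}
```

Bear in mind that the returned string is still UTF-16 in memory like any other .NET string. Only the declaration changes, so the text still has to be encoded as UTF-8 when it is finally written out as bytes.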