HtmlAgilityPack - How to set custom encoding when loading pages

c# encoding html-agility-pack load wpf

Question

Is it possible to set custom encoding when loading pages with the method below?

HtmlWeb hwWeb = new HtmlWeb();
HtmlDocument hd = hwWeb.Load("myurl");

I want to set encoding to "iso-8859-9".

I use C# 4.0 and WPF.

Edit: The question has been answered on MSDN.

Accepted Answer

I suppose you could try overriding the encoding in the HtmlWeb object.

Try this:

var web = new HtmlWeb
{
    AutoDetectEncoding = false,
    OverrideEncoding = myEncoding,
};
var doc = web.Load(myUrl);
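
The snippet above leaves myEncoding and myUrl undefined. Here is a minimal sketch of how they might be filled in, assuming a build of HTML Agility Pack that exposes OverrideEncoding. The helper name CreateWeb and the example.com URL are placeholders for illustration, not part of the original answer:

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

static class EncodingOverrideExample
{
    // Hypothetical helper: builds an HtmlWeb that decodes responses
    // with the supplied encoding instead of detecting one.
    public static HtmlWeb CreateWeb(Encoding encoding)
    {
        return new HtmlWeb
        {
            AutoDetectEncoding = false,  // do not auto-detect the document encoding
            OverrideEncoding = encoding  // decode with the supplied encoding instead
        };
    }

    static void Main()
    {
        // "iso-8859-9" is the encoding asked about in the question.
        Encoding myEncoding = Encoding.GetEncoding("iso-8859-9");
        HtmlWeb web = CreateWeb(myEncoding);

        // Placeholder URL (assumption); replace with the page to load.
        HtmlDocument doc = web.Load("http://example.com/");
        Console.WriteLine(doc.DocumentNode.OuterHtml.Length);
    }
}
```

On the .NET Framework, which the question targets, Encoding.GetEncoding("iso-8859-9") should work out of the box. On .NET Core and later, code-page encodings such as this one generally need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) to be called first.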

Note: The OverrideEncoding property appears to have been added to HTML Agility Pack in revision 76610, so it is not available in the current release, v1.4 (66017). In that case, the next best option is to read the page manually with the encoding overridden.


Popular Answer

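// Requires: using System.IO; using System.Net; using System.Text; using HtmlAgilityPack;
var document = new HtmlDocument();

using (var client = new WebClient())
{
    using (var stream = client.OpenRead(url))
    // Decode the response as ISO-8859-9 rather than the StreamReader default (UTF-8).
    using (var reader = new StreamReader(stream, Encoding.GetEncoding("iso-8859-9")))
    {
        var html = reader.ReadToEnd();
        document.LoadHtml(html);
    }
}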

This is a simplified version of a solution posted in another answer, which has since been deleted for some reason.
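
As a possible variation, HtmlDocument also has a Load overload that takes a stream and an encoding, which avoids building the intermediate string. This is a sketch only: the helper name LoadFrom and the example.com URL are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

static class StreamLoadExample
{
    // Hypothetical helper: parses HTML from a stream, decoding it
    // with the given encoding.
    public static HtmlDocument LoadFrom(Stream stream, Encoding encoding)
    {
        var document = new HtmlDocument();
        document.Load(stream, encoding);
        return document;
    }

    static void Main()
    {
        Encoding encoding = Encoding.GetEncoding("iso-8859-9");

        using (var client = new WebClient())
        // Placeholder URL (assumption); replace with the page to load.
        using (var stream = client.OpenRead("http://example.com/"))
        {
            HtmlDocument document = LoadFrom(stream, encoding);
            Console.WriteLine(document.DocumentNode.InnerText.Length);
        }
    }
}
```

Because LoadFrom accepts any Stream, the same helper can be tried against a MemoryStream holding bytes in the target encoding, without touching the network.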




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow