HtmlAgilityPack file loading

c# html-agility-pack

Question

I have a problem when I try to load a file from the filesystem. The issue is that the value of some HTML controls contains a less-than sign "<" inside a span value:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.OptionReadEncoding = true;

// Read the file as UTF-8 (honouring a byte order mark if present) and dispose the reader afterwards.
using (StreamReader str = new StreamReader(@"E:\HTMLS\OEL\1030,1.html", Encoding.UTF8, true))
{
    doc.Load(str);
}

// Remove text nodes that contain only line breaks and tabs.
doc.DocumentNode.Descendants()
    .Where(x => x.Name == "#text" && (x.InnerText == "\r\n\t" || x.InnerText == "\r\n" || x.InnerText == "\r\n\t\t"))
    .ToList()
    .ForEach(x => x.Remove());

// Collect every table element in the document.
List<HtmlNode> listHtmlNode = doc.DocumentNode.Descendants("table").ToList();

Popular Answer

You shouldn't have raw symbols such as < as text content in your HTML. Leaving them unescaped makes the HTML invalid and can cause HtmlAgilityPack to parse the document incorrectly.

If you need them in your HTML, you need to encode them: < becomes &lt;. See http://www.w3schools.com/html/html_entities.asp
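
To illustrate the encoding step, here is a minimal sketch. It assumes you either control the code that generates the HTML, or you can pre-process the file text before handing it to HtmlAgilityPack (for example through doc.LoadHtml). The class HtmlTextEscaper and both of its methods are hypothetical helpers made up for this example; they are not part of HtmlAgilityPack. The regular expression is only a heuristic: it treats a "<" as stray when it is not followed by a letter, "/", "!" or "?", so text such as "a <b" would still look like a tag and be left alone.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Usage (top-level statements, C# 9 or later).
Console.WriteLine(HtmlTextEscaper.EscapeText("1 < 2"));
// prints: 1 &lt; 2
Console.WriteLine(HtmlTextEscaper.EscapeStrayLessThan("<span>value < 10</span>"));
// prints: <span>value &lt; 10</span>

// Hypothetical helper class for this example only.
public static class HtmlTextEscaper
{
    // A "<" that is NOT followed by a character that can start a tag,
    // closing tag, comment/doctype, or processing instruction.
    private static readonly Regex StrayLessThan =
        new Regex(@"<(?![A-Za-z/!?])", RegexOptions.Compiled);

    // Use when generating HTML: encodes a plain text value so that
    // characters such as < and & become entities (&lt;, &amp;).
    public static string EscapeText(string text)
    {
        return WebUtility.HtmlEncode(text);
    }

    // Use when repairing existing markup: replaces only the stray "<"
    // characters and leaves everything that looks like a tag untouched.
    public static string EscapeStrayLessThan(string html)
    {
        return StrayLessThan.Replace(html, "&lt;");
    }
}
```

If the file cannot be fixed at the source, the repaired string returned by EscapeStrayLessThan could be passed to doc.LoadHtml instead of loading the file directly. Fixing whatever produces the HTML is the more reliable option, since the heuristic cannot tell a real tag from text that merely looks like one.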




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow