Is there an XmlReader equivalent for HTML in .Net?

.net html html-agility-pack parsing xmlreader

Question

I've used HtmlAgilityPack in the past to parse HTML in .Net, but I don't like the fact that it only offers a DOM-based model.

On large documents, or documents with deep nesting, it is possible to hit stack overflow or out-of-memory exceptions. More generally, a DOM-based parser uses significantly more memory than a streaming one, because it holds the whole document in memory even though the process consuming the HTML typically only needs a few elements at a time.

Does anyone know of a decent HTML parser for .Net that lets you parse HTML in a manner similar to the XmlReader class, i.e. in a forward-only, streaming manner?
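
For reference, this is the forward-only pull model I mean, shown with XmlReader on well-formed XML. It is only a minimal sketch: the `StreamingExample` class, the `CountElements` helper and the sample markup are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingExample
{
    // Counts elements with the given name without building a DOM.
    // The reader only exposes the current node, so memory use stays
    // roughly constant regardless of document size.
    public static int CountElements(TextReader input, string elementName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Read() advances to the next node; there is no way to go back.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        string xml = "<ul><li>a</li><li>b</li></ul>";
        Console.WriteLine(CountElements(new StringReader(xml), "li")); // prints 2
    }
}
```

The same loop throws an XmlException on typical real-world HTML (unclosed tags, unquoted attributes), which is why I'm looking for an HTML-aware equivalent.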

Popular Answer

I usually use SgmlReader for this: https://github.com/MindTouch/SGMLReader

As others have said, HTML doesn't follow the well-formedness rules of XML, so it is inherently harder to parse, but SgmlReader usually does a pretty good job.
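
A rough sketch of how it can be used is below. SgmlReader derives from XmlReader, so the usual `Read()` loop applies. The `DocType`, `CaseFolding` and `InputStream` property names are as I recall them from the library, so check them against the version you install; the `SgmlReaderExample` class and `CollectLinks` helper are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using Sgml; // namespace of the SGMLReader library linked above

public static class SgmlReaderExample
{
    // Streams through HTML and collects the href of every <a> element.
    // Assumption: the property names below match the installed SgmlReader version.
    public static List<string> CollectLinks(TextReader html)
    {
        var links = new List<string>();
        using (var reader = new SgmlReader())
        {
            reader.DocType = "HTML";                  // use the built-in HTML DTD
            reader.CaseFolding = CaseFolding.ToLower; // normalise tag names to lower case
            reader.InputStream = html;

            // Same forward-only loop as with any other XmlReader.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "a")
                {
                    string href = reader.GetAttribute("href");
                    if (href != null)
                    {
                        links.Add(href);
                    }
                }
            }
        }
        return links;
    }
}
```

Because the reader is an XmlReader, it can also be handed to any API that accepts one, although loading it into something like XmlDocument brings back the DOM memory cost the question is trying to avoid.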




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow