Is there an object in C# that allows for easy management of HTML DOM?

c# dom dom-manipulation html-agility-pack

Question

If I have a string that contains the HTML from a page I just got back from an HTTP POST, how can I turn it into something that lets me easily traverse the DOM?

I figured the HtmlDocument object would make sense, but it has no constructor. Are there any types that allow for easy management of the HTML DOM?

Thanks,
Matt

Accepted Answer

The HtmlDocument you found (System.Windows.Forms.HtmlDocument) represents a document that has already been loaded by a WebBrowser control, which is why it has no public constructor.

Html Agility Pack is by far the best library I have used for this purpose.

An example adapted from the CodePlex wiki:

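// HtmlDocument here is HtmlAgilityPack.HtmlDocument, not the WinForms class.
HtmlDocument doc = new HtmlDocument();
doc.Load("file.htm");
// Select every <a> element that has an href attribute.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
{
    HtmlAttribute att = link.Attributes["href"];
    att.Value = FixLink(att); // FixLink is your own helper, not part of the library
}
doc.Save("file.htm");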

The example loads a file, but you can also load from a string (LoadHtml) or from a stream (an overload of Load).
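
Since the question starts from a string returned by an HTTP POST, here is a minimal sketch of the string and stream cases. It assumes the HtmlAgilityPack NuGet package is referenced; the html literal is made-up sample data standing in for your response body, and the Program class is only scaffolding.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

class Program
{
    static void Main()
    {
        // Made-up sample markup standing in for the body of your HTTP response.
        string html = "<html><body><a href=\"/a\">A</a><a href=\"/b\">B</a><p>no link</p></body></html>";

        // Parse directly from a string.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // GetAttributeValue falls back to the second argument if the attribute is missing.
                Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
            }
        }

        // Parse from a stream, for example a response stream.
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            var fromStream = new HtmlDocument();
            fromStream.Load(stream, Encoding.UTF8);
            Console.WriteLine(fromStream.DocumentNode.SelectSingleNode("//p").InnerText);
        }
    }
}
```

The null check matters in practice: a page with no matching elements would otherwise throw a NullReferenceException in the foreach.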




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow