I have to go get
In a tool I'm developing, I need to extract elements from ASP.NET pages, master pages, and user controls, collect their contents, and then write modified values back to those files.
I could try to use regular expressions to select just these elements, but there are a number of problems with that approach:
LINK elements inside IE conditional comments
I looked at HTML parsers for .NET and saw that several SO posts and blogs recommend the Html Agility Pack. I haven't used it before, so I don't know whether it can parse malformed HTML and HTML fragments. (For instance, consider a user control that contains only a
content-containing element - no
I could read through the documentation, but it would save me a lot of time if someone could offer some guidance. (Most SO posts deal with parsing whole HTML pages.)
Yes, that is what it is best at.
In reality, many of the web pages you'll find in the wild could be described as HTML fragments, since they are missing essential tags or have incorrectly closed tags.
The HtmlAgilityPack simulates what a browser has to do, which is to try to make sense of what is sometimes a disorganized mess of tags. It's an imperfect science, but HtmlAgilityPack does it very well.
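As a minimal sketch of the fragment case from the question, the following loads a snippet with no surrounding document, changes the text of one element, and returns the resulting markup. The class name, the helper `ReplaceFirstSpanText`, and the sample fragment are hypothetical and chosen for illustration only; how ASP.NET server tags and directives are handled is not covered here and would need to be checked separately.

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FragmentDemo
{
    // Hypothetical helper: parses a fragment, replaces the text of the
    // first <span>, and returns the resulting markup.
    public static string ReplaceFirstSpanText(string fragment, string newText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment); // accepts a fragment; no wrapper document needed

        // SelectSingleNode returns null when nothing matches the XPath.
        HtmlNode span = doc.DocumentNode.SelectSingleNode("//span");
        if (span != null)
        {
            // Encode the text so characters such as '<' are not parsed as markup.
            span.InnerHtml = WebUtility.HtmlEncode(newText);
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Sample fragment (an assumption, not taken from the question).
        string fragment = "<div class=\"greeting\"><span>Hello</span> world</div>";
        Console.WriteLine(ReplaceFirstSpanText(fragment, "Goodbye"));
    }
}
```

To write the result back to disk, the returned string could be saved with `System.IO.File.WriteAllText`, assuming the tool tracks which file the fragment came from.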
I am the primary author of CsQuery, a C# port of jQuery that serves as an alternative to Html Agility Pack. Many people find it easier to access and manipulate the DOM with CSS selectors and the full jQuery API than with XPath. It also has an HTML parser designed with several goals in mind, and it offers several options for parsing HTML: as a full document (missing document-level tags are added and any orphaned content is moved inside the body); as a content block (it is not wrapped as a full document, but optional tags that the DOM requires are still added automatically, much as browsers do); and as a true fragment, where no tags are created (for instance, when you're just working with building blocks).
See the CsQuery documentation on creating a new DOM for details.
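The three parsing options map, as far as I know, onto the `CQ.CreateDocument`, `CQ.Create`, and `CQ.CreateFragment` factory methods. The sketch below parses the same input each way and renders the result so the differences can be compared; the class name, the helper `RenderAll`, and the sample markup are hypothetical, and the exact output depends on the CsQuery version.

```csharp
using System;
using CsQuery; // NuGet package: CsQuery

public static class ParsingModesDemo
{
    // Hypothetical helper: parses the same markup in each mode and
    // returns the rendered HTML for comparison.
    public static string[] RenderAll(string html)
    {
        CQ asDocument = CQ.CreateDocument(html); // full document
        CQ asContent = CQ.Create(html);          // content block
        CQ asFragment = CQ.CreateFragment(html); // true fragment

        return new[]
        {
            asDocument.Render(),
            asContent.Render(),
            asFragment.Render()
        };
    }

    public static void Main()
    {
        // Sample markup (an assumption, chosen for illustration).
        foreach (string rendered in RenderAll("<div>some content</div>"))
        {
            Console.WriteLine(rendered);
            Console.WriteLine("----");
        }
    }
}
```

For user controls and other partial markup that must be written back unchanged apart from the edited values, the fragment mode is probably the safest starting point, since it is the one described as creating no extra tags.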
CsQuery's HTML parser has also been designed to follow the HTML5 specification for optional closing tags. For instance, some closing tags are optional, but there are specific rules that determine when the block should be closed. To produce the same DOM that a browser does, the parser must apply the same rules. CsQuery does this to provide a high degree of browser DOM compatibility for a given source.
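One way to observe the optional-closing-tag rules is to parse markup that omits a closing tag and count the resulting elements. HTML5 allows the closing tag of a `p` element to be omitted, and a new `<p>` start tag implicitly closes an open paragraph, so a conforming parser should produce sibling paragraphs rather than nested ones. The class name, the helper `CountParagraphs`, and the sample input are hypothetical.

```csharp
using System;
using CsQuery; // NuGet package: CsQuery

public static class OptionalTagsDemo
{
    // Hypothetical helper: counts the <p> elements the parser creates.
    public static int CountParagraphs(string html)
    {
        CQ dom = CQ.Create(html);
        // The indexer runs a CSS selector; Length is the number of matches.
        return dom["p"].Length;
    }

    public static void Main()
    {
        // Neither paragraph has a closing tag in this sample input.
        Console.WriteLine(CountParagraphs("<p>first<p>second"));
    }
}
```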
CsQuery is very easy to use, for example:
    CQ docFromString = CQ.Create(htmlString);
    CQ docFromWeb = CQ.CreateFromUrl(someUrl);

    // there are other methods for asynchronous web gets, creating from files, streams, etc.

    // css selector: the indexer [] is like jQuery $(..)
    CQ lastCellInFirstRow = docFromString["table tr:first-child td:last-child"];

    // Text() is a jQuery method returning text contents of selection
    string textOfCell = lastCellInFirstRow.Text();
Finally, because CsQuery indexes each document on class, id, attribute, and tag, selectors are very fast compared to Html Agility Pack.
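For the read, modify, and write-back workflow described in the question, a minimal CsQuery sketch might look like the following. It parses the markup as a fragment, updates the text of the elements matched by a selector, and renders the result. The class name, the helper `SetText`, the selector, and the sample markup are all hypothetical.

```csharp
using System;
using CsQuery; // NuGet package: CsQuery

public static class WriteBackDemo
{
    // Hypothetical helper: sets the text of every element matching the
    // selector and returns the re-rendered markup.
    public static string SetText(string html, string selector, string newText)
    {
        CQ dom = CQ.CreateFragment(html); // fragment mode: no tags are added
        dom[selector].Text(newText);      // like jQuery's .text(value)
        return dom.Render();
    }

    public static void Main()
    {
        // Sample markup (an assumption, chosen for illustration).
        string html = "<div><span class=\"title\">Old title</span></div>";
        Console.WriteLine(SetText(html, "span.title", "New title"));
    }
}
```

The rendered string could then be written to the original file with `System.IO.File.WriteAllText`.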