Is it possible to parse HTML fragments using Html Agility Pack?

.net c# html-agility-pack

Question

Can a portion of an HTML string be parsed using HTML Agility Pack?

For example:

var fragment = "<b>Some code </b>";

Then extract every <b> tag from it? Every example I've seen so far loads a complete HTML page.

3/7/2019 3:59:37 PM

Accepted Answer

Yes, as long as the string is HTML. HtmlAgilityPack parses fragments as well as full pages.

using HtmlAgilityPack;

string str = "<b>Some code</b>";

// no <html>/<body> wrapper is needed; LoadHtml parses fragments directly
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(str);

// see an XPath tutorial for how to select elements
// select the first <b> element anywhere in the document ("//b", not "b[1]")
HtmlNode bNode = doc.DocumentNode.SelectSingleNode("//b");
string boldText = bNode?.InnerText; // null if the fragment has no <b> element
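
The question asks for every <b> tag, not just the first, so here is a minimal sketch using SelectNodes instead of SelectSingleNode. It assumes the HtmlAgilityPack NuGet package; the class and helper names (FragmentDemo, BoldTexts, RemoveBoldTags) are hypothetical. Because "take out" could mean either "extract" or "delete", it shows both. Note that SelectNodes returns null rather than an empty collection when nothing matches (unless the document's OptionEmptyCollection is turned on), so the code checks for that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class FragmentDemo
{
    // Hypothetical helper: returns the inner text of every <b> element in a fragment.
    public static List<string> BoldTexts(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment); // fragments load directly; no <html>/<body> wrapper

        // "//b" matches <b> elements at any depth.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//b");
        if (nodes == null)
            return new List<string>();

        return nodes.Select(n => n.InnerText).ToList();
    }

    // Hypothetical helper: deletes every <b> element (and its text) and returns the remaining markup.
    public static string RemoveBoldTags(string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(fragment);

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//b");
        if (nodes != null)
        {
            // Copy to a list first so detaching nodes from the tree can't affect the loop.
            foreach (HtmlNode node in nodes.ToList())
                node.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        const string fragment = "<p><b>one</b> plain <b>two</b></p>";
        foreach (string text in BoldTexts(fragment))
            Console.WriteLine(text); // prints "one", then "two"
        Console.WriteLine(RemoveBoldTags(fragment));
    }
}
```

If you only want to drop the <b> markup but keep its text, replacing each node with its child nodes (rather than calling Remove) would be the variation to look into.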
3/29/2010 5:49:23 AM

Popular Answer

In my opinion, this isn't the best use of HtmlAgilityPack.

When I see people trying to parse large amounts of HTML with regular expressions, I usually point them to HtmlAgilityPack, but in this case I think a regex would be the better choice.

In a blog post, Roy Osherove explains how to remove all the HTML from a snippet:

Even if you used Mika Kolari's example and got the XPath right, the code would only work for snippets containing <b> tags and would break as soon as the markup changed.
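
For comparison, the regex route this answer prefers can be sketched roughly as follows. This is a generic illustration, not the code from the blog post mentioned above; the class and method names (TagStripper, StripTags) are hypothetical. It uses System.Text.RegularExpressions.Regex with a deliberately naive <[^>]+> pattern, which removes every tag regardless of its name. That assumes simple, well-behaved snippets: it does not decode entities such as &amp;, and it will mishandle a ">" inside an attribute value or tags inside comments and scripts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Naive tag pattern: anything from '<' up to the next '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Hypothetical helper: removes every tag but keeps the text between tags.
    public static string StripTags(string snippet) => TagPattern.Replace(snippet, string.Empty);

    public static void Main()
    {
        Console.WriteLine(StripTags("<b>Some code</b>")); // prints "Some code"
    }
}
```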

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow