Parse HTML With C#

c# html html-agility-pack windows-phone

Question

I'd like to parse HTML pages using C#. The pages contain a lot of HTML tags; here's a sample from one of them:

<span class=text14 id="article_content"><!-- RELEVANTI_ARTICLE_START --><span ></b>The 
     most important component for <a
     class=bluelink href="http://www.ynetnews.com/articles/0,7340,L-
     3284752,00.html%20"' onmouseover='this.href=unescape(this.href)' 
     target=_blank>Israel</a>'s
     security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the <a  ...

But I only want the content wrapped by the <span class=text14 id="article_content"> tag. At first I thought about using regular expressions (preg_match), but then realized that isn't efficient at all. Later I read about Html Agility Pack and FizzlerEx. Is it possible to get the text wrapped by that specific tag using these tools? I'd also be grateful if someone could tell me how fast this task can be performed.

Accepted Answer

It's pretty straightforward using Html Agility Pack:

using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

var markup = @"<span class=text14 id=""article_content""><!-- RELEVANTI_ARTICLE_START --><span ></b>The most important component for <a class=bluelink href=""http://www.ynetnews.com/articles/0,7340,L-3284752,00.html%20""' onmouseover='this.href=unescape(this.href)' target=_blank>Israel</a>'s security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the</span>";

var doc = new HtmlDocument();
doc.LoadHtml(markup); // HAP tolerates malformed markup such as the stray </b>

// GetElementbyId returns null when no element has the given id
var content = doc.GetElementbyId("article_content")?.InnerText;

Console.WriteLine(content);
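In some Html Agility Pack versions, InnerText may also include the text of comment nodes such as <!-- RELEVANTI_ARTICLE_START -->, and it does not decode entities like &amp;. If that matters, one option is to collect only the text nodes under the element with the XPath expression .//text() and decode the result with HtmlEntity.DeEntitize. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name ArticleText and the method Extract are hypothetical helpers, not part of the library.

```csharp
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

#nullable enable

// Hypothetical helper (not part of Html Agility Pack): returns the decoded text of the
// element with the given id, built only from text nodes so that comments are skipped.
// Returns null if no element has that id.
public static class ArticleText
{
    public static string? Extract(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode? element = doc.GetElementbyId(id);
        if (element == null)
        {
            return null;
        }

        // ".//text()" selects every descendant text node (comment nodes are not text nodes).
        // SelectNodes returns null rather than an empty collection when nothing matches.
        HtmlNodeCollection? textNodes = element.SelectNodes(".//text()");
        if (textNodes == null)
        {
            return string.Empty;
        }

        // Concatenate the raw text, then turn entities such as &amp; into their characters.
        string raw = string.Concat(textNodes.Select(node => node.InnerText));
        return HtmlEntity.DeEntitize(raw);
    }
}
```

For the markup above, calling ArticleText.Extract(markup, "article_content") should return the article text without the comment marker. Using an XPath text() query instead of InnerText simply makes it explicit which nodes contribute to the result.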



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow