Parse HTML With C#

c# html html-agility-pack windows-phone

Question

I want to use C# to parse an HTML page. Here is an example of one of the many HTML tags found in such pages:

<span class=text14 id="article_content"><!-- RELEVANTI_ARTICLE_START --><span ></b>The 
     most important component for <a
     class=bluelink href="http://www.ynetnews.com/articles/0,7340,L-
     3284752,00.html%20"' onmouseover='this.href=unescape(this.href)' 
     target=_blank>Israel</a>'s
     security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the <a  ...

However, I only want to retrieve the content wrapped by the <span class=text14 id="article_content"> tag. I first considered using preg_match-style regular expressions, but quickly found that approach wasn't very effective. I then read about Html Agility Pack and FizzlerEx, and I'd like to know whether I can use these tools to obtain the text wrapped by the particular tag mentioned above. I'd also appreciate knowing how fast this operation would be.

11/27/2013 10:33:41 PM

Accepted Answer

Using Html Agility Pack, this is fairly simple:

var markup = @"<span class=text14 id=""article_content""><!-- RELEVANTI_ARTICLE_START --><span ></b>The most important component for <a class=bluelink href=""http://www.ynetnews.com/articles/0,7340,L-3284752,00.html%20""' onmouseover='this.href=unescape(this.href)' target=_blank>Israel</a>'s security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the</span>";

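// Requires the HtmlAgilityPack package (using HtmlAgilityPack;).
var doc = new HtmlDocument();
doc.LoadHtml(markup);

// GetElementbyId (note the lowercase "b") returns null when no element has that id.
var node = doc.GetElementbyId("article_content");
var content = node?.InnerText;

Console.WriteLine(content);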
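
The question also asks about speed, which is easiest to settle by measuring on your own pages. The following is only a minimal sketch, assuming the HtmlAgilityPack package is referenced; the class and method names (ArticleExtractor, ExtractText, MeasureMilliseconds) are hypothetical and introduced purely for illustration. It looks the element up with LINQ over Descendants() rather than GetElementbyId, decodes HTML entities with HtmlEntity.DeEntitize, and times the whole operation with a Stopwatch.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, not part of Html Agility Pack.
public static class ArticleExtractor
{
    // Returns the entity-decoded text of the first element whose id
    // attribute equals the given id, or null if no such element exists.
    public static string ExtractText(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Walk every descendant node; nodes without an id attribute
        // yield the default value "" and are skipped.
        var node = doc.DocumentNode
            .Descendants()
            .FirstOrDefault(n => n.GetAttributeValue("id", "") == id);

        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText);
    }

    // Times one parse-and-extract pass and returns the elapsed milliseconds.
    public static double MeasureMilliseconds(string html, string id)
    {
        var stopwatch = Stopwatch.StartNew();
        ExtractText(html, id);
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}
```

The first call in a process typically includes JIT and assembly-loading overhead, so it is worth running MeasureMilliseconds several times on a representative page and disregarding the first result.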
11/27/2017 1:05:43 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow