I want to parse HTML pages with C#. Some of the pages contain a lot of HTML tags; here is a sample from one of them:
<span class=text14 id="article_content"><!-- RELEVANTI_ARTICLE_START --><span ></b>The
most important component for <a
class=bluelink href="http://www.ynetnews.com/articles/0,7340,L-
3284752,00.html%20"' onmouseover='this.href=unescape(this.href)'
target=_blank>Israel</a>'s
security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the <a ...
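But I only want the content contained in the <span class=text14 id="article_content"> tag. At first I considered matching it with a regular expression (preg-style), but then realized that is not efficient at all. I later read about Html Agility Pack and FizzlerEx. Can these tools extract the text contained in one specific tag? I would appreciate it if someone could show me how to get this done quickly.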
This is very easy with Html Agility Pack:
var markup = @"<span class=text14 id=""article_content""><!-- RELEVANTI_ARTICLE_START --><span ></b>The most important component for <a class=bluelink href=""http://www.ynetnews.com/articles/0,7340,L-3284752,00.html%20""' onmouseover='this.href=unescape(this.href)' target=_blank>Israel</a>'s security is its special relations with the American administration, and especially with its generous purse. When the Netanyahu government launches a great outcry against the</span>";
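// HtmlDocument comes from the HtmlAgilityPack namespace (HtmlAgilityPack NuGet package).
var doc = new HtmlDocument();
doc.LoadHtml(markup);
// Look up the span by its id and take its text with all tags stripped.
var content = doc.GetElementbyId("article_content").InnerText;
Console.WriteLine(content);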
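
If the element might be missing on some pages, a slightly more defensive variant may be worth considering. The following is only a sketch: the shortened markup string is a made-up stand-in for the real page, and it swaps the id lookup for an XPath query via SelectSingleNode so that a missing element can be detected before reading InnerText. It also passes the text through HtmlEntity.DeEntitize, on the assumption that you want entities such as &amp; decoded.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical, shortened stand-in for the article markup in the question.
        var markup = @"<span class=text14 id=""article_content"">Israel &amp; its <a href=""#"">security</a></span>";

        var doc = new HtmlDocument();
        doc.LoadHtml(markup);

        // XPath lookup by id; SelectSingleNode returns null when nothing matches.
        var node = doc.DocumentNode.SelectSingleNode("//span[@id='article_content']");
        if (node == null)
        {
            Console.WriteLine("article_content not found");
            return;
        }

        // InnerText drops the tags; DeEntitize decodes HTML entities in the result.
        var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
        Console.WriteLine(text);
    }
}
```

If you prefer CSS selectors, Fizzler adds a QuerySelector extension method on top of Html Agility Pack nodes (namespace Fizzler.Systems.HtmlAgilityPack), so the lookup should be expressible as doc.DocumentNode.QuerySelector("span#article_content"); the rest of the sketch would stay the same.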