How do I access the content of multiple div tags with HTMLAgilityPack?

c# html html-agility-pack wpf

Question

I cannot find the documentation for the HTMLAgilityPack on its Codeplex page. What I want to do is access a div on the Amazon website and scrape its text data for a WPF application.

var getWeb = new HtmlWeb();                     
var doc = getWeb.Load(uri);
HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//div[@id = 'zg_centerListWrapper']");

That div contains about 12 more divs, each of which is an item in the best sellers category.

Accessing each one individually seems laborious (and at first sight I'm not sure how I'd do it). So should I use DocumentNode.SelectNodes() instead? And how would I go about that? Also, I find it hard to believe there is no documentation for the HTMLAgilityPack ... Since YouTube currently seems to be my best source, maybe I'm looking in the wrong places.

6/8/2015 7:16:10 AM

Accepted Answer

In fact, the parameter of SelectNodes() and SelectSingleNode() is an XPath expression, specifically XPath version 1.0 (see the XPath 1.0 specification).

XPath is a separate technology with its own specification, documentation, and discussion. So rather than looking for HtmlAgilityPack (HAP) specific material, you should usually look for XPath tutorials or articles to get a better sense of which expression to pass to HAP to reach particular HTML elements.

Let's say your HTML looks like this for the purpose of illustration:

<div id="zg_centerListWrapper">
    <div>I want this</div>
    <div>..and this</div>
    <div>..and this one too</div>
</div>

Given that the divs you're interested in are direct children of div[@id = 'zg_centerListWrapper'], you can use the following XPath to get them:

var xpath = "//div[@id = 'zg_centerListWrapper']/div";
HtmlNodeCollection ourNodes = doc.DocumentNode.SelectNodes(xpath);
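
To show what can be done with the resulting collection, here is a minimal sketch that runs the query above against the sample markup and collects the text of each child div. It assumes the HtmlAgilityPack NuGet package is referenced and uses C# top-level statements; the `html` constant and the `texts` list are names made up for this illustration, and loading from a string stands in for `HtmlWeb.Load(uri)`.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Sample markup from the answer; a real page would come from HtmlWeb.Load(uri).
const string html = @"
<div id=""zg_centerListWrapper"">
    <div>I want this</div>
    <div>..and this</div>
    <div>..and this one too</div>
</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var xpath = "//div[@id = 'zg_centerListWrapper']/div";
HtmlNodeCollection ourNodes = doc.DocumentNode.SelectNodes(xpath);

var texts = new List<string>();

// By default SelectNodes returns null (not an empty collection) when nothing matches.
if (ourNodes != null)
{
    foreach (HtmlNode node in ourNodes)
    {
        // InnerText is the node's text content; DeEntitize decodes entities such as &amp;.
        texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
    }
}

foreach (var text in texts)
{
    Console.WriteLine(text);
}
```

The null check matters on a live page: if Amazon changes the id or serves different markup, the query matches nothing and iterating over the result directly would throw.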
6/8/2015 1:21:06 AM

Popular Answer

You could use DocumentNode.Descendants("div") followed by something like this:

.Where(div => div.Attributes.Contains("class") && div.Attributes["class"].Value.Contains("best category"))

However, documentation would undoubtedly be helpful.
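
For completeness, the Descendants approach above might look like this as a complete program. This is only a sketch under assumptions: the markup and the class name `best-category` are hypothetical (the real Amazon class names are not given here), the HtmlAgilityPack NuGet package is assumed to be referenced, and `GetAttributeValue` is swapped in for the explicit `Attributes.Contains("class")` check.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical markup: the class name "best-category" is made up for this sketch.
const string html = @"
<div id=""zg_centerListWrapper"">
    <div class=""best-category first"">Item one</div>
    <div class=""other"">Not wanted</div>
    <div class=""best-category"">Item two</div>
    <div>No class attribute</div>
</div>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var matches = doc.DocumentNode
    .Descendants("div")
    // GetAttributeValue returns the supplied default ("") when the attribute is
    // missing, so divs without a class are simply filtered out.
    .Where(div => div.GetAttributeValue("class", "").Contains("best-category"))
    .Select(div => div.InnerText.Trim())
    .ToList();

foreach (var text in matches)
{
    Console.WriteLine(text);
}
```

Note that `Contains` is a substring test on the whole attribute value, so it would also match a class such as `best-category-old`; splitting the value on whitespace and comparing tokens is stricter if that matters.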




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow