How do I access the content of multiple <div> tags with HTMLAgilityPack?

c# html html-agility-pack wpf

Question

I can't find the documentation for HtmlAgilityPack on the CodePlex website. What I currently want to do is access a div on the Amazon website and scrape its text for use in a WPF application.

var getWeb = new HtmlWeb();                     
var doc = getWeb.Load(uri);
HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//div[@id = 'zg_centerListWrapper']");

This div contains about 12 other divs, each of which is an item in the best sellers category.

Accessing the properties of each one individually looks painstaking (and at first glance I'm not entirely sure how I'd do it). Should I use DocumentNode.SelectNodes() instead, and how would I implement it? I also find it hard to believe that after all this time there is no documentation for HtmlAgilityPack. Maybe I'm looking in the wrong places, because YouTube seems to be my best source at the moment.

Accepted Answer

The parameter of SelectNodes() and SelectSingleNode() is an XPath expression, XPath version 1.0 to be precise (see the XPath 1.0 specification).

XPath is a separate technology with its own specification, documentation, and discussion. You can search for XPath tutorials or articles rather than HtmlAgilityPack (HAP) specifics to get a better idea of which expression to pass to HAP to select particular HTML elements.

For the sake of example, assume that your HTML looks like this:

<div id="zg_centerListWrapper">
    <div>I want this</div>
    <div>..and this</div>
    <div>..and this one too</div>
</div>

Since the divs you're interested in are direct children of div[@id = 'zg_centerListWrapper'], you can use the following XPath to get them:

var xpath = "//div[@id = 'zg_centerListWrapper']/div";
HtmlNodeCollection ourNodes = doc.DocumentNode.SelectNodes(xpath);
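
To read the text of each matched div, the collection can be iterated like any other sequence. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and the markup has the shape of the sample above; the class and method names (ChildDivReader, ReadItemTexts) are hypothetical. Note that SelectNodes() returns null rather than an empty collection when nothing matches, so the result is checked before looping.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ChildDivReader // hypothetical helper, not part of HAP
{
    // Returns the trimmed text of every div that is a direct child of
    // the div with id 'zg_centerListWrapper'.
    public static List<string> ReadItemTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var xpath = "//div[@id = 'zg_centerListWrapper']/div";
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        var texts = new List<string>();
        if (nodes == null)
        {
            // No match: HAP signals this with null, not an empty collection.
            return texts;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText keeps HTML entities such as &amp; so decode them first.
            texts.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return texts;
    }
}
```

Calling ChildDivReader.ReadItemTexts() with the sample HTML above should yield one string per inner div. If you already hold the wrapper node from SelectSingleNode(), a relative expression such as ourNode.SelectNodes("./div") selects the same children; an expression starting with // searches from the document root even when called on a child node.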

Popular Answer

You can use DocumentNode.Descendants("div") and then something like

.Where(div => div.Attributes.Contains("class") && div.Attributes["class"].Value.Contains("best category"))

But yeah, documentation would certainly help..
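
The fragment above can be assembled into a complete LINQ query along these lines. This is only a sketch, assuming the HtmlAgilityPack NuGet package is referenced; the helper names (ClassFilter, DivsWithClass) are hypothetical, and "best category" is the placeholder class text from this answer, not a class name confirmed to exist on the Amazon page.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ClassFilter // hypothetical helper, not part of HAP
{
    // Returns every div whose class attribute contains the given text.
    public static List<HtmlNode> DivsWithClass(HtmlDocument doc, string classFragment)
    {
        return doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue returns the fallback ("") when the attribute
            // is missing, so no separate Attributes.Contains check is needed.
            .Where(div => div.GetAttributeValue("class", "").Contains(classFragment))
            .ToList();
    }
}
```

Unlike SelectNodes(), Descendants() yields an empty sequence when nothing matches, so the result can be enumerated without a null check.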



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why