I am unable to find the documentation for the HTMLAgilityPack on the CodePlex website. What I want to do is access a div on the Amazon website and scrape its text for use in a WPF application.
var getWeb = new HtmlWeb();
var doc = getWeb.Load(uri);
HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//div[@id = 'zg_centerListWrapper']");
This div contains about 12 child divs, each of which is an item in the best sellers category.
Accessing the properties of each one individually would appear to be painstaking (and at first glance I'm not entirely sure how I'd do it). So should I instead use DocumentNode.SelectNodes()? And how would I implement it? I also find it hard to believe that, after all this time, there is still no documentation for the HTMLAgilityPack... Maybe I'm looking in the wrong places, because YouTube seems to be my best source at the moment.
Actually, the parameter of SelectSingleNode() is an XPath expression, XPath version 1.0 to be precise (see the XPath 1.0 specification).
XPath is a separate technology with its own specification, documentation, and discussion. You can generally search for XPath tutorials or articles, rather than HtmlAgilityPack (HAP) specifics, to get a better idea of what kind of expression you should pass to HAP to select particular HTML elements.
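Because SelectSingleNode returns an HtmlNode, that node can itself be queried with further XPath, and whether the expression is relative or absolute matters. Below is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name RelativeXPathDemo, the method CountDivs and the sample HTML in the note are hypothetical. Starting from the same wrapper node the question selects, it counts divs with three different expressions.

```csharp
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class RelativeXPathDemo
{
    // Hypothetical helper: selects the wrapper div (as in the question),
    // then runs three XPath 1.0 expressions relative to that node.
    public static (int directChildren, int allDescendantDivs, int documentWideDivs) CountDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode wrapper = doc.DocumentNode.SelectSingleNode("//div[@id = 'zg_centerListWrapper']");
        if (wrapper == null)
            return (0, 0, 0);

        // "div" (same as "./div"): only the direct child divs of wrapper.
        int direct = wrapper.SelectNodes("div")?.Count ?? 0;

        // ".//div": every div below wrapper, at any depth.
        int below = wrapper.SelectNodes(".//div")?.Count ?? 0;

        // "//div": absolute path, evaluated from the document root
        // even though it is called on wrapper.
        int everywhere = wrapper.SelectNodes("//div")?.Count ?? 0;

        return (direct, below, everywhere);
    }
}
```

With HTML such as <div id="other"></div><div id="zg_centerListWrapper"><div>a<div>nested</div></div><div>b</div></div>, the three counts should be 2, 3 and 5. The last expression also picks up the unrelated div and the wrapper itself, which is a common surprise when "//" is used on a child node.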
For the sake of example, assume that your HTML looks like this:
<div id="zg_centerListWrapper">
    <div>I want this</div>
    <div>..and this</div>
    <div>..and this one too</div>
</div>
Since the divs you're interested in are direct children of div[@id = 'zg_centerListWrapper'], you can use the following XPath to get them:
var xpath = "//div[@id = 'zg_centerListWrapper']/div";
HtmlNodeCollection ourNodes = doc.DocumentNode.SelectNodes(xpath);
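A self-contained sketch of the SelectNodes approach, assuming the HtmlAgilityPack NuGet package is referenced. It loads the HTML from a string with LoadHtml instead of fetching Amazon with HtmlWeb, so it runs offline; the names BestSellerXPath and GetItemTexts are hypothetical. Note the null check: by default SelectNodes returns null, not an empty collection, when nothing matches.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class BestSellerXPath
{
    // Hypothetical helper: returns the trimmed text of every div that is a
    // direct child of div#zg_centerListWrapper (the id from the question).
    public static List<string> GetItemTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "/div" (single slash) selects direct children only;
        // "//div" would also match divs nested at any depth.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//div[@id = 'zg_centerListWrapper']/div");

        // SelectNodes returns null when there are no matches.
        if (nodes == null)
            return new List<string>();

        // DeEntitize turns entities such as &amp; back into plain characters.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}
```

Called with the sample HTML above, it should return the three strings "I want this", "..and this" and "..and this one too".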
You can use DocumentNode.Descendants("div") and then filter with something like:
.Where(div => div.Attributes.Contains("class") && div.Attributes["class"].Value.Contains("best category"))
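A sketch of the Descendants approach, again assuming the HtmlAgilityPack NuGet package is referenced; BestSellerLinq, GetDivTextsByClass and the class names in the note are hypothetical. One deliberate change from the snippet above: instead of a substring Contains on the whole class attribute (which would also match a class such as "itemHeader" when looking for "item"), it splits the attribute on whitespace and compares whole class tokens. GetAttributeValue returns the supplied default when the attribute is missing, which replaces the separate Attributes.Contains("class") check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class BestSellerLinq
{
    // Hypothetical helper: returns the trimmed text of every div whose
    // class attribute contains className as a whole, whitespace-separated token.
    public static List<string> GetDivTextsByClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div") // all divs, in document order
            .Where(div => div
                // "" when there is no class attribute, so no null check is needed.
                .GetAttributeValue("class", "")
                // A null separator array splits on any whitespace.
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                .Contains(className))
            .Select(div => HtmlEntity.DeEntitize(div.InnerText).Trim())
            .ToList();
    }
}
```

For example, with <div><div class="item">First</div><div class="item highlighted">Second</div><div class="itemHeader">Header</div></div> and className "item", it should return "First" and "Second" but not "Header".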
But yeah, documentation would certainly help.