C# HTML Agility Pack SelectSingleNode and SelectNodes XPath syntax

c# html-agility-pack web-scraping xpath

Question

My question is very similar to this one: "XmlNode.SelectSingleNode syntax to search within a node in C#".

I'm trying to use HTML Agility Pack to pull the price, condition, and shipping price of each offer. Here's the URL I am scraping: http://www.amazon.com/gp/offer-listing/0470108541/ref=dp_olp_used?ie=UTF8&condition=all

Here's a snippet of my code:


    string results = "";
    var w = new HtmlWeb();
    var doc = w.Load(url);
    var nodes = doc.DocumentNode.SelectNodes("//div[@class='a-row a-spacing-medium olpOffer']");

    if (nodes != null)
    {
         foreach (HtmlNode item in nodes)
         {
              var price = item.SelectSingleNode(".//span[@class='a-size-large a-color-price olpOfferPrice a-text-bold']").InnerText;
              var condition = item.SelectSingleNode(".//h3[@class='a-spacing-small olpCondition']").InnerText;
              var price_shipping = item.SelectSingleNode("//span[@class='olpShippingPrice']").InnerText;
              results += "price " + price + " condition " + condition + " ship " + price_shipping + "\r\n";
         }
    }
    return results;

No matter what combination of .//, ., ./, and / I try, I cannot get what I want (I'm only just learning XPath). Currently it returns just the first item over and over, just like in the question I referenced earlier. I think I'm missing a fundamental understanding of how node selection works and/or what counts as a node.


UPDATE


OK, I've changed the URL to point to a different book, and the first two items are working as expected. When I change the third item (price_shipping) to use ".//", no information is pulled at all. This must be because sometimes there is no shipping price and that span is omitted. How do I handle this? I tried checking if price_shipping != null.


UPDATE


Solved. I removed the ".InnerText" from the price_shipping assignment, which was causing issues when the node was null. Then I did the null check, and after that it was safe to use .InnerText.

Popular Answer

Solved. I removed the ".InnerText" from the price_shipping assignment, which was causing issues when the node was null. Then I did the null check, and after that it was safe to use .InnerText.
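
For illustration, here is a minimal sketch of that fix, assuming the HtmlAgilityPack NuGet package is referenced. The class name OfferScraper, the helper TextOrDefault, the "n/a" fallback, and the SampleHtml markup are all hypothetical; only the class attribute values come from the question. It shows two things: a path starting with ".//" searches inside the current node, whereas a path starting with "//" searches from the document root (which is why the same first item kept coming back), and SelectSingleNode returns null when nothing matches, so the null check has to happen before .InnerText is read.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class OfferScraper
{
    // Hypothetical sample markup; the second offer deliberately has no shipping span.
    public const string SampleHtml = @"
<div class='a-row a-spacing-medium olpOffer'>
  <span class='a-size-large a-color-price olpOfferPrice a-text-bold'>$10.00</span>
  <span class='olpShippingPrice'>$3.99</span>
  <h3 class='a-spacing-small olpCondition'>Used - Good</h3>
</div>
<div class='a-row a-spacing-medium olpOffer'>
  <span class='a-size-large a-color-price olpOfferPrice a-text-bold'>$12.50</span>
  <h3 class='a-spacing-small olpCondition'>New</h3>
</div>";

    // Hypothetical helper: returns the trimmed text of the first match,
    // or the fallback when SelectSingleNode finds nothing (it returns null).
    public static string TextOrDefault(HtmlNode context, string xpath, string fallback)
    {
        HtmlNode node = context.SelectSingleNode(xpath);
        return node == null ? fallback : node.InnerText.Trim();
    }

    public static List<string> ExtractOffers(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();

        // SelectNodes also returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection offers = doc.DocumentNode.SelectNodes(
            "//div[@class='a-row a-spacing-medium olpOffer']");
        if (offers == null)
        {
            return lines;
        }

        foreach (HtmlNode offer in offers)
        {
            // Every path starts with ".//" so the search stays inside this offer.
            string price = TextOrDefault(offer,
                ".//span[@class='a-size-large a-color-price olpOfferPrice a-text-bold']", "n/a");
            string condition = TextOrDefault(offer,
                ".//h3[@class='a-spacing-small olpCondition']", "n/a");
            string shipping = TextOrDefault(offer,
                ".//span[@class='olpShippingPrice']", "n/a");

            lines.Add("price " + price + " condition " + condition + " ship " + shipping);
        }

        return lines;
    }
}
```

Calling OfferScraper.ExtractOffers(OfferScraper.SampleHtml) yields one line per offer, with the fallback text standing in wherever a span is missing. Against a live page the exact class strings may differ from those in the question, so treat the XPath expressions as placeholders to adapt.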




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why