Using HTML Agility Pack to solve an XPath query problem

c# html-agility-pack xpath

Question

I'm trying to scrape the price field from this website using the HTML Agility Pack.

My code is as follows:

    var web = new HtmlWeb();
    var doc = web.Load(String.Format(overClockersURL, componentID));
    var priceContent = doc.DocumentNode.SelectSingleNode("//*[@id=\"prodprice\"]");

I obtained the XPath query by using Firebug's "Copy as XPath" feature.

The problem I'm having is that SelectSingleNode returns null: it doesn't seem to find the element specified by the query. I'm a bit stumped as to why, but I don't have much experience with XPath, so I would appreciate some pointers as to what I've done wrong.

5/12/2011 3:43:22 PM

Accepted Answer

When that happens, you should check whether the page is being loaded correctly (you said you're behind an HTTP proxy?).

Try writing the content of doc.DocumentNode.OuterHtml to a text file so you can see if the page is being loaded correctly. Maybe you're getting an error page instead of the original page.
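
A minimal sketch of that check, assuming a hypothetical helper named SaveAndCheck and a dump file called page-dump.html in the temp directory (neither comes from the question). To keep it runnable offline, it parses an inline stand-in for the page instead of calling web.Load; the real markup may differ.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper: writes the HTML that HTML Agility Pack actually parsed
// to a file, then reports whether an element with the given id is present.
static bool SaveAndCheck(HtmlDocument doc, string path, string id)
{
    File.WriteAllText(path, doc.DocumentNode.OuterHtml);
    return doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']") != null;
}

// Stand-in markup (an assumption); in the question's code this document
// would be the one returned by web.Load(...).
var doc = new HtmlDocument();
doc.LoadHtml("<html><body><span id=\"prodprice\">529.99</span></body></html>");

var dumpPath = Path.Combine(Path.GetTempPath(), "page-dump.html");
bool found = SaveAndCheck(doc, dumpPath, "prodprice");
Console.WriteLine(found
    ? "element found"
    : "element missing: open " + dumpPath + " to see what was loaded");
```

If the dump turns out to be a proxy error page or a login page, the XPath is not the problem. After a real web.Load call, the StatusCode property on HtmlWeb may also be worth inspecting.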

5/18/2011 1:08:27 PM

Popular Answer

If I run this code:

    var web = new HtmlWeb();
    var doc = web.Load("http://www.overclockers.co.uk/showproduct.php?prodid=GX-033-HS");
    var priceContent = doc.DocumentNode.SelectSingleNode("//*[@id=\"prodprice\"]");
    Console.WriteLine("price=" + priceContent.InnerHtml);

It outputs:

price=529.99

So it seems to be working. You can also use "//span[@id=\"prodprice\"]", which is more specific because it matches only span elements.
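
Because SelectSingleNode returns null when nothing matches, reading InnerHtml directly would throw if the page did not load as expected. The following is a sketch of a null-safe variant using the span-specific query; the inline HTML is a hypothetical stand-in for the product page, not its actual markup.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical stand-in for the product page; the real markup may differ.
var html = "<html><body><div id=\"other\">n/a</div>"
         + "<span id=\"prodprice\">529.99</span></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Matches only <span> elements whose id attribute is "prodprice".
var priceNode = doc.DocumentNode.SelectSingleNode("//span[@id='prodprice']");

// SelectSingleNode returns null when nothing matches, so guard before use.
var price = priceNode?.InnerText.Trim();
Console.WriteLine(price != null ? "price=" + price : "price element not found");
```

With the stand-in markup above this prints price=529.99. Single quotes inside the XPath string avoid the escaped double quotes used in the original query.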




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow