Can't get XPath working with Html Agility Pack

.net c# html-agility-pack xpath

Question

I'm trying to scrape "Today's featured article" on Wikipedia by getting its XPath value using Firebug.


And then pasting it into my code:

string result = wc.DownloadString("http://en.wikipedia.org/wiki/Main_Page");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);
var featuredArticle = doc.DocumentNode.SelectSingleNode("/html/body/div[3]/div[3]/div[4]/table[2]/tbody/tr/td/table/tbody/tr[2]/td/div/p");

However, SelectSingleNode always returns null, so featuredArticle is null. What am I doing wrong?

Popular Answer

Because Firebug shows the XPath of the page as Firefox built it, and that may or may not match the HTML the server actually sent. Also, the path from Firebug is absolute, so every small change to the page layout can break it.
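A frequent culprit in copied paths like the one in the question is the tbody step. When Firefox parses a table whose rows are not wrapped in a tbody, it inserts one into the DOM, so Firebug's path contains /tbody/; Html Agility Pack works on the markup as sent, which often has no tbody. The sketch below uses only System.Xml (not Html Agility Pack) and a small, hypothetical, well-formed snippet standing in for the server's response; the class name and markup are illustrative assumptions. It compares the same path with and without the tbody step.

```csharp
using System;
using System.Xml;

// Illustration with System.Xml only. ServerHtml is a hypothetical stand-in for
// raw server markup: like much hand-written HTML, its table has no <tbody>.
public static class TbodyDemo
{
    const string ServerHtml =
        "<html><body><table><tr><td><p>Featured text</p></td></tr></table></body></html>";

    // Returns the matched element's text, or null when the XPath matches nothing.
    public static string Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(ServerHtml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // Path as a browser DOM shows it, including the inserted <tbody>.
        Console.WriteLine(Select("/html/body/table/tbody/tr/td/p") ?? "null");
        // Path that matches the markup as it was actually sent.
        Console.WriteLine(Select("/html/body/table/tr/td/p") ?? "null");
    }
}
```

Running it prints "null" for the tbody path and "Featured text" for the path without it.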

An easier way is to look at the HTML itself: the p tag you are looking for is inside a div with the id mp-tfa, so it's simpler to have the XPath find that div and then take the first p inside it.

Like this:

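using System;
using System.Net;
using HtmlAgilityPack;
var wc = new WebClient();
var doc = new HtmlDocument();
using (var stream = wc.OpenRead("http://en.wikipedia.org/wiki/Main_Page"))
    doc.Load(stream); // parses the HTML as the server sent it
// find the div with id mp-tfa, then the first <p> inside it
var featuredArticle = doc.DocumentNode.SelectSingleNode("//div[@id='mp-tfa']/p");
Console.WriteLine(featuredArticle.InnerText);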
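To show why anchoring on the id is more robust than an absolute path, here is a small sketch that again uses System.Xml rather than Html Agility Pack, with two hypothetical layouts (the class name and markup are assumptions for illustration). The second layout wraps the content in an extra div, which would break a path like the one Firebug produced, while //div[@id='mp-tfa']/p still finds the first paragraph in both.

```csharp
using System;
using System.Xml;

public static class IdAnchorDemo
{
    // Hypothetical layouts: B adds a wrapper div around the target div.
    public const string LayoutA =
        "<html><body><div id='mp-tfa'><p>First paragraph</p><p>Second</p></div></body></html>";
    public const string LayoutB =
        "<html><body><div class='wrapper'><div id='mp-tfa'><p>First paragraph</p></div></div></body></html>";

    public static string FirstParagraph(string html)
    {
        var doc = new XmlDocument();
        doc.LoadXml(html);
        // "//" searches at any depth, [@id='mp-tfa'] selects the div by id,
        // and SelectSingleNode returns the first matching <p> in document order.
        XmlNode node = doc.SelectSingleNode("//div[@id='mp-tfa']/p");
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(FirstParagraph(LayoutA));
        Console.WriteLine(FirstParagraph(LayoutB));
    }
}
```

Both lines print "First paragraph". Note that in the answer's code, if the page changes so that even this XPath matches nothing, featuredArticle will be null and reading InnerText throws a NullReferenceException, so a null check before using the result is worth adding.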

The best place to learn how to use XPath is w3schools.com.

Alternatively, you could use LINQ, though I feel XPath is a bit clearer:

using System.Linq;

var featuredArticle = doc.DocumentNode.Descendants("div")
    .First(n => n.Id == "mp-tfa")
    .Descendants("p").FirstOrDefault();
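One detail of the LINQ version: .First throws an InvalidOperationException when no div has the id mp-tfa, while FirstOrDefault returns null. The sketch below shows the same query shape with FirstOrDefault on both steps, written against System.Xml.Linq (XElement) as a stand-in so it runs without Html Agility Pack; the class name and the sample markup are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LinqDemo
{
    public static string FirstParagraph(string html)
    {
        XElement root = XElement.Parse(html);
        // FirstOrDefault on the div too, so a missing id yields null
        // instead of throwing InvalidOperationException.
        XElement div = root.Descendants("div")
            .FirstOrDefault(d => (string)d.Attribute("id") == "mp-tfa");
        // Null-conditional: if div is null, p is null as well.
        XElement p = div?.Descendants("p").FirstOrDefault();
        return p?.Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstParagraph(
            "<html><body><div id='mp-tfa'><p>Featured</p></div></body></html>") ?? "null");
        Console.WriteLine(FirstParagraph("<html><body/></html>") ?? "null");
    }
}
```

The first call prints "Featured"; the second, with no matching div, prints "null" instead of throwing.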



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow