Can't get XPATH working with Html Agility Pack

.net c# html-agility-pack xpath

Question

I'm using Firebug to get the XPath of Wikipedia's "Today's featured article" so that I can scrape it.


I then pasted the XPath into my code:

var wc = new System.Net.WebClient();
string result = wc.DownloadString("http://en.wikipedia.org/wiki/Main_Page");

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);

var featuredArticle = doc.DocumentNode.SelectSingleNode("/html/body/div[3]/div[3]/div[4]/table[2]/tbody/tr/td/table/tbody/tr[2]/td/div/p");

But featuredArticle is always null. What am I doing wrong?

8/8/2012 7:08:29 PM

Popular Answer

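Firebug shows the XPath for the DOM that Firefox built after parsing the page, which may or may not match the HTML the server actually sent. For example, browsers add tbody elements to tables even when the markup has none, and the path in the question contains two of them. Also, because the path from Firebug is absolute, even the smallest change to the page can break it.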
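To see how a browser-generated path can miss, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced, and the HTML string is made up for illustration (it is not Wikipedia's markup). The table in the source has no tbody, so a path that includes tbody finds nothing, while the same path without it matches.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical markup for illustration; note there is no <tbody> in the source.
var html = "<html><body><table><tr><td>Hello</td></tr></table></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Browser-style path: a browser's DOM would contain tbody, but the parsed source does not.
var withTbody = doc.DocumentNode.SelectSingleNode("//table/tbody/tr/td");

// Path that follows the markup as it was written.
var withoutTbody = doc.DocumentNode.SelectSingleNode("//table/tr/td");

Console.WriteLine(withTbody == null ? "tbody path: no match" : "tbody path: match");
Console.WriteLine(withoutTbody == null ? "plain path: no match" : "plain path: " + withoutTbody.InnerText);
```

If a path copied from the browser returns null, dropping the tbody steps is a reasonable first thing to try.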

A simpler approach is to just look at the HTML. The p tag you're looking for is inside a div with the id mp-tfa, so you can use XPath to find that div and then grab the first p within it.

Something like this:

using System;
using System.Net;
using HtmlAgilityPack;

var wc = new WebClient();
var doc = new HtmlDocument();
doc.Load(wc.OpenRead("http://en.wikipedia.org/wiki/Main_Page"));
// Find the div by its id, then take its first p child.
var featuredArticle = doc.DocumentNode.SelectSingleNode("//div[@id='mp-tfa']/p");
Console.WriteLine(featuredArticle.InnerText);
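SelectSingleNode returns null when nothing matches (which is what the question ran into), so it may be worth guarding before reading InnerText. A minimal sketch, assuming the HtmlAgilityPack NuGet package; the HTML string and the id no-such-id are hypothetical stand-ins, and only the id mp-tfa comes from the page itself.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical stand-in for the downloaded page.
var html = "<html><body><div id='mp-tfa'><p>Example featured text.</p></div></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Relative lookup by id: independent of where the div sits in the page.
var featuredArticle = doc.DocumentNode.SelectSingleNode("//div[@id='mp-tfa']/p");

// An expression that matches nothing yields null rather than throwing.
var missing = doc.DocumentNode.SelectSingleNode("//div[@id='no-such-id']/p");

Console.WriteLine(featuredArticle?.InnerText ?? "(not found)");
Console.WriteLine(missing?.InnerText ?? "(not found)");
```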

w3schools.com is a good resource for learning XPath.

LINQ is another option, though I think XPath is a little clearer.

var featuredArticle = doc.DocumentNode.Descendants("div")
    .First(n => n.Id == "mp-tfa")
    .Descendants("p").FirstOrDefault();
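One thing to keep in mind with the LINQ version: First throws an InvalidOperationException when no div has the id, whereas FirstOrDefault returns null. A hedged variant that stays null-safe end to end might look like the sketch below; the HTML string is made up and deliberately lacks the mp-tfa div.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical markup that does NOT contain the mp-tfa div.
var html = "<html><body><div id='other'><p>Unrelated.</p></div></body></html>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// FirstOrDefault plus ?. gives null instead of an exception when the div is absent.
var featuredArticle = doc.DocumentNode.Descendants("div")
    .FirstOrDefault(n => n.Id == "mp-tfa")
    ?.Descendants("p").FirstOrDefault();

Console.WriteLine(featuredArticle == null ? "(not found)" : featuredArticle.InnerText);
```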
8/11/2014 8:44:05 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow