I'm trying to scrape "Today's featured article" from the Wikipedia main page by copying its XPath from Firebug and pasting it into my code:
var wc = new WebClient();
string result = wc.DownloadString("http://en.wikipedia.org/wiki/Main_Page");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);
var featuredArticle = doc.DocumentNode.SelectSingleNode("/html/body/div/div/div/table/tbody/tr/td/table/tbody/tr/td/div/p");
However, featuredArticle is always null. What am I doing wrong?
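Because Firebug builds the XPath from the DOM as Firefox constructed it, and that may differ from the HTML the server actually sends. For example, Firefox inserts a tbody element into any table whose markup lacks one, while HtmlAgilityPack parses the raw markup as-is, so if the server's HTML has no tbody, a path containing tbody matches nothing. Also, the path from Firebug is absolute, so any small change to the page layout can break it.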
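To see the tbody effect in isolation, here is a minimal sketch that uses only System.Xml (not HtmlAgilityPack) on a small, hypothetical, well-formed page whose table has no tbody, as raw server markup often doesn't. The markup and the `TbodyDemo`/`Select` names are made up for illustration; this is not Wikipedia's real HTML. A browser-style path that includes tbody finds nothing, the same path without tbody matches, and an id-based path matches regardless of the surrounding layout.

```csharp
using System;
using System.Xml;

public static class TbodyDemo
{
    // Hypothetical markup as a server might send it: the table has NO <tbody>.
    const string ServerHtml =
        "<html><body><table><tr><td>" +
        "<div id='mp-tfa'><p>Featured text</p></div>" +
        "</td></tr></table></body></html>";

    // Returns the inner text of the first node matching xpath, or null if nothing matches.
    public static string Select(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(ServerHtml);
        return doc.SelectSingleNode(xpath)?.InnerText;
    }

    public static void Main()
    {
        // Absolute path as a browser would report it, with the implicitly inserted <tbody>.
        Console.WriteLine(Select("/html/body/table/tbody/tr/td/div/p") ?? "null");
        // Same absolute path without <tbody>, matching the markup actually sent.
        Console.WriteLine(Select("/html/body/table/tr/td/div/p") ?? "null");
        // Path anchored on the div's id, independent of the table structure around it.
        Console.WriteLine(Select("//div[@id='mp-tfa']/p") ?? "null");
    }
}
```

Running it prints `null`, then `Featured text` twice. The id-based path keeps working even if the table wrapper changes, which is why anchoring on an id is usually sturdier than an absolute path.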
An easier way is to look at the HTML itself: the p element you are looking for is inside a div with the id mp-tfa, so let the XPath find that div and then take the first p inside it.
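using System;
using System.Net;
using HtmlAgilityPack;

var wc = new WebClient();
var doc = new HtmlDocument();
doc.Load(wc.OpenRead("http://en.wikipedia.org/wiki/Main_Page"));
// Find the div with id "mp-tfa" and take the first p element directly inside it.
var featuredArticle = doc.DocumentNode.SelectSingleNode("//div[@id='mp-tfa']/p");
// SelectSingleNode returns null when nothing matches, so check before using the result.
if (featuredArticle != null)
    Console.WriteLine(featuredArticle.InnerText);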
A good place to learn how to use XPath is w3schools.com.
Or you could use LINQ, though I find XPath a bit clearer:
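using System.Linq;

// First() throws if no div has the id "mp-tfa"; FirstOrDefault() yields null if that div contains no p.
var featuredArticle = doc.DocumentNode.Descendants("div")
    .First(n => n.Id == "mp-tfa")
    .Descendants("p")
    .FirstOrDefault();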