Get specific data from a webpage with HTMLAgilityPack

c# html-agility-pack xpath

Question

I've been trying to use the HTML Agility Pack in C# to get data from a website. I have successfully retrieved data from other websites, but this one throws a NullReferenceException, and the only thing I can surmise is that it has to do with my XPath.

Here is my code, which attempts to extract the words "Limbo Wand":

string url = "https://www.dofus.com/en/mmorpg/encyclopedia/weapons/180-limbo-wand";
HtmlWeb htmlWeb = new HtmlWeb();
HtmlDocument doc = htmlWeb.Load(url);

string weaponName = doc.DocumentNode.SelectNodes("/html/body/div[2]/div[2]/div/div/div/main/div[2]/div/div[2]/h1/text()")[0].InnerText; // <-- NullReferenceException here

Taking the text() out of my XPath doesn't help. Even attempting to obtain the text from /html/head/title fails.

What exactly is wrong with my XPath? Or is there something about the site that prevents HTML Agility Pack from loading it correctly?

Thanks in advance to anyone who can offer some pointers.

11/7/2017 1:29:38 PM

Popular Answer

HtmlWeb is a poor choice for fetching a website's source code, mostly because it does not handle redirects well. I'm not certain that is the root problem here, though. Instead, send a web request yourself, like this:

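using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
try
{
    // Fetch the page with a plain GET request instead of HtmlWeb.
    var request = (HttpWebRequest)WebRequest.Create("https://www.dofus.com/en/mmorpg/encyclopedia/weapons/180-limbo-wand");
    request.Method = "GET";

    using (var response = (HttpWebResponse)request.GetResponse())
    {
        using (var stream = response.GetResponseStream())
        {
            // Encoding kept from the original answer; if the page declares
            // a different charset (for example UTF-8), use that instead.
            doc.Load(stream, Encoding.GetEncoding("iso-8859-9"));
        }
    }
}
catch (WebException ex)
{
    // On failure, doc stays empty and any XPath query on it returns null.
    Console.WriteLine(ex.Message);
}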

You now have an HTML document. Since there is just one title tag, you can simply get the title as follows:

Console.WriteLine(doc.DocumentNode.SelectSingleNode("//title").InnerText);

Now, the simplest XPath for getting the weapon name would be:

Console.WriteLine(doc.DocumentNode.SelectSingleNode("//h1[@class='ak-return-link']").InnerText.Trim());

The Trim() call at the end simply removes the whitespace at the beginning and end of the text.
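
One thing worth noting: both SelectNodes and SelectSingleNode return null when nothing matches, which is most likely where the NullReferenceException in the question comes from once [0] or .InnerText is applied to the result. Below is a minimal sketch of a null-safe lookup. The helper name TextOrNull and the sample markup are hypothetical, made up for illustration; the real page's structure may differ.

```csharp
using System;
using HtmlAgilityPack;

static class XPathSketch
{
    // Hypothetical stand-in markup, NOT the real dofus.com page.
    public const string SampleHtml =
        "<html><head><title>Limbo Wand - Sample</title></head>" +
        "<body><h1 class='ak-return-link'>\n  Limbo Wand\n</h1></body></html>";

    // Returns the trimmed inner text of the first node matching xpath,
    // or null when the query matches nothing.
    public static string TextOrNull(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        Console.WriteLine(TextOrNull(doc, "//title") ?? "(no match)");
        Console.WriteLine(TextOrNull(doc, "//h1[@class='ak-return-link']") ?? "(no match)");

        // Nothing matches this query, so the helper returns null
        // and the fallback text is printed instead of an exception being thrown.
        Console.WriteLine(TextOrNull(doc, "//h2[@class='missing']") ?? "(no match)");
    }
}
```

Checking for null before touching InnerText makes it easier to tell "the XPath matched nothing" apart from "the download failed", since in both cases the query result is simply null.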

11/11/2017 2:59:52 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow