Web Scraping with C# and HtmlAgilityPack

c# html-agility-pack web-scraping xpath

Question

The goal is to take a word and get its part of speech from its Google definition.

I've tried a few different approaches, but I get a null reference error every time. Is my code failing to access the web page? Is it a firewall issue, a logic issue, or an {insert-issue-here} problem? I really wish I had even a vague idea of what is wrong.

Thanks for your time.

Addendum: I've tried "//[@id=\"source - luna\"]//div" and "//[@id=\"source - luna\"]/div1" as XPath values.

//attempt 1////////////////////////////////////////////////////////////////////////
            var term = "Hello";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.urbandictionary.com/define.php?term=" + term);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();

            StreamReader stream = new StreamReader(response.GetResponseStream());
            string final_response = stream.ReadToEnd();

            MessageBox.Show(final_response); //doesn't execute

//attempt 2////////////////////////////////////////////////////////////////////////
            var url = "https://www.google.co.za/search?q=define+position";
            var content = new System.Net.WebClient().DownloadString(url);
            var webGet = new HtmlWeb();
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(content);
            // doc is null at runtime
            HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//*[@id=\"uid_0\"]/div[1]/div/div[1]/div[2]/div[2]/div[1]/i/span");
            if (ourNode != null)
            {
                richTextBox1.AppendText(ourNode.InnerText);
            }
            else
                richTextBox1.AppendText("null");

//attempt 3////////////////////////////////////////////////////////////////////////
            var webGet = new HtmlWeb();
            var doc = webGet.Load("https://www.google.co.za/search?q=define+position");
            // doc is null at runtime
            HtmlNode ourNode = doc.DocumentNode.SelectSingleNode("//*[@id=\"uid_0\"]/div[1]/div/div[1]/div[2]/div[2]/div[1]/i/span");
            if (ourNode != null)
            {
                richTextBox1.AppendText(ourNode.InnerText);
            }
            else
                richTextBox1.AppendText("null");

//attempt 4////////////////////////////////////////////////////////////////////////
            string Url = "http://www.metacritic.com/game/pc/halo-spartan-assault";
            HtmlWeb web = new HtmlWeb();
            HtmlAgilityPack.HtmlDocument doc = web.Load(Url);
            // doc is null at runtime
            string metascore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[1]/div/div/div[2]/a/span[1]")[0].InnerText;
            string userscore = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[1]/div[2]/div[1]/div/div[2]/a/span[1]")[0].InnerText;
            string summary = doc.DocumentNode.SelectNodes("//*[@id=\"main\"]/div[3]/div/div[2]/div[2]/div[1]/ul/li/span[2]/span/span[1]")[0].InnerText;
            richTextBox1.AppendText(metascore + " " + userscore + " " + summary);

//attempt 5////////////////////////////////////////////////////////////////////////
            HtmlWeb web = new HtmlWeb();
            HtmlAgilityPack.HtmlDocument html = web.Load("https://www.google.co.za/search?q=define+position");
            // html is null
            var div = html.DocumentNode.SelectNodes("//*[@id=\"uid_0\"]/div[1]/div/div[1]/div[2]/div[2]/div[1]/i/span");
            richTextBox1.AppendText(Convert.ToString(div));

Popular Answer

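You are getting null because your XPath expressions aren't correct, or because no node in the downloaded HTML matches them. In HtmlAgilityPack, SelectSingleNode and SelectNodes both return null when nothing matches, so dereferencing the result without a check throws a NullReferenceException. What are you trying to achieve here?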
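
To see the difference between "the document failed to load" and "the XPath matched nothing", it can help to test an expression against a small in-memory HTML string first. The following is only a minimal sketch: the markup, the `XPathProbe` class, and the `FirstText` helper are made up for illustration and are not the real Google or Metacritic page structure. It also assumes the HtmlAgilityPack NuGet package is referenced and left at its default settings.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathProbe
{
    // Hypothetical helper: returns the trimmed inner text of the first node
    // matching the XPath, or null when nothing matches.
    public static string FirstText(HtmlDocument doc, string xpath)
    {
        // SelectSingleNode returns null (it does not throw) when no node matches.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static void Main()
    {
        // Made-up markup standing in for a downloaded page.
        const string html =
            "<div id='entry'><div class='pos'><i><span>noun</span></i></div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // A short, attribute-based XPath that matches the sample markup.
        Console.WriteLine(FirstText(doc, "//div[@id='entry']//i/span") ?? "(no match)");

        // A long positional XPath in the style of the question; the id does not
        // exist in the sample markup, so the helper returns null.
        Console.WriteLine(FirstText(doc, "//*[@id='uid_0']/div[1]/div/i/span") ?? "(no match)");

        // With default settings, SelectNodes also returns null rather than an
        // empty collection, so indexing [0] on the result needs a check first.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//span[@class='missing']");
        Console.WriteLine(nodes == null ? 0 : nodes.Count);
    }
}
```

The same null check applies to a document loaded with HtmlWeb.Load. If an XPath copied from browser developer tools finds nothing there, one possible cause is that the HTML served to a non-browser client differs from the DOM the browser shows, so writing the downloaded markup to a file and inspecting it is a reasonable next step.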




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why