C# - Html Agility Pack - can't read from web

c# html-agility-pack

Question

I'm trying to create a simple application that reads the text from a Wikipedia page. To get the HTML, I found the following code on another website:

        HtmlDocument doc = new HtmlDocument();
        StringBuilder output = new StringBuilder();

        doc.LoadHtml("http://en.wikipedia.org/wiki/The Metamorphosis of Prime Intellect");
        var text = doc.DocumentNode.SelectNodes("//body//text()").Select(node => node.InnerText);

        foreach (string line in text)
            output.AppendLine(line);

        string textOnly = HttpUtility.HtmlDecode(output.ToString());

        Console.WriteLine(textOnly);

The following line is highlighted when I get the runtime error "ArgumentNullException was unhandled":

        var text = doc.DocumentNode.SelectNodes("//body//text()").Select(node => node.InnerText);

Does anybody recognize the issue?
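The exception type points at the cause. In HtmlAgilityPack, SelectNodes returns null rather than an empty collection when the XPath matches nothing. Select is a LINQ extension method, so calling it on null does not raise a NullReferenceException. Instead, Enumerable.Select checks its source argument and throws ArgumentNullException. The following is a minimal sketch that uses only the base class library to reproduce this. The nodes variable is a stand-in (an assumption for illustration) for the null returned by SelectNodes, and CaptureError is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the null that SelectNodes returns when "//body//text()" matches
// nothing, e.g. when the parsed "document" is only the URL text and has no <body>.
IEnumerable<string> nodes = null;

// Hypothetical helper: runs an action and reports the type of exception it threw.
string CaptureError(Func<object> action)
{
    try
    {
        action();
        return "no exception";
    }
    catch (Exception ex)
    {
        return ex.GetType().Name;
    }
}

// Select is an extension method, so calling it on null dereferences nothing;
// Enumerable.Select validates its source argument and throws instead.
string error = CaptureError(() => nodes.Select(node => node.Length));
Console.WriteLine(error); // ArgumentNullException
```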

9/2/2013 9:59:59 PM

Popular Answer

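doc.LoadHtml expects an HTML string, not a URL. Given the URL, it parses that text as a document with no <body> element, so SelectNodes returns null and the Select call throws the ArgumentNullException. You can use the HtmlAgilityPack.HtmlWeb class to download that page: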

using System;
using System.Linq;

// HtmlWeb downloads the page and parses it; LoadHtml would only parse the string it is given
var web = new HtmlAgilityPack.HtmlWeb();
var doc = web.Load("http://en.wikipedia.org/wiki/The Metamorphosis of Prime Intellect");

var text = doc.DocumentNode.SelectNodes("//body//text()").Select(node => node.InnerText);
var output = String.Join("\n", text);

In my test, SelectNodes returns 622 items.
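If you would rather keep LoadHtml, another approach is to download the markup yourself and pass the resulting string to it. The sketch below swaps in HttpClient for HtmlWeb and assumes .NET 6 or later with the HtmlAgilityPack NuGet package installed. It also checks for the null that SelectNodes returns when nothing matches, instead of letting Select throw. ExtractText, DownloadTextAsync and the User-Agent value are hypothetical names chosen for illustration, not part of HtmlAgilityPack.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper. LoadHtml parses the string it is given as markup and never
// fetches a URL, so this expects HTML that has already been downloaded.
string ExtractText(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // SelectNodes returns null (not an empty collection) when nothing matches,
    // so guard before calling LINQ methods on the result.
    var nodes = doc.DocumentNode.SelectNodes("//body//text()");
    if (nodes == null)
        return string.Empty;

    // Decode entities such as &amp; in each text node.
    var lines = nodes.Select(node => WebUtility.HtmlDecode(node.InnerText));
    return string.Join("\n", lines);
}

// Hypothetical helper: fetch the page with HttpClient, then hand the HTML string
// (not the URL) to ExtractText.
async Task<string> DownloadTextAsync(string url)
{
    using var client = new HttpClient();
    // Wikimedia asks automated clients to send a User-Agent; this value is a placeholder.
    client.DefaultRequestHeaders.UserAgent.ParseAdd("WikiTextReader/1.0");
    string html = await client.GetStringAsync(url);
    return ExtractText(html);
}

// Offline demonstration on an inline HTML string (no network access needed).
string sample = "<html><body><p>Fish &amp; chips</p></body></html>";
Console.WriteLine(ExtractText(sample)); // Fish & chips
```

To read the article, you would await DownloadTextAsync with the Wikipedia URL from the question. Passing the bare URL string straight to ExtractText, which is effectively what the question's code did, returns an empty string rather than throwing.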

9/2/2013 10:13:31 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow