HTML Agility Pack issue finding divs

c# html-agility-pack

Question

I'm trying to extract everything that resides within a div with a certain class/id name. I'm using the following code:

var webGet = new HtmlWeb();
var document = webGet.Load("http://www.4guysfromrolla.com/articles/011211-1.aspx");

var partOfWebpage = from completeWebpage in document.DocumentNode.Descendants("div")
                             where
                                 completeWebpage.Attributes["class"].Value == "content" &&
                                 completeWebpage.Attributes["class"].Value != null
                             select completeWebpage.InnerHtml;

foreach (var s in partOfWebpage)
{
    textBox1.AppendText(s);
}

I'm receiving a "NullReferenceException was unhandled - Object reference not set to an instance of an object" error.

Apparently it doesn't find the div at all. When I put "table" instead of "div" in the Descendants() method, everything works fine and I am able to pick a table of my choice by its class/id.

What am I doing wrong?


Accepted Answer

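Check that the attribute exists before dereferencing it. Attributes["class"] returns null for any div that has no class attribute, so reading .Value on it throws the NullReferenceException. Tables on that page presumably all carry a class, which is why the same query works for "table".

var partOfWebpage = from completeHomepage in document.DocumentNode.Descendants("div")
                    // Skip divs without a class attribute before reading .Value
                    where completeHomepage.Attributes["class"] != null &&
                          completeHomepage.Attributes["class"].Value == "content"
                    select completeHomepage.InnerHtml;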
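
As an alternative to the explicit null check, HtmlNode also has GetAttributeValue, which takes a default to return when the attribute is missing. The following is a minimal sketch rather than a drop-in fix: the inline HTML string is made up for illustration and stands in for the downloaded page, and output goes to the console instead of textBox1.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup (not from the original page). The first
        // div has no class attribute, which is the case that makes
        // Attributes["class"].Value throw.
        var html = "<div>no class here</div>" +
                   "<div class=\"content\"><p>Hello</p></div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // GetAttributeValue returns the supplied default ("") when the
        // attribute is absent, so the comparison is safe for every div.
        var partOfWebpage =
            from div in document.DocumentNode.Descendants("div")
            where div.GetAttributeValue("class", "") == "content"
            select div.InnerHtml;

        foreach (var s in partOfWebpage)
        {
            Console.WriteLine(s);
        }
    }
}
```

Note that this still compares the whole attribute value, so a div declared as class="content wide" would not match.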

Popular Answer

You can use XPath to select the div by class (or by id, if you need to).

var results = document.DocumentNode.SelectNodes("//div[@class='content']");
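
Two things are worth guarding against with the XPath approach. As far as I know, SelectNodes returns null rather than an empty collection when nothing matches, and @class='content' only matches when the attribute is exactly "content". The sketch below handles both. The sample markup is hypothetical, and the contains/concat/normalize-space expression is a common XPath 1.0 idiom for matching one class among several, not something the original answer used.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup: the div carries two classes, so the
        // exact-match XPath //div[@class='content'] would miss it.
        var html = "<div class=\"content wide\"><p>Hello</p></div>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Pad the class list with spaces and look for " content " so that
        // "content" matches as a whole class name, not as a substring.
        // To select by id instead, an expression such as //div[@id='main']
        // would work ('main' is a placeholder id).
        var results = document.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' content ')]");

        // Guard against the no-match case before iterating.
        if (results == null)
        {
            Console.WriteLine("No matching div found.");
            return;
        }

        foreach (HtmlNode node in results)
        {
            Console.WriteLine(node.InnerHtml);
        }
    }
}
```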



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow