HTML Agility Pack issue finding divs

c# html-agility-pack

Question

I'm trying to extract all of the content from a div that has a particular class or ID. I'm using the code below:

var webGet = new HtmlWeb();
var document = webGet.Load("http://www.4guysfromrolla.com/articles/011211-1.aspx");

var partOfWebpage = from completeWebpage in document.DocumentNode.Descendants("div")
                             where
                                 completeWebpage.Attributes["class"].Value == "content" &&
                                 completeWebpage.Attributes["class"].Value != null
                             select completeWebpage.InnerHtml;

foreach (var s in partOfWebpage)
{
    textBox1.AppendText(s);
}

The error is "Object reference not set to an instance of an object." (NullReferenceException was unhandled.)

Apparently it can't find any matching div. If I replace "div" with "table" in the Descendants() call, everything works as intended and I can select any table by its class or id.

What am I doing wrong?


6/12/2012 10:43:51 PM

Accepted Answer

Check that the attribute exists before dereferencing it:

var partOfWebpage = from completeWebpage in document.DocumentNode.Descendants("div")
                    // Attributes["class"] is null for a div that has no class attribute
                    where completeWebpage.Attributes["class"] != null &&
                          completeWebpage.Attributes["class"].Value == "content"
                    select completeWebpage.InnerHtml;
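
The exception comes from divs that have no class attribute at all: Attributes["class"] returns null for them, so reading .Value throws. An alternative to the explicit null check is GetAttributeValue with a default value. The following is a minimal sketch, not the only way to do it; DivExtractor and InnerHtmlByClass are hypothetical names made up for illustration, and the HTML is passed in as a string so the example does not depend on the live page. Note that it matches "content" as one token of the class list, which is slightly looser than the exact comparison in the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class DivExtractor
{
    // Hypothetical helper (not part of HTML Agility Pack): returns the inner
    // HTML of every <div> whose class list contains className.
    public static List<string> InnerHtmlByClass(string html, string className)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var whitespace = new[] { ' ', '\t', '\r', '\n' };

        return document.DocumentNode.Descendants("div")
            // GetAttributeValue returns the supplied default ("") when the
            // attribute is missing, so a div without a class is simply skipped.
            .Where(div => div.GetAttributeValue("class", "")
                .Split(whitespace, StringSplitOptions.RemoveEmptyEntries)
                .Contains(className))
            .Select(div => div.InnerHtml)
            .ToList();
    }
}
```

Calling DivExtractor.InnerHtmlByClass(html, "content") on markup that mixes classless divs with a div whose class is "content main" should return only the latter's inner HTML, without throwing.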
6/12/2012 10:19:26 PM

Popular Answer

To select the div by class (or by id, if you need to), use XPath.

var results = document.DocumentNode.SelectNodes("//div[@class='content']");
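
One thing to watch with this approach: by default, SelectNodes returns null rather than an empty collection when nothing matches, so iterating the result directly can raise the same NullReferenceException. Below is a rough sketch that guards against that; XPathDivExtractor and InnerHtmlByXPath are hypothetical names used only for this example, and the id value "main" mentioned afterwards is likewise an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class XPathDivExtractor
{
    // Hypothetical helper: returns the inner HTML of every node matched by
    // the XPath expression, or an empty list when there is no match.
    public static List<string> InnerHtmlByXPath(string html, string xpath)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes yields null (not an empty collection) on no match.
        var nodes = document.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(node => node.InnerHtml).ToList();
    }
}
```

The same helper works for an id lookup with an expression such as "//div[@id='main']". For a div that carries several classes, "//div[contains(concat(' ', normalize-space(@class), ' '), ' content ')]" matches "content" as a whole token, whereas @class='content' only matches when the attribute is exactly that string.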


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow