I'm trying to extract everything that resides within a div with a certain class/id name. I'm using the following code:
```csharp
var webGet = new HtmlWeb();
var document = webGet.Load("http://www.4guysfromrolla.com/articles/011211-1.aspx");
var partOfWebpage = from completeWebpage in document.DocumentNode.Descendants("div")
                    where
                        completeWebpage.Attributes["class"].Value == "content" &&
                        completeWebpage.Attributes["class"].Value != null
                    select completeWebpage.InnerHtml;
foreach (var s in partOfWebpage)
{
    textBox1.AppendText(s);
}
```
I'm receiving a "NullReferenceException was unhandled - Object reference not set to an instance of an object" error.
Apparently it doesn't find the div at all. When I put "table" instead of "div" in the Descendants() method, everything works fine and I am able to pick a table of my choice with the class/id definition.
What am I doing wrong?
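Try checking whether the attribute exists before dereferencing it. Descendants("div") returns every div on the page, and for any div that has no class attribute, Attributes["class"] is null, so reading .Value on it throws. The `Value != null` check in your query comes after that dereference, so it never gets a chance to help:
```csharp
var partOfWebpage = from completeHomepage in document.DocumentNode.Descendants("div")
                    // Make sure the attribute exists before reading its Value
                    where completeHomepage.Attributes["class"] != null &&
                          completeHomepage.Attributes["class"].Value == "content"
                    select completeHomepage.InnerHtml;
```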
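A slightly shorter variant, offered as a sketch rather than the answer's exact code: Html Agility Pack's HtmlNode.GetAttributeValue(name, default) returns the supplied default when the attribute is missing, so the explicit null check goes away. The class and method names (ContentExtractor, ExtractContentDivs) are hypothetical, and the helper takes an HTML string via HtmlDocument.LoadHtml so it can be tried without fetching the live URL.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper name, for illustration only.
static class ContentExtractor
{
    // Returns the InnerHtml of every div whose class attribute is exactly "content".
    public static List<string> ExtractContentDivs(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // GetAttributeValue falls back to "" when a div has no class
        // attribute, so such divs are skipped instead of throwing.
        return (from div in document.DocumentNode.Descendants("div")
                where div.GetAttributeValue("class", "") == "content"
                select div.InnerHtml).ToList();
    }
}
```

With markup such as `<div>no class</div><div class="content">article text</div>`, the first div no longer causes a NullReferenceException and only "article text" is returned.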
Alternatively, you can use XPath to select the div by class (or by id, if you need to):
```csharp
var results = document.DocumentNode.SelectNodes("//div[@class='content']");
```
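Two caveats worth knowing with the XPath approach, shown in a minimal sketch that assumes Html Agility Pack's default settings. First, SelectNodes returns null rather than an empty collection when nothing matches, so iterating the result directly can throw the same NullReferenceException. Second, `@class='content'` only matches when the attribute is exactly "content"; the standard XPath 1.0 idiom below also matches a div with `class="content wide"`. The names XPathContentExtractor and ExtractContentDivs are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper name, for illustration only.
static class XPathContentExtractor
{
    // Matches divs whose class list contains the token "content",
    // e.g. class="content" or class="content wide", but not class="contents".
    const string ContentDivXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' content ')]";

    public static List<string> ExtractContentDivs(HtmlDocument document)
    {
        HtmlNodeCollection nodes = document.DocumentNode.SelectNodes(ContentDivXPath);

        // By default SelectNodes returns null (not an empty collection)
        // when nothing matches, so guard before iterating.
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(node => node.InnerHtml).ToList();
    }
}
```

The `document` loaded with HtmlWeb in the question can be passed straight in, and each returned string appended to textBox1 as before.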