C# HTML Agility Pack Single Select Node returning null

c# html-agility-pack web-scraping winforms

Question

I have a web scraper developed using C#, windows forms and the HTML Agility Pack.

I had it all working well until the site changed its code and broke it. I know this happens often with web scrapers, but now I am having trouble figuring out how to correct the issue.

At this time my scraper loops through multiple URLs and scrapes data from each page.

The problem I am running into is that the site will randomly serve a newer template, which does not have the same HTML classes and IDs that I have defined in the program. What I am trying to do is run a simple if statement that checks whether a single node is null and, if it is, runs a separate set of code for the new template.

The problem I am having is that my program throws a NullReferenceException on my if statement.

Here is the statement I am using to check if it is null:

var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']").InnerText;

 if (varitem == null) MessageBox.Show("no titles");

It throws the exception on the first line, where varitem is defined, and doesn't even make it to the if statement.

Any advice appreciated!

Accepted Answer

First you should check whether

 doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']")

returns null.

SelectSingleNode returns null when no node matches the XPath expression. If it is null, you'll get the NullReferenceException from reading InnerText on that null result.


Popular Answer

Try the following:

var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']");

SelectSingleNode can return null, so store the node first without reading InnerText. It is also worth checking that InnerText is not null or empty:

if (varitem == null || string.IsNullOrEmpty(varitem.InnerText))
    MessageBox.Show("no titles");
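
The null check can also drive the fallback to the new template that the question asks about. The following is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and C# 6 or later. The GetTitle helper and the second XPath ("//h1[@id='new-title']") are hypothetical placeholders, since the question does not show the new template's markup. A console Main is used here instead of a form so the sketch stays self-contained.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    // Hypothetical helper: returns the trimmed title text,
    // or null when neither template's selector matches.
    static string GetTitle(HtmlDocument doc)
    {
        // Old template: selector taken from the question.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']");

        // New template: this XPath is an assumption for illustration only;
        // replace it with whatever the new markup actually uses.
        if (node == null)
            node = doc.DocumentNode.SelectSingleNode("//h1[@id='new-title']");

        // node may still be null here, so use the null-conditional operator
        // instead of reading InnerText directly.
        string text = node?.InnerText;
        return string.IsNullOrWhiteSpace(text) ? null : text.Trim();
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><h1 class='producttitle'>Widget</h1></body></html>");

        // Falls back to the "no titles" message when nothing matched.
        Console.WriteLine(GetTitle(doc) ?? "no titles");
    }
}
```

In the WinForms scraper, the same GetTitle call could replace the original first line, with MessageBox.Show used in place of Console.WriteLine.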


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow