I have a web scraper developed using C#, Windows Forms and the HTML Agility Pack.
I had it all working great until the site changed its code and broke it. I know this happens often with web scrapers, but now I am having trouble figuring out how to fix it.
At the moment my scraper loops through multiple URLs and scrapes data from each page.
The problem is that the site will randomly serve the newer template, which does not have the same HTML classes and IDs that I have defined in the program. What I am trying to do is run a simple if that checks whether a single node is null and, if it is, runs a separate set of code for the new template.
However, my program throws a NullReferenceException on my if statement.
Here is the statement I am using to check if it is null:
var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']").InnerText;
if (varitem == null) MessageBox.Show("no titles");
It throws the exception at the first line defining the varitem and doesn't even make it to the if statement.
Any advice appreciated!
First you should check whether
doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']")
returns null.
If it does, calling .InnerText on that null result is what throws the NullReferenceException, before your if statement is ever reached.
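If you are on C# 6 or later, the null-conditional operator ?. avoids the exception in a single expression: when SelectSingleNode returns null, the whole expression evaluates to null instead of dereferencing it. The following is a minimal sketch. The class and method names and the sample markup are hypothetical; only the XPath comes from the question. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

static class NullSafeTitle
{
    // ?. stops evaluation when SelectSingleNode returns null, so the result is
    // null instead of a NullReferenceException being thrown from .InnerText.
    public static string GetTitleOrNull(HtmlDocument doc) =>
        doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']")?.InnerText;

    static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical page that lacks an h1 with class "producttitle".
        doc.LoadHtml("<h1 class='something-else'>Widget</h1>");

        var varitem = GetTitleOrNull(doc);
        // The question uses MessageBox.Show; Console is used here to stay self-contained.
        if (varitem == null) Console.WriteLine("no titles");
    }
}
```

This keeps the original shape of the code (assign, then test for null) while making the assignment itself safe.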
Try the following:
var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']");
SelectSingleNode can return null, so check the node first. It is also worth checking that InnerText is not null or empty:
if (varitem == null || string.IsNullOrEmpty(varitem.InnerText))
MessageBox.Show("no titles");
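The same null check can drive the template switch described in the question: try the old template's selector first and fall back to the new one only when the old node is missing. The following is a minimal sketch, not a drop-in implementation. The new-template XPath "//h1[@class='product-title']" is a hypothetical placeholder for whatever class the new layout actually uses, and the class and method names are illustrative.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

static class TitleScraper
{
    // Returns the product title from whichever template the page uses,
    // or null when neither selector matches.
    public static string GetTitle(HtmlDocument doc)
    {
        // Old template: selector taken from the question.
        HtmlNode oldNode = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']");
        if (oldNode != null)
            return oldNode.InnerText.Trim();

        // New template: HYPOTHETICAL selector, replace with the real class/ID.
        HtmlNode newNode = doc.DocumentNode.SelectSingleNode("//h1[@class='product-title']");
        if (newNode != null)
            return newNode.InnerText.Trim();

        return null; // unknown layout: let the caller log or skip this URL
    }

    static void Main()
    {
        var oldPage = new HtmlDocument();
        oldPage.LoadHtml("<h1 class='producttitle'> Old Widget </h1>");
        var newPage = new HtmlDocument();
        newPage.LoadHtml("<h1 class='product-title'>New Widget</h1>");

        Console.WriteLine(GetTitle(oldPage)); // Old Widget
        Console.WriteLine(GetTitle(newPage)); // New Widget
    }
}
```

Keeping each template's selectors in one place like this means that when the site changes again, only the XPath strings need updating rather than the scraping loop.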