I have a web scraper developed using C#, Windows Forms and the Html Agility Pack.
I had it all working great until the site changed its code and broke it. I know this happens often with web scrapers, but now I am having trouble figuring out how to fix it.
Currently my scraper loops through multiple URLs and scrapes data from each page.
The problem I am running into is that the site randomly serves the newer template, which does not have the same HTML classes and IDs that I have defined in the program. What I am trying to do is run a simple if that checks whether a single node is null and, if it is, runs a separate set of code for the new template.
The problem I am having is that my program throws a NullReferenceException on my if statement.
Here is the statement I am using to check if it is null:
var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']").InnerText;
if (varitem == null)
    MessageBox.Show("no titles");
It throws the exception on the first line, where varitem is assigned, and never even reaches the if statement.
Any advice appreciated!
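First, check whether the node itself is null before reading its InnerText. If the node is null, you'll get the NullReferenceException as soon as you access .InnerText, which is why your code fails on the assignment line. Select the node without .InnerText: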
var varitem = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']");
SelectSingleNode returns null when the XPath matches nothing, so test that first. It is also worth checking that InnerText is not null or empty:
if (varitem == null || string.IsNullOrEmpty(varitem.InnerText)) MessageBox.Show("no titles");
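Building on that check, here is a minimal sketch of the branching the question asks for: try the old template's XPath first and fall back to the new template's XPath only when the first one matches nothing. It assumes the HtmlAgilityPack NuGet package is referenced. The class name TitleScraper, the helpers ParseOldTemplate/ParseNewTemplate, and the XPath //h1[@class='product-name'] for the newer template are all hypothetical placeholders; substitute whatever markup the new template actually uses.

```csharp
using HtmlAgilityPack;

// Sketch: choose a parsing path based on which page template was served.
public static class TitleScraper
{
    // XPath from the original (old) template.
    private const string OldTemplateTitleXPath = "//h1[@class='producttitle']";

    // HYPOTHETICAL: placeholder XPath for the newer template; replace it
    // with the class/ID the new markup really uses.
    private const string NewTemplateTitleXPath = "//h1[@class='product-name']";

    // Returns the trimmed product title, or null if neither template matched.
    public static string GetTitle(HtmlDocument doc)
    {
        // SelectSingleNode returns null when nothing matches, so test the
        // node itself before touching InnerText.
        HtmlNode oldTitle = doc.DocumentNode.SelectSingleNode(OldTemplateTitleXPath);
        if (oldTitle != null)
        {
            return ParseOldTemplate(oldTitle);
        }

        HtmlNode newTitle = doc.DocumentNode.SelectSingleNode(NewTemplateTitleXPath);
        if (newTitle != null)
        {
            return ParseNewTemplate(newTitle);
        }

        // Neither template recognised: let the caller decide what to do.
        return null;
    }

    // HYPOTHETICAL helpers: put the template-specific scraping code here.
    private static string ParseOldTemplate(HtmlNode title) => Clean(title.InnerText);
    private static string ParseNewTemplate(HtmlNode title) => Clean(title.InnerText);

    // Decodes HTML entities, trims whitespace, and maps empty text to null.
    private static string Clean(string text)
    {
        string trimmed = HtmlEntity.DeEntitize(text ?? string.Empty).Trim();
        return trimmed.Length == 0 ? null : trimmed;
    }
}
```

In the form you would call TitleScraper.GetTitle(doc) and show "no titles" when it returns null. On C# 6 or later the basic null test can also be written with the null-conditional operator, e.g. var title = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']")?.InnerText; which yields null instead of throwing when the node is missing.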