I'm using the Html Agility Pack and I keep getting this error on certain pages: "The remote server returned an error: (500) Internal Server Error."
Now I'm not sure what this is, as I can use Firefox to get to these pages without any problems.
I have a feeling the website itself is blocking me and not sending a response. Is there a way I can make my Html Agility Pack request look more like one coming from Firefox?
I've already set a timer so it only sends a request to the website every 20 seconds.
Is there any other method I can use?
Set a User-Agent similar to a regular browser's. The User-Agent is an HTTP header that the HTTP client (the browser) sends to identify itself to the server.
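As a rough sketch of that suggestion, the snippet below sets a browser-style User-Agent through the `UserAgent` property on Html Agility Pack's `HtmlWeb` class before loading a page. The User-Agent string, the `CreateBrowserLikeWeb` helper name, and the `https://example.com/` URL are placeholders chosen for illustration; you would normally copy the exact string your own Firefox sends and use the page you are actually scraping.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    // Example Firefox-style User-Agent string (an assumption for illustration;
    // copy the value your own browser sends for a closer match).
    const string BrowserUserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Hypothetical helper: builds an HtmlWeb that identifies itself like a browser.
    static HtmlWeb CreateBrowserLikeWeb()
    {
        return new HtmlWeb
        {
            UserAgent = BrowserUserAgent
        };
    }

    static void Main()
    {
        HtmlWeb web = CreateBrowserLikeWeb();

        // Placeholder URL; replace with the page that returns the 500 error.
        HtmlDocument doc = web.Load("https://example.com/");

        // Print the page title, if there is one, to confirm the load worked.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title?.InnerText ?? "(no title found)");
    }
}
```

If the server still answers with a 500 after this change, the User-Agent was probably not the only signal it checks, so this is a first step rather than a guaranteed fix.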
There are a lot of ways servers can detect scraping, and it's really just an arms race between the scraper and the scrapee(?), depending on how badly one or the other wants to access or protect the data. Some of the things to help you go undetected are:
Again, the list could go on depending on how sophisticated the server setup is.