I want to code some algorithm or parser which should get site position in google search results. The issue is every time google page layout will change I should correct/change the algorithm. How do you think guys is will really often change? Is there any techniques/advices/tricks about determining Google' site position?
How can I make robust position detection algorithm?
I want to use C#, .NET 2.0 and HtmlAgilityPack for that purpose. Any advices or proposes will be very appreciated. Thanks in advance, guys!
I know that google will show captcha to prevent machine queries. I got special service for that, that will recognise any captcha. Could you guys tell me about your experience in exact scraping results?
I asked about this a year ago and got some good answers. Definitely the Agility Pack is the way to go.
In the end we did code up a rough scraper which did the job and ran without any problems. We were hitting Google relatively lightly (about 25 queries per day). We took the precaution of randomising 1) the order and 2) time of day and 3) time paused between queries. I don't know if any of that helped, but we were never hit by a captcha.
We don't bother with it much now.
Its main weaknesses were/are:
we only bothered to check the first page (we perhaps could have coded an enhanced version which looked at the first X pages, but maybe that would be a higher risk - in terms of being detected by Google).
its results were unreliable and jumped around. You could be 8th every day for weeks, except for a single random day when you were 3rd. Perhaps ... the whole idea of carefully taking a daily or weekly reading and logging our ranking is too flawed
To answer your question about Google breaking your code: Google didn't make a fundamentally breaking change in all the months we ran it but they changed something which broke the "snapshot" we were saving of the result (maybe a CSS change?) which did nothing to improve the credibility of the results.