Parsing a web page with HtmlAgilityPack and simulating a click

c# html-agility-pack

Question

I am scraping a web page using the HTML Agility Pack (HAP), and I want to access (click) the submit button on the page. I don't know how this could be done with HAP and C#. Is there a way to do it?

Accepted Answer

The HTML Agility Pack is not a browser, so while it can parse an HTML file, it has no way to interact with the page. You can find the submit button, read its attributes and so forth, but you can't make it do anything.

You have two options:

  • Read the form, build an HTTP request that matches the form's fields and method (for example POST), and send it to the server yourself. This is all manual work; the Agility Pack only helps you list the fields on the form and their attributes.

  • If you need to interact with the page, you'll need a browser. There are headless browsers, like PhantomJS, that will actually load the page, parse the JavaScript and run what's sent by the server. There are C# wrappers around such browsers; one example is Awesomium. It's similar to the HTML Agility Pack in that it allows you to parse HTML documents, but it takes it one step further and actually runs the page without ever showing a browser window.
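The first option could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the page contains a single form, the form is submitted with POST as URL-encoded data, and only `<input>` elements matter (`<select>`, `<textarea>`, and unchecked checkboxes or radio buttons would need extra handling). The `FormPoster` class, its method names, and the `username` field are hypothetical names used for illustration; they are not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class FormPoster
{
    // Collects name/value pairs from every named <input> in the document.
    // Assumption: the page holds a single form, so a document-wide query is enough.
    // A named submit button is included too, which mimics clicking it.
    public static Dictionary<string, string> ExtractFields(HtmlDocument doc)
    {
        var fields = new Dictionary<string, string>();
        var inputs = doc.DocumentNode.SelectNodes("//input[@name]");
        if (inputs == null) return fields; // SelectNodes returns null when nothing matches

        foreach (var input in inputs)
        {
            string name = input.GetAttributeValue("name", "");
            string value = input.GetAttributeValue("value", "");
            fields[name] = value;
        }
        return fields;
    }

    // Downloads the page, reads its form, and posts the fields back by hand.
    // pageUri must be an absolute URI.
    public static async Task<string> SubmitAsync(HttpClient client, Uri pageUri)
    {
        string html = await client.GetStringAsync(pageUri);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var form = doc.DocumentNode.SelectSingleNode("//form");
        if (form == null) throw new InvalidOperationException("No <form> found.");

        // Resolve the form's action against the page URL; an empty action posts back to the page.
        string action = HtmlEntity.DeEntitize(form.GetAttributeValue("action", ""));
        var target = new Uri(pageUri, action);

        var fields = ExtractFields(doc);
        fields["username"] = "alice"; // hypothetical field: set whatever the user would have typed

        // Assumption: the form uses method="post" with URL-encoded data.
        using var content = new FormUrlEncodedContent(fields);
        using var response = await client.PostAsync(target, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Calling `await FormPoster.SubmitAsync(new HttpClient(), new Uri("https://example.com/login"))` (a placeholder URL) would return the server's response body, which can be loaded into another `HtmlDocument` for further parsing. Anything the page does in JavaScript before submitting will not happen here; that case is what the second option is for.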




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow