HTML Scraping with HTML Agility Pack

ajax c# html-agility-pack web-scraping

Question

Can someone tell me the best way to use HtmlAgilityPack to get the contents described below from the HTML?

In the HTML provided, I need to scrape the value of the element with the ID "img" and read the values of x and y so they can be used in another function.

The relevant HTML is

<div id="values">
<input type="hidden" id="x" name="x" value='0' />
<input type="hidden" id="y" name="y" value='0' />
<input type="hidden" id="img" name="img" value="86932" />
<input type="hidden" id="source" name="source" value="center" />
</div>

These values are passed to the JavaScript function call shown below:

submitClick(document.getElementById("img").getAttribute("value"), 
              document.getElementById("x").getAttribute("value"), 
              document.getElementById("y").getAttribute("value"), 
              'tiled'  );

Can somebody help me out by telling me how I should proceed?

I have written the following code, which gets the HTML for the page:

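// Url holds the address of the page to scrape.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
request.Method = "GET";
string result;
// Dispose the response, stream, and reader once the body has been read.
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    result = reader.ReadToEnd();
}
// Parse the downloaded markup with HTML Agility Pack.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);
HtmlNode root = doc.DocumentNode;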

Now that I have the root, how should I search for the parameters and then send them by GET?

Accepted Answer

Picking up where you left off in your sample code above, you could just grab the values like this:

string imgValue = doc.DocumentNode.SelectSingleNode("//input[@id = \"img\"]").GetAttributeValue("value", "0");
string xValue = doc.DocumentNode.SelectSingleNode("//input[@id = \"x\"]").GetAttributeValue("value", "0");
string yValue = doc.DocumentNode.SelectSingleNode("//input[@id = \"y\"]").GetAttributeValue("value", "0");

The first line above is basically saying: find the first node of type "input" whose "id" attribute equals "img", and get the value of its "value" attribute. The second argument to GetAttributeValue ("0" here) is the default that is returned if the node has no "value" attribute.
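
One caveat with chaining the calls like that: SelectSingleNode returns null when nothing matches, so calling GetAttributeValue directly on its result throws a NullReferenceException if the page ever omits one of the inputs. A guarded lookup could look roughly like the sketch below. The helper name ReadValue, the class name, and the inline sample markup are made up for illustration, and the sketch assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class HiddenInputReader
{
    // Hypothetical helper: returns the "value" attribute of <input id="...">,
    // or the fallback when the input (or its value attribute) is missing.
    public static string ReadValue(HtmlDocument doc, string id, string fallback)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//input[@id='" + id + "']");
        return node == null ? fallback : node.GetAttributeValue("value", fallback);
    }

    public static void Main()
    {
        // Sample markup copied from the question; a real run would use the downloaded page.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div id=\"values\">" +
            "<input type=\"hidden\" id=\"x\" name=\"x\" value='0' />" +
            "<input type=\"hidden\" id=\"y\" name=\"y\" value='0' />" +
            "<input type=\"hidden\" id=\"img\" name=\"img\" value=\"86932\" />" +
            "</div>");

        Console.WriteLine("img = " + ReadValue(doc, "img", "0"));
        Console.WriteLine("x   = " + ReadValue(doc, "x", "0"));
        Console.WriteLine("y   = " + ReadValue(doc, "y", "0"));

        // No input with this id exists, so the fallback comes back instead of an exception.
        Console.WriteLine("missing = " + ReadValue(doc, "nope", "n/a"));
    }
}
```

Whether a silent fallback or an explicit error is the better reaction to a missing input depends on how the values are used afterwards.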

Then just append the values to the destination URL as query-string parameters and send the GET request the same way you fetched the initial HTML.
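
As a rough sketch of that last step, the query string could be assembled as shown here. The base address and the parameter names img, x and y are assumptions (the names are borrowed from the hidden inputs); the request that submitClick really sends is not shown in the question, so check it in the page's script or in the browser's network tab before relying on this.

```csharp
using System;

public static class ClickUrlBuilder
{
    // Hypothetical helper: appends the scraped values to baseUrl as query-string
    // parameters. The parameter names are guesses based on the input names.
    public static string Build(string baseUrl, string img, string x, string y)
    {
        return baseUrl
            + "?img=" + Uri.EscapeDataString(img)
            + "&x=" + Uri.EscapeDataString(x)
            + "&y=" + Uri.EscapeDataString(y);
    }

    public static void Main()
    {
        // Placeholder address; substitute the real destination URL.
        string url = Build("http://example.com/click", "86932", "0", "0");
        Console.WriteLine(url); // prints http://example.com/click?img=86932&x=0&y=0
    }
}
```

The resulting string can then be passed to WebRequest.Create exactly as in the question's code. Uri.EscapeDataString keeps the URL valid even if a value ever contains characters such as spaces or ampersands.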


Popular Answer

I wouldn't use the Html Agility Pack for this, because I don't know how to make it feed back to the originating website. Instead, I'd use WatiN. WatiN is built for driving a browser for testing purposes, but I've found it extremely useful when I have to scrape websites that are outside my control (such as Facebook or Wal-Mart). The downside is that it drives an actual browser window, so it's not something you can hide from a user. The upside is that you can easily simulate mouse clicks and form field text entries.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow