Extract a value out of html using HtmlAgilityPack

c# html-agility-pack xpath

Question

I'm new to C# and HtmlAgilityPack and have been trying to get the value of the signup_form_id input (2079787163) from the following HTML:

<form name="setupform" id="setupform" method="post" action="/signup/" target="_top">
<input type="hidden" name="form_type" value="blog" />
<input type="hidden" name="stage" value="" />
<input type="hidden" name="loc" value="signup" />
<input type='hidden' name='signup_form_id' value='2079787163' /><input type="hidden" id="_signup_form" name="_signup_form" value="9783b65654" />

This is my code so far:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.Load("https://signup.wordpress.com/signup/");
var value = doc.DocumentNode.SelectSingleNode("//form[@name='signup_form_id'");
Console.WriteLine(value.InnerText);

My XPath expression is clearly flawed, but I can't work out what the problem is. Can anyone offer any suggestions? Many thanks!

9/1/2013 11:33:13 AM

Accepted Answer

Your code first fails on the doc.Load line, since the Load overload that takes a string expects a file path, not a URL. To download HTML from a URL, use HtmlWeb's Load method instead.

Second, your XPath has two problems:

  • The closing bracket ] is missing.
  • No form exists with the name signup_form_id.
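
Both problems can be reproduced offline. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; it parses a trimmed copy of the markup from the question with LoadHtml instead of downloading the page. The class name XPathProblemsDemo and the helper Describe are made up for this illustration and are not part of the library.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

public static class XPathProblemsDemo
{
    // Hypothetical helper: reports what happens when an XPath is evaluated.
    public static string Describe(HtmlDocument doc, string xpath)
    {
        try
        {
            // SelectSingleNode returns null when nothing matches.
            var node = doc.DocumentNode.SelectSingleNode(xpath);
            return node == null ? "no match" : "match: <" + node.Name + ">";
        }
        catch (XPathException)
        {
            // A malformed expression is rejected before any matching happens.
            return "invalid XPath";
        }
    }

    public static void Main()
    {
        // Markup from the question, trimmed to the relevant tags.
        var html =
            "<form name=\"setupform\" id=\"setupform\" method=\"post\" action=\"/signup/\">" +
            "<input type='hidden' name='signup_form_id' value='2079787163' />" +
            "</form>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses a string; no file or network access

        // Problem 1: the closing bracket is missing.
        Console.WriteLine(Describe(doc, "//form[@name='signup_form_id'"));  // invalid XPath

        // Problem 2: well-formed, but no form has that name.
        Console.WriteLine(Describe(doc, "//form[@name='signup_form_id']")); // no match

        // The form's name is actually setupform.
        Console.WriteLine(Describe(doc, "//form[@name='setupform']"));      // match: <form>
    }
}
```

The "no match" case is the easy one to miss: SelectSingleNode does not throw there, so the failure only surfaces later as a NullReferenceException when a property of the result is read.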

Finally, you should change your code to read as follows:

var url = "http://signup.wordpress.com/signup/";

var htmlWeb = new HtmlWeb();
var doc = htmlWeb.Load(url);

var value = doc.DocumentNode.SelectSingleNode("//form[@id='setupform']");
Console.WriteLine(value.OuterHtml);

Update: Thanks for clarifying the question; my earlier understanding of the issue was incorrect.

You seem to be looking for an input tag, not the form tag. Therefore, you need to adjust your XPath to select the input instead.

The code that reads the necessary data is provided here:

var url = "http://signup.wordpress.com/signup/";

var htmlWeb = new HtmlWeb();
var doc = htmlWeb.Load(url);

var signupFormIdElement = doc.DocumentNode
    .SelectSingleNode("//input[@name='signup_form_id']");

var signupFormId = signupFormIdElement.GetAttributeValue("value", "");

Console.WriteLine(signupFormId);
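
If the page layout changes and the input is no longer present, SelectSingleNode returns null and the GetAttributeValue call above throws a NullReferenceException. A null-safe variant might look like the sketch below. It again assumes the HtmlAgilityPack NuGet package and runs against a trimmed copy of the question's markup rather than the live page; SignupFormIdReader and ReadSignupFormId are hypothetical names chosen for this example.

```csharp
using System;
using HtmlAgilityPack;

public static class SignupFormIdReader
{
    // Hypothetical helper: returns the input's value attribute,
    // or null when the document has no signup_form_id input.
    public static string ReadSignupFormId(HtmlDocument doc)
    {
        var input = doc.DocumentNode
            .SelectSingleNode("//input[@name='signup_form_id']");

        return input == null ? null : input.GetAttributeValue("value", "");
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<form name=\"setupform\" id=\"setupform\">" +
            "<input type='hidden' name='signup_form_id' value='2079787163' />" +
            "</form>");

        Console.WriteLine(ReadSignupFormId(doc) ?? "(not found)"); // 2079787163
    }
}
```

Returning null for "input missing" and an empty string for "input present but without a value attribute" keeps the two cases distinguishable for the caller.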
9/1/2013 8:54:44 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow