WebDriver can find elements by XPath, but Html Agility Pack can't.

c# html-agility-pack visual-studio-2010 webdriver xpath

Question

My XPath queries have never worked reliably with Html Agility Pack; only the simplest ones succeed, such as:

//*[@id='some_id']

or

//input

As soon as a query gets more complex, Html Agility Pack fails to match anything. The following example shows the issue: I navigate to Google with WebDriver, grab the page source, load it into Html Agility Pack, and then have both WebDriver and Html Agility Pack try to find the same element/node (C#):

using System;
using System.Threading;
using HtmlAgilityPack;
using OpenQA.Selenium.Firefox;

//The XPath query
const string xpath = "//form//tr[1]/td[1]//input[@name='q']";

//Navigate to Google and get page source
var driver = new FirefoxDriver(new FirefoxProfile()) { Url = "http://www.google.com" };
Thread.Sleep(2000);

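//Can WebDriver find it? (FindElementByXPath throws NoSuchElementException when
//nothing matches, so it never returns null; the check below is just for symmetry)
var e = driver.FindElementByXPath(xpath);
Console.WriteLine(e!=null ? "Webdriver success" : "Webdriver failure");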

//Can Html Agility Pack find it?
var source = driver.PageSource;
var htmlDoc = new HtmlDocument { OptionFixNestedTags = true };
htmlDoc.LoadHtml(source);
var nodes = htmlDoc.DocumentNode.SelectNodes(xpath);
Console.WriteLine(nodes!=null ? "Html Agility Pack success" : "Html Agility Pack failure");

driver.Quit();

In this case WebDriver finds the element, but Html Agility Pack does not.

I know that changing the XPath here to //input[@name='q'] makes it work, but that only fixes this particular example, which isn't the point. I need something that behaves like WebDriver's XPath engine, or like the FirePath or FireFinder add-ons for Firefox.

If WebDriver can locate it, why can't Html Agility Pack?

5/25/2011 4:30:43 PM

Accepted Answer

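The FORM element is the problem. By default, Html Agility Pack handles FORM specially (it has an entry in HtmlNode.ElementsFlags), and a form node never reports having any children; the elements inside it end up as siblings of the form instead.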

In your example, this query does find the element:

.//div/div[2]/table/tr/td/table/tr/td/div/table/tr/td/div/div[2]/input

This one, however, does not, which shows that the form element is what confuses the parser:

.//form/div/div[2]/table/tr/td/table/tr/td/div/table/tr/td/div/div[2]/input
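To confirm that the form is the culprit, you can ask the parser directly how many children it attached to the form node. This is a minimal diagnostic sketch, assuming the HtmlAgilityPack package is referenced; the FormInspector class and the sample HTML are hypothetical. With the default flags the count can come out as 0 even though the markup clearly nests a div inside the form (the exact result may depend on the Html Agility Pack version).

```csharp
using System;
using HtmlAgilityPack;

// Diagnostic sketch (assumes the HtmlAgilityPack package is referenced).
// FormInspector and the sample HTML are hypothetical, for illustration only.
public static class FormInspector
{
    // Returns how many child nodes the parser attached to the first <form>,
    // or -1 when the document contains no form at all.
    public static int FormChildCount(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode form = doc.DocumentNode.SelectSingleNode("//form");
        return form == null ? -1 : form.ChildNodes.Count;
    }

    public static void Main()
    {
        const string html = "<body><form><div><input name='q'></div></form></body>";
        // With the default flags this can print 0: the div was not attached to the form.
        Console.WriteLine("form children: " + FormChildCount(html));
    }
}
```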

You can change that behavior, though. If you add this line before loading the HTML, form nodes will report their child nodes:

HtmlNode.ElementsFlags.Remove("form");
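Put together, the fix looks roughly like the sketch below. It assumes the HtmlAgilityPack package is referenced; FormFlagDemo, CountMatches and the tiny HTML string are hypothetical stand-ins for the Google page. The important detail is ordering: the flag has to be removed before LoadHtml is called, because the flags are consulted while the document is parsed.

```csharp
using System;
using HtmlAgilityPack;

// Minimal sketch (assumes the HtmlAgilityPack package is referenced).
// FormFlagDemo, CountMatches and the HTML below are hypothetical examples,
// not taken from the Google page in the question.
public static class FormFlagDemo
{
    // A tiny page whose input sits inside a form, like the Google search box.
    public const string Html =
        "<html><body><form action='/search'><div><input name='q'></div></form></body></html>";

    // Same shape of query as in the question: it must descend through the form.
    public const string XPath = "//form//div/input[@name='q']";

    // Parses Html and returns how many nodes match XPath
    // (SelectNodes returns null when nothing matches, which we report as 0).
    public static int CountMatches()
    {
        var doc = new HtmlDocument { OptionFixNestedTags = true };
        doc.LoadHtml(Html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(XPath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Default flags: depending on the Html Agility Pack version, the form
        // may be parsed without children, in which case this prints 0.
        Console.WriteLine("Default flags: " + CountMatches());

        // Drop the special handling for <form>. ElementsFlags is static, so this
        // affects every document parsed afterwards in this process.
        HtmlNode.ElementsFlags.Remove("form");
        Console.WriteLine("Without form flag: " + CountMatches());
    }
}
```

Because HtmlNode.ElementsFlags is a static dictionary, removing "form" changes parsing for every HtmlDocument loaded afterwards in the process, so it is usually done once at startup rather than per document.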
5/25/2011 6:20:06 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow