HtmlAgilityPack C#: SelectNodes always returns null

c# html-agility-pack html-parsing xpath


This is the XPath expression I tried to use with the HtmlAgilityPack C# parser:

//div[@id = 'sc1']/table/tbody/tr/td/span[@class='blacktxt']

I evaluated the XPath expression with a Firefox XPath add-on and successfully got the required items, but in the C# code the result is null.

HtmlAgilityPack.HtmlNodeCollection node = htmldoc.DocumentNode.SelectNodes("//div[@id ='sc1']/table/tbody/tr/td/span[@class='blacktxt']");            

The node variable is always null. Please help me find a way around this problem. Thank you.

8/4/2013 3:09:09 PM

Popular Answer

DOM Requires <tbody/> Tags to be Inserted

All common browser extensions for building XPath expressions work on the DOM. Unlike the HTML specs, the DOM specs require <tr/> elements to be inside <tbody/> elements, so browsers add such elements if they are missing. You can easily see the difference by comparing the HTML shown in Firebug (or similar developer tools that work on the DOM) with the raw page source (fetched with wget or a similar tool that does not interpret the markup).
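
To see the effect from the HtmlAgilityPack side, here is a minimal sketch. The markup is a made-up stand-in for the real page (assumed to have no <tbody/> in its source), and the class name is only for illustration. It relies on SelectNodes returning null rather than an empty collection when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

class TbodyCheck
{
    static void Main()
    {
        // Hypothetical markup: the table has no <tbody> in the raw source.
        const string html =
            "<div id='sc1'><table><tr><td><span class='blacktxt'>A</span></td></tr></table></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // HtmlAgilityPack parses the source as written; it does not add <tbody>.
        var withTbody = doc.DocumentNode.SelectNodes("//div[@id='sc1']/table/tbody/tr");
        var withoutTbody = doc.DocumentNode.SelectNodes("//div[@id='sc1']/table/tr");

        // SelectNodes yields null when no node matches the expression.
        Console.WriteLine(withTbody == null ? "tbody path: no match" : "tbody path: matched");
        Console.WriteLine(withoutTbody == null ? "direct path: no match" : "direct path: matched");
    }
}
```

If the real page behaves like this sample, only the path without /tbody should report a match.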

The Solution

Remove the /tbody axis step, and your XPath expression will probably work.

//div[@id = 'sc1']/table/tr/td/span[@class='blacktxt']
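
Applied to the code from the question, this might look roughly like the sketch below. The inline HTML is a placeholder (the question does not show how htmldoc is loaded), and the null check is there because SelectNodes returns null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

class SelectSpans
{
    static void Main()
    {
        var htmldoc = new HtmlDocument();
        // Placeholder input; replace with however the real document is loaded.
        htmldoc.LoadHtml(
            "<div id='sc1'><table><tr><td><span class='blacktxt'>A</span></td></tr></table></div>");

        // Same expression as in the question, minus the /tbody step.
        HtmlNodeCollection nodes = htmldoc.DocumentNode.SelectNodes(
            "//div[@id='sc1']/table/tr/td/span[@class='blacktxt']");

        // Guard against null before iterating.
        if (nodes == null)
        {
            Console.WriteLine("No matching nodes.");
            return;
        }

        foreach (HtmlNode span in nodes)
        {
            Console.WriteLine(span.InnerText);
        }
    }
}
```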

If You Need to Support Both HTML With and Without <tbody/> Tags

For a more general solution, you could replace the /tbody axis step with a descendant-or-self step //, but this could also match rows of nested ("inner") tables:

//div[@id = 'sc1']/table//tr/td/span[@class='blacktxt']
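
A small sketch of that pitfall, using invented markup with a table nested inside a cell (the sample HTML and class name are assumptions, not taken from the real page). It prints whatever the // variant selects, so you can check whether spans from the inner table are included.

```csharp
using System;
using HtmlAgilityPack;

class NestedTableCheck
{
    static void Main()
    {
        // Hypothetical markup: an inner table sits inside a cell of the outer table.
        const string html =
            "<div id='sc1'><table><tr><td>" +
            "<span class='blacktxt'>outer</span>" +
            "<table><tr><td><span class='blacktxt'>inner</span></td></tr></table>" +
            "</td></tr></table></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The // step searches all descendants of the outer table, nested tables included.
        var nodes = doc.DocumentNode.SelectNodes(
            "//div[@id='sc1']/table//tr/td/span[@class='blacktxt']");

        if (nodes == null)
        {
            Console.WriteLine("No matching nodes.");
            return;
        }

        foreach (HtmlNode span in nodes)
        {
            Console.WriteLine(span.InnerText);
        }
    }
}
```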

A better option is to combine both alternative XPath expressions in a union:

//div[@id = 'sc1']/table/tr/td/span[@class='blacktxt'] | //div[@id = 'sc1']/table/tbody/tr/td/span[@class='blacktxt']
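
As a rough check, the union can be run against both kinds of markup. The two HTML strings and the CountMatches helper below are made up for illustration; the union operator | is plain XPath 1.0, so it should work with SelectNodes.

```csharp
using System;
using HtmlAgilityPack;

class UnionXPath
{
    // Union of the two alternatives: rows directly under <table>, or under <tbody>.
    const string XPath =
        "//div[@id='sc1']/table/tr/td/span[@class='blacktxt']" +
        " | //div[@id='sc1']/table/tbody/tr/td/span[@class='blacktxt']";

    // Hypothetical helper: number of nodes the union selects in the given HTML.
    static int CountMatches(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(XPath);
        return nodes == null ? 0 : nodes.Count; // null means nothing matched
    }

    static void Main()
    {
        // Sample markup without and with an explicit <tbody>.
        const string plain =
            "<div id='sc1'><table><tr><td><span class='blacktxt'>A</span></td></tr></table></div>";
        const string withTbody =
            "<div id='sc1'><table><tbody><tr><td><span class='blacktxt'>A</span></td></tr></tbody></table></div>";

        Console.WriteLine("without tbody: " + CountMatches(plain));
        Console.WriteLine("with tbody: " + CountMatches(withTbody));
    }
}
```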

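A cleaner solution would be the following, but it requires XPath 2.0, which the XPath 1.0 engine in .NET that HtmlAgilityPack builds on does not support: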

//div[@id = 'sc1']/table/(tbody, self::*)/tr/td/span[@class='blacktxt']
8/4/2013 9:04:43 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow