I am trying to get all the text nodes of an element, including those of its children, but for some reason it is giving me the entire document's HTML.
This is what I've come up with:
HtmlAgilityPack.HtmlNode el = htmlDoc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[" + index + "]");
if (el != null)
{
    foreach (HtmlNode node in el.SelectNodes("//text()"))
    {
        Debug.WriteLine("text=" + node.InnerText.Replace(" ", " "));
    }
}
It prints a text= line for every text node of the whole document. I'm sure there's something wrong with the //text() expression, which is a snippet I found here at SO, but I don't know another way of doing it and I've been going crazy with it.
You should use a relative XPath expression, that is, one relative to your el context node:
HtmlAgilityPack.HtmlNode el = htmlDoc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[" + index + "]");
if (el != null)
{
    foreach (HtmlNode node in el.SelectNodes(".//text()"))
    {
        Debug.WriteLine("text=" + node.InnerText.Replace(" ", " "));
    }
}
"//text()" selects all descendant text nodes of the document root node, no matter which node SelectNodes is called on. The leading . in ".//text()" anchors the path at el instead, so only the text nodes beneath el are returned.
See Location Paths and Abbreviated Syntax in the XPath specification for details:
//para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
.//para selects the para element descendants of the context node