Retrieve all text nodes of element including children using HtmlAgilityPack in C#

.net .net-2.0 c# html-agility-pack xpath

Question

I am trying to get all the text nodes of an element, including those of its children, but for some reason it is giving me the entire document's HTML.

This is what I've come up with:

HtmlAgilityPack.HtmlNode el = htmlDoc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[" + index + "]");
if (el != null)
{
    foreach (HtmlNode node in el.SelectNodes("//text()"))
    {
        Debug.WriteLine("text=" + node.InnerText.Replace(" ", " "));
    }
}

It prints a text= line for every text node in the whole document, not just those inside el. I'm sure there's something wrong with the //text(), which is a snippet I found here on SO, but I don't know another way of doing it and I've been going crazy with it.

Accepted Answer

You should use a relative XPath expression, that is, one relative to your el context node:

HtmlAgilityPack.HtmlNode el = htmlDoc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[" + index + "]");
if (el != null)
{
    foreach (HtmlNode node in el.SelectNodes(".//text()"))
    {
        Debug.WriteLine("text=" + node.InnerText.Replace(" ", " "));
    }
}

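"//text()" is an absolute path: it selects all descendant text nodes of the document root node, regardless of which node SelectNodes is called on. The leading dot in ".//text()" anchors the search at the context node instead.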

See the Location Paths and Abbreviated Syntax sections of the XPath specification for details:

  • //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

  • .//para selects the para element descendants of the context node
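
To see the difference side by side, here is a minimal sketch that runs both expressions against the same context node. The class name TextNodeDemo, the helper CollectText and the sample markup are made up for illustration and are not from the question; it assumes the HtmlAgilityPack package is referenced. The null check is there because SelectNodes in many HtmlAgilityPack versions returns null rather than an empty collection when nothing matches.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical sample markup, invented for this illustration.
    public const string SampleHtml =
        "<html><body><p>outside</p>" +
        "<div class='TableContainer'><span>inside one</span><b>inside two</b></div>" +
        "</body></html>";

    // Evaluates xpath with the first TableContainer div as the context node
    // and returns the InnerText of every node it matches.
    public static List<string> CollectText(string html, string xpath)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        List<string> result = new List<string>();

        HtmlNode el = doc.DocumentNode.SelectSingleNode("(//div[@class='TableContainer'])[1]");
        if (el == null)
        {
            return result; // no such div in the document
        }

        HtmlNodeCollection nodes = el.SelectNodes(xpath);
        if (nodes == null)
        {
            return result; // guard: SelectNodes may return null when nothing matches
        }

        foreach (HtmlNode node in nodes)
        {
            result.Add(node.InnerText);
        }
        return result;
    }
}
```

With this sample, CollectText(SampleHtml, "//text()") should also pick up the "outside" text from the paragraph that precedes the div, while CollectText(SampleHtml, ".//text()") should return only the two strings inside the div.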




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow