Selecting all nodes containing text with XPath

c# html-agility-pack xpath

Question

I have been struggling with this problem for the past couple of days. Say I want to get all the text() nodes from an HTML document, but I only want the XPath of the element that contains each piece of text. Example:

 foreach (var textNode in node.SelectNodes(".//text()"))
 {
     // do stuff here
 }

However, when it comes to retrieving the XPath of the textNode using textNode.XPath, I get the full XPath including the #text node:

/html[1]/body[1]/div[1]/a[1]/#text

Yet I only want the containing node of the text, for example:

/html[1]/body[1]/div[1]/a[1]

Could anyone point me toward a better XPath solution that retrieves all nodes that contain text, but only returns the XPath up to the containing node?

Accepted Answer

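Why not split the path on '/' and rejoin everything except the last segment (the #text step)?

string[] elements = getXPath(textNode).Split('/');
// Keep every segment except the final one; for the path in the question
// this yields /html[1]/body[1]/div[1]/a[1]
return String.Join("/", elements, 0, elements.Length - 1);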
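The same trimming can be done by cutting the string at its last '/'. The sketch below is only an illustration: ParentXPath and the XPathTrim class are hypothetical names, not part of HtmlAgilityPack, and the input is assumed to look like the path shown in the question.

```csharp
using System;

public static class XPathTrim
{
    // Hypothetical helper (not an HtmlAgilityPack API): drops the last
    // step of an XPath string such as ".../a[1]/#text".
    public static string ParentXPath(string xpath)
    {
        int lastSlash = xpath.LastIndexOf('/');
        // No slash, or only the leading one: nothing to trim, return as-is.
        return lastSlash <= 0 ? xpath : xpath.Substring(0, lastSlash);
    }

    public static void Main()
    {
        Console.WriteLine(ParentXPath("/html[1]/body[1]/div[1]/a[1]/#text"));
        // prints /html[1]/body[1]/div[1]/a[1]
    }
}
```

If the text node is an HtmlAgilityPack HtmlNode, reading textNode.ParentNode.XPath should give the same path without any string handling.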

Popular Answer

Instead of:

.//text() 

use:

.//*[normalize-space(text())]

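This selects every descendant element of the context (current) node whose first text-node child contains non-whitespace text. The matched elements need not be leaves; an element with both text and child elements is selected too. Note that in XPath 1.0, normalize-space(text()) only looks at the first text-node child. To match elements that have at least one non-whitespace-only text node child anywhere among their children, use .//*[text()[normalize-space()]] instead.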
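To see which elements such an expression picks up, here is a minimal sketch. It uses System.Xml as a stand-in so that it runs without extra packages; this is an assumption for illustration, not the setup from the question, and the sample markup and the MatchNames helper are made up. The same expression string should be usable with HtmlAgilityPack's SelectNodes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class TextElementDemo
{
    // Hypothetical helper: names of the elements matched by xpath,
    // evaluated relative to the document's root element.
    public static List<string> MatchNames(string xml, string xpath)
    {
        var doc = new XmlDocument(); // whitespace-only text is dropped by default
        doc.LoadXml(xml);
        var names = new List<string>();
        foreach (XmlNode node in doc.DocumentElement.SelectNodes(xpath))
            names.Add(node.Name);
        return names;
    }

    public static void Main()
    {
        const string xml =
            "<div><a>link</a><p>intro <b>bold</b></p><span>   </span></div>";
        var names = MatchNames(xml, ".//*[normalize-space(text())]");
        Console.WriteLine(string.Join(", ", names));
        // prints a, p, b
    }
}
```

Here p is matched even though it has a child element, and span is skipped because it holds only whitespace. Each matched node is already the containing element, so its own XPath is the path the question asks for.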



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why