Inner text of Node ignoring inner text of children

c# html-agility-pack xpath

Question

Pardon me if this sounds too simple to ask here, but this is my very first day with html-agility-pack and I can't work out how to select the inner text that belongs directly to a node while ignoring the inner text of its child nodes.

For example:

<div id="div1">
   <div class="h1"> this needs to be selected
   <small> and not this</small>
   </div>
</div>

Currently I am trying this:

HtmlDocument page = new HtmlWeb().Load(url);
var s = page.DocumentNode.SelectSingleNode("//div[@id='div1']//div[@class='h1']");
string selText = s.InnerText; // InnerText includes the text of all descendant nodes

which returns the whole text (i.e. "this needs to be selected and not this"). Any suggestions?

Accepted Answer

You can append /text() to the XPath to select all text nodes directly under a specific element. If you only need the first one, add [1] to it:

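// Assumes text holds the HTML source as a string.
HtmlDocument page = new HtmlDocument();
page.LoadHtml(text);
var s = page.DocumentNode.SelectSingleNode("//div[@id='div1']//div[@class='h1']/text()[1]");
string selText = s.InnerText; // s is null if the XPath matches nothing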
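For anyone who wants to try this against the markup from the question, here is a minimal sketch that wraps the same XPath in a helper. The class name FirstTextNodeExample and the method FirstDirectText are made up for illustration and are not part of HtmlAgilityPack; the trimming and the null check are my additions, not something the answer above specifies.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration only; not part of HtmlAgilityPack.
public static class FirstTextNodeExample
{
    // Returns the first direct text node of the element matched by
    // elementXPath, trimmed, or null when the XPath matches nothing.
    public static string FirstDirectText(string html, string elementXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // text()[1] selects only the first text node that is a direct child.
        HtmlNode textNode =
            doc.DocumentNode.SelectSingleNode(elementXPath + "/text()[1]");

        // SelectSingleNode returns null when there is no match,
        // so guard before reading InnerText.
        return textNode?.InnerText.Trim();
    }
}
```

Called with the question's HTML and "//div[@id='div1']//div[@class='h1']", this should give back only the text that precedes the small element, without the surrounding whitespace.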

Popular Answer

The div can have multiple text nodes if there is text both before and after its child elements. As I noted in a similar answer, I think the best way to get all the direct text content of a node is something like:

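// Requires: using System.Text; using HtmlAgilityPack;
HtmlDocument page = new HtmlWeb().Load(url);
var nodes = page.DocumentNode.SelectNodes("//div[@id='div1']//div[@class='h1']/text()");

// By default SelectNodes returns null (not an empty collection) when nothing matches.
StringBuilder sb = new StringBuilder();
if (nodes != null)
{
    foreach (var node in nodes)
    {
        sb.Append(node.InnerText);
    }
}

string content = sb.ToString();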
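If you already hold the element node, the same idea can be sketched without a second XPath query by filtering its ChildNodes for text nodes. This is an alternative I am suggesting, not what the answer above does: the names DirectTextExample and DirectText are hypothetical, and unlike the StringBuilder loop it trims each piece and joins the pieces with a single space.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration only; not part of HtmlAgilityPack.
public static class DirectTextExample
{
    // Collects the text nodes that are direct children of element,
    // skipping whitespace-only nodes, and joins them with a space.
    public static string DirectText(HtmlNode element)
    {
        var parts = element.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text) // direct text nodes only
            .Select(n => n.InnerText.Trim())
            .Where(t => t.Length > 0);                   // drop whitespace-only nodes

        return string.Join(" ", parts);
    }
}
```

Whether to trim and how to join the pieces depends on what you need the text for; if you want the raw content exactly as it appears in the markup, the plain concatenation shown above is the closer fit.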



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow