Using XPath and HtmlAgilityPack to find all elements whose inner text contains a specific word or words

html-agility-pack xpath

Question

I am trying to build a simple search engine using HtmlAgilityPack and XPath with C# (.NET 4). I want to find every node containing a user-defined search word, but I can't seem to get the XPath right. For example:

<HTML>
 <BODY>
  <H1>Mr T for president</H1>
  <div>We believe the new president should be</div>
  <div>the awesome Mr T</div>
  <div>
   <H2>Mr T replies:</H2>
   <p>I pity the fool who doesn't vote</p>
   <p>for Mr T</p>
  </div>
 </BODY>
</HTML>

If the specified search word is "Mr T", I want the following nodes: <H1>, the second <div>, <H2>, and the second <p>. I have tried numerous variants of doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]"); but I always seem to end up with every single node in the entire DOM.

Any hints pointing me in the right direction would be much appreciated.

Accepted Answer

Use:

//*[text()[contains(., 'Mr T')]]

This selects all elements in the document that have a text-node child containing the string 'Mr T'.

The same elements can be selected with a shorter expression:

//text()[contains(., 'Mr T')]/..

This selects the parent(s) of any text node that contains the string 'Mr T'.
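To check which nodes the two expressions above return for the sample page, the sketch below evaluates them with .NET's built-in System.Xml XmlDocument.SelectNodes instead of HtmlAgilityPack, so it runs without the NuGet package. That substitution is an assumption that only works because the sample markup happens to be well-formed XML; real-world HTML usually is not, which is what HtmlAgilityPack is for. The class XPathTextSearch and its Select method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Xml;

// Sketch only: uses System.Xml (not HtmlAgilityPack) to evaluate XPath 1.0
// expressions, assuming the input is well-formed XML.
public static class XPathTextSearch
{
    // The sample page from the question, written without whitespace between tags.
    public const string SampleHtml =
        "<HTML><BODY>" +
        "<H1>Mr T for president</H1>" +
        "<div>We believe the new president should be</div>" +
        "<div>the awesome Mr T</div>" +
        "<div><H2>Mr T replies:</H2>" +
        "<p>I pity the fool who doesn't vote</p>" +
        "<p>for Mr T</p></div>" +
        "</BODY></HTML>";

    // Returns "ElementName: InnerText" for every node the expression selects.
    public static List<string> Select(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var results = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            results.Add(node.Name + ": " + node.InnerText);
        }
        return results;
    }
}
```

With the sample page, both expressions yield the four elements the question asks for (H1, the second div, H2 and the second p). Dropping the inner text() test, as in //*[contains(., 'Mr T')], would also match HTML, BODY and the wrapping div, because contains() applied to an element uses its whole descendant string value.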


Popular Answer

In XPath, to find the elements containing a specific keyword, use the following form (where "keyword" is the word you want to search for):

//*[text()[contains(., 'keyword')]]

In C#, build the same expression by concatenating the search string into it, where keyword is a string variable:

doc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
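Concatenating keyword straight into the expression breaks as soon as the search word contains an apostrophe (for example doesn't), because the apostrophe closes the XPath string literal early, and XPath 1.0 has no escape sequence for quotes inside a literal. A common workaround, sketched below with a hypothetical XPathLiteral helper that is not part of HtmlAgilityPack, is to quote the word with whichever quote character it does not contain, and to fall back to concat() when it contains both.

```csharp
using System.Text;

// Hypothetical helper (not an HtmlAgilityPack API): turns arbitrary user input
// into a valid XPath 1.0 string literal.
public static class XPathLiteral
{
    public static string Quote(string value)
    {
        // No apostrophe: a single-quoted literal is safe.
        if (!value.Contains("'"))
            return "'" + value + "'";

        // Apostrophe but no double quote: a double-quoted literal is safe.
        if (!value.Contains("\""))
            return "\"" + value + "\"";

        // Both quote kinds: split on apostrophes, single-quote each piece,
        // and rejoin them with concat(), emitting each apostrophe as "'".
        var sb = new StringBuilder("concat(");
        string[] parts = value.Split('\'');
        for (int i = 0; i < parts.Length; i++)
        {
            if (i > 0)
                sb.Append(", \"'\", ");
            sb.Append('\'').Append(parts[i]).Append('\'');
        }
        sb.Append(')');
        return sb.ToString();
    }

    // Builds the //*[text()[contains(., ...)]] query for any keyword.
    public static string ContainsTextQuery(string keyword)
    {
        return "//*[text()[contains(., " + Quote(keyword) + ")]]";
    }
}
```

Usage would then look like doc.DocumentNode.SelectNodes(XPathLiteral.ContainsTextQuery(keyword)). HtmlAgilityPack's SelectNodes returns null rather than an empty collection when nothing matches, so check the result before iterating over it.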



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow