Find all elements whose inner text contains a certain word or phrase using XPath and HtmlAgilityPack

html-agility-pack xpath

Question

I am trying to build a simple search engine using HtmlAgilityPack and XPath with C# (.NET 4). I want to find every node containing a user-defined search word, but I can't seem to get the XPath right. For example:

<HTML>
 <BODY>
  <H1>Mr T for president</H1>
  <div>We believe the new president should be</div>
  <div>the awesome Mr T</div>
  <div>
   <H2>Mr T replies:</H2>
   <p>I pity the fool who doesn't vote</p>
   <p>for Mr T</p>
  </div>
 </BODY>
</HTML>

If the specified search word is "Mr T", I'd want the following nodes: the <H1>, the second <div>, the <H2> and the second <p>. I have tried numerous variants of

doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]");

but I always seem to wind up with every single node in the entire DOM.

Any hints to get me in the right direction would be very appreciated.
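A likely explanation for the "every single node" result is sketched below as a C# 9+ top-level program. As an assumption for illustration, it uses System.Xml's XmlDocument instead of HtmlAgilityPack on a small well-formed sample; HtmlAgilityPack's SelectNodes also evaluates XPath 1.0, so the expressions behave the same way. Because the concatenation above puts no quotes around searchword, a single word such as president is parsed as a child-element path, not a string. From a text node that path selects nothing, an empty node-set converts to the empty string, and contains(x, '') is always true, so every text node matches. An unquoted multi-word phrase such as Mr T is not valid XPath at all.

```csharp
using System;
using System.Xml;

// Hypothetical single-line sample, so there are no whitespace-only text nodes.
var doc = new XmlDocument();
doc.LoadXml("<HTML><BODY><H1>Mr T for president</H1>" +
            "<div>We believe the new president should be</div>" +
            "<div>the awesome Mr T</div></BODY></HTML>");

string searchword = "president";

// The question's concatenation: no quotes end up around the search word.
string unquoted = "//text()[contains(., " + searchword + ")]";  // contains(., president)
// The same expression with the word turned into an XPath string literal.
string quoted = "//text()[contains(., '" + searchword + "')]";  // contains(., 'president')

// Unquoted, president is a child-element path. Text nodes have no children,
// so it yields an empty node-set, which converts to "", and contains(x, "")
// is always true: every text node is selected.
int unquotedCount = doc.SelectNodes(unquoted)!.Count;
int quotedCount = doc.SelectNodes(quoted)!.Count;

Console.WriteLine(unquotedCount); // 3 (all text nodes)
Console.WriteLine(quotedCount);   // 2
```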

1/20/2012 11:05:26 PM

Accepted Answer

Use:

//*[text()[contains(., 'Mr T')]]

This selects all elements in the XML document that have a text-node child which contains the string 'Mr T'.
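To check this against the sample page from the question, here is a sketch written as a C# 9+ top-level program. It loads the markup with System.Xml's XmlDocument rather than HtmlAgilityPack, which works here only because the sample happens to be well-formed XML; the XPath string is the same one you would pass to doc.DocumentNode.SelectNodes.

```csharp
using System;
using System.Linq;
using System.Xml;

// The sample page from the question. It happens to be well-formed XML,
// so XmlDocument can load it directly.
const string Page = @"<HTML>
 <BODY>
  <H1>Mr T for president</H1>
  <div>We believe the new president should be</div>
  <div>the awesome Mr T</div>
  <div>
   <H2>Mr T replies:</H2>
   <p>I pity the fool who doesn't vote</p>
   <p>for Mr T</p>
  </div>
 </BODY>
</HTML>";

var doc = new XmlDocument();
doc.LoadXml(Page);

// Every element that has at least one text-node child containing 'Mr T'.
var matches = doc.SelectNodes("//*[text()[contains(., 'Mr T')]]")!;

string[] names = matches.Cast<XmlNode>().Select(n => n.Name).ToArray();
Console.WriteLine(string.Join(", ", names)); // H1, div, H2, p
```

The third <div> is not selected because none of its own text children contain 'Mr T'; the matching text belongs to its <H2> and <p> children.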

This can also be written more concisely as:

//text()[contains(., 'Mr T')]/..

This selects the parent(s) of any text node that contains the string 'Mr T'.
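The two forms return the same elements. The sketch below (again a C# 9+ top-level program, with System.Xml's XmlDocument standing in for HtmlAgilityPack and a small hypothetical document) also shows that an element is returned only once even when several of its text children match, because an XPath node-set never contains duplicates.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical document: the first <p> has two separate text children
// ("Mr T says " and " to Mr T") that both contain 'Mr T'.
var doc = new XmlDocument();
doc.LoadXml("<div><H2>Mr T replies:</H2><p>Mr T says <b>hi</b> to Mr T</p><p>no match</p></div>");

// Long form: elements that have a matching text child.
var longForm = doc.SelectNodes("//*[text()[contains(., 'Mr T')]]")!
                  .Cast<XmlNode>().ToList();

// Short form: parents of matching text nodes. The <p> appears once,
// even though two of its text children match.
var shortForm = doc.SelectNodes("//text()[contains(., 'Mr T')]/..")!
                   .Cast<XmlNode>().ToList();

Console.WriteLine(string.Join(", ", shortForm.Select(n => n.Name))); // H2, p
Console.WriteLine(longForm.SequenceEqual(shortForm));                // True
```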

1/20/2012 11:30:20 PM

Popular Answer

In XPath, to find elements whose text contains a specific keyword, use the following format (where keyword is the word you want to search for):

//*[text()[contains(., 'keyword')]]

In C#, build the same expression by concatenating the string variable keyword into it:

doc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
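One caveat with plain concatenation: if keyword contains an apostrophe (for example doesn't from the sample page), the generated expression is no longer valid XPath, and XPath 1.0 string literals have no escape character. A common workaround is to choose the quote style based on the content and fall back to concat() when the string contains both kinds of quote. The sketch below uses a hypothetical helper, ToXPathLiteral, and again uses System.Xml's XmlDocument in place of HtmlAgilityPack in a C# 9+ top-level program; the resulting expression can be passed to doc.DocumentNode.SelectNodes in the same way.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical helper: turns an arbitrary C# string into an XPath 1.0
// string literal. XPath 1.0 has no escape sequences, so:
//   no single quote           -> wrap in '...'
//   single quote, no double   -> wrap in "..."
//   both kinds of quote       -> build the value with concat()
static string ToXPathLiteral(string value)
{
    if (!value.Contains('\''))
        return "'" + value + "'";
    if (!value.Contains('"'))
        return "\"" + value + "\"";

    // Split on single quotes and rejoin the pieces with "'" literals.
    string[] parts = value.Split('\'');
    return "concat(" + string.Join(", \"'\", ", parts.Select(p => "'" + p + "'")) + ")";
}

var doc = new XmlDocument();
doc.LoadXml("<div><p>I pity the fool who doesn't vote</p><p>for Mr T</p></div>");

string keyword = "doesn't";

// Naive concatenation yields contains(., 'doesn't'), which fails to compile.
bool naiveFailed = false;
try
{
    doc.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
}
catch (System.Xml.XPath.XPathException)
{
    naiveFailed = true;
}

// With the helper, the expression becomes contains(., "doesn't").
var nodes = doc.SelectNodes("//*[text()[contains(., " + ToXPathLiteral(keyword) + ")]]")!;

Console.WriteLine(naiveFailed);         // True
Console.WriteLine(nodes.Count);         // 1
Console.WriteLine(nodes[0]!.InnerText); // I pity the fool who doesn't vote
```

When using HtmlAgilityPack itself, also check the result for null: its SelectNodes is documented to return null, not an empty collection, when no node matches.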


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow