I am trying to build a simple search engine using HtmlAgilityPack and XPath with C# (.NET 4). I want to find every node containing a user-defined search word, but I can't seem to get the XPath right. For example:
<HTML>
  <BODY>
    <H1>Mr T for president</H1>
    <div>We believe the new president should be</div>
    <div>the awsome Mr T</div>
    <div>
      <H2>Mr T replies:</H2>
      <p>I pity the fool who doesn't vote</p>
      <p>for Mr T</p>
    </div>
  </BODY>
</HTML>
If the specified search word is "Mr T", I'd want the following nodes: <H1>, the second <div>, <H2> and the second <p>.
I have tried numerous variants of
doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]");
but I always seem to wind up with every single node in the entire DOM.
Any hints to get me in the right direction would be much appreciated.
//*[text()[contains(., 'Mr T')]]
This selects all elements in the XML document that have a text-node child containing the string 'Mr T'.
This can also be written more briefly as:
//text()[contains(., 'Mr T')]/..
This selects the parent(s) of any text node that contains the string 'Mr T'.
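To show how this could be wired up with HtmlAgilityPack, here is a minimal sketch. The class and method names (TextSearchDemo, MatchingNames) are made up for illustration and are not part of the library. Note that the search string is quoted inside the XPath, so it is treated as a string literal, and that SelectNodes returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class TextSearchDemo
{
    // Hypothetical helper: returns the names of all elements that have
    // a direct text-node child containing the string 'Mr T'.
    public static List<string> MatchingNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var names = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes("//*[text()[contains(., 'Mr T')]]");

        // SelectNodes gives back null when there is no match.
        if (nodes == null)
            return names;

        foreach (HtmlNode node in nodes)
            names.Add(node.Name); // HtmlAgilityPack reports element names in lower case

        return names;
    }
}
```

Calling TextSearchDemo.MatchingNames with the HTML from the question should give one entry per matching element, in document order.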
In XPath, to find the elements whose text contains a specific keyword, use the following format ("keyword" is the word you want to search for):
//*[text()[contains(., 'keyword')]]
Use the same format in C#, where keyword is the string variable that holds the search term:
doc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
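One caveat with plain concatenation: if keyword itself contains an apostrophe (for example "doesn't"), the resulting XPath is no longer valid, and XPath 1.0 has no escape character for quotes. A possible workaround is sketched below. XPathText, ToLiteral and ContainsQuery are hypothetical names used only for this example; the idea is to pick whichever quote character the keyword does not contain and to fall back to concat() when it contains both.

```csharp
using System;

static class XPathText
{
    // Hypothetical helper: wraps a value as an XPath 1.0 string literal.
    public static string ToLiteral(string value)
    {
        // No apostrophe: single quotes are safe.
        if (!value.Contains("'"))
            return "'" + value + "'";

        // Apostrophe but no double quote: use double quotes instead.
        if (!value.Contains("\""))
            return "\"" + value + "\"";

        // Both quote types present: split at each apostrophe and
        // glue the pieces back together with concat().
        return "concat('" + value.Replace("'", "', \"'\", '") + "')";
    }

    // Builds the query from the answer above for an arbitrary keyword.
    public static string ContainsQuery(string keyword)
    {
        return "//*[text()[contains(., " + ToLiteral(keyword) + ")]]";
    }
}
```

The call would then become doc.DocumentNode.SelectNodes(XPathText.ContainsQuery(keyword)).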