Using XPath and HtmlAgilityPack to find all elements whose inner text contains a given phrase or words

html-agility-pack xpath

Question

I'm building a simple search engine in C# (.NET 4) using HTML Agility Pack and XPath. I want to use XPath to find every node that contains a user-defined search term. For example:

<HTML>
  <BODY>
    <H1>Mr T for president</H1>
    <div>We believe the new president should be</div>
    <div>the awsome Mr T</div>
    <div>
      <H2>Mr T replies:</H2>
      <p>I pity the fool who doesn't vote</p>
      <p>for Mr T</p>
    </div>
  </BODY>
</HTML>

If the search term is "Mr T", I want the following nodes back: the <H1>, the second <div>, the <H2> and the second <p>. I have tried several variations of doc.DocumentNode.SelectNodes("//text()[contains(., "+ searchword +")]"), but I always seem to end up with every node in the whole DOM.

Any pointers in the right direction would be appreciated.

1/20/2012 11:05:26 PM

Accepted Answer

Use:

//*[text()[contains(., 'Mr T')]]

This selects every element in the document that has a text-node child containing the string 'Mr T'.

This may also be expressed succinctly as:

//text()[contains(., 'Mr T')]/..

This selects the parent of every text node that contains the string 'Mr T'.
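A minimal runnable sketch that evaluates both expressions against the question's sample page. To avoid depending on the HtmlAgilityPack NuGet package, it assumes System.Xml's XmlDocument as a stand-in: XmlDocument evaluates the same XPath 1.0 syntax, and this only works here because the sample markup happens to be well-formed XML. With HtmlAgilityPack you would pass the same expression string to doc.DocumentNode.SelectNodes(...). The names MrTSearch, SampleHtml and Select are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class MrTSearch
{
    // The question's sample page (indentation whitespace removed).
    // It happens to be well-formed XML, so XmlDocument can parse it;
    // real-world HTML generally is not, which is why HtmlAgilityPack exists.
    public const string SampleHtml =
        "<HTML><BODY>" +
        "<H1>Mr T for president</H1>" +
        "<div>We believe the new president should be</div>" +
        "<div>the awsome Mr T</div>" +
        "<div><H2>Mr T replies:</H2>" +
        "<p>I pity the fool who doesn't vote</p>" +
        "<p>for Mr T</p></div>" +
        "</BODY></HTML>";

    // Evaluates an XPath 1.0 expression and returns "ElementName: inner text"
    // for every node it selects.
    public static List<string> Select(string markup, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        return doc.SelectNodes(xpath)
                  .Cast<XmlNode>()
                  .Select(node => node.Name + ": " + node.InnerText)
                  .ToList();
    }
}
```

Calling MrTSearch.Select(MrTSearch.SampleHtml, "//*[text()[contains(., 'Mr T')]]") yields the H1, the second div, the H2 and the second p, and the "//text()[contains(., 'Mr T')]/.." form selects the same four elements. The outer div is not matched because none of its own text-node children contains the phrase. By contrast, "//*[contains(., 'Mr T')]" tests each element's full string value, which includes all descendant text, so it also returns HTML, BODY and the outer div.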

1/20/2012 11:30:20 PM

Popular Answer

In XPath, use the following syntax to find elements containing a given keyword ("keyword" being the term you want to search for):

//*[text()[contains(., 'keyword')]]

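In C# the expression is the same. Note the single quotes placed around the concatenated keyword, which the question's code omits; without them the search term ends up in the expression unquoted instead of as an XPath string literal. With keyword holding the search string, the call is: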

doc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");
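Concatenating the keyword directly still breaks when the search term itself contains an apostrophe: searching the sample page for doesn't produces contains(., 'doesn't'), which is not a valid XPath expression. XPath 1.0 string literals have no escape sequences, so a common workaround is to use double quotes when the term contains apostrophes, and to assemble it with concat() when it contains both kinds of quote. The sketch below illustrates that approach; XPathText, ToXPathLiteral and CountElementsContaining are hypothetical helpers, not part of HtmlAgilityPack or System.Xml, and System.Xml's XmlDocument is again used only so the example runs without extra packages.

```csharp
using System.Linq;
using System.Xml;

public static class XPathText
{
    // Turns arbitrary user input into a valid XPath 1.0 string literal.
    // XPath 1.0 has no escape characters, so:
    //   no apostrophe  -> 'value'
    //   no double quote -> "value"
    //   both           -> concat('part', "'", 'part', ...)
    public static string ToXPathLiteral(string value)
    {
        if (!value.Contains("'"))
            return "'" + value + "'";
        if (!value.Contains("\""))
            return "\"" + value + "\"";

        // Split on apostrophes, wrap each piece in single quotes, and put
        // each apostrophe back as a double-quoted "'" argument to concat().
        var parts = value.Split('\'').Select(part => "'" + part + "'");
        return "concat(" + string.Join(", \"'\", ", parts) + ")";
    }

    // The answer's query, with the keyword quoted safely.
    public static int CountElementsContaining(XmlDocument doc, string keyword)
    {
        string xpath = "//*[text()[contains(., " + ToXPathLiteral(keyword) + ")]]";
        return doc.SelectNodes(xpath).Count;
    }
}
```

With HtmlAgilityPack the answer's call would become doc.DocumentNode.SelectNodes("//*[text()[contains(., " + XPathText.ToXPathLiteral(keyword) + ")]]"), so a search for doesn't matches the first <p> instead of failing on a malformed expression.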



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow