Find the parent div of specific text in an HtmlDocument

.net html-agility-pack

Question

I want to get the whole InnerText of a div that contains a certain piece of text.

For example, when I search the HtmlDocument (HtmlAgilityPack) for "hello world", I want to retrieve the whole InnerText of the div in which "hello world" was found.

I tried the following:

HtmlNode textNode = doc.DocumentNode.SelectSingleNode("//text()[contains(., 'hello world')]/..");

This returned the HtmlNode that directly contains the specified text.

Now I want to find the first div ancestor of textNode so that I can read its whole InnerText.

Thanks.

1/11/2013 11:13:42 PM

Accepted Answer

I believe this will work:

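// contains(.//text(), ...) would only test the first descendant text node, so filter the text nodes instead.
// This matches every div that has such a text node, including outer enclosing divs.
var nodes2 = doc.DocumentNode.SelectNodes("//div[.//text()[contains(., 'Hello World')]]");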

Probably better still is the following, because ancestor::div[1] selects only the nearest div that encloses each matching text node:

var nodes3 = doc.DocumentNode.SelectNodes("//text()[contains(., 'Hello World')]/ancestor::div[1]");
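As a rough illustration of the ancestor::div[1] query, here is a minimal sketch that loads a nested fragment and returns the InnerText of the nearest enclosing div. It assumes the HtmlAgilityPack NuGet package is referenced and that SelectNodes keeps its default behavior of returning null when nothing matches; the class and method names are hypothetical, and the search string must not contain a single quote because it is pasted into an XPath string literal.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class NearestDivXPath // hypothetical helper name
{
    // Returns the InnerText of the nearest <div> that encloses each text node
    // containing `needle`. Assumption: `needle` contains no single quote,
    // because it is spliced into an XPath string literal as-is.
    public static List<string> NearestDivTexts(string html, string needle)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // ancestor:: is a reverse axis, so [1] is the closest ancestor div.
        // The resulting node-set has no duplicates, even if one div holds
        // several matching text nodes. contains() is case-sensitive.
        string xpath = $"//text()[contains(., '{needle}')]/ancestor::div[1]";

        // With default options, SelectNodes returns null when nothing matches.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes(xpath);
        return divs == null
            ? new List<string>()
            : divs.Select(div => div.InnerText).ToList();
    }
}
```

For the fragment <div>intro <div>say hello world here</div></div>, the expected result is a single string, "say hello world here", because the outer div is not the nearest ancestor of the matching text node. Searching for "Hello World" on the same fragment finds nothing, since contains() is case-sensitive.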

Alternatively, use HtmlAgilityPack's LINQ-to-XML-style methods (Descendants, Ancestors, and so on):

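        // HtmlAgilityPack names text nodes "#text", so passing "text()" (an XPath node test)
        // to DescendantsAndSelf matches nothing; filter on NodeType instead.
        var nodes =
            doc.DocumentNode.Descendants("div")
               .Where(div => div.DescendantsAndSelf()
                                .Any(text => text.NodeType == HtmlNodeType.Text
                                             && text.InnerText.Contains("Hello World")));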

Or, in query syntax, returning only the nearest div for each match:

        var nodes4 = (from text in doc.DocumentNode.Descendants()
                      where text.NodeType == HtmlNodeType.Text
                            && text.InnerText.Contains("Hello World")
                      // Ancestors("div") starts at the parent, so the first result is the nearest div.
                      let firstParent = text.Ancestors("div").FirstOrDefault()
                      where firstParent != null
                      select firstParent).Distinct();
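The LINQ route can be packaged as a reusable helper. This is a minimal sketch, not the answer's exact code: it assumes the HtmlAgilityPack NuGet package, uses hypothetical class and method names, selects text nodes by NodeType (HtmlAgilityPack names them "#text"), and adds Distinct() so that a div containing several matching text nodes is returned once.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class NearestDivLinq // hypothetical helper name
{
    // LINQ counterpart of //text()[contains(., needle)]/ancestor::div[1].
    public static List<HtmlNode> NearestDivs(HtmlDocument doc, string needle)
    {
        return doc.DocumentNode
            .Descendants()
            // Text nodes are named "#text", so filter on NodeType instead of a name.
            .Where(n => n.NodeType == HtmlNodeType.Text && n.InnerText.Contains(needle))
            // Ancestors("div") walks outward from the parent, so the first
            // result is the nearest enclosing div (null if there is none).
            .Select(text => text.Ancestors("div").FirstOrDefault())
            .Where(div => div != null)
            // A div holding several matching text nodes is returned once.
            .Distinct()
            .ToList();
    }
}
```

For example, loading <div>intro <div>hello world <b>and hello world again</b></div></div> and calling NearestDivs(doc, "hello world") should give a single node: the inner div, whose InnerText is "hello world and hello world again".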
1/12/2013 12:14:14 AM

Popular Answer

I'm not familiar with HtmlAgilityPack, but its query syntax looks like XPath. If so, the ".." step (an abbreviation for parent::node()) should return the parent of the matched element.

I tested this with the following website: http://ponderer.org/download/xpath/

If you enter

//li[contains(., 'about')]/../..

the div that holds the ul element (which in turn contains the li element) is highlighted in green in the textbox.
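The same /../.. expression can be run in HtmlAgilityPack with SelectSingleNode. A minimal sketch, assuming the HtmlAgilityPack NuGet package; the class, method, and sample markup are hypothetical. Because each .. climbs exactly one level, the result depends on how deeply the li is nested.

```csharp
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class ParentSteps // hypothetical helper name
{
    // Each "/.." step moves to the parent of the current node, so
    // //li[...]/../.. goes li -> ul -> the element that holds the ul.
    // Assumption: `text` contains no single quote (it is spliced into XPath).
    // Returns null if no li matches.
    public static HtmlNode GrandparentOfLi(HtmlDocument doc, string text)
    {
        return doc.DocumentNode.SelectSingleNode($"//li[contains(., '{text}')]/../..");
    }
}
```

With the markup <div id='menu'><ul><li>about us</li><li>contact</li></ul></div>, GrandparentOfLi(doc, "about") should return the div whose id is "menu".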

Is this what you're looking for?



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow