Find the parent div of a specific text in a HTMLDocument

.net html-agility-pack


I would like to return the complete InnerText of the div that contains a specific piece of text.

For example: I am searching for "hello world" in the HTMLDocument (HTMLAgilityPack) and want to return the complete InnerText of the div where "hello world" was found.

This is what I tried:

HtmlNode textNode = doc.DocumentNode.SelectSingleNode("//text()[contains(., 'hello world')]/..");

This returned the HtmlNode where the specific text was found.

Now I want to get the first parent div of the textNode to return the complete InnerText.

Thanks in advance

1/11/2013 11:13:42 PM

Accepted Answer

This should do it, I think. Note that it matches every div that contains the text anywhere below it, including outer divs:

var nodes2 = doc.DocumentNode.SelectNodes("//div[.//text()[contains(., 'Hello World')]]");

And this is probably an even better solution:

var nodes3 = doc.DocumentNode.SelectNodes("//text()[contains(., 'Hello World')]/ancestor::div[1]");
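
As a minimal sketch of the ancestor::div[1] approach, the program below loads a small HTML fragment and prints the InnerText of the nearest div around the matching text node. The sample markup and the helper name FindEnclosingDivText are made up for illustration and are not part of the question. The search text is pasted into the XPath unescaped, so the sketch assumes it contains no single quotes.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class Example
{
    // Hypothetical helper: returns the InnerText of the nearest div that
    // encloses the first text node containing `needle`, or null if none.
    public static string FindEnclosingDivText(HtmlDocument doc, string needle)
    {
        // Assumption: `needle` contains no single quotes (no XPath escaping here).
        HtmlNode div = doc.DocumentNode.SelectSingleNode(
            "//text()[contains(., '" + needle + "')]/ancestor::div[1]");

        // SelectSingleNode returns null when nothing matches.
        return div == null ? null : div.InnerText;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div>outside" +
            "<div><span>foo <b>hello world</b></span><span>bar</span></div>" +
            "</div>");

        // Should print the text of the inner div only, without "outside".
        Console.WriteLine(FindEnclosingDivText(doc, "hello world"));
    }
}
```

Keep in mind that contains() is case-sensitive, so 'Hello World' and 'hello world' are different searches, and that SelectNodes returns null rather than an empty collection when nothing matches.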

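or use LINQ over the HtmlAgilityPack nodes (requires using System.Linq; text nodes are named "#text", not "text()"):

        // All divs that contain the text somewhere below them
        var nodes = doc.DocumentNode.Descendants("div").Where(
                   div =>
                   div.Descendants("#text").Any(text => text.InnerText.Contains("Hello World")));

        // Only the nearest enclosing div of each matching text node
        var nodes4 = from text in doc.DocumentNode.Descendants("#text")
                     where text.InnerText.Contains("Hello World")
                     let firstParent = text.Ancestors("div").FirstOrDefault()
                     where firstParent != null
                     select firstParent;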
1/12/2013 12:14:14 AM

Popular Answer

Although I don't have experience with HtmlAgilityPack, this looks like XPath syntax, in which case the ".." should return the parent of the node that was found.

I used this website for testing this:

If you type in

//li[contains(., 'about')]/../..

in the textbox it will highlight the div containing the ul element (which contains the li element) in green.

Is this what you were looking for?
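
To illustrate the difference between ".." and the ancestor axis in HtmlAgilityPack itself, here is a small sketch. The markup and the helper name SelectName are assumptions made for this example. ".." selects only the immediate parent, so it yields the div only when the text sits directly inside it, whereas ancestor::div[1] walks up to the nearest div regardless of depth.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class ParentVsAncestor
{
    // Hypothetical helper: name of the first node selected by `xpath`, or null.
    public static string SelectName(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.Name;
    }

    public static void Main()
    {
        // Made-up markup mirroring the div > ul > li structure described above.
        const string html = "<div><ul><li>about us</li></ul></div>";

        // Immediate parent of the text node: the li element.
        Console.WriteLine(SelectName(html, "//text()[contains(., 'about')]/.."));

        // Nearest enclosing div, however deeply the text is nested.
        Console.WriteLine(SelectName(html, "//text()[contains(., 'about')]/ancestor::div[1]"));
    }
}
```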

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow