I would like to get the complete InnerText of the div that contains a specific piece of text.
For example: I search for "hello world" in an HtmlDocument (HtmlAgilityPack) and want to return the complete InnerText of the div in which "hello world" was found.
This is what I tried:
HtmlNode textNode = doc.DocumentNode.SelectSingleNode("//text()[contains(., 'hello world')]/..");
This returned the HtmlNode where the specific text was found.
Now I want to get the first parent div of the textNode to return the complete InnerText.
Thanks in advance
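This would do it, I think:
// Matches every div that has a descendant text node containing the phrase (outer divs included).
var nodes2 = doc.DocumentNode.SelectNodes("//div[.//text()[contains(., 'Hello World')]]");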
And this is probably an even better solution:
var nodes3 = doc.DocumentNode.SelectNodes("//text()[contains(., 'Hello World')]/ancestor::div[1]");
Or use LINQ over the HtmlAgilityPack nodes (text nodes are named "#text", not "text()"):
var nodes =
    doc.DocumentNode.Descendants("div")
        .Where(div => div.Descendants("#text")
            .Any(text => text.InnerHtml.Contains("Hello World")));
Or, in query syntax:
// Start from the matching text nodes so each nearest div is returned once per match.
var nodes4 = from text in doc.DocumentNode.Descendants("#text")
             where text.InnerText.Contains("Hello World")
             let firstParent = text.Ancestors("div").FirstOrDefault()
             where firstParent != null
             select firstParent;
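To tie this back to the question, here is a minimal sketch of reading the InnerText from the result of the ancestor::div[1] query. The sample markup and the Program class are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Hypothetical sample markup, invented for this illustration.
        const string html =
            "<div id='outer'><p>intro</p>" +
            "<div id='inner'><span>say Hello World now</span> and more</div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Nearest enclosing div of every text node that contains the phrase.
        var divs = doc.DocumentNode.SelectNodes(
            "//text()[contains(., 'Hello World')]/ancestor::div[1]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (divs == null)
        {
            Console.WriteLine("No match.");
            return;
        }

        foreach (HtmlNode div in divs)
        {
            // InnerText is the concatenated text of the div and all its descendants.
            Console.WriteLine(div.InnerText.Trim());
        }
    }
}
```

With this sample the only match should be the inner div, not the outer one, because [1] on the ancestor axis picks the closest ancestor. Note that the XPath contains() function is case-sensitive, so 'Hello World' will not match "hello world".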
Although I don't have experience with HtmlAgilityPack, this looks like XPath syntax, in which case the ".." step should return the parent of the node that was matched.
I used this website for testing this: http://ponderer.org/download/xpath/
If you type in
//li[contains(., 'about')]/../..
in the textbox it will highlight the div containing the ul element (which contains the li element) in green.
Is this what you were looking for?