Html Agility Pack get specific content from a div

c# html html-agility-pack

Question

I'm trying to pull the text from a "div" and exclude everything else. Can you help me, please?

<div class="article">
   <div class="date">01.01.2000</div>
   <div class="news-type"><a href="../link/page01">Breaking News</a></div>

   "Here is the location of the text i would like to pull"

</div>

When I select the div with class="article", I get everything, but I don't know how to exclude the divs with class="date" and class="news-type" and everything inside them.

Here is the code I use:

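HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(url);
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div[contains(@class,'article')]"))
{
    // InnerHtml returns the markup of the nested divs as well, not just the text
    name_text.text += node.InnerHtml.Trim();
}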

Thank you!

Accepted Answer

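Another way would be to use the XPath text()[normalize-space()], relative to each div, to get its non-empty, direct-child text nodes. Note that SelectSingleNode returns only the first such node; use SelectNodes instead if the text is split into several pieces: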

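var divs = doc.DocumentNode.SelectNodes("//div[contains(@class,'article')]");
if (divs != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode div in divs)
    {
        // text() selects direct-child text nodes only; normalize-space() skips whitespace-only ones
        var node = div.SelectSingleNode("text()[normalize-space()]");
        if (node != null)
            Console.WriteLine(node.InnerText.Trim());
    }
}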


Output:

"Here is the location of the text i would like to pull"
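The same XPath idea can be checked end to end with a small console program. This is a sketch, not the answerer's code: it assumes .NET 6 or later (top-level statements) and the HtmlAgilityPack NuGet package, and ExtractDirectText is a hypothetical helper name. It loads the question's markup from a string with LoadHtml instead of fetching a URL (the quotes around the target text are dropped), and it uses SelectNodes so that every non-blank direct text node is collected, not only the first.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: collects the non-blank text that sits directly inside
// each div whose class contains "article". Text inside nested elements
// (div.date, div.news-type) is not a direct child, so text() never selects it.
static List<string> ExtractDirectText(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    var results = new List<string>();

    // SelectNodes returns null, not an empty collection, when nothing matches.
    var divs = doc.DocumentNode.SelectNodes("//div[contains(@class,'article')]");
    if (divs == null)
        return results;

    foreach (HtmlNode div in divs)
    {
        // Every non-blank direct-child text node of this div.
        var textNodes = div.SelectNodes("text()[normalize-space()]");
        if (textNodes == null)
            continue;

        foreach (HtmlNode textNode in textNodes)
            results.Add(textNode.InnerText.Trim());
    }

    return results;
}

// The question's markup, embedded as a verbatim string for the demo.
var sample = @"<div class=""article"">
   <div class=""date"">01.01.2000</div>
   <div class=""news-type""><a href=""../link/page01"">Breaking News</a></div>

   Here is the location of the text i would like to pull

</div>";

foreach (var text in ExtractDirectText(sample))
    Console.WriteLine(text);
```

With this sample only one string comes back, because the text nodes before and between the nested divs contain nothing but whitespace.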

Popular Answer

You want the child nodes of type HtmlTextNode. Untested suggested code, to go inside the foreach loop from the question:

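// Requires: using System.Linq;
// Only the direct-child text nodes of the article div are kept; div.date and div.news-type are skipped
var textNodes = node.ChildNodes.OfType<HtmlTextNode>();
if (textNodes.Any())
{
    // Trim removes the whitespace-only text nodes that surround the nested divs
    name_text.text += string.Join(string.Empty, textNodes.Select(tn => tn.InnerHtml)).Trim();
}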
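A self-contained version of the ChildNodes approach, as a sketch under stated assumptions: .NET 6 or later, the HtmlAgilityPack NuGet package, and a hypothetical helper named DirectTextOf that reads only the first matching div from an HTML string rather than a loaded URL.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: joins the text nodes that are direct children of the
// first div whose class contains "article". Nested elements such as div.date
// are plain HtmlNode instances, not HtmlTextNode, so OfType filters them out.
static string DirectTextOf(string html)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    var div = doc.DocumentNode.SelectSingleNode("//div[contains(@class,'article')]");
    if (div == null)
        return string.Empty;

    var textNodes = div.ChildNodes.OfType<HtmlTextNode>();

    // Whitespace-only nodes around the nested divs are joined too,
    // so the combined result is trimmed.
    return string.Join(string.Empty, textNodes.Select(tn => tn.Text)).Trim();
}

// The question's markup, embedded as a verbatim string for the demo.
var sample = @"<div class=""article"">
   <div class=""date"">01.01.2000</div>
   <div class=""news-type""><a href=""../link/page01"">Breaking News</a></div>

   Here is the location of the text i would like to pull

</div>";

Console.WriteLine(DirectTextOf(sample));
```

Unlike the XPath version, this keeps whitespace-only nodes during the join, so the final Trim is what removes the line breaks around the nested divs.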


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why