How to extract text inside a div tag using HtmlAgilityPack

c# html html-agility-pack winforms

Question

I want to extract the text "Some Text Goes here..." from inside the div with class productDescriptionWrapper. I am using Html Agility Pack and C#.

<div class="productDescriptionWrapper">
Some Text Goes here...
<div class="emptyClear"> </div>
</div>

This is what I have:

Description = doc.DocumentNode.SelectNodes("//div[@class=\"productDescriptionWrapper\").Descendants("div").Select(x => x.InnerText).ToList();

I get this error:

An unhandled exception of type 'System.NullReferenceException' 

I know how to extract the text when it is inside an <h1> or <p>: instead of "div" in Descendants, I would pass "h1" or "p".

Somebody please assist.

Accepted Answer

Use single quotes inside the XPath expression, for example:

//div[@class='productDescriptionWrapper']

To get all descendants of any type, use:

//div[@class='productDescriptionWrapper']//*

To get all descendants of a specific type, such as p, use //div[@class='productDescriptionWrapper']//p.

To get all descendants that are either a div or a p, use:

//div[@class='productDescriptionWrapper']//*[self::div or self::p]

To get all non-blank descendant text nodes, use:

//div[@class='productDescriptionWrapper']//text()[normalize-space()]
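
As a rough sketch of how these expressions might be run from C#, the helper below loads an HTML string and returns the trimmed text of every node an XPath query matches. The class name XPathSamples and the method TextsFor are made up for illustration and are not part of Html Agility Pack. It also guards against SelectNodes returning null, which is what it does when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class XPathSamples
{
    // Hypothetical helper (not part of Html Agility Pack): runs an XPath
    // query against an HTML string and returns the trimmed InnerText of
    // each matching node, or an empty list when nothing matches.
    public static List<string> TextsFor(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, on no match.
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return new List<string>();
        }

        return nodes.Select(n => n.InnerText.Trim()).ToList();
    }
}
```

Any of the expressions above can be passed as the second argument, for example XPathSamples.TextsFor(html, "//div[@class='productDescriptionWrapper']//text()[normalize-space()]").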

Popular Answer

You cannot get a NullReferenceException if doc is created from the HTML snippet you posted. Anyway, if you meant to get the text within the outer <div> but not the text from the inner one, use the XPath step /text(), which selects direct child text nodes only.

For example, given this HTML snippet:

using System.Linq;
using HtmlAgilityPack;

var html = @"<div class=""productDescriptionWrapper"">
Some Text Goes here...
<div class=""emptyClear"">Don't get this one</div>
</div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

...this expression returns text from the outer <div> only:

var Description = doc.DocumentNode
                     .SelectNodes("//div[@class='productDescriptionWrapper']/text()")
                     .Select(x => x.InnerText.Trim())
                     .First();
//Description : 
//"Some Text Goes here..."

...while, in contrast, the following returns all the text:

var Description = doc.DocumentNode
                     .SelectNodes("//div[@class='productDescriptionWrapper']")
                     .Select(x => x.InnerText.Trim())
                     .First();
//Description :
//"Some Text Goes here...
//Don't get this one"
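
Both snippets above assume the wrapper div exists. When it does not, SelectNodes returns null and the chained Select call throws, which is one plausible source of the NullReferenceException in the question. A minimal null-safe sketch follows; the class DescriptionReader and the method GetOwnText are hypothetical names, and returning an empty string for a missing div is an assumption about what the caller wants.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DescriptionReader
{
    // Hypothetical helper: returns the wrapper div's own text (excluding
    // text inside nested elements), or an empty string when the div or
    // its text is missing.
    public static string GetOwnText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches.
        var wrapper = doc.DocumentNode.SelectSingleNode(
            "//div[@class='productDescriptionWrapper']");
        if (wrapper == null)
        {
            return string.Empty;
        }

        // Relative XPath: direct child text nodes of the wrapper only.
        var textNodes = wrapper.SelectNodes("text()");
        if (textNodes == null)
        {
            return string.Empty;
        }

        // Drop whitespace-only nodes and join what remains.
        var parts = textNodes
            .Select(n => n.InnerText.Trim())
            .Where(t => t.Length > 0);
        return string.Join(" ", parts);
    }
}
```

Checking for null at each step costs a few lines but means a page without the expected div produces an empty description instead of an exception.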


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow