How can HtmlAgilityPack pull text from an HTML node with a dynamically added class attribute?



Dear friends, I want to extract the text 平均3.6 星 from the following HTML excerpt:

<div class="content"><ul>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
  <span class="swSprite s_star_3_5 " title="平均3.6 星">
  <span>平均3.6 星</span>

My problem is that the second part of the span's class value, "s_star_3_5 ", varies with the customer rating level and is appended dynamically. I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and also //span[@class='swSprite s_star_3_5 '], but I get either an error or a result that is not what I want.

Any suggestions?

5/31/2011 2:20:07 AM

Accepted Answer

First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file and checking whether the HTML you are parsing is really the code shown above. A common first problem when parsing a website with HtmlAgilityPack is that the document you loaded is not the page you expected: you may be getting a 404 error page, a redirect, and so on.

I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.

That was the issue in the following questions:

If that doesn't help, post the HTML code and I'll help you ;)
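The check suggested in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the HtmlDump class, its Save helper, the parsed.html file name and the stand-in markup are all assumptions made for the example. In real use, doc would hold the page you downloaded.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

static class HtmlDump
{
    // Hypothetical helper: writes the markup exactly as HtmlAgilityPack
    // parsed it, so it can be opened and inspected in a browser or editor.
    public static string Save(HtmlDocument doc, string path)
    {
        File.WriteAllText(path, doc.DocumentNode.OuterHtml);
        return Path.GetFullPath(path);
    }

    static void Main()
    {
        var doc = new HtmlDocument();
        // Stand-in markup (assumption); replace with the downloaded page.
        doc.LoadHtml(
            "<div class='content'>" +
            "<span class='swSprite s_star_3_5 ' title='rating'>" +
            "<span>rating</span></span></div>");

        string saved = Save(doc, "parsed.html");
        Console.WriteLine("Saved parsed HTML to " + saved);

        // If this prints False, compare the saved file with the page source:
        // the document may be an error page or a redirect target.
        bool found = doc.DocumentNode.SelectSingleNode(
            "//span[@class='swSprite s_star_3_5 ']") != null;
        Console.WriteLine("Exact class match found: " + found);
    }
}
```

If the saved file does not contain the span at all, the XPath is not the problem; the download is.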

5/23/2017 12:31:02 PM

Popular Answer

This works for me:

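// requires: using System; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
// html is assumed to be a string holding the page markup; the original snippet never loaded the document
doc.LoadHtml(html);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
Console.WriteLine("Text=" + node.InnerText.Trim());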

and outputs

Text=平均3.6 星

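Note that I use the XPath starts-with function, so the match depends only on the fixed swSprite prefix and not on the rating-specific class that follows it.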
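A variation on the same idea, in case swSprite is not always the first class in the attribute: the standard XPath 1.0 idiom below matches swSprite as a whole class token anywhere in the list. This is only a sketch. The StarRating class, its Extract method and the sample markup are made up for the example, and reading the title attribute first is a design choice, not something the question requires.

```csharp
using System;
using HtmlAgilityPack;

static class StarRating
{
    // Pads the normalized class list with spaces so that " swSprite "
    // matches only as a complete token, wherever it appears.
    const string SpriteXPath =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' swSprite ')]";

    // Hypothetical helper: returns the rating text, or null if no span matches.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode(SpriteXPath);
        if (node == null)
        {
            return null; // avoids a NullReferenceException on .InnerText
        }

        // Prefer the title attribute; fall back to the decoded inner text.
        string title = node.GetAttributeValue("title", "");
        return title.Length > 0
            ? title
            : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    static void Main()
    {
        // Sample markup modelled on the question, with the tags closed.
        string html =
            "<div class='content'>" +
            "<span class='crAvgStars'>" +
            "<span class='swSprite s_star_3_5 ' title='平均3.6 星'>" +
            "<span>平均3.6 星</span></span></span></div>";

        Console.WriteLine("Text=" + Extract(html));
    }
}
```

Checking the node for null before reading InnerText also covers the "error" case mentioned in the question, where SelectSingleNode finds nothing.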

