How can HtmlAgilityPack extract text from an HTML node whose class attribute is appended dynamically?

html-agility-pack

Question

Dear friends, I want to extract the text 平均3.6 星 ("average 3.6 stars") from this code segment, excerpted from amazon.cn.

<div class="content"><ul>
<li><b>用户评分:</b>
<span class="crAvgStars" style="white-space:no-wrap;">
<span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO">
<a>
  <span class="swSprite s_star_3_5 " title="平均3.6 星">
  <span>平均3.6 星</span>
  </span>
</a>

My problem is that the second class name on the span ("s_star_3_5 " here) varies with the customers' rating level and is appended dynamically. I tried doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and also //span[@class='swSprite s_star_3_5 '], but I either get an error or not the result I want.

Any suggestions?

Accepted Answer

First of all, I suggest saving the value of doc.DocumentNode.OuterHtml to a local .html file and checking that the HTML you are parsing is the HTML you expect. A common first problem when parsing a website with HtmlAgilityPack is that the page you downloaded is not the one you think it is: you may be getting a 404 error page, a redirect, and so on.
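The check described above can be sketched roughly as follows. This is only an illustration: the class name HtmlDump, the Dump helper, the file name debug.html, and the inline markup (which stands in for the page you actually downloaded) are all assumptions, not part of the original answer.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class HtmlDump
{
    // Hypothetical helper: writes the parsed document back to disk so it can be
    // opened in a browser or editor and compared with the page you expected.
    public static string Dump(HtmlDocument doc, string path)
    {
        File.WriteAllText(path, doc.DocumentNode.OuterHtml, Encoding.UTF8);
        return path;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Stand-in markup; in practice this would be the HTML fetched from the site.
        doc.LoadHtml("<div class=\"content\"><span class=\"swSprite s_star_3_5 \">" +
                     "<span>平均3.6 星</span></span></div>");

        string path = Dump(doc, "debug.html");
        Console.WriteLine("Saved parsed HTML to " + Path.GetFullPath(path));

        // Quick sanity check before blaming the XPath expression.
        bool hasMarker = doc.DocumentNode.OuterHtml.Contains("swSprite");
        Console.WriteLine("Contains 'swSprite': " + hasMarker);
    }
}
```

If the saved file turns out to be an error page or a login page, the XPath expression is not the problem and the download step needs attention first.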

I'm suggesting this because I tested //span[@class='swSprite s_star_3_5 '] and it worked correctly.

That was the issue in the following questions:

If that doesn't help, post the HTML code and I'll help you ;)


Popular Answer

This works for me:

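HtmlDocument doc = new HtmlDocument();
// Load() expects a file path or stream; use doc.LoadHtml(...) if myHtml holds the markup itself.
doc.Load(myHtml);
HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
// SelectSingleNode returns null when nothing matches.
if (node != null)
    Console.WriteLine("Text=" + node.InnerText.Trim());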

and outputs

Text=平均3.6 星

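Note that I use the XPath starts-with function, so the expression matches any span whose class attribute begins with swSprite, whatever rating suffix follows it.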
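One caveat: starts-with only works while swSprite is the first class in the attribute. If the order could ever change, a common XPath 1.0 idiom is to match swSprite as a whole token in the class list. The sketch below illustrates that alternative; the class name RatingExtractor, the ExtractRating helper, and the reversed class order in the sample markup are hypothetical and are not taken from the original page.

```csharp
using System;
using HtmlAgilityPack;

public static class RatingExtractor
{
    // XPath 1.0 idiom for "class list contains the token swSprite":
    // pad the whitespace-normalized class value with spaces, then search for the padded token.
    private const string SwSpriteXPath =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' swSprite ')]";

    // Returns the trimmed text of the first matching span, or null when nothing matches.
    public static string ExtractRating(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses a string; Load reads a file or stream
        HtmlNode node = doc.DocumentNode.SelectSingleNode(SwSpriteXPath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Class order reversed on purpose (hypothetical variation):
        // starts-with(@class, 'swSprite') would not match this span.
        string html = "<a><span class=\"s_star_3_5 swSprite\" title=\"平均3.6 星\">" +
                      "<span>平均3.6 星</span></span></a>";
        Console.WriteLine(ExtractRating(html) ?? "(no match)");
    }
}
```

The padding with spaces is what keeps the match from firing on longer class names that merely contain the substring swSprite.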




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow