Dear friends, I want to extract the text
平均3.6 星 ("average 3.6 stars") from this code segment excerpted from amazon.cn.
<div class="content"><ul> <li><b>用户评分:</b> <span class="crAvgStars" style="white-space:no-wrap;"> <span class="asinReviewsSummary" ref="dp_db_cm_cr_acr_pop_" name="B004GUSIKO"> <a> <span class="swSprite s_star_3_5 " title="平均3.6 星"> <span>平均3.6 星</span> </span> </a>
My problem is that part of the span's class attribute,
"s_star_3_5 ", varies with the customer rating and is appended dynamically. So I tried
doc.DocumentNode.SelectSingleNode("//span[@class='swSprite']").InnerText and
//span[@class='swSprite s_star_3_5 '], but I get either an error or not the result I want!
First of all, I suggest saving the value of
doc.DocumentNode.OuterHtml to a local
.html file and checking whether the markup you're getting is really the code shown above. A common first problem when parsing a website with HtmlAgilityPack is that you never received the expected HTML in the first place: you may be getting a 404 error page, a redirect, and so on.
I'm suggesting this because I tested
//span[@class='swSprite s_star_3_5 '] against the HTML you posted and it worked correctly.
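The check suggested above could look something like the following. This is only a minimal sketch: the URL is a placeholder and the output file name `fetched.html` is an arbitrary choice, neither comes from the question. It assumes the page is fetched with HtmlAgilityPack's `HtmlWeb` class.

```csharp
using System;
using System.IO;
using System.Net;
using HtmlAgilityPack;

class DumpFetchedHtml
{
    static void Main()
    {
        // Placeholder URL (an assumption): substitute the page you are actually scraping.
        string url = "https://example.com/product-page";

        var web = new HtmlWeb();
        HtmlDocument doc = web.Load(url);

        // Anything other than OK suggests an error page rather than the product page.
        if (web.StatusCode != HttpStatusCode.OK)
        {
            Console.WriteLine("Unexpected status: " + web.StatusCode);
        }

        // Save exactly what the parser received so it can be compared with the expected markup.
        File.WriteAllText("fetched.html", doc.DocumentNode.OuterHtml);
    }
}
```

Opening `fetched.html` in an editor and searching for `swSprite` shows quickly whether the span is present at all in what was downloaded.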
Getting the wrong HTML back turned out to be the issue in other, similar questions.
If that doesn't help, post the HTML code and I'll help you ;)
This works for me:
    HtmlDocument doc = new HtmlDocument();
    // Load expects a file path, Stream, or TextReader; for an HTML string use doc.LoadHtml(myHtml).
    doc.Load(myHtml);
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//span[starts-with(@class, 'swSprite')]");
    Console.WriteLine("Text=" + node.InnerText.Trim());
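Note that I use the XPath starts-with function, so the span matches whenever its class attribute begins with swSprite, whatever rating-specific suffix follows.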
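As a variation, not something from the answers above, the class can be matched as a whole token wherever it appears in the attribute, using the standard XPath 1.0 functions `contains`, `concat`, and `normalize-space`. This also shows why `@class='swSprite'` finds nothing: `=` compares against the entire attribute value, which here is `swSprite s_star_3_5 `. The sketch below uses a trimmed, closed-off copy of the markup from the question, and the class name `ClassTokenMatch` is just illustrative.

```csharp
using System;
using HtmlAgilityPack;

class ClassTokenMatch
{
    static void Main()
    {
        // Trimmed copy of the question's markup, with tags closed so it stands alone.
        string html =
            "<span class=\"crAvgStars\"><a>" +
            "<span class=\"swSprite s_star_3_5 \" title=\"平均3.6 星\">" +
            "<span>平均3.6 星</span></span></a></span>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pad the normalized class list with spaces, then look for ' swSprite ' as a whole token.
        HtmlNode node = doc.DocumentNode.SelectSingleNode(
            "//span[contains(concat(' ', normalize-space(@class), ' '), ' swSprite ')]");

        // SelectSingleNode returns null when nothing matches, so guard before dereferencing.
        if (node == null)
        {
            Console.WriteLine("No matching span found.");
            return;
        }

        Console.WriteLine("Text=" + node.InnerText.Trim());
        // The same text is also available from the title attribute.
        Console.WriteLine("Title=" + node.GetAttributeValue("title", ""));
    }
}
```

Compared with starts-with, the token match still works if the site ever puts another class in front of `swSprite`, and the null check avoids the exception you would otherwise get from calling `.InnerText` on a missing node.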