HtmlAgilityPack: parse HTML and take the value out of a span by tag and class name

asp.net asp.net-mvc c# html html-agility-pack

Question

I download an HTML file using my web request client, and I only want to parse this portion out of the complete document:

<span class="sku">
<span class="fb">SKU                            :</span>118880101
</span>

I'm using HTML Agility Pack to get this value: 118880101

I wrote something along these lines:

 HtmlDocument htmlDoc = new HtmlDocument();
 htmlDoc.LoadHtml(html);
 return htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']").ElementAt(0).InnerText;

This gives me the following text:

SKU                            :118880101

literally, with the whitespace in between. How can I change my HTML Agility Pack query so that I get just the value 118880101?

Can someone please assist me?

Edit: This string manipulation might work (it is not a regex):

var sku = skuRaw.Substring(skuRaw.LastIndexOf(':') + 1).Trim();

This takes everything in the received string after the last ":" character and trims the surrounding whitespace. However, I'm not sure whether this approach is safe or reliable.
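
The `Substring` call in the edit is ordinary string handling, not a regular expression. Below is a minimal self-contained sketch of it, assuming the raw text has the same shape as the `InnerText` in the question (including the newlines picked up from the surrounding markup). The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class SkuText
{
    // Returns the text after the last ':' with surrounding whitespace removed.
    // If there is no ':', LastIndexOf returns -1, so Substring(0) keeps the
    // whole string and only the trimming is applied.
    public static string AfterLastColon(string skuRaw)
    {
        if (skuRaw == null) throw new ArgumentNullException(nameof(skuRaw));
        return skuRaw.Substring(skuRaw.LastIndexOf(':') + 1).Trim();
    }

    public static void Main()
    {
        // Assumed sample input shaped like the InnerText from the question.
        string skuRaw = "\nSKU                            :118880101\n";
        Console.WriteLine(AfterLastColon(skuRaw)); // prints 118880101
    }
}
```

This relies on the label always ending in a colon. If the site changes the label text, the extraction changes with it, so it is only as robust as the markup it reads.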

3/19/2017 2:51:52 PM

Accepted Answer

Do this:

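     // Needs: using System.Linq; using System.Text.RegularExpressions;
     HtmlDocument htmlDoc = new HtmlDocument();
     htmlDoc.LoadHtml(html);
     var innerText = htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']")
                          .ElementAt(0).InnerText;
     // C# has no /\D/g literal; Regex.Replace removes every non-digit character
     return Regex.Replace(innerText, @"\D", "");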
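
For reference, stripping non-digits in C# goes through `Regex.Replace` (a `/\D/g` literal is JavaScript syntax and does not compile in C#). The sketch below is self-contained and does not need HtmlAgilityPack. It shows the regex version next to an equivalent LINQ filter. The class name and the sample string are hypothetical, assuming the inner text looks like the one in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration only.
public static class SkuDigits
{
    // "\D" matches any non-digit character; replacing each match with ""
    // leaves only the digits.
    public static string ViaRegex(string innerText) =>
        Regex.Replace(innerText, @"\D", "");

    // Same idea without a regular expression: keep only the ASCII digits 0-9.
    public static string ViaLinq(string innerText) =>
        new string(innerText.Where(c => c >= '0' && c <= '9').ToArray());

    public static void Main()
    {
        // Assumed sample input shaped like the InnerText from the question.
        string innerText = "\nSKU                            :118880101\n";
        Console.WriteLine(ViaRegex(innerText)); // prints 118880101
        Console.WriteLine(ViaLinq(innerText));  // prints 118880101
    }
}
```

Both versions keep every digit in the text, so if the label itself ever contained a digit (for example "SKU2"), it would end up in the result. Taking the text after the colon, or removing the label node, avoids that.
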

If you want to use only HTML Agility Pack, try this:

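        // SelectSingleNode returns null when nothing matches; SelectNodes(...)
        // returns null by default, so calling FirstOrDefault() on it would throw
        var child = htmlDoc.DocumentNode.SelectSingleNode("//span[@class='sku']/span[@class='fb']");
        if (child != null)
        {
            var parent = child.ParentNode;
            // Drop the "SKU :" label so only the number is left in the parent
            parent.RemoveChild(child);
            // Trim the newlines left over from the markup, leaving "118880101"
            var innerText = parent.InnerText.Trim();
        }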
3/19/2017 3:21:27 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow