HtmlAgilityPack: parse HTML and take the value out of a span tag by class name

asp.net asp.net-mvc c# html html-agility-pack

Question

I download an HTML page with my web request client, and out of the entire page I only want to parse this part:

<span class="sku">
<span class="fb">SKU                            :</span>118880101
</span>

I'm using HTML Agility Pack to retrieve this value: 118880101

I've written something like this:

 HtmlDocument htmlDoc = new HtmlDocument();
 htmlDoc.LoadHtml(html);
 return htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']").ElementAt(0).InnerText;

This returns the following value:

SKU                            :118880101

Literally like this, spaces included. How can I change this logic with HTML Agility Pack so that I get only the 118880101 value?

Can someone help me out?

Edit: plain string handling like this would do the job (it isn't actually a regex):

var sku = skuRaw.Substring(skuRaw.LastIndexOf(':') + 1).Trim();

This takes everything after the ':' sign in the string I receive and trims the surrounding whitespace. But I'm not sure whether it's safe to rely on an approach like this?
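
A minimal sketch of that string-handling approach as a small C# program (C# 9 top-level statements), assuming the InnerText looks like the value shown above. `ExtractAfterColon` is a hypothetical helper name. The guard matters because `LastIndexOf` returns -1 when there is no ':', so `Substring(-1 + 1)` would silently return the whole string instead of failing.

```csharp
using System;

// Hypothetical helper: returns the text after the last ':' in the span's
// InnerText, trimmed of surrounding whitespace/newlines.
// Returns null when the string contains no ':' at all (unexpected format).
static string? ExtractAfterColon(string raw)
{
    int colon = raw.LastIndexOf(':');
    if (colon < 0)
    {
        return null; // no label separator found
    }
    return raw.Substring(colon + 1).Trim();
}

// InnerText as reported in the question (the label is padded with spaces).
string skuRaw = "SKU                            :118880101";
Console.WriteLine(ExtractAfterColon(skuRaw)); // prints 118880101
```

This is safe as long as the SKU value itself never contains a ':'; if it might, `IndexOf` (first colon) would be the better choice.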

Accepted Answer

Try this:

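     // Requires: using System.Linq; using System.Text.RegularExpressions;
     HtmlDocument htmlDoc = new HtmlDocument();
     htmlDoc.LoadHtml(html);
     var innerText = htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']")
                            .ElementAt(0).InnerText;
     // Remove every non-digit character, leaving only "118880101"
     return Regex.Replace(innerText, @"\D", "");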
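
For comparison, a sketch that captures only the token after the colon instead of stripping all non-digits (C# 9 top-level statements). `ExtractSkuWithRegex` is a hypothetical name, and the pattern assumes the SKU itself contains no whitespace. The difference shows up if a SKU ever contains letters or dashes: the capture keeps them, while `\D` removal keeps only the digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: captures the first run of non-whitespace characters
// that follows a ':' (plus optional whitespace). Returns null if no match.
static string? ExtractSkuWithRegex(string raw)
{
    Match m = Regex.Match(raw, @":\s*(\S+)");
    return m.Success ? m.Groups[1].Value : null;
}

string innerText = "\nSKU                            :118880101\n";
Console.WriteLine(ExtractSkuWithRegex(innerText)); // prints 118880101

// Regex.Replace(innerText, @"\D", "") also yields "118880101" here, but for
// an input like "SKU: AB-1188" it would return only "1188".
```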

If you want to use only HTML Agility Pack, try this:

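       // SelectNodes returns null when nothing matches, hence the ?. operator
       var child = htmlDoc.DocumentNode.SelectNodes("//span[@class='fb']")
                           ?.FirstOrDefault();
        if (child != null)
        {
            // Remove the "SKU :" label span from its parent <span class="sku">
            var parent = child.ParentNode;
            parent.RemoveChild(child);
            // What remains is the SKU plus surrounding whitespace/newlines
            return parent.InnerText.Trim();
        }
        return null;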


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow