I have an HTML document that I download with my web request client. Out of the entire HTML, I want to parse only this part:
<span class="sku">
<span class="fb">SKU :</span>118880101
</span>
I'm using HTML Agility Pack to retrieve this value: 118880101
And I've written something like this:
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
return htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']").ElementAt(0).InnerText;
And this returns the following value from the HTML:
SKU :118880101
Literally like this, label and whitespace included... How can I fix this logic with HTML Agility Pack so that I get only the 118880101 value?
Can someone help me out?
Edit: a substring operation like this would do the job:
skuRaw.Substring(skuRaw.LastIndexOf(':') + 1);
which takes everything after the last ':' character in the string that I receive... But I'm not sure whether it's safe to rely on a string operation like this?
Try this:
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
var innerText = htmlDoc.DocumentNode.SelectNodes("//span[@class='sku']")
    .ElementAt(0).InnerText;
// Strip every non-digit character (requires using System.Text.RegularExpressions;).
return Regex.Replace(innerText, @"\D", "");
If you want to use only HTML Agility Pack, try this:
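// SelectSingleNode returns null when nothing matches, so the null check below is safe.
var child = htmlDoc.DocumentNode
    .SelectSingleNode("//span[@class='sku']/span[@class='fb']");
if (child != null)
{
    var parent = child.ParentNode;
    // Remove the "SKU :" label span so only the number's text remains.
    parent.RemoveChild(child);
    // Trim the line breaks that surround the number in the markup.
    return parent.InnerText.Trim();
}
return null;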
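As for whether the plain string approach from the question is safe: it works as long as the text really contains a ':' and the value comes after the last one. Here is a minimal sketch of both string-based options. The class and method names (SkuText, DigitsOnly, AfterLastColon) are made up for illustration and are not part of HTML Agility Pack, and DigitsOnly assumes the label itself never contains digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from any library.
public static class SkuText
{
    // Keeps only the digits, so the label, colon, and whitespace all disappear.
    // Assumption: the label text never contains digits of its own.
    public static string DigitsOnly(string raw)
    {
        return Regex.Replace(raw ?? string.Empty, @"\D", "");
    }

    // Takes everything after the last ':' and trims surrounding whitespace.
    // Returns null when there is no ':' rather than silently returning the whole string.
    public static string AfterLastColon(string raw)
    {
        if (raw == null) return null;
        int index = raw.LastIndexOf(':');
        return index < 0 ? null : raw.Substring(index + 1).Trim();
    }
}
```

With the InnerText from the question (the label, the number, and the surrounding line breaks), both SkuText.DigitsOnly(innerText) and SkuText.AfterLastColon(innerText) should give 118880101. The difference shows up on unexpected input: the unguarded Substring call from the question returns the entire string when no ':' is present, because LastIndexOf returns -1, whereas the sketch above returns null in that case.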