Get value between html tags Xpath and HtmlAgility

c# html html-agility-pack html-parsing xpath

Question

I'm trying to get the text contained inside certain HTML elements on a website.

I am getting the error "Object reference not set to an instance of an object." For example, say I need to extract the text between these span tags. How would I go about doing that?

I'm not sure whether it matters that there is other HTML before this section.

<div class="thumbnail-details">
<ul>
    <li> … </li>
    <li class="product-title">
        <span class="thumbnail-details-grey">The Blaster Portable Wireless Speaker in Black</span>
    </li>
    <li> … </li>
</ul>
</div>

My C# code is currently

    HtmlWeb hw = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument htmlDoc = hw.Load(@"http://www.karmaloop.com/Browse.htm#Pgroup=1");
    if (htmlDoc.DocumentNode != null)
    {
        foreach (HtmlNode text in htmlDoc.DocumentNode.SelectNodes("//span[@class='thumbnail-details-grey']/text()"))
        {
            Console.WriteLine(text.InnerText);
        }
    }

Could you please help me with this? I want to extract "The Blaster Portable Wireless Speaker in Black".

1/2/2019 10:41:10 AM

Accepted Answer

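Your XPath and loop are fine, but for them to work the right page has to be loaded. The results you see in your browser are fetched by an ajax request after the page loads, and the part of your URL after the # is never sent to the server. The HTML you download therefore contains no matching span elements, SelectNodes returns null, and the foreach throws the exception you are seeing.

So, instead of the URL you are currently using, you need to load: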

HtmlDocument htmlDoc = hw.Load(@"http://www.karmaloop.com/Browse?Pgroup=1&ajax=true&version=2");

Your code then runs successfully. I'm still looking for the place where this request is assembled.

But the pattern seems fairly obvious. For instance, the page http://www.karmaloop.com/Browse.htm#Pdept=11&PageSize=30&Pgroup=1 requests the URL http://www.karmaloop.com/Browse?Pdept=11&PageSize=30&Pgroup=1&ajax=true&version=2 . So all you have to do is build a new URL from your existing one: drop the .htm, move everything after the # into the query string, and append ajax=true&version=2.
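The URL rewrite described above can be sketched as a small helper. This is only an illustration inferred from the two example URLs in this answer: the class and method names (AjaxUrl, FromFragmentUrl) are hypothetical, and the assumption that every Browse page follows the same pattern has not been verified against the site.

```csharp
using System;

// Hypothetical helper, not part of HtmlAgilityPack or the site's API.
static class AjaxUrl
{
    // Turns ".../Browse.htm#a=1&b=2" into ".../Browse?a=1&b=2&ajax=true&version=2".
    // The "ajax=true&version=2" suffix is copied from the request observed above.
    public static string FromFragmentUrl(string pageUrl)
    {
        var uri = new Uri(pageUrl);

        // Uri.Fragment includes the leading '#', so strip it.
        string query = uri.Fragment.TrimStart('#');

        // Scheme, host and path, without query or fragment.
        string basePath = uri.GetLeftPart(UriPartial.Path);
        if (basePath.EndsWith(".htm", StringComparison.OrdinalIgnoreCase))
        {
            basePath = basePath.Substring(0, basePath.Length - ".htm".Length);
        }

        // Only add a separator when there are parameters to carry over.
        string separator = query.Length > 0 ? "&" : "";
        return basePath + "?" + query + separator + "ajax=true&version=2";
    }
}
```

The result of AjaxUrl.FromFragmentUrl(...) can then be passed to hw.Load(...) in place of the original URL. It is still worth checking the result of SelectNodes for null before iterating, since it returns null rather than an empty collection when nothing matches.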

10/7/2013 7:44:02 PM

Popular Answer

I'd recommend using CsQuery (https://www.nuget.org/packages/CsQuery/1.3.4); after that, it's as easy as:

var doc = CQ.CreateFromUrl(@"http://www.karmaloop.com/Browse.htm");
var nodes = doc.Find("span.thumbnail-details-grey");
foreach(var node in nodes)
    Console.WriteLine(node.InnerText);


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow