How do I correctly grab the images I scrape with HtmlAgilityPack?

c# html-agility-pack web-scraping xpath

Question

I'm working on a project and learning HtmlAgilityPack (HAP) along the way. I understand the basics, and it seems to have a lot of potential.

I'm trying to scrape a product page on a website and get the URLs of its images, but I'm stuck because I don't know how to extract the link from the node that an XPath query selects. I used to do this with regex, which I found much simpler, but I'm switching to HAP now.

This is my current code. I don't think it will be particularly helpful, but I'm including it anyway.

    private static void HAP()
    {
        var url = "https://www.dhgate.com/product/brass-hexagonal-fidget-spinner-hexa-spinner/403294406.html#gw-0-4|ff8080815e03d6df015e9394cc681f8a:ff80808159abe8a5015a3fd78c5b51bb";
        // HtmlWeb - A Utility class to get HTML document from http
        var web = new HtmlWeb();
        //Load() Method download the specified HTML document from an Internet resource.
        var doc = web.Load(url);

        var rootNode = doc.DocumentNode;

        var divs = doc.DocumentNode.SelectNodes(String.Format("//IMG[@src='{0}']", "www.dhresource.com/webp/m/100x100/f2/albu/g5/M00/14/45/rBVaI1kWttaAI1IrAATeirRp-t8793.jpg"));
        Console.WriteLine(divs);
        Console.ReadLine();
    }

This is the URL of the page I'm scraping:

https://www.dhgate.com/product/2017-led-light-up-hand-spinners-fidget-spinner/398793721.html#s1-0-1b;searl|4175152669

And this should be the XPath of the first image:

//IMG[@src='//www.dhresource.com/webp/m/100x100s/f2-albu-g5-M00-6E-20-rBVaI1kWtmmAF9cmAANMKysq_GY926.jpg/2017-led-light-up-hand-spinners-fidget-spinner.jpg']
9/20/2017 11:47:07 AM

Popular Answer

I wrote a helper method for this. To get all the links, I had to select the img nodes, loop over them, and read the src attribute of each one.

    // Requires: using System; using HtmlAgilityPack;
    private static void HAP()
    {
        // Declare the URL of the product page.
        var url = "https://www.dhgate.com/product/brass-hexagonal-fidget-spinner-hexa-spinner/403294406.html#gw-0-4|ff8080815e03d6df015e9394cc681f8a:ff80808159abe8a5015a3fd78c5b51bb";
        // HtmlWeb is a utility class that fetches an HTML document over HTTP.
        var web = new HtmlWeb();
        // Load() downloads the HTML document from the given URL and parses it.
        var doc = web.Load(url);

        // Select every <img> element; SelectNodes returns null when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img");
        if (nodes != null)
        {
            foreach (var img in nodes)
            {
                // Read the src attribute; fall back to "" if the attribute is missing.
                var link = img.GetAttributeValue("src", "");
                Console.WriteLine(link);
            }
        }
        Console.ReadLine();
    }
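
The XPath in the question suggests that the src values on this page are protocol-relative (they start with `//`), so the strings printed by the loop above may not be usable as download URLs as they stand. Below is a minimal sketch of one way to turn them into absolute URLs. `ImageUrlExtractor` and `ExtractImageUrls` are hypothetical names made up for this example, not part of HtmlAgilityPack, and the sketch assumes you already have the page HTML as a string and know the page's URL.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageUrlExtractor
{
    // Hypothetical helper: returns the absolute URLs of all <img src="..."> values in the HTML.
    public static List<string> ExtractImageUrls(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();

        // Only match <img> elements that actually carry a src attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return urls;
        }

        foreach (var img in nodes)
        {
            var src = img.GetAttributeValue("src", "").Trim();
            if (src.Length == 0)
            {
                continue;
            }

            // Resolve relative ("/a.jpg") and protocol-relative ("//host/a.jpg")
            // values against the URL of the page they came from.
            Uri absolute;
            if (Uri.TryCreate(pageUri, src, out absolute))
            {
                urls.Add(absolute.AbsoluteUri);
            }
        }

        return urls;
    }
}
```

To use it with a downloaded page, you could pass `doc.DocumentNode.OuterHtml` from `web.Load(url)` together with `new Uri(url)`. If you only want the product photos, narrowing the XPath with something like `//img[contains(@src, 'dhresource.com')]` may work, assuming all product images are served from that host.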
9/20/2017 12:59:51 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow