Scrape Dynamic Data from Website Using C# HTMLAGILITYPACK

c# html-agility-pack web-scraping

Question

I'm using HTMLAGILITY Pack to scrape data, but the website won't load correctly.

I need my code to wait till the whole page has loaded.

Although there is a workaround to utilize the browser in the form, I don't need to.

The Link that I need to shred is shown below, along with my code.

HtmlWeb web = new HtmlWeb();
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
        HtmlAgilityPack.HtmlDocument doc = web.Load(website);
         var goldTypes = doc.DocumentNode.SelectNodes("//h2[@class='gold-box-title']").ToList();
       var goldPrices = doc.DocumentNode.SelectNodes("//span[@class='gold-box-price--sale'").ToList();

          for (int i = 0; i < 2; i++)
             {
               string  goldPrice = goldPrices[i].InnerText;
               string  goldType = goldTypes[i].InnerText;

             }
1
3
8/10/2018 1:16:46 PM

Accepted Answer

You were right; the ":buyable" property of the "buyable-gold" components contains a structured json representation of all the data.

This ought to be what you want, according to a brief test I conducted. This will provide a collection of structured objects with the information you want.

HtmlWeb web = new HtmlWeb();
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.ezrsgold.com/buy-runescape-gold");

var buyGoldNodes = doc.DocumentNode.SelectNodes("//buyable-gold");

var buyableJsonList = buyGoldNodes.Select(x => HttpUtility.HtmlDecode(x.Attributes[":buyable"].Value)).ToList();

var buyables = buyableJsons.Select(x => JsonConvert.DeserializeObject<Buyable>(x)).ToList();

If so, your Buyable class would resemble this.

public class Buyable
{
    public int id { get; set; }
    public string sku { get; set; }
    public int game_id { get; set; }
    public string title { get; set; }
    public int min_qty { get; set; }
    public int max_qty { get; set; }
    public string base_price { get; set; }
    public string sale_price { get; set; }
    public Bulk_Price[] bulk_price { get; set; }
    public string delivery_time { get; set; }
    public string description { get; set; }
    public object sort_order { get; set; }
    public string created_at { get; set; }
    public string updated_at { get; set; }
    public string price { get; set; }
    public bool on_sale { get; set; }
    public int discount_from { get; set; }
}

public class Bulk_Price
{
    public string qty { get; set; }
    public string price { get; set; }
}
2
8/11/2018 3:39:24 AM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow