HtmlAgilityPack scraping - extracting specific nodes from html document

c# html html-agility-pack web-scraping xpath

Question

I apologize in advance if this has already been addressed (if so, please point me to the right page). I have searched here, the web, YouTube, and other places for two days without success.

I want to extract some information from the website at https://betcity.ru/en/results/sp_fl=a:46;

I'm trying to get the name of each event for the day (the first is Ho Kwan Kit/Wong Chun Ting vs. Fan Zhendong/Xu Xin). When I inspect that element, I see the following HTML:

<div class="content-results-data__event"><span>Ho Kwan Kit/Wong Chun Ting — Fan Zhendong/Xu Xin</span></div>

My plan was to select every div with the class "content-results-data__event" and then read the inner text of each one. Every time I run my code, I get nothing back, even though I can see divs with this class in the browser. Why am I not getting any nodes? Also, how do I get all of the events? (If I learn how to do that, I can get the other information I need from this site.) My code is below (I should say I am fairly new to this).

using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public partial class Scrapper : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        List<string> Events = new List<string>();
        HtmlWeb web = new HtmlWeb();
        HtmlDocument doc = NewMethod(web);

        // The class value must match the attribute exactly (no spaces, no stray quote).
        // SelectNodes returns null, not an empty collection, when nothing matches.
        var Nodes = doc.DocumentNode.SelectNodes("//div[@class='content-results-data__event']");

        if (Nodes != null)
        {
            foreach (var item in Nodes)
            {
                Events.Add(item.InnerText);
            }
        }

        GridView1.DataSource = Events;
        GridView1.DataBind();
    }

    // Downloads the raw HTML of the results page.
    private static HtmlDocument NewMethod(HtmlWeb web)
    {
        return web.Load("https://betcity.ru/en/results/sp_fl=a:46;");
    }
}
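To check the XPath on its own, away from the live site, here is a minimal sketch using System.Xml's XPath 1.0 engine, the same XPath dialect HtmlAgilityPack evaluates through .NET's XPathNavigator. The sample markup (including the extra "other" class and the "Event 1"/"Event 2" titles) and the XPathClassDemo/EventTitles names are hypothetical, loosely based on the snippet above. It shows that @class='...' only matches when the whole attribute equals the string, while the concat/normalize-space form matches the class as one token among several. Note that System.Xml returns an empty list on no match, whereas HtmlAgilityPack's SelectNodes returns null.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class XPathClassDemo
{
    // Hypothetical, well-formed stand-in for the question's markup.
    // The second div carries an invented extra class to show why exact matching is fragile.
    public const string Sample =
        "<html><body>" +
        "<div class=\"content-results-data__event\"><span>Event 1</span></div>" +
        "<div class=\"content-results-data__event other\"><span>Event 2</span></div>" +
        "</body></html>";

    // Runs an XPath 1.0 query and returns the trimmed inner text of every matched node.
    public static List<string> EventTitles(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var titles = new List<string>();
        XmlNodeList nodes = doc.SelectNodes(xpath); // empty list (never null) when nothing matches
        foreach (XmlNode node in nodes)
        {
            titles.Add(node.InnerText.Trim());
        }
        return titles;
    }

    public static void Main()
    {
        // Spaces inside the literal: no class attribute equals this string.
        Console.WriteLine(EventTitles(Sample, "//div[@class='content - results - data__event']").Count); // 0
        // Exact match: only the div whose class attribute is exactly this value.
        Console.WriteLine(EventTitles(Sample, "//div[@class='content-results-data__event']").Count); // 1
        // Token match: every div that has this class, even alongside other classes.
        Console.WriteLine(EventTitles(Sample,
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' content-results-data__event ')]").Count); // 2
    }
}
```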

11/19/2017 10:52:32 PM

Accepted Answer

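The event list is presumably filled in by JavaScript after the page loads, and HtmlWeb only downloads the raw HTML without running any scripts, which would explain why your XPath finds no nodes. A real browser driven by Selenium does run the scripts, so its PageSource contains the rendered markup. Here's how to use Selenium to get the HTML for a single day of games; the rest is HtmlAgilityPack. I had to set up the driver to accept self-signed certificates, since the website uses them. Fun times.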

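using System;
using System.Threading;
using OpenQA.Selenium.Firefox;

// Point Selenium at the local Firefox install and accept the site's self-signed certificate.
var ffOptions = new FirefoxOptions();
ffOptions.BrowserExecutableLocation = @"C:\Program Files (x86)\Mozilla Firefox\firefox.exe";
ffOptions.LogLevel = FirefoxDriverLogLevel.Default;
ffOptions.Profile = new FirefoxProfile { AcceptUntrustedCertificates = true };
var service = FirefoxDriverService.CreateDefaultService();
var driver = new FirefoxDriver(service, ffOptions, TimeSpan.FromSeconds(120));

try
{
    string url = "https://betcity.ru/en/results/date=2017-11-19;"; // remember to update the date accordingly
    driver.Navigate().GoToUrl(url);
    Thread.Sleep(2000);               // fixed delay to give the page's scripts time to render the results
    Console.Write(driver.PageSource); // rendered HTML, ready to load into HtmlAgilityPack
}
finally
{
    driver.Quit(); // close the browser and stop the driver process
}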
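The answer leaves the parsing step to HtmlAgilityPack. A minimal sketch of that step, assuming the HtmlAgilityPack NuGet package is referenced: the rendered HTML (for example the string from driver.PageSource) is loaded with HtmlDocument.LoadHtml instead of HtmlWeb.Load, so nodes added by the page's scripts are present. The EventExtractor/ExtractEvents names and the sample HTML are hypothetical, and the class selector is taken from the markup quoted in the question, not verified against the live site.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class EventExtractor
{
    // Returns the text of every event title found in already-rendered HTML.
    public static List<string> ExtractEvents(string renderedHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(renderedHtml);

        var events = new List<string>();
        // Assumed selector (from the question's markup); matches the class even alongside others.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
            "//div[contains(concat(' ', normalize-space(@class), ' '), ' content-results-data__event ')]");

        // SelectNodes returns null rather than an empty collection when nothing matches.
        if (nodes == null)
        {
            return events;
        }

        foreach (HtmlNode node in nodes)
        {
            // DeEntitize decodes entities such as &amp; if they are still encoded in the text.
            events.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return events;
    }

    public static void Main()
    {
        // Hypothetical stand-in for driver.PageSource.
        string html = "<div class='content-results-data__event'><span>Event A</span></div>" +
                      "<div class='other'><span>Not an event</span></div>";
        foreach (string title in ExtractEvents(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

Returning an empty list instead of null keeps callers such as the GridView binding in the question free of null checks.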
11/20/2017 12:37:12 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow