How to scrape values from a web page using HTML Agility Pack

c# html html-agility-pack web-scraping

Question

I need some values from a web page, so I am building a scraper using HTML Agility Pack.

Below are the site's HTML and my C# code.

Website HTML:

  <div class="box-overflow">
    <div class="box-overflow__in">
      <table class="table-main js-tablebanner-t js-tablebanner-ntb">
        <tr>
          <th class="h-text-left" colspan="2">17. Round</th>

          <th class="h-text-center">1</th>

          <th class="h-text-center">X</th>

          <th class="h-text-center">2</th>

          <th>&nbsp;</th>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/" class=
          "in-match"><span>Lechia Gdansk</span> - <span>Leczna</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/">3:0</a></td>

          <td class="table-matches__odds colored"></td>

          <td class="table-matches__odds" data-odd="4.04"></td>

          <td class="table-matches__odds" data-odd="6.29"></td>

          <td class="h-text-right h-text-no-wrap">28.11.2016</td>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/" class=
          "in-match"><span>Plock</span> - <span>Piast Gliwice</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/">0:0</a></td>

          <td class="table-matches__odds" data-odd="2.05"></td>

          <td class="table-matches__odds colored"></td>

          <td class="table-matches__odds" data-odd="3.50"></td>

          <td class="h-text-right h-text-no-wrap">27.11.2016</td>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/" class=
          "in-match"><span>Slask Wroclaw</span> - <span>Legia</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/">0:4</a></td>

          <td class="table-matches__odds" data-odd="4.53"></td>

          <td class="table-matches__odds" data-odd="3.64"></td>

          <td class="table-matches__odds colored"></td>

          <td class="h-text-right h-text-no-wrap">27.11.2016</td>
        </tr>
      </table>
    </div>
  </div>

My C#:

        var url = "http://www.betexplorer.com/soccer/poland/ekstraklasa/results/";

        var web = new HtmlWeb();
        var doc = web.Load(url);

        Bets = new List<Bet>();



        // Read the rows
        var Rows = doc.DocumentNode.SelectNodes("//table");

        foreach (var row in Rows)
        {
            if (!row.GetAttributeValue("class", "").Contains("table-main js-tablebanner-t js-tablebanner-ntb"))
            {
                if (string.IsNullOrEmpty(row.InnerText))
                    continue;

                var rowBet = new Bet();
                foreach (var node in row.ChildNodes)
                {
                    var data_odd = node.GetAttributeValue("data-odd", "");

                    if (string.IsNullOrEmpty(data_odd))
                    {
                        if (node.GetAttributeValue("class", "").Contains("in-match"))
                        {
                            rowBet.Match = node.InnerText.Trim();
                            var matchTeam = rowBet.Match.Split(new[] { " - " }, StringSplitOptions.RemoveEmptyEntries);
                            rowBet.Home = matchTeam[0];
                            rowBet.Host = matchTeam[1];
                        }


                        if (node.GetAttributeValue("class", "").Contains("h-text-center"))
                        {
                            rowBet.Result = node.InnerText.Trim();
                            var matchPoints = rowBet.Result.Split(new[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
                            int help;
                            if (int.TryParse(matchPoints[0], out help))
                            {
                                rowBet.HomePoints = help;
                            }
                            if (matchPoints.Length == 2 && int.TryParse(matchPoints[1], out help))
                            {
                                rowBet.HostPoints = help;
                            }

                        }


                        if (node.GetAttributeValue("class", "").Contains("h-text-right h-text-no-wrap"))
                            rowBet.Date = node.InnerText.Trim();

                    }
                    else
                    {
                        rowBet.Odds.Add(data_odd);
                    }
                }

                if (!string.IsNullOrEmpty(rowBet.Match))
                    Bets.Add(rowBet);
            }
        }

Here is more detail. I need to extract:

the team names (e.g. Lechia Gdansk - Leczna),
the result (e.g. 3:0),
the data-odd values (e.g. 1.49, 4.04, 6.29),
and the match date (e.g. 28.11.2016).

If anyone needs more information, just ask. Thanks.

Accepted answer

I would do it like this:

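// Requires: using System.Linq; and using HtmlAgilityPack;
var list = doc.DocumentNode
                .SelectSingleNode("//table[@class='table-main js-tablebanner-t js-tablebanner-ntb']")
                .Descendants("tr")
                .Select(x => new
                {
                    // Teams cell, e.g. "Lechia Gdansk - Leczna"
                    Val1 = x.SelectSingleNode("td[@class='h-text-left']")?.InnerText,
                    // Score cell, e.g. "3:0"
                    Val2 = x.SelectSingleNode("td[@class='h-text-center']")?.InnerText
                })
                // The header row contains only <th> cells, so Val1 is null there
                .Where(x => x.Val1 != null)
                .ToList();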
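
The snippet above only reads the teams and the score. A minimal sketch of how the same approach might be extended to the odds and the date the question also asks for is given here. The `MatchRow` and `ResultsParser` names are hypothetical and not part of the question or the answer. It also assumes that a cell without a `data-odd` attribute (such as the `colored` cell in the sample HTML) should be reported as `null`, and it matches the table loosely on `table-main` rather than on the full class string.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical container for one table row; the names are illustrative only.
public sealed class MatchRow
{
    public string Match { get; set; }
    public string Result { get; set; }
    public List<decimal?> Odds { get; set; }
    public string Date { get; set; }
}

public static class ResultsParser
{
    public static List<MatchRow> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the results table is the first one whose class contains "table-main".
        var table = doc.DocumentNode.SelectSingleNode(
            "//table[contains(@class,'table-main')]");
        if (table == null)
            return new List<MatchRow>();

        return table.Descendants("tr")
            // Header rows hold only <th> cells, so they have no teams cell.
            .Where(tr => tr.SelectSingleNode("td[@class='h-text-left']") != null)
            .Select(tr => new MatchRow
            {
                Match = Text(tr.SelectSingleNode("td[@class='h-text-left']")),
                Result = Text(tr.SelectSingleNode("td[@class='h-text-center']")),
                // Keep one entry per odds cell so the 1 / X / 2 positions are preserved.
                Odds = tr.Elements("td")
                    .Where(td => td.GetAttributeValue("class", "").Contains("table-matches__odds"))
                    .Select(td => ParseOdd(td))
                    .ToList(),
                Date = Text(tr.SelectSingleNode("td[@class='h-text-right h-text-no-wrap']"))
            })
            .ToList();
    }

    // Returns the trimmed, entity-decoded text of a node, or null if the node is missing.
    private static string Text(HtmlNode node)
    {
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Parses the data-odd attribute with the invariant culture; null if absent or invalid.
    private static decimal? ParseOdd(HtmlNode td)
    {
        decimal value;
        var raw = td.GetAttributeValue("data-odd", "");
        return decimal.TryParse(raw, NumberStyles.Number, CultureInfo.InvariantCulture, out value)
            ? value
            : (decimal?)null;
    }
}
```

Parsing the odds with `CultureInfo.InvariantCulture` is a deliberate choice: on a machine whose locale uses a comma as the decimal separator, a culture-sensitive parse of `4.04` could fail or give a wrong value. To use the sketch with a live page, the HTML string can come from any source, for example `web.Load(url).DocumentNode.OuterHtml` with the `HtmlWeb` instance from the question.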


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow