How to scrape values from a web page using HTML Agility Pack

c# html html-agility-pack web-scraping

Question

I need some values from a web page, so I am building a scraper using HTML Agility Pack.

I'll show you the site's HTML and my C#.

Website HTML:

  <div class="box-overflow">
    <div class="box-overflow__in">
      <table class="table-main js-tablebanner-t js-tablebanner-ntb">
        <tr>
          <th class="h-text-left" colspan="2">17. Round</th>

          <th class="h-text-center">1</th>

          <th class="h-text-center">X</th>

          <th class="h-text-center">2</th>

          <th>&nbsp;</th>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/" class=
          "in-match"><span>Lechia Gdansk</span> - <span>Leczna</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/">3:0</a></td>

          <td class="table-matches__odds colored"></td>

          <td class="table-matches__odds" data-odd="4.04"></td>

          <td class="table-matches__odds" data-odd="6.29"></td>

          <td class="h-text-right h-text-no-wrap">28.11.2016</td>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/" class=
          "in-match"><span>Plock</span> - <span>Piast Gliwice</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/">0:0</a></td>

          <td class="table-matches__odds" data-odd="2.05"></td>

          <td class="table-matches__odds colored"></td>

          <td class="table-matches__odds" data-odd="3.50"></td>

          <td class="h-text-right h-text-no-wrap">27.11.2016</td>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/" class=
          "in-match"><span>Slask Wroclaw</span> - <span>Legia</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/">0:4</a></td>

          <td class="table-matches__odds" data-odd="4.53"></td>

          <td class="table-matches__odds" data-odd="3.64"></td>

          <td class="table-matches__odds colored"></td>

          <td class="h-text-right h-text-no-wrap">27.11.2016</td>
        </tr>
      </table>
    </div>
  </div>

If anyone needs more information, just ask me what you want to know. Thanks.

Accepted Answer

I would do it like this:

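// Assumes doc is an HtmlAgilityPack.HtmlDocument that already holds the page.
var list = doc.DocumentNode
    .SelectSingleNode("//table[@class='table-main js-tablebanner-t js-tablebanner-ntb']")
    .Descendants("tr")
    .Select(x => new
    {
        // Match name cell, e.g. "Lechia Gdansk - Leczna"
        Val1 = x.SelectSingleNode("td[@class='h-text-left']")?.InnerText,
        // Score cell, e.g. "3:0"
        Val2 = x.SelectSingleNode("td[@class='h-text-center']")?.InnerText
    })
    // The header row contains only <th> cells, so Val1 is null there.
    .Where(x => x.Val1 != null)
    .ToList();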
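
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea, extended to also read the `data-odd` attributes and the date cell. It assumes the HtmlAgilityPack NuGet package is referenced. The names `Scraper`, `MatchRow`, `Extract` and `SampleHtml` are made up for illustration, and the sample is a trimmed copy of the table from the question. Note that in the posted HTML the cell with class `colored` carries no `data-odd` attribute, so this sketch returns an empty string for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical container for one table row (not part of the original answer).
public class MatchRow
{
    public string Match { get; set; }
    public string Score { get; set; }
    public List<string> Odds { get; set; }
    public string Date { get; set; }
}

public static class Scraper
{
    // Trimmed copy of the table from the question (one match row only).
    public const string SampleHtml = @"
<table class=""table-main js-tablebanner-t js-tablebanner-ntb"">
  <tr><th class=""h-text-left"" colspan=""2"">17. Round</th><th>&nbsp;</th></tr>
  <tr>
    <td class=""h-text-left""><a class=""in-match""><span>Lechia Gdansk</span> - <span>Leczna</span></a></td>
    <td class=""h-text-center""><a>3:0</a></td>
    <td class=""table-matches__odds colored""></td>
    <td class=""table-matches__odds"" data-odd=""4.04""></td>
    <td class=""table-matches__odds"" data-odd=""6.29""></td>
    <td class=""h-text-right h-text-no-wrap"">28.11.2016</td>
  </tr>
</table>";

    public static List<MatchRow> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // contains() tolerates extra classes on the table; the exact-match
        // form used in the answer works too while the class list is unchanged.
        var table = doc.DocumentNode.SelectSingleNode("//table[contains(@class,'table-main')]");
        if (table == null)
            return new List<MatchRow>();

        return table.Descendants("tr")
            .Select(tr => new MatchRow
            {
                Match = tr.SelectSingleNode("td[@class='h-text-left']")?.InnerText.Trim(),
                Score = tr.SelectSingleNode("td[@class='h-text-center']")?.InnerText.Trim(),
                // Elements() never returns null, unlike SelectNodes() with no matches.
                Odds = tr.Elements("td")
                    .Where(td => td.GetAttributeValue("class", "").Contains("table-matches__odds"))
                    .Select(td => td.GetAttributeValue("data-odd", ""))
                    .ToList(),
                Date = tr.SelectSingleNode("td[contains(@class,'h-text-right')]")?.InnerText.Trim()
            })
            // Header rows have no <td class="h-text-left">, so Match is null there.
            .Where(r => r.Match != null)
            .ToList();
    }

    public static void Main()
    {
        foreach (var row in Extract(SampleHtml))
        {
            Console.WriteLine(row.Match + " | " + row.Score + " | "
                + string.Join(", ", row.Odds) + " | " + row.Date);
        }
    }
}
```

For a live page you would load the document with `new HtmlWeb().Load(url)` instead of `LoadHtml`. If the odds are filled in by JavaScript after the page loads, they will not be present in the downloaded HTML.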



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow