How to scrape values from a web page using Html Agility Pack

c# html html-agility-pack web-scraping

Question

I need some values from a web page, so I'm building a scraper with Html Agility Pack.

I'll show you the site's HTML and my C#.

The site's HTML:

  <div class="box-overflow">
    <div class="box-overflow__in">
      <table class="table-main js-tablebanner-t js-tablebanner-ntb">
        <tr>
          <th class="h-text-left" colspan="2">17. Round</th>

          <th class="h-text-center">1</th>

          <th class="h-text-center">X</th>

          <th class="h-text-center">2</th>

          <th>&nbsp;</th>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/" class=
          "in-match"><span>Lechia Gdansk</span> - <span>Leczna</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/lechia-gdansk-leczna/Kjnscb6D/">3:0</a></td>

          <td class="table-matches__odds colored"></td>

          <td class="table-matches__odds" data-odd="4.04"></td>

          <td class="table-matches__odds" data-odd="6.29"></td>

          <td class="h-text-right h-text-no-wrap">28.11.2016</td>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/" class=
          "in-match"><span>Plock</span> - <span>Piast Gliwice</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/plock-piast-gliwice/KrhILsqE/">0:0</a></td>

          <td class="table-matches__odds" data-odd="2.05"></td>

          <td class="table-matches__odds colored"></td>

          <td class="table-matches__odds" data-odd="3.50"></td>

          <td class="h-text-right h-text-no-wrap">27.11.2016</td>
        </tr>

        <tr>
          <td class="h-text-left"><a href=
          "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/" class=
          "in-match"><span>Slask Wroclaw</span> - <span>Legia</span></a></td>

          <td class="h-text-center"><a href=
          "/soccer/poland/ekstraklasa/slask-wroclaw-legia/bZjMK1bK/">0:4</a></td>

          <td class="table-matches__odds" data-odd="4.53"></td>

          <td class="table-matches__odds" data-odd="3.64"></td>

          <td class="table-matches__odds colored"></td>

          <td class="h-text-right h-text-no-wrap">27.11.2016</td>
        </tr>
      </table>
    </div>
  </div>

My C#:

I'll give you more information:

If anyone needs more information, just ask what you'd like to know. Thanks.

Accepted answer

I would do it like this:

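// Requires: using System.Linq; using HtmlAgilityPack;
// doc is an HtmlDocument that already holds the page.
var list = doc.DocumentNode
                .SelectSingleNode("//table[@class='table-main js-tablebanner-t js-tablebanner-ntb']")
                .Descendants("tr")
                .Select(x => new
                {
                    // Match name, e.g. "Lechia Gdansk - Leczna"
                    Val1 = x.SelectSingleNode("td[@class='h-text-left']")?.InnerText,
                    // Score, e.g. "3:0"
                    Val2 = x.SelectSingleNode("td[@class='h-text-center']")?.InnerText
                })
                // The header row has only <th> cells, so Val1 is null there; skip it.
                .Where(x => x.Val1 != null)
                .ToList();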
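
The question doesn't say exactly which values are needed. Assuming the odds (the data-odd attributes) and the date are wanted as well, the same approach could be extended roughly as below. This is only a sketch: MatchRow and MatchTableScraper are made-up names, the HTML is loaded from a string rather than from the live site, and contains() is used in the XPath so that an extra class on the table or on a cell doesn't break the match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical container for one match row (not part of the original answer).
public sealed class MatchRow
{
    public string Match { get; set; }
    public string Score { get; set; }
    public string[] Odds { get; set; }  // columns "1", "X", "2"; "" where data-odd is absent
    public string Date { get; set; }
}

public static class MatchTableScraper
{
    public static List<MatchRow> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = doc.DocumentNode
            .SelectSingleNode("//table[contains(@class,'table-main')]");
        if (table == null)
            return new List<MatchRow>();  // table not found: nothing to scrape

        return table.Descendants("tr")
            // The header row contains only <th> cells, so it is filtered out here.
            .Where(tr => tr.SelectSingleNode("td[@class='h-text-left']") != null)
            .Select(tr => new MatchRow
            {
                Match = Text(tr.SelectSingleNode("td[@class='h-text-left']")),
                Score = Text(tr.SelectSingleNode("td[@class='h-text-center']")),
                // SelectNodes returns null when nothing matches, hence the fallback.
                Odds = (tr.SelectNodes("td[contains(@class,'table-matches__odds')]")
                            ?? Enumerable.Empty<HtmlNode>())
                       .Select(td => td.GetAttributeValue("data-odd", ""))
                       .ToArray(),
                Date = Text(tr.SelectSingleNode("td[contains(@class,'h-text-right')]"))
            })
            .ToList();
    }

    // Decodes entities such as &nbsp; and trims whitespace; a missing node yields "".
    private static string Text(HtmlNode node) =>
        node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
}

public static class Program
{
    public static void Main()
    {
        // One row cut down from the HTML in the question.
        const string html =
            "<table class=\"table-main js-tablebanner-t js-tablebanner-ntb\">" +
            "<tr><th class=\"h-text-left\" colspan=\"2\">17. Round</th></tr>" +
            "<tr><td class=\"h-text-left\"><a href=\"#\"><span>Lechia Gdansk</span> - <span>Leczna</span></a></td>" +
            "<td class=\"h-text-center\"><a href=\"#\">3:0</a></td>" +
            "<td class=\"table-matches__odds colored\"></td>" +
            "<td class=\"table-matches__odds\" data-odd=\"4.04\"></td>" +
            "<td class=\"table-matches__odds\" data-odd=\"6.29\"></td>" +
            "<td class=\"h-text-right h-text-no-wrap\">28.11.2016</td></tr></table>";

        foreach (var row in MatchTableScraper.Parse(html))
            Console.WriteLine($"{row.Match} | {row.Score} | {string.Join(", ", row.Odds)} | {row.Date}");
    }
}
```

To run it against the real page, the string could be replaced by a document fetched with new HtmlWeb().Load(url), with url being the address of the page in question. In the pasted HTML, the cell marked "colored" carries no data-odd attribute, so that column comes back as an empty string.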



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow