Using HtmlAgilityPack to get specific row and column data

c# html-agility-pack xml-parsing xpath

Question

This is my table:

<table class="DataRows" frame="myFrames" rules="Standard" width="100%">

  <colgroup><col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  </colgroup><thead>

  <col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  <thead>

  <tr>
    <td valign="TOP"><span class="classicBold"> 20 </span> Kg.
    <td class="BOLD" valign="TOP" nowrap="">
      PA Passion Foods Inc.
    <td class="BOLD">Fax:
    <td>
      222-555666
    <td class="BOLD">
      Processed foods and juices

  <tr>
    <td><a target="_blank" href="">See on Map </a>
    <td>
      120 NW 157TH AVE 
    <td class="BOLD">Warehouse Hours:
    <td colspan="2">


  <tr>
    <td>
    <td><span class="BOLD">
      Jacksonville,
      </span>
      FL 300000
    <td class="BOLD">Url:
    <td colspan="2">
      <a target="_blank" href="">PA Passion</a>
      &nbsp&nbsp
      <span class="BOLD">E-mail:</span>
      zoro@xyz.com

  <tr>
    <td>
    <td class="REDBOLD" colspan="4">


  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
 Nutrella


</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT"><span class="BOLD">

</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>

</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
  </thead>
</table>

I loop over every node of my HTML document using the code below

When I use the following

I get the following HTML

How do I extract the address 120 NW 157TH AVE from it?

When I tried using

I get an error:

Object reference not set to an instance of an object

Accepted answer

Your HTML is a mess. The tags overlap. I suggest using text nodes as identifiers rather than indexes, for example:

.//td[./a[contains(text(),'See on Map')]]/td/text() 

to get

120 NW 157TH AVE

Here is a complete example that gives you everything

.//td[./a[contains(text(),'See on Map')]]/td/text() 
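
To apply that expression from C#, a minimal sketch might look like the following. It is an illustration only: the class and method names are hypothetical, the sample markup is a cut-down stand-in for the table in the question, and it assumes the HtmlAgilityPack NuGet package with its default parsing options. Note that SelectSingleNode returns null when nothing matches, which is a common source of the "Object reference not set" error, so the sketch checks for it.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class AddressExtractor
{
    // XPath taken from the answer: anchor on the "See on Map" link text,
    // then step into the td nested inside that cell.
    private const string AddressXPath =
        ".//td[./a[contains(text(),'See on Map')]]/td/text()";

    // Returns the trimmed address, or null when the XPath matches nothing.
    public static string ExtractAddress(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null (it does not throw) when there is no match.
        HtmlNode textNode = doc.DocumentNode.SelectSingleNode(AddressXPath);
        if (textNode == null)
        {
            return null;
        }

        // Decode entities such as &nbsp; and strip the surrounding whitespace.
        return HtmlEntity.DeEntitize(textNode.InnerText).Trim();
    }

    public static void Main()
    {
        // Hypothetical cut-down sample with unclosed <td> tags, as in the question.
        string html =
            "<table><tr>" +
            "<td><a target=\"_blank\" href=\"\">See on Map </a>" +
            "<td> 120 NW 157TH AVE " +
            "<td class=\"BOLD\">Warehouse Hours:" +
            "</table>";

        string address = ExtractAddress(html);
        Console.WriteLine(address ?? "(no match)");
    }
}
```

If the call prints "(no match)" on your real page, the parser has nested the cells differently from what the expression expects, and the XPath needs adjusting rather than the null check removing.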

Note that because your HTML is messy, the XPaths have to be messy too. Trying to reach a tr element by index will not work, because every tr element is a child of the previous tr: what would be .//tr[4] in a normal table is .//tr/tr/tr/tr in your table.
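
One way to check how the parser actually nested the rows, before writing a positional XPath, is to print each row's path and count its tr ancestors. This is only a diagnostic sketch: the class and method names are hypothetical and the sample markup is a stand-in for the real table. It relies on the HtmlNode members Descendants, Ancestors and XPath.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class RowNesting
{
    // For every <tr> in the document, count how many <tr> ancestors it has.
    // Sibling rows all report 0; a row nested inside earlier rows reports 1, 2, ...
    public static List<int> RowDepths(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode
                  .Descendants("tr")
                  .Select(tr => tr.Ancestors("tr").Count())
                  .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample with unclosed rows and cells, as in the question.
        string html = "<table><tr><td>a<tr><td>b<tr><td>c</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // HtmlNode.XPath gives the path of each row as the parser built it.
        foreach (HtmlNode tr in doc.DocumentNode.Descendants("tr"))
        {
            Console.WriteLine(tr.XPath + "  (tr ancestors: " + tr.Ancestors("tr").Count() + ")");
        }
    }
}
```

Rows that report one or more tr ancestors cannot be reached with an index such as .//tr[4]; the printed paths show which nested expression to use instead.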




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow