HtmlAgilityPack을 사용하여 특정 행 및 열 데이터 가져 오기

c# html-agility-pack xml-parsing xpath

문제

이게 내 테이블이야.

<table class="DataRows" frame="myFrames" rules="Standard" width="100%">

  <colgroup><col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  </colgroup><thead>

  <col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  <thead>

  <tr>
    <td valign="TOP"><span class="classicBold"> 20 </span> Kg.
    <td class="BOLD" valign="TOP" nowrap="">
      PA Passion Foods Inc.
    <td class="BOLD">Fax:
    <td>
      222-555666
    <td class="BOLD">
      Processed foods and juices

  <tr>
    <td><a target="_blank" href="">See on Map </a>
    <td>
      120 NW 157TH AVE 
    <td class="BOLD">Warehouse Hours:
    <td colspan="2">


  <tr>
    <td>
    <td><span class="BOLD">
      Jacksonville,
      </span>
      FL 300000
    <td class="BOLD">Url:
    <td colspan="2">
      <a target="_blank" href="">PA Passion</a>
      &nbsp&nbsp
      <span class="BOLD">E-mail:</span>
      zoro@xyz.com

  <tr>
    <td>
    <td class="REDBOLD" colspan="4">


  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
 Nutrella


</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT"><span class="BOLD">

</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>

</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
  </thead>
</table>

아래 코드를 사용하여 HTML 문서의 각 노드를 반복합니다.

<table class="DataRows" frame="myFrames" rules="Standard" width="100%">

  <colgroup><col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  </colgroup><thead>

  <col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  <thead>

  <tr>
    <td valign="TOP"><span class="classicBold"> 20 </span> Kg.
    <td class="BOLD" valign="TOP" nowrap="">
      PA Passion Foods Inc.
    <td class="BOLD">Fax:
    <td>
      222-555666
    <td class="BOLD">
      Processed foods and juices

  <tr>
    <td><a target="_blank" href="">See on Map </a>
    <td>
      120 NW 157TH AVE 
    <td class="BOLD">Warehouse Hours:
    <td colspan="2">


  <tr>
    <td>
    <td><span class="BOLD">
      Jacksonville,
      </span>
      FL 300000
    <td class="BOLD">Url:
    <td colspan="2">
      <a target="_blank" href="">PA Passion</a>
      &nbsp&nbsp
      <span class="BOLD">E-mail:</span>
      zoro@xyz.com

  <tr>
    <td>
    <td class="REDBOLD" colspan="4">


  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
 Nutrella


</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT"><span class="BOLD">

</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>

</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
  </thead>
</table>

다음을 사용할 때

<table class="DataRows" frame="myFrames" rules="Standard" width="100%">

  <colgroup><col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  </colgroup><thead>

  <col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  <thead>

  <tr>
    <td valign="TOP"><span class="classicBold"> 20 </span> Kg.
    <td class="BOLD" valign="TOP" nowrap="">
      PA Passion Foods Inc.
    <td class="BOLD">Fax:
    <td>
      222-555666
    <td class="BOLD">
      Processed foods and juices

  <tr>
    <td><a target="_blank" href="">See on Map </a>
    <td>
      120 NW 157TH AVE 
    <td class="BOLD">Warehouse Hours:
    <td colspan="2">


  <tr>
    <td>
    <td><span class="BOLD">
      Jacksonville,
      </span>
      FL 300000
    <td class="BOLD">Url:
    <td colspan="2">
      <a target="_blank" href="">PA Passion</a>
      &nbsp&nbsp
      <span class="BOLD">E-mail:</span>
      zoro@xyz.com

  <tr>
    <td>
    <td class="REDBOLD" colspan="4">


  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
 Nutrella


</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT"><span class="BOLD">

</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>

</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
  </thead>
</table>

나는 다음 html을 얻는다.

<table class="DataRows" frame="myFrames" rules="Standard" width="100%">

  <colgroup><col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  </colgroup><thead>

  <col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  <thead>

  <tr>
    <td valign="TOP"><span class="classicBold"> 20 </span> Kg.
    <td class="BOLD" valign="TOP" nowrap="">
      PA Passion Foods Inc.
    <td class="BOLD">Fax:
    <td>
      222-555666
    <td class="BOLD">
      Processed foods and juices

  <tr>
    <td><a target="_blank" href="">See on Map </a>
    <td>
      120 NW 157TH AVE 
    <td class="BOLD">Warehouse Hours:
    <td colspan="2">


  <tr>
    <td>
    <td><span class="BOLD">
      Jacksonville,
      </span>
      FL 300000
    <td class="BOLD">Url:
    <td colspan="2">
      <a target="_blank" href="">PA Passion</a>
      &nbsp&nbsp
      <span class="BOLD">E-mail:</span>
      zoro@xyz.com

  <tr>
    <td>
    <td class="REDBOLD" colspan="4">


  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
 Nutrella


</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT"><span class="BOLD">

</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>

</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
  </thead>
</table>

어떻게 120 NW 157TH AVE 주소를 추출합니까?

사용을 시도했을 때

<table class="DataRows" frame="myFrames" rules="Standard" width="100%">

  <colgroup><col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  </colgroup><thead>

  <col width="70" align="CENTER">
  <col width="200" align="LEFT">
  <col width="80" align="LEFT">
  <col align="LEFT">
  <col align="RIGHT">

  <thead>

  <tr>
    <td valign="TOP"><span class="classicBold"> 20 </span> Kg.
    <td class="BOLD" valign="TOP" nowrap="">
      PA Passion Foods Inc.
    <td class="BOLD">Fax:
    <td>
      222-555666
    <td class="BOLD">
      Processed foods and juices

  <tr>
    <td><a target="_blank" href="">See on Map </a>
    <td>
      120 NW 157TH AVE 
    <td class="BOLD">Warehouse Hours:
    <td colspan="2">


  <tr>
    <td>
    <td><span class="BOLD">
      Jacksonville,
      </span>
      FL 300000
    <td class="BOLD">Url:
    <td colspan="2">
      <a target="_blank" href="">PA Passion</a>
      &nbsp&nbsp
      <span class="BOLD">E-mail:</span>
      zoro@xyz.com

  <tr>
    <td>
    <td class="REDBOLD" colspan="4">


  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
 Nutrella


</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT">Franchisee for:<span class="BOLD">
APPLE Foods, Constants
</span>
  <tr>
    <td>
    <td colspan="4" align="LEFT"><span class="BOLD">

</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We service:<span class="BOLD">
All occasions and hospitality services
</span>

  <tr>
    <td>
    <td colspan="4" align="LEFT">We sell :<span class="BOLD">
----
</span>

</td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></tr></td></td></td></td></tr></td></td></td></td></tr></td></td></td></td></td></tr>
  </thead>
</table>

오류가 발생했습니다.

개체 참조가 개체의 인스턴스로 설정되지 않았습니다.

수락 된 답변

귀하의 HTML은 난장판 태그가 겹쳐있다 난 당신이 예를 들어 인덱스 대신 식별자로 텍스트 노드를 사용하는 것이 좋습니다

.//td[./a[contains(text(),'See on Map')]]/td/text() 

도착

120 NW 157TH AVE

여기에 모든 것을 얻는 예제가 있습니다.

.//td[./a[contains(text(),'See on Map')]]/td/text() 

당신의 html이 얼마나 지저분 해지는가에 따라 xpaths는 지저분 해 야합니다. 모든 tr 요소가 이전 tr 자식이기 때문에 인덱스로 tr 요소에 액세스하려고하면 작동하지 않습니다. .//tr[4] 는 정상적인 테이블은 .//tr/tr/tr/tr 입니다.




아래 라이선스: CC-BY-SA with attribution
와 제휴하지 않음 Stack Overflow
이 KB는 합법적입니까? 예, 이유를 알아보십시오.
아래 라이선스: CC-BY-SA with attribution
와 제휴하지 않음 Stack Overflow
이 KB는 합법적입니까? 예, 이유를 알아보십시오.