HtmlAgillityPack을 사용하여 HTML 읽기 옵션 태그 내용 파싱하기

html html-agility-pack select xpath

문제

HtmlAgilityPack을 사용하여 HTML을 구문 분석하려고하지만 문제가 있습니다.

샘플 HTML 문서 :

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

<select name="akt-miest" id="onoffaci"> 에서 구문 분석 값이 필요합니다.

예 :

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

나는 가치 **a_0** 과 텍스트 **VÅ¡etci** 얻을 필요가있다.

그래서 나는 이드 (Id)가 선택한 첫 번째 접근을 시도한다.

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

그런 다음 Xpath를 사용하여 모든 옵션 노드를 선택하십시오.

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

그리고 가치를 얻으십시오 :

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

하지만 다른 select ( <select id="region" name="region"> )에서 값을 가져옵니다.이 선택은 html 코드의 맨 위에 있습니다.

편집 :

Darin Dimitrov의 조언을 다음과 같이 적용 해 보겠습니다.

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

처음 세 옵션 요소 만 구문 분석합니다. 왜냐하면 문제는 선택 구성 요소라고 생각하기 때문입니다.

optgroup 태그

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

나는이 모든 다음 노드를 선택하려고합니다.

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

하지만이 오류가 발생합니다 : xpath has an invalid token.

selectNode의 모든 자식에 대한 액세스를 원합니다.

<tr>
  <td class="css_lokalita" colspan="4">
    <select id="region" name="region">
      <option value="0"  selected>VÅ¡etky regiony</option>
      <optgroup>Banskobystrický kraj</optgroup>
      <option value="k_1"  style="color: #000000; font-weight:bold;">Banskobystrický kraj</option>
      <option value="1">&nbsp;&nbsp;&nbsp;Banská Bystrica</option>
          .
          .
          .
      <option value="174">&nbsp;&nbsp;&nbsp;CZ - Ústecký kraj</option>
      <option value="175">&nbsp;&nbsp;&nbsp;CZ - Zlínský kraj</option>     
    </select>
  </td>
</tr>

<tr>
  <td class="css_sfotkou"  colspan="4">
    <input type="checkbox" name="foto" value="1" id="foto" />
    <label for="foto">Iba používatelia s fotkou</label>
  </td>
</tr>

<tr>
  <td class="css_miestnost" colspan="4">
    <select name="akt-miest" id="onoffaci">
      <option value="a_0">VÅ¡etci</option>
          .
          .
          .
      <optgroup label="Záľuby a záujmy">
        <option value="m_1419307">&nbsp;&nbsp;&nbsp;Bez Lásky</option>
          .
          .
          .
        <option value="m_1108016">&nbsp;&nbsp;&nbsp;Drum N Bass</option>
      </optgroup>
    </select>
  </td>
</tr>

편집 # 2 :

여기에 모든 HTML 파일이 있는데, 거기에서 파싱 옵션 태그가 필요합니다.

http://hotfile.com/dl/98442053/577b556/source.html

수락 된 답변

기본적으로 <OPTION> 태그는 Html Agility Pack에 의해 "Empty"로 처리됩니다. 즉, 닫기 </OPTION> 가 필요하지 않습니다. 이 경우 닫는 태그는 버려집니다. HtmlNode.ElementFlags 컬렉션을 사용하여이 동작을 변경할 수 있습니다.

원하는 코드를 작성해야합니다.

HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags.Remove("option");
doc.LoadHtml(yourHtml);

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//select[@id='onoffaci']//option"))
{
    Console.WriteLine("Value=" + node.Attributes["value"].Value);
    Console.WriteLine("InnerText=" + node.InnerText);
    Console.WriteLine();
}

인기 답변

XPath 표현식 :

//option

절대 경로입니다. 루트에서 시작하여 모든 트리를 통과합니다.

상대적 XPath 표현식이 필요합니다.

//option

또는 속기

//option

참고 : 경로를 시작할 수있는 유일한 경우입니다 . ( self::node() 속기)는 유용합니다.




아래 라이선스: CC-BY-SA with attribution
와 제휴하지 않음 Stack Overflow
이 KB는 합법적입니까? 예, 이유를 알아보십시오.
아래 라이선스: CC-BY-SA with attribution
와 제휴하지 않음 Stack Overflow
이 KB는 합법적입니까? 예, 이유를 알아보십시오.