How to scrape
using HtmlAgilityPack in C#

asp.net c# html html-agility-pack

Question

I want to scrape data from a movie site, specifically the movie schedule and the movie titles. I don't know how to write a query that scrapes this HTML: <div class="content" id="getSh">.

<div class="container">
          <div class="content" id="getSh"><ul class="ctr"><li class="ctrl">Cinema 1</li>
          <li class="ctrr">09, Mar</li><li class="cl"></li></ul>
          <ul class="col_row"><li class="col"><a href="#">3:15 pm</a></li>
          <li class="col cb"><a href="/movies/detail/299">The Second Best Exotic Marigold Hotel 
          <span class="blue">Digital 2D</span></a></li><li class="col cc"><a href="#">--</a>
          </li><li class="cl"></li></ul> <ul class="col_row"><li class="col"><a href="#">6:15 pm</a></li>
          <li class="col cb"><a href="/movies/detail/307">Focus <span class="blue">Digital 2D
          </span><span class="red">Adults Only</span></a></li><li class="col cc"><a href="#">--
          </a></li><li class="cl"></li></ul> <ul class="col_row"><li class="col">
          <a href="#">8:45 pm</a></li><li class="col cb"><a href="/movies/detail/266">
          Kingsman: The Secret Service <span class="blue">Digital 2D</span><span class="red">
          Adults Only</span></a></li><li class="col cc"><a href="#">--</a></li><li class="cl">
          </li></ul><ul class="col_row col_m"><li class="col"><a href="#">11:45 pm</a></li>
          <li class="col cb"><a href="/movies/detail/267">Badlapur <span class="blue">Digital 2D
          </span></a></li><li class="col cc"><a href="#">--</a></li><li class="cl">
          </li></ul><ul class="ctr"><li class="ctrl">Cinema 2</li><li class="ctrr">09, Mar</li>
          <li class="cl"></li></ul> <ul class="col_row"><li class="col"><a href="#">3:30 pm</a>
          </li><li class="col cb"><a href="/movies/detail/307">Focus <span class="blue">Digital 
          </span><span class="red">Adults Only</span></a></li><li class="col cc"><a href="#">--<
          /a></li><li class="cl"></li></ul> <ul class="col_row"><li class="col"><a href="#">6:00
          pm</a></li><li class="col cb"><a href="/movies/detail/266">Kingsman: The Secret Service
          <span class="blue">Digital 2D</span><span class="red">Adults Only</span></a></li>
          <li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul> <ul class="col_row">
          <li class="col"><a href="#">9:00 pm</a></li><li class="col cb"><a href="/movies/detail/307">
          Focus <span class="blue">Digital 2D</span><span class="red">Adults Only</span></a></li>
          <li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul><ul class="col_row col_m">
          <li class="col"><a href="#">11:30 pm</a></li><li class="col cb"><a href="/movies/detail/266">
          Kingsman: The Secret Service <span class="blue">Digital 2D</span><span class="red">Adults Only
          </span></a></li><li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul><ul class="
          ctr"><li class="ctrl">Cinema 3</li><li class="ctrr">09, Mar</li><li class="cl"></li></ul>
          <ul class="col_row"><li class="col"><a href="#">3:45 pm</a></li><li class="col cb"><
          a href="/movies/detail/321">Hey Bro <span class="blue">Digital 2D</span></a></li><
          li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul> <ul class="col_row"><
          li class="col"><a href="#">6:30 pm</a></li><li class="col cb"><a href="/movies/detail/328">D
          irty Politics <span class="blue">Digital 2D</span><span class="red">Adults Only</span>
          </a></li><li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul> 
          <ul class="col_row"><li class="col"><a href="#">9:30 pm</a></li><li class="col cb">
          <a href="/movies/detail/321">Hey Bro <span class="blue">Digital 2D</span></a></li><
          li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul><ul class="col_row col_m">
          <li class="col"><a href="#">12:15 am</a></li><li class="col cb"><a href="/movies/detail/328"
          >Dirty Politics <span class="blue">Digital 2D</span><span class="red">Adults Only</span></a>

          </li><li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul><ul class="ctr">
          <li class="ctrl">Cinema 4</li><li class="ctrr">09, Mar</li><li class="cl"></li></ul> 
          <ul class="col_row"><li class="col"><a href="#">3:00 pm</a></li><li class="col cb">
          <a href="/movies/detail/295">The SpongeBob Movie: Sponge Out of Water  <span class="blue">D
          igital 3D</span></a></li><li class="col cc"><a href="#">--</a></li><li class="cl"></li>
          </ul> <ul class="col_row"><li class="col"><a href="#">5:15 pm</a></li><li class="col cb">
          <a href="/movies/detail/300">Paddington <span class="blue">Digital 2D</span></a></li>
          <li class="col cc"><a href="#">--</a></li><li class="cl"></li></ul> <ul class="col_row"><
          li class="col"><a href="#">7:30 pm</a></li><li class="col cb"><a href="/movies/detail/297">
          Unbroken <span class="blue">Digital 2D</span></a></li><li class="col cc"><a href="#">--</a>
          </li><li class="cl"></li></ul><ul class="col_row col_m"><li class="col"><a href="#">10:30 pm
          </a></li><li class="col cb">
          <a href="/movies/detail/299">The Second Best Exotic Marigold Hotel <span class="blue">Digital 2D<
          /span></a></li><li class="col cc"><
          a href="#">--</a></li><li class="cl"></li></ul><ul class="ctr">
          <li class="ctrl">Royal Cinema</li><li class="ctrr">09, Mar</li>
          <li class="cl"></li></ul> <ul class="col_row"><li class="col"><
          a href="#">3:05 pm</a></li><li class="col cb"><a href="/movies/detail/328">Dirty Politics <
          span class="blue">Digital 2D</span><span class="red">Adults Only</span></a></li><li class="col cc">
          <a href="#">--</a></li><li class="cl"></li></ul> <ul class="col_row"><li class="col"><a href="#">
          6:05 pm</a></li><li class="col cb"><a href="/movies/detail/307">Focus <span class="blue">Digital 2D
          </span><span class="red">Adults Only</span></a></li><li class="col cc"><a href="#">--</a></li>
          <li class="cl"></li></ul><ul class="col_row col_m"><li class="col"><a href="#">8:30 pm</a></li>
          <li class="col cb"><a href="/movies/detail/299">The Second Best Exotic Marigold Hotel
          <span class="blue">Digital 2D</span></a></li><li class="col cc"><a href="#">--</a></li>
          <li class="cl"></li></ul></div>
        </div>

And this is the C# code I am using to extract the data, which does not work:

HtmlNode htmlNode = document.DocumentNode.SelectSingleNode("//div[@id='customScrollBox']");


        List<string> movieList = new List<string>();


        foreach (HtmlNode heading in htmlNode.SelectNodes("//ul[@class='col_row']"))
        {
            movieList.Add(heading.InnerText);

        }

This is the output I want: cinema room = Cinema 1, movie name = The Second Best Exotic Marigold Hotel, plus the schedule.

Accepted answer

From what I gather, you are trying to get the movie titles? If so, the code below should do it.

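    // Each <ul class="col_row"> is one showtime row.
    foreach (HtmlNode row in htmlNode.SelectNodes("//ul[@class='col_row']"))
    {
        // The movie title is the link inside the <li class="col cb"> cell.
        var title = row.SelectSingleNode(".//li[@class='col cb']/a").InnerText;
        // I presume you want other fields here?
    }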
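
For the output the question asks for (cinema, showtime, and title together), a fuller sketch might look like the following. It is only an illustration under some assumptions: the container is taken to be the div with id "getSh" from the posted HTML (the id "customScrollBox" used in the question does not appear in that snippet), and the names Showtime, ShowtimeScraper, Parse and Clean are hypothetical helpers, not part of HtmlAgilityPack. Two details of the posted markup shape the code: the last row of each cinema has class "col_row col_m", which an exact match such as @class='col_row' skips, and the link text also contains the "Digital 2D" and "Adults Only" spans, so InnerText on the link returns more than the title.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical record type for one screening; not part of the original post.
public sealed class Showtime
{
    public string Cinema { get; set; }
    public string Time { get; set; }
    public string Title { get; set; }
}

public static class ShowtimeScraper
{
    public static List<Showtime> Parse(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var results = new List<Showtime>();

        // Assumption: the schedule lives in <div id="getSh">, as in the posted HTML.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection lists = document.DocumentNode.SelectNodes("//div[@id='getSh']/ul");
        if (lists == null) return results;

        string currentCinema = null;
        foreach (HtmlNode ul in lists)
        {
            // Split the class attribute so that "col_row col_m" still counts as a row.
            string[] classes = ul.GetAttributeValue("class", "")
                .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            if (classes.Contains("ctr"))
            {
                // Header list: remember the cinema name for the rows that follow.
                HtmlNode name = ul.SelectSingleNode("./li[@class='ctrl']");
                currentCinema = name == null ? null : Clean(name.InnerText);
            }
            else if (classes.Contains("col_row"))
            {
                HtmlNode time = ul.SelectSingleNode("./li[@class='col']/a");
                HtmlNode link = ul.SelectSingleNode("./li[@class='col cb']/a");
                if (time == null || link == null) continue;

                // Keep only the link's own text nodes, leaving out the <span> labels.
                string title = string.Concat(link.ChildNodes
                    .Where(n => n.NodeType == HtmlNodeType.Text)
                    .Select(n => n.InnerText));

                results.Add(new Showtime
                {
                    Cinema = currentCinema,
                    Time = Clean(time.InnerText),
                    Title = Clean(title)
                });
            }
        }
        return results;
    }

    // Decodes HTML entities and collapses runs of whitespace into single spaces.
    private static string Clean(string text)
    {
        return string.Join(" ", HtmlEntity.DeEntitize(text)
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
    }
}
```

Calling ShowtimeScraper.Parse(document.DocumentNode.OuterHtml), or passing the downloaded page text directly, should give one entry per row, which can then be printed or bound to a control. Note also that an XPath starting with // searches the whole document even when it is called on a child node, so the sketch uses ./ for lookups relative to each list.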


License: CC-BY-SA with attribution
Not affiliated with Stack Overflow