HTML Agility Pack: getting the content of <p itemprop>

c# html-agility-pack parsing xpath

Question

I am trying to extract content using HTML Agility Pack. Here is a sample of the HTML I am trying to parse:

         <p itemprop="articleBody">
    Hundreds of thousands of Ukrainians filled the streets of Kiev on Sunday, first to hear speeches and music and then to fan out and erect barricades in the district where government institutions have their headquarters.</p><p itemprop="articleBody">
    Carrying blue-and-yellow Ukrainian and European Union flags, the teeming crowd filled 
Independence Square, where protests have steadily gained momentum since Mr. Yanukovich refused on Nov. 21 to sign trade and political agreements with the European Union. The square has been transformed by a vast and growing tent encampment, and demonstrators have occupied City Hall and other public buildings nearby. Thousands more people gathered in other cities across the country.        </p><p itemprop="articleBody">
    “Resignation! Resignation!” people in the Kiev crowd chanted on Sunday, demanding that Mr. Yanukovich and the government led by Prime Minister Mykola Azarov leave office.        </p>

I am parsing the HTML above with HTML Agility Pack.

Edit:

However, articleBodyScope appears to be empty.

The code never prints "CONTENT NOT NULL", and articleBodyText is empty. If anyone can point me to a solution, thanks in advance.

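Top Answer

The New York Times is detecting that you are not accepting its cookies, so it serves a cookie warning and a logon box instead of the article. If you supply a CookieContainer, .NET handles the whole cookie exchange and the NYT will actually return its content.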

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace UnitTestProject3
{
    using System.Net;

    using HtmlAgilityPack;

    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void WhenProvidingCookiesYouSeeContent()
        {
            HtmlDocument doc = new HtmlDocument();

            WebClient wc = new WebClientEx(new CookieContainer());

            string contents = wc.DownloadString(
                "http://www.nytimes.com/2013/12/10/world/asia/thailand-protests.html?partner=rss&emc=rss&_r=1&");
            doc.LoadHtml(contents);

            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = doc.DocumentNode.SelectNodes(@"//p[@itemprop='articleBody']");

            Assert.IsNotNull(nodes);
            Assert.IsTrue(nodes.Count > 0);
        }
    }

    public class WebClientEx : WebClient
    {
        public WebClientEx(CookieContainer container)
        {
            this.container = container;
        }

        // Shared cookie store; assigned once in the constructor.
        private readonly CookieContainer container;

        // Attach the cookie container to every outgoing HTTP request.
        protected override WebRequest GetWebRequest(Uri address)
        {
            WebRequest r = base.GetWebRequest(address);
            var request = r as HttpWebRequest;
            if (request != null)
            {
                request.CookieContainer = container;
            }
            return r;
        }

        protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
        {
            WebResponse response = base.GetWebResponse(request, result);
            ReadCookies(response);
            return response;
        }

        protected override WebResponse GetWebResponse(WebRequest request)
        {
            WebResponse response = base.GetWebResponse(request);
            ReadCookies(response);
            return response;
        }

        // Copy any cookies set by the response back into the shared container.
        private void ReadCookies(WebResponse r)
        {
            var response = r as HttpWebResponse;
            if (response != null)
            {
                CookieCollection cookies = response.Cookies;
                container.Add(cookies);
            }
        }
    }
}
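
Once the download actually contains the article paragraphs, reading their text is straightforward. The following is a minimal offline sketch, not part of the original answer: the class name ArticleBodyDemo, the method name ExtractParagraphs, and the shortened HTML string are made up for illustration. It uses the same XPath as the test above and handles the case where SelectNodes returns null.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of the original answer.
public static class ArticleBodyDemo
{
    // Returns the trimmed text of every <p itemprop="articleBody"> element.
    public static List<string> ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null when nothing matches, so check before iterating.
        HtmlNodeCollection nodes =
            doc.DocumentNode.SelectNodes("//p[@itemprop='articleBody']");
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText leaves HTML entities encoded, so decode them before trimming.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }

        return result;
    }

    public static void Main()
    {
        // Shortened stand-in for the sample HTML shown in the question.
        string html =
            "<p itemprop=\"articleBody\">\n    First paragraph.</p>" +
            "<p itemprop=\"articleBody\">\n    Second paragraph.        </p>";

        foreach (string paragraph in ExtractParagraphs(html))
        {
            Console.WriteLine(paragraph);
        }
    }
}
```

Returning an empty list instead of null keeps callers from needing their own null check, which is the situation the question describes.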

Thanks to another answer for the extended WebClient class.

Note

Scraping news stories from the website may violate the NYT terms of service.
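
The same cookie handling can probably be achieved without subclassing WebClient, which is marked obsolete in newer .NET versions. The sketch below is an assumption rather than something from the original answer: it uses HttpClient with an HttpClientHandler that carries a CookieContainer, and the class and method names are hypothetical. Whether the site still serves the article to such a client is not something this page confirms.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical alternative for illustration; not part of the original answer.
public static class CookieAwareFetch
{
    // Downloads the page with cookies enabled and selects the article paragraphs.
    // Returns null when no <p itemprop="articleBody"> element is present.
    public static async Task<HtmlNodeCollection> FetchArticleParagraphsAsync(string url)
    {
        var handler = new HttpClientHandler
        {
            // The handler stores cookies from responses (including redirects)
            // in this container and sends them back on later requests.
            CookieContainer = new CookieContainer(),
            UseCookies = true,
            AllowAutoRedirect = true
        };

        using (var client = new HttpClient(handler))
        {
            string html = await client.GetStringAsync(url);

            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            return doc.DocumentNode.SelectNodes("//p[@itemprop='articleBody']");
        }
    }

    public static void Main()
    {
        // Same URL as in the answer's unit test.
        HtmlNodeCollection nodes = FetchArticleParagraphsAsync(
            "http://www.nytimes.com/2013/12/10/world/asia/thailand-protests.html?partner=rss&emc=rss&_r=1&")
            .GetAwaiter().GetResult();

        Console.WriteLine(nodes == null
            ? "No articleBody paragraphs found"
            : nodes.Count + " articleBody paragraphs found");
    }
}
```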




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow