HtmlAgilityPack after login

c# facebook html-agility-pack httprequest

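I want to parse the HTML of sites like Facebook,
for example (www.facebook.com/somePage).
If I paste this link into my browser, it first redirects me to log in to my account, so I never see that page. That means I can't use HtmlAgilityPack to get the response.
So how can I log in to the site programmatically first (without using the WebBrowser control), then request that Facebook page, get the response, and parse it with HtmlAgilityPack? I know how to use HtmlAgilityPack, and I know how to set cookies with HttpWebRequest. I use the following code to set the cookies, but after that, how do I parse somePage?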

    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Windows.Forms;
    using HtmlAgilityPack;

    // Inside an event handler of the form (listBox1 is a ListBox on the form).
    // One CookieContainer is shared by all requests, so cookies set by one
    // response are sent automatically with the next request.
    CookieContainer cookies = new CookieContainer();
    try
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.facebook.com");
        request.CookieContainer = cookies;
        // Get the response from the server; its cookies are stored in 'cookies'.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // Nothing to read here; we only want the cookies.
        }
    }
    catch (WebException ex)
    {
        MessageBox.Show("error: " + ex.Message);
    }

    string loginUrl = "https://www.facebook.com/login.php?login_attempt=1";
    // URL-encode the values so '@', '&' or '=' cannot break the form body.
    string postData = String.Format("email={0}&pass={1}",
        Uri.EscapeDataString("xxxx@hotmail.com"), Uri.EscapeDataString("xxxxx"));
    HttpWebRequest loginRequest = (HttpWebRequest)WebRequest.Create(loginUrl);
    loginRequest.CookieContainer = cookies; // reuse the cookies from the first request
    loginRequest.Method = WebRequestMethods.Http.Post;
    loginRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
    loginRequest.AllowWriteStreamBuffering = true;
    loginRequest.ProtocolVersion = HttpVersion.Version11;
    loginRequest.AllowAutoRedirect = true;
    loginRequest.ContentType = "application/x-www-form-urlencoded";

    byte[] byteArray = Encoding.ASCII.GetBytes(postData);
    loginRequest.ContentLength = byteArray.Length;
    using (Stream newStream = loginRequest.GetRequestStream()) // open connection
    {
        newStream.Write(byteArray, 0, byteArray.Length); // send the data
    }

    // How do I parse (www.facebook.com/somePage) here?
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    using (HttpWebResponse loginResponse = (HttpWebResponse)loginRequest.GetResponse())
    // Facebook serves UTF-8; windows-1251 is a Cyrillic code page and garbles other text.
    using (StreamReader sr = new StreamReader(loginResponse.GetResponseStream(), Encoding.UTF8))
    {
        doc.LoadHtml(sr.ReadToEnd());
    }

    // SelectNodes returns null, not an empty collection, when nothing matches.
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode link in links)
        {
            listBox1.Items.Add(link.InnerHtml);
        }
    }
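One way to fill in the step marked by the "How do I parse somePage here?" comment: keep a single CookieContainer for the whole session, POST the login form with it, then issue an ordinary GET for somePage with the same container, so whatever session cookies the login response set are sent along automatically. This is a minimal sketch, assuming the login POST actually succeeds (the sketch does not check that). The class name FacebookSession and its members are hypothetical, not part of any library.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: every request it creates shares one CookieContainer,
// so cookies set by the login response are sent with later GET requests.
public class FacebookSession
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    // Creates a request wired to the shared cookie jar (no network I/O yet).
    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = Cookies;
        request.AllowAutoRedirect = true;
        return request;
    }

    // Form-encodes the login fields so characters like '@' or '&' survive.
    public static string BuildLoginBody(string email, string pass) =>
        "email=" + Uri.EscapeDataString(email) + "&pass=" + Uri.EscapeDataString(pass);

    // POSTs the credentials; cookies from the response end up in Cookies.
    public void Login(string loginUrl, string email, string pass)
    {
        byte[] body = Encoding.ASCII.GetBytes(BuildLoginBody(email, pass));
        HttpWebRequest request = CreateRequest(loginUrl);
        request.Method = WebRequestMethods.Http.Post;
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        using (Stream s = request.GetRequestStream())
        {
            s.Write(body, 0, body.Length);
        }
        using (request.GetResponse()) { } // body is ignored; only the cookies matter
    }

    // GETs a page with the session cookies and returns the raw HTML.
    public string GetHtml(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage: create one FacebookSession, call Login with the login.php URL and your credentials, then pass the result of GetHtml("https://www.facebook.com/somePage") to doc.LoadHtml. Keeping the container on the object, rather than copying cookies between containers by hand, is what ties the requests into one session.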

Accepted answer

Call the GetResponse method on your HttpWebRequest. It returns a WebResponse object, on which you can call GetResponseStream() to read the content.
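In code, that step looks roughly like the sketch below. It also uses the charset the server reports (HttpWebResponse.CharacterSet) instead of hard-coding an encoding, and falls back to UTF-8 when the charset is missing or not recognized. The class and method names are illustrative only.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class ResponseReader
{
    // Reads a body stream with the given charset name, or UTF-8 if the
    // name is empty or not recognized by Encoding.GetEncoding.
    public static string ReadBody(Stream body, string charset)
    {
        Encoding encoding = Encoding.UTF8;
        if (!string.IsNullOrWhiteSpace(charset))
        {
            try
            {
                encoding = Encoding.GetEncoding(charset.Trim('"', ' '));
            }
            catch (ArgumentException)
            {
                // Unknown charset name: keep the UTF-8 fallback.
            }
        }
        using (var reader = new StreamReader(body, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // GetResponse() performs the request; GetResponseStream() exposes the body.
    public static string Fetch(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            return ReadBody(body, response.CharacterSet);
        }
    }
}
```

The string returned by Fetch can go straight into HtmlAgilityPack's HtmlDocument.LoadHtml.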

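Because Facebook returns very little actual HTML (it sends a large amount of JavaScript that the browser runs to build the document), HtmlAgilityPack won't really help you. It will download the JavaScript but cannot execute it, so you end up with a document that is hard to make sense of.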

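Other packages, such as Awesomium or PhantomJS, can actually execute the JavaScript and give you back the resulting HTML DOM document. They don't require you to run a full browser; both can run headless (the term for running a browser without a UI).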

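Alternatively, use the Facebook Graph API to access data on Facebook without parsing HTML at all. It is more stable, and it was built for exactly this purpose: interacting with the data on Facebook.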
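A minimal sketch of the Graph API route using HttpClient. The API version segment, the object id, the list of fields, and the access token are all placeholders you would supply yourself; which fields you can read depends on the permissions granted to the token. The response is JSON, so no HTML parsing is involved. GraphApiExample and its methods are hypothetical names.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class GraphApiExample
{
    // Builds https://graph.facebook.com/{version}/{id}?fields=...&access_token=...
    // Every argument is a caller-supplied placeholder.
    public static Uri BuildUrl(string version, string id, string fields, string accessToken)
    {
        string url = "https://graph.facebook.com/" + version + "/" + Uri.EscapeDataString(id)
            + "?fields=" + Uri.EscapeDataString(fields)
            + "&access_token=" + Uri.EscapeDataString(accessToken);
        return new Uri(url);
    }

    // Returns the raw JSON body; parse it with the JSON library of your choice.
    public static async Task<string> GetJsonAsync(HttpClient client, Uri url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Check the Graph API documentation for the current version string and the fields available on the object you are querying.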

License: CC-BY-SA with attribution
Not affiliated with Stack Overflow