HtmlAgilityPack after login

c# facebook html-agility-pack httprequest

Question

I want to parse an HTML page on a site like Facebook, for example www.facebook.com/somePage. If I paste that link into my browser, it first redirects me to the login page, so I cannot see the page, and HtmlAgilityPack cannot get a response for it either.

So how can I log in to the site programmatically (without using a WebBrowser control), then request that Facebook page, get the response, and parse it with HtmlAgilityPack? I know how to use HtmlAgilityPack and how to set cookies with HttpWebRequest. I use the following code to set the cookies, but after that, how can I parse that somePage?

    CookieCollection cookies = new CookieCollection();
    try
    {
        // First request: fetch the Facebook home page and save the cookies it sets.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.facebook.com");
        request.CookieContainer = new CookieContainer();
        request.CookieContainer.Add(cookies);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        cookies = response.Cookies;
        response.Close();
    }
    catch (WebException)
    {
        MessageBox.Show("error");
    }

    // Second request: POST the login form, sending the cookies from the first request.
    string getUrl = "https://www.facebook.com/login.php?login_attempt=1";
    string postData = String.Format("email={0}&pass={1}", "xxxx@hotmail.com", "xxxxx");
    HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
    getRequest.CookieContainer = new CookieContainer();
    getRequest.CookieContainer.Add(cookies); // reuse the cookies from the first request
    getRequest.Method = WebRequestMethods.Http.Post;
    getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
    getRequest.AllowWriteStreamBuffering = true;
    getRequest.ProtocolVersion = HttpVersion.Version11;
    getRequest.AllowAutoRedirect = true;
    getRequest.ContentType = "application/x-www-form-urlencoded";

    byte[] byteArray = Encoding.ASCII.GetBytes(postData);
    getRequest.ContentLength = byteArray.Length;
    Stream newStream = getRequest.GetRequestStream(); // open the connection
    newStream.Write(byteArray, 0, byteArray.Length);  // send the form data
    newStream.Close();

    // How do I parse www.facebook.com/somePage here?
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
    using (StreamReader sr = new StreamReader(getResponse.GetResponseStream(), Encoding.GetEncoding("windows-1251")))
    {
        doc.LoadHtml(sr.ReadToEnd());
    }

    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        listBox1.Items.Add(link.InnerHtml);
    }

Accepted Answer

On your HttpWebRequest, call the GetResponse method. This yields a WebResponse object on which you can call GetResponseStream() to get at the contents.
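For the follow-up request to somePage, a minimal sketch along those lines might look like the following. It assumes the login POST from the question actually succeeded, reuses that request's CookieContainer (the container is updated with the session cookies when GetResponse runs), and uses placeholder names like pageRequest and the somePage URL purely for illustration:

    // Sketch: reuse one CookieContainer for the login POST and the follow-up GET,
    // then feed the response HTML to HtmlAgilityPack.
    CookieContainer session = getRequest.CookieContainer;  // already holds the login cookies

    HttpWebRequest pageRequest = (HttpWebRequest)WebRequest.Create("https://www.facebook.com/somePage");
    pageRequest.CookieContainer = session;                 // send the session cookies
    pageRequest.UserAgent = getRequest.UserAgent;          // keep the same user agent

    HtmlAgilityPack.HtmlDocument pageDoc = new HtmlAgilityPack.HtmlDocument();
    using (HttpWebResponse pageResponse = (HttpWebResponse)pageRequest.GetResponse())
    using (StreamReader reader = new StreamReader(pageResponse.GetResponseStream(), Encoding.UTF8))
    {
        pageDoc.LoadHtml(reader.ReadToEnd());
    }

    // The usual HtmlAgilityPack queries work from here.
    // Note that SelectNodes returns null when nothing matches.
    var links = pageDoc.DocumentNode.SelectNodes("//a[@href]");

That said, what you get back is subject to the caveat below.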

Since Facebook returns almost no HTML content (it sends a load of JavaScript for the browser to build the document from), HtmlAgilityPack won't really help you. It will download the JavaScript, but it can't execute it, so you're stuck with a document that's hard to interpret.

Other packages, like Awesomium or PhantomJS, can actually execute the JavaScript and hand you the interpreted HTML DOM document. They don't require you to run a full browser; both can run headless (that is, a browser running without a UI on top of it).
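If you go the headless route, one rough way to drive PhantomJS from C# is to run it as an external process and capture the rendered HTML from standard output. The phantomjs executable being on the PATH and the dump.js script (which you would write separately so that it loads the URL it is given and prints page.content) are assumptions here, not part of the original answer:

    // Rough sketch: run PhantomJS as an external process and capture the HTML it renders.
    // Assumes phantomjs is on the PATH and dump.js is a small PhantomJS script that
    // loads the URL passed as an argument and prints the rendered page to stdout.
    var psi = new System.Diagnostics.ProcessStartInfo
    {
        FileName = "phantomjs",
        Arguments = "dump.js https://www.facebook.com/somePage",
        RedirectStandardOutput = true,
        UseShellExecute = false,
        CreateNoWindow = true
    };

    string renderedHtml;
    using (var phantom = System.Diagnostics.Process.Start(psi))
    {
        renderedHtml = phantom.StandardOutput.ReadToEnd();
        phantom.WaitForExit();
    }

    // The rendered HTML can then be handed to HtmlAgilityPack as usual.
    HtmlAgilityPack.HtmlDocument renderedDoc = new HtmlAgilityPack.HtmlDocument();
    renderedDoc.LoadHtml(renderedHtml);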

Alternatively, use the Facebook Graph API to access the data on Facebook without parsing HTML at all; it is much more stable and built for exactly this purpose: interacting with Facebook's data.
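A hedged sketch of the Graph API route, assuming you already have a valid access token; the token, the "somePage" name, and which fields come back all depend on your app's permissions:

    // Sketch of reading a page through the Graph API instead of scraping HTML.
    // "somePage" and the access token are placeholders.
    string accessToken = "YOUR_ACCESS_TOKEN";
    string graphUrl = "https://graph.facebook.com/somePage?access_token=" + accessToken;

    HttpWebRequest graphRequest = (HttpWebRequest)WebRequest.Create(graphUrl);
    using (HttpWebResponse graphResponse = (HttpWebResponse)graphRequest.GetResponse())
    using (StreamReader reader = new StreamReader(graphResponse.GetResponseStream(), Encoding.UTF8))
    {
        string json = reader.ReadToEnd(); // JSON describing the page, not HTML
        // Parse the JSON with your preferred library (e.g. Json.NET) instead of HtmlAgilityPack.
    }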




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow