How to pass a password when using HtmlAgilityPack

c# html-agility-pack web-scraping

Question

I am trying to read the XML files of a website using HtmlAgilityPack. This is the code I am using:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://example.com/index.asp");

The page asks for a password, which the site owners have provided, but I'm not sure how to pass it in order to get to the index.asp page, where I will read the XML links.

The example.com/index.asp page looks like this:

 <form action="index.asp" method="post">
 <table>
     <tbody>
         <tr>
            <td>
                <input type="Text" name="password" value="" size="20"> 
            </td>
         </tr>
     </tbody>
 </table>
 </form>

How do I pass the password to this page from HtmlAgilityPack? I saw an example here that uses 'HtmlWeb.PreRequest' but I don't really understand too much about the process. I see that HtmlWeb.Load has 7 overloads but I don't know where to put my variable that holds the password.

doc = web.Load( "http://example.com/index.asp", "passwordVariable" ) ;

If someone could direct me to the right path to research I would really appreciate it.

Thank you

Accepted Answer

I think what you are looking for is to POST to this page and then access another page that is protected. Security for web pages varies dramatically, and the owner may be actively trying to prevent such programmatic access.

For a simple site that uses cookie-based security, you can mimic what a browser does: request the login page, POST the proper credentials (and any hidden fields that may be required), capture the cookies that were created, and then browse to the page you want to visit with those cookies.
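One detail of the POST step is that the field values have to be URL-encoded, which matters as soon as the password or a hidden field contains characters such as `&` or `@`. A minimal sketch of that encoding, assuming a hypothetical helper named `BuildFormBody` (the class and method names are illustrative only):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoding
{
    // Hypothetical helper: joins name/value pairs into an
    // application/x-www-form-urlencoded request body.
    // Values must be non-null; Uri.EscapeDataString throws on null.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Passing the pairs `password` = `p@ss word&1` and `token` = `abc` should yield `password=p%40ss%20word%261&token=abc`. The result can then be converted with `Encoding.UTF8.GetBytes` and written to the request stream.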

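    // Assumed members of the containing class (not shown in the original snippet).
    // Requires: using System; using System.Net; using System.Text;
    private const string Host = "http://example.com/";
    public bool IsLoggedIn { get; private set; }

    private HttpWebRequest CreateRequest(string url, string method)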
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Referer = Host;
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";
        request.Method = method;
        request.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";

        return request;
    }

    public void Login()
    {
        byte[] bytes;
        string data;
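        // Note: this container is local, so promote it to a field if later
        // requests need to reuse the session cookies.
        var SharedCookie = new CookieContainer();

        // WebRequest.Create needs an absolute URI, so resolve the page against Host.
        var url = new Uri(new Uri(Host), "index.asp").ToString();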

        try
        {
            //Start Session
            var request = CreateRequest(url, "GET");
            request.CookieContainer = SharedCookie;

            using (var tmpResponse = request.GetResponse())
            {
                //WriteResponse(tmpResponse);
                tmpResponse.Close();
            }

            //Login: the field name must match the form's <input name="password">
            data = "password=123456";
            bytes = Encoding.UTF8.GetBytes(data);

            request = CreateRequest(url, "POST");
            request.CookieContainer = SharedCookie;

            using (var stream = request.GetRequestStream())
            {
                stream.Write(bytes, 0, bytes.Length);
            }

            using (var tmpResponse = request.GetResponse())
            {
                //WriteResponse(tmpResponse);
                tmpResponse.Close();
            }
            IsLoggedIn = true;
        }
        catch (System.Net.WebException ex)
        {
            Console.WriteLine("Web Error:" + ex.Status);
            Console.WriteLine("Url:" + url);
            Console.WriteLine(ex.Message);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Url:" + url);
            Console.WriteLine(ex.Message);
        }
    }
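
Once the login has succeeded, the last step is to request the protected page with the same cookies and hand the response to HtmlAgilityPack. The following is a rough sketch rather than a definitive implementation: `ProtectedPage`, `Load`, and `ExtractXmlLinks` are hypothetical names, it assumes the `CookieContainer` used in `Login` has been made available to the caller (for example by storing it in a field), and it assumes the XML links are ordinary anchors whose `href` ends in `.xml`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using HtmlAgilityPack;

public static class ProtectedPage
{
    // Hypothetical helper: fetches a page using the session cookies from the
    // login step and parses the response with HtmlAgilityPack.
    public static HtmlDocument Load(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies; // same container used for the login POST

        using (var response = request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            var doc = new HtmlDocument();
            doc.Load(stream);
            return doc;
        }
    }

    // Collects the href of every anchor that ends in ".xml".
    // That suffix is an assumption about how the site marks its XML links.
    public static List<string> ExtractXmlLinks(HtmlDocument doc)
    {
        var links = new List<string>();
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links; // SelectNodes returns null when nothing matches

        foreach (var anchor in anchors)
        {
            var href = anchor.GetAttributeValue("href", "");
            if (href.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                links.Add(href);
        }
        return links;
    }
}
```

A call such as `ProtectedPage.ExtractXmlLinks(ProtectedPage.Load("http://example.com/index.asp", cookies))` would then return the link targets, provided the site really does rely only on the session cookie.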



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow