How to pass a password when using HtmlAgilityPack

c# html-agility-pack web-scraping

Question

Using HtmlAgilityPack, I'm attempting to read certain XML files from a website. I'm using the following code:

HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("http://example.com/index.asp");

The website requests a password, which I have been given, but I'm not sure how to enter it in order to access the index.asp page, where I can view the page's XML links.

This is how example.com/index.asp appears:

 <form action="index.asp" method="post">
 <table>
     <tbody>
         <tr>
            <td>
                <input type="Text" name="password" value="" size="20"> 
            </td>
         </tr>
     </tbody>
 </table>
 </form>

How can I use HtmlAgilityPack to send the password to this page? I came across an example using HtmlWeb.PreRequest, but I don't fully understand how it works. I can see that there are seven overloads for HtmlWeb.Load, but I'm not sure where to pass the variable that contains the password.

doc = web.Load( "http://example.com/index.asp", "passwordVariable" ) ;

I would really appreciate it if someone could point me in the direction of the proper research avenue.

Many thanks

9/28/2018 3:17:21 PM

Accepted Answer

As I understand it, you are trying to submit (POST) the password form and then access a page that is otherwise restricted. Web site security varies greatly, and the owner may be deliberately trying to block this kind of automated access.

For a simple site that tracks the login with cookies, you can simulate what a browser does: request the login page, send a POST with the correct credentials (and any hidden fields the form requires), capture the cookies the server sets, and then request the page you want using those cookies.

    using System;
    using System.Net;
    using System.Text;

    public class SiteSession
    {
        // Placeholder base address taken from the question; replace it with the real site.
        private const string Host = "http://example.com/";

        // Kept on the instance so the cookies captured at login can be reused by later requests.
        public CookieContainer SharedCookie { get; } = new CookieContainer();

        public bool IsLoggedIn { get; private set; }

        private HttpWebRequest CreateRequest(string url, string method)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Referer = Host;
            request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36";
            request.Method = method;
            request.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";

            return request;
        }

        public void Login()
        {
            // WebRequest.Create needs an absolute URL, so prefix the page with the host.
            var url = Host + "index.asp";

            try
            {
                // Start the session: the GET lets the server set its initial cookies.
                var request = CreateRequest(url, "GET");
                request.CookieContainer = SharedCookie;

                using (var tmpResponse = request.GetResponse())
                {
                    //WriteResponse(tmpResponse);
                }

                // Log in: POST the form field named "password" (the value is URL-encoded).
                var data = "password=" + Uri.EscapeDataString("123456");
                var bytes = Encoding.UTF8.GetBytes(data);

                request = CreateRequest(url, "POST");
                request.CookieContainer = SharedCookie;

                using (var stream = request.GetRequestStream())
                {
                    stream.Write(bytes, 0, bytes.Length);
                }

                using (var tmpResponse = request.GetResponse())
                {
                    //WriteResponse(tmpResponse);
                }

                // Only means no exception was thrown; inspect the response to confirm the login worked.
                IsLoggedIn = true;
            }
            catch (System.Net.WebException ex)
            {
                Console.WriteLine("Web Error:" + ex.Status);
                Console.WriteLine("Url:" + url);
                Console.WriteLine(ex.Message);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Url:" + url);
                Console.WriteLine(ex.Message);
            }
        }
    }
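
The code above stops at the login. The last step described earlier, requesting the page you want with the captured cookies, could look roughly like the sketch below. It assumes the CookieContainer that was sent with the login POST is kept around (for example as a field) and passed in. The class name ProtectedPage and the methods Download and ExtractLinks are made up for illustration and are not part of HtmlAgilityPack. Only HtmlDocument.LoadHtml, SelectNodes and GetAttributeValue come from the library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using HtmlAgilityPack;

public static class ProtectedPage
{
    // Downloads a page, sending the cookies that were captured during login.
    // "cookies" is assumed to be the same container used for the login POST.
    public static string Download(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.CookieContainer = cookies;

        using (var response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Parses the HTML with HtmlAgilityPack and returns the href of every link.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (var anchor in anchors)
            {
                links.Add(anchor.GetAttributeValue("href", string.Empty));
            }
        }

        return links;
    }
}
```

After a successful login you would call something like ProtectedPage.Download("http://example.com/index.asp", cookies) and pass the result to ExtractLinks. Downloading with HttpWebRequest and handing the text to LoadHtml keeps the cookie handling in one place, rather than relying on HtmlWeb to carry the session.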
9/28/2018 4:01:17 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow