How to read HTML source from an HTTPS URL

.net c# html html-agility-pack

Question

I am trying to read the HTML source of an HTTPS URL in C# with the following code:

WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString("https://www.targetUrl.com");


This doesn't work for me: the string I get back looks encoded instead of being readable HTML. I tried loading it with HtmlAgilityPack, but that did not help.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);

Accepted Answer

That URL is returning a gzip-compressed response. WebClient doesn't decompress this by default, so you'll want to go down to the underlying HttpWebRequest class and enable AutomaticDecompression there. Blatant rip-off of the answer by feroze in "Automatically decompress gzip response via WebClient.DownloadData":

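using System;
using System.Net;

class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests expose AutomaticDecompression; other request types are returned unchanged.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            // Advertise gzip/deflate support and let the framework decompress the response body.
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}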
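
To see why the downloaded string looked encoded, here is a minimal offline sketch, assuming the garbage really was gzip bytes decoded as text. GzipDemo, Compress and Decompress are illustrative names that are not part of the original answer; the sketch only uses GZipStream from System.IO.Compression to imitate what AutomaticDecompression does for you.

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipDemo
{
    // Compress text roughly the way a server answering with Content-Encoding: gzip would.
    public static byte[] Compress(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    // Undo the compression and decode the result as UTF-8 text.
    public static string Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}

If the bytes you receive start with 0x1F 0x8B (the gzip signature), decoding them directly as a string will produce unreadable text, which is likely what the question describes. Passing them through something like Decompress, or enabling AutomaticDecompression, should give the HTML back.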

Popular Answer

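// Accepts every server certificate, which switches off HTTPS validation for all requests in the application; use only against hosts you trust, e.g. while testing.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString(url);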
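
The callback above returns true for every certificate. If the real problem is one known self-signed or otherwise untrusted certificate, a narrower rule may be safer. The following is only a sketch under that assumption: CertificatePolicy, IsAcceptable, Install and the ExpectedThumbprint placeholder are hypothetical names, and the thumbprint value must be replaced with the one you actually expect.

using System;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

static class CertificatePolicy
{
    // Hypothetical placeholder: the hash string of the one certificate you are willing to accept.
    public const string ExpectedThumbprint = "REPLACE_WITH_EXPECTED_THUMBPRINT";

    // Pure decision function, so the rule can be checked without making a request.
    public static bool IsAcceptable(SslPolicyErrors errors, string thumbprint)
    {
        // Certificates that pass normal validation are always accepted.
        if (errors == SslPolicyErrors.None)
        {
            return true;
        }
        // Otherwise accept only the single certificate that was pinned explicitly.
        return string.Equals(thumbprint, ExpectedThumbprint, StringComparison.OrdinalIgnoreCase);
    }

    // Registers the rule for all requests made through ServicePointManager.
    public static void Install()
    {
        ServicePointManager.ServerCertificateValidationCallback =
            (sender, certificate, chain, errors) =>
                IsAcceptable(errors, certificate == null ? null : certificate.GetCertHashString());
    }
}

Call CertificatePolicy.Install() once before creating the WebClient. Unlike the unconditional delegate, this still rejects unexpected certificates that fail validation.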



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why