I am trying to read the HTML source of an HTTPS URL in C# with the following code:
```csharp
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString("https://www.targetUrl.com");
```
This doesn't work for me: the string I get back looks encoded (garbled) instead of being readable HTML. I also tried HtmlAgilityPack, without success:
```csharp
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlString);
```
That URL is returning a gzip-compressed response. WebClient doesn't decompress this by default, so you'll want to drop down to the underlying HttpWebRequest class instead: subclass WebClient, override GetWebRequest and turn on AutomaticDecompression. Blatant rip-off of the answer by feroze over here: Automatically decompress gzip response via WebClient.DownloadData
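```csharp
using System;
using System.Net;

// WebClient subclass that tells the underlying HttpWebRequest
// to transparently decompress gzip/deflate responses.
class MyWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests support AutomaticDecompression; other schemes pass through.
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return request;
    }
}
```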
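```csharp
// Accepts every server certificate, even invalid ones: use only for testing.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new MyWebClient(); // the subclass, not a plain WebClient
string htmlString = webClient.DownloadString(url);
```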
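If you'd rather keep a plain WebClient, or just want to confirm that the "encoded" string really is gzip, a minimal sketch is to fetch the raw bytes with DownloadData and decompress them yourself with GZipStream. It relies on every gzip stream starting with the magic bytes 0x1F 0x8B (RFC 1952) and assumes the page is UTF-8 text; GzipHelper, IsGzip and DecodeBody are hypothetical names for illustration, not part of any library.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: decode a response body that may or may not be gzip-compressed.
// Assumes the (decompressed) body is UTF-8 text.
public static class GzipHelper
{
    // Every gzip stream begins with the magic bytes 0x1F 0x8B (RFC 1952).
    public static bool IsGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    public static string DecodeBody(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // Not compressed: interpret the bytes directly as UTF-8.
        if (!IsGzip(data))
            return Encoding.UTF8.GetString(data);

        // Compressed: inflate through GZipStream, then read the text out.
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage would then be something like `string htmlString = GzipHelper.DecodeBody(webClient.DownloadData(url));`. Checking the magic bytes instead of the Content-Encoding header keeps the sketch independent of what the server claims, at the cost of handling only gzip (not deflate).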
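On .NET 6 and later, WebClient and HttpWebRequest are marked obsolete in favour of HttpClient. A minimal sketch of the same fix with HttpClient, assuming a modern .NET target: the decompression switch moves onto HttpClientHandler.AutomaticDecompression. HtmlFetcher, CreateHandler and FetchHtmlAsync are hypothetical names for illustration, not library APIs.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: the HttpClient equivalent of MyWebClient.
public static class HtmlFetcher
{
    // The handler advertises gzip/deflate in Accept-Encoding and
    // transparently decompresses matching responses.
    public static HttpClientHandler CreateHandler()
    {
        return new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
    }

    // Downloads the page and returns the already-decompressed HTML text.
    public static async Task<string> FetchHtmlAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Create one client and reuse it across requests, e.g. `var client = new HttpClient(HtmlFetcher.CreateHandler());` followed by `string htmlString = await HtmlFetcher.FetchHtmlAsync(client, "https://www.targetUrl.com");`.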