WebClient GodLikeClient = new WebClient();
HtmlAgilityPack.HtmlDocument GodLikeHTML = new HtmlAgilityPack.HtmlDocument();
GodLikeHTML.Load(GodLikeClient.OpenRead("http://www.alfa.lt"));
So, this code returns the page title with mangled characters, for example "Naujien...3 portalas Alfa.lt", instead of the proper Lithuanian title that begins "Skaitytojo klausimas psichologui: kas lemia ...".
The website is encoded in 1257 (Baltic), but
textBox1.Text = GodLikeHTML.DocumentNode.OuterHtml;
returns garbled text, with the Baltic diacritics turned into strange, long strings of characters :(
I have tried the HtmlAgilityPack forums, too. They are awful.
P.S. I am not a coder, but I am working on a community project and need to get this code working. Thank you.
Actually, the page is encoded in UTF-8.
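One way to act on that is to pass the encoding explicitly when loading the document. The following is only a minimal sketch: it assumes the `Load(Stream, Encoding)` overload of `HtmlDocument` and writes the result to the console instead of the question's `textBox1`.

```csharp
using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        using (var client = new WebClient())
        using (var stream = client.OpenRead("http://www.alfa.lt"))
        {
            var doc = new HtmlDocument();
            // Decode the response bytes as UTF-8 instead of relying on auto-detection.
            doc.Load(stream, Encoding.UTF8);
            Console.WriteLine(doc.DocumentNode.OuterHtml);
        }
    }
}
```

In a Windows Forms project, keep the fully qualified `HtmlAgilityPack.HtmlDocument` name as in the question, since `System.Windows.Forms` has its own `HtmlDocument` type.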
Or you could use the code from my SO answer, which detects the encoding from the HTTP headers or meta tags and re-encodes the page correctly. (It also supports gzip to cut down on download size.)
With the download class, your code would look like this:
HttpDownloader downloader = new HttpDownloader("http://www.alfa.lt", null, null);
GodLikeHTML.LoadHtml(downloader.GetPage());
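For readers who want to see the general idea, picking the encoding from the HTTP headers and enabling gzip can be sketched roughly as below. This is not the `HttpDownloader` class from the linked answer: `PageFetcher`, `Fetch`, and `ResolveEncoding` are hypothetical names, meta-tag detection is left out, and falling back to UTF-8 is an assumption.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for illustration only.
static class PageFetcher
{
    public static string Fetch(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Request compressed content and let the framework decompress it.
        request.AutomaticDecompression =
            DecompressionMethods.GZip | DecompressionMethods.Deflate;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        {
            // CharacterSet comes from the Content-Type response header.
            Encoding encoding = ResolveEncoding(response.CharacterSet);
            using (var reader = new StreamReader(stream, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }

    // Maps a charset name to an Encoding, falling back to UTF-8 (assumed default).
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            return Encoding.GetEncoding(charset.Trim('"', ' '));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return Encoding.UTF8;
        }
    }
}
```

The returned string could then be handed to `GodLikeHTML.LoadHtml(...)`, just like the result of `downloader.GetPage()` above. Note that a server may report a charset that does not match the actual bytes, which is why a fuller implementation would also check the meta tags.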
I ran into similar encoding issues. With the most recent version of HTML Agility Pack, I was able to fix them by adding the following to my web client setup:
var htmlWeb = new HtmlWeb();
htmlWeb.OverrideEncoding = Encoding.UTF8;
var doc = htmlWeb.Load("http://www.alfa.lt");