HtmlAgilityPack WebGet.Load throws the error "Object reference not set to an instance of an object"

c# html-agility-pack


I'm working on a project to collect new-vehicle pricing data from dealer websites. I can obtain the HTML for most of the sites, but one of them fails when I try to load it: calling webGet.Load(url) throws an "Object reference not set to an instance of an object" error. I could not identify any difference between this site and the ones that work.

Examples of URLs that work:

The website that fails:

I appreciate your support.

var webGet = new HtmlWeb();  
var document = webGet.Load("");

This link does not load the content when I use it.

5/10/2011 4:33:40 PM

Popular Answer

The real problem lies in the internals of HtmlAgilityPack (HAP). The page that fails declares this meta content type: <META http-equiv="Content-Type" content="text/html; charset=8859-9">, where charset=8859-9 appears to be invalid. The HAP internals try to resolve an encoding for this string using something like Encoding.GetEncoding("8859-9"), which throws an error (I believe the proper encoding name is iso-8859-9).
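
To check how a given charset name behaves on your own runtime, a minimal sketch like the following may help. EncodingProbe, TryGetEncoding and Describe are hypothetical names introduced here for illustration; they are not part of HAP. Which names resolve depends on the runtime: on newer .NET versions, code-page encodings such as iso-8859-9 may additionally require registering CodePagesEncodingProvider.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class EncodingProbe
{
    // Returns the encoding for the given name, or null when the runtime
    // does not recognize the name (Encoding.GetEncoding throws ArgumentException).
    public static Encoding TryGetEncoding(string name)
    {
        try
        {
            return Encoding.GetEncoding(name);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    // Produces a one-line summary that can be logged or printed.
    public static string Describe(string name)
    {
        Encoding encoding = TryGetEncoding(name);
        return encoding == null
            ? name + " -> not recognized"
            : name + " -> " + encoding.WebName;
    }
}
```

Printing EncodingProbe.Describe("8859-9") next to EncodingProbe.Describe("iso-8859-9") shows which of the two names your runtime accepts.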

HAP only needs to be told not to read the declared encoding for the HtmlDocument (just set HtmlDocument.OptionReadEncoding = false), but this seems to be impossible with HtmlWeb.Load (setting HtmlWeb.AutoDetectEncoding has no effect here). The simplest workaround is therefore to read the URL manually:

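using System;
using System.Net;
using System.Text;
using HtmlAgilityPack;

var document = new HtmlDocument();
// Do not let HAP read the charset declared in the page's META tag.
document.OptionReadEncoding = false;

var url =
   new Uri("");
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
    using (var stream = response.GetResponseStream())
        // Supply the encoding explicitly instead of relying on the page.
        document.Load(stream, Encoding.GetEncoding("iso-8859-9"));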

This correctly parses the page and works.

EDIT: @Simon Mourier: Yes, it raises a NullReferenceException, because it also catches the ArgumentException and sets _declaredencoding = null there. The _declaredencoding.WindowsCodePage line then throws the null reference error.

Below is a code block from the ReadDocumentEncoding method in HtmlDocument.cs:

try
{
    _declaredencoding = Encoding.GetEncoding(charset);
}
catch (ArgumentException)
{
    _declaredencoding = null;
}
if (_onlyDetectEncoding)
    throw new EncodingFoundException(_declaredencoding);

if (_streamencoding != null)
    // _declaredencoding is null for an unknown charset, so this line throws.
    if (_declaredencoding.WindowsCodePage != _streamencoding.WindowsCodePage)
        AddError(
            HtmlParseErrorCode.CharsetMismatch,
            _line, _lineposition,
            _index, node.OuterHtml,
            "Encoding mismatch between StreamEncoding: " +
            _streamencoding.WebName + " and DeclaredEncoding: " +
            _declaredencoding.WebName);
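
The failure pattern described in the edit above (an unknown charset swallowed into null and then dereferenced) can be reproduced outside HAP. The following is only a sketch under that assumption: DeclaredEncodingCheck and both method names are hypothetical, and the guarded variant illustrates one possible null check rather than how the library itself handles this case.

```csharp
using System;
using System.Text;

// Hypothetical stand-alone reproduction; not HtmlAgilityPack code.
public static class DeclaredEncodingCheck
{
    // Resolves the declared charset, turning an unknown name into null.
    private static Encoding Resolve(string charset)
    {
        try
        {
            return Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    // Same shape as the failing code: dereferences the result without a null check,
    // so an unknown charset leads to a NullReferenceException.
    public static bool MismatchUnguarded(string charset, Encoding streamEncoding)
    {
        Encoding declared = Resolve(charset);
        return declared.WindowsCodePage != streamEncoding.WindowsCodePage;
    }

    // Guarded variant: an unknown charset is treated as "no mismatch to report".
    public static bool MismatchGuarded(string charset, Encoding streamEncoding)
    {
        Encoding declared = Resolve(charset);
        return declared != null
            && declared.WindowsCodePage != streamEncoding.WindowsCodePage;
    }
}
```

Calling MismatchUnguarded with an unrecognized charset name throws, while MismatchGuarded simply returns false for the same input.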

Here is my stack trace as well:

System.NullReferenceException was unhandled
  Message=Object reference not set to an instance of an object.
       at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916
       at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805
       at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468
       at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769
       at HtmlAgilityPack.HtmlDocument.Load(Stream stream, Boolean detectEncodingFromByteOrderMarks) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 597
       at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515
       at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563
       at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1152
       at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107
       at test.console.Program.Main(String[] args) in W:\Projects\Me\test.console\test.console\Program.cs:line 54
       at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
       at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
       at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
       at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
       at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
       at System.Threading.ThreadHelper.ThreadStart()
5/12/2011 11:01:37 AM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow