Why is HtmlDocument.DocumentNode null in HTML Agility Pack?

asp.net c# html-agility-pack


I'm using this code to change the href attribute of every link in an HTML stream.

First, I download the full HTML page with this code (URL holds the page address):

HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();

Stream s = myHttpWebResponse.GetResponseStream();

Then I process it:

HtmlDocument doc = new HtmlDocument();
doc.Load(s);

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}

s is the HTML stream from above.

But I get an exception saying that doc.DocumentNode is null!

I tried many sites, but doc.DocumentNode is null for every one of them.

3/3/2012 3:57:14 PM

Accepted Answer

This works for me.

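// Needs: using System.IO; using System.Linq; using System.Net; using System.Web;
using (WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString(URL));   // load the page before querying it
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue;              // skip <a> tags without an href
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }

    StringWriter writer = new StringWriter();
    doc.Save(writer);                            // write the modified document back out
    var finalHtml = writer.ToString();
}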

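Note the HttpUtility.UrlEncode call: it lets the proxy page get the original URL back intact. Without it, characters in the original URL such as & or ? are read as part of the proxy page's own query string, and some of the original parameters are lost.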

Use HttpUtility.UrlDecode to decode it.
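A minimal sketch of why the encoding matters, assuming a proxy page that reads its url parameter by splitting the query string on &. It uses System.Net.WebUtility.UrlEncode/UrlDecode in place of HttpUtility so that it runs without a reference to System.Web; the ProxyUrl class, its Wrap and Unwrap helpers, and the example.com address are hypothetical.

```csharp
using System;
using System.Net;

public static class ProxyUrl
{
    // Prefix taken from the answer above.
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Wraps an original link so it travels as a single "url" query parameter.
    public static string Wrap(string original) => Prefix + WebUtility.UrlEncode(original);

    // Hypothetical, deliberately naive server-side parsing: split the query on '&',
    // take the "url" parameter and decode it.
    public static string Unwrap(string proxied)
    {
        string query = proxied.Substring(proxied.IndexOf('?') + 1);
        foreach (string pair in query.Split('&'))
        {
            if (pair.StartsWith("url=", StringComparison.Ordinal))
                return WebUtility.UrlDecode(pair.Substring("url=".Length));
        }
        return null;
    }

    public static void Main()
    {
        string original = "http://example.com/page?a=1&b=2";

        // Encoded: the whole original URL survives the round trip.
        Console.WriteLine(Unwrap(Wrap(original)));   // prints http://example.com/page?a=1&b=2

        // Not encoded: the '&' splits the original URL, so "b=2" is lost.
        Console.WriteLine(Unwrap(Prefix + original)); // prints http://example.com/page?a=1
    }
}
```

The second line shows the failure mode: without encoding, everything after the first & in the original URL is read as a separate parameter of the proxy page.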

3/7/2012 9:53:40 AM

Popular Answer

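The XPath expression for the anchor tags is wrong:

...doc.DocumentNode.SelectNodes("/a")    //incorrect: only <a> elements directly under the document root
...doc.DocumentNode.SelectNodes("//a")   //correct: <a> elements anywhere in the document
...doc.DocumentNode.SelectNodes(@"//a")  //also correct; the @ prefix makes no difference here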
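The difference between the two expressions can be checked with a small, self-contained sketch. It uses System.Xml's XmlDocument as a stand-in for HtmlAgilityPack (whose SelectNodes also runs on the .NET XPath engine), with one caveat: XmlDocument.SelectNodes returns an empty list when nothing matches, whereas HtmlAgilityPack's SelectNodes returns null. The XPathDemo class and the sample markup are hypothetical.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Counts the nodes an XPath expression selects in a small, well-formed sample page.
    public static int Count(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<html><body><a href='x'>one</a><p><a href='y'>two</a></p></body></html>");
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // "/a": an <a> that is a direct child of the document root; here the root's child is <html>.
        Console.WriteLine(Count("/a"));   // prints 0
        // "//a": every <a> anywhere in the document.
        Console.WriteLine(Count("//a"));  // prints 2
        // @"//a": the verbatim prefix changes nothing, since the string has no backslashes.
        Console.WriteLine(Count(@"//a")); // prints 2
    }
}
```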

With "/a", SelectNodes matches nothing and returns null rather than an empty collection, so the foreach throws a NullReferenceException; DocumentNode itself is not null. Check the result for null to avoid failing on, say, a document where there are no links at all (however unlikely that is :)

var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
    foreach (HtmlNode link in anchors)
    {
        /*do stuff*/
    }
}
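If the null check is easy to forget, one option is a small extension method that turns a null sequence into an empty one. This is a sketch, not part of HtmlAgilityPack: OrEmpty is a hypothetical helper, and a null List<string> stands in for the result of SelectNodes on a page with no links.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    // Hypothetical helper: a null sequence becomes an empty one, so a foreach over a
    // query that may return null (as SelectNodes does when nothing matches) runs zero
    // times instead of throwing a NullReferenceException.
    public static IEnumerable<T> OrEmpty<T>(this IEnumerable<T> source) =>
        source ?? Enumerable.Empty<T>();
}

public static class Demo
{
    public static void Main()
    {
        // Stand-in for doc.DocumentNode.SelectNodes("//a") on a page with no links.
        List<string> anchors = null;

        int visited = 0;
        foreach (var link in anchors.OrEmpty())
            visited++;

        Console.WriteLine(visited); // prints 0
    }
}
```

With HtmlAgilityPack, the loop would then read foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a").OrEmpty()) { ... }, which should work because HtmlNodeCollection implements IEnumerable<HtmlNode>.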


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow