why HTML Agility Pack HtmlDocument.DocumentNode is null?

asp.net c# html-agility-pack

Question

I'm using this code to change the href attribute of a HTML stream.

first I download a full html page using this code:(URL is webpage address)

HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = 
                         (HttpWebResponse)myHttpWebRequest.GetResponse();

Stream s = myHttpWebResponse.GetResponseStream();

then I process this:

HtmlDocument doc = new HtmlDocument();

doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);

s is html stream.

but I've got an exception that says doc.DocumentNode is null!

i tried many sites but doc.DocumentNode is null to

Accepted Answer

This works for me.

using(WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue;
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}

Also see the HttpUtility.UrlEncode to be able to get the url back correctly. Otherwise, some parameters in original url may cause problem.

Use HttpUtility.UrlDecode to decode it.


Popular Answer

Anchor tag reference is an incorrectly escaped string:

...doc.DocumentNode.SelectNodes("/a")    //incorrect
...doc.DocumentNode.SelectNodes("//a")   //correct
...doc.DocumentNode.SelectNodes(@"/a")   //also correct

The original code fails to select any nodes and evaluates to null; this should be checked against to prevent failing on, say, a document where there are no links at all (however unlikely that is :)

var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
    foreach (HtmlNode link in anchors)
    {
        /*do stuff*/
    } 
}



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why