I'm using this code to change the href attributes in an HTML stream.
First I download the full HTML page using this code (URL is the web page address):
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
Stream s = myHttpWebResponse.GetResponseStream();
Then I process it:
HtmlDocument doc = new HtmlDocument();
doc.Load(s);
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("/a"))
{
    string att = link.Attributes["href"].Value;
    link.Attributes["href"].Value = "http://ahmadalli.somee.com/default.aspx?url=" + att;
}
doc.Save(s);
s is the HTML stream.
But I get an exception that says doc.DocumentNode is null!
I tried many sites, but doc.DocumentNode is null for all of them too.
This works for me.
// Needs System.IO, System.Linq, System.Net, System.Web and the HtmlAgilityPack package.
using (WebClient client = new WebClient())
{
    client.Encoding = System.Text.Encoding.UTF8;
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.google.com?q=stackoverflow"));
    foreach (var href in doc.DocumentNode.Descendants("a").Select(x => x.Attributes["href"]))
    {
        if (href == null) continue; // skip anchors that have no href attribute
        href.Value = "http://ahmadalli.somee.com/default.aspx?url=" + HttpUtility.UrlEncode(href.Value);
    }
    // Write the modified document to a string instead of back to the response stream.
    StringWriter writer = new StringWriter();
    doc.Save(writer);
    var finalHtml = writer.ToString();
}
Also note the HttpUtility.UrlEncode call, which makes it possible to get the original URL back correctly. Without it, some parameters in the original URL may cause problems.
Use HttpUtility.UrlDecode to decode it.
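As a minimal sketch of that round trip: the `ProxyUrl` class, its `Wrap`/`Unwrap` method names, and the sample URL in the note below are made up for illustration; only the proxy address comes from the code above.

```csharp
using System;
using System.Web;

// Hypothetical helper, not part of any library.
static class ProxyUrl
{
    const string Prefix = "http://ahmadalli.somee.com/default.aspx?url=";

    // Encode the original URL so its own '?', '&' and '=' characters
    // are not read as part of the proxy URL's query string.
    public static string Wrap(string originalUrl)
    {
        return Prefix + HttpUtility.UrlEncode(originalUrl);
    }

    // Strip the proxy prefix and decode the rest to recover the original URL.
    public static string Unwrap(string proxiedUrl)
    {
        if (!proxiedUrl.StartsWith(Prefix, StringComparison.Ordinal))
            throw new ArgumentException("Not a proxied URL.", nameof(proxiedUrl));
        return HttpUtility.UrlDecode(proxiedUrl.Substring(Prefix.Length));
    }
}
```

For a link such as `http://example.com/search?q=a b&page=2`, `ProxyUrl.Unwrap(ProxyUrl.Wrap(link))` should give back the same string, whereas appending the link unencoded would leave `page=2` looking like a second parameter of default.aspx.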
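The XPath expression used to select the anchor tags is incorrect:
...doc.DocumentNode.SelectNodes("/a") // incorrect: matches only an <a> element at the document root
...doc.DocumentNode.SelectNodes("//a") // correct: matches every <a> element in the document
...doc.DocumentNode.SelectNodes(@"//a") // also correct: the same XPath as a verbatim string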
The original code fails to select any nodes, so SelectNodes returns null. You should check for this anyway, to avoid failing on, say, a document with no links at all (however unlikely that is :)
var anchors = doc.DocumentNode.SelectNodes("//a");
if (anchors != null)
{
    foreach (HtmlNode link in anchors)
    {
        /*do stuff*/
    }
}