Using HtmlAgilityPack, select all <p>s from a node's children

c# html-agility-pack screen-scraping

Question

I'm using the code below to fetch an HTML page. The goal is to make the URLs absolute, then mark the links rel="nofollow" and have them open in a new window or tab. My problem is with adding the attributes to the <a> elements.

        string url = "http://www.mysite.com/";
        string strResult = "";            

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        if ((request.HaveResponse) && (response.StatusCode == HttpStatusCode.OK)) {
            using (StreamReader sr = new StreamReader(response.GetResponseStream())) {
                strResult = sr.ReadToEnd();
                sr.Close();
            }
        }

        HtmlDocument ContentHTML = new HtmlDocument();
        ContentHTML.LoadHtml(strResult);
        HtmlNode ContentNode = ContentHTML.GetElementbyId("content");

        foreach (HtmlNode node in ContentNode.SelectNodes("/a")) {
            node.Attributes.Append("rel", "nofollow");
            node.Attributes.Append("target", "_blank");
        }

        return ContentNode.WriteTo();

Can anyone see what I'm doing wrong? I've been trying for a while with no luck. With this code, the error says that ContentNode.SelectNodes("/a") is not set to an instance of an object. I also considered trying to reset the stream to 0.

Cheers, Denis

1/21/2010 5:45:24 PM

Accepted Answer

Is ContentNode null? You might need to select-single with the query "//*[@id='content']".

For info, "/a" means all anchors at the root of the document, not anchors under ContentNode. Does "descendant::a" work? There is also HtmlElement.GetElementsByTagName, which might be simpler, i.e. yourElement.GetElementsByTagName("a").
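As a rough illustration of the suggestion above, here is a minimal sketch that selects the anchors relative to the content node and guards against null results. It assumes the HtmlAgilityPack NuGet package; the class name LinkRewriter, the method RewriteLinks, and its parameters are hypothetical names made up for this example, and the absolute-URL step is only one possible approach.

```csharp
using System;
using HtmlAgilityPack;

public static class LinkRewriter
{
    // Hypothetical helper: returns the HTML of the element with the given id,
    // with every descendant <a href> made absolute and marked nofollow/_blank.
    public static string RewriteLinks(string html, string baseUrl, string containerId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode container = doc.GetElementbyId(containerId);
        if (container == null)
            return string.Empty; // no element with that id in the document

        // ".//a" is evaluated relative to the container node;
        // "/a" would only match an <a> at the document root.
        HtmlNodeCollection anchors = container.SelectNodes(".//a[@href]");
        if (anchors == null)
            return container.OuterHtml; // SelectNodes returns null when nothing matches

        var baseUri = new Uri(baseUrl);
        foreach (HtmlNode a in anchors)
        {
            // Resolve relative hrefs against the page URL.
            string href = a.GetAttributeValue("href", "");
            if (Uri.TryCreate(baseUri, href, out Uri absolute))
                a.SetAttributeValue("href", absolute.ToString());

            // SetAttributeValue adds the attribute or overwrites an existing one.
            a.SetAttributeValue("rel", "nofollow");
            a.SetAttributeValue("target", "_blank");
        }

        return container.OuterHtml;
    }
}
```

Calling LinkRewriter.RewriteLinks(strResult, url, "content") would stand in for the foreach loop and return statement in the question. SetAttributeValue is used here instead of Attributes.Append so that an anchor which already carries rel or target does not end up with a duplicate attribute.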

1/21/2010 5:44:21 PM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow