HTML Agility Pack link correction

c# html-agility-pack syntax


I'm working on a small project and have run into a problem; I hope you can help.

I have these few lines that load a given URL and extract some tags:

var webGet2 = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = webGet2.Load(pattern);
var htmlMatches = doc.DocumentNode.SelectNodes("//li[@class=''] | //li[@class='f']");

After I receive the collection, I need a foreach loop that takes every href and src link and makes it valid. In the downloaded source the links look like /folder/folder/image.jpg, and I want to prepend the site's address to each one.

I've built this project with Regex before and had no problem doing that, but with HTML Agility Pack I can't wrap my head around it.

Thank you!

7/31/2012 7:59:06 PM

Accepted Answer

So you want to search some nodes for attributes that contain relative URLs and change them to absolute URLs? You could do this:

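using System;
using System.Linq;
using HtmlAgilityPack;

// Rewrites attrName on every descendant of root so that it holds an absolute URL.
static void AdjustAttributes(HtmlNode root, string baseUrl, string attrName)
{
    var query =
        from node in root.Descendants()
        let attr = node.Attributes[attrName]
        where attr != null
        select attr;
    foreach (var attr in query)
    {
        var url = GetAbsoluteUrlString(baseUrl, attr.Value);
        attr.Value = url;
    }
}

// Leaves absolute URLs as they are and resolves relative ones against baseUrl.
static string GetAbsoluteUrlString(string baseUrl, string url)
{
    var uri = new Uri(url, UriKind.RelativeOrAbsolute);
    if (!uri.IsAbsoluteUri)
        uri = new Uri(new Uri(baseUrl), uri);
    return uri.ToString();
}

Then apply it to the nodes you selected, using the page URL as the base:

var web = new HtmlWeb();
var doc = web.Load(pattern); // pattern is the page URL from the question
var selectedNodes = doc.DocumentNode.SelectNodes("//li[@class=''] | //li[@class='f']");
if (selectedNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in selectedNodes)
    {
        AdjustAttributes(node, pattern, "href");
        AdjustAttributes(node, pattern, "src");
    }
}
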
7/31/2012 10:32:37 PM
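
The URL-resolution step only needs System.Uri, so it can be tried on its own without HAP. The following is a minimal sketch, assuming a hypothetical page address http://www.example.com/gallery/index.html. ResolveLink is a hypothetical, slightly more defensive variant of GetAbsoluteUrlString above, not part of any library: it returns strings that parse as absolute non-file URIs (http:, https:, mailto: and so on) unchanged, and parses everything else explicitly with UriKind.Relative, so a root-relative path such as /folder/folder/image.jpg is always combined with the page URL.

```csharp
using System;

// Hypothetical helper: a defensive variant of GetAbsoluteUrlString.
static string ResolveLink(string baseUrl, string link)
{
    // Already an absolute web/mail/etc. URI: return the original string untouched.
    if (Uri.TryCreate(link, UriKind.Absolute, out var absolute) && !absolute.IsFile)
        return link;

    // Otherwise treat it as a relative reference and resolve it against the page URL.
    var relative = new Uri(link, UriKind.Relative);
    return new Uri(new Uri(baseUrl), relative).ToString();
}

// Hypothetical page address, for illustration only.
const string pageUrl = "http://www.example.com/gallery/index.html";

string[] links =
{
    "/folder/folder/image.jpg",     // root-relative: replaces the whole path
    "thumbs/image.jpg",             // resolved inside the /gallery/ directory
    "../image.jpg",                 // climbs one level above /gallery/
    "http://cdn.example.org/a.jpg", // already absolute: returned unchanged
};

foreach (var link in links)
    Console.WriteLine($"{link} -> {ResolveLink(pageUrl, link)}");
```

The first link, for example, becomes http://www.example.com/folder/folder/image.jpg. Passing ResolveLink instead of GetAbsoluteUrlString inside AdjustAttributes would be a drop-in change if you prefer this behavior.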


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow