HTML Agility Pack Replace Links

.net c# html-agility-pack

Question

I have some HTML Agility Pack code. I want to replace each and every link in the HTML text.

Example: I'll swap out

http://oldserver/Documents/1.pdf 

to

http://newserver/Documents/2.pdf

I can get all of the links' values and enumerate them, but when I do doc. When you call Save(), the HTML source is saved. not the revised html. How can I get the most recent HTML from the HTML Document?

private string FixHyperlinks(string contentHtml, SPWeb web)
    {
        TextReader reader = new StringReader(contentHtml);

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(reader);

        List<string> hrefTags = new List<string>();

        foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
        {
            HtmlAttribute att = link.Attributes["href"];
            att.Value = RepairHyperlinkAddress(att.Value, web);
        }

        MemoryStream memoryStream = new MemoryStream();
        doc.Save(memoryStream);
        memoryStream.Seek(0, System.IO.SeekOrigin.Begin);
        StreamReader streamReader = new StreamReader(memoryStream);
        string result = streamReader.ReadToEnd();

        return result;
    }
1
2
12/5/2012 5:32:04 PM

Accepted Answer

This ought to perform better:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
        {
            HtmlAttribute att = link.Attributes["href"];
            att.Value = RepairHyperlinkAddress(att.Value, web);
        }

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//img[@src]"))
        {
            HtmlAttribute att = link.Attributes["src"];
            att.Value = RepairHyperlinkAddress(att.Value, web);
        }
2
12/5/2012 4:26:10 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow