Create HTML links from in-text URLs using Html Agility Pack

c# html-agility-pack

Question

How can I convert a URL that appears in text into an HTML link using Html Agility Pack and C#?

For example: "www.stackoverflow.com is a very cool site."

Output:

"<a href="www.stackoverflow.com">www.stackoverflow.com</a>  is a very cool site."

Accepted Answer

Thanks @user1778606 for your answer. I got this working, though it still uses a bit of regex. It works much better and is safer: because it only touches text nodes that are not inside an anchor element, it never creates hyperlinks inside existing hyperlinks or inside href attribute values.

        // Parse the input as HTML so that text nodes can be told apart from markup
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(inputString);

        // Matches URLs that start with a scheme (e.g. http://), bare hosts that
        // start with "www", and user@host forms, followed by an optional path,
        // query string and fragment. Group 1 captures the whole match.
        const string pattern = @"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\w]*))?)";

        // Build the regex once instead of once per node
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

        // Select every text node that is not inside an anchor element
        var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]") ?? new HtmlAgilityPack.HtmlNodeCollection(null);

        foreach (var node in nodes)
        {
            // Wrap each match in an anchor, then add a scheme to hrefs that start with "www"
            node.InnerHtml = regex.Replace(node.InnerHtml, "<a href=\"$1\">$1</a>").Replace("href=\"www", "href=\"http://www");
        }

        return doc.DocumentNode.OuterHtml;
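
One detail in the snippet above is that the "http://" prefix is added by a second, case-sensitive string replace, so a link written as "WWW.example.com" would keep a relative href. A possible alternative, sketched here with only System.Text.RegularExpressions, is to decide the href per match in a MatchEvaluator. The class name UrlLinker, the method LinkUrls, and the deliberately simplified URL pattern are assumptions made for this illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlLinker
{
    // Simplified pattern (an assumption for this sketch): an http(s) scheme or a
    // leading "www.", followed by everything up to whitespace, '<', '>' or '"'.
    private static readonly Regex UrlRegex =
        new Regex(@"\b(?:https?://|www\.)[^\s<>""]+", RegexOptions.IgnoreCase);

    public static string LinkUrls(string text)
    {
        return UrlRegex.Replace(text, match =>
        {
            string url = match.Value;

            // Add a scheme only when the match starts with "www." (case-insensitive)
            string href = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;

            return "<a href=\"" + href + "\">" + url + "</a>";
        });
    }
}
```

Calling UrlLinker.LinkUrls("www.stackoverflow.com is a very cool site.") returns the string <a href="http://www.stackoverflow.com">www.stackoverflow.com</a> is a very cool site. The simplified pattern also swallows trailing punctuation such as a final full stop, so it would probably need tightening before real use.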

Popular Answer

I'm pretty sure it's possible, although I haven't attempted it.

Here's how to replace a fixed string in a document with links:

Find keyword in text when keyword match certain conditions - C#

Here's how to write a regex for URLs:

regular expression for url

Put those together and it should be possible.

Pseudocode

        select all text nodes
        for each node
            get the inner text
            find urls in the text (use regex?)
            for each url found
                replace the text of the url with string literal link tag (a href = etc ...)
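
The pseudocode above could be fleshed out roughly as follows. This is only a sketch under stated assumptions: it builds real anchor nodes instead of pasting a string literal link tag, the class name NodeLinkifier and the method Linkify are hypothetical, and the URL pattern is a deliberately simple stand-in for whichever regex you settle on. It relies on the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class NodeLinkifier
{
    // Simplified URL pattern (an assumption for this sketch)
    private static readonly Regex UrlRegex =
        new Regex(@"\b(?:https?://|www\.)[^\s<>""]+", RegexOptions.IgnoreCase);

    public static string Linkify(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select all text nodes, skipping text that is already inside a link
        var textNodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");
        if (textNodes == null) return html;

        // Copy to a list because the tree is modified while iterating
        foreach (var node in textNodes.ToList())
        {
            string text = node.InnerHtml;
            var matches = UrlRegex.Matches(text);
            if (matches.Count == 0) continue;

            var parent = node.ParentNode;
            int pos = 0;
            foreach (Match m in matches)
            {
                // Keep the plain text that precedes this URL
                if (m.Index > pos)
                    parent.InsertBefore(doc.CreateTextNode(text.Substring(pos, m.Index - pos)), node);

                // Build <a href="...">url</a>, adding a scheme to bare "www." hosts
                var anchor = doc.CreateElement("a");
                string href = m.Value.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                    ? "http://" + m.Value
                    : m.Value;
                anchor.SetAttributeValue("href", href);
                anchor.AppendChild(doc.CreateTextNode(m.Value));
                parent.InsertBefore(anchor, node);

                pos = m.Index + m.Length;
            }

            // Keep any text after the last URL, then drop the original node
            if (pos < text.Length)
                parent.InsertBefore(doc.CreateTextNode(text.Substring(pos)), node);
            parent.RemoveChild(node);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

A call such as NodeLinkifier.Linkify("<p>www.stackoverflow.com is a very cool site.</p>") should wrap the URL in an anchor and leave the rest of the paragraph untouched. Building nodes rather than concatenating markup is a design choice of this sketch, not something the answer prescribes.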

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow