HtmlAgilityPack Attributes.Remove on Image Only Removes One, When There Are Two

attributes c# html-agility-pack

Question

In our project, I'm using HtmlAgilityPack to display some HTML from one of our systems. I came across this problem during unit testing and want to make sure I'm not doing anything wrong. If an image has two "src" values, I want to pick one, remove them both, then add one back with the proper path. I don't believe this will happen with our HTML, but just in case...

So here is an example of an image tag:

<img align=\"left\" alt=\"\" src=\"/blah.jpg\" src=\"/knowledge/blah.jpg\" border=\"0\" />

Here is the HTML manipulation code:

    public static string FixHtmlLinks(this string html)
    {
        var htmlDoc = new HtmlDocument()
        {
            OptionWriteEmptyNodes = true
        };
        htmlDoc.LoadHtml(html);

        var imagesToCheck = htmlDoc.DocumentNode.SelectNodes("//img[@src!='']");

        if (null != imagesToCheck)
        {
            foreach (var image in imagesToCheck.ToList())
            {
                var src = image.GetAttributeValue("src", string.Empty);
                if (Uri.IsWellFormedUriString(src, UriKind.Relative))
                {
                    image.Attributes.Remove("src");
                    image.SetAttributeValue("src", string.Format(RELATIVE_IMAGE_PROTOCOL_AND_HOST, src));
                }
                else if (Uri.IsWellFormedUriString(src, UriKind.Absolute))
                {
                    image.Attributes.Remove("src");
                    image.SetAttributeValue("src", src.Replace(ABSOLUTE_IMAGE_HOST_TO_REPLACE, IMAGE_PROTOCOL_AND_HOST));
                }
            }
        }

        return htmlDoc.DocumentNode.OuterHtml;
    }

When I am debugging and reach the line image.Attributes.Remove("src"), there are two "src" values, as expected. Once that line has run, only one "src" value is left, the one that begins with "/knowledge". However, given what the summary for Remove states, I would expect both of them to be removed:

Removes an attribute from the list, using its name. If there are more than one attributes with this name, they will all be removed.

Judging by the source code for HtmlAttributeCollection on CodePlex, everything looks like it should work: the Remove method runs a loop to remove the matching attributes.

Am I doing this incorrectly, or have I found a bug in HtmlAgilityPack that needs a fix?

6/20/2013 5:41:07 PM

Accepted Answer

Confirmed: image.Attributes.Remove only removes the first instance.

As a quick fix, call Remove multiple times. It does nothing if it is called when the attribute cannot be found.
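
One way to apply that workaround is sketched below. It is only an illustration: RemoveAttributeCompletely is a hypothetical helper name, not part of HtmlAgilityPack. It assumes the parser keeps both duplicate attributes in node.Attributes, as described in the question, and that the Remove(HtmlAttribute) overload is available in your version. Instead of calling Remove(name) a fixed number of times, it takes a snapshot of every attribute with the given name and removes each one individually.

    using System;
    using System.Linq;
    using HtmlAgilityPack;

    public static class HtmlNodeAttributeExtensions
    {
        // Hypothetical helper (not part of HtmlAgilityPack): removes every
        // attribute on the node whose name matches, including duplicates.
        public static void RemoveAttributeCompletely(this HtmlNode node, string name)
        {
            // Copy the matches to a list first so the attribute collection
            // is not modified while it is being enumerated.
            var matches = node.Attributes
                .Where(a => string.Equals(a.Name, name, StringComparison.OrdinalIgnoreCase))
                .ToList();

            // Remove each matching attribute object, one call per duplicate.
            foreach (var attribute in matches)
            {
                node.Attributes.Remove(attribute);
            }
        }
    }

In FixHtmlLinks, each image.Attributes.Remove("src") call could then become image.RemoveAttributeCompletely("src"), followed by the existing SetAttributeValue call.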

You may want to report this to the HtmlAgilityPack maintainers.

6/20/2013 5:55:51 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow