Replace double quotes inside a double-quoted HTML attribute

c# html html-agility-pack regex replace


In some cases, my customers give me an HTML string that contains elements with malformed attributes, like this:

<img src="../imgTest.jpg" alt="Something "quoted here, or here"">

How can I dynamically change these cases to look like this?

<img src="../imgTest.jpg" alt="Something 'quoted here, or here'">

I need to process this HTML in code, not display it in a browser.

I use HtmlAgilityPack to fix HTML problems, but in cases like this it changes my HTML string into something I don't want:

<img src="../imgTest.jpg" alt="Something" quoted="" here,="" or="" here="">

My code using HtmlAgilityPack is:

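var htmlDoc = new HtmlDocument();
htmlDoc.OptionFixNestedTags = true;
// Parse the incoming string; without this call the document stays empty.
htmlDoc.LoadHtml(myHtmlStr);

// SafeAny() is presumably a custom null-safe Any() extension, not part of HtmlAgilityPack.
var htmlError = htmlDoc.ParseErrors.SafeAny();

if (!htmlError)
    myHtmlStr = htmlDoc.DocumentNode.InnerHtml;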
2/29/2016 3:48:19 PM

Accepted Answer

My idea is to match a double quote (") that is inside a tag but is not an attribute value delimiter, and then replace it with a single quote (').

This approach might not work in 100% of cases (it will need adjusting if namespaces are added to element or attribute names), but it should work as long as the tag name directly follows the <, attribute values are delimited by double quotes placed directly after the equals sign, and there are no < symbols inside attribute values.

See the regex demo.

The first lookbehind verifies that the double quote is inside a tag, the second lookbehind fails the match if the double quote is immediately preceded by an equals sign, and the negative lookahead fails the match if the double quote is followed by optional whitespace and a closing angle bracket (possibly preceded by a forward slash), or by whitespace and a word that is immediately followed by an equals sign.
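
The two lookbehinds and the negative lookahead described above can be sketched in C# roughly as follows. The pattern is reconstructed from that description and is an assumption, not necessarily the exact expression from the answer; the class name AttributeQuoteFixer and the method Fix are hypothetical names chosen for illustration. It relies on .NET's support for variable-length lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AttributeQuoteFixer
{
    // Assumed pattern, rebuilt from the prose description (not the original regex).
    private static readonly Regex StrayQuote = new Regex(
        @"(?<=<\w+\s[^<>]*)" +   // inside a tag: "<name", whitespace, then no '<' or '>'
        @"(?<!=)" +              // not an opening delimiter (those directly follow '=')
        @"""" +                  // the double quote itself
        @"(?!\s*/?>|\s+\w+=)",   // not a closing delimiter: not followed by '>' or '/>',
                                 // nor by whitespace and the next "name="
        RegexOptions.Compiled);

    // Replaces every stray double quote inside an attribute value with a single quote.
    public static string Fix(string html)
    {
        return StrayQuote.Replace(html, "'");
    }
}
```

Calling AttributeQuoteFixer.Fix on the malformed img tag from the question should produce the single-quoted form the question asks for, and the result can then be passed to HtmlDocument.LoadHtml. The sketch shares the limitations noted above: it assumes no whitespace around the equals sign, and a stray quote that happens to be followed by whitespace and something like name= inside a value would be mistaken for a closing delimiter.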

2/29/2016 4:42:17 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow