Replace quotation marks only inside HTML tags using a regular expression

asp.net c# html-agility-pack regex

Question

I have the string below:

<div id="mydiv">This is a "div" with quotation marks</div>

This is what I want the regular expression to return:

<div id='mydiv'>This is a "div" with quotation marks</div>

Note that the div's id attribute is now enclosed in single quotes.

How can I use a regular expression to do this?

Edit: I'm not looking for a solution that handles every possible edge case. Regex should be used with caution when parsing HTML, but in this situation and for my specific purpose, regex IS the answer. I just need a little help finding the right expression.

Edit #2: Jens helped me find a solution, but anyone who stumbles onto this page should think very carefully before using this method. It works in my situation because I am very familiar with the kind of strings I'll be working with. Make sure you understand the risks and dangers. If you're not sure whether you do, you probably don't, and you shouldn't use this approach. You've been warned.

1
3
1/16/2013 4:38:32 PM

Accepted Answer

Here is one way to do it. I believe you want to replace every " that lies between a < and a > with '.

So you look for each " that has a < somewhere behind it and a > somewhere ahead of it, with no other angle bracket in between. The regex looks like this:

(?<=\<[^<>]*)"(?=[^><]*\>)

You can replace the matched characters with whatever you like, for example by using Regex.Replace.
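As a minimal sketch of that replacement step, the pattern above can be passed to Regex.Replace. The class name QuoteSwap and the method name ReplaceTagQuotes are made up for illustration; only the pattern and the sample string come from this page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteSwap // hypothetical helper class, not part of the original answer
{
    // Pattern from the answer: a double quote with a '<' behind it and a '>' ahead of it,
    // and no other angle bracket between the quote and either one.
    private static readonly Regex TagQuote =
        new Regex(@"(?<=\<[^<>]*)""(?=[^><]*\>)");

    // Replaces every matched double quote with a single quote.
    public static string ReplaceTagQuotes(string html)
    {
        return TagQuote.Replace(html, "'");
    }

    public static void Main()
    {
        string input = "<div id=\"mydiv\">This is a \"div\" with quotation marks</div>";
        Console.WriteLine(ReplaceTagQuotes(input));
        // prints: <div id='mydiv'>This is a "div" with quotation marks</div>
    }
}
```

The lookbehind here has variable length, which the .NET regex engine supports but many other engines do not, so the pattern may not port unchanged to other languages.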

Although I find the Stack Overflow community very friendly and helpful, I think the answers to these regex/HTML questions are often a bit too harsh. After all, this question does not ask "What regex matches all valid HTML and nothing else?"

2
3/19/2012 2:01:44 AM

Popular Answer

I see that you are aware of the risks of doing this kind of substitution with regex. I've added the following answer for anyone looking for a much more "stable" technique, one that keeps working even when the input documents change.

Using the HTML Agility Pack (project page, NuGet), this works:

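using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("your html here");
// or doc.Load(stream);

// Walk every node in the document, not just the direct children.
var nodes = doc.DocumentNode.DescendantNodes();

foreach (var node in nodes)
{
    foreach (var att in node.Attributes)
    {
        // Write this attribute's value in single quotes when serializing.
        att.QuoteType = AttributeValueQuote.SingleQuote;
    }
}

// Serialize the modified document back to a string.
var fixedText = doc.DocumentNode.OuterHtml;
//doc.Save(/* stream */);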


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow