How to replace Html Comment tags with string.Empty

c# html html-agility-pack

Question

Below is the HtmlNode selection code from my C# file. I am trying to remove all HTML comment tags from this node.

HtmlNode table = doc5.DocumentNode.SelectSingleNode("//div[@id='div12']");

This returns an HtmlNode whose markup looks roughly like this (pseudo-markup):

<table>
  <tr>
    <td>test</td>
    <td>
      <!-- <a href='url removed' >Test link Test 2 Comment </a> -->
    </td>
  </tr>
</table>

I was able to adapt a regular expression that solves the problem, but it only works in my test run, where the input is a plain string. Here is the C# code:

string rkr;
rkr = "<!-- <a href='url removed' >Test link Test 2 Comment </a> -->";
rkr = Regex.Replace(rkr, @"(\<!--\s*.*?((--\>)|$))",String.Empty);

The outcome is rkr = "" (an empty string), which is exactly what I want for every comment tag in the live run.
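One caveat with this pattern, not raised in the original post: in .NET, `.` does not match a newline unless `RegexOptions.Singleline` is set, so a comment that spans several lines is left untouched by the pattern above. A minimal sketch that keeps the asker's pattern unchanged and only adds `RegexOptions.Singleline`; the class name `CommentStripper` and the helper `StripComments` are hypothetical names chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Same pattern as in the question. RegexOptions.Singleline lets '.'
    // also match '\n', so comments spanning several lines are removed too.
    private static readonly Regex CommentPattern =
        new Regex(@"(\<!--\s*.*?((--\>)|$))", RegexOptions.Singleline);

    // Hypothetical helper: returns the input with every <!-- ... --> removed.
    public static string StripComments(string html) =>
        CommentPattern.Replace(html, string.Empty);

    public static void Main()
    {
        // A comment that crosses line breaks, as in real page markup.
        string html = "<td>\n  <!-- <a href='x'>\n  Test link </a> -->\n</td>";
        Console.WriteLine(StripComments(html));
    }
}
```

Without `Singleline`, the lazy `.*?` cannot cross the first `\n`, so neither `-->` nor the end-of-string `$` alternative is reachable and the multi-line comment is not matched at all.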

I have seen many code samples on forums such as Stack Overflow, but none of them match what I'm looking for. One article was quite helpful, but it was written for PHP, so I couldn't use it.

Now, if I pass the node directly:

rkr = Regex.Replace(table, @"(\<!--\s*.*?((--\>)|$))",String.Empty);

I get the following error:

The best overloaded method match for 'System.Text.RegularExpressions.Regex.Replace(string, System.Text.RegularExpressions.MatchEvaluator, int)' has some invalid arguments

I also tried converting the node to a string first:

rkr = Regex.Replace(table.ToString(), @"(\<!--\s*.*?((--\>)|$))",String.Empty);

However, rkr then contains only the text "HtmlAgilityPack.HtmlNode".

Any assistance is much appreciated.
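The `table.ToString()` attempt compiles but yields only the type name because `HtmlNode` does not override `ToString()`, so the default `object.ToString()` is used. HtmlAgilityPack exposes a node's markup through the `OuterHtml` (the element plus its children) and `InnerHtml` (children only) properties. A minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the sample markup and the helper name `StripCommentsFromDiv` are hypothetical stand-ins for `doc5` and the asker's page:

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class OuterHtmlDemo
{
    // Hypothetical helper: selects div12 and returns its markup with comments stripped.
    public static string StripCommentsFromDiv(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode("//div[@id='div12']");
        if (table == null)
            return string.Empty; // no matching div in this document

        // OuterHtml is the node's markup as a string; ToString() would only
        // return "HtmlAgilityPack.HtmlNode".
        return Regex.Replace(table.OuterHtml, @"(\<!--\s*.*?((--\>)|$))",
            string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string html = "<div id='div12'><table><tr><td>test</td>" +
                      "<td><!-- <a href='#'>Test link</a> --></td></tr></table></div>";
        Console.WriteLine(StripCommentsFromDiv(html));
    }
}
```

Note that this returns a new string and leaves the parsed document itself unchanged; the answers that remove comment nodes from the DOM avoid regex entirely.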

6/19/2015 9:01:15 AM

Accepted Answer

Thanks for all your help. I found the answer in the function below.

After loading doc5, call the method as shown below:

HtmlNode table = doc5.DocumentNode.SelectSingleNode("//div[@id='div12']");

RemoveComments(table);

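// Requires: using System.Linq; (for ToArray) and using HtmlAgilityPack;
public static void RemoveComments(HtmlNode node)
{
    // Recurse over a snapshot of the children: removing a comment child
    // modifies node.ChildNodes, which must not change while it is enumerated.
    foreach (var n in node.ChildNodes.ToArray())
        RemoveComments(n);

    // Remove this node itself if it is an HTML comment.
    if (node.NodeType == HtmlNodeType.Comment)
        node.Remove();
}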

For reference, I found the solution in this article: How to use HTMLAgilityPack to choose node types that are HtmlNodeType Comment

It is highly detailed and includes a variety of examples, which was just what I needed.

5/23/2017 12:24:21 PM

Popular Answer

Response: here

doc5.DocumentNode.Descendants()
    .Where(n => n.NodeType == HtmlAgilityPack.HtmlNodeType.Comment)
    .ToList()
    .ForEach(n => n.Remove());

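ToList() is required because you cannot modify a sequence while you are enumerating it: each Remove() changes the document tree that Descendants() is still walking, so ToList() takes a snapshot of the comment nodes first.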
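The same rule applies to ordinary .NET collections, not just HtmlAgilityPack's. A small BCL-only illustration, using a plain `List<string>` as a stand-in for the document's nodes (an assumption for demonstration; the method names are hypothetical): removing items while enumerating the list directly throws `InvalidOperationException`, whereas enumerating a `ToList()` copy works.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Enumerates a ToList() snapshot, so Remove() only changes the original
    // list, never the sequence currently being iterated.
    public static List<string> RemoveCommentsSnapshot(List<string> nodes)
    {
        foreach (var n in nodes.Where(s => s.StartsWith("<!--")).ToList())
            nodes.Remove(n);
        return nodes;
    }

    // Same loop without the snapshot: List<T>'s enumerator detects the
    // modification and throws InvalidOperationException on the next MoveNext.
    public static bool RemoveWithoutSnapshotThrows(List<string> nodes)
    {
        try
        {
            foreach (var n in nodes.Where(s => s.StartsWith("<!--")))
                nodes.Remove(n);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var nodes = new List<string> { "<td>test</td>", "<!-- link -->", "<td></td>" };
        Console.WriteLine(RemoveWithoutSnapshotThrows(new List<string>(nodes))); // True
        Console.WriteLine(string.Join(", ", RemoveCommentsSnapshot(nodes)));     // <td>test</td>, <td></td>
    }
}
```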




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow