How do I remove whitespace in HTML Source with Html Agility Pack and C#

c# html html-agility-pack

Question

Before posting, I tried the suggestion from this question:

Remove gaps between markups in HTML source using C#?

A sample of the HTML I'm using is shown below:

<p>This is my text</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>This is next text</p>

I'm using HTML Agility Pack to clean up the HTML:

HtmlDocument doc = new HtmlDocument();
doc.Load(htmlLocation);
foreach (var item in doc.DocumentNode.Descendants("p").ToList())
{
    if (item.InnerHtml == "&nbsp;")
    {
        item.Remove();
    }
}

The result of the code above is:

<p>This is my text</p>





<p>This is next text</p>

So my question is: how do I remove the leftover whitespace in the HTML source between the two paragraphs?

5/23/2017 11:46:24 AM

Popular Answer

Remove all text nodes between the first and last paragraphs:

HTML:

var html = @"
<p>This is my text</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>This is next text</p>";

Parse it:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var paragraphs = doc.DocumentNode.Descendants("p").ToList();
foreach (var item in paragraphs)
{
    if (item.InnerHtml == "&nbsp;") item.Remove();
}
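// Text nodes that follow the first paragraph at the same level.
// Note: SelectNodes returns null when nothing matches.
var followingText = paragraphs[0]
    .SelectNodes("following-sibling::text()")
    .ToList();
foreach (var text in followingText)
{
    text.Remove();
}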

Result:

<p>This is my text</p><p>This is next text</p>

To keep the line break between the paragraphs, use a for loop and call Remove() on every text node except the last one.

for (int i = 0; i < followingText.Count - 1; ++i)
{
    followingText[i].Remove();
}

Result:

<p>This is my text</p>
<p>This is next text</p>
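
The two steps above can also be combined into a single pass. The following is only a minimal sketch, not part of the original answer: the class name HtmlCleaner and the method name RemoveEmptyParagraphs are hypothetical, and it assumes each empty paragraph is directly followed by at most one whitespace-only text node, as in the sample HTML. Each `<p>&nbsp;</p>` is removed together with the text node right after it, so the line break after the first paragraph is left in place.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class HtmlCleaner
{
    public static string RemoveEmptyParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Snapshot the paragraphs first, because nodes are removed while iterating.
        foreach (var p in doc.DocumentNode.Descendants("p").ToList())
        {
            if (p.InnerHtml.Trim() != "&nbsp;") continue;

            // Remove the whitespace-only text node that directly follows
            // the empty paragraph, if there is one.
            var next = p.NextSibling;
            if (next != null
                && next.NodeType == HtmlNodeType.Text
                && string.IsNullOrWhiteSpace(next.InnerText))
            {
                next.Remove();
            }

            p.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling HtmlCleaner.RemoveEmptyParagraphs(html) with the sample string should leave the two text paragraphs separated by a single line break. Unlike the XPath approach, it never touches text nodes that are not adjacent to a removed paragraph, and it does nothing if there are no empty paragraphs.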
4/3/2017 5:09:08 AM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow