HtmlAgilityPack replace paragraph tags with linebreaks

c# html-agility-pack html-parsing

Question

I'm trying to replace every paragraph tag with two line break tags using HtmlAgilityPack, because the third-party export program we use does not display paragraph tags correctly (it omits the blank line between paragraphs).

Here is what I currently have.

// Shortened for this example
string rawHtml = "<p><strong><span>1.0 Purpose</span></strong></p><p><span>The role</span></p><p><span>NOTE: Defined...</span></p>";

HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.LoadHtml(rawHtml);
doc.OptionWriteEmptyNodes = true;

// Updated using suggestion from Petr
HtmlNode linebreak = doc.CreateElement("br"); 
var paragraphTags = doc.DocumentNode.SelectNodes("p");
for (int i = 0; i < paragraphTags.Count; i++)
{
    HtmlNode childNode = HtmlNode.CreateNode(paragraphTags[i].InnerHtml);
    HtmlNode nextNode = paragraphTags[i];

    if (i > 0)
    {
        nextNode = doc.DocumentNode.InsertAfter(linebreak, nextNode);
        nextNode = doc.DocumentNode.InsertAfter(linebreak, nextNode);
    }
    doc.DocumentNode.InsertAfter(childNode, nextNode);
    paragraphTags[i].Remove();
}

The paragraph tags are removed, but only one line break is produced between paragraphs. I searched extensively to get this far, but nothing I have tried works.

The resulting OuterHtml looks like this:

<strong><span>1.0 Purpose</span></strong><br /><span>The role</span><br /><span>NOTE: Defined...</span>

What am I doing wrong? I suspect there is a simpler way to do this.

1
1
6/15/2012 2:52:50 PM

Accepted Answer

Figured it out. Credit to Simon and Petr for their ideas. The key was that I needed two separate linebreak nodes instead of inserting the same node twice.

string rawHtml = "<p><strong><span>1.0 Purpose</span></strong></p><p><span>The role</span></p><p><span>NOTE: Defined...</span></p>";

HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.LoadHtml(rawHtml);
doc.OptionWriteEmptyNodes = true;

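// Two distinct <br> nodes: inserting the same node instance twice produced only one line break.
HtmlNode linebreak1 = doc.CreateElement("br");
HtmlNode linebreak2 = doc.CreateElement("br");
var paragraphTags = doc.DocumentNode.SelectNodes("p");
for (int i = 0; i < paragraphTags.Count; i++)
{
    // No line breaks are needed in front of the first paragraph.
    if (i > 0)
    {
        doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
        doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
    }
    // Put the paragraph's content in front of it, then remove the paragraph itself.
    // CreateNode yields a single node, so this assumes each paragraph wraps one top-level element.
    doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
    paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
}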
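
If a paragraph can hold more than one child node, or the input might contain no paragraphs at all, a more defensive variant could look like the sketch below. This is not the code I actually ran: the ParagraphFlattener class and its Flatten method are hypothetical names, and it assumes that creating a fresh br node for every insertion and copying each child with CloneNode is acceptable for your document.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class ParagraphFlattener
{
    public static string Flatten(string rawHtml)
    {
        var doc = new HtmlDocument();
        HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
        doc.OptionWriteEmptyNodes = true;
        doc.LoadHtml(rawHtml);

        // SelectNodes returns null when nothing matches, so guard against that.
        var paragraphs = doc.DocumentNode.SelectNodes("p");
        if (paragraphs == null)
        {
            return doc.DocumentNode.OuterHtml;
        }

        bool first = true;
        // Snapshot the collection because the tree is modified inside the loop.
        foreach (HtmlNode paragraph in paragraphs.ToList())
        {
            HtmlNode parent = paragraph.ParentNode;

            if (!first)
            {
                // A new node per insertion, so no node instance is inserted twice.
                parent.InsertBefore(doc.CreateElement("br"), paragraph);
                parent.InsertBefore(doc.CreateElement("br"), paragraph);
            }
            first = false;

            // Copy every child of the paragraph, not only the first one.
            foreach (HtmlNode child in paragraph.ChildNodes.ToList())
            {
                parent.InsertBefore(child.CloneNode(true), paragraph);
            }

            parent.RemoveChild(paragraph);
        }

        return doc.DocumentNode.OuterHtml;
    }
}

public static class Program
{
    public static void Main()
    {
        string rawHtml = "<p><strong><span>1.0 Purpose</span></strong></p><p><span>The role</span></p><p><span>NOTE: Defined...</span></p>";
        Console.WriteLine(ParagraphFlattener.Flatten(rawHtml));
    }
}
```

Like the code above, this only handles paragraphs that are direct children of the document node; nested paragraphs would need a different XPath expression.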
6
6/15/2012 3:14:45 PM

Popular Answer

Does it help if you create the linebreak node like this?

HtmlNode linebreak = doc.CreateElement("br");



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow