HtmlAgilityPack用換行符替換段落標記

c# html-agility-pack html-parsing

我們使用的第三部分導出應用程序將無法正確呈現段落標記(不包括段落之間的額外行),因此我嘗試使用HtmlAgilityPack將所有段落標記替換為兩個換行標記。

這是我到目前為止所擁有的......

// Shortened for this example
string rawHtml = "<p><strong><span>1.0 Purpose</span></strong></p><p><span>The role</span></p><p><span>NOTE: Defined...</span></p>";

HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.LoadHtml(rawHtml);
doc.OptionWriteEmptyNodes = true;

// Updated using suggestion from Petr
HtmlNode linebreak = doc.CreateElement("br"); 
var paragraphTags = doc.DocumentNode.SelectNodes("p");
for (int i = 0; i < paragraphTags.Count; i++)
{
    HtmlNode childNode = HtmlNode.CreateNode(paragraphTags[i].InnerHtml);
    HtmlNode nextNode = paragraphTags[i];

    if (i > 0)
    {
        nextNode = doc.DocumentNode.InsertAfter(linebreak, nextNode);
        nextNode = doc.DocumentNode.InsertAfter(linebreak, nextNode);
    }
    doc.DocumentNode.InsertAfter(childNode, nextNode);
    paragraphTags[i].Remove();
}

它確實刪除了段落標記,但只渲染了一個換行符。我已經搜索了互聯網以獲得盡可能多的東西,但似乎沒有任何效果。

OuterHtml看起來像這樣....

// Shortened for this example
string rawHtml = "<p><strong><span>1.0 Purpose</span></strong></p><p><span>The role</span></p><p><span>NOTE: Defined...</span></p>";

HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.LoadHtml(rawHtml);
doc.OptionWriteEmptyNodes = true;

// Updated using suggestion from Petr
HtmlNode linebreak = doc.CreateElement("br"); 
var paragraphTags = doc.DocumentNode.SelectNodes("p");
for (int i = 0; i < paragraphTags.Count; i++)
{
    HtmlNode childNode = HtmlNode.CreateNode(paragraphTags[i].InnerHtml);
    HtmlNode nextNode = paragraphTags[i];

    if (i > 0)
    {
        nextNode = doc.DocumentNode.InsertAfter(linebreak, nextNode);
        nextNode = doc.DocumentNode.InsertAfter(linebreak, nextNode);
    }
    doc.DocumentNode.InsertAfter(childNode, nextNode);
    paragraphTags[i].Remove();
}

知道我做錯了什麼嗎?我覺得有一個更簡單的方法,是嗎?

一般承認的答案

弄清楚了。向Petr和Simon提出建議。關鍵似乎是我需要兩個不同的換行節點。

string rawHtml = "<p><strong><span>1.0 Purpose</span></strong></p><p><span>The role</span></p><p><span>NOTE: Defined...</span></p>";

HtmlDocument doc = new HtmlDocument();
HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
doc.LoadHtml(rawHtml);
doc.OptionWriteEmptyNodes = true;

HtmlNode linebreak1 = doc.CreateElement("br");
HtmlNode linebreak2 = doc.CreateElement("br");
var paragraphTags = doc.DocumentNode.SelectNodes("p");
for (int i = 0; i < paragraphTags.Count; i++)
{
    if (i > 0)
    {
        doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
        doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
    }
    doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
    paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
}

熱門答案

如果你使用它會有幫助嗎?

HtmlNode linebreak = doc.CreateElement("br");

創建linebreak節點?




許可下: CC-BY-SA with attribution
不隸屬於 Stack Overflow
這個KB合法嗎? 是的,了解原因
許可下: CC-BY-SA with attribution
不隸屬於 Stack Overflow
這個KB合法嗎? 是的,了解原因