Problem: I need to walk some HTML elements with HtmlAgilityPack and combine their tag names. Is it possible to take each chain of tags, from parent to child, and replace it with a span whose class is named after the combined tags, for example "strikeUEmStrong"? The name changes depending on the HTML elements involved.
The order of the names within the class actually matters; I found that out by trial and error. What I need is for the code to collect all the elements and combine them. A paragraph may well contain several text nodes with different levels of formatting.
This will affect multiple paragraphs.
For example, if I have this HTML:
<p>
<strike><u><em><strong>four styles</strong></em></u></strike></p>
How can I convert it to this:
<p>
<span class="strikeUEmStrong">four styles</span></p>
It is also possible to have markup like this:
<p>
<strike><u><em><strong>four styles</strong></em></u></strike> <strike><u><em>three styles</em></u></strike></p>
<p>
<em><strong>two styles</strong></em></p>
The output should look like this:
<p>
<span class="strikeUEmStrong">four styles</span> <span class="strikeUEm">three styles</span></p>
<p>
<span class="emStrong">two styles</span></p>
Prototype:
'Retrieve the class name of each format node
Function GetClassName(ByVal n As HtmlNode) As String
    Dim ret As String = String.Empty
    'Text nodes and the paragraph itself do not contribute a name
    If (n.Name <> "#text") AndAlso (n.Name <> "p") Then
        ret = n.Name & " "
    End If
    'Recurse into the child nodes
    For Each child As HtmlNode In n.ChildNodes
        ret &= GetClassName(child)
    Next
    Return ret
End Function

'Create a list of class names from a space-separated string
Function GetClassNameList(ByVal classNameList As String) As List(Of String)
    Dim ret As New List(Of String)
    Dim classArr() As String = classNameList.Split(" "c)
    For Each className As String In classArr
        ret.Add(className)
    Next
    Return ret
End Function

'Sort a list of class names and return a merged class string
Function GetSortedClassNameString(ByVal classList As List(Of String)) As String
    Dim sortedMergedClass As String = String.Empty
    classList.Sort()
    For Each className As String In classList
        sortedMergedClass &= className
    Next
    Return sortedMergedClass
End Function

'Let's point to the body node
Dim bodyNode As HtmlNode = htmlDoc.DocumentNode.SelectSingleNode("//body")
'Let's create some generic nodes
Dim currPNode As HtmlNode
Dim formatNodes As HtmlNodeCollection
Dim text As String = String.Empty
Dim textSize As Integer = 0
'Make sure the editor has something in it
If editorText <> "" Then
    'Send the text from the editor to the body node
    If bodyNode IsNot Nothing Then
        bodyNode.InnerHtml = editorText
    End If
    Dim pNode = bodyNode.SelectNodes("//p")
    Dim span As HtmlNode = htmlDoc.CreateElement("span")
    Dim tmpBody As HtmlNode = htmlDoc.CreateElement("body")
    Dim textNode As HtmlNode = htmlDoc.CreateTextNode
    Dim pCount As Integer = bodyNode.SelectNodes("//body/p").Count - 1
    For childCountP As Integer = 0 To pCount
        Dim paragraph = HtmlNode.CreateNode(htmlDoc.CreateElement("p").WriteTo)
        'Which paragraph I am at.
        currPNode = pNode.Item(childCountP)
        'For this paragraph get me the collection of html nodes
        formatNodes = currPNode.ChildNodes
        'Count how many format nodes we have in a paragraph
        Dim formatCount As Integer = currPNode.ChildNodes.Count - 1
        'Go through each node and examine the elements.
        'Then look at the markup to create classes and then group them under one span
        For child As Integer = 0 To formatCount
            'Iterate through the formatNodes: strike, em, strong, etc.
            Dim currFormatNode = HtmlNode.CreateNode(formatNodes(child).WriteTo)
            'TODO: Handle nested images and links? How do we know what to rip out?
            'First check for format nodes
            'Note, we can't let it use everything because it will change nested elements as well, i.e. span within span.
            If (currFormatNode.Name = "strike") OrElse (currFormatNode.Name = "em") _
                OrElse (currFormatNode.Name = "strong") OrElse (currFormatNode.Name = "u") _
                OrElse (currFormatNode.Name = "sub") OrElse (currFormatNode.Name = "sup") _
                OrElse (currFormatNode.Name = "b") Then
                'Strip all tags, just take the inner text
                span.InnerHtml = currFormatNode.InnerText
                'Create a text node with text from the lowest node
                textNode = htmlDoc.CreateTextNode(span.InnerText)
                'Recursively go through the format nodes
                'Create a list from the string
                'Then sort the list and return a string
                'Appending the class to the span
                span.SetAttributeValue("class", GetSortedClassNameString(GetClassNameList(GetClassName(currFormatNode).Trim())))
                'Attach the span before the current format node
                currFormatNode.ParentNode.InsertBefore(span, currFormatNode)
                'Remove the formatted children leaving the above node
                currFormatNode.ParentNode.ChildNodes.Remove(currFormatNode)
                'We need to build a paragraph here
                paragraph.InnerHtml &= span.OuterHtml
                'Let's output something for debugging
                childNodesTxt.InnerText &= span.OuterHtml
            Else 'Handle #text and other nodes separately
                'Build a text node from this node's content
                textNode = htmlDoc.CreateTextNode(currFormatNode.InnerHtml)
                'Append the text itself, not the span left over from a previous format node
                paragraph.InnerHtml &= textNode.OuterHtml
                'Let's output something for debugging
                childNodesTxt.InnerText &= textNode.OuterHtml
            End If
        Next
        'End of formats
        'Start adding the new paragraphs to the body node
        tmpBody.AppendChild(paragraph)
    Next
    'End of paragraphs
    'Clean out body first and replace with new elements
    htmlDoc.DocumentNode.SelectSingleNode("//body").Remove()
    'Update our body
    htmlDoc.DocumentNode.SelectSingleNode("//html").AppendChild(tmpBody)
End If
htmlDoc.Save(Server.MapPath("html\editor.html"))
End If
Output:
<span class="strikeuemstrong">four styles</span>
I am finally getting the right output after fixing the ordering problem. Thanks for the help.
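Note that the class name in this output is all lowercase ("strikeuemstrong"), while the target given at the top of the question is "strikeUEmStrong". If the camel-cased form is what is wanted, a small helper along these lines might do it. This is only a sketch, written in C# (it should port directly to VB.NET); `ClassNames` and `ToCamelCase` are hypothetical names, and it assumes the tag names are supplied in parent-to-child order.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class ClassNames
{
    // Joins tag names in the given order, capitalizing every name after the first:
    // ["strike", "u", "em", "strong"] becomes "strikeUEmStrong".
    // Hypothetical helper for illustration; not part of HtmlAgilityPack.
    public static string ToCamelCase(IEnumerable<string> tagNames)
    {
        var sb = new StringBuilder();
        foreach (var name in tagNames)
        {
            if (string.IsNullOrEmpty(name)) continue; // skip blanks left over from splitting
            var lower = name.ToLowerInvariant();
            if (sb.Length == 0)
                sb.Append(lower); // first name stays lowercase
            else
                sb.Append(char.ToUpperInvariant(lower[0])).Append(lower.Substring(1));
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ToCamelCase(new[] { "strike", "u", "em", "strong" })); // strikeUEmStrong
        Console.WriteLine(ToCamelCase(new[] { "em", "strong" }));                // emStrong
    }
}
```

Because the helper keeps the order it is given, it only produces the expected name if the list is not sorted first.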
This is not a simple question to answer. I'll describe how I would write the algorithm and include some pseudocode to help.
Pseudocode follows. Please excuse any typos, as I'm typing this on the fly.
public string GetClassName(Node n)
{
    // Concatenate this node's tag name with those of all its descendants
    var ret = n.TagName;
    foreach (var child in n.ChildNodes)
    {
        ret += GetClassName(child);
    }
    return ret;
}

foreach (var p in paragraphs)
{
    // Use a for loop for "active" replacement like this: replacing nodes inside
    // a foreach blows up in C# for modifying the collection while iterating.
    for (var i = 0; i < p.ChildNodes.Count; i++)
    {
        var child = p.ChildNodes[i];
        var span = new Span();
        span.InnerText = child.InnerText; // strip all tags, just take the inner text
        span.ClassName = GetClassName(child);
        child.ReplaceWith(span);
    }
}
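The pseudocode above can be turned into something runnable against HtmlAgilityPack roughly as follows. Treat it as a minimal sketch under a few assumptions: `FormatFlattener`, `Flatten` and `FormatTags` are names made up for this example, the tag list is copied from the prototype in the question, and the HtmlAgilityPack NuGet package is referenced. Class names come out as plain lowercase concatenations in document order.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

static class FormatFlattener
{
    // Tags treated as pure formatting (assumption: same list as the question's prototype).
    static readonly HashSet<string> FormatTags = new HashSet<string>
        { "strike", "u", "em", "strong", "b", "sub", "sup" };

    // Concatenates formatting tag names from this node down, in document order.
    static string GetClassName(HtmlNode n)
    {
        var ret = FormatTags.Contains(n.Name) ? n.Name : string.Empty;
        foreach (var child in n.ChildNodes)
            ret += GetClassName(child);
        return ret;
    }

    public static string Flatten(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null) return html; // SelectNodes returns null when nothing matches

        foreach (var p in paragraphs)
        {
            // Index-based loop: the child collection is modified while we walk it.
            for (var i = 0; i < p.ChildNodes.Count; i++)
            {
                var child = p.ChildNodes[i];
                if (!FormatTags.Contains(child.Name)) continue; // leave text, links, images alone
                var span = doc.CreateElement("span");
                span.SetAttributeValue("class", GetClassName(child));
                span.AppendChild(doc.CreateTextNode(child.InnerText)); // text only, tags stripped
                p.ReplaceChild(span, child);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    static void Main()
    {
        Console.WriteLine(Flatten(
            "<p><strike><u><em><strong>four styles</strong></em></u></strike> " +
            "<em><strong>two styles</strong></em></p>"));
    }
}
```

One limitation to be aware of: a node with mixed content, such as `<em>a<strong>b</strong></em>`, is collapsed into a single span with the combined class, so the distinction between the two runs of text is lost.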
I'd be happy to edit my answer once I have more context. Please review what I'm suggesting and comment with more details if you need me to refine it. Otherwise, I hope this gives you what you need :)