HTML Agility Pack - Replace all HTML paragraph tags with br

c# html-agility-pack xpath

Question

I'm trying to replace <p>example content</p> with example content<br><br>

Here is my current code:

static string replaceParagraphs(string s) // Replace p tags with BR
    {
        HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
        doc.LoadHtml(s);
        doc.OptionWriteEmptyNodes = false;

        HtmlNode linebreak1 = doc.CreateElement("br");
        HtmlNode linebreak2 = doc.CreateElement("br");
        var paragraphTags = doc.DocumentNode.SelectNodes("p");
        for (int i = 0; i < paragraphTags.Count; i++)
        {
            if (i > 0)
            {
                doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
                doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
            }
            doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
            paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
        }



        return doc.DocumentNode.OuterHtml;
    }

And here is a sample document that I pass to the method:

<div id=JobDetailSection class=details>
  <h1>Admin Officer (Rodney House)</h1>

  <dl>



    <dd><span>Ref: </span>RH/AO/SS</dd>
    <dd><span>Employer: </span>Manchester City Council</dd>
    <dd><span>Location: </span>Rodney House School, Barrass Street, Openshaw, Manchester, M11 1WP</dd>
    <dd><span>Salary: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Salary Grade: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Working Pattern: </span>Part Time, Term Time</dd>
    <dd><span>Working Hours: </span>15 hours per week</dd>
    <dd><span>Contract Type: </span>Temporary</dd>
    <dd><span>Closing date: </span>25/09/2015 23:59</dd>
    <dd><span>Job Type: </span>Administration/Clerical, School Support Staff</dd>
    <dd><span>Interview Date: </span>Tuesday 6th October 2015</dd>


  </dl>
  <hr>
  <div class=description>
    <p>The Governors seek to appoint a well motivated, flexible and enthusiastic Admin Support Assistant to join our committed staff team.</p>
    <p>The successful candidate will be required to provide general clerical admin and finance support to the school and outreach service while the school develops a project for the LA.&nbsp; The successful candidate will also be able to demonstrate high
      standards of literacy, numeracy and ICT skills.&nbsp;</p>
    <p>Rodney House works in close collaboration with Manchester’s Children’s Centres.&nbsp; A commitment to working with our partner settings is essential. Rodney House delivers an Outreach Service which requires the production and maintenance of support
      packages</p>
    <p>All posts are subject to satisfactory references and an enhanced DBS check.
      <br>Prospective candidates need to know that we apply our stringent policy on Safeguarding children when appointing staff to Rodney House.</p>
    <p>Visits to the school are encouraged and welcomed by appointment.</p>
    <p>
      <br>
      <strong>How to apply - information for applicants.<br>
</strong>If you are interested in this vacancy, please download the documents attached.
      <br>Completed applications can be emailed to&nbsp;<a href=mailto:admin@example.com>admin@example.com</a>&nbsp;CV's will not be accepted.</p>
    <p>Closing date: - Friday 25th September at noon. Short listing on the same day.
      <br>Interview Date: Tuesday 6th October 2015</p>
    <p>Only those shortlisted for interview will be informed. No agencies please.</p>
    <p><strong>Equal Opportunities statement<br>
</strong>We are an Equal Opportunities Employer and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual
      orientation.</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
  </div>
  <p><a class=button print href=Javascript:window.print()>Print</a>
  </p>
</div>

I'm running into two problems with the output. First, several br tags are inserted where there should be exactly two. Second, for some reason, when I check the output, the last part of the text is missing from the string: and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual

I don't know what is causing these problems.

Popular answer

Since you're using C#, why not handle this with XSLT? I know that's not what you asked for, but you won't have to deal with all the quirks you run into when doing this "by hand", node by node:

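<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" />

<!-- boilerplate, identity-template, leaves everything not matched exactly the same -->
<!-- node() rather than * so that text nodes and comments are copied too -->
<xsl:template match="node() | @*">
    <xsl:copy>
        <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
</xsl:template>

<!-- the actual business logic, does all you need -->
<xsl:template match="p">
    <xsl:copy-of select="node()" />
    <br /><br />
</xsl:template>

</xsl:stylesheet>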

You can use HtmlAgilityPack to load the HTML as a DOM document node, which you can then feed to .NET's XslCompiledTransform.
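As a rough sketch of that wiring, the following assumes that HtmlDocument can be passed directly to XslCompiledTransform.Transform as an IXPathNavigable. The class name ParagraphTransform and the inlined stylesheet string are illustrative, not part of the original answer. The identity template here uses node() so that text nodes are copied through. How entities such as &nbsp; come out depends on the HtmlAgilityPack navigator and is worth checking on real input.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack;

// Hypothetical helper class; the name is an assumption for illustration.
static class ParagraphTransform
{
    // Identity template plus a rule that unwraps <p> and appends two <br>.
    const string Stylesheet = @"
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" />
  <xsl:template match=""node() | @*"">
    <xsl:copy>
      <xsl:apply-templates select=""node() | @*"" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""p"">
    <xsl:copy-of select=""node()"" />
    <br /><br />
  </xsl:template>
</xsl:stylesheet>";

    public static string ReplaceParagraphs(string html)
    {
        // Parse the (possibly malformed) HTML with HtmlAgilityPack.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Compile the stylesheet from the inlined string.
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }

        // Assumption: HtmlDocument is accepted here as an IXPathNavigable.
        using (var output = new StringWriter())
        {
            xslt.Transform(doc, null, output);
            return output.ToString();
        }
    }
}
```

In a real application the compiled transform would normally be created once and reused, since compiling the stylesheet is the expensive step.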

Sorry, I couldn't easily spot the error in your code above, but that's because I find node manipulation so tedious and relatively hard to debug that I try to use simpler solutions ;).
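For readers who prefer to stay with node manipulation, here is a tentative sketch of a node-by-node version. It rests on a few assumptions about what goes wrong in the question's code, none of which are confirmed by the original answer: HtmlNode.CreateNode appears to yield a single node, which would explain why text after a leading <strong> element disappears; the same two br instances are inserted repeatedly; and the XPath "p" only matches direct children of the document node. The class name ParagraphFlattener is hypothetical.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class; not taken from the original post.
static class ParagraphFlattener
{
    public static string ReplaceParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//p" searches the whole tree rather than only top-level nodes.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return html; // SelectNodes returns null when nothing matches
        }

        // Iterate over a snapshot because the tree is modified in the loop.
        foreach (var p in paragraphs.ToList())
        {
            var parent = p.ParentNode;

            // Copy every child of the paragraph, not only the first one.
            foreach (var child in p.ChildNodes.ToList())
            {
                parent.InsertBefore(child.CloneNode(true), p);
            }

            // Create fresh <br> nodes for each paragraph instead of reusing two.
            parent.InsertBefore(doc.CreateElement("br"), p);
            parent.InsertBefore(doc.CreateElement("br"), p);

            parent.RemoveChild(p);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Inserting relative to p.ParentNode, rather than doc.DocumentNode, keeps each replacement where the paragraph originally sat in the tree.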




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow