Html Agility Pack - Replace all HTML paragraph tags with br

c# html-agility-pack xpath

Question

I am trying to replace <p>example content</p> with example content<br><br>

Here is my current code:

static string replaceParagraphs(string s) // Replace p tags with BR
    {
        HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
        doc.LoadHtml(s);
        doc.OptionWriteEmptyNodes = false;

        HtmlNode linebreak1 = doc.CreateElement("br");
        HtmlNode linebreak2 = doc.CreateElement("br");
        var paragraphTags = doc.DocumentNode.SelectNodes("p");
        for (int i = 0; i < paragraphTags.Count; i++)
        {
            if (i > 0)
            {
                doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
                doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
            }
            doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
            paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
        }



        return doc.DocumentNode.OuterHtml;
    }

And here is a sample document that I am passing to the method:

<div id=JobDetailSection class=details>
  <h1>Admin Officer (Rodney House)</h1>

  <dl>



    <dd><span>Ref: </span>RH/AO/SS</dd>
    <dd><span>Employer: </span>Manchester City Council</dd>
    <dd><span>Location: </span>Rodney House School, Barrass Street, Openshaw, Manchester, M11 1WP</dd>
    <dd><span>Salary: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Salary Grade: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Working Pattern: </span>Part Time, Term Time</dd>
    <dd><span>Working Hours: </span>15 hours per week</dd>
    <dd><span>Contract Type: </span>Temporary</dd>
    <dd><span>Closing date: </span>25/09/2015 23:59</dd>
    <dd><span>Job Type: </span>Administration/Clerical, School Support Staff</dd>
    <dd><span>Interview Date: </span>Tuesday 6th October 2015</dd>


  </dl>
  <hr>
  <div class=description>
    <p>The Governors seek to appoint a well motivated, flexible and enthusiastic Admin Support Assistant to join our committed staff team.</p>
    <p>The successful candidate will be required to provide general clerical admin and finance support to the school and outreach service while the school develops a project for the LA.&nbsp; The successful candidate will also be able to demonstrate high
      standards of literacy, numeracy and ICT skills.&nbsp;</p>
    <p>Rodney House works in close collaboration with Manchester’s Children’s Centres.&nbsp; A commitment to working with our partner settings is essential. Rodney House delivers an Outreach Service which requires the production and maintenance of support
      packages</p>
    <p>All posts are subject to satisfactory references and an enhanced DBS check.
      <br>Prospective candidates need to know that we apply our stringent policy on Safeguarding children when appointing staff to Rodney House.</p>
    <p>Visits to the school are encouraged and welcomed by appointment.</p>
    <p>
      <br>
      <strong>How to apply - information for applicants.<br>
</strong>If you are interested in this vacancy, please download the documents attached.
      <br>Completed applications can be emailed to&nbsp;<a href=mailto:admin@example.com>admin@example.com</a>&nbsp;CV's will not be accepted.</p>
    <p>Closing date: - Friday 25th September at noon. Short listing on the same day.
      <br>Interview Date: Tuesday 6th October 2015</p>
    <p>Only those shortlisted for interview will be informed. No agencies please.</p>
    <p><strong>Equal Opportunities statement<br>
</strong>We are an Equal Opportunities Employer and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual
      orientation.</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
  </div>
  <p><a class=button print href=Javascript:window.print()>Print</a>
  </p>
</div>

I have two problems with the output. First, multiple br tags are inserted where there should be only two. Second, for some reason, when I check the output, the last part of the text is missing from the string: and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual

I am not sure what is causing these problems.

Popular answer

Since you are using C#, why not process this with XSLT? I know it is not what you asked for, but you won't have to deal with all the quirks you run into when doing this "by hand", node by node:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" />

<!-- boilerplate identity template: copies everything not matched elsewhere (elements, attributes, text) unchanged -->
<xsl:template match="node() | @*">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
</xsl:template>

<!-- the actual business logic: drop the p wrapper, keep its content, append two line breaks -->
<xsl:template match="p">
    <xsl:copy-of select="node()" />
    <br /><br />
</xsl:template>

</xsl:stylesheet>

You can use HtmlAgilityPack to get the HTML as a DOM document node, which you can then feed to .NET's XslCompiledTransform.
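As a rough sketch of how that could be wired up: HtmlAgilityPack's HtmlDocument implements IXPathNavigable, so it can be handed to XslCompiledTransform.Transform directly. The class name ParagraphRewriter and the method name ReplaceParagraphs are made up for illustration, and this has not been run against the sample document from the question, so treat it as a starting point. In particular, check how entities such as &nbsp; come out.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack;

// Hypothetical helper; the names are illustrative and not part of any library.
static class ParagraphRewriter
{
    // The templates from this answer, wrapped in an xsl:stylesheet element.
    // node() is used in the identity template so that text nodes are copied too.
    private const string Stylesheet = @"
<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" />
  <xsl:template match=""node() | @*"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()"" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""p"">
    <xsl:copy-of select=""node()"" />
    <br /><br />
  </xsl:template>
</xsl:stylesheet>";

    public static string ReplaceParagraphs(string html)
    {
        // Parse the (possibly malformed) HTML with HtmlAgilityPack.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Compile the stylesheet from the string above.
        var xslt = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(stylesheetReader);
        }

        // HtmlDocument is passed as an IXPathNavigable input;
        // the result is serialized as HTML because of xsl:output.
        using (var output = new StringWriter())
        {
            xslt.Transform(doc, null, output);
            return output.ToString();
        }
    }
}
```

Calling ParagraphRewriter.ReplaceParagraphs(s) would then take the place of the replaceParagraphs method in the question.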

Sorry, I couldn't easily spot the error in your code above, but that is because I find node manipulation tedious and comparatively hard to get right, so I try to use simpler solutions ;).



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow