Html Agility Pack - Replace all HTML paragraph tags with br

c# html-agility-pack xpath

Question

I am trying to replace <p>example content</p> with example content<br><br>

Here is my current code:

static string replaceParagraphs(string s) // Replace p tags with BR
{
    HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
    doc.LoadHtml(s);
    doc.OptionWriteEmptyNodes = false;

    HtmlNode linebreak1 = doc.CreateElement("br");
    HtmlNode linebreak2 = doc.CreateElement("br");
    var paragraphTags = doc.DocumentNode.SelectNodes("p");
    for (int i = 0; i < paragraphTags.Count; i++)
    {
        if (i > 0)
        {
            doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
            doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
        }
        doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
        paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
    }

    return doc.DocumentNode.OuterHtml;
}

And here is a sample document that I am passing to the method:

<div id=JobDetailSection class=details>
  <h1>Admin Officer (Rodney House)</h1>

  <dl>



    <dd><span>Ref: </span>RH/AO/SS</dd>
    <dd><span>Employer: </span>Manchester City Council</dd>
    <dd><span>Location: </span>Rodney House School, Barrass Street, Openshaw, Manchester, M11 1WP</dd>
    <dd><span>Salary: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Salary Grade: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Working Pattern: </span>Part Time, Term Time</dd>
    <dd><span>Working Hours: </span>15 hours per week</dd>
    <dd><span>Contract Type: </span>Temporary</dd>
    <dd><span>Closing date: </span>25/09/2015 23:59</dd>
    <dd><span>Job Type: </span>Administration/Clerical, School Support Staff</dd>
    <dd><span>Interview Date: </span>Tuesday 6th October 2015</dd>


  </dl>
  <hr>
  <div class=description>
    <p>The Governors seek to appoint a well motivated, flexible and enthusiastic Admin Support Assistant to join our committed staff team.</p>
    <p>The successful candidate will be required to provide general clerical admin and finance support to the school and outreach service while the school develops a project for the LA.&nbsp; The successful candidate will also be able to demonstrate high
      standards of literacy, numeracy and ICT skills.&nbsp;</p>
    <p>Rodney House works in close collaboration with Manchester’s Children’s Centres.&nbsp; A commitment to working with our partner settings is essential. Rodney House delivers an Outreach Service which requires the production and maintenance of support
      packages</p>
    <p>All posts are subject to satisfactory references and an enhanced DBS check.
      <br>Prospective candidates need to know that we apply our stringent policy on Safeguarding children when appointing staff to Rodney House.</p>
    <p>Visits to the school are encouraged and welcomed by appointment.</p>
    <p>
      <br>
      <strong>How to apply - information for applicants.<br>
</strong>If you are interested in this vacancy, please download the documents attached.
      <br>Completed applications can be emailed to&nbsp;<a href=mailto:admin@example.com>admin@example.com</a>&nbsp;CV's will not be accepted.</p>
    <p>Closing date: - Friday 25th September at noon. Short listing on the same day.
      <br>Interview Date: Tuesday 6th October 2015</p>
    <p>Only those shortlisted for interview will be informed. No agencies please.</p>
    <p><strong>Equal Opportunities statement<br>
</strong>We are an Equal Opportunities Employer and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual
      orientation.</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
  </div>
  <p><a class=button print href=Javascript:window.print()>Print</a>
  </p>
</div>

I am having two problems with the output. First, more br tags are inserted than the two there should be. Second, for some reason the last part of the text is missing from the string when I check the output: and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual

I am not sure what is causing these problems.

Popular answer

Since you are using C#, why not process this with XSLT? I know that is not what you asked, but you will not have to deal with all the quirks you run into when doing this "by hand", node by node:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" />

<!-- boilerplate identity template: copies everything not matched elsewhere, text nodes included -->
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
</xsl:template>

<!-- the actual business logic: emit the paragraph's content, then two line breaks -->
<xsl:template match="p">
    <xsl:copy-of select="node()" />
    <br /><br />
</xsl:template>

</xsl:stylesheet>

You can use HtmlAgilityPack to get the HTML as a DOM document node, which you can then feed to .NET's XslCompiledTransform.
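As a rough illustration of that wiring, here is a minimal sketch. It assumes the HtmlAgilityPack NuGet package is referenced and relies on HtmlDocument implementing IXPathNavigable, so it can be passed straight to XslCompiledTransform.Transform. The class name ParagraphTransform and the method name ReplaceParagraphs are made up for this example. The stylesheet is embedded as a string, wrapped in an xsl:stylesheet root element, and its identity template uses node() so that text nodes are copied too. I have not run it against the sample document from the question, so treat the exact output (entity handling in particular) as unverified.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper class, named for this example only.
static class ParagraphTransform
{
    // Identity template plus one template that unwraps <p> and appends two <br>.
    const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" />
  <xsl:template match=""@* | node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()"" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""p"">
    <xsl:copy-of select=""node()"" />
    <br /><br />
  </xsl:template>
</xsl:stylesheet>";

    public static string ReplaceParagraphs(string html)
    {
        // Let HtmlAgilityPack parse the (possibly non-well-formed) HTML.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Compile the stylesheet from the embedded string.
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }

        // HtmlDocument implements IXPathNavigable, so it can be the transform input.
        using (var output = new StringWriter())
        {
            xslt.Transform(doc, null, output);
            return output.ToString();
        }
    }
}
```

Calling ParagraphTransform.ReplaceParagraphs(s) would then take the place of the replaceParagraphs method from the question. In real code you would probably compile the stylesheet once and reuse the XslCompiledTransform instance instead of loading it on every call.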

Sorry, I did not readily spot the error in your code above, but that is because I find node manipulation tedious and relatively hard to get right, so I tend to reach for simpler solutions ;).



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow