Html Agility Pack - Replace all html paragraph tags with br

c# html-agility-pack xpath

Question

I am trying to replace <p>example content</p> with example content<br><br>

Here is my current code:

static string replaceParagraphs(string s) // Replace p tags with BR
{
    HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
    doc.LoadHtml(s);
    doc.OptionWriteEmptyNodes = false;

    HtmlNode linebreak1 = doc.CreateElement("br");
    HtmlNode linebreak2 = doc.CreateElement("br");
    var paragraphTags = doc.DocumentNode.SelectNodes("p");
    for (int i = 0; i < paragraphTags.Count; i++)
    {
        if (i > 0)
        {
            doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
            doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
        }
        doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
        paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
    }

    return doc.DocumentNode.OuterHtml;
}

And here is an example document I am passing to the method:

<div id=JobDetailSection class=details>
  <h1>Admin Officer (Rodney House)</h1>

  <dl>



    <dd><span>Ref: </span>RH/AO/SS</dd>
    <dd><span>Employer: </span>Manchester City Council</dd>
    <dd><span>Location: </span>Rodney House School, Barrass Street, Openshaw, Manchester, M11 1WP</dd>
    <dd><span>Salary: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Salary Grade: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Working Pattern: </span>Part Time, Term Time</dd>
    <dd><span>Working Hours: </span>15 hours per week</dd>
    <dd><span>Contract Type: </span>Temporary</dd>
    <dd><span>Closing date: </span>25/09/2015 23:59</dd>
    <dd><span>Job Type: </span>Administration/Clerical, School Support Staff</dd>
    <dd><span>Interview Date: </span>Tuesday 6th October 2015</dd>


  </dl>
  <hr>
  <div class=description>
    <p>The Governors seek to appoint a well motivated, flexible and enthusiastic Admin Support Assistant to join our committed staff team.</p>
    <p>The successful candidate will be required to provide general clerical admin and finance support to the school and outreach service while the school develops a project for the LA.&nbsp; The successful candidate will also be able to demonstrate high
      standards of literacy, numeracy and ICT skills.&nbsp;</p>
    <p>Rodney House works in close collaboration with Manchester’s Children’s Centres.&nbsp; A commitment to working with our partner settings is essential. Rodney House delivers an Outreach Service which requires the production and maintenance of support
      packages</p>
    <p>All posts are subject to satisfactory references and an enhanced DBS check.
      <br>Prospective candidates need to know that we apply our stringent policy on Safeguarding children when appointing staff to Rodney House.</p>
    <p>Visits to the school are encouraged and welcomed by appointment.</p>
    <p>
      <br>
      <strong>How to apply - information for applicants.<br>
</strong>If you are interested in this vacancy, please download the documents attached.
      <br>Completed applications can be emailed to&nbsp;<a href=mailto:admin@example.com>admin@example.com</a>&nbsp;CV's will not be accepted.</p>
    <p>Closing date: - Friday 25th September at noon. Short listing on the same day.
      <br>Interview Date: Tuesday 6th October 2015</p>
    <p>Only those shortlisted for interview will be informed. No agencies please.</p>
    <p><strong>Equal Opportunities statement<br>
</strong>We are an Equal Opportunities Employer and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual
      orientation.</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
  </div>
  <p><a class=button print href=Javascript:window.print()>Print</a>
  </p>
</div>

I'm having two issues with the output. First, multiple br tags are inserted where there should have been only two. Second, when I check the output, the last part of the text is missing from the string: and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual

I'm not sure what's causing these issues.

Popular Answer

Since you are using C#, why not process this with XSLT? I know that isn't what you asked, but it saves you from the quirks you run into when doing this "by hand", node by node:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html" />

<!-- boilerplate identity template: copies every attribute and node (elements, text, comments) not matched elsewhere -->
<xsl:template match="@* | node()">
    <xsl:copy>
        <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
</xsl:template>

<!-- the actual business logic: emit the paragraph's content, then two line breaks -->
<xsl:template match="p">
    <xsl:copy-of select="node()" />
    <br /><br />
</xsl:template>

</xsl:stylesheet>

You can use the HtmlAgilityPack to get the HTML as a DOM document node, which you can feed to .NET's XslCompiledTransform.
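
As a rough sketch of that hand-off, the helper below compiles a version of the stylesheet above (wrapped in an xsl:stylesheet element, with the identity template matching node() so that text nodes are carried through) and runs it over any IXPathNavigable input. The class and method names are made up for illustration. It assumes the Html Agility Pack's HtmlDocument implements IXPathNavigable, which is true of the builds I am aware of but worth checking against the version you use.

```csharp
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper, not part of the Html Agility Pack or of .NET.
static class ParagraphTransform
{
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" />
  <xsl:template match=""@* | node()"">
    <xsl:copy><xsl:apply-templates select=""@* | node()"" /></xsl:copy>
  </xsl:template>
  <xsl:template match=""p"">
    <xsl:copy-of select=""node()"" /><br /><br />
  </xsl:template>
</xsl:stylesheet>";

    // Accepts anything that can hand out an XPath navigator
    // (XmlDocument does; HtmlDocument is assumed to as well).
    public static string ReplaceParagraphs(IXPathNavigable input)
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }

        using (var writer = new StringWriter())
        {
            // No XSLT arguments are needed, so null is passed for the argument list.
            xslt.Transform(input, null, writer);
            return writer.ToString();
        }
    }
}
```

Under that assumption, usage would be along the lines of creating an HtmlDocument, calling LoadHtml(s) on it, and passing the document to ParagraphTransform.ReplaceParagraphs. In real code you would probably compile the stylesheet once and reuse the XslCompiledTransform instance rather than reloading it on every call.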

Sorry, I couldn't readily spot the error in your code above, but that's because I find node manipulation so tedious and comparatively hard to get right, so I try to use simpler solutions ;).
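
For anyone who does want to stay with node manipulation, here is a minimal sketch of how it might look. It is based on my reading of the question's code rather than a confirmed diagnosis. The likely culprits seem to be: the XPath "p" only matches direct children of the document node, the same two br node instances are inserted repeatedly, the inserts go through doc.DocumentNode instead of the paragraph's actual parent, and HtmlNode.CreateNode appears to yield a single node, so anything after the first child of a paragraph would be dropped. The class name is invented for illustration.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper illustrating one way to do the replacement by hand.
static class ParagraphReplacer
{
    public static string ReplaceParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//p" matches paragraphs at any depth.
        var paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return html; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode p in paragraphs)
        {
            HtmlNode parent = p.ParentNode;

            // Move every child of the paragraph, not only the first one.
            // ToList() takes a snapshot so the collection can be modified safely.
            foreach (HtmlNode child in p.ChildNodes.ToList())
            {
                p.RemoveChild(child);
                parent.InsertBefore(child, p);
            }

            // Create fresh br elements for each paragraph instead of reusing two instances.
            parent.InsertBefore(doc.CreateElement("br"), p);
            parent.InsertBefore(doc.CreateElement("br"), p);

            parent.RemoveChild(p);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Unlike the question's code, this puts two line breaks after every paragraph, including the last, which matches the stated goal of turning <p>example content</p> into example content<br><br>.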




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow