Html Agility Pack - Replace all html paragraph tags with br

c# html-agility-pack xpath

Question

I'm trying to replace <p>example content</p> with example content<br><br>

My current code is as follows:

static string replaceParagraphs(string s) // Replace p tags with BR
    {
        HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        HtmlNode.ElementsFlags["br"] = HtmlElementFlag.Empty;
        doc.LoadHtml(s);
        doc.OptionWriteEmptyNodes = false;

        HtmlNode linebreak1 = doc.CreateElement("br");
        HtmlNode linebreak2 = doc.CreateElement("br");
        var paragraphTags = doc.DocumentNode.SelectNodes("p");
        for (int i = 0; i < paragraphTags.Count; i++)
        {
            if (i > 0)
            {
                doc.DocumentNode.InsertBefore(linebreak1, paragraphTags[i]);
                doc.DocumentNode.InsertBefore(linebreak2, paragraphTags[i]);
            }
            doc.DocumentNode.InsertBefore(HtmlNode.CreateNode(paragraphTags[i].InnerHtml), paragraphTags[i]);
            paragraphTags[i].ParentNode.RemoveChild(paragraphTags[i]);
        }



        return doc.DocumentNode.OuterHtml;
    }

Here is a sample document that I pass to the method:

<div id=JobDetailSection class=details>
  <h1>Admin Officer (Rodney House)</h1>

  <dl>



    <dd><span>Ref: </span>RH/AO/SS</dd>
    <dd><span>Employer: </span>Manchester City Council</dd>
    <dd><span>Location: </span>Rodney House School, Barrass Street, Openshaw, Manchester, M11 1WP</dd>
    <dd><span>Salary: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Salary Grade: </span>Grade 3 £15,523 to £16,969 per annum pro rata</dd>
    <dd><span>Working Pattern: </span>Part Time, Term Time</dd>
    <dd><span>Working Hours: </span>15 hours per week</dd>
    <dd><span>Contract Type: </span>Temporary</dd>
    <dd><span>Closing date: </span>25/09/2015 23:59</dd>
    <dd><span>Job Type: </span>Administration/Clerical, School Support Staff</dd>
    <dd><span>Interview Date: </span>Tuesday 6th October 2015</dd>


  </dl>
  <hr>
  <div class=description>
    <p>The Governors seek to appoint a well motivated, flexible and enthusiastic Admin Support Assistant to join our committed staff team.</p>
    <p>The successful candidate will be required to provide general clerical admin and finance support to the school and outreach service while the school develops a project for the LA.&nbsp; The successful candidate will also be able to demonstrate high
      standards of literacy, numeracy and ICT skills.&nbsp;</p>
    <p>Rodney House works in close collaboration with Manchester’s Children’s Centres.&nbsp; A commitment to working with our partner settings is essential. Rodney House delivers an Outreach Service which requires the production and maintenance of support
      packages</p>
    <p>All posts are subject to satisfactory references and an enhanced DBS check.
      <br>Prospective candidates need to know that we apply our stringent policy on Safeguarding children when appointing staff to Rodney House.</p>
    <p>Visits to the school are encouraged and welcomed by appointment.</p>
    <p>
      <br>
      <strong>How to apply - information for applicants.<br>
</strong>If you are interested in this vacancy, please download the documents attached.
      <br>Completed applications can be emailed to&nbsp;<a href=mailto:admin@example.com>admin@example.com</a>&nbsp;CV's will not be accepted.</p>
    <p>Closing date: - Friday 25th September at noon. Short listing on the same day.
      <br>Interview Date: Tuesday 6th October 2015</p>
    <p>Only those shortlisted for interview will be informed. No agencies please.</p>
    <p><strong>Equal Opportunities statement<br>
</strong>We are an Equal Opportunities Employer and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual
      orientation.</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
  </div>
  <p><a class=button print href=Javascript:window.print()>Print</a>
  </p>
</div>

I'm running into two problems with the output. First, several br tags are added where there should only be two. Second, for some reason the end of the string is cut off, so the result ends with: ...and we positively welcome applications from all candidates regardless of age, disability, gender reassignment, marriage and civil partnership, pregnancy and maternity, race, religion or belief, sex and sexual

I don't know what's causing these problems.

9/9/2015 1:40:03 PM

Popular Answer

I know it isn't what you asked, but since you're using C#, why not process this with XSLT? It spares you all the peculiarities of doing the job "by hand", node by node:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html" />

    <!-- boilerplate identity template: copies everything not matched more specifically, including text nodes -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" />
        </xsl:copy>
    </xsl:template>

    <!-- the actual business logic: output the paragraph's content, then two line breaks -->
    <xsl:template match="p">
        <xsl:copy-of select="node()" />
        <br /><br />
    </xsl:template>
</xsl:stylesheet>

You can load the HTML with HtmlAgilityPack to get a document node and then feed that to .NET's XslCompiledTransform.
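A minimal sketch of the XslCompiledTransform side, using only System.Xml so it runs without HtmlAgilityPack. It assumes the input is already well-formed XHTML (the sample markup in the question, with `<hr>`, unquoted attributes and `&nbsp;`, is not, which is where HtmlAgilityPack comes in). `ParagraphXslt` and `Convert` are hypothetical names, and `indent="no"` is an addition to keep the output compact; the identity template matches `@* | node()` so text outside paragraphs is kept.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies the paragraph-to-<br> stylesheet to well-formed markup.
public static class ParagraphXslt
{
    // Identity template plus the <p> template; indent="no" keeps the output on one line.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""html"" indent=""no"" />
  <xsl:template match=""@* | node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()"" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""p"">
    <xsl:copy-of select=""node()"" />
    <br /><br />
  </xsl:template>
</xsl:stylesheet>";

    public static string Convert(string xhtml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (var input = XmlReader.Create(new StringReader(xhtml)))
        using (var output = new StringWriter())
        {
            // The TextWriter overload honours <xsl:output method="html">,
            // so the <br /> literals are serialized as HTML line breaks.
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Calling `ParagraphXslt.Convert("<div><p>one</p><p>two</p></div>")` should return the div with each paragraph's content followed by two `br` elements. With HtmlAgilityPack, `HtmlDocument` implements `IXPathNavigable` in the versions I know of, so you should be able to pass the loaded document to the `Transform(IXPathNavigable, XsltArgumentList, TextWriter)` overload instead of an `XmlReader`; treat that variant as untested.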

Sorry I couldn't spot the issue in your code more quickly; I tend to reach for simpler solutions like this because I find node-by-node manipulation time-consuming and hard to get right.
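For completeness, here is a sketch of the node-by-node approach with the likely culprits in the question's code addressed. It is not the asker's or answerer's code, and `ParagraphReplacer` is a hypothetical name. The changes: `//p` instead of `p` (the latter only matches paragraphs that are direct children of the document node, so for the sample markup it finds nothing); inserting relative to `p.ParentNode` rather than `doc.DocumentNode`; fresh `<br>` nodes for every paragraph instead of reusing `linebreak1`/`linebreak2`, a likely source of the extra breaks; and moving the paragraph's existing child nodes instead of re-parsing `InnerHtml` with `HtmlNode.CreateNode`, which in the HtmlAgilityPack versions I'm aware of keeps only the first node of the fragment (newer releases may throw instead), a plausible cause of the truncated text.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: a sketch of the node-by-node approach.
public static class ParagraphReplacer
{
    // Replaces every <p>...</p> with the paragraph's content followed by <br><br>.
    public static string ReplaceParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//p" matches paragraphs at any depth; "p" would only match
        // direct children of the document node.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null) // SelectNodes returns null when nothing matches
        {
            return doc.DocumentNode.OuterHtml;
        }

        // Work on a snapshot, since the tree is modified inside the loop.
        foreach (HtmlNode p in paragraphs.ToList())
        {
            // Insert relative to the paragraph's own parent, not doc.DocumentNode.
            HtmlNode parent = p.ParentNode;

            // Move the existing child nodes in front of the <p>, keeping their order,
            // instead of re-parsing p.InnerHtml with HtmlNode.CreateNode.
            foreach (HtmlNode child in p.ChildNodes.ToList())
            {
                parent.InsertBefore(child, p);
            }

            // Fresh <br> elements for each paragraph rather than two shared instances.
            parent.InsertBefore(doc.CreateElement("br"), p);
            parent.InsertBefore(doc.CreateElement("br"), p);

            parent.RemoveChild(p);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

This needs the HtmlAgilityPack package; with its default settings the inserted elements should be written as `<br>`.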

9/9/2015 2:06:11 PM



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow