How can I update an HTML snippet using HTML Agility Pack?

c# html-agility-pack

Question

So I want to use C# to change a piece of HTML.

<div>
This is a specialSearchWord that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that specialSearchWord again.
</div>

and I want to change it to be like this:

<div>
This is a <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> that I want to link to
<img src="anImage.jpg" />
<a href="foo.htm">A hyperlink</a>
Some more text and that <a class="special" href="http://mysite.com/search/specialSearchWord">specialSearchWord</a> again.
</div>

Based on the many recommendations here, I'm going to use HTML Agility Pack, but I don't know where to start. In particular:

  1. How do I load a partial HTML snippet from a string instead of a full HTML document?
  2. How do I edit it?
  3. How do I then get the edited object back as a text string?
3/1/2012 5:25:21 PM

Accepted Answer

  1. The same way as a full HTML document. It doesn't matter.
  2. There are two options: edit the InnerHtml property directly (or Text on text nodes), or modify the DOM tree using methods such as AppendChild, PrependChild, etc.
  3. You can use the HtmlDocument.DocumentNode.OuterHtml property or the HtmlDocument.Save method (personally I prefer the second option).

As for the parsing, I select the text nodes inside your div that contain the search term, and then just use the string.Replace method to replace it:

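// Requires: using HtmlAgilityPack; (html is the snippet as a string)
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Select only the text nodes directly under the root div that contain the search term.
var textNodes = doc.DocumentNode.SelectNodes("/div/text()[contains(.,'specialSearchWord')]");
// SelectNodes returns null, not an empty collection, when nothing matches.
if (textNodes != null)
    foreach (HtmlTextNode node in textNodes)
        node.Text = node.Text.Replace("specialSearchWord", "<a class='special' href='http://mysite.com/search/specialSearchWord'>specialSearchWord</a>");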

Then save the result as a string:

string result = null;
using (StringWriter writer = new StringWriter())
{
    doc.Save(writer);
    result = writer.ToString();
}
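
The second editing option mentioned in this answer (modifying the DOM tree rather than writing markup into Text) could look roughly like the sketch below. It is only an illustration under assumptions: the class SpecialWordLinker, the method LinkWord and its parameters are hypothetical names that do not come from the original answer, and the code assumes a reasonably recent HtmlAgilityPack release where SelectNodes leaves each node's ParentNode untouched.

```csharp
using System;
using HtmlAgilityPack;

public static class SpecialWordLinker
{
    // Hypothetical helper: wraps every occurrence of `word` found in text nodes
    // directly under the root <div> in an <a class="special"> element.
    public static string LinkWord(string html, string word, string baseUrl)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("/div/text()");
        if (textNodes == null)
            return doc.DocumentNode.OuterHtml;

        foreach (HtmlNode node in textNodes)
        {
            string text = ((HtmlTextNode)node).Text;
            if (!text.Contains(word))
                continue;

            HtmlNode parent = node.ParentNode;
            string[] parts = text.Split(new[] { word }, StringSplitOptions.None);

            for (int i = 0; i < parts.Length; i++)
            {
                // Re-insert the plain text that surrounded the search term.
                if (parts[i].Length > 0)
                    parent.InsertBefore(doc.CreateTextNode(parts[i]), node);

                // Between two parts there was one occurrence of the term.
                if (i < parts.Length - 1)
                {
                    HtmlNode link = doc.CreateElement("a");
                    link.SetAttributeValue("class", "special");
                    link.SetAttributeValue("href", baseUrl + Uri.EscapeDataString(word));
                    link.AppendChild(doc.CreateTextNode(word));
                    parent.InsertBefore(link, node);
                }
            }

            // The original text node has been replaced by the pieces above.
            parent.RemoveChild(node);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling SpecialWordLinker.LinkWord(html, "specialSearchWord", "http://mysite.com/search/") on the snippet from the question should leave the existing img and a elements alone and only touch the two text nodes. Compared with string.Replace on Text, this keeps the in-memory tree consistent with the output, at the cost of more code.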
3/1/2012 8:08:03 PM

Popular Answer

Answers:

  1. There may be a way, but I don't know how to do it. I suggest loading the entire document.
  2. Use a combination of XPath and regular expressions.
  3. See the code below for a contrived example. You may have other constraints not mentioned here, but this sample should get you started.

Note that your XPath expression may need to be more complex to find the div you want.

HtmlDocument doc = new HtmlDocument();

doc.Load(yourHtmlFile);
HtmlNode divNode = doc.DocumentNode.SelectSingleNode("//div[2]");
string newDiv = Regex.Replace(divNode.InnerHtml, @"specialSearchWord", 
"<a class='special' href='http://etc'>specialSearchWord</a>");
divNode.InnerHtml = newDiv;
Console.WriteLine(doc.DocumentNode.OuterHtml);
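
The pattern above is a plain literal, so it would also match inside longer words, and a search term containing regex metacharacters would be misread. A minimal sketch of one way to tighten it is shown here, using Regex.Escape and \b word boundaries. WordPattern and Link are hypothetical names, and the sketch assumes the search term starts and ends with a word character, since \b would not behave as intended otherwise.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordPattern
{
    // Hypothetical helper: replaces whole-word occurrences of `word`
    // in `innerHtml` with a link pointing to `href`.
    public static string Link(string innerHtml, string word, string href)
    {
        // Escape metacharacters and require word boundaries on both sides.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        string anchor = "<a class='special' href='" + href + "'>" + word + "</a>";

        // A MatchEvaluator keeps '$' in the replacement from being read
        // as a group reference.
        return Regex.Replace(innerHtml, pattern, m => anchor);
    }
}
```

For example, divNode.InnerHtml = WordPattern.Link(divNode.InnerHtml, "specialSearchWord", "http://etc"); would slot into the code above. Because it still operates on the raw InnerHtml string, it can also match the term inside attribute values or inside existing links, so it is only suitable when the markup is known not to contain such cases.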


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow