I need to parse HTML for images and replace each img tag with a new node that contains only the text from its alt attribute. Can someone show how to do this with both LINQ and standard usage?
I'm using a LINQ sample now to replace paragraphs, but I'm getting a read-only error.
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
IEnumerable<HtmlNode> paragraphs = doc.DocumentNode.DescendantNodes().Where(p => p.Name.ToLower() == "p");
foreach (HtmlNode p in paragraphs)
{
    p.InnerText = "Hello World";
}
Also, is there an easy way to convert the document back to text, i.e. output it as a string?
This is what I have, which doesn't work:
MemoryStream outStream = new MemoryStream();
doc.Save(outStream);
outStream.Seek(0, SeekOrigin.Begin);
StreamReader reader = new StreamReader( outStream );
string text = reader.ReadToEnd();
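// SelectNodes returns null when nothing matches, hence the null check.
var images = doc.DocumentNode.SelectNodes("//img");
if (images != null)
{
    foreach (HtmlNode image in images)
    {
        // Replace each <img> with a text node holding its alt text.
        var alt = image.GetAttributeValue("alt", "");
        var nodeForReplace = HtmlTextNode.CreateNode(alt);
        image.ParentNode.ReplaceChild(nodeForReplace, image);
    }
}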
// Save to a StringWriter to get the document back as a string.
var sb = new StringBuilder();
using (var writer = new StringWriter(sb))
{
    doc.Save(writer);
}
string result = sb.ToString();
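The question also asks for a LINQ version. Here is a minimal sketch of the same replacement using Descendants and ToList instead of XPath. The class name AltTextReplacer and the method name ReplaceImagesWithAlt are made up for illustration. It uses doc.CreateTextNode rather than HtmlTextNode.CreateNode so that the alt text is not parsed as HTML, and it reads the result from DocumentNode.OuterHtml as an alternative to saving through a writer.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named here only for illustration.
public static class AltTextReplacer
{
    public static string ReplaceImagesWithAlt(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Descendants is lazy, so ToList() takes a snapshot of the matches
        // before the tree is modified inside the loop.
        var images = doc.DocumentNode.Descendants("img").ToList();

        foreach (HtmlNode image in images)
        {
            // Fall back to an empty string when there is no alt attribute.
            string alt = image.GetAttributeValue("alt", "");

            // A plain text node, so the alt text is not parsed as markup.
            HtmlNode replacement = doc.CreateTextNode(alt);
            image.ParentNode.ReplaceChild(replacement, image);
        }

        // Serialize the modified document back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

Unlike SelectNodes, Descendants gives back an empty sequence when nothing matches, so no null check is needed. For the paragraph example, setting p.InnerHtml instead of p.InnerText should avoid the read-only error, assuming a version of the library where InnerText has no setter.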