This is the first time I am using Html Agility Pack, and I'm running into problems straight away.
As my title suggests, I want to get an entire element as a string, including its inner elements.
For example, below is my HTML, and I am searching for the form element with id aspnetForm:
<html>
<head>
</head>
<body>
<form name="aspnetForm" id="aspnetForm">
<div id="div1">
<a href="div1-a1">Link 1 inside div1</a>
<a href="div1-a2">Link 2 inside div1</a>
</div>
<a href="a3">Link 3 outside all divs</a>
<div id="div2">
<a href="div2-a1">Link 1 inside div2</a>
<a href="div2-a2">Link 2 inside div2</a>
</div>
</form>
</body>
</html>
I want the following to be the output (as a string):
<form name="aspnetForm" id="aspnetForm">
<div id="div1">
<a href="div1-a1">Link 1 inside div1</a>
<a href="div1-a2">Link 2 inside div1</a>
</div>
<a href="a3">Link 3 outside all divs</a>
<div id="div2">
<a href="div2-a1">Link 1 inside div2</a>
<a href="div2-a2">Link 2 inside div2</a>
</div>
</form>
I usually don't like asking such spoon-feeding questions, but I have been trying and searching and couldn't find an answer.
Please help!
Thanks in advance!
It seems you're looking for HtmlNode.OuterHtml:
//
// Summary:
// Gets or Sets the object and its content in HTML.
public virtual string OuterHtml { get; }
So you just have to select your form node and get its OuterHtml property:
HtmlDocument doc = ... // load your HTML
HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");
string entireElementAsString = formNode.OuterHtml;
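A minimal sketch of the difference between OuterHtml and InnerHtml, using div1 from the question's markup trimmed to a single line (a div rather than the form, so nothing unusual happens during parsing). It assumes the HtmlAgilityPack NuGet package; the class name OuterVsInner and the Compare helper are hypothetical names for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumed: HtmlAgilityPack NuGet package

public static class OuterVsInner
{
    // Trimmed-down copy of div1 from the question's markup.
    public const string Html =
        "<div id=\"div1\"><a href=\"div1-a1\">Link 1 inside div1</a></div>";

    // Hypothetical helper: returns both properties of the selected div.
    public static (string Outer, string Inner) Compare(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode div = doc.DocumentNode.SelectSingleNode("//div[@id='div1']");

        // OuterHtml includes the <div ...> start tag and the </div> end tag;
        // InnerHtml is only the markup between them.
        return (div.OuterHtml, div.InnerHtml);
    }

    public static void Main()
    {
        var (outer, inner) = Compare(Html);
        Console.WriteLine("OuterHtml: " + outer);
        Console.WriteLine("InnerHtml: " + inner);
    }
}
```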
UPDATE
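It seems there's a very old bug in how HAP treats form tags. Or maybe it's a feature! In the affected versions, HtmlNode.ElementsFlags registers form as HtmlElementFlag.CanOverlap | HtmlElementFlag.Empty, so the parser treats the form as empty and its contents end up as siblings of the form node instead of its children.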
In any case, here's a workaround:
HtmlNode.ElementsFlags.Remove("form");
So this should work:
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = ... // load your HTML
HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");
string entireElementAsString = formNode.OuterHtml;
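Put together as a complete console program, the workaround might look like the sketch below. It assumes the HtmlAgilityPack NuGet package and a trimmed copy of the question's markup; FormExtractor, GetFormHtml and SampleHtml are hypothetical names. One design point worth knowing: ElementsFlags is a static dictionary, so removing the form entry changes how every HtmlDocument loaded afterwards in the same process is parsed, and the call has to happen before LoadHtml.

```csharp
using System;
using HtmlAgilityPack; // assumed: HtmlAgilityPack NuGet package

public static class FormExtractor
{
    // Trimmed copy of the question's markup (hypothetical sample).
    public const string SampleHtml =
        "<html><body>" +
        "<form name=\"aspnetForm\" id=\"aspnetForm\">" +
        "<div id=\"div1\"><a href=\"div1-a1\">Link 1 inside div1</a></div>" +
        "<a href=\"a3\">Link 3 outside all divs</a>" +
        "</form>" +
        "</body></html>";

    public static string GetFormHtml(string html)
    {
        // Static, process-wide setting: must run before LoadHtml.
        // Dictionary.Remove just returns false if "form" has no entry.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");

        // SelectSingleNode returns null when nothing matches.
        return formNode?.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(GetFormHtml(SampleHtml));
    }
}
```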
Indeed, good question; weirdly enough, all of the following fails!
Using HtmlAgilityPack, I have not yet been able to come up with a solution.
(Note that I also use the NuGet library ScrapySharp, to get the CSS selector extensions in ScrapySharp.Extensions.)
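using System;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

string html = @"<html>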
<head>
</head>
<body>
<form name='aspnetForm' id='aspnetForm'>
<div id='div1'>
<a href='div1-a1'>Link 1 inside div1</a>
<a href='div1-a2'>Link 2 inside div1</a>
</div>
<a href='a3'>Link 3 outside all divs</a>
<div id='div2'>
<a href='div2-a1'>Link 1 inside div2</a>
<a href='div2-a2'>Link 2 inside div2</a>
</div>
</form>
</body>
</html>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
string result = string.Empty;
var formElement = doc.DocumentNode.CssSelect("form").FirstOrDefault();
var formChildren = formElement.Descendants();
StringBuilder sb = new StringBuilder();
if (formChildren != null)
{
foreach (var child in formChildren)
{
sb.AppendLine(child.InnerHtml);
}
}
// formElement.InnerHtml also returns an empty string!
Console.WriteLine(sb.ToString());
You can, however, achieve this much more easily with AngleSharp (AngleSharp seems to be the recommended option these days, since it is still maintained and developed, whereas Html Agility Pack is not).
Using AngleSharp - works
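// AngleSharp 0.9.x API; in 0.10+ use AngleSharp.Html.Parser and parser.ParseDocument(html)
HtmlParser parser = new HtmlParser();
var parsedDoc = parser.Parse(html);
// InnerHtml omits the <form> tag itself; OuterHtml includes it
Console.WriteLine(parsedDoc.QuerySelector("form").InnerHtml);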
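For the question's actual goal (the form element including its own tag), a sketch that uses OuterHtml instead of InnerHtml. It assumes AngleSharp 0.10 or later, where HtmlParser lives in AngleSharp.Html.Parser and parses with ParseDocument; AngleSharpFormExtractor, GetFormHtml and SampleHtml are hypothetical names.

```csharp
using System;
using AngleSharp.Html.Parser; // assumed: AngleSharp NuGet package, 0.10 or later

public static class AngleSharpFormExtractor
{
    // Trimmed copy of the question's markup (hypothetical sample).
    public const string SampleHtml =
        "<html><body>" +
        "<form name='aspnetForm' id='aspnetForm'>" +
        "<div id='div1'><a href='div1-a1'>Link 1 inside div1</a></div>" +
        "<a href='a3'>Link 3 outside all divs</a>" +
        "</form>" +
        "</body></html>";

    public static string GetFormHtml(string html)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        // QuerySelector returns null if no element matches the CSS selector.
        var form = document.QuerySelector("form#aspnetForm");

        // OuterHtml serializes the element itself plus its children.
        return form?.OuterHtml;
    }

    public static void Main() => Console.WriteLine(GetFormHtml(SampleHtml));
}
```

Because AngleSharp re-serializes the parsed DOM rather than slicing the source text, attribute values come out in double quotes even though this sample (like the html string in this answer) uses single quotes.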
Output (using AngleSharp):