Get entire form element as string using Html Agility Pack

c# html-agility-pack

Question

This is the first time I am using Html Agility Pack and facing problems straight away.

Just as my title suggest I want to get entire element as string including inner elements.

So for example below is my html and I am searching for a form element with id aspnetForm

<html>  
<head>  
</head>  
<body>  
  <form name="aspnetForm" id="aspnetForm">
    <div id="div1">  
        <a href="div1-a1">Link 1 inside div1</a>  
        <a href="div1-a2">Link 2 inside div1</a>  
    </div>  
    <a href="a3">Link 3 outside all divs</a>      
    <div id="div2">  
        <a href="div2-a1">Link 1 inside div2</a>  
        <a href="div2-a2">Link 2 inside div2</a>  
    </div> 
  </form> 
</body>  
</html>

I want the following to be the output (in string)

  <form name="aspnetForm" id="aspnetForm">
    <div id="div1">  
        <a href="div1-a1">Link 1 inside div1</a>  
        <a href="div1-a2">Link 2 inside div1</a>  
    </div>  
    <a href="a3">Link 3 outside all divs</a>      
    <div id="div2">  
        <a href="div2-a1">Link 1 inside div2</a>  
        <a href="div2-a2">Link 2 inside div2</a>  
    </div> 
  </form> 

I usually do not like to ask such spoon-feeding questions but I have been trying and searching but couldnt get an answer.

Please help!

Thanks in advance!

Accepted Answer

Seems you're looking for HtmlNode.OuterHtml:

//
// Summary:
//     Gets or Sets the object and its content in HTML.
public virtual string OuterHtml { get; }

So you just have to select your form node and get its OuterHtml property:

HtmlDocument doc = ... // load your HTML
HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");
string entireElementAsString = formNode.OuterHtml;

UPDATE

It seems there's a very old bug with how HAP treats form tags. Or maybe it's a feature!

In any case, here's a workaround:

HtmlNode.ElementsFlags.Remove("form");

So this should work:

HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = ... // load your HTML
HtmlNode formNode = doc.DocumentNode.SelectSingleNode("//form[@id='aspnetForm']");
string entireElementAsString = formNode.OuterHtml;

Popular Answer

Indeed good question, weird enough all the following fails !

Using HtmlAgilityPack - not able yet to come up with a solution!

(note that I use the nuget library ScraySharp as well, to get the Css selectors extension (ScrapySharp.Extensions)

 string html = @"<html>
        <head>
        </head>
        <body>
          <form name='aspnetForm' id='aspnetForm'>
            <div id='div1'>
                <a href='div1-a1'>Link 1 inside div1</a>
                <a href='div1-a2'>Link 2 inside div1</a>
            </div>
            <a href='a3'>Link 3 outside all divs</a>
            <div id='div2'>
                <a href='div2-a1'>Link 1 inside div2</a>
                <a href='div2-a2'>Link 2 inside div2</a>
            </div>
          </form>
        </body>
        </html>";

    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    string result = string.Empty;

    var formElement = doc.DocumentNode.CssSelect("form").FirstOrDefault();
    var formChildren = formElement.Descendants();

    StringBuilder sb = new StringBuilder();

    if (formChildren != null)
    {
        foreach (var child in formChildren)
        {
            sb.AppendLine(child.InnerHtml);
        }
    }

        //formElement.InnerHtml also returns empty !
        Console.WriteLine(sb.ToString());

You can however achieve this - way easier - with AngleSharp (angle sharp seems to be the recommendable option these days, since it is still maintained/developed, whereas HtmlAgility Pack not).

Using AngleSharp - works

 HtmlParser parser = new HtmlParser();
 var parsedDoc = parser.Parse(html);
 Console.WriteLine(parsedDoc.QuerySelector("form").InnerHtml);

Output (using AngleSharp):

enter image description here




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why