Html Agility Pack - Get html fragment from an html document

c# html html-agility-pack

Question

Using the Html Agility Pack, how would I extract an HTML "fragment" from a full HTML document? For my purposes, an HTML "fragment" is defined as all content inside the <body> tags.

For example:

Sample Input:

<html>
   <head>
     <title>blah</title>
   </head>
   <body>
    <p>My content</p>
   </body>
</html>

Desired Output:

<p>My content</p>

Ideally, I'd like to return the content unaltered if it doesn't contain an <html> or <body> element (i.e. assume that I was passed a fragment in the first place if it wasn't a full HTML document).

Can anyone point me in the right direction?

Accepted Answer

I think you need to do it in pieces.

You can select the body (or html) node of the document with SelectSingleNode, as follows:

doc.DocumentNode.SelectSingleNode("//body") // returns body with entire contents :)

Then check the result for null. If a body node is found, its InnerHtml holds the fragment; if the result is null, the input was already a fragment and you can take the string as it is.

Hope it helps :)


Popular Answer

The following will work:

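public string GetFragment(HtmlDocument document)
{
   // Look up <body> once; the result is null when the input has no <body> element.
   HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
   // With no <body>, fall back to the markup of the whole document.
   return body == null ? document.DocumentNode.InnerHtml : body.InnerHtml;
}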
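
If the input arrives as a string rather than an already-loaded HtmlDocument, the same idea can be sketched as below. This is only an illustration under some assumptions: the class name HtmlFragmentExtractor and the string-based GetFragment overload are hypothetical, not part of the answers above, and the Trim() call is an added choice to drop the whitespace that surrounds the content inside <body> in the sample input. When no <body> is found, the original string is returned as-is rather than re-serialized, which should keep a fragment unaltered as the question asks.

```csharp
using System;
using HtmlAgilityPack;

public static class HtmlFragmentExtractor
{
    // Hypothetical helper: the name and string-based signature are assumptions.
    public static string GetFragment(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");

        // No <body>: assume a fragment was passed in and return it untouched.
        // Otherwise return the markup inside <body>, without surrounding whitespace.
        return body == null ? html : body.InnerHtml.Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string full =
            "<html>\n  <head>\n    <title>blah</title>\n  </head>\n" +
            "  <body>\n    <p>My content</p>\n  </body>\n</html>";
        string fragment = "<p>My content</p>";

        Console.WriteLine(HtmlFragmentExtractor.GetFragment(full));
        Console.WriteLine(HtmlFragmentExtractor.GetFragment(fragment));
    }
}
```

Note that an input with an <html> element but no <body> is also returned unchanged by this sketch; whether that is the desired behavior depends on how strictly "fragment" is meant.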



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow