Get an HTML fragment from an HTML document using the Html Agility Pack

c# html html-agility-pack

Question

How would I extract an HTML "fragment" from a full HTML page using the Html Agility Pack? For my purposes, an "HTML fragment" is everything inside the <body> tags.

For instance:

Typical Input

<html>
   <head>
     <title>blah</title>
   </head>
   <body>
    <p>My content</p>
   </body>
</html>

Desired Results:

<p>My content</p>

If there is no <html> or <body> element, I would want to return the content unchanged. That is, if the document is not a complete HTML page, I would assume I was given a fragment to begin with.

Can someone point me in the right direction?

1
2
12/3/2010 6:16:12 PM

Accepted Answer

I think you need to do it in steps.

You can call SelectSingleNode on the document to look for the body (or html) element:

doc.DocumentNode.SelectSingleNode("//body") // returns body with entire contents :)

Then check the result for null. If it is null, no such element exists, and you can take the string as it is.
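The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name FragmentExtractor and the method ExtractFragment are hypothetical names chosen for illustration, and it assumes the input arrives as a string and that the HtmlAgilityPack package is referenced.

```csharp
using HtmlAgilityPack;

public static class FragmentExtractor
{
    // Hypothetical helper: returns the inner HTML of <body> when one exists,
    // otherwise returns the input unchanged (assumed to already be a fragment).
    public static string ExtractFragment(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");

        return body == null ? html : body.InnerHtml;
    }
}
```

Note that InnerHtml keeps any whitespace between the <body> tags and their content, so calling Trim() on the result may be worthwhile if the page is indented.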

Hope it helps.

6
2/19/2013 9:20:00 AM

Popular Answer

The following will work:

public string GetFragment(HtmlDocument document)
{
   // Look up <body> once; if it is missing, treat the whole input as a fragment.
   HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
   return body == null ? document.DocumentNode.InnerHtml : body.InnerHtml;
}


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow