How to use Html Agility Pack for HTML validation

c# html-agility-pack


I validate my HTML with HTML Agility Pack, using the following code:

```csharp
using System.Collections.Generic;

public class MarkupErrors
{
    public string ErrorCode { get; set; }
    public string ErrorReason { get; set; }
}

public static List<MarkupErrors> IsMarkupValid(string html)
{
    var document = new HtmlAgilityPack.HtmlDocument();
    document.OptionFixNestedTags = true;
    // Parse the input; without this call ParseErrors stays empty.
    document.LoadHtml(html);

    var parserErrors = new List<MarkupErrors>();
    foreach (var error in document.ParseErrors)
    {
        parserErrors.Add(new MarkupErrors
        {
            ErrorCode = error.Code.ToString(),
            ErrorReason = error.Reason
        });
    }

    return parserErrors;
}
```

Suppose my input looks like this:

```html
Hello World</h2>
<h3>Missing close h3 tag
```

My current method then returns the following errors:

- Start tag <h2> was not found
- End tag </h3> was not found

which is correct.

The problem is that I also want the HTML as a whole to be a valid document, meaning it must have proper <head> and <body> tags, because it will later be saved as .html files and made available for preview.

Can I use HTML Agility Pack to check for this?

Any suggestions are welcome. Thanks.

5/20/2013 8:15:20 AM

Accepted Answer

You can check that the HTML element has a HEAD element and a BODY element directly beneath it, for example like this:

```csharp
bool hasHead = doc.DocumentNode.SelectSingleNode("html/head") != null;
bool hasBody = doc.DocumentNode.SelectSingleNode("html/body") != null;
```

These checks return false if there is no HTML element at all, or if the HEAD or BODY element is not a direct child of the HTML element.

Note that I deliberately do not use an XPath expression such as "//head", because it would still return a result even if the HEAD element were not directly under the HTML element.
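Putting the question's parse-error loop and the answer's XPath checks together might look like the sketch below. It assumes HTML Agility Pack is referenced and reuses the `MarkupErrors` shape from the question; the `MarkupValidator` class, its `Validate` method and the `MissingHtml`/`MissingHead`/`MissingBody` codes are hypothetical names chosen for illustration, not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Same shape as the MarkupErrors class in the question.
public class MarkupErrors
{
    public string ErrorCode { get; set; }
    public string ErrorReason { get; set; }
}

// Hypothetical helper: parser errors plus document-structure checks.
public static class MarkupValidator
{
    public static List<MarkupErrors> Validate(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        // 1. Errors reported by the parser itself (unclosed tags, etc.).
        var errors = new List<MarkupErrors>();
        foreach (var error in doc.ParseErrors)
        {
            Add(errors, error.Code.ToString(), error.Reason);
        }

        // 2. Structural checks. The paths are relative to the document node,
        //    so "html/head" only matches a <head> that is a direct child of a
        //    top-level <html>, unlike "//head", which matches anywhere.
        //    The error codes below are made up for this sketch.
        if (doc.DocumentNode.SelectSingleNode("html") == null)
            Add(errors, "MissingHtml", "No top-level <html> element was found");
        if (doc.DocumentNode.SelectSingleNode("html/head") == null)
            Add(errors, "MissingHead", "No <head> element directly under <html>");
        if (doc.DocumentNode.SelectSingleNode("html/body") == null)
            Add(errors, "MissingBody", "No <body> element directly under <html>");

        return errors;
    }

    private static void Add(List<MarkupErrors> list, string code, string reason)
    {
        list.Add(new MarkupErrors { ErrorCode = code, ErrorReason = reason });
    }
}
```

With the fragment from the question, this would report the parser's two tag errors plus the three structural errors, since the fragment has no `<html>` element at all. Keeping the structural codes separate from the parser's `HtmlParseErrorCode` values lets the caller tell "badly nested markup" apart from "not a complete document".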

5/20/2013 8:45:12 AM


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow