I found a brilliant example of an HTML sanitizer using HTMLAgilityPack. In the code, the Microsoft.Security.Application.Encoder class is used:
    // AntiXss
    a.Value = Microsoft.Security.Application.Encoder.UrlPathEncode(a.Value);
I cannot find the assembly containing this class, and I would prefer not to add another dependency to my project. The sanitizer works without this line. However, removing this call may leave a security hole in the code.
In order to decide for or against using this assembly, I would like to know: what does this method actually do?
You can take a look at the source code. Here is the method as it appears there:
    /// <summary>
    /// URL-encodes the path section of a URL string and returns the encoded string.
    /// </summary>
    /// <param name="input">The text to URL path encode</param>
    /// <returns>The URL path encoded text.</returns>
    [System.Diagnostics.CodeAnalysis.SuppressMessage(
        "Microsoft.Design",
        "CA1055:UriReturnValuesShouldNotBeStrings",
        Justification = "This does not return a full URL so the return type can be a string.")]
    public static string UrlPathEncode(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // DevDiv #211105: We should make the UrlPathEncode method encode only the path portion of URLs.
        string schemeAndAuthority;
        string path;
        string queryAndFragment;
        bool validUrl = UriUtil.TrySplitUriForPathEncode(input, out schemeAndAuthority, out path, out queryAndFragment);
        if (!validUrl)
        {
            // treat as a relative URL, so we might still need to chop off the query / fragment components
            schemeAndAuthority = null;
            UriUtil.ExtractQueryAndFragment(input, out path, out queryAndFragment);
        }

        return schemeAndAuthority + HtmlParameterEncoder.UrlPathEncode(path, Encoding.UTF8) + queryAndFragment;
    }
You will have to dig deeper to get to all the moving parts involved in encoding the URI. Usually I would recommend looking at the unit tests to see what is expected of the component, but at first glance there are no tests for the Encoder class :(
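To make the overall shape of the method concrete, here is a rough sketch using only the base class library. It is not the AntiXSS implementation. `UrlPathEncodeSketch` and `EncodePathOnly` are hypothetical names, the naive string search stands in for `UriUtil.TrySplitUriForPathEncode`, and `Uri.EscapeDataString` stands in for `HtmlParameterEncoder.UrlPathEncode`, whose exact list of safe characters I have not checked. Treat it as an illustration of "split, encode only the path, reassemble", not as a drop-in security replacement.

```csharp
using System;
using System.Linq;

public static class UrlPathEncodeSketch
{
    // Hypothetical helper, NOT part of AntiXSS: percent-encodes only the path
    // portion of a URL and leaves scheme/authority and query/fragment untouched.
    public static string EncodePathOnly(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Everything from the first '?' or '#' onward is query/fragment.
        int cut = input.IndexOfAny(new[] { '?', '#' });
        string beforeQuery = cut < 0 ? input : input.Substring(0, cut);
        string queryAndFragment = cut < 0 ? string.Empty : input.Substring(cut);

        // Assumption: an absolute URL is recognised simply by "://".
        string schemeAndAuthority = string.Empty;
        string path = beforeQuery;
        int schemeEnd = beforeQuery.IndexOf("://", StringComparison.Ordinal);
        if (schemeEnd > 0)
        {
            int pathStart = beforeQuery.IndexOf('/', schemeEnd + 3);
            if (pathStart < 0)
            {
                pathStart = beforeQuery.Length; // no path at all
            }
            schemeAndAuthority = beforeQuery.Substring(0, pathStart);
            path = beforeQuery.Substring(pathStart);
        }

        // Percent-encode each segment as UTF-8 but keep the '/' separators.
        string encodedPath = string.Join(
            "/",
            path.Split('/').Select(segment => Uri.EscapeDataString(segment)));

        return schemeAndAuthority + encodedPath + queryAndFragment;
    }
}
```

For example, `UrlPathEncodeSketch.EncodePathOnly("http://example.com/a b/c?x=1 2#frag")` returns `http://example.com/a%20b/c?x=1 2#frag`: the space in the path is encoded, while the query string and fragment pass through unchanged. Note that this sketch would double-encode a path that already contains `%20`; whether the real method does the same is something to check in the AntiXSS source.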