Input sanitization is the process of stripping unwanted or untrusted data from user-supplied input so that the application can process it safely. It is the most common approach to mitigating code injection, particularly XSS and SQL injection. Any input from an online form that is echoed back to the user on the web page, or stored in the web app's database, must be sanitized before it is output or processed. Several tactics count as input sanitization, and each one has a different purpose and mitigates different types of attacks.

For XSS, the most prominent type of sanitization is escaping HTML special characters such as angle brackets (< and >) and the ampersand (&) so that the browser does not process them as markup when it renders the user input. Escaping, also referred to as encoding, substitutes special characters in HTML markup with representations called entities. For example, the entity for the less-than sign (<) is &lt;. Entities ensure that the browser displays these characters as text rather than interpreting them as code that it should run. The encoding function you use depends on the language the page is written in. In PHP, you can use the htmlspecialchars() function to escape the major HTML special characters:

<?php
// Echo user input with HTML special characters encoded as entities.
function my_func($input) {
    echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}
?>
<!DOCTYPE html>
<html>
    <body>
        <?php my_func('<script>alert("XSS attack successful!");</script>'); ?>
    </body>
</html>

The htmlspecialchars() function, called here with the ENT_QUOTES flag, encodes the $input parameter so that any ampersands (&), double quotes ("), single quotes ('), less-than signs (<), or greater-than signs (>) in the input are turned into entities. So, when the HTML below the function definition calls the custom my_func() function with the malicious alert string, the string is encoded as &lt;script&gt;alert(&quot;XSS attack successful!&quot;);&lt;/script&gt; and the browser displays it as text instead of running the script.
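To see what each of the five characters becomes, the short sketch below runs htmlspecialchars() over a sample string that contains all of them. The helper name escape_html() and the sample input are illustrative assumptions, not part of the my_func() example above.

```php
<?php
// Hypothetical helper: wraps htmlspecialchars() with the same flags as my_func().
function escape_html(string $input): string
{
    // ENT_QUOTES encodes both double and single quotes.
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

// Sample input containing <, >, &, a double quote, and a single quote.
$encoded = escape_html('<a href=\'x\' title="T&C">');

// Prints: &lt;a href=&#039;x&#039; title=&quot;T&amp;C&quot;&gt;
echo $encoded, PHP_EOL;
```

Note that the single quote is emitted as the numeric entity &#039; rather than a named one when no document-type flag is passed.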

This type of encoding is sufficient for preventing XSS in many cases, but not all. For example, encoding won't work in apps that need to accept HTML input, because it would turn the legitimate markup into plain text as well. In those cases, you should use a sanitization library written for the relevant language. These libraries automatically parse user-supplied HTML input and strip untrusted data from it. Some example libraries include HtmlSanitizer (.NET), HTML Purifier (PHP), SanitizeHelper (Ruby on Rails), and OWASP Java HTML Sanitizer Project (Java).

Additional XSS Mitigation Techniques

In addition to using sanitization libraries, you can also whitelist the types of rich text input you've deemed safe for the web app to accept. Any input that does not match the whitelist is rejected. You can also replace raw HTML markup for rich text components with another markup language, like Markdown. Attempts to inject malicious HTML will then prove ineffective, provided the Markdown renderer is configured to escape or strip raw HTML rather than pass it through.
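One way to picture the whitelist approach is a validator that accepts only a fixed set of bare formatting tags and rejects everything else. The sketch below is a minimal illustration under stated assumptions, not a substitute for a sanitization library: the tag list and the function name is_whitelisted_rich_text() are hypothetical.

```php
<?php
// Hypothetical whitelist: bare formatting tags only, no attributes allowed.
const ALLOWED_TAGS = ['b', 'i', 'em', 'strong', 'p', 'br'];

// Returns true only if every tag in $input is an exact whitelisted tag.
function is_whitelisted_rich_text(string $input): bool
{
    $names = implode('|', ALLOWED_TAGS);
    // Remove every exact occurrence of an allowed opening or closing tag.
    $remainder = preg_replace('#</?(?:' . $names . ')>#i', '', $input);
    // Any angle bracket left over belongs to markup outside the whitelist.
    return $remainder !== null && strpbrk($remainder, '<>') === false;
}

var_dump(is_whitelisted_rich_text('<p>Hello <b>world</b></p>')); // bool(true)
var_dump(is_whitelisted_rich_text('<b onclick="x()">hi</b>'));   // bool(false)
```

Because the function only accepts or rejects input and never outputs the stripped string, it cannot be tricked into assembling a new tag out of fragments. The trade-off is strictness: text containing a literal angle bracket, such as 1 < 2, is rejected too.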

Null Byte Sanitization

The most effective way of preventing a poison null byte attack is to remove the null byte from the input entirely. Recent versions of web app languages tend to handle this automatically, but if you're using an older version you can perform the sanitization manually. For example, in PHP, you can strip the null byte as follows:

$file = str_replace(chr(0), '', $input);
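As a worked example of the line above, the following sketch wraps the same str_replace() call in a function and applies it to a filename with an embedded null byte. The function name strip_null_bytes() and the sample filename are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: strip every null byte from user-supplied input.
function strip_null_bytes(string $input): string
{
    return str_replace(chr(0), '', $input);
}

// "\0" inside double quotes is PHP's escape sequence for the null byte.
$input = "shell.php\0.jpg";
$file  = strip_null_bytes($input);

echo strlen($input), PHP_EOL; // prints 14 (the null byte counts as one character)
echo $file, PHP_EOL;          // prints shell.php.jpg
```

After sanitization, the filename is treated as a single string ending in .jpg, so code that would otherwise stop reading at the null byte no longer sees shell.php.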