Sunday, July 02, 2006

Writing Secure Code - Input Validation

Many attacks exploit the common assumption that data users enter into a Web form is safe. Inadequate checks on user-supplied data can leave a site vulnerable to several different attacks.


Vulnerabilities


Buffer Overflows


Buffer overflow attacks have been around for decades. A buffer overflow attack can either result in a denial of service or cause code injected by the attacker to run on the server. .NET code is less susceptible to buffer overflows because it runs as managed code and array bounds are checked before arrays are accessed. Even so, .NET sites can be susceptible where unmanaged APIs or COM objects are involved.


Cross Site Scripting


In Cross Site Scripting (XSS), an attacker takes advantage of poor handling of data either at entry or at display. This technique can be used to gather confidential user information or to impersonate users and gain access to the Web application with the same rights as the impersonated user. Cross Site Scripting is commonly associated with phishing.


SQL Injection


In SQL Injection, an attacker takes advantage of poorly handled data and weakly constructed queries to a SQL database. Most commonly, this happens on screens where developers rely on input from the end user to filter or sort data and that input is not properly validated.
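As a rough illustration of the sort case, the sketch below never copies the user's text into the query. It only uses the input to pick from a fixed list of known column names. The column names, the default, and the function name buildOrderBy are assumptions made up for this example. Values used for filtering would normally be passed to the database as query parameters rather than concatenated into the SQL string.

```javascript
// Hypothetical allowlist of columns a listing screen may be sorted by.
const SORTABLE_COLUMNS = ["name", "price", "created_at"];

// Build an ORDER BY clause from user input without ever embedding that
// input in the SQL text. Anything not on the allowlist falls back to
// the default column, and the direction is reduced to ASC or DESC.
function buildOrderBy(requestedColumn, requestedDirection) {
  const column = SORTABLE_COLUMNS.includes(requestedColumn)
    ? requestedColumn
    : "name";
  const direction = requestedDirection === "desc" ? "DESC" : "ASC";
  return "ORDER BY " + column + " " + direction;
}

console.log(buildOrderBy("price", "desc"));
console.log(buildOrderBy("name; DROP TABLE users--", "asc"));
```

The second call shows the point of the approach: the hostile string is discarded and the default column is used instead.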


Recommendations


Validate all input all the time


Assume all input is malicious, regardless of source and handle it as such. You can’t be certain that a service, file share, or database you work with has not been compromised. You absolutely can’t be certain that a user is who they claim they are or has good intentions.
Do not assume that data validation only needs to take place at a single layer of the application. Verify data at all levels of the application. If any one layer is circumvented or compromised, the remaining layers must still do their part to protect the integrity of the system.


Use common validation routines

Make input validation a core element of your application development strategy. Create shared validation routines for common inputs such as email addresses, zip codes, and phone numbers. This keeps validation consistent and makes maintenance much easier.
Be careful about page-specific or module-specific validation. Make sure this approach is truly necessary, and then reuse as many of the common routines as possible.
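A minimal sketch of what a shared set of routines might look like is shown below. The validators object, the validateForm function, and the specific zip code and phone formats are assumptions chosen for illustration, not a prescribed design.

```javascript
// Hypothetical shared validators; the formats are illustrative only.
const validators = {
  // US zip code: five digits, optionally a hyphen and four more digits.
  zipCode: (value) => /^[0-9]{5}(-[0-9]{4})?$/.test(value),
  // Phone number written as 555-123-4567.
  phone: (value) => /^[0-9]{3}-[0-9]{3}-[0-9]{4}$/.test(value),
};

// Check a form object against a map of field name -> validator name.
// Returns the names of the fields that failed; an empty array means
// every field passed.
function validateForm(form, rules) {
  const failures = [];
  for (const [field, ruleName] of Object.entries(rules)) {
    const known = Object.prototype.hasOwnProperty.call(validators, ruleName);
    const value = form[field];
    // Missing fields, non-string values, and unknown rules all fail.
    if (!known || typeof value !== "string" || !validators[ruleName](value)) {
      failures.push(field);
    }
  }
  return failures;
}

const rules = { zip: "zipCode", phone: "phone" };
console.log(validateForm({ zip: "12345", phone: "555-123-4567" }, rules));
console.log(validateForm({ zip: "1234a" }, rules));
```

Each page then declares only which rule applies to which field, and the patterns themselves live in one place.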


Constrain and sanitize


Constrain
To constrain data is to allow only expected characters or patterns to be submitted. This is commonly accomplished through the application of regular expressions. However it is applied, the idea is to check the data for type, length, format, and range, and to treat all data that fails to meet the criteria as bad. We check string patterns and reject any that do not match our specific rules. This not only eliminates errant characters, but also improves the accuracy of the data stored.
In the case of an age field, for example, the length would be at least one and no more than three and only digits would be acceptable. Any string not matching this pattern would be rejected.

/^[0-9]{1,3}$/
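The age check could be sketched in JavaScript as follows. The pattern is the one shown above. The function name parseAge and the upper bound of 130 are assumptions added to illustrate the range check.

```javascript
// Constrain an age field: check type, then length and format, then range.
// Returns the age as a number, or null if the input is rejected.
function parseAge(input) {
  if (typeof input !== "string") return null;        // type
  if (!/^[0-9]{1,3}$/.test(input)) return null;      // length and format
  const age = Number(input);
  // Range: 130 is an assumed upper limit, not a rule from the article.
  if (age > 130) return null;
  return age;
}

console.log(parseAge("42"));
console.log(parseAge("12a"));
console.log(parseAge("999"));
```

Note that the pattern alone accepts "999". The separate range check is what rejects values that have the right shape but make no sense.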


An email address is more complicated. The following pattern accepts most email addresses. The address must start with a letter, followed by any number of word characters, dots, or hyphens, followed by a letter or digit, followed by “@”. After the “@” comes either an IP address or a host name: a letter, followed by any series of word characters, dots, or hyphens, followed by a letter or digit, followed by a dot and two to four letters.

^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@(([0-9]{2,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|([a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z]{2,4}))$
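To see how such a pattern behaves, here is a small sketch that applies it in JavaScript. It assumes the IP address and host name parts are alternatives, as the description says, so the two are joined with an explicit "|". The function name isValidEmail is made up for this example.

```javascript
// Email pattern following the description above: local part, "@", then
// either a dotted IP address or a host name ending in 2 to 4 letters.
const EMAIL_PATTERN =
  /^[a-zA-Z][\w.-]*[a-zA-Z0-9]@([0-9]{2,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}|[a-zA-Z][\w.-]*[a-zA-Z0-9]\.[a-zA-Z]{2,4})$/;

// Hypothetical helper: true only for strings that match the pattern.
function isValidEmail(input) {
  return typeof input === "string" && EMAIL_PATTERN.test(input);
}

console.log(isValidEmail("jane.doe@example.com"));
console.log(isValidEmail("user@192.168.1.10"));
console.log(isValidEmail("user@example.com<script>"));
console.log(isValidEmail("user@example.museum"));
```

The last call is rejected because the pattern allows only two to four letters after the final dot. A pattern this strict will turn away some legitimate addresses, so it is worth deciding how much strictness the application really needs.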

Sanitize
To sanitize data is to transform the data into a safe format. This is different from constraining it. When we constrain, we do not allow data that does not match our patterns. When we sanitize, we alter the data to ensure it is not harmful. This may include stripping nulls or other extended characters from strings, or escaping values so they are treated as literals.
In the simple example shown below, we remove the characters, whereas in actual code we may choose to replace the characters with displayable representations, such as replacing “<” with “&lt;”.


function RemoveBad(InStr){
    // Strip characters that have special meaning in HTML, script, or SQL.
    InStr = InStr.replace(/\</g,"");
    InStr = InStr.replace(/\>/g,"");
    InStr = InStr.replace(/\"/g,"");
    InStr = InStr.replace(/\'/g,"");
    InStr = InStr.replace(/\%/g,"");
    InStr = InStr.replace(/\;/g,"");
    InStr = InStr.replace(/\(/g,"");
    InStr = InStr.replace(/\)/g,"");
    InStr = InStr.replace(/\&/g,"");
    InStr = InStr.replace(/\+/g,"");

    return InStr;
}


The HTMLEncode method escapes HTML special characters so they are displayed as literal text, and the URLEncode method escapes characters that are not valid in a URL so the value can be used safely in a request. These should be a required part of your standard input/output data handling.
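HTMLEncode and URLEncode are server-side methods. To show the same idea in JavaScript, here is a rough sketch that replaces special characters with displayable representations instead of removing them. The helper names escapeHtml and buildSearchUrl and the "/search" path are assumptions for this example. encodeURIComponent is a standard built-in function.

```javascript
// Map each HTML special character to its displayable entity.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Hypothetical helper: escape text so a browser shows it literally
// instead of interpreting it as markup.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical helper: place a user-supplied value in a query string.
// The "/search" path is illustrative only.
function buildSearchUrl(term) {
  return "/search?q=" + encodeURIComponent(term);
}

console.log(escapeHtml('<a href="x">Tom & \'Jerry\'</a>'));
console.log(buildSearchUrl("a b&c=d"));
```

Unlike RemoveBad, this keeps the user's original text intact, so a name such as O'Brien is still displayed correctly.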


Set the Character Set


If the character set of a page is not explicitly defined, the server cannot reliably determine which characters are special. Attackers can exploit this ambiguity, because filters for special characters are much harder to write when the encoding is unknown.
Character encoding for HTML and HTTP was intended to default to ISO-8859-1, but many browsers used a different default. Version 4 of the HTML standard allows any character encoding to be used when none is explicitly indicated in the page header.
The recommendation is to set all pages to the same character set, consistent with the server. The following shows a simple example of how to set the character set to ISO-8859-1 in an HTML page. This can also be done through a more universal means, such as a standard include file for all page headers or the use of page templates.


<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>HTML SAMPLE</title>
</head>
<body>
<p>This is a sample HTML page</p>
</body>
</html>

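One way to apply the same header to every page is a shared template function. The sketch below is only an illustration: the name renderPage and its parameters are made up, and it assumes the title and body passed in have already been encoded for display.

```javascript
// Single place where the character set for all pages is declared.
const PAGE_CHARSET = "ISO-8859-1";

// Hypothetical page template: every page built through this function
// carries the same Content-Type declaration.
// Assumes title and bodyHtml are already safely encoded by the caller.
function renderPage(title, bodyHtml) {
  return [
    "<html>",
    "<head>",
    '<META http-equiv="Content-Type" content="text/html; charset=' +
      PAGE_CHARSET + '">',
    "<title>" + title + "</title>",
    "</head>",
    "<body>",
    bodyHtml,
    "</body>",
    "</html>",
  ].join("\n");
}

console.log(renderPage("HTML SAMPLE", "<p>This is a sample HTML page</p>"));
```

Changing the character set for the whole site then means editing one constant rather than every page.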