User Input Sanitization
User input filtering, a.k.a. input sanitization, is one of the most important concepts within the security realm. Improper handling of user input can lead to numerous vulnerabilities, including buffer overflows, SQL injection, command injection, and format string attacks. Sadly, developers often overlook the importance of this practice. While this concept extends to most information security realms, this article focuses on Web-based applications and the dangers posed when user input in Web applications is not filtered properly.
I did a short, six-month stint as an application developer. During this time I noticed a glaring security-related knowledge gap within the Web development community. Since that time, I have wanted to write an article on this topic to raise awareness. I do NOT purport to be an expert when it comes to programming (far from it!). Therefore, my intent is to bring attention to the issue, rather than to highlight every feasible way input can be filtered. In fact, my goal is to spur conversation, and I appreciate any and all feedback on this article.
Input Sanitization in the Web Realm
If you look at the OWASP Top 10 for 2013, you will notice that (at least!) five (5) of the top ten (10) Web-based risks identified revolve around input sanitization in some way, shape, or form. The fact that half of the top Web-based vulnerabilities deal with user input should be enough to make any Web developer realize that this concept is important. The full OWASP Top 10 for 2013 is available on the OWASP website.
The Problem: Many Web application developers do not realize that input sanitization requires a multi-tiered approach.
Bottom Line: Input sanitization needs to occur at the following layers:
1) User – Make the browser do some of the work; this offloads data processing from the server and catches user errors!
2) Server – What the heck did the user just try to pass into my application? No, no, no.
3) Database – Use proper database schema! Selecting proper data types helps avoid bogus data.
Many developers focus on one or maybe two of these layers, but a truly secure environment requires sanitization on all three fronts. I see various articles related to sanitization that have some great content, but I have yet to come across any “triple-pronged approach” articles, which is why I am writing this article (please leave a comment if you know of one).
I will emphasize checking the 1) length and 2) content of user input, since I view these as two primary focal points.
|Layer||Language||Techniques|
|User||HTML||Regex-based manual functions|
|Server||PHP||Bounds checking function – length|
|Database||MySQL||Proper data types; parameterized stored procedures|
Since user-layer sanitization is common, I do not want to spend too much time on this topic. Anyways, here we go.
• In this code segment, we have a text input on an HTML page that should only accept a numeric response that is 1-6 digits in length
• To help the user, the title= section provides a tooltip concerning the required input
• maxlength= tells the text input area to only allow up to six characters, but we still need to check that the input is numeric before we send it to the server
• The submit button is used to submit user data, including the User ID value
• This code segment ensures that the passed input is numeric; pretty simple
• This can be accomplished dozens of ways; this is just one that I used
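For readers who want something concrete to start from, here is a minimal sketch of a form matching the description above, generated from PHP in keeping with the rest of this article. It is not the original listing: the function name renderUserIdForm(), the field name user_id, the form id userForm, and the inline script are all assumptions made for illustration.

```php
<?php
// Sketch only: builds a User ID form like the one described above.
// renderUserIdForm, user_id, and userForm are hypothetical names.
function renderUserIdForm(string $action): string
{
    // Escape the action URL so it cannot break out of the attribute.
    $safeAction = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');

    // Nowdoc: nothing inside is interpolated, so the regex "$" stays literal.
    $template = <<<'HTML'
<form id="userForm" method="post" action="%s">
  <input type="text" name="user_id" maxlength="6"
         title="Enter a numeric User ID, 1-6 digits">
  <input type="submit" name="run" value="Submit">
</form>
<script>
  // Block submission unless the value is 1-6 digits.
  document.getElementById("userForm").onsubmit = function () {
    return /^[0-9]{1,6}$/.test(this.elements["user_id"].value);
  };
</script>
HTML;

    return sprintf($template, $safeAction);
}

echo renderUserIdForm('index.php');
```

Keep in mind that everything in this layer runs in the user's browser, so it can be bypassed; it helps honest users and saves the server some work, nothing more.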
Once the user’s input reaches the server, we need to perform additional, sometimes redundant, input checking. Again, there are many ways to do this, so I will simply cover some of my preferred methods.
Initial filtering (length + content)
One thing I like to do when filtering input data via PHP is to perform an immediate check of POST data prior to passing anything into other PHP functions. In other words, once the user submits POST data, I do some basic checks right away. Here’s an example setup:
1) In main(), initialize the variables you intend to use
2) Draw the user input form to the screen
a. Some devs like to enter the raw HTML within the PHP file, but I like to generate the HTML via PHP functions
3) Check for $_POST["run"], which is the name= variable set for the Submit button on the HTML form
a. If this is set, the user has clicked Submit, so we need to check POST data
b. Otherwise, just draw the user form and exit, awaiting the user’s input
c. For forms with multiple submit buttons, one can simply check for each POST value
4) Call a POST data processing function
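The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not the original code: drawForm() and the small processOpts() stub are hypothetical placeholders, and main() takes the POST array as a parameter (instead of reading $_POST directly) only so the flow is easy to exercise.

```php
<?php
// Sketch of the flow in steps 1-4. drawForm() and the processOpts() stub
// are hypothetical placeholders, not the article's original code.

function drawForm(): string
{
    return '<form method="post">'
         . '<input type="text" name="user_id" maxlength="6">'
         . '<input type="submit" name="run" value="Submit">'
         . '</form>';
}

// Stub: returns a list of error strings (empty when the input is acceptable).
function processOpts(array $post): array
{
    $userId = isset($post['user_id']) ? (string) $post['user_id'] : '';
    return ctype_digit($userId) ? [] : ['User ID must be numeric'];
}

function main(array $post): string
{
    // 1) Initialize the variables we intend to use.
    $errors = [];

    // 2) Draw the input form.
    $output = drawForm();

    // 3) "run" is the name= of the Submit button; if absent, just show the form.
    if (!isset($post['run'])) {
        return $output;
    }

    // 4) The user clicked Submit, so check the POST data right away.
    $errors = processOpts($post);
    foreach ($errors as $error) {
        $output .= '<p class="error">'
                 . htmlspecialchars($error, ENT_QUOTES, 'UTF-8')
                 . '</p>';
    }
    return $output;
}

echo main($_POST);
```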
More in-depth input processing occurs in processOpts(). For example:
As you see above, I like using a checkBounds() function to check the bounds of various inputs, minimizing the code footprint related to checking input size in multiple places.
The idea here is simple: Rather than checking every dang user input in a dedicated code block, we can pass the input, low bound, and high bound to the function to see if the input fits the desired length.
• The function returns “none” if there is no error, which is processed in the calling function
• You will notice that I prefer to use an array for errors. I subscribe to the idea that errors should be caught during execution and that error display should occur at the end of the program run (whether short-circuited or not due to try/catch error generation).
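Here is one possible shape for such a pair of functions. Only the names checkBounds() and processOpts(), the three arguments (input, low bound, high bound), the “none” return value, and the error array come from the description above; the parameter order, the messages, and the field names user_id and user_group are assumptions.

```php
<?php
// Sketch only: one possible shape for checkBounds() and its caller.
// Parameter order, messages, and field names are assumptions.

// Returns "none" when the length of $input is within [$low, $high],
// otherwise a short description of the problem.
// Note: strlen() counts bytes; mb_strlen() may suit multibyte input better.
function checkBounds(string $input, int $low, int $high): string
{
    $length = strlen($input);
    if ($length < $low) {
        return "input is shorter than $low characters";
    }
    if ($length > $high) {
        return "input is longer than $high characters";
    }
    return 'none';
}

// Collects every problem in an array so the caller can display them
// together at the end of the run.
function processOpts(array $post): array
{
    $errors = [];

    // field name => [low bound, high bound]
    $limits = ['user_id' => [1, 6], 'user_group' => [1, 7]];

    foreach ($limits as $field => [$low, $high]) {
        $value  = isset($post[$field]) ? (string) $post[$field] : '';
        $result = checkBounds($value, $low, $high);
        if ($result !== 'none') {
            $errors[] = "$field: $result";
        }
    }

    // Content check for the numeric User ID (length alone is not enough).
    if (isset($post['user_id']) && !ctype_digit((string) $post['user_id'])) {
        $errors[] = 'user_id: input must be numeric';
    }

    return $errors;
}
```

Adding a new input then means adding one entry to the limits table instead of writing another dedicated length check.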
While I did not include direct examples, a function similar to processOpts() would also pass specific variables to functions that serve to protect against SQL injection, XSS, command injection, etc.
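To give a rough idea of what those content-focused helpers might look like, here is a small sketch. The helper names (escapeForHtml, buildPingCommand, parseUserId) and the ping example are hypothetical; the PHP built-ins they wrap (htmlspecialchars, escapeshellarg, preg_match) are real. For SQL injection specifically, validation like this complements prepared statements rather than replacing them.

```php
<?php
// Sketch only: helper names are hypothetical; the PHP built-ins are real.

// XSS: escape user-supplied text at the point it is written into HTML.
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Command injection: quote a value before it is placed on a command line.
function buildPingCommand(string $host): string
{
    return 'ping -c 1 ' . escapeshellarg($host);
}

// Content check: accept 1-6 digits only and reject everything else,
// rather than trying to "clean up" bad input.
// \A and \z anchor the whole string ("$" would tolerate a trailing newline).
function parseUserId(string $value): ?int
{
    return preg_match('/\A[0-9]{1,6}\z/', $value) === 1 ? (int) $value : null;
}

var_dump(parseUserId('1 OR 1=1')); // NULL: rejected before it gets near a query
```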
Dynamic Input Checking – a.k.a. The Good Stuff
As we will see in the Database-layer section, the SQL enum data type is invaluable. Not only can we restrict invalid input at the DB-layer using the enum data type, but the bugger also allows us to implement dynamic input checking at the server layer. For example, if we have an input field whose values need to match the values available within a defined list, we can pull the list directly from the DB, rather than using a global constant array within the program.
To restrict input to a defined list, developers sometimes include the list of acceptable items within the code itself. This includes defining arrays randomly throughout classes and/or functions, which can make it difficult to maintain visibility into each array, thereby making it difficult to keep these arrays in sync with changes in the DB. To avoid this hassle, avoid defining such lists within the code or global constants. Rather, rely on a SQL pull to find the most current, acceptable input list. This way, you can just update the values within the DB and the dynamic code will pull the correct values. Change the enum types in the DB, and both the server- and DB-layers will use this information. BAM!
I could not find my old code for properly exploding the SQL query result in this scenario, so I used code from Jade Krafsig’s blog, located here: [http://jadendreamer.wordpress.com/2011/03/16/php-tutorial-put-mysql-enum-values-into-drop-down-select-box/]. Respect where due — Props to her!
Let’s take a look at a re-usable function:
• I used mysql:: in this older code, but you’ll want to use a currently supported extension instead, since the original mysql extension was removed in PHP 7
• NOTE: The “dynamic” nature of this code breaks if you change your table or field names within the DB. You can somewhat protect against this by using constants (global or not) with these names. Then again, if you’ve already reached the third normal form (3NF) within your DB, and you begin to make name changes… yeah, no, stop it.
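As a rough sketch of the idea (this is neither my old code nor the code from the blog post credited above), the enum values can be read from the Type column that MySQL's SHOW COLUMNS statement returns and then split into an array. The function names, the use of PDO, and the identifier check are assumptions made for illustration.

```php
<?php
// Sketch only: function and parameter names are hypothetical.

// Turns a column type string such as "enum('a','b')" into ['a', 'b'].
function parseEnumType(string $type): array
{
    if (stripos($type, 'enum(') !== 0) {
        return []; // not an enum column
    }
    // Each value is single-quoted; MySQL doubles any embedded single quote.
    preg_match_all("/'((?:[^']|'')*)'/", $type, $matches);
    return array_map(
        function (string $value): string {
            return str_replace("''", "'", $value);
        },
        $matches[1]
    );
}

// Pulls the current enum values for $table.$field over a MySQL connection.
function getEnumOptions(PDO $pdo, string $table, string $field): array
{
    // Identifiers cannot be bound as parameters, so allow only simple names.
    if (!preg_match('/\A[A-Za-z0-9_]+\z/', $table)) {
        throw new InvalidArgumentException('Unexpected table name');
    }
    $sql = "SHOW COLUMNS FROM `$table` WHERE Field = " . $pdo->quote($field);
    $row = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);
    return $row ? parseEnumType($row['Type']) : [];
}

print_r(parseEnumType("enum('limited','basic','power','admin','root')"));
```

Keeping the string parsing in its own function means it can be checked without a database connection.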
We call this bad boy with something like the following:
This calls for the creation of a function that loops through $enum_options values to see if the user’s input exists within the array, but you get the idea. To throw in a little something extra: This function can be called in your function(s) that create the HTML, such that the enum items are used for items in a drop-down. This way, you can use the same enum list to provide options and to error-check them (win/win).
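A minimal sketch of both uses follows, assuming $enum_options already holds the array pulled from the DB. The helper names isAllowedOption() and renderSelect() are hypothetical, and PHP's built-in in_array() does the looping here instead of a hand-written loop.

```php
<?php
// Sketch only: $enum_options stands in for the array pulled from the DB;
// the two helper names are hypothetical.

// Error-checking: is the user's input one of the allowed enum values?
function isAllowedOption(string $input, array $enum_options): bool
{
    // Strict comparison avoids PHP's loose string/number matching.
    return in_array($input, $enum_options, true);
}

// Display: build a drop-down from the very same list.
function renderSelect(string $name, array $enum_options): string
{
    $html = '<select name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">';
    foreach ($enum_options as $option) {
        $safe  = htmlspecialchars($option, ENT_QUOTES, 'UTF-8');
        $html .= "<option value=\"$safe\">$safe</option>";
    }
    return $html . '</select>';
}

$enum_options = ['limited', 'basic', 'power', 'admin', 'root'];
echo renderSelect('user_group', $enum_options), "\n";
var_dump(isAllowedOption('0x28736', $enum_options)); // bool(false)
```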
Now that we’ve taken a look at user- and server-layer input checking, let’s move to the final frontier: the DB-layer.
The third layer, one that I find to be overlooked the most, involves restricting invalid data at the database (DB) itself. I prefer to use at least the following three techniques to restrict invalid data from DB fields/records:
1) Proper data types
2) Prepared Statements
3) Parameterized Stored Procedures
I will not provide a breakdown of prepared statements or stored procedures, as a simple Web search for either will yield numerous results. Thus, I would like to focus on using proper data types within your DB(s). In our case, we are going to focus on MySQL data types, which are documented in the MySQL Reference Manual.
Given the breadth of data types available, I will not attempt to cover them all, but let’s look at a common data type that relates to our previous User ID example:
|Number Types||Unsigned Values||Signed Values|
|tinyint||0 – 255||-128 – 127|
|smallint||0 – 65535||-32768 – 32767|
|mediumint||0 – 16777215||-8388608 – 8388607|
|int||0 – 4294967295||-2147483648 – 2147483647|
Here, we have the more popular int data types (ignoring bigint, that’s huge!). In my previous examples, I used a User ID field that requests a value between one and six digits (1-6) in length. In this case, the maximum value would be 999999. In this scenario, which number data type best fits the situation? Well, both tinyint and smallint are too small. That leaves mediumint, whose unsigned range (0 – 16777215) comfortably covers 999999. Some application developers would instead choose to use a standard int for this input type. However, this wastes memory and introduces additional address space that can be filled.
• Concerning buffer overflows, the more memory for variables we provide, the easier an attacker can implement a buffer overflow (DEP or not!)
• Why provide additional memory that is not needed?
• Concerning memory management, every bit of memory (ohhhh! see that?!) that can be saved decreases an application’s memory footprint
• Why use more memory than we need to use?
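The selection logic can be written out as a small helper. This is only a sketch: smallestUnsignedIntType() is a hypothetical function, and the limits are the standard unsigned maximums for MySQL's integer types.

```php
<?php
// Sketch only: smallestUnsignedIntType() is a hypothetical helper.
// The limits are the unsigned maximums of MySQL's integer types.
function smallestUnsignedIntType(int $maxValue): string
{
    if ($maxValue < 0) {
        throw new InvalidArgumentException('Unsigned types cannot hold negative values');
    }

    $limits = [
        'TINYINT UNSIGNED'   => 255,
        'SMALLINT UNSIGNED'  => 65535,
        'MEDIUMINT UNSIGNED' => 16777215,
        'INT UNSIGNED'       => 4294967295,
    ];

    // Return the first (smallest) type whose range covers the value.
    foreach ($limits as $type => $limit) {
        if ($maxValue <= $limit) {
            return $type;
        }
    }
    return 'BIGINT UNSIGNED';
}

echo smallestUnsignedIntType(999999), "\n"; // MEDIUMINT UNSIGNED
```

For the six-digit User ID, that means a 3-byte mediumint per row instead of a 4-byte int.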
Taking this concept further, some developers like to use a double number type when it is not needed. Why? I often feel as though the developer wants to have “flexibility” in their code, allowing him or her to sneak a decimal number into a normally non-decimal field. If you’re like me, you cringed when you read that last sentence! ‘Nuff said on that matter!
String Types — a.k.a. Enum to the Rescue!!
Another point to drive home is the difference between restricting allowed strings using an enum data type versus allowing a user to enter any string by relying on something like a varchar data type.
Basically, if we want the user to be able to enter one of five different named user groups, we can rely on the enum data type to restrict content input. Here’s an example (from phpMyAdmin):
Here we have two example methods of implementing a user_group field:
• The user_group_good field uses the enum data type and allows only one of five different strings: “limited,” “basic,” “power,” “admin,” or “root.”
• The user_group_bad field uses the varchar data type with a length of seven (7), since the longest string expected (“limited”) is seven characters
The problem with using a varchar(7) data type for this field is that this technically allows any string that is 7 or fewer characters to be input into this field. By using the enum data type, we can provide a catch for when both the user and server layer sanitization methods are circumvented. As such, if a malicious user attempts to input a memory address in the form of a string, such as 0x28736, into this field, the DB simply will not accept this value. Note that this assumes strict SQL mode is enabled; without it, MySQL stores an empty string in place of the invalid value and raises a warning rather than an error.
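For anyone who prefers DDL to a phpMyAdmin screen, the two fields could be declared roughly as below. This is a sketch: the table name users and the user_id column are assumptions, while the two user_group column names, their types, and the five group values come from the example above. The statement is wrapped in a PHP function only so it can be handed to whichever DB API you use.

```php
<?php
// Sketch only: the table name "users" and the user_id column are
// assumptions; the two user_group columns follow the example above.
function userGroupTableSql(): string
{
    return <<<'SQL'
CREATE TABLE users (
    user_id         MEDIUMINT UNSIGNED NOT NULL PRIMARY KEY,
    user_group_good ENUM('limited','basic','power','admin','root') NOT NULL,
    user_group_bad  VARCHAR(7) NOT NULL
)
SQL;
}

// Usage (assuming an existing PDO connection in $pdo):
//   $pdo->exec(userGroupTableSql());
echo userGroupTableSql(), "\n";
```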
I will digress at this point, but do not forget to think about the lower-layer differences between other string types.
I have attempted to show why input sanitization should occur at all three layers, and I hope that I was able to persuade a few readers. Overall, leveraging the restriction capabilities available within each layer allows you as the developer to implement a system of checks and balances. Such a system prevents malicious data from entering into your Web application, even when one or even two layers are defeated via user error or malicious action. In closing, your Web app should say, “You wanna go?! You have to get by my THREE friends first!”
Thank you for reading this article, and please, leave me a comment with your thoughts. I will make the code snippets that I used available upon request.