2013/10/25

User Input Sanitization – A Triple-Pronged Approach

by 8bits0fbr@in
Categories: Coding, Java, JavaScript, PHP

User Input Sanitization

User input filtering, a.k.a. input sanitization, is one of the most important concepts in security. Improper handling of user input can lead to numerous vulnerabilities, including buffer overflows, SQL injection, command injection, format string attacks, etc. Sadly, developers often overlook the importance of this practice. While this concept applies across most areas of information security, this article focuses on Web-based applications and the dangers posed when user input in Web applications is not filtered properly.

Why?

I did a short, six-month stint as an application developer.  During this time I noticed a glaring security-related knowledge gap within the Web development community.  Since that time, I have wanted to write an article on this topic to raise awareness.  I do NOT purport to be an expert when it comes to programming (far from it!).  Therefore, my intent is to bring attention to the issue, rather than to highlight every feasible way input can be filtered.  In fact, my goal is to spur conversation, and I appreciate any and all feedback on this article.

Input Sanitization in the Web Realm

If you look at the OWASP Top 10 for 2013, you will notice that (at least!) five (5) of the top ten (10) Web-based risks identified revolve around input sanitization in some way, shape, or form.  The fact that half of the top Web-based vulnerabilities deal with user input should be enough to make any Web developer realize that this concept is important.  Check out the OWASP Top 10 for 2013 here:

[https://www.owasp.org/index.php/Top_10_2013-Top_10]

The Problem: Many Web application developers do not realize that input sanitization requires a multi-tiered approach.

Bottom Line: Input sanitization needs to occur at the following layers:

1) User – Make the browser do some of the work; this offloads data processing from the server and catches user errors!

2) Server – What the heck did the user just try to pass into my application?  No, no, no.

3) Database – Use a proper database schema! Selecting proper data types helps keep out bogus data.

Many developers focus on one or maybe two of these layers, but a truly secure environment requires sanitization on all three fronts.  I see various articles related to sanitization that have some great content, but I have yet to come across any “triple-pronged approach” articles, which is why I am writing this article (please leave a comment if you know of one).

I will emphasize checking the 1) length and 2) content of user input, since I view these as two primary focal points.

Again, I am not attempting to provide a comprehensive report on how to secure Web applications from all attacks.  Rather, I am trying to highlight what I see as lacking within the Web development community.  My experience is limited, so I will stick to what I know: JavaScript, PHP, and MySQL for user, server, and database layer input filtering, respectively.


Example Techniques

User layer (JavaScript)
• Bounds checking function
• Regex-based manual functions
• HTML maxlength for inputs

Server layer (PHP)
• Bounds checking function – length
• strlen() – length
• mysqli_real_escape_string() – SQL injection
• strip_tags() – XSS
• escapeshellarg() – command injection
• escapeshellcmd() – command injection
• trim() – general

Database layer (MySQL)
• Proper data types
• Prepared statements
• Parameterized stored procedures

User-layer Sanitization

JavaScript is the most common language that I have seen used for user-layer input sanitization.  Sadly, most implementations that I have come across are lacking in depth.  Of course, user-layer sanitization can be defeated through the use of host-based proxies (Paros, Fiddler, etc.), browsers with built-in dev tools (Chrome DevTools, IE Developer Tools, etc.), browser add-ons (Tamper Data, Add-N-Edit Cookies, etc.), and other techniques.  Even so, sanitizing input at the user layer is important because it helps avoid user error.

Since user-layer sanitization is common, I do not want to spend too much time on this topic. Anyway, here we go.

Input length + Numeric content (HTML + JavaScript)

[Screenshot: Input_sanitization-01, HTML text input for the User ID]

• In this code segment, we have a text input on an HTML page that should only accept a numeric response that is 1-6 digits in length

• To help the user, the title attribute provides a tooltip describing the required input

• The maxlength attribute limits the text input to six characters, but we still need to check that the input is numeric before we send it to the server

[Screenshot: Input_sanitization-02, Submit button that calls isNumeric()]

• The Submit button sends the user data, including the User ID value, to the server

• When the Submit button is clicked, the HTML calls the JavaScript function isNumeric and passes the userid value along with a string “# ONLY,” which is used in an alert if the data is not numeric

Note: For the sake of brevity, I cut out the call to a JS function that also checks the input size. Even though we limit the length with maxlength in HTML, we should still check it in JavaScript if we want to be thorough.
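Because the length check was cut from the example, here is a minimal sketch of what such a function might look like. The name checkLength() and its parameters are hypothetical and do not come from the original code. It repeats the maxlength limit in JavaScript, so an edited or missing attribute is still caught before submission.

```javascript
// Hypothetical helper: returns true when the input is a string whose
// length falls within [min, max] (inclusive), false otherwise.
function checkLength(value, min, max) {
  if (typeof value !== "string") {
    return false; // reject anything that is not a string
  }
  return value.length >= min && value.length <= max;
}

// The User ID field accepts 1-6 characters.
console.log(checkLength("123456", 1, 6));  // true
console.log(checkLength("1234567", 1, 6)); // false
console.log(checkLength("", 1, 6));        // false
```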

[Screenshot: Input_sanitization-03, JavaScript isNumeric() function]

• This code segment ensures that the passed input is numeric; pretty simple

• This can be accomplished in dozens of ways; this is just one that I used
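The original function is not reproduced here, so the following is a sketch of one way to write an isNumeric(value, message) check like the one described. The third notify parameter is an assumption added so that the check can also run outside a browser. By default it calls alert() when alert() exists, and otherwise logs the message.

```javascript
// Hypothetical re-creation of an isNumeric(value, message) check.
// `notify` is an assumed extra parameter; by default the message goes to
// alert() in a browser, or to console.log elsewhere.
function isNumeric(value, message, notify) {
  const report =
    notify ||
    ((msg) => (typeof alert === "function" ? alert(msg) : console.log(msg)));
  // Digits only: rejects signs, decimal points, spaces, and the empty string.
  const ok = /^[0-9]+$/.test(String(value));
  if (!ok) {
    report(message);
  }
  return ok;
}
```

On the page, a submit handler could call something like `return isNumeric(document.getElementById("userid").value, "# ONLY");`, where the "userid" element id is hypothetical. A regular expression is used here instead of Number() or isNaN(). Those two accept strings such as "1e3", "0x1F", or " 12 ", which are not plain digit strings.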

The above code segments are basic and serve as mere examples. The idea is simply to use HTML and JavaScript to attempt filtering at the user layer. Avoiding user errors is important, but the next step, filtering at the server, is critical.


Server-Layer Sanitization

Once the user’s input reaches the server, we need to perform additional, sometimes redundant, input checking.  Again, there are many ways to do this, so I will simply cover some of my preferred methods.

Initial filtering (length + content)

One thing I like to do when filtering input data via PHP is to perform an immediate check of POST data prior to passing anything into other PHP functions.  In other words, once the user submits POST data, I do some basic checks right away.  Here’s an example setup:

1) Within main(), set the variables you intend to use to NULL

2) Draw the user input form to the screen

a. Some devs like to embed raw HTML within the PHP file, but I like to generate the HTML via PHP functions

3) Check isset($_POST["run"]), where "run" is the name attribute of the Submit button on the HTML form

a. If this is set, the user has clicked Submit, so we need to check POST data

b. Otherwise, just draw the user form and exit, awaiting the user’s input

c. For forms with multiple submit buttons, one can simply check for each post value using OR (||)

4) Call a POST data processing function, like processOpts()

[Screenshot: Input_sanitization-04, PHP main() setup]
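The request flow in the steps above can be sketched as follows. It is written in JavaScript to keep these examples in one language, and a plain object stands in for PHP's $_POST. drawForm() and the "userid" field are hypothetical stand-ins. processOpts() is a placeholder, not the author's implementation.

```javascript
// Hypothetical stand-in for the author's form-drawing function.
function drawForm() {
  return '<form method="post"><input type="text" name="userid" maxlength="6">' +
    '<input type="submit" name="run" value="Submit"></form>';
}

// Placeholder: the real bounds/content checks would go here.
function processOpts(post) {
  return { userid: post.userid };
}

// main(post): `post` is a plain object standing in for PHP's $_POST.
function main(post) {
  // 1) Initialize every variable we intend to use to null.
  let html = null;
  let result = null;

  // 2) Draw the user input form.
  html = drawForm();

  // 3) Only touch POST data if the Submit button (name="run") was used.
  //    With several submit buttons, OR together one such test per button.
  if (post && Object.prototype.hasOwnProperty.call(post, "run")) {
    // 4) Hand the POST data to the processing function.
    result = processOpts(post);
  }
  return { html, result };
}
```

Calling `main({})` only draws the form. Calling `main({ run: "Submit", userid: "42" })` also runs processOpts().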

More in-depth input processing occurs in processOpts().  For example:

[Screenshot: Input_sanitization-05.1, processOpts()]

Bounds Checking

As you see above, I like using a checkBounds() function to check the bounds of various inputs, minimizing the code footprint related to checking input size in multiple places.

The idea here is simple: Rather than checking every dang user input in a dedicated code block, we can pass the input, low bound, and high bound to the function to see if the input fits the desired length.

[Screenshot: Input_sanitization-06.1, checkBounds()]

• The function returns “none” if there is no error, which is processed in the calling function

• You will notice that I prefer to use an array for errors. I subscribe to the idea that errors should be caught during execution and displayed at the end of the program run (whether or not the run is cut short by an exception caught with try/catch).
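Here is a JavaScript sketch of the checkBounds() idea described above. It takes an input, a low bound, and a high bound, returns "none" on success, and collects errors in an array that is displayed at the end. The error wording and the field list inside this processOpts() are assumptions, not the original PHP.

```javascript
// Returns "none" when the input length is within [low, high],
// otherwise a short error description.
function checkBounds(input, low, high) {
  const len = String(input).length;
  if (len < low) {
    return `input is shorter than ${low} characters`;
  }
  if (len > high) {
    return `input is longer than ${high} characters`;
  }
  return "none";
}

// Collect errors during processing; the caller displays them at the end.
function processOpts(post) {
  const errors = [];
  // Hypothetical fields with their [low, high] length bounds.
  const checks = [
    ["userid", 1, 6],
    ["user_group", 1, 7],
  ];
  for (const [field, low, high] of checks) {
    const result = checkBounds(post[field] ?? "", low, high);
    if (result !== "none") {
      errors.push(`${field}: ${result}`);
    }
  }
  return errors;
}
```

For example, `processOpts({ userid: "1234567", user_group: "admin" })` yields a single error for userid.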

While I did not include direct examples, a function similar to processOpts() would also pass specific variables to functions that serve to protect against SQL injection, XSS, command injection, etc.
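As an illustration of such helpers, here is a JavaScript sketch of two of them. escapeHtml() is output encoding in the style of PHP's htmlspecialchars() with ENT_QUOTES. That is a different technique from stripping tags with strip_tags(). escapeShellArg() mimics how PHP's escapeshellarg() quotes an argument on Unix-like systems. Both names are hypothetical. SQL injection is better handled with prepared statements than with a hand-rolled escaper, so it is not sketched here.

```javascript
// Encode HTML-significant characters so input is rendered as text.
// "&" must be replaced first, or the entities produced by the later
// replacements would themselves be encoded again.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Wrap an argument in single quotes, closing and reopening the quote
// around any embedded single quote, so a POSIX shell sees one literal word.
function escapeShellArg(input) {
  return "'" + String(input).replace(/'/g, "'\\''") + "'";
}

console.log(escapeHtml('<b onmouseover="x()">'));
console.log(escapeShellArg("; rm -rf /"));
```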

Dynamic Input Checking – a.k.a. The Good Stuff

As we will see in the Database-layer section, the SQL enum data type is invaluable.  Not only can we restrict invalid input at the DB-layer using the enum data type, but the bugger also allows us to implement dynamic input checking at the server layer.  For example, if we have an input field whose values need to match the values available within a defined list, we can pull the list directly from the DB, rather than using a global constant array within the program.

To restrict input to a defined list, developers sometimes include the list of acceptable items within the code itself. This often means scattering arrays throughout classes and/or functions. That makes it hard to keep track of each array and, in turn, hard to keep the arrays in sync with changes in the DB. To sidestep this hassle, do not define such lists in code or in global constants. Rather, rely on a SQL query to fetch the most current list of acceptable input. This way, you can just update the values within the DB and the dynamic code will pull the correct values. Change the enum types in the DB, and both the server and DB layers will use this information. BAM!

I could not find my old code for properly exploding the SQL query result in this scenario, so I used code from Jade Krafsig’s blog, located here: [http://jadendreamer.wordpress.com/2011/03/16/php-tutorial-put-mysql-enum-values-into-drop-down-select-box/]. Respect where due — Props to her!

Let’s take a look at a re-usable function:

[Screenshot: Input_sanitization-08, re-usable PHP function that pulls enum values from the DB]

• I used the old mysql extension (deprecated as of PHP 5.5) in this older code, but you'll want to use mysqli (http://www.php.net/manual/en/book.mysqli.php)

• NOTE: The “dynamic” nature of this code breaks if you change your table or field names within the DB. You can somewhat protect against this by using constants (global or not) with these names. Then again, if you’ve already reached the third normal form (3NF) within your DB, and you begin to make name changes… yeah, no, stop it.
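The borrowed PHP is not reproduced here. Below is a JavaScript sketch of just the parsing step. It assumes the column's type string has already been fetched, for example from the Type column of `SHOW COLUMNS FROM users LIKE 'user_group'`, where the table name is hypothetical. MySQL writes a literal single quote inside an enum value as two single quotes in that string.

```javascript
// Parse a MySQL column type such as
// "enum('limited','basic','power','admin','root')" into its values.
// Returns [] for non-enum types.
function parseEnumValues(columnType) {
  const match = /^enum\((.*)\)$/i.exec(String(columnType).trim());
  if (!match) {
    return []; // not an enum column
  }
  const values = [];
  // Each value is quoted; a doubled '' inside a value is a literal quote.
  const itemPattern = /'((?:[^']|'')*)'/g;
  let item;
  while ((item = itemPattern.exec(match[1])) !== null) {
    values.push(item[1].replace(/''/g, "'"));
  }
  return values;
}

console.log(parseEnumValues("enum('limited','basic','power','admin','root')"));
```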

We call this bad boy with something like the following:

[Screenshot: Input_sanitization-09, calling the enum function]

This calls for a function that loops through the $enum_options values to see whether the user's input exists within the array (PHP's in_array() does exactly this), but you get the idea. To throw in a little something extra: the same enum function can be called from the function(s) that generate the HTML, so that the enum items populate a drop-down. This way, you use the same enum list both to provide the options and to error-check them (win/win).
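Here is a JavaScript sketch of the "one list, two jobs" idea. A single array drives both the membership check and the drop-down markup. The hard-coded array is a hypothetical stand-in for the list fetched from the DB, and isAllowed() and buildSelect() are illustrative names.

```javascript
// Stand-in for the enum list fetched from the DB at runtime.
const enumOptions = ["limited", "basic", "power", "admin", "root"];

// Exact, case-sensitive membership test
// (similar in spirit to PHP's in_array($input, $enum_options, true)).
function isAllowed(input, options) {
  return options.includes(input);
}

// Build the drop-down from the same list, escaping values for HTML output.
function buildSelect(name, options) {
  const esc = (s) =>
    String(s)
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;");
  const items = options.map((o) => `<option value="${esc(o)}">${esc(o)}</option>`);
  return `<select name="${esc(name)}">${items.join("")}</select>`;
}

console.log(isAllowed("admin", enumOptions));   // true
console.log(isAllowed("0x28736", enumOptions)); // false
```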

Now that we’ve taken a look at user- and server-layer input checking, let’s move to the final frontier: the DB-layer.


Database-Layer Restrictions

The third layer, and the one I find most often overlooked, involves restricting invalid data at the database (DB) itself. I prefer to use at least the following three techniques to keep invalid data out of DB fields/records:

1) Proper data types

2) Prepared Statements

3) Parameterized Stored Procedures

I will not provide a breakdown of prepared statements or stored procedures, as a simple Web search for either will yield numerous results. Instead, I will focus on using proper data types within your DB(s). Specifically, we will look at MySQL data types, which are documented at the following locations:

MySQL data type overview, by version:
• 5.7: http://dev.mysql.com/doc/refman/5.7/en/data-type-overview.html
• 5.0: http://dev.mysql.com/doc/refman/5.0/en/data-type-overview.html
• 3.x/4.x: http://dev.mysql.com/doc/refman/4.1/en/data-type-overview.html

Numeric Types

Given the breadth of data types available, I will not attempt to cover them all, but let’s look at a common data type that relates to our previous User ID example:

• tinyint: unsigned 0 to 255; signed -128 to 127
• smallint: unsigned 0 to 65535; signed -32768 to 32767
• mediumint: unsigned 0 to 16777215; signed -8388608 to 8388607
• int: unsigned 0 to 4294967295; signed -2147483648 to 2147483647

Here, we have the more common integer data types (ignoring bigint, that's huge!). In my previous examples, I used a User ID field that requests a value between one and six digits (1-6) in length. In this case, the maximum value would be 999999. In this scenario, which number data type best fits the situation? Well, both tinyint and smallint are too small, so an unsigned mediumint (0 to 16777215) is the smallest type that fits. Some application developers would choose to use a standard int for this input type. However, this wastes memory (4 bytes per value instead of 3) and introduces additional address space that can be filled.

• Concerning buffer overflows, the more memory for variables we provide, the easier an attacker can implement a buffer overflow (DEP or not!)

• Why provide additional memory that is not needed?

• Concerning memory management, every bit of memory (ohhhh! see that?!) that can be saved decreases an application’s memory footprint

• Why use more memory than we need to use?
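The ranges in the integer type list follow directly from storage size. An n-byte type covers 0 to 2^(8n) - 1 unsigned, and -2^(8n-1) to 2^(8n-1) - 1 signed. The sketch below uses that rule to pick the smallest MySQL integer type for a required range. smallestIntType() is a hypothetical helper, and the byte sizes (1, 2, 3, 4, and 8 for bigint) are MySQL's documented storage sizes.

```javascript
// Storage size in bytes of MySQL's integer types.
const INT_TYPES = [
  { name: "tinyint", bytes: 1 },
  { name: "smallint", bytes: 2 },
  { name: "mediumint", bytes: 3 },
  { name: "int", bytes: 4 },
  { name: "bigint", bytes: 8 },
];

// Return the smallest type whose range covers every integer in [min, max],
// or null if none does. min and max must be integers.
function smallestIntType(min, max) {
  const lo = BigInt(min);
  const hi = BigInt(max);
  for (const { name, bytes } of INT_TYPES) {
    const bits = BigInt(bytes * 8);
    const unsignedMax = (1n << bits) - 1n;       // 2^bits - 1
    const signedMin = -(1n << (bits - 1n));      // -2^(bits-1)
    const signedMax = (1n << (bits - 1n)) - 1n;  // 2^(bits-1) - 1
    if (lo >= 0n && hi <= unsignedMax) {
      return `${name} unsigned`;
    }
    if (lo >= signedMin && hi <= signedMax) {
      return name;
    }
  }
  return null;
}

console.log(smallestIntType(1, 999999)); // mediumint unsigned
```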

Taking this concept further, some developers like to use a decimal, float, or double number type when they are not needed.  Why?  I often feel as though the developer wants to have “flexibility” in their code, allowing him or her to sneak a decimal number into a normally non-decimal field.  If you’re like me, you cringed when you read that last sentence!  ‘Nuff said on that matter!

String Types — a.k.a. Enum to the Rescue!!

Another point to drive home is the difference between restricting allowed strings using an enum data type versus allowing a user to enter any string by relying on something like a varchar data type.

[http://dev.mysql.com/doc/refman/5.7/en/enum.html]

Basically, if we want the user to be able to enter one of five different named user groups, we can rely on the enum data type to restrict content input.  Here’s an example (from phpMyAdmin):

[Screenshot: Input_sanitization-07, phpMyAdmin structure view of the user_group_good and user_group_bad fields]

Here we have two example methods of implementing a user_group field:

• The user_group_good field uses the enum data type and allows only one of five different strings: “limited,” “basic,” “power,” “admin,” or “root.”

• The user_group_bad field uses the varchar data type with a length of seven (7), since the longest string expected ("limited") is seven characters

The problem with using a varchar(7) data type for this field is that it technically allows any string of 7 or fewer characters. By using the enum data type, we provide a catch for when both the user- and server-layer sanitization methods are circumvented. For example, if a malicious user attempts to input a memory address in the form of a string, such as 0x28736, into this field, the DB will not store that value. With strict SQL mode enabled, MySQL rejects the insert with an error; without strict mode, it stores the special empty-string error value ('') and issues a warning instead.
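To make the contrast concrete, here is a toy JavaScript model of what each column definition admits. It is not MySQL itself: it ignores collation and case-matching rules, and the function names are hypothetical. The point it shows is that "0x28736" is exactly seven characters, so varchar(7) lets it through while the enum list does not.

```javascript
// Toy model of the two column definitions (ignores collation rules).
const USER_GROUPS = ["limited", "basic", "power", "admin", "root"];

// varchar(7): any string of at most 7 characters is admitted.
function varchar7Accepts(value) {
  return typeof value === "string" && value.length <= 7;
}

// enum(...): only the listed strings are admitted.
function enumAccepts(value) {
  return USER_GROUPS.includes(value);
}

for (const v of ["admin", "0x28736", "hacker"]) {
  console.log(v, varchar7Accepts(v), enumAccepts(v));
}
// admin true true
// 0x28736 true false
// hacker true false
```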

I will not digress further, but do not forget to consider the lower-level differences between other string types, such as varchar, text, and blob.

Wrap-Up

I have attempted to show why input sanitization should occur at all three layers, and I hope that I was able to persuade a few readers. Overall, leveraging the restriction capabilities available within each layer allows you as the developer to implement a system of checks and balances. Such a system prevents malicious data from entering into your Web application, even when one or even two layers are defeated via user error or malicious action. In closing, your Web app should say, "You wanna go?! You have to get by my THREE friends first!"

Thank you for reading this article, and please, leave me a comment with your thoughts. I will make the code snippets that I used available upon request.

- 8bits0fbr@in


