Filtering input in PHP
If you're writing web apps in PHP, you need to filter everything your scripts touch. Filtering means only accepting what you expect to find, and it can be as simple as making sure a date is made of numbers, or as complex as allowing your site's users to use some HTML tags and not others when they write comments on your blog.
See also
Filtering input is only one of the things that every PHP coder needs to do all the time. Another vitally important process is escaping your output. You can read more about it here.
Filtering doesn't just make your site safer, it makes your code more understandable. Clearly-marked input filtering clues other people on your project in to what data you're expecting to work with, and cuts down on the kinds of errors you need to check for.
How to tell you're doing it right:
If you filter properly, your code should never, never contain $_GET, $_POST or $_COOKIE except in the lines that do the filtering.
First steps: the <form…> tag
Think about where you're expecting your data to come from too - the method="" part of your <form> tag will tell you that. Don't ever use $_REQUEST, it's a grab-bag of input sources, some of which you might not expect.
About the examples:
If you have PHP 5.2 or newer, you should use the built-in Filter extension for several reasons. Firstly, it's very descriptive - as you'll see below, when you read the code, you can see at a glance what it's intended to do. Secondly, it's maintained by the PHP team, which means that it's one more piece of code you won't have to look after. If the FILTER_SANITIZE_STRING filter is broken, upgrading PHP will plug the leak and you won't have to spend days figuring it out yourself. Thirdly, you can filter a whole bunch of variables at once using filter_input_array(), which keeps everything in one place that's easy to find and easy to update.
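To give a feel for the "everything in one place" idea, here's a minimal sketch. It runs filter_var_array() over a hand-made $input array so that it works from the command line; in a real form handler you would pass the same definition to filter_input_array(INPUT_POST, ...). The field names (age, email, page) are invented for the example.

```php
<?php
// Hypothetical field definitions: every input the script accepts is listed here.
$definition = array(
    'age'   => array(
        'filter'  => FILTER_VALIDATE_INT,
        'options' => array('min_range' => 0, 'max_range' => 150),
    ),
    'email' => FILTER_VALIDATE_EMAIL,
    'page'  => FILTER_SANITIZE_NUMBER_INT,
);

// Stand-in for $_POST so the example runs outside a web request.
$input = array(
    'age'   => '34',
    'email' => 'not an email',
    'page'  => 'p12',
    'extra' => 'ignored', // not in the definition, so it never reaches $clean
);

$clean = filter_var_array($input, $definition);

var_dump($clean['age']);   // int(34)
var_dump($clean['email']); // bool(false)
var_dump($clean['page']);  // string(2) "12"
```

Anything that isn't named in the definition simply doesn't appear in the result, which is a handy side effect of describing your inputs up front.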
These filters are very straightforward and simple examples. There are plenty of cases where you can (and should) do more, and maybe even more simply. I'll try to add to the list over time. The things that are notably missing are:
- Dates and times
- Fixed input from a list (e.g. <select> tags, radio buttons and checkboxes)
- Phone numbers
- URLs
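As a stopgap for two of those gaps, a minimal sketch might look like the following. pick_from_list() and valid_date() are made-up helper names, not part of PHP or the Filter extension, and the YYYY-MM-DD format is an assumption about what your form sends.

```php
<?php
// Hypothetical helper: return $value only if it is one of the allowed choices.
function pick_from_list($value, array $allowed, $default = null)
{
    // The third argument makes in_array() compare strictly, avoiding type juggling.
    return in_array($value, $allowed, true) ? $value : $default;
}

// Hypothetical helper: accept only a real calendar date in YYYY-MM-DD form.
function valid_date($value)
{
    if (!is_string($value) || !preg_match('/^(\d{4})-(\d{2})-(\d{2})$/D', $value, $m)) {
        return false;
    }
    // checkdate() takes month, day, year and rejects dates such as 2023-02-30.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]) ? $value : false;
}

$colours = array('red', 'green', 'blue');

var_dump(pick_from_list('green', $colours));    // string(5) "green"
var_dump(pick_from_list('<script>', $colours)); // NULL
var_dump(valid_date('2024-02-29'));             // string(10) "2024-02-29"
var_dump(valid_date('2023-02-30'));             // bool(false)
```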
Sanitising vs. Validating
Finally, the examples below all use the "sanitize" filters, not the "validate" filters. The difference is that validation filters will either give you your data or return false, whereas the sanitising filters will only remove invalid characters, without guaranteeing that the result is usable. For example, FILTER_SANITIZE_EMAIL will tell you that @@@'%+bob is okay input for an email address. It's clearly not, but it contains nothing that you can't use in an email address.
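The difference is easy to see by running both kinds of filter over the same string. This sketch uses filter_var() rather than filter_input() purely so that it can run without a form submission.

```php
<?php
$input = "@@@'%+bob";

// Sanitising only strips characters that can never appear in an address.
$sanitised = filter_var($input, FILTER_SANITIZE_EMAIL);

// Validating returns the address if it is well-formed, or false if it is not.
$validated = filter_var($input, FILTER_VALIDATE_EMAIL);

var_dump($sanitised); // string(9) "@@@'%+bob"
var_dump($validated); // bool(false)
```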
Alternatively
Co-written by PHP Security guru Chris Shiflett, Inspekt is a powerful and elegant alternative to Filter. When you initialise an Inspekt "cage", it forcibly removes the $_GET, $_POST, $_FILES, etc superglobals and requires you to retrieve user data through one of its filtering methods, forcing you to think about what you're doing. Obviously this approach doesn't play well with other third-party code that expects to be able to access those variables, so I've chosen not to include it in the examples below, but if you're serious about security (and you should be!), it's well worth a look.
Numbers
If you're expecting a number and do number things to a string, PHP will try to turn it into a number. Most likely it'll come out as 0, but that depends on the input. If you assume it's just a number and display it without escaping, you're creating an XSS vulnerability. If you assume it's a number and don't use a prepared statement when you put it into the database, you're creating an SQL injection vulnerability.
If your form asks for an integer number, remove any non-numbers:
$num = filter_input(INPUT_POST, 'myfield', FILTER_SANITIZE_NUMBER_INT);
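There's also a filter for floating point numbers (you know, like -183.2810) called FILTER_SANITIZE_NUMBER_FLOAT. By default it strips the decimal point as well, so pass the FILTER_FLAG_ALLOW_FRACTION flag if you want to keep it.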
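Here's a short sketch of what the number filters actually do to a few sample strings. It uses filter_var() on literal values so it can be run anywhere; the sample inputs and the 1 to 100 range are made up.

```php
<?php
// Sanitising keeps digits, plus and minus signs; everything else is dropped.
$a = filter_var('12 apples', FILTER_SANITIZE_NUMBER_INT); // "12"
$b = filter_var('1-2+3', FILTER_SANITIZE_NUMBER_INT);     // "1-2+3", still not an integer

// Without FILTER_FLAG_ALLOW_FRACTION the decimal point is stripped too.
$c = filter_var('-183.2810', FILTER_SANITIZE_NUMBER_FLOAT); // "-1832810"
$d = filter_var('-183.2810', FILTER_SANITIZE_NUMBER_FLOAT,
                FILTER_FLAG_ALLOW_FRACTION);                // "-183.2810"

// Validating gives you a real integer in the range you asked for, or false.
$range = array('options' => array('min_range' => 1, 'max_range' => 100));
$e = filter_var('42', FILTER_VALIDATE_INT, $range);    // int 42
$f = filter_var('1-2+3', FILTER_VALIDATE_INT, $range); // false
```

The second line is the one to watch: a sanitised value can still be nonsense, so validate it if you need a guarantee.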
Plain text
An attacker can embed HTML (including Javascript) that, if you don't do any escaping when you display it, will allow them to perform an XSS attack on your site's visitors, or deface your site.
If you want plain text, remove any tags (HTML, PHP, etc) from your variable:
$string = filter_input(INPUT_POST, 'myfield', FILTER_SANITIZE_STRING);
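One caveat if you're on a recent PHP: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1. A rough alternative, sketched below, is to strip tags on the way in with strip_tags() and escape on the way out with htmlspecialchars(). The plain_text() and html_escape() names are hypothetical helpers, and the result is not identical to what FILTER_SANITIZE_STRING produced.

```php
<?php
// Hypothetical helper: remove tags from incoming text.
function plain_text($value)
{
    return trim(strip_tags((string) $value));
}

// Hypothetical helper: escape text just before it is printed into HTML.
function html_escape($value)
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$comment = plain_text('<b>Hello</b> <script>alert("x")</script> world');

var_dump($comment);              // string(22) "Hello alert("x") world"
var_dump(html_escape($comment)); // string(32) "Hello alert(&quot;x&quot;) world"
```

Note that the tags are gone but the text between them stays, which is why escaping the output still matters.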
Email addresses
If you use the email address in the headers (like setting the Reply-To, or From headers) when sending an email, an attacker can set additional email headers and use your script to send any email to anyone, in other words phishing attacks and spam. Remember that even if you do filter, you should use confirmation emails both to stop someone using your site to annoy others (like signing someone they hate up to your mailing list), and to verify that the email address is in fact real, not just the right shape.
$email = filter_input(INPUT_POST, 'myfield', FILTER_SANITIZE_EMAIL);
Bear in mind that the email filter will only remove characters that aren't allowed in email addresses. It doesn't guarantee that what comes out is a valid email address, or even looks like one. It'll cheerfully return '!#$%&'*+-/=?^_`{|}~@.[].' If you want to be sure that what comes out looks like an email address, use the FILTER_VALIDATE_EMAIL filter or a regular expression.
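To illustrate the header problem, here's a minimal sketch that validates an address before it goes anywhere near a mail header. clean_email() is a made-up helper name and the addresses are placeholders.

```php
<?php
// Hypothetical helper: return a usable address, or false if it doesn't look like one.
function clean_email($value)
{
    // FILTER_VALIDATE_EMAIL rejects whitespace and line breaks inside the value,
    // so an attempt to smuggle in extra headers fails here.
    return filter_var(trim((string) $value), FILTER_VALIDATE_EMAIL);
}

$good   = clean_email('bob@example.com');
$attack = clean_email("bob@example.com\r\nBcc: victim@example.org");

var_dump($good);   // string(15) "bob@example.com"
var_dump($attack); // bool(false)
```

A false result should stop the script from sending anything. It still doesn't prove the mailbox exists, which is what the confirmation email is for.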
Filtering HTML
Using strip_tags()
PHP's built-in strip_tags() function has an optional argument called 'allowed tags'. Do not use it! Allowed tags aren't filtered at all, and can contain invalid or malicious attributes. If you allow the <a> tag, the href="" attribute could contain a javascript: URL. Worse than that, absolutely any tag could contain JavaScript event handlers like onload="" that will run code. Even the style="" attribute can use IE's behavior rule to run code from another web site.
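A quick demonstration of the problem, using a made-up piece of hostile input:

```php
<?php
$input = '<a href="javascript:alert(1)" onmouseover="alert(2)">click</a>'
       . '<script>alert(3)</script>';

// The <script> tags are removed, but the allowed <a> tag passes through
// untouched, dangerous attributes and all.
$output = strip_tags($input, '<a>');

echo $output, "\n";
// <a href="javascript:alert(1)" onmouseover="alert(2)">click</a>alert(3)
```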
HTML is extremely hard to filter properly, so either remove it entirely with FILTER_SANITIZE_STRING or strip_tags(), or use either htmLawed or HTMLPurifier. HTMLPurifier is the paranoid option, but incredibly flexible, while htmLawed is more lightweight and uses a blacklist. Both are excellent, both are easy to use. Never, never accept HTML input without using one of those tools though; it's far too easy to do it wrong. If you're at all uncertain about accepting HTML from your users, the simple solution is don't.
If you don't filter HTML input properly, then show it on your site, you're handing over control of that part of your site to a potential attacker, and they can use that control to abuse your site's visitors, and to steal information about them that should be private. I'm definitely not saying you shouldn't allow HTML input, just that you should be aware of the risks and spend some time learning about how to deal with them.
You can't do this with the Filter extension, and you should definitely not try to do it with regular expressions. Please don't underestimate the difficulty of this task - the number of potential attacks is dizzying. Even with one of these filters, some attacks can get through. One particularly sneaky attack to deal with is someone creating a link or button that looks like part of your site (rather than looking obviously like it's been typed in by a user) that, when clicked, performs an action as if it was run by the logged-in user who clicks it. You know, like http://www.example.com/myprofile/delete.php?sure=yes.
Don't assume that using TinyMCE or FCKeditor (or anything like it) will protect you. An attacker can very easily bypass your official form (as a trivial example, they could turn JavaScript off in their browser, or save your form to disk and modify it).
Image uploads
This isn't a trivial problem either, unfortunately. You'll need to test that the file was uploaded legitimately, that it's actually an image, and that it's not too big or of the wrong type, especially if you're going to make a thumbnail out of it.
The first minefield is handling the $_FILES array. You need to check that the file was successfully uploaded first, then use is_uploaded_file() to make sure it's legit. Then you need to get the image type (and make sure it's not from some idiot who thinks that renaming a file to .JPG makes it a jpeg), and more importantly the image size. If you're making a thumbnail using PHP's GD functions, you have to load the entire image into memory in its uncompressed bitmap form, which can very, very easily break PHP's memory_limit setting and crash your script. It's shocking how few scripts actually do this last step, especially image gallery software that should really know better. Here's an example that's as simple as I can reasonably make it.
Fill in the blanks
This code won't work if you just cut and paste it because I've put in some placeholder functions that you'll have to replace with your own code. The placeholders are my_upload_error() and do_stuff_here(). The first should show a helpful message, the second should move the file somewhere and do whatever you were planning to do with it.
<?php
if (isset($_FILES['myfile'])) {
    if ($_FILES['myfile']['error'] != UPLOAD_ERR_OK) {
        // Handle your error here, the upload didn't work.
        my_upload_error();
    } elseif (!is_uploaded_file($_FILES['myfile']['tmp_name'])) {
        // File was not uploaded legitimately. Handle as you see fit.
        my_upload_error();
    } else {
        // So far, so good. Now to check the file itself.
        // getimagesize() returns width in [0], height in [1] and an
        // IMAGETYPE_* constant in [2], or false if it can't read the file.
        $image_info = getimagesize($_FILES['myfile']['tmp_name']);
        if (!$image_info) {
            // The file isn't an image
            my_upload_error();
        } elseif (!in_array($image_info[2],
                  array(IMAGETYPE_GIF, IMAGETYPE_JPEG, IMAGETYPE_PNG))) {
            // The file isn't an accepted type.
            my_upload_error();
        } elseif (($image_info[0] *
                   $image_info[1] *
                   // 'channels' can be missing for some image types; assume 4.
                   (isset($image_info['channels']) ? $image_info['channels'] : 4))
                  > 2097152) {
            /* This is a CRUDE calculation of the uncompressed size
             * of the image (width x height x channels). We're rejecting
             * if it's over 2Meg. Of course, that's a hard concept to
             * explain to normal site users, so explain it to them in
             * width and height terms. 2 Meg is roughly 830 pixels square
             * for a three-channel (RGB) image.
             * Anyway, this image failed the size test, handle that here
             */
            my_upload_error();
        } else {
            /* Everything looks good. Now you can use
             * move_uploaded_file() to put the image somewhere, generate
             * a thumbnail from it, or whatever you need to do.
             */
            do_stuff_here();
        }
    }
}
?>
Conclusion:
Filter everything your script touches. Filter like crazy. But sadly filtering isn't always straightforward, and PHP doesn't tell you off if you don't do it or do it wrong. Files and HTML are major minefields. If you're uncertain, then outsource the problem - for example, use Flickr for photo upload and hosting. They've got loads of tools for embedding the results in your site, and it's one less thing to worry about in your code.