Safe CGI Programming
Last updated: 1995-09-03

----------------------------------------------------------------------

Recent exposure of security holes in several widely used CGI packages indicates that the existing documents on CGI security have not taken hold in the public consciousness. These scripts are being redistributed to people who have no programming experience and no way to determine whether they are opening up their servers to attack. This causes considerable frustration for all involved.

This document is intended for the beginning or intermediate CGI programmer. It is by no means a comprehensive analysis of the security risks -- its purpose is to help people avoid the most common errors.

This document and other CGI security resources are available at

Please send comments on this document to Paul Phillips

Q: "Why should I care? The server runs as nobody, right? That means you can't do anything dangerous, even if you break a CGI script."

A: Wrong. Some of the actions that can be taken in various circumstances are:

1) Mailing the password file to the attacker (unless passwords are shadowed)
2) Mailing a map of the filesystem to the attacker
3) Mailing system information from /etc to the attacker
4) Starting a login server on a high port and telnetting in
5) Many denial-of-service attacks: massive filesystem finds, for example, or other resource-consuming commands
6) Erasing and/or altering the server's log files

Another problem is that some sites are running their webservers as root. I CANNOT EMPHASIZE ENOUGH HOW BAD THIS IS. You are shooting yourself in the foot. Whatever problem inspired you to do this, you must solve it in some other manner, or you *will* be compromised in the future.

There has been some confusion as to what it means to "run your webserver as root." It is fine to *start* the webserver as root; this is necessary to bind to port 80 on Unix systems. However, the webserver should then give up its privileges with a call to setuid.
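To make "give up its privileges" concrete, here is a minimal C sketch of the step a server performs after it has bound to port 80 as root. It assumes a POSIX system such as Linux; drop_privileges is a hypothetical helper, not taken from any particular server, and the account name is whatever the server is configured to use.

```c
#define _DEFAULT_SOURCE                 /* expose setgroups() on glibc */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: permanently switch from root to `user`.
   Call it after bind() on port 80 and before handling any request.
   Returns 0 on success, -1 on failure (the caller should then exit). */
static int drop_privileges(const char *user)
{
    struct passwd *pw;

    assert(user != NULL);
    pw = getpwnam(user);
    if (pw == NULL)
        return -1;                      /* no such account */

    /* Order matters: groups first, UID last. Once the UID is no
       longer 0, we are no longer allowed to change the groups. */
    if (setgroups(0, NULL) != 0)        /* drop root's supplementary groups */
        return -1;
    if (setgid(pw->pw_gid) != 0)
        return -1;
    if (setuid(pw->pw_uid) != 0)
        return -1;

    /* Sanity check: regaining root must now be impossible. */
    if (pw->pw_uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

In a server's main() the sequence would be socket(), bind() to port 80, then something like `if (drop_privileges("www") != 0) exit(1);` before accepting any connection. The setuid bit is never involved: root privileges come only from the way the server was started, and are thrown away here.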
The webserver's configuration file should allow you to specify which user it runs as; the default is normally "nobody", a generic unprivileged account. Remember that it is irrelevant which account owns the binary, and the program should not have the setuid bit set.

There is a good argument that servers should not actually run as "nobody", but rather as a specific UID and GID dedicated to the webserver, such as "www". This prevents other programs that run as "nobody" from interfering with server-owned files.

There is a program called "cgiwrap" that runs CGI scripts under the UID of the person who owns them. While cgiwrap successfully overcomes some problems with CGI scripts, it also magnifies the effect of security holes: if an attacker can execute commands under the user's UID, rm -rf ~ is only a few characters long, and the user will lose everything.

Q: "Now I'm scared, maybe my code is buggy. Can you show me some examples of security holes?"

A: Now you're talking. The entire philosophy can be summed up as "Never trust input data." Most security holes are exploited by sending data to the script that the author of the script did not anticipate. Let's look at some examples.

Foo wants people to be able to send her email via the web. She has several different email addresses, so she encodes a form element, FooAddress, specifying which one, so she can easily change it later without having to change the script. (She needs her sysadmin's permission to install or change CGI scripts -- what a hassle!) She writes a script called "email-foo" and cajoles the sysadmin into installing it. A few weeks later, Foo's sysadmin calls her back: crackers have broken into the machine via Foo's script! Where did Foo go wrong?

Let's see Foo's mistake in three different languages. Foo has placed the data to be emailed in a temporary file and the FooAddress passed by the form into a variable.
Perl:

    system("/usr/lib/sendmail -t $foo_address < $input_file");

C:

    sprintf(buffer, "/usr/lib/sendmail -t %s < %s", foo_address, input_file);
    system(buffer);

C++ (assuming FooAddress and InputFile are std::string; system() needs a C string):

    system(("/usr/lib/sendmail -t " + FooAddress + " < " + InputFile).c_str());

In all three cases, system is forking a shell. Foo is unwisely assuming that people will only call this script from *her* form, so the email address will always be one of hers. But the cracker copied the form to his own machine and edited it so it looked like this:

Then he submitted it to Foo's machine, and the rest is history, along with the machine.

Q: "I never use system. I guess my scripts are all safe then!"

A: System is not the only command that forks a shell. In Perl, you can invoke a shell by opening a pipe, using backticks, or calling exec (in some cases).

* Opening a pipe: open(OUT, "|program $args");
* Backticks: `program $args`;
* Exec: exec("program $args");

You can also get into trouble in Perl with the eval statement or the regular expression modifier /e (which calls eval). That's beyond the scope of this document, but be careful.

In C/C++, the popen(3) call also starts a shell:

* popen("program", "w");

Q: "What's the right way to do it?"

A: Generally there are two answers: use the data only where it can't hurt you, or check it to make sure it is safe.

*1* Avoid the shell.

    open(MAIL, "|/usr/lib/sendmail -t");
    print MAIL "To: $recipient\n";

Now the untrusted data is no longer being passed to the shell. However, it is being passed unchecked to sendmail. In some sense you are trading the shell problems for those of the program you are running externally, so be sure that it cannot be tricked with the untrusted data! For example, if you use /usr/ucb/mail rather than /usr/lib/sendmail, ~-escapes can be used (on some versions) to execute commands. Be wary.
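In C, the shell-free counterpart of Perl's open(MAIL, "|/usr/lib/sendmail -t") is to build the pipe yourself with pipe(), fork() and execv() instead of calling popen(). The sketch below assumes a POSIX system, and pipe_to_program is a hypothetical helper name. Because execv() receives each argument as a separate string, characters such as ';', '|' and '<' inside an argument are passed through literally rather than interpreted by a shell.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at `path` with the NULL-terminated
   argument vector `argv`, write `input` to its standard input, and wait
   for it. No shell is involved anywhere.
   Returns the program's exit status, or -1 on error.
   A real CGI program should also handle SIGPIPE, which is raised if the
   child exits without reading all of its input. */
static int pipe_to_program(const char *path, char *const argv[],
                           const char *input)
{
    int fds[2];
    pid_t pid;
    int status;
    const char *p = input;
    size_t left;

    assert(path != NULL && argv != NULL && input != NULL);
    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the pipe's read end becomes stdin, then exec the program. */
        if (fds[0] != STDIN_FILENO) {
            dup2(fds[0], STDIN_FILENO);
            close(fds[0]);
        }
        close(fds[1]);
        execv(path, argv);
        _exit(127);                     /* reached only if execv() failed */
    }

    /* Parent: write the input, then close so the child sees end-of-file. */
    close(fds[0]);
    left = strlen(input);
    while (left > 0) {
        ssize_t n = write(fds[1], p, left);
        if (n < 0)
            break;
        p += n;
        left -= (size_t)n;
    }
    close(fds[1]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For Foo's script the call would be along the lines of pipe_to_program("/usr/lib/sendmail", args, message), with args = { "sendmail", "-t", NULL } and a message that starts with a "To:" header. As warned above, this removes only the shell: the address going into that header still needs checking, since a newline in it would let an attacker add headers of his own.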
You can use the Perl system() and exec() calls without invoking a shell by supplying more than one argument:

    system('/usr/games/fortune', '-o');

You can also use open() to achieve an effect similar to popen, but without invoking the shell, by performing

    open(FH, '|-') || exec("program", $arg1, $arg2);

*2* Avoid insecure data.

    unless ($recipient =~ /^[\w@\.\-]+$/) {
        # Print out some HTML here indicating failure
        exit(1);
    }

This time we're making sure the data is safe for passing to the shell. The example regexp above specifies what is safe rather than what is unsafe. By contrast, the following check looks for known-dangerous characters:

    if ($recipient =~ tr/;<>*|`&$!#()[]{}:'"//) {
        # Print out some HTML here indicating failure
        exit(1);
    }

Or, to escape metacharacters rather than just detecting them, a subroutine like this could be used:

    sub esc_chars {
        # will change, for example, a!!a to a\!\!a
        my $str = shift;
        $str =~ s/([;<>\*\|`&\$!#\(\)\[\]\{\}:'"])/\\$1/g;
        return $str;
    }

These regexps specify what is unsafe. They are an attempt at a complete list of potentially dangerous metacharacters, but I have no authoritative source to check, and the list is in fact incomplete: it omits the newline character, which the shell treats as a command separator, and the backslash itself.

The difference between the latter two regexps and the first is the difference between the two security policies "that which is not expressly permitted is forbidden" and "that which is not expressly forbidden is permitted." Security professionals will generally tell you that the former policy is safer. For maximum security, use both *1* and *2* where possible.

USE PERL TAINT CHECKS: Perl can be very helpful with these problems. Invoke it with perl -T to force taint checks; to learn about taint checks, see the perl man page. (The -T option exists only under Perl 5.)

Q: "Can I trust user-supplied data if there is no shell involved?"

A: No. There are other issues as well. Consider this Perl code fragment:

    open(MANPAGE, "/usr/man/man1/$filename.1");

This is intended to allow HTML access to man pages. However, what if the user-supplied filename is

    ../../../etc/passwd

Any time you are dealing with pathnames, be sure to check for the .. component.
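For C programmers, both kinds of check from this section can be written without regular expressions. The sketch below is a translation of the ideas, not code from the original: is_safe_address mirrors the whitelist /^[\w@\.\-]+$/ for ASCII input, and has_dotdot_component rejects any path containing a .. component. Both function names are hypothetical. Unlike Perl's $ anchor, which also matches just before a trailing newline, the loop here rejects a trailing newline.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Whitelist check, a C rendering of /^[\w@\.\-]+$/: accept only
   letters, digits, '_', '@', '.' and '-', and reject the empty string.
   isalnum() is locale-dependent; in the default "C" locale it matches
   exactly [A-Za-z0-9]. */
static bool is_safe_address(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (!(isalnum(c) || c == '_' || c == '@' || c == '.' || c == '-'))
            return false;
    }
    return true;
}

/* Returns true if `path` contains a ".." component, as in
   "../../../etc/passwd" or "a/../b". A name such as "a..b" is not
   a ".." component and is not flagged. */
static bool has_dotdot_component(const char *path)
{
    const char *p;

    assert(path != NULL);
    p = path;
    while (*p != '\0') {
        size_t len = strcspn(p, "/");   /* length of the next component */
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return true;
        p += len;
        if (*p == '/')
            p++;                        /* skip the separator */
    }
    return false;
}
```

In the man page example, a name that passes is_safe_address contains no '/' at all, so it cannot climb out of /usr/man/man1; that is the "expressly permitted" policy again, and it makes the .. check a second line of defense rather than the only one.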
Q: "What else?"

A: In C and C++, improperly allocated memory is vulnerable to buffer overruns. Perl dynamically extends its data structures to prevent this. Imagine code like this:

    int foo()
    {
        char buffer[10];
        strcpy(buffer, get_form_var("feh"));
        /* etc */
    }

When writing this code, the author certainly expected the value of the feh variable to fit in the buffer, that is, to be at most 9 characters long plus the terminating NUL. Unfortunately for him, he didn't make sure, and it turned out to be much longer. This means that user data is overwriting the program stack, which in some circumstances can be used to invoke commands. This is very difficult to exploit and you probably will not encounter it. Still, it's worth mentioning; a very similar hole was found in NCSA httpd 1.3 earlier in 1995. It is poor programming practice not to check such things anyway.

Along the same lines, under no circumstances should the C gets() function be used. It's inherently insecure, as there is no way to specify how large the input buffer is. Use fgets() on the stdin stream instead.

Q: "My WWW server doesn't run on a Unix platform. Only Unix has all these nasty security holes."

A: This may or may not be true. The author of this document has limited experience with servers on other platforms, but he is more than a little skeptical that security concerns do not exist there. At the very least, the gets() and stack buffer overflow issues are present on Windows and MacOS as well. Specific examples of CGI dangers on other platforms are welcomed.

*Appendix*

Contributions to this document are welcomed at .

Thanks to those who have contributed to this document:

John Halperin
Maurice L. Marvin
Dave Andersen
Zygo Blaxell
Joe Sparrow