Torture-Testing Web Servers

Lincoln D. Stein
Cold Spring Harbor Laboratory
April 19, 1999

A few years ago I wrote a small Perl script called "torture.pl" whose purpose in life is to inflict pain and suffering on hapless Web servers. It sends servers increasing amounts of random data at increasingly shorter intervals until they either crash or slow to the point of unusability. In other words, the script launches a denial-of-service attack on Web servers.

Before you call the cops and have me dragged off to the special prosecutor's office, let me explain. This script has two functions. First, it can be used to test the speed and responsiveness of a Web server. Second, the script can be used to test the stability and reliability of a particular Web server.

In performance-testing mode, torture.pl lets you measure the speed and response time of your Web servers, CGI scripts, and other Web enhancements. Although torture.pl isn't rigorously normalized for cross-server comparisons the way the WebStone metric is, it's good for measuring changes on a single Web server. Worried about the performance impact of a configuration change? Just run the test before and after the change to measure its effects.

When used in torture-testing mode, torture.pl sends large amounts of random data to a server, trying to make it crash. If a server, CGI script, module, or template processor can't handle large amounts of random data, then it's not particularly well written and might even contain security holes.

Using torture.pl for Performance Testing

Torture.pl was written to take advantage of Perl's ability to multitask on Unix (and Linux) platforms. For this reason, it won't run on Windows machines. However, I've got a multithreaded prototype of the code up and running for Unix platforms, and this might be available for use on Win32 machines by the time you read this. Check my Web site (see URL resources).

Torture.pl is command-line oriented. To use it, just give it the URL to fetch. This example shows its simplest form:

% torture.pl http://my.site.com/index.html
** torture.pl version 1.05 starting at Mon Apr 19 08:17:38 1999

 Transactions:           1
 Elapsed time:           0.03 sec
 Bytes Transferred:      897 bytes
 Response Time:          0.03 sec
 Transaction Rate:       43.75 trans/sec
 Throughput:             39244.08 bytes/sec
 Concurrency:            1.0
 Status Code 200:        1

** torture.pl version 1.05 ending at Mon Apr 19 08:17:38 1999

In this example, I asked torture.pl to fetch the URL http://my.site.com/index.html using the default settings. It fetched the requested document and printed summary statistics about the time taken to retrieve it. Before we talk about the significance of the summary statistics, let's look at another example:

% torture.pl -c 5 -t 10 http://my.site.com/index.html
** torture.pl version 1.05 starting at Mon Apr 19 11:29:51 1999

 Transactions:           50
 Elapsed time:           0.723 sec
 Bytes Transferred:      44850 bytes
 Response Time:          0.07 sec
 Transaction Rate:       69.18 trans/sec
 Throughput:             62058.25 bytes/sec
 Concurrency:            4.5
 Status Code 200:        50

** torture.pl version 1.05 ending at Mon Apr 19 11:29:52 1999

This example differs from the previous one by providing two command line switches. The -c switch sets the concurrency level to 5. This means that the script will fork itself five times to create five identical children, each of which will run the retrieval tests concurrently. The -t switch sets the number of times each child will try to retrieve the URL. In this case, I specified 10. Not used in this example is the -d switch, which inserts a random delay between each fetch to make the tests more like the behavior of real users.

Now let's look at the headings in the output:

Transactions:
The number of times the script fetched the requested URL. In this case, the number is 50, equal to 5 children each trying to fetch the URL 10 times.

Elapsed time:
The total wall-clock time, in seconds, that it took to run all the fetches.

Bytes transferred:
The total number of bytes sent or received, less HTTP headers.

Response time:
The average time it took for the server to respond to each child's request, in this case 0.07 seconds.

Transaction rate:
The average number of transactions the server was able to handle per second.

Throughput:
The average number of bytes transferred per second, summed across all children.

Concurrency:
The average number of simultaneous connections the server was able to handle during the test session. Unless the server is overloaded, this number will usually be close to the requested concurrency level.

Status code XXX:
This indicates how many times a particular HTTP status code was seen. In this case we see that all 50 requests resulted in a status code of 200, which in HTTP lingo means "OK".
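To make the relationships among these figures concrete, here is a small Python sketch (torture.pl itself is written in Perl, and this is not its code). It assumes each request yields a (status code, response time, bytes) record and that the elapsed time is the wall-clock length of the whole run. Treating concurrency as the summed response times divided by the elapsed time is also an assumption, but it agrees with the second sample run: 50 requests of roughly 0.065 seconds each over 0.723 seconds gives about 4.5, and 50 transactions in 0.723 seconds is about 69 transactions per second.

```python
from collections import Counter


def compute_stats(records, elapsed):
    """Summarize per-request records over a run lasting `elapsed` seconds.

    Each record is assumed to be (status_code, response_time, nbytes),
    and `elapsed` is assumed to be greater than zero.
    """
    n = len(records)
    total_time = sum(r[1] for r in records)   # all response times added together
    total_bytes = sum(r[2] for r in records)
    return {
        "transactions": n,
        "elapsed": elapsed,
        "bytes": total_bytes,
        "response_time": total_time / n if n else 0.0,  # mean time per request
        "rate": n / elapsed,                            # transactions per second
        "throughput": total_bytes / elapsed,            # bytes per second
        "concurrency": total_time / elapsed,            # assumed definition
        "status_counts": Counter(r[0] for r in records),
    }
```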

The three most important performance measurements are the response time, the transaction rate and the concurrency. The response time is closest to the performance that the remote user sees. It is the total time to send the request to the server and receive the complete response back. Lower response times are better.

The transaction rate is the performance that the webmaster sees. It is the total number of requests that can be processed per second. Because modern Web servers are multitasking and/or multithreading, it is possible for a Web server to support a much higher transaction rate than the response time would at first suggest. The higher the transaction rate is, the better the performance.

The concurrency statistic measures how well the server multitasks among multiple simultaneous requests. The higher the number, the better.

It is important to realize that all these metrics depend on more than just the Web server. They are affected by network bandwidth and latency, by the speed and amount of memory of the Web server host, and, at high loads, by the speed of the machine that the test script is running on. While preparing this article I tried to "saturate" a Web server running on a high-end desktop machine from the torture script running on a low-end laptop. After multiple unsuccessful attempts, I realized that the laptop could never keep up with the Web server, and reversed the test!

To give you an idea of how useful torture.pl can be, look at Figure 1 in which I tested the ability of my Apache-based server (Apache 1.3.4/Linux) to service requests under a variety of conditions:

  1. Apache serving a static HTML page when running in its default preforking mode. This is a mode in which it creates multiple copies of itself at startup time. Each copy handles incoming requests.
  2. Apache serving a static HTML page when running in its non-forking debug mode. In this version, Apache spawns processes only when needed to handle each incoming request.
  3. Apache serving a dynamic CGI page when running in its default mode. The CGI script used to test this condition was written in Perl. It loads the standard CGI.pm module and generates a simple fill-out form.
  4. Apache serving a dynamic CGI page when running the mod_perl embedded Perl interpreter. This was the same CGI script as before, but instead of running an external copy of the Perl interpreter, it uses the version embedded in the server software, cutting load time.

I repeated the test under differing concurrency loads, starting with one client and increasing the load to 75 concurrent clients. The results are rather interesting. First, you don't see differences between the preforking and non-preforking versions of Apache until you reach high levels of load: 75 clients or more. (Note that this was measured on a laptop server; your mileage will vary.)

Second, these differences are dwarfed by the huge difference between serving static pages and running CGI scripts. At maximum load, the response time for the CGI script skyrocketed to more than 60 seconds per page, compared with about a second for a static page. In contrast, the overhead from mod_perl was much smaller. At lower load levels, the difference between static pages and mod_perl-generated pages was negligible, and even at the highest load level, the mod_perl response time degraded to only 2.5 seconds.

Using torture.pl to Test Server Stability

My original motivation for writing torture.pl was to test Web servers for vulnerability to static buffer overflow problems. To take advantage of this feature, add the -l (limit) switch to the command line. This switch specifies the maximum number of bytes that torture.pl will send to the server. To add a bit of realism to the scenario and to detect subtle overflow bugs that only manifest themselves when certain data lengths are encountered, torture.pl creates random data up to the length limit you specify.

Here is the output you might get when you specify a random data limit of 9000 bytes and ask to run the tests 10 times:

% torture.pl -t 10 -l 9000 http://my.site.com/index.html
** torture.pl version 1.05 starting at Mon Apr 19 15:47:52 1999

 Transactions:           10
 Elapsed time:           3.761 sec
 Bytes Transferred:      65764 bytes
 Response Time:          0.38 sec
 Transaction Rate:       2.66 trans/sec
 Throughput:             17485.19 bytes/sec
 Concurrency:            1.0
 Status Code 200:        8
 Status Code 414:        2

** torture.pl version 1.05 ending at Mon Apr 19 15:47:57 1999

After the test script runs, the same statistics are printed out as before. The difference is only evident when you look in the server log. You'll see that torture.pl requested index.html a total of ten times, each time appending a question mark (?) followed by a long string of garbage characters. In effect, torture.pl creates a long random query string.

The interesting part of the results is the status codes. Status code 200 ("OK") was returned 8 times, while status code 414 ("requested URL too long") was returned twice. Apache has a default buffer of 8192 bytes. Twice, torture.pl generated a random query string too long for the buffer, so Apache returned the 414 status code. The rest of the time the random data fit within the buffer, so we got a status code of 200. To really evaluate a server's stability, one would have to run this test many times (-t 1000, for instance), and with several large values for the -l limit. Signs of a server malfunction include status code 500 ("internal server error") or the test script hanging indefinitely. To help you sort things out, torture.pl prints out a short error message every time it thinks it has discovered a real problem.

Torture.pl has a number of other command-line switches to adjust its server stability test. These are listed briefly below:

-p (add to path)
Usually the random data is appended to the query string part of the requested URL. If -p is specified, the data will be appended to the URL itself. When attached to an ordinary URL, this switch should cause intermittent 404 "not found" errors. When attached to a CGI script's URL, this is a good way of testing for static overflow bugs in the script's parsing of additional path information.

-P (post)
Instead of appending the random data to the requested URL, treat the URL as a CGI script and POST the data to it. You can use this to test for overflow bugs in CGI scripts' form-parsing routines.

-r (raw)
If the -r switch is present, the script will send raw binary data to the server. It is intended for use with the -P switch; used without -P, it will cause sporadic 400 "malformed header" errors. If you see a 500 error, something's wrong.

How torture.pl Works

I'll briefly take you on a tour through torture.pl's source code. It's a good example of how to take advantage of Perl's multitasking and TCP/IP communications features.

Listing 1 shows the main body of torture.pl. The script first brings in five Perl library modules. Time::HiRes provides microsecond timing accuracy, and is a must for this type of performance measurement tool. IO::Socket brings in a simple object-oriented interface for TCP/IP networking, and IO::Pipe provides access to Unix data pipes, which are used for interprocess communication between torture.pl and the child processes it launches. The script imports the WNOHANG constant from the POSIX module, and brings in command-line switch processing routines from the Getopt::Std module.

With the exception of Time::HiRes, all these modules are part of the standard Perl distribution. You can obtain Time::HiRes at any CPAN site (see URL resources).

The script now processes any command line switches and stores them in global variables named $TIMES, $COPIES, and so forth. The URL to fetch is retrieved from the command line and stored in the variable $URL.
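A rough Python equivalent of this switch processing, using the standard getopt module in place of Getopt::Std, might look like the sketch below. The settings names (copies, times, and so on) stand in for the script's globals and are hypothetical. Every default except the single copy and single fetch seen in the first example is an assumption, and so is treating the -d value as seconds.

```python
import getopt


def parse_switches(argv):
    """Parse torture.pl-style switches from argv (without the program name).

    Defaults are assumptions: -c and -t default to 1 (the first sample run
    made a single request); -d and -l default to 0, meaning "off".
    """
    opts, args = getopt.getopt(argv, "c:t:d:l:pPr")
    settings = {"copies": 1, "times": 1, "delay": 0.0, "limit": 0,
                "add_to_path": False, "post": False, "raw": False}
    for flag, value in opts:
        if flag == "-c":
            settings["copies"] = int(value)     # number of child processes
        elif flag == "-t":
            settings["times"] = int(value)      # fetches per child
        elif flag == "-d":
            settings["delay"] = float(value)    # maximum random delay (seconds assumed)
        elif flag == "-l":
            settings["limit"] = int(value)      # maximum random data length
        elif flag == "-p":
            settings["add_to_path"] = True
        elif flag == "-P":
            settings["post"] = True
        elif flag == "-r":
            settings["raw"] = True
    if len(args) != 1:
        raise getopt.GetoptError("expected exactly one URL")
    settings["url"] = args[0]
    return settings
```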

After preparing a few additional globals, the script is ready to go to work. It calls the subroutine do_stats() twice, once with a false argument (0) and once with a true argument. The do_stats() subroutine runs the tests and returns a hash reference containing the results. The argument tells the subroutine whether or not to do the URL fetch. When passed a false argument, do_stats() does everything except actually fetch the URL, returning "dummy" timing information that reflects only the script's overhead for generating the URL request, spawning children, and so forth.

Having collected the dummy and real timing information, we subtract the elapsed time in the dummy run from the elapsed time in the real run, and store the result in the hash reference under the key "elapsed". We do the same thing for the response time, which is stored under the key "trans_time" (for "transaction time").
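The subtraction itself is simple. In this Python sketch, the two runs are represented by plain dictionaries carrying the "elapsed" and "trans_time" keys named above. The function name is hypothetical, and the clamp at zero is an assumption added for illustration, not documented behavior of torture.pl.

```python
def adjust_for_overhead(real, dummy):
    """Subtract the dummy run's overhead from the real run's timings.

    Both arguments are dicts with "elapsed" and "trans_time" keys.
    Clamping at zero is an added safeguard, not part of torture.pl.
    """
    adjusted = dict(real)  # copy so the caller's results are left untouched
    for key in ("elapsed", "trans_time"):
        adjusted[key] = max(real[key] - dummy[key], 0.0)
    return adjusted
```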

Finally, we pass the adjusted timings to a function named print_results(), which prints them out; then we print the closing banner and exit.

Listing 2 contains the definition for do_stats(), which runs the timing tests. Its job is to spawn the requested number of children, launch the individual tests, and collate the results. The main challenge for this function is to gather the results from possibly dozens of child processes. It does this by creating an IO::Pipe object, which is a communications channel that can be used by the children to send test results to the parent.

After creating the pipe, the do_stats() routine loops the number of times indicated by the -c switch. Each time through the loop, it spawns a new child process by calling the Unix fork() system call. fork() returns twice, once in the parent and once in the newly created child. The code can tell which process it's in by looking at fork()'s return value. In case of a system error, fork() returns an undefined value and the program aborts. In the parent process, fork() returns the process ID of the new child, and the program simply continues looping. In the child process, however, fork() returns zero. When this happens, the code calls the IO::Pipe object's writer() method to signal that it will be used for output, and selects the pipe as the default destination for output. The child then calls run_test() (which we'll look at later) and exits.

The parent process continues looping until it has created all the children it needs. It then installs a signal handler to intercept and handle the CHLD signal, which Unix sends to parent processes when their children exit. For reasons that are not worth going into here, well-written Unix programs should handle the CHLD signal by calling waitpid() in the manner shown.
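The usual form of such a handler is a loop that calls waitpid() with WNOHANG until no more exited children remain. Looping matters because Unix does not queue signals: several children exiting close together may produce a single CHLD signal. The Python sketch below illustrates the idiom; it is not the Perl handler from Listing 2, and the function names are hypothetical.

```python
import os
import signal


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:   # no children left at all
            break
        if pid == 0:                # children exist, but none has exited yet
            break
        reaped.append(pid)
    return reaped


def install_reaper():
    """Reap children whenever the kernel delivers SIGCHLD (main thread only)."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
```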

Next the parent indicates that it will be reading from the pipe, by calling the IO::Pipe's reader() method. It passes the pipe to tally_results() to collect and tabulate the children's results, and returns the calculated statistics to the main body of the program.
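The fork-and-pipe pattern that do_stats() follows can be sketched in Python with os.fork() and os.pipe(). This is an illustration under stated assumptions, not a translation of Listing 2: the function name and the work callback are hypothetical, and each child writes each record with a single write() call (records shorter than PIPE_BUF bytes are written atomically, so lines from different children don't interleave). For brevity, the parent reaps its children with a blocking waitpid() after reading, rather than with a CHLD handler.

```python
import os


def run_children(copies, work):
    """Fork `copies` children and gather their result lines through one pipe.

    work(i) runs in child i and returns an iterable of short text lines
    (without trailing newlines).
    """
    read_fd, write_fd = os.pipe()
    pids = []
    for i in range(copies):
        pid = os.fork()                 # raises OSError if the fork fails
        if pid == 0:                    # child process
            os.close(read_fd)
            status = 0
            try:
                for line in work(i):
                    # One write() per record keeps each record in one piece.
                    os.write(write_fd, (line + "\n").encode())
            except Exception:
                status = 1
            os._exit(status)            # skip the parent's cleanup handlers
        pids.append(pid)                # parent: remember the child, keep looping
    os.close(write_fd)                  # otherwise the parent would never see EOF
    with os.fdopen(read_fd) as pipe_in:
        lines = [line.rstrip("\n") for line in pipe_in]
    for pid in pids:
        os.waitpid(pid, 0)              # reap each child
    return lines
```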

Also see how do_stats() calls time() twice, once at the beginning of the subroutine and once at the end. With the help of Time::HiRes, time() returns the system time to microsecond precision, allowing the subroutine to determine the total elapsed time for the tests accurately.

Listing 3 is the most complex part of the program. It contains the run_test() routine, which is responsible for repeatedly constructing and fetching URLs from the server. It loops up to the count indicated by the -t command line switch. Each time through the loop, the subroutine sleeps for a random period of time up to the delay specified by the -d switch, and then constructs a URL to fetch using the settings specified by the -l, -p and -P switches. If the $doit flag is true, the program tries to fetch the URL by calling the fetch() subroutine (which we'll look at soon).
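The loop can be sketched in Python as follows, with the fetch routine passed in as a function so the sketch runs without a server. The function names, the use of urllib.parse.quote_from_bytes() in place of the script's own escape(), the uniform random length between 1 and the -l limit, and the "/" inserted before the random data in -p mode are all assumptions.

```python
import random
import time
from urllib.parse import quote_from_bytes


def build_request(url, limit=0, add_to_path=False, post=False, raw=False):
    """Return (url, body) for one request; body is None for a plain GET."""
    if limit <= 0:
        return url, None
    data = bytes(random.randrange(256) for _ in range(random.randint(1, limit)))
    if not raw:
        data = quote_from_bytes(data, safe="").encode("ascii")  # stand-in for escape()
    if post:
        return url, data                   # -P: POST the data to the URL
    sep = "/" if add_to_path else "?"      # -p: extend the path (the "/" is assumed)
    return url + sep + data.decode("latin-1"), None


def run_test(url, times, fetch, delay=0.0, doit=True, **opts):
    """Make `times` requests; return (status, seconds, nbytes, message) records.

    fetch(url, body) must return (status, message, content).
    """
    records = []
    for _ in range(times):
        if delay > 0:
            time.sleep(random.uniform(0, delay))   # mimic a user pausing
        req_url, body = build_request(url, **opts)
        start = time.perf_counter()
        if doit:
            status, message, content = fetch(req_url, body)
        else:
            status, message, content = 0, "", b""  # dummy run: overhead only
        records.append((status, time.perf_counter() - start, len(content), message))
    return records
```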

The fetch() routine returns a three-member list consisting of the server's status code, the server's error message, if any, and the document contents. Note the two calls to time() to determine the response time. This data is now printed as a simple tab-delimited list, preceded by the current process ID retrieved from the magic $$ variable. Because the default output filehandle is the IO::Pipe object, this information is sent to the pipe rather than appearing on standard output.
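A record format of this kind might be produced and parsed as shown below. The article says only that each line is tab-delimited and starts with the process ID, so the remaining fields and their order (status code, response time, byte count, message) and the function names are hypothetical. The message goes last so that split() with a limit keeps any stray tabs in it; it must not contain a newline.

```python
import os


def format_record(status, response_time, nbytes, message=""):
    """Encode one result as a tab-delimited line led by this process's ID."""
    fields = [os.getpid(), status, "%.6f" % response_time, nbytes, message]
    return "\t".join(str(f) for f in fields) + "\n"


def parse_record(line):
    """Decode a line produced by format_record()."""
    pid, status, response_time, nbytes, message = line.rstrip("\n").split("\t", 4)
    return int(pid), int(status), float(response_time), int(nbytes), message
```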

The random_string() and escape() functions are helpers used by run_test(). random_string() creates a string of any desired length containing random binary characters. escape() takes binary data and translates it into a character string suitable for use as a URL. This is a quick-and-dirty substitute for the LWP library's URI::Escape routines, which are more complete than this one, but turn out not to be thread-safe.
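Minimal Python versions of the two helpers might look like this. Leaving only ASCII letters and digits unescaped is a conservative assumption; the real escape() may pass more characters through unchanged.

```python
import random
import string

# Bytes that are left as-is; everything else becomes %XX (an assumption).
_SAFE = frozenset((string.ascii_letters + string.digits).encode("ascii"))


def random_string(length):
    """Return `length` random bytes, each anywhere from 0 to 255."""
    return bytes(random.randrange(256) for _ in range(length))


def escape(data):
    """Percent-encode every byte outside [A-Za-z0-9] so it is safe in a URL."""
    return "".join(chr(b) if b in _SAFE else "%{:02X}".format(b) for b in data)
```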

Listing 4 contains the definition of fetch(), a quick and dirty HTTP client. Given an http: URL, and optionally some content to POST, this routine will contact the remote host and fetch the indicated document. The LWP library offers several more featureful versions of this routine, but they have some drawbacks. First, they don't allow one to place unescaped binary data in URLs, which is a feature I needed to support the -r switch. Second, they don't appear to be thread-safe, which I discovered while working on the multithreaded version of this program. Finally, the full-featured LWP routines are slower, and this program places a premium on performance.

The routine begins by parsing out the name of the host and the requested URL path. Unless a port is given explicitly, the routine appends ":80" to the hostname to tell IO::Socket to open the standard Web port 80 on the host. It then calls IO::Socket's input_record_separator() method to set the input record separator to two carriage-return/linefeed pairs ("\r\n\r\n"), the blank line that ends an HTTP header. This makes it possible to read the entire returned HTTP header in a single operation.
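The host, port, and path parsing can be sketched in Python like this. Returning the port as a separate integer, rather than appending ":80" to a host string as the Perl code does for IO::Socket, is a difference of this sketch, and the function name is hypothetical.

```python
def parse_http_url(url):
    """Split an http: URL into (host, port, path); the port defaults to 80."""
    prefix = "http://"
    if not url.lower().startswith(prefix):
        raise ValueError("not an http: URL: %r" % url)
    rest = url[len(prefix):]
    # The host part ends at the first "/" or "?", whichever comes first.
    cuts = [i for i in (rest.find("/"), rest.find("?")) if i != -1]
    cut = min(cuts) if cuts else len(rest)
    hostport, path = rest[:cut], rest[cut:]
    if not path.startswith("/"):
        path = "/" + path
    host, colon, port = hostport.partition(":")
    return host, (int(port) if colon else 80), path
```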

The subroutine is now ready to open a connection to the remote server, which it does by calling the new() method of the IO::Socket::INET class. This is a subclass of IO::Socket which has specialized methods for dealing with Internet connections. If the call to new() is successful, IO::Socket::INET returns a new filehandle connected to the remote server. Otherwise it returns undef, and we return a 503 "connection refused" error message.
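In Python, the standard socket.create_connection() plays the role of IO::Socket::INET's new() constructor. This sketch reports any connection failure as the same 503 pseudo-status the article describes; the function name and the 30-second default timeout are assumptions.

```python
import socket


def open_connection(host, port, timeout=30.0):
    """Return (sock, None) on success, or (None, (503, reason)) on failure."""
    try:
        return socket.create_connection((host, port), timeout=timeout), None
    except OSError as err:   # refused, unreachable, timed out, unknown host...
        return None, (503, "connection refused: %s" % err)
```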

It's time to send the request using the filehandle's print() method. We send either a POST request or a GET request, depending on whether there's any content to send. We print out the HTTP header lines in exactly the way a real Web client would, except that various optional fields are missing, such as the make and model of the client software.
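The two request forms might be assembled as below. The article doesn't list the header lines torture.pl sends, so the Host and Content-Type fields and the function name are assumptions; a Content-Length header is what lets an HTTP/1.0 server know how much POST data to read.

```python
def build_request_text(host, path, body=None):
    """Return the bytes of an HTTP/1.0 GET request, or a POST if body is given."""
    if body is None:
        return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("latin-1")
    head = ("POST %s HTTP/1.0\r\n"
            "Host: %s\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            "Content-Length: %d\r\n"
            "\r\n") % (path, host, len(body))
    return head.encode("latin-1") + body   # body is raw bytes, sent unmodified
```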

Having sent the request, we can read the response. This is a matter of calling getline() once to retrieve the HTTP header. We parse the header into its components. If something goes wrong at this stage, we return a 400 "malformed header" message. This typically happens when you try to retrieve a URL with unescaped binary characters. The server doesn't recognize the request as being a valid HTTP/1.0 header, so it reverts to the headerless HTTP/0.9 protocol.
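Parsing the status line and header fields might look like this Python sketch, which returns the same 400 "malformed header" pseudo-status when the reply doesn't begin with an HTTP/1.x status line. The regular expression, the lowercase field names, and the function name are choices of this sketch, not details from Listing 4.

```python
import re

_STATUS_LINE = re.compile(r"^HTTP/\d+\.\d+\s+(\d{3})\s*(.*)$")


def parse_header(header):
    """Return (status, message, fields) for a raw HTTP response header.

    A reply without an HTTP/1.x status line (for example, an HTTP/0.9
    response) yields (400, "malformed header", {}).
    """
    lines = header.decode("latin-1").split("\r\n")
    m = _STATUS_LINE.match(lines[0])
    if not m:
        return 400, "malformed header", {}
    fields = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon:                       # skip blank lines and junk
            fields[name.strip().lower()] = value.strip()
    return int(m.group(1)), m.group(2), fields
```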

If there's any document body associated with the response, we attempt to read it by calling the filehandle's read() method until it returns a byte count of zero. The subroutine closes the filehandle, and returns the parsed status code, the status message, and the body of the document.
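The read-until-zero loop can be sketched in Python with a socket's recv() method, which returns an empty bytes object once the server closes the connection. Unlike the routine described above, this version reads the whole reply first and then splits it at the blank line; a reply with no blank line is treated as a headerless HTTP/0.9 body. The function name is hypothetical.

```python
import socket


def read_response(sock):
    """Read until the server closes the connection; return (header, body).

    The header includes its terminating blank line. If no blank line is
    found, the reply is treated as headerless (HTTP/0.9) and header is b"".
    """
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:            # zero bytes read: the server closed the connection
            break
        chunks.append(chunk)
    data = b"".join(chunks)
    header, sep, body = data.partition(b"\r\n\r\n")
    if not sep:
        return b"", data
    return header + sep, body
```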

The last bit of the program, shown in Listing 5, is the print_results() subroutine, which simply creates a pretty-printed listing of the accumulated statistics.
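A Python formatter that approximates the layout of the sample output is shown below. The function name, the dictionary keys, and the fixed number of decimal places are assumptions; the sample output itself varies its precision (0.03 sec in one run, 0.723 sec in another).

```python
def format_results(stats):
    """Render a stats dict as aligned "Label: value" lines."""
    rows = [
        ("Transactions:", "%d" % stats["transactions"]),
        ("Elapsed time:", "%.3f sec" % stats["elapsed"]),
        ("Bytes Transferred:", "%d bytes" % stats["bytes"]),
        ("Response Time:", "%.2f sec" % stats["response_time"]),
        ("Transaction Rate:", "%.2f trans/sec" % stats["rate"]),
        ("Throughput:", "%.2f bytes/sec" % stats["throughput"]),
        ("Concurrency:", "%.1f" % stats["concurrency"]),
    ]
    # One line per HTTP status code seen, in numeric order.
    for code in sorted(stats["status_counts"]):
        rows.append(("Status Code %d:" % code, "%d" % stats["status_counts"][code]))
    # Leading space, label padded to 24 columns, then the value.
    return "\n".join(" %-24s%s" % row for row in rows)
```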

Possible Enhancements

The torture.pl script is a useful start, but there are some things that one might do with it to make it an even better server evaluation tool. In addition to porting the program to Win32 systems, the most useful enhancement I can think of would be to give it the capability of reading a list of URLs from a file or pipe. It would then try to retrieve members of this list at random, rather than fetching the same URL repeatedly. Hook this enhanced version up to a program that recursively lists the links on your Web site in order to simulate a group of users clicking on your site's links.

Feel free to adapt torture.pl for your own purposes. If you think you've made generally useful improvements, send the modifications back to me so that we can share them with the community.

URL Resources

Comprehensive Perl Archive Network (CPAN)
http://www.perl.com/CPAN/

The torture.pl Home Page
http://stein.cshl.org/~lstein/torture


Cold Spring Harbor Laboratory, Stein Lab
Last modified: Tue Jun 1 07:04:06 EDT 1999