* Anatomy Of A Web Site Crash

When our columnist’s Web site experienced a denial-of-service attack, the resulting devastation might be called the ‘loop condition from hell.’

By Erik J. Heels

First published 11/1/1999; Law Practice Management magazine, “nothing.but.net” column; American Bar Association

A Scottish comedian (Billy Connolly, I believe) was complaining that it’s ironic Alanis Morissette’s song “Ironic” isn’t really about irony. He quipped, “A traffic jam when you’re already late? That’s not ironic, it’s annoying. Unless of course you were on your way to give a presentation about how you’d solved the city’s traffic problems.” So maybe it’s not ironic that this month’s column is about how important it is for you to back up your data. And about how I learned the lesson the hard way after suffering a major computer failure.

Denial of Service

It all began innocently enough in May. I got an e-mail from a colleague at Red Street Consulting saying that our Web site (http://www.redstreet.com/) was experiencing a denial-of-service attack. In a denial-of-service attack, a person or program sends repeated requests to your computer in an attempt to bring it to its knees. For example, you could overload a Web server with numerous HTTP requests, each one no more complicated than the request issued when you enter a URL in your Web browser.

Let me give you an example of just what I mean by “numerous.” In April, our Web site transferred 729M of traffic, well within (about one seventh of) the 5G of traffic our Web hosting provider allows under our current plan. Additional traffic is billed at $0.10 per megabyte per month. In May, we transferred 11.1G, more than twice the amount allowed by our provider, including 700,000 hits on one day from one address. Our first overage bill was for $613.80. In the first two days of June, we transferred 12G of data. We ended the month of June having transferred 16G of data, more than three times our limit (and with a $1,100 overage charge).
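The arithmetic behind those bills is straightforward: in May we were about 6.1G (roughly 6,138M) over the 5G allowance, and 6,138M at $0.10 per megabyte is $613.80. By the end of June we were 11G over, hence the $1,100 charge.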

Our first challenge was to figure out where the attacks were coming from. The second was to stop them. The third (and, to date, unsolved) challenge was to fix the problem at its source.

In order to figure out where the attacks were coming from, we had to look at access log files that were about 8M in size. We use WebTrends to analyze our log files, but it could not tell us which file or files were being hit.
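With hindsight, a dozen lines of perl would have answered the question in one pass, with no sorting at all. Here is a minimal sketch (not the tool we actually used; the script name is made up) that tallies requests per file in a common-format access log:

#!/usr/bin/perl -w
# Tally requests per file in a Common Log Format access log.
# Usage: perl tally.pl access_log
use strict;

my %count;
while (<>) {
    # A typical log line: host - - [date] "GET /path/file.html HTTP/1.0" 200 1234
    $count{$1}++ if /"\w+\s+(\S+)/;
}

# Print the ten most-requested files, busiest first.
my @top = sort { $count{$b} <=> $count{$a} } keys %count;
foreach my $file (@top[0 .. 9]) {
    last unless defined $file;
    print "$count{$file}\t$file\n";
}

Counting into a hash like this never writes anything to disk, which, as you are about to see, matters.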

Disk Crash

The first weekend in June, I downloaded the Web access log from our Web server to my Power Macintosh G3. I routinely run insanely complex tasks on this computer without giving it a second thought. So I thought nothing of sorting the 8M Web log file. By sorting the file, I hoped to single out the file(s) that seemed to be the target of the attacks. I started the sort and went to bed, figuring it would take about an hour.

Unfortunately, sorting an 8M log file is beyond the scope of what even this computer is designed to do. Especially when the majority of the very long lines in the file differ by only a few characters (i.e., the date). When I got up in the morning, the sort was still running. I quit the program, but not before more than 1.5 million (yes, that’s right) temporary files had been created on my hard disk. No personal computer file system (Mac or Windows) is designed to handle this many files, so when I tried to delete the folder with the temporary files, the display said “preparing to empty trash” and then started counting off how many files it was getting ready to delete. I stopped counting at 1.5 million. The total number was probably more like 4 million. At any rate, the computer crashed, and its file system was seriously corrupted.

I purchased a Jaz drive to back up the damaged disk, in case I was able to mount the hard disk – but I wasn’t. I was able to look at it with Norton Utilities, which reported that it was unable to fix the “severely damaged” disk. I tried DiskWarrior, and that failed, too. I took it to CompUSA, and they said they couldn’t fix it. They referred me to a Mac repair specialist who said he couldn’t fix it. He referred me to a national data recovery company called OnTrack (http://www.ontrack.com/).

I shipped my computer to OnTrack, and less than two weeks (and 1,300 well-spent dollars) later, I got my computer back. OnTrack was able to recover approximately one third of the data on the hard disk. I had a backup from March, which served as my starting point, but I still ended up losing two thirds of my data from March to June.

Was it worth $1,300 to get one third back? Absolutely. How much are your data worth? And can you say “consequential damages”?

People or Program?

Meanwhile, back at the Web site, the attacks continued – from multiple IP addresses. From our log files and from looking up IP addresses with the whois program of ARIN, the American Registry for Internet Numbers (http://www.arin.net/whois/), we were able to determine that two of the IP addresses attacking our site were owned by NLJ 250 law firms. One was White and Case (http://www.whitecase.com/), the other Haynes and Boone (http://www.hayboo.com/).

Now, we didn’t think much of the attack – at first. In fact, I had been expecting such an attack. The main reason people visit our Web site is that we publish reviews – good and bad – of NLJ 250 law firm Web sites. If a firm received a poor review, it’s not out of the question that our site would be hacked or attacked in retaliation. We’d suspect a disgruntled marketeer, MIS technician or Web site designer first.

Our retaliation theory soon fizzled. We looked at our most recent review of White and Case’s site (http://www.redstreet.com/Reviews/Reviews1998/whitecase/index.shtml) and discovered we’d given it a less-than-stellar score of 15. Top sites score 20 or more out of a possible 30. But Haynes and Boone scored 24/30 in our 1998 reviews – among the best of the firms we reviewed.

We also contacted both law firms to ask if they knew anything about the attacks. Both were as puzzled as we were. The IP addresses we identified could not be tied back to a particular user in the firm. After a bit more digging, asking and analyzing, we discovered both firms used the proxy server caching program Novell BorderManager (http://www.novell.com/BorderManager/). BorderManager is designed to do many things, including caching popular Web sites for an enterprise, just like individual browsers cache sites for particular users. Further research indicated all attacks, not just these two, were generated by Novell BorderManager.

But why? As it turns out, Red Street includes snapshots of what particular law firm Web site home pages looked like when they were reviewed. Some of these pages include broken images, and the broken images appear in our Museum. When BorderManager encounters a broken image, it tries to load it. The broken image file generates a “404 file not found” error. Web sites can and should be customized to display a helpful message in place of the generic “404 file not found” error message. At Red Street, we do this by adding one line (ErrorDocument 404 /Search/search.shtml) to the “.htaccess” file on our Web server. If you access http://www.redstreet.com/bogus-filename.html, the ErrorDocument (http://www.redstreet.com/Search/search.shtml) will load instead of the standard boring “404 file not found” message.
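For readers whose sites run on the Apache Web server (“.htaccess” is an Apache convention), the entire customization is that one line; the comment lines below are just annotation:

# Apache per-directory configuration file (.htaccess).
# Serve our search page in place of the generic 404 message. The
# leading slash makes the path relative to the document root, so the
# directive works no matter how deeply buried the missing file is.
ErrorDocument 404 /Search/search.shtml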

Our site was built with NetObjects Fusion. Fusion uses relative links (such as “../../_images/redst.gif”) instead of absolute links (“http://www.redstreet.com/_images/redst.gif”). This works for all the pages generated with Fusion. However, there are pages buried deep on our site (in our Museum) that have broken images. Since the Museum is a collection of Web sites as they appeared at a particular time, the inclusion of broken images is intentional. When the ErrorDocument is served in response to a request from deep within the Museum, the browser (or the cache) resolves the relative links in the ErrorDocument file (i.e., the search.shtml file) against the deeply nested URL that was requested, so the ErrorDocument itself ends up with broken links.

BorderManager finds the page with the broken image; the broken image causes the ErrorDocument to be loaded; the ErrorDocument has broken images of its own, thanks to those relative links, which causes more ErrorDocuments to be loaded; and a loop condition is created.
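To make the loop concrete, here is the cycle traced request by request (the file names are hypothetical, but the pattern is real):

1. BorderManager requests /Museum/1998/somefirm/missing.gif, which does not exist.
2. Our server answers with the ErrorDocument, search.shtml, served at that deep URL.
3. search.shtml contains the relative link ../../_images/redst.gif, which resolves against the deep URL to /Museum/_images/redst.gif, another file that does not exist.
4. The server answers that miss with the ErrorDocument again, and we are back at step 2.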

Loop conditions have existed as long as software has existed. The solution has been around just as long.

Here is a simple example of a loop: Mary goes on vacation and sets up her e-mail to reply automatically to all messages, explaining she is on vacation. One of the incoming messages is from John, telling Mary he is going on vacation and has set up his e-mail to reply automatically to all messages with a vacation note. John’s note to Mary generates an auto-reply, Mary’s auto-reply generates an auto-reply, etc.

Damage Control

We were in damage control mode at Red Street. We wrote a perl script to download the Web log file every so often, check its size and, if it was too big, find the IP address that hit us the most, modify the “.htaccess” file to block that address out, and send us e-mail describing what it did. This at least stopped the attacks (and the overage charges) temporarily.
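Here is a minimal sketch of such a script (not our production version; the file locations, the size threshold, and the e-mail address below are hypothetical):

#!/usr/bin/perl -w
# Watchdog sketch: if the access log grows too big, block the busiest
# IP address in .htaccess and send ourselves a note.
use strict;

my $log      = '/var/log/httpd/access_log';  # assumed log location
my $htaccess = '/home/www/.htaccess';        # assumed .htaccess location
my $limit    = 8 * 1024 * 1024;              # assumed threshold: 8M

my $size = (-s $log) || 0;
exit 0 if $size < $limit;    # log is still a reasonable size; do nothing

# Count hits per IP address (the first field of each log line).
my %hits;
open(LOG, $log) or die "can't read $log: $!";
while (<LOG>) {
    $hits{$1}++ if /^(\S+)/;
}
close(LOG);

# Find the address with the most hits.
my ($worst) = sort { $hits{$b} <=> $hits{$a} } keys %hits;
exit 0 unless defined $worst;

# Append a deny rule for that address to .htaccess.
open(HT, ">>$htaccess") or die "can't append to $htaccess: $!";
print HT "deny from $worst\n";
close(HT);

# Tell us what happened.
open(MAIL, "|/usr/sbin/sendmail -t") or die "can't run sendmail: $!";
print MAIL "To: webmaster\@example.com\n";
print MAIL "Subject: blocked $worst\n\n";
print MAIL "$worst made $hits{$worst} requests; added 'deny from $worst' to .htaccess.\n";
close(MAIL);

A script like this is crude (the busiest address is not always the attacker), but run every hour or so it is enough to keep the meter from spinning.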

After we realized that BorderManager was the offending program, we disallowed access to our site from BorderManager by adding the following lines to our http://www.redstreet.com/robots.txt file:

User-agent: BorderManager 3.0
Disallow: /

The robots.txt file is designed to tell agents which files they may or may not access. The “robots exclusion standard” is a de facto industry standard (http://info.webcrawler.com/mak/projects/robots/norobots.html) respected by all reputable software developers.

A workaround to the problem is to disable the ErrorDocument and just present the standard 404 error message. We did this, and we also converted the links in our ErrorDocument to absolute URLs. But the real problem is BorderManager. Any site that has the same combination of features as ours will have the same problem. In fact, we heard that the RadioShack and The New York Times sites were similarly affected.
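The absolute-URL fix amounts to one change per link in the ErrorDocument’s HTML. For example, using the image from earlier in this column:

<img src="../../_images/redst.gif">                        (relative: breaks when served at a deep URL)
<img src="http://www.redstreet.com/_images/redst.gif">     (absolute: resolves correctly from anywhere)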

Lessons Learned

Denial-of-service attacks can happen to your own computer, although this is rare. The most common way for your own PC to be attacked is by a virus, a piece of software that does destructive or annoying things. But Java applets can also cause denial-of-service attacks (see http://java.sun.com/sfaq/denialOfService.html). Personally, I find Java (but not JavaScript) annoying, and I browse the Web with Java turned off in my browser. I also use Norton AntiVirus to protect against viruses.

We continue to deal with BorderManager attacks. Perhaps the robots.txt file needs to be updated to block out new versions of BorderManager. Anybody know a good lawyer?

I now back up my hard disk about every two weeks to my Jaz drive. But what if my office (where I store my Jaz backup) gets flooded? My backup solution is incomplete. Red Street also now has online hard disk backup on its site, which is at least a partial solution.

The problem has been frustrating but admittedly fascinating. Kind of like an orthopedic surgeon who breaks his foot. Sure, it’s painful – but what a case study!