Making Sense of W2K3 Performance Numbers

Vin Diesel made his big breakthrough in the film The Fast and the Furious. Though that movie was about street racing, it might as well have been about server performance. If the servers aren’t fast, the users become furious. That’s why Microsoft has been issuing a steady stream of Windows Server 2003 (W2K3) benchmark results showing it to be much faster than its predecessors and able to play with the big boys — 32-way Unix boxes running things like Oracle, SAP and Siebel.

In addition, Microsoft released a study showing that Windows is more than a high-speed dragster. The latest numbers show that it can hold its own against Linux in an endurance competition. Or can it? While it’s easy to stand at the finish line and see which car crosses first, evaluating computer performance benchmarks gets a bit trickier. So, let’s take a look at a few of these Windows Server 2003 tests and see what they do and do not show.


Microsoft vs. Microsoft

The first race we’ll cover is between W2K3, Windows 2000 Server and NT 4.0. Why did Microsoft compare its latest release with one two generations earlier? Well, although Windows 2000 has been out for four years now, many companies still haven’t migrated to it. If Microsoft doesn’t give them a good reason to switch to W2K3, it may lose some of them to Linux when they eventually upgrade. So Microsoft is aiming more at NT 4.0 users than at organizations already using Windows 2000. For those users, the performance boost from W2K3 is impressive.

“There are still an extraordinary number of NT4.0 Servers out there,” says William P. Hurley, senior analyst for Enterprise Application Group in Portland, Ore. “Users moving to the Windows Server 2003 platform will benefit immediately from new product compatibility and a sea change in product stability and performance.”

Microsoft hired Lionbridge Technologies Inc.’s VeriTest division to conduct tests showing how much faster the new operating system is. VeriTest used its NetBench (file server performance) and WebBench (Web server performance) tests. In addition, it used Mindcraft Inc.’s DirectoryMark software, which benchmarks LDAP (Lightweight Directory Access Protocol) performance on directory servers. DirectoryMark was not run on NT 4.0 since Microsoft did not release Active Directory until Windows 2000.

Hardware for the tests consisted of three different Hewlett-Packard ProLiant servers — DL760s with 4GB RAM and either four or eight 900 MHz Intel Xeon processors and a DL380 with 2GB RAM and dual 1.4 GHz Pentium III processors.

So, how did W2K3 do? It consistently outperformed both of its predecessors, sometimes by quite a bit. For example, when using the eight-way DL760, W2K3 outdid NT 4.0 by 483% on the static Web server test.


Microsoft vs. Unix

Microsoft also has been touting the performance numbers of its 64-bit version of Windows Server 2003 Datacenter Edition when running SQL Server. According to the Transaction Processing Performance Council, the Windows/SQL Server combination is giving the Oracle database running on HP-UX a run for its money.

This is discussed in more detail in an Oct. 8 article. Since then, however, Oracle has boosted its performance figures again on the TPC-C benchmark by another 22%, so it now holds the top two spots on the list, with Windows in third place. But in a contest similar to the MHz race AMD and Intel were having a few years ago, these rankings seem to change every few months, so you should check the latest results before making any decision based on this information.


Microsoft vs. Linux

Peak speed is one criterion for measuring server performance, but reliability is just as important: speed drops to zero when the server crashes. Since Linux has a reputation for being more reliable than Windows, Microsoft commissioned VeriTest to run some tests in an attempt to dispel this perception.

VeriTest set up six Dell PowerEdge 1500 SC servers — three running Windows Server 2003 and three running Red Hat Linux 7.2. Each server was then connected to 20 Dell Optiplex workstations which ran Microsoft Outlook, file sharing and print scripts against the servers.

The tests ran for 61 days and VeriTest compared the uptime for the two operating systems. It concluded that, based on the test results, Windows was just as reliable as Linux.

The numbers that Microsoft is promoting from these tests are all correct, but as with any test, you need to know exactly how they were conducted and what the numbers really show. Before taking them at face value, look at the underlying test methodology and all of the test results, not just the ones given in the headlines.

For example, let’s take a look at the Windows vs. Windows performance tests. Much of the difference in results comes from W2K3 making better use of eight processors than its predecessors do. In one WebBench test, which used a mix of CGI and SSL traffic on a DL760 with only one of its processors in use, W2K3 outperformed NT by 50% and W2K by 28%. With all eight processors brought into play, W2K3 beat NT by 330% and W2K by 587%. W2K3 went from 1,020 requests per second with one processor up to 2,480 with eight processors. NT and Windows 2000, however, performed worse with eight processors (577 requests per second for NT and 361 for W2K) than with one processor (682 for NT and 796 for W2K).
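
To see where those percentages come from, here is a small Python sketch that recomputes them from the requests-per-second figures quoted above. The function names and the layout of the table are ours, chosen only for illustration; the numbers are the ones in the test results.

```python
# Requests per second quoted in the article for the CGI/SSL WebBench mix
# on the DL760, keyed by operating system and number of processors in use.
RESULTS = {
    "NT 4.0": {1: 682, 8: 577},
    "W2K": {1: 796, 8: 361},
    "W2K3": {1: 1020, 8: 2480},
}


def percent_faster(new: float, old: float) -> float:
    """Return how much faster `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0


def scaling_factor(os_name: str) -> float:
    """Throughput with eight processors divided by throughput with one."""
    runs = RESULTS[os_name]
    return runs[8] / runs[1]


if __name__ == "__main__":
    for cpus in (1, 8):
        for rival in ("NT 4.0", "W2K"):
            gain = percent_faster(RESULTS["W2K3"][cpus], RESULTS[rival][cpus])
            print(f"{cpus} CPU(s): W2K3 vs {rival}: {gain:.0f}% faster")
    for os_name in RESULTS:
        print(f"{os_name}: 8-way vs 1-way throughput = {scaling_factor(os_name):.2f}x")
```

The scaling factors tell the real story. W2K3 gets roughly 2.4 times the throughput from eight times the processors, while NT drops to about 0.85 and W2K to about 0.45 of their single-processor results. The big headline percentages owe as much to the older systems going backward as to W2K3 going forward.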

On the Windows vs. Linux test, all the systems had 100% uptime, so the two operating systems performed identically. Given that the two consist of millions of lines of different code, you would expect some difference in how well they do. But the test, as designed and executed, didn’t reflect any variance. Would the results have been different if the tests were run longer? On different machines? With non-Microsoft applications? With heavier traffic load? We just don’t know.
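
To put a rough number on how little a crash-free 61-day run can prove, consider the sketch below. It rests on assumptions that are ours, not VeriTest's: that crashes arrive at a constant random rate (a Poisson process) and that the three servers per operating system fail independently. The function name is made up for illustration.

```python
import math


def mtbf_lower_bound_days(servers: int, days: float, confidence: float = 0.95) -> float:
    """Lower bound on mean time between failures after a run with zero failures.

    Assumption (ours, not from the test report): failures follow a Poisson
    process with a constant rate, and the servers are independent.
    """
    exposure = servers * days  # total server-days observed without a crash
    # Largest failure rate still consistent with seeing no failures at all,
    # at the chosen confidence level: exp(-rate * exposure) = 1 - confidence.
    max_rate = -math.log(1.0 - confidence) / exposure
    return 1.0 / max_rate


if __name__ == "__main__":
    bound = mtbf_lower_bound_days(servers=3, days=61)
    print(f"MTBF is at least about {bound:.0f} days per server (95% confidence)")
```

Under those assumptions, three servers running 61 days without a crash establish only that each server's mean time between failures is at least about 61 days. Any two operating systems sturdier than that would look identical in this test, however different their long-run reliability.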

The bottom line is that Windows Server 2003 is clearly faster and more reliable than earlier versions of Windows. How much so depends on how you use it. If the benchmarks match your own server utilization, then you can rely on them. Otherwise, take any such claims with a grain of salt. The figures are correct, but they may not be meaningful as a sign of how it will perform in your own environment.