Published on O'Reilly Network (http://www.oreillynet.com/)

Top Six FAQs on Windows 2000 Disk Performance

by Mark Friedman, author of Windows 2000 Performance Guide

I respond to a lot of questions about Windows 2000 disk performance. In this article I've provided answers to the questions I hear most frequently from experienced computer performance professionals. When these professionals first look seriously at the disk performance data available on a Windows NT/2000/XP machine, they usually ask one or more of the six questions raised here, because something doesn't seem right when they look closely at the data. I have tried to answer each question succinctly, in a way that lets someone who already knows their way around disk performance issues make immediate sense of the Windows 2000 environment.

1. The Physical Disk % Disk Time counters look wrong. What gives?

Often the % Disk Read Time and % Disk Write Time counters do not add up to % Disk Time. That is because the % Disk Time counters are capped at 100 percent in the System Monitor, on the theory that reporting disk utilization greater than 100 percent would be confusing. In fact, the % Disk Time counters do not measure disk utilization at all, and the Explain text that implies they do is very misleading.

What the % Disk Time counters actually do measure is a little complicated to explain.

Related Reading

Windows 2000 Performance Guide
By Mark Friedman, Odysseas Pentakalos

The % Disk Time counter is not measured directly. It is a value derived by the diskperf filter driver, which provides the disk performance statistics. diskperf is a layer of software sitting in the disk driver stack. As I/O Request Packets (IRPs) pass through this layer, diskperf keeps track of the times I/Os start and the times they finish. On the way to the device, diskperf records a timestamp for the IRP. On the way back from the device, the completion time is recorded. The difference is the duration of the I/O request. Averaged over the collection interval, this becomes Avg. Disk sec/Transfer, a direct measure of disk response time from the point of view of the device driver. diskperf also maintains byte counts and separate counters for reads and writes, at both the Logical and Physical Disk level. (This allows Avg. Disk sec/Transfer to be broken out into reads and writes.)

The Avg. Disk sec/Transfer measurement is based on the complete roundtrip time of a request. Strictly speaking, it is a direct measure of disk response time, which means it includes queue time. Queue time is time spent waiting either for the device itself, because it is busy with another request, or for the SCSI bus to the device, because the bus is busy.

% Disk Time is derived by diskperf from the sum of all the IRP roundtrip times divided by the interval duration, which is equivalent to Avg. Disk sec/Transfer multiplied by Disk Transfers/sec, or essentially:

% Disk Time = Avg Disk sec/Transfer * Disk Transfers/sec

which is a calculation (subject to capping when it exceeds 100 percent) that you can verify easily enough for yourself.
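You can verify the derivation with a few lines of arithmetic. The counter values below are hypothetical; substitute the numbers from your own System Monitor log.

```python
# Hypothetical counter values for one collection interval (not from a real log)
avg_disk_sec_per_transfer = 0.012  # Avg. Disk sec/Transfer, in seconds
disk_transfers_per_sec = 60.0      # Disk Transfers/sec

# diskperf's derivation, with the System Monitor's cap at 100 percent
pct_disk_time = min(avg_disk_sec_per_transfer * disk_transfers_per_sec, 1.0) * 100.0
print(f"% Disk Time = {pct_disk_time:.1f}%")  # 72.0% here
```

If the product exceeds 1.0, the cap engages and % Disk Time pegs at 100 percent, hiding the true value.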

Because the Avg. Disk sec/Transfer that diskperf measures includes disk queuing, % Disk Time can grow greater than 100 percent when there is significant disk queuing (at either the Physical or Logical Disk level). The Explain text in the official documentation suggests that this product of Avg. Disk sec/Transfer and Disk Transfers/sec measures how busy the disk is. If (and this is a big "if") the IRP roundtrip time represented only service time, then the % Disk Time calculation would correspond to disk utilization. But Avg. Disk sec/Transfer includes queue time, so the formula actually calculates something entirely different.

The formula used to derive % Disk Time corresponds to Little's Law, a well-known result that relates the number of requests in the system to the arrival rate and the response time. According to Little's Law, Avg. Disk sec/Transfer times Disk Transfers/sec properly yields the average number of requests in the system, more formally known as the average queue length. The average queue length calculated in this fashion includes both IRPs queued for service and those actually in service.

A direct measure of disk response time such as Avg. Disk sec/Transfer is a useful metric. But since people tend to buy disk hardware based on a service-time expectation, it is unfortunate that there is no way in NT 4.0 to break out disk service time and queue time separately. (The situation is greatly improved in Windows 2000, however.) Given the way diskperf hooks into the I/O driver stack, the software RAID functions associated with ftdisk, and SCSI disks that support command tag queuing, one could argue that this is the only feasible way to do things in the Windows 2000 architecture. The problem of interpretation arises from the misleading Explain text and the arbitrary and surprising use of capping.

Microsoft's fix to the problem beginning in NT 4.0 is a different version of the counter that is not capped. This is Avg. Disk Queue Length. Basically, this is the same field as % Disk Time without capping and without being printed as a percentage.

For example, if % Disk Time is 78.3 percent, Avg. Disk Queue Length is 0.783. When % Disk Time is equal to 100 percent, Avg. Disk Queue Length shows the actual value before capping. We recently had a customer reporting values like 2.63 in this field. That's a busy disk! The interpretation of this counter is the average number of disk requests that are active and queued: the average queue length.

2. I see a value of 2.63 in the Avg. Disk Queue Length counter field. How should I interpret this value?

The Avg. Disk Queue Length counter is derived from the product of Avg. Disk sec/Transfer and Disk Transfers/sec, that is, the average response time of the device times the I/O rate. Again, this corresponds to a well-known theorem of Queuing Theory called Little's Law, which states:

N = A * Sr

where N is the number of outstanding requests in the system, A is the arrival rate of requests, and Sr is the response time. So the Avg. Disk Queue Length counter is an estimate of the number of outstanding requests to the (Logical or Physical) disk. This includes any requests currently in service at the device, plus any requests waiting for service. If requests are waiting for the device inside the SCSI device driver software layer below the diskperf filter driver, the Current Disk Queue Length counter will have a value greater than 0. If requests are queued in the hardware instead, which is usual for SCSI disks and RAID controllers, the Current Disk Queue Length counter will show a value of 0, even though requests are queued.
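Plugging hypothetical counter values into Little's Law shows how a value like the 2.63 above arises; the numbers here are illustrative, not from a real log.

```python
# Little's Law: N = A * Sr, using made-up counter values
arrival_rate = 85.0    # A: Disk Transfers/sec
response_time = 0.031  # Sr: Avg. Disk sec/Transfer, in seconds

n_outstanding = arrival_rate * response_time  # average requests in the system
print(f"Avg. Disk Queue Length = {n_outstanding:.2f}")
```

A disk averaging 85 transfers per second with a 31-millisecond roundtrip time is carrying more than two and a half requests at any instant, in service plus waiting.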

Since the Avg. Disk Queue Length counter is a derived value and not a direct measurement, you need to be careful about how you interpret it. Little's Law is a very general result that is often used in the field of computer measurement to derive a third value when the other two are measured directly. However, Little's Law does require an equilibrium assumption in order for it to be valid: the arrival rate must equal the completion rate over the measurement interval. Otherwise, the calculation is meaningless. In practice, this means you should ignore the Avg. Disk Queue Length counter value for any interval in which the Current Disk Queue Length counter does not equal its value from the previous measurement interval.


Suppose, for example, the Avg. Disk Queue Length counter reads 10.3, and the Current Disk Queue Length counter shows four requests in the disk queue at the end of the measurement interval. If the previous value of Current Disk Queue Length was 0, the equilibrium assumption necessary for Little's Law does not hold. Since the number of arrivals is evidently greater than the number of completions during the interval, there is no valid interpretation for the value in the Avg. Disk Queue Length counter, and you should ignore the counter value. However, if both the present measurement of the Current Disk Queue Length counter and the previous value are equal, then it is safe to interpret the Avg. Disk Queue Length counter as the average number of outstanding I/O requests to the disk over the interval, including both requests currently in service and requests queued for service.
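The equilibrium check just described can be sketched as a simple guard. This is a hypothetical helper for illustration, not a Windows API.

```python
def littles_law_is_valid(current_qlen: int, previous_qlen: int) -> bool:
    """Equilibrium check for Little's Law: arrivals must equal completions
    over the interval, which in practice means the instantaneous queue
    length did not change between measurement intervals."""
    return current_qlen == previous_qlen

# The scenario above: previous interval's Current Disk Queue Length was 0,
# this interval's is 4, so Avg. Disk Queue Length should be ignored.
print(littles_law_is_valid(4, 0))  # False: ignore Avg. Disk Queue Length
print(littles_law_is_valid(2, 2))  # True: safe to interpret
```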

You also need to understand the ramifications of having a total roundtrip time measurement instead of a simple disk service time measure. Assuming an M/M/1 queue, a disk at 50 percent busy has one request in the system on average (in service or waiting), and disk response time is twice the service time. This means that at 50 percent busy, assuming M/M/1 holds, an Avg. Disk Queue Length value of 1.00 is expected. It follows that any disk with an Avg. Disk Queue Length value greater than 0.70 probably has a substantial amount of queue time associated with it. The exception, of course, is when M/M/1 does not hold, such as during a backup operation when there is only a single user of the disk. A single user can drive a disk to nearly 100 percent utilization without a queue!
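The standard M/M/1 formulas behind those numbers are easy to evaluate; the 10-millisecond service time below is an arbitrary example value.

```python
def mm1(service_time: float, utilization: float):
    """M/M/1 queue: response time R = S / (1 - u);
    mean number in the system N = u / (1 - u)."""
    response = service_time / (1.0 - utilization)
    n_in_system = utilization / (1.0 - utilization)
    return response, n_in_system

# A disk with a 10 ms service time running at 50 percent busy
response, n_in_system = mm1(service_time=0.010, utilization=0.50)
print(response, n_in_system)  # 0.02 seconds (twice the service time), 1.0 request
```

At 70 percent busy the same formulas give more than two requests in the system, which is why a sustained Avg. Disk Queue Length above 0.70 usually signals queuing.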

3. How was the problem with the % Disk Time counter fixed in Windows 2000?

It may not be a fix exactly, but this problem is addressed quite nicely in Windows 2000 (although it would arguably have been better had the older, now obsolete % Disk Time counters not been retained).

Windows 2000 adds a new counter to the Logical and Physical Disk objects called % Idle Time. Disk idle time accumulates in diskperf when there are no outstanding requests for a volume.

Having a measure of disk idle time permits you to calculate % Disk Busy, which is a valid measure of disk utilization:

% Disk Busy = 100 - % Idle Time

Then you can calculate the disk service time. This is an application of the Utilization Law, namely utilization = service time * arrival rate, rearranged as:

Disk Service Time = (% Disk Busy / 100) / Disk Transfers/sec

Finally, you can calculate the queue time, which follows from the definition of response time as service time plus queue time:

Disk Queue Time = Avg. Disk sec/Transfer - Disk Service Time

So, measuring Logical and Physical Disk % Idle Time solves a lot of problems. It allows us to calculate disk utilization and derive both disk-service time and queue-time measurements for disks in Windows 2000.
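Putting the three steps together for one measurement interval looks like this; the counter values are hypothetical, stand-ins for what you would read from a Windows 2000 log.

```python
# Hypothetical Windows 2000 counter values for one interval
pct_idle_time = 35.0               # % Idle Time
disk_transfers_per_sec = 120.0     # Disk Transfers/sec
avg_disk_sec_per_transfer = 0.009  # Avg. Disk sec/Transfer (response time)

pct_disk_busy = 100.0 - pct_idle_time                            # utilization, in %
service_time = (pct_disk_busy / 100.0) / disk_transfers_per_sec  # Utilization Law
queue_time = avg_disk_sec_per_transfer - service_time            # R = S + Q
print(f"busy={pct_disk_busy:.0f}%  service={service_time * 1000:.2f} ms  "
      f"queue={queue_time * 1000:.2f} ms")
```

A disk that is 65 percent busy while delivering a 9-millisecond response time is spending roughly 40 percent of that response time in the queue, a detail the NT 4.0 counters could never reveal.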

4. Why are the Logical Disk counters zero?

That is because you never issued the diskperf -yv command to enable the Logical Disk measurements. When diskperf is not active, the corresponding counters in the System Monitor are zero. In Windows 2000, only the Physical Disk counters are enabled by default (this is equivalent to issuing the diskperf -yd command).

In Windows NT 4.0, neither the Logical nor the Physical Disk counters are enabled by default. To enable both sets of Disk counters, issue the diskperf -y command. You must reboot the system, in both Windows 2000 and NT 4.0, in order to activate new diskperf settings.

5. In Windows NT 4.0, when is it appropriate to issue the diskperf -ye command?

Almost never. I recommend that you use the diskperf -ye option only if you are using the software RAID functions (these include creating extendable volume sets and establishing disk striping, disk mirroring, and RAID 5 Logical volumes) in the Disk Administrator. Setting diskperf -ye allows you to collect accurate Physical Disk statistics when you are using software RAID functions in NT 4.0.

The diskperf -ye command loads the diskperf.sys filter driver beneath the optional fault tolerant ftdisk.sys disk driver that provides software RAID functions in Windows NT 4.0. When striped, mirrored, or RAID 5 Logical Disks are defined using Disk Administrator functions, the ftdisk.sys module that is responsible for remapping Logical Disk I/O requests to the appropriate Physical Disk is loaded in the I/O driver stack below the NTFS file system driver and before the SCSI Physical Disk driver. When the normal diskperf -y command is issued, diskperf.sys is loaded in front of ftdisk.sys. This allows diskperf to capture information about Logical Disk requests accurately. But because Logical Disk requests are transformed by the ftdisk.sys layer immediately below it, the Physical Disk statistics reported are inaccurate. To see accurate Physical Disk statistics, issue the diskperf -ye command to load diskperf.sys below ftdisk.sys.

Creating extendable volume sets is by far the most common use of the software RAID functions in the NT 4.0 Disk Administrator. You may prefer loading diskperf above ftdisk.sys (using the normal diskperf -y command) to obtain accurate Logical Disk statistics for a volume set.

This problem is addressed in Windows 2000 by allowing diskperf to be loaded twice, once above ftdisk.sys to collect Logical Disk statistics and once below it to collect Physical Disk stats. In Windows 2000, diskperf is loaded below ftdisk.sys by default. To load it a second time, issue the diskperf -yv command to activate the Logical Disk measurements.

6. I am concerned about the overhead of the diskperf measurements. What does this feature cost?

Not much. I strongly recommend that you enable all disk performance data collection on any system where you care about performance.

Even if you don't care that much about performance, you should turn on Logical Disk reporting at a minimum. The Logical Disk Object contains two counters, Free Megabytes and % Free Space, which will alert you in advance to potential out-of-disk space conditions.

The diskperf measurement layer does add some code to the I/O Manager stack, so there is added latency associated with each I/O request that accesses a Physical Disk when measurement is turned on. However, the overhead of running the diskperf measurement layer, even twice, on Windows 2000 machines, is trivial. In a benchmark environment where a 550MHz, four-way Windows 2000 Server was handling 40,000 I/Os per second, enabling the diskperf measurements reduced its I/O capacity by about 5 percent to 38,000 I/Os per second. In that environment, we estimated that the diskperf measurement layer added about 3 to 4 microseconds to the I/O Manager path length for each I/O operation. (On a faster processor, the delay is proportionally less.) For a disk I/O request that you would normally expect to require a minimum of 3 to 5 milliseconds, this additional latency is hardly noticeable.

Besides, if you do not have disk-performance statistics enabled and a performance problem occurs that happens to be disk-related (and many are), you won't be able to gather data about the problem because loading the diskperf measurement layer requires a reboot.

In my view, you can only justify turning off the disk performance stats in a benchmark environment where you are attempting to wring out the absolute highest performance level from your hardware configuration. Of course, you will need to have the diskperf measurements enabled initially to determine how to optimize the configuration in the first place. It is standard practice to disable disk performance monitoring prior to making your final measurement runs.

O'Reilly & Associates released Windows 2000 Performance Guide in January 2002.

Copyright © 2009 O'Reilly Media, Inc.