A New, Improved Visualization for Web Server Logs
We will now polish the plot in two ways: by jittering each point by a random value in the range ±0.5, which gives a nonintegral value on the z-axis (see the splot command in Listing 3), and by using a base-2 logarithmic scale. Each of these spreads the data along the z-axis. Converting integer values to real numbers spreads the data points from a single value such as 12.0 over the range 11.5 to 12.5. Using a log scale compresses the data where it begins to thin out at high z values. In the previous article we touched upon Zipf's Law and noted that hits on a web server follow it. Spreading out the data near the origin and compressing it farther away evens out the dense region of the plot near the origin. The disadvantage is that less prominent features may hide others, as can be seen by comparing the visibility of the red pillar in Figures 2 and 3.
Figure 3. Improved plot by spreading the data
Specifying a log scale for an axis is accomplished with the simple gnuplot command set logscale z 2. Adding a random offset to the integer values is done by replacing the plain column indicator 3 in the splot command with ($3+rand(0)-0.5). The expression adds a random value in the range 0 to 1 and then subtracts 0.5, which in effect adds a random value between -0.5 and +0.5 to the third dimension.
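The two commands described above can be combined in a minimal sketch. This is not the article's Listing 3: the inline data values are made up for illustration, and the three columns simply stand for x, y, and an integer z value.

```gnuplot
# Sketch only: the inline data below is invented for illustration.
# Columns are x, y, and an integer z value.
set logscale z 2          # base-2 logarithmic scale on the z-axis

# ($3+rand(0)-0.5) replaces the plain column 3: rand(0) returns a value
# between 0 and 1, so each z value is shifted by between -0.5 and +0.5.
splot '-' using 1:2:($3+rand(0)-0.5) with points
1 1 1
1 2 1
2 1 2
2 2 4
3 1 12
3 3 12
e
```

Assuming the smallest integer z value is 1, the jittered values never drop below 0.5, so they remain positive and can still be drawn on a log scale.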
Authors of articles on 3D plots face the difficulty of presenting them in print or on the Web, both of which are flat media. Though constrained by the characteristics of the Web, we can use animation to convey the 3D structure. With the GIF animation feature of gnuplot 4.2, one can rotate the 3D plot and make it visible from several angles. Listing 4 has the commands to move the viewpoint and replot the graph. Each plot command results in one frame of the animated sequence.
Figure 4. Rotating the 3D plot
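As a rough illustration of the rotation technique (not a reproduction of Listing 4), the sketch below writes three frames of an animated GIF. The file names hits.dat and rotate.gif, the delay, and the view angles are all assumptions chosen for the example.

```gnuplot
# Sketch only: 'hits.dat' (three columns: x, y, integer z) and
# 'rotate.gif' are hypothetical file names.
set terminal gif animate delay 20   # delay between frames, in 1/100 s
set output 'rotate.gif'
set logscale z 2

# Each splot command appends one frame to the animation.
set view 60, 30                     # tilt 60 degrees, rotate 30 about z
splot 'hits.dat' using 1:2:($3+rand(0)-0.5) with points
set view 60, 60
splot 'hits.dat' using 1:2:($3+rand(0)-0.5) with points
set view 60, 90
splot 'hits.dat' using 1:2:($3+rand(0)-0.5) with points

set output                          # close the GIF file
```

Because rand(0) is evaluated again for every splot, the jitter differs from frame to frame, so the points shimmer slightly as the plot rotates. A smoother rotation would need more frames with smaller angle steps.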
Before concluding this article, I would like to encourage you to use the methods described here for your own needs. For example, you could take other items from the access logfile instead of the ones I chose. Likely candidates are bytes transferred, time to serve a request, and so on. You are not restricted to web server logs; syslog (or event logs on Windows), mail logs, performance data, or mixtures of these are suitable candidates for such analysis. A picture is worth a thousand words.
The basic tenet of my two articles has been to plot three or four seemingly unrelated parameters on orthogonal axes to visually ascertain clustering and, therefore, relations among those parameters. This, I believe, is one useful way to visualize web server logs that can consist of hundreds of thousands of lines, each of which has a number of items. Other logfiles, when tortured in a similar fashion, may squeal about the inner workings of otherwise hidden processes.
- Gnuplot commands for Figure 1
- Gnuplot commands for Figure 2
- Gnuplot commands for Figure 3
- Gnuplot commands for Figure 4
- Perl script to convert access logfiles to input suitable for gnuplot for Figure 4; unchanged from the previous article
Raju Varghese has a Bachelors in Electrical Engineering from BITS, Pilani (India) and a Masters in Computer Science from the University of Texas, San Antonio.