Top 7 Things System Administrators Forget to Do

by Tom Adelstein

Out of the plethora of chores that we do each day, which ones make up the top seven activities of forgetful system administrators? To begin, you might ask yourselves if the answer is quantitative or qualitative. Let's think about it for a minute.

You could look at the number of times you forget something. If you did that, you might get a list of things like forgetting to set your alarm clock or showing up to work with unmatched socks. We need something scarier, like forgetting to turn off remote VPN access for an employee who just left the company.

In writing this article, I felt like the Simon Cowell of technologists, but in reverse. Out of many worthy candidates I could only pick seven, and Paula Abdul couldn't vote. Besides my own subjective view, I decided to consult other engineers. The men and women I consulted felt the criterion was obvious: administrators forget to do things that matter.

I could cite many reasons why they forget some critical tasks. Those reasons include doing jobs that normally take two or more people, having to provide break-fix services on hardware, covering for absentee help desk personnel or getting involved in pre-sale activities. Regardless, here is the compilation of the top seven things system administrators forget to do.

1. Forgetting to Delete a Former User's Account

When IBM, Novell, and HP hold seminars in the same city around the same week, you find out why you need their identity management systems. Some unnamed Fortune 50 companies forgot to delete former user accounts for five years. Those former employees' accounts still existed in the human resources and payroll databases, in the computer directory and address book, and, along with their SIDs, in the SAM and AD. The vendors will say that you don't have enough system administrators, will never find enough, and therefore need Tivoli, eDirectory, or OpenView.

Who really knows if the workforce has enough system administrators? In my survey, system administrators complained about their workload, lack of time to plan, and a need to prioritize their tasks. I asked many if they kept a list of their tasks and few did. About 90 percent of the engineers surveyed went to work with their daily schedule in their heads. I counted that as forgetfulness.

I rarely go to the grocery store without a list because I cannot recall what I need. I forget the laundry detergent or some obvious item like vitamins. If I can't remember 15 items on a grocery list, how do I expect to remember the things I need to do at work? I function poorly without a list.

We have to close the door when a user leaves. We also need a checklist to follow and a way to find out who left. You cannot justify leaving former user accounts active. Start by disabling the user's password. I like to preserve her directory, since someone else may take her place. I typically move the directory and rename it, which keeps the old user's contents intact.
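The steps in that checklist can be sketched as a small script. This is a minimal sketch, assuming a Linux host that uses the shadow-utils `usermod` command. The function name `offboarding_plan`, the `/home/archive` location, and the date-stamped directory name are hypothetical choices made for illustration. The function only builds the commands, so you can review them before running anything as root.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import List, Optional


def offboarding_plan(username: str,
                     home_root: str = "/home",
                     archive_root: str = "/home/archive",  # hypothetical location
                     today: Optional[date] = None) -> List[List[str]]:
    """Return the commands to run, in order, without executing them."""
    stamp = (today or date.today()).isoformat()
    home = PurePosixPath(home_root) / username
    archived = PurePosixPath(archive_root) / f"{username}-{stamp}"
    return [
        # Lock the password and set the account expiry date to the past.
        ["usermod", "--lock", "--expiredate", "1", username],
        # Move and rename the home directory, keeping its contents intact.
        ["mv", str(home), str(archived)],
    ]


def render(plan: List[List[str]]) -> str:
    """Format the plan as one shell command per line for review."""
    return "\n".join(" ".join(command) for command in plan)
```

Calling `print(render(offboarding_plan("jdoe")))` shows the two commands for a hypothetical user `jdoe`. A real checklist would add whatever else your IT policy requires, such as VPN, mail, and directory entries.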

Depending on your organization's IT policies, you'll want to create a list of actions to take. Remember that you need to do more than simply change a user's password. If this user ever had root access, you might find anything from a trojan system binary to an unknown kernel module. With that in mind, we can move on to rootkits.

2. Forgetting to Regularly Search for Rootkits

Rootkits allow an unauthorized person to gain access to the superuser or domain administrator's account. The same software can let an intruder hide his or her tracks, steal or remove files on a system, and so forth. A rootkit can allow someone to maintain access to a hijacked computer. A programmer can write a rootkit for any type of operating system. If you have read about companies losing 40,000 customer files, a rootkit is often to blame.

User-level rootkits are easy to detect and remove. At this level, the software replaces one or more of a legitimate user's applications with a modified program. On Unix-style and newer proprietary systems you can detect a user-level rootkit if you trust the kernel. Programs like AIDE and Tripwire can detect this type of rootkit.

Kernel-level rootkits are difficult to find, since you cannot trust the kernel on which the rootkit exists. We've seen kernel-level rootkits delete logs to hide an intruder's tracks and replace system calls. Kernel-level rootkits can exist as a Linux loadable kernel module (LKM) or a service on a Windows server. Recently, I found a rogue service running on a Windows 2003 R2 server in a test environment. Some examples of LKM rootkits are Afhrm and Synapsis. Earlier Windows kernel-mode Trojans included Slanret, IERK, and Backdoor-AL.

Since you cannot trust the kernel, security specialists install packet sniffers on unaffected machines. The specialists look at packets sent to and from the machine on which they suspect a rootkit exists. Another way to detect kernel-level rootkits involves booting from a live CD. The live CD has a kernel you can trust and will allow you to investigate the drives.

Monitor your system with file integrity checks that look for changes on the machine. Make a fingerprint of a newly installed OS image, and again after adding new software. A fingerprint uses cryptography to make a hash of all the data in a file. Once you have the stored hash, you can compare it with a hash of the file as it exists now. You can then detect changes and see if someone put a rogue program on your system.
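The fingerprint-and-compare idea can be sketched in a few lines. This is a minimal sketch, assuming SHA-256 as the hash and an in-memory dictionary as the baseline. The function names `fingerprint`, `make_baseline`, and `compare` are hypothetical, and the sketch does not show how tools like AIDE or Tripwire work internally.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint(path: Path) -> str:
    """Hash all the data in one file with SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large binaries do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_baseline(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its fingerprint."""
    return {
        str(p.relative_to(root)): fingerprint(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files whose hash changed, plus files added or removed."""
    return {
        "changed": sorted(k for k in baseline
                          if k in current and baseline[k] != current[k]),
        "added": sorted(k for k in current if k not in baseline),
        "removed": sorted(k for k in baseline if k not in current),
    }
```

Store the baseline somewhere the monitored machine cannot write to. If you suspect a kernel-level rootkit, compute the current hashes after booting from trusted media, because a compromised kernel can misreport file contents.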

3. Forgetting to Use a Trouble Ticket Tracking System

Did you know that an RFC exists for a trouble ticketing system? RFC 1297, NOC Internal Integrated Trouble Ticket System Functional Specification Wish List, is an informational RFC, a wish list rather than a standard. The author of the RFC compares a trouble ticket to a patient's hospital chart. Both define a problem and assist in coordinating the solution with people working on it at different times.

Initially, an internal client creates a ticket that moves through a support system. The ticket identifies an issue and helps determine the skills and expertise needed to solve the problem. Until the person or persons assigned to the ticket resolve the issue, the ticket remains open.

A trouble ticket or trouble report tracks the actions performed by the experts and reports to a case manager on the progress of the problem's solution. In tracing the origins of trouble ticketing systems I found they originated in manufacturing as a paper-based reporting system.

Today, almost all trouble tracking systems are web-based applications. Forgetting to use a trouble tracking system leads to the kinds of problems I describe below.

4. Forgetting to Set Up Technical Documentation and Create a Knowledge Base

Back in February, I interviewed for a job as a Linux system administrator. The company had 30 Linux boxes running mission-critical applications for a global VoIP network. I came close to accepting the position, until I asked for technical documentation. The resigning system administrator replied that they had the code; what else did I need? After learning the state of the company's documentation, I asked the managing director to refrain from making me a job offer.

I wonder how many times I've forgotten the solution to problems I would ultimately see again. It seems silly when you realize you could have just written it down and filed it away. Instead, we duplicate the same effort it took to find the answers we need.

In November 2001, I discovered that our support staff had a backlog of 85 days. Once a request passed that age, support simply deleted the email. I took the issue to our system administrators and development team and declared the situation a crisis. I also found out that our system administrators had responsibility for backing up our customer service department. The programming team stopped all development and looked for a quick solution to this fiasco. We found Request Tracker (RT) from Best Practical Solutions LLC and implemented the system. Within 10 days, we cleared every item on our list and turned RT over to our newly hired customer service representatives.

While looking for a ticket tracking solution, we realized that no traditional knowledge base system would work for our company, even with RT in place. We could not afford trained authors to write the content we needed. At that time, Best Practical Solutions did not have their RTFM knowledge management system available.

Our solution involved automating a bridge between closed trouble tickets and a web-based FAQ software system. After our customer service crisis, we couldn't afford to allow a technical issue to go unnoticed again.
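A bridge of this kind can be sketched as follows. This is a minimal sketch, not the system we built: the `Ticket` fields, the `resolved` status value, and the plain-text FAQ layout are all assumptions made for illustration, and the code does not call RT or any particular FAQ package.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Ticket:
    """Hypothetical shape of a ticket exported from a tracking system."""
    ticket_id: int
    subject: str
    status: str
    resolution: str


def faq_entries(tickets: Iterable[Ticket]) -> List[str]:
    """Turn closed tickets that record a solution into FAQ entries."""
    entries = []
    for ticket in tickets:
        # Skip open tickets and tickets closed without a written solution.
        if ticket.status != "resolved" or not ticket.resolution.strip():
            continue
        entries.append(
            f"Q: {ticket.subject}\n"
            f"A: {ticket.resolution.strip()}\n"
            f"(Source: ticket #{ticket.ticket_id})"
        )
    return entries
```

Keeping the ticket number in each entry lets a reader trace an answer back to the original problem report. It also means nobody has to write knowledge base content as a separate job.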
