Published on O'Reilly (http://oreilly.com/)

Top 7 Things System Administrators Forget to Do

by Tom Adelstein

Out of the plethora of chores that we do each day, which ones make up the top seven activities of forgetful system administrators? To begin, you might ask yourselves if the answer is quantitative or qualitative. Let's think about it for a minute.

You could look at the number of times you forget something. If you did that, you might get a list of things like forgetting to set your alarm clock or showing up to work with unmatched socks. We need something scarier, like forgetting to turn off remote access through a VPN to an employee that just left the company.

In writing this article, I felt like the Simon Cowell of technologists, but in reverse. Out of many worthy candidates I could only pick seven, and Paula Abdul couldn't vote. Besides my own subjective view, I decided to consult other engineers. The men and women I consulted felt the criterion was obvious: administrators forget to do things that matter.

I could cite many reasons why they forget some critical tasks. Those reasons include doing jobs that normally take two or more people, having to provide break-fix services on hardware, covering for absentee help desk personnel or getting involved in pre-sale activities. Regardless, here is the compilation of the top seven things system administrators forget to do.

1. Forgetting to Delete a Former User's Account

When IBM, Novell, and HP hold seminars in the same city around the same week, you find out why you need their identity management systems. Some unnamed Fortune 50 companies forgot to delete former user accounts for five years. Those former employee accounts existed in the human resources and payroll databases, in the computer directory and address book, and in the SID, SAM, and AD. The vendors will say that you don't have enough system administrators, that you will never find enough available, and that you therefore need Tivoli, eDirectory, or OpenView.

Who really knows if the workforce has enough system administrators? In my survey, system administrators complained about their workload, lack of time to plan, and a need to prioritize their tasks. I asked many if they kept a list of their tasks and few did. About 90 percent of the engineers surveyed went to work with their daily schedule in their heads. I counted that as forgetfulness.

I rarely go to the grocery store without a list because I cannot recall what I need. I forget the laundry detergent or some obvious item like vitamins. If I can't remember 15 items on a grocery list, how do I expect to remember the things I need to do at work? I function poorly without a list.

We have to close the door when a user leaves. We also need a checklist to follow and a way to find out who left. You cannot justify leaving former user accounts active. Some things to remember include disabling the user's password. I like to preserve her directory, since someone else may take her place. I typically move the directory and rename it. We often want to keep the contents of the old user's directory intact.
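
The steps above can be sketched as a small dry-run helper. This is only an illustration, assuming a Linux host with the shadow-utils usermod command; the function name offboarding_commands and the /home/archive location are hypothetical, and nothing is executed. It simply prints the commands so that you can check them against your own policy first.

```python
import shlex
from datetime import date


def offboarding_commands(username, home_root="/home",
                         archive_root="/home/archive", today=None):
    """Return the shell commands (as strings) for closing out one account.

    Nothing is run here; the caller reviews the list before acting on it.
    The archive location is an assumption made for this sketch.
    """
    today = today or date.today()
    src = f"{home_root}/{username}"
    # Move and rename, rather than delete, so a successor can reuse the files.
    dst = f"{archive_root}/{username}.departed-{today.isoformat()}"
    user = shlex.quote(username)
    return [
        # Lock the password so it can no longer be used to log in.
        f"usermod -L {user}",
        # Expire the account as a whole, not only its password.
        f"usermod --expiredate 1 {user}",
        # Preserve the old home directory under a new name.
        f"mv {shlex.quote(src)} {shlex.quote(dst)}",
    ]


if __name__ == "__main__":
    for command in offboarding_commands("jdoe"):
        print(command)
```

A real checklist would be longer (VPN access, mail aliases, badges), which is the point of writing it down.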

Depending on your organization's IT policies, you'll want to create a list of actions to take. Remember that you need to do more than simply change a user's password. If this user ever had root access, you might find anything from a trojan system binary to an unknown kernel module. With that in mind, we can move on to rootkits.

2. Forgetting to Regularly Search for Rootkits

Rootkits allow an unauthorized person to gain access to the superuser or domain administrator's account. The same software can let an intruder hide his or her tracks, steal or remove files on a system and so forth. A rootkit can allow someone to maintain access to a hijacked computer. A programmer can write a rootkit for any type of operating system. If you have read about companies losing 40,000 customer files, then you will usually find a rootkit to blame.

User-level rootkits are easy to detect and remove. At this level, the software replaces one or more of a legitimate user's applications with a modified program. On Unix-style and newer proprietary systems you can detect a user-level rootkit if you trust the kernel. Programs like AIDE and Tripwire can detect this type of rootkit.

Kernel-level rootkits are difficult to find, since you cannot trust the kernel on which the rootkit exists. We've seen kernel-level rootkits delete logs to hide an intruder's tracks and replace system calls. Kernel-level rootkits can exist as a Linux loadable kernel module (LKM) or a service on a Windows server. Recently, I found a rogue service running on a Windows 2003 R2 server in a test environment. Some examples of LKM rootkits are Afhrm and Synapsis. Earlier Windows kernel-mode Trojans included Slanret, IERK, and Backdoor-AL.

Since you cannot trust the kernel, security specialists install packet sniffers on unaffected machines. The specialists look at packets sent to and from the machine on which they suspect a rootkit exists. Another way to detect kernel-level rootkits involves booting from a live CD. The live CD has a kernel you can trust and will allow you to investigate the drives.

Monitor your system with file integrity checks by looking at the machine for changes. Make a fingerprint of a newly installed OS image or after adding new software. A fingerprint uses cryptography to make a hash of all the data in a file. Once you have the hash you can compare a stored hash value with the running hash value. You can then detect changes and see if someone put a rogue program on your system.
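
As a rough illustration of the fingerprinting idea, here is a minimal sketch in Python using SHA-256 from the standard hashlib module. The function names (fingerprint, snapshot, compare) are my own and hypothetical; tools such as AIDE and Tripwire do far more. The sketch assumes you keep the baseline somewhere the intruder cannot rewrite it, such as read-only media, and that you run the comparison from a kernel you trust.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every regular file under root (by relative path) to its digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): fingerprint(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare(baseline, current):
    """Return (added, removed, changed) path lists between two snapshots."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        path for path in baseline.keys() & current.keys()
        if baseline[path] != current[path]
    )
    return added, removed, changed
```

Take a snapshot right after installing the OS image or new software, store it offline, and later call compare(stored, snapshot("/usr/bin")) to see which files appeared, vanished, or changed.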

3. Forgetting to Use a Trouble Ticket Tracking System

Did you know that an RFC exists for a trouble ticketing system? RFC 1297, NOC Internal Integrated Trouble Ticket System Functional Specification Wish List, is an Internet Engineering Task Force specification. The author of the RFC compares a trouble ticket to a patient's hospital chart. Both define a problem and assist in coordinating the solution with people working on it at different times.

Initially, an internal client creates a ticket that moves through a support system. The ticket identifies an issue and helps determine the skills and expertise needed to solve the problem. Until the person or persons assigned to the ticket resolve the issue, the ticket remains open.

A trouble ticket or trouble report tracks the actions performed by the experts and reports to a case manager on the progress of the problem's solution. In tracing the origins of trouble ticketing systems I found they originated in manufacturing as a paper-based reporting system.
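
To make the life cycle concrete, here is a minimal sketch of a ticket as a data structure. It is not taken from RFC 1297 or from any particular product; the class name Ticket, its fields, and the status values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Ticket:
    """One trouble ticket: a problem statement plus a running log of actions."""
    ticket_id: int
    requester: str
    summary: str
    assignees: list = field(default_factory=list)
    actions: list = field(default_factory=list)  # (timestamp, who, note)
    status: str = "open"

    def log(self, who, note, when=None):
        """Record an action so the next person can pick up where this one left off."""
        if self.status != "open":
            raise ValueError("cannot log actions on a resolved ticket")
        self.actions.append((when or datetime.now(), who, note))

    def resolve(self, who, note, when=None):
        """Record the fix and close the ticket; it stays open until this is called."""
        self.log(who, note, when)
        self.status = "resolved"
```

Like the hospital chart in the RFC's comparison, the action log is what lets people working at different times coordinate on one problem.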

Today, almost all trouble tracking systems are web-based applications. Forgetting to use a trouble tracking system leads to the kinds of problems I describe below.

4. Forgetting to Set Up Technical Documentation and Create a Knowledge Base

Back in February, I interviewed for a job as a Linux system administrator. The company had 30 Linux boxes running mission-critical applications for a global VoIP network. I came close to accepting the position, until I asked for technical documentation. The resigning system administrator replied that they had the code, what else did I need? I asked the managing director to refrain from making me a job offer after learning the state of the company's documentation.

I wonder how many times I've forgotten the solution to problems I would ultimately see again. It seems silly when you realize you could have just written it down and filed it away. Instead, we duplicate the same effort it took to find the answers we need.

In November 2001, I discovered that our support staff had a backlog of 85 days. After that, support simply deleted any emails requesting customer service. I took the issue to our system administrators and development team and declared the situation a crisis. I also found out that our system administrators had responsibility for backing up our customer service department. The programming team stopped all development and looked for a quick solution to this fiasco. We found Request Tracker (RT) from Best Practical Solutions LLC and implemented the system. Within 10 days, we cleared every item on our list and turned RT over to our newly hired customer service representatives.

While looking for solutions to our ticket tracking system, we realized that any traditional knowledge base system wouldn't work for our company even with RT in place. We could not afford trained authors to write the content we needed. At that time, Best Practical Solutions did not have their RTFM knowledge management system available.

Our solution involved automating a bridge between closed trouble tickets and a web-based FAQ software system. After our customer service crisis, we couldn't afford to allow a technical issue to go unnoticed again.
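
I can't reproduce that bridge here, but the idea can be sketched in a few lines. The following is a hypothetical illustration, not the code we ran: it assumes each ticket has been exported as a dictionary with made-up keys ('subject', 'status', 'resolution', 'closed'), and it turns resolved tickets into FAQ entries, one per distinct subject.

```python
from collections import defaultdict


def build_faq(tickets):
    """Turn resolved tickets into FAQ entries, one per distinct subject.

    Each ticket is a dict with the hypothetical keys 'subject', 'status',
    'resolution', and 'closed' (an ISO date string such as '2001-11-09').
    """
    grouped = defaultdict(list)
    for ticket in tickets:
        # Only tickets that were actually solved carry an answer worth keeping.
        if ticket.get("status") == "resolved" and ticket.get("resolution"):
            # Normalize case and spacing so repeats of a problem group together.
            key = " ".join(ticket["subject"].lower().split())
            grouped[key].append(ticket)

    entries = []
    for group in grouped.values():
        # ISO date strings sort chronologically, so max() finds the newest fix.
        latest = max(group, key=lambda t: t["closed"])
        entries.append({
            "question": latest["subject"],
            "answer": latest["resolution"],
            "times_seen": len(group),
        })
    # Most frequently reported problems first.
    entries.sort(key=lambda e: (-e["times_seen"], e["question"].lower()))
    return entries
```

Grouping by subject is crude, but it captures the point: the answer gets written down once, at the moment the ticket closes, with no trained author required.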

5. Forgetting the Risks of Flash Memory Drives

USB flash drives can transport large files to colleagues or a client's remote office and access data without worrying about compatibility. You can take work home or travel with data without needing a laptop. Unlike a CD-R disk, you can edit documents or data on the flash drive directly. You can also back up files.

But, flash drives can be a system administrator's worst nightmare. Viruses can be brought in from home, employees could make a "home copy" of a corporate software package, or, in the worst case, flash drives could be used in corporate espionage (e.g., where sensitive data like trade secrets or customer lists are stolen).

A poll of corporate IT managers taken in the United Kingdom revealed that:

6. Forgetting to Manage Partial Root Access

Many administrators believe that using sudo in Unix-based systems or "run as" in Windows is a panacea to help delegate some system responsibilities to non-administrators without giving away full root access. sudo uses a setuid root binary to execute commands on an authorized user's behalf, after he has entered his current password.

While this may allow you to give out limited root access without giving away the root password, it is really only a useful method when all of the sudo users can be completely trusted. As an organization grows in size, administrators will often forget who has partial root access. Changes in personnel, management, users, and a lack of resources can leave ordinary users with access to programs that have known exploits. For that reason, in dynamic business environments, you cannot afford to lose control of the sudo users group. A solution to the problem involves centralizing management of sudo users.
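
One small piece of that centralization is a regular audit. The sketch below is an assumption-laden illustration: it reads text in the /etc/group format, pulls out the members of the privileged group (often named sudo or wheel, depending on the distribution), and compares them with a roster of people who are supposed to have access. The function names and the roster are hypothetical, and the check does not cover users granted rights directly in the sudoers file.

```python
def group_members(group_file_text, group_name):
    """Return the member set of one group from /etc/group-style text.

    Each line has the form name:password:gid:member1,member2,...
    """
    for line in group_file_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) == 4 and fields[0] == group_name:
            return {member for member in fields[3].split(",") if member}
    return set()


def stale_privileges(privileged, approved):
    """List users who hold elevated access but are not on the approved roster."""
    return sorted(set(privileged) - set(approved))


if __name__ == "__main__":
    with open("/etc/group") as handle:
        members = group_members(handle.read(), "sudo")
    # The approved roster would come from HR or a ticketing record.
    print(stale_privileges(members, approved={"alice", "bob"}))
```

Run on every host and collected in one place, a report like this shows who still has partial root access after personnel changes.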

7. Forgetting Courtesy

I wonder how many times this comes up. A month ago, a young lady in our office attempted to move a large conference table. The CTO and I made a valiant attempt to help her. We failed. The table weighed too much for us to move. The CTO looked around and asked two of our IT guys to help. You might think that they would have jumped at the chance to please the boss. The IT guys gave us the Muhammad Ali look. The young lady and I simultaneously uttered, "Don't ask them."

I had just joined the company and couldn't believe the stories I heard. The troublemaker came out in me and I went to my immediate supervisor to ask if the support people from the IT department really cast an evil eye when someone requests help. He answered in the affirmative and asked, "Aren't all IT guys like that?"

I understood the sour attitudes exhibited by our busy admins. I pulled weekend all-nighters many times. Fortunately, during my early days in help desk and call center training, someone instilled in me the need for a smile and a helpful attitude no matter how many hours of sleep I had. Courtesy and diplomacy became the hallmark of my work ethic.

Now, I said I have this troublemaker side. So, I wrote up a generic job description of technical support personnel. I put the description together from several job requirements listed on Monster and Dice job boards. I then presented it to my boss and made sure to read it over with him. Soon afterward, I saw a closed door and heard something like computer parts smashing against walls. The IT guys came out of their office looking ready to remove my head. They marched to the data center and didn't come out for a week. But a funny thing happened when they emerged from the data center; they had cooled down and both gentlemen apologized. They became models of courtesy.

I began asking people in other divisions within our company if their IT people acted like jerks. I learned we hadn't cornered the market of system administrators in need of anger management training. Somewhere along the line, a sour disposition took hold, and it never changed. It happens a lot in our world.

If you want support from management, remember that the user you offend today could wind up on the board of directors. Regardless of that possibility, system administrators should always remember that their clients are internal. If you want to keep your job, be good to your clients.

Final Thoughts

Do system administrators really forget to do things because they're lazy or do the pressures of the job keep them from getting everything done? If the latter is true, then the less important to-dos may not get done. My experience tells me that a person can only do so much. If you have to work a 15-hour day just to get the basics done, then management needs to re-evaluate its commitment to the IT department. I believe that's the case.

Tom Adelstein became an author in 1985 and has written and published non-fiction books, investigative journalism, novels, and screenplays prolifically ever since.

Copyright © 2009 O'Reilly Media, Inc.