Why Isn't System Administration Evolving?

by Luke A. Kanies

Now that I've gotten the introduction out of the way, I can get to the meat of why I'm blogging here in the first place. This is a long one, but I think you'll find it's worth it.

I'm not writing Puppet because I think I'm right or whatever; I'm writing it out of desperation, because no one else is even trying. Not only are people not trying to make better tools than we have available today, they're not even using the crappy ones we do have available, which is just sad. Imagine if the computing world had just refused to write any code until C (or better yet, Ruby) showed up; where would we be now?

I'm less interested in why sysadmins don't use the existing tools, though, and more interested in why they don't publish their own. Most of the rest of the technical world seems to have figured out how to solve their problems using code and then how to turn that code into a self-sustaining project, either open-source or commercial. Sure, this stuff was complicated fifteen years ago, but it's pretty straightforward now; and yet, instant messaging tools have larger development communities than sysadmin tools, and the majority of sysadmins spend their days toiling with a bunch of little one-off scripts that no one else will ever see or use and that the next sysadmin will gratefully /dev/null as soon as possible.

I've heard all of the standard excuses -- we don't have enough time, we can't risk it, we spend all day doing computers and don't want to do it at night, my company won't let me, etc. Every software project that has ever evolved out of an internal project has exactly these same excuses, and yet they have somehow succeeded. Why have so few sysadmin tools evolved this way? Why are sysadmins so willing to believe their excuses?

48 Comments

bronto
2007-02-07 03:39:41
Hello Luke


Interesting post! I subscribe to your point of view about the need for configuration management tools in our job. Why don't a lot of people use them? From my experience, I can list some reasons:


* they have a learning curve that is quite steep at the start
* they force you to work in an ordered, organized way: some people still prefer to work by the inspiration of the moment
* there are a lot of people who overestimate themselves and believe they don't need configuration management tools


I sincerely don't want to start a flame war here, but since I am a cfengine user I'd like to hear from you on how Puppet is better than cfengine, and what you think its weakest points are.


Ciao and thanks
--bronto

Luke Kanies
2007-02-07 08:36:25
bronto, I agree that people avoid these tools for the three problems you mention, but I'm guessing it's the third that stands out more than the first two. I'd be surprised if many people took the effort to learn one of these tools and then decided not to use it, since (even with all their problems) they're still so much better than hand-rolled stuff.


Whatever the reasons, they're more like excuses than justified reasons.


As to the differences between Puppet and Cfengine, I maintain a page on that.


yodicoyote
2007-02-07 13:47:56
Hi Luke,


I actually agree with you pretty much entirely - I tried joining sysadmin guilds and what-not but it really just turned into pissing contests about what version of unix was better. That may also be part of the problem: *nix is so varied, where one thing works on Solaris, it won't on AIX or Linux etc. Do Windows sysadmins have the same problems? Do they automate (can they)?


As to the excuses, true, we are masters of the dodge, but at the same time, having time to do the development work on automation tools is hard to find. You said in your first post "for the record, I'm now a full-time developer working on Puppet, and I haven't been an operational sysadmin (thankfully) in a few years)" which kinda shows that you had to "retire" from system administration before you could really get into puppet.


Again, I agree with you and find the topic really worth talking about! I've been waiting for this SysAdmin feed to wake up (yes, yes, perhaps I should CONTRIBUTE rather than complain - I'm working on that!).


yodi

matt
2007-02-07 15:07:02
2 other simple possible reasons


1) sys admins aren't developers (so writing said system doesn't necessarily interest them as much as say fiddling around with vcs or zfs)
2) I think a lot of sys admins are scared by automation in the field. They're scared they might automate themselves out of a job. Look at a similar field like DBAs. How much of that is automated and reusable? Why aren't DBAs creating similar tools?


As to who to invite as leaders in the field, why not snag some folks from the various vendors (Opsware, Splunk, Fiveruns maybe) as well as large scale operations (Limoncelli or Haberlach maybe; both work at Google now).

Luke Kanies
2007-02-07 15:35:18
Matt, I agree on sysadmins not being developers, but why aren't they? Other fields (e.g., almost anything involving the web) have managed to move to seeing themselves as developers and acting accordingly, but sysadmins, who generally actually write a good bit of code, can't seem to.


As to being afraid of automating themselves out of a job, I think that's just silly. The service quality of a typical sysadmin is so poor right now -- no real metrics, no graphs showing network state and trending deltas, slow response times to new functionality requests, inconsistencies throughout the network, low adoption of new and better systems -- that we've got a lot of headroom before things are so good that the sysadmins can get fired.


Heck, every place I've set up automation has resulted in the sysadmins having more power, because they end up so responsive and so dependable that the company naturally trusts them more.


On your last note about the leaders... I don't think OpsWare is a leader in this space. From what I know about their system, you couldn't pay me to use it. Sure, they're making some money, but they're not changing the game, they're just automating current practices. I could see inviting someone from one of the monitoring companies like Splunk, FiveRuns, Zenöss, but that's just monitoring; none of those tools can really help you actually make changes to your network, they can only help you analyze it. It's true that this seems to be the main area of investment in the systems space, though, so it's worth trying to get one of them if the panel gets accepted.


I hadn't heard of Adam Haberlach, I think, but my article discusses Tom. Tom might be a good contrarian perspective.

matt
2007-02-07 16:09:20
Luke,


"As to being afraid of automating themselves out of a job, I think that's just silly." I totally agree; however, I have actually heard this several times. Yes, when we get a good start on automating things and having the sysadmins think differently about what they're doing (making that large mental shift that you need to think about things as modelled components as opposed to just files and what not), they often end up becoming more productive and becoming proactive as opposed to reactive.


As to "From what I know about their system, you couldn't pay me to use it.", then I don't think you're educated about their system. IMO, they have changed and are changing the game. They've had a lot of aspects of what you do in puppet in the system for years (such as higher level platform-agnostic modelling, model inheritance, etc.) I'll email you offline if you'd like to discuss further.


As to splunk, zenoss, etc. Yeah. I think that they are, on the other hand, trying to take some of the repetitive things and automate them (granted it does tend to lean on the troubleshooting, RCA, and monitoring side of things). Still, I think anybody making any effort towards somewhere better in the field should be commended.


With that, I commend you for trying to push forward the conversation (here and on config-mgmt lists).

John Martinez
2007-02-07 16:23:46
Great article. Adding my 2¢ in here.


A lot of the "magic" around system administration has been lost as the IT field has become more service focused. Now, service is a good thing, but it isn't retaining the technical among us. What happened is that in many large private organizations, the decisions have turned from technical to political. Many times, acquiring products and services is no longer a technical decision. IOW, the sysadmins can't/aren't allowed to make technical recommendations of products, including configuration management. You have to deal with what you're given, which in many cases is a shrink-wrapped product that was either too much or too little for the problem.


Just giving a different point of view.

Tim
2007-02-07 16:41:12
The thing with sysadmins is that they are constantly building new setups and maintaining their old ones. It's cool to try out a new tool or spend some time writing a tool you need, but you can't expect us to spend too much time on it. Think about the legacy systems which aren't easily converted to the new systems. Sure, I can use a system in my new setup, but hey, next month a new tool appears and I'm going to use that! Sysadmins make their job easy by using the same tools everywhere. Puppet is cool, but implementing it for all the machines we currently maintain, without affecting the maintenance we still need to do on those machines and without affecting the services they provide, can't be done. I want my machine landscape to be mostly the same, so if I do write a little script to help me with something, I can use it easily at more places!


Having a large framework is nice, when you start and stick with it. It's a gamble: Will the next system that arrives be even better? How will I be switching to that while still providing the same level of service? It's not so easy to make that decision.

matt
2007-02-07 17:19:56
@Tim:


What's wrong with taking baby steps? For example, I would assume that you don't build all your machines by hand and probably use either jumpstart, kickstart, or some windows automation. If so, why? You can apply this same logic to other things as well.

void
2007-02-07 20:13:24
Luke,


I'm still gathering my thoughts on your post, which at first glance I mostly agree with, but one question comes immediately to mind. Have you talked or written to Tom Limoncelli about adding such a discussion to the 2nd edition of TPOSANA?

Luke Kanies
2007-02-07 20:48:01
matt, I agree that it's very good that zenoss, splunk, et al are doing what they're doing; I'm just concerned that we've got a lot of people in monitoring and not many in management. As to OpsWare, you're probably right that I don't know it well enough, but I'm also unlikely to find out considering that (last I heard) their user guide was considered proprietary confidential. All I could find on their web site is white papers, so they could be awesome and I'll never know. I'm not exactly an open source bigot, but I'm definitely fond of open information.


Tim, your point about frequent system changes is good, but isn't that an argument for better tools and shorter tool generations, not fewer and longer generations? If you're rebuilding everything anyway, why not use a better system this time?


void, I've not talked to Tom about it, but I might at some point. I'll at least mention it to him the next time I see him.

gene
2007-02-08 05:17:20
There does seem to be a recurring theme in some of the posts. It's dealing with the learning curve on many of the tools. Making the transition to something like puppet or bcfg2 takes time and energy that the admin in the trenches probably does not have. That admin won't even have it AFTER he gets it into production. The fact is that now that something is allowing things to be done quicker or better, other work will probably be added to his queue. So, now when the next great tool comes along, you are in the same spot you were before. No time to spend learning how to deploy it. In fact you might argue that you now have less time to spend on it because your responsibilities have grown and if you need to let something slide for a bit it will have a bigger impact on those that depend on you.


You are probably not going to automate yourself out of a job. It might happen IF the budgets really get tight and the company thinks they can just let things slide for a while, but that will happen anyway. More likely is that because of your automation, you are going to be removing the new position from the job board. By automating things, you've improved the system. You have more time, and can do bigger/better things, so that new, somewhat related project gets tossed onto your lap instead of them bringing in someone new.


Getting back to the tools side of things. One of the biggest problems is that there isn't an easy way to find the really useful easy hacks that people have put together. Things that you can just load and go to handle some of the smaller tasks that are done every day. You can go to sourceforge or other places and find 100's of things that might do what you want, but which of them actually work? I might look at a couple then munge it to do what I need. I'll submit the changes back. Having a site that has reviews on good tools to get started would really help many in the field. It would really be great if there was an automation path available for people to take. If you have nothing, start here. Once that is working, push this way. It's the baby steps. In the end, here's how to roll that stuff together into the MCP. (tron reference)



steve
2007-02-08 09:46:39
I'd say anyone who has been working as a sysadmin for any length of time will probably join me in saying it feels like this post is talking directly to them. Great job, Luke. Keep them coming.


Many of the posts have talked about automation in particular, but I think the real root cause of the lack of automation is mentioned a couple of times in the article, that being the lack of standardisation/best practises. Forget the argument about us having standards as a profession, in many shops I've worked in getting a standard way of doing things, even having a written server build standard, is too much to expect. People have their own styles and habits and it can get to a point when you are asking who owns/built a particular server so when you log in you have some hope of guessing all the customisations that have been applied to the server without tripping over them.


The best maintained shops I've worked in have always had a standard build and a decent level of change control. Without these you can't even start to think of automation, as you're trying to handle lots of edge cases and that's where things break. I've seen decent systems of automation thrown out just because some single server was broken by an automated update that worked on 200 others.


On a side note, there was a comment about not being able to make technical recommendations. Who is more qualified to do this than the sysadmins who actually maintain the systems? If you are in this position then it's time to go somewhere else as your company sees you as an operator, not a sysadmin. There is a fundamental difference.

Luke Kanies
2007-02-08 12:49:27
gene, I agree about the learning curve. I'm doing everything I can to make Puppet's learning curve very friendly, but I know I haven't succeeded yet, which is why I'm still focusing mostly on companies who know they need something like Puppet, rather than marketing as much to the general sysadmin.


That being said, Puppet is already significantly easier to use than anything I used as a sysadmin, including all the stuff I wrote.


As to automating yourself into more work, that's certainly a concern, but fundamentally, if you're at a company that doesn't reward you for work well done and only continues piling more work on you no matter what, you're stuck. Nothing can help that. My personal experience has been that the vast majority of times the tools are either flat-out rejected by the company, or the company ends up pretty positive and encouraging.


Your point about there being no good community site for people to share code snippets is also salient. I've personally tried creating such a site three times (as I alluded to in one of my comments to the lopsa-discuss list), but they were all miserable failures. Please, make one, change the world. :) I still haven't given up hope entirely, though.


steve, I agree that standards are critical, but it's never too late. I've often spent a significant amount of time writing throw-away tools just to convert from ad-hoc configurations to standardized versions. Overall, though, yeah, having standards and the consistency they bring is a big part of it. I have found that standards enforced by humans are generally not really enforced, though; you need tools to do it, otherwise variety shows up between the cracks in the standard.

Luke Kanies
2007-02-08 13:40:12
This post has also resulted in a bit of a thread on the lopsa-discuss list.

Action Jacks
2007-02-08 14:00:07
Whoops sorry! Finger trouble before!


I boil it down to two key issues:


Laziness and Attention Span


Bad System administrators aren't lazy enough and they have long attention spans.


I use CFEngine and have done for the past 3 years. It works, it's stable (I wouldn't mind trying puppet though once it hits stable).


I started using it because I was sick and tired of ssh'ing into this and that and running the same set of commands over and over again. I'm not a patient guy, I want to see results and I want to do interesting "NEW" things while providing "best practice" IT services and solutions to my client base. I want to duck out of the office and spend my time with my family and buddies; doing the same thing more than twice struck me as tedious and a waste of my time, especially beer time!


At the moment I'm sitting at home with a Sierra Nevada (Yay! We can get it in the UK now!) safe in the knowledge that when I'm asleep at 4 am all my 200+ machines will have their configurations audited, the latest patches updated, any failed services restarted, database inconsistencies repaired, broken links fixed, new minimal servers in the netgroup webservers configured as secure webservers and subversion 1.4.3 installed across all of them when I walk into the office in the morning at sometime between 9 and 10 am. How cool is that! Now that, my friends, is power and promotion material! I've had 2 promotions off the back of it!


Now I get to drive IT strategy and policy; however, I repeatedly see people doing the same things again and again and again! Some of my guys rebuild laptops over and over again. I've even had one guy say to me, "I find doing installs manually relaxing!" You should have seen the looks of horror, pity and disgust on my face! Why? I sometimes have to explain the benefits over and over again but they just don't get it, i.e. why automation, standards and repeatability are good.


I sometimes think System Administrators are so busy trying to do the "I've got a BIG S on my chest" act that they miss the BIG picture. Most companies do not sell "IT Support" as a product; it's a means to an end. If your work improves the bottom line then you will get noticed. I provide a real life example (I know it's in real life because it involves me):


Two system administrators. Sysadmin One is deploying an anti-spam system. There's an issue, so Sysadmin One pulls an all-nighter and gets the system in, up and running.
Now Sysadmin Two has to upgrade 100 machines in an HPCC cluster. He spends twice the effort ahead of time developing, testing and implementing an automated build system and management system that can scale to thousands of machines rather than just the original 100.
Guess who got the promotion?


Mr "I have more caffeine in my blood vessels than blood after working 18 hours" or Mr "Wow! I'm glad I spent all my time planning and implementing up front because the entire HPCC upgrade took 20 minutes and engineering weren't impacted"?


Guess what the customer saw at the time:
Christ! SysAdmin One! What a superstar he worked all night to get the systems back up and running! I need a mouse but he's off because he worked all night! Nutz! Man I'm gonna complain!


Hey SysAdmin Two. Hey, how do you manage to install 100 machines in under 20 minutes with no hassles, go out to the pub and get out the door by 17:30! Wow, you make it look so easy! And you saved us over 100K in engineering development time! Is there anything you can't do?


3 months later:


SysAdmin One is pulling another late night in order to roll out another anti-spam gateway...


SysAdmin Two is catching some Z's because the extra 200 machines he was told to order 7 days ago arrived this morning and have been installed in under an hour.


Guess who gets ahead.

Luke Kanies
2007-02-08 14:17:25
Great story, Action Jacks; it's good to hear success stories.

spp
2007-02-08 15:38:28
Hey Luke, take a look at http://lopsa.org/node/1128 for a partial answer to some of what you mention. However, for "Action Jacks", let me provide the contrarian view I've often seen.


SysadminOne is looked at as a superstar, because he goes above and beyond the call of duty. He's there all the time, he can be seen to be busy. SysadminTwo is looked at as being lazy, because he isn't seen to do anything. He's only there 40 hours a week, when he is there he's often sitting around "surfing the web" (researching) or at the coffee shop taking a break (thinking and planning).

Ernie
2007-02-08 18:39:55
I am a super star sysadmin, although my current title doesn't reflect it. I've built a ton of little apps, just like everyone else I guess. Since I am turning 30 and getting old and lazy, I decided to (finally!) look into server automation. I looked into CFengine years ago, but it was overly complex. I looked into puppet recently, but the documentation is a bit sparse. Well, it's a lot better than a lot of open source projects, but what it really needs are more examples and a "quick start guide". A 5 minute thing where you can say "Ok, here's a common scenario, you need to get NTP installed on 30 servers, and half of 'em need 1.2.3.4 as a server and the other half will use time.apple.com ... Go!" Then talk about how you'd normally do it, either tough it out and SSH to 'em, or maybe if you are smart cluster SSH, or whatever, then talk about how puppet can do it ... THAT might get more people interested.
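
To make Ernie's scenario concrete, here is a minimal sketch in Python rather than in any real configuration management tool. It only shows the shape of the idea: describe which group each host belongs to, then derive every host's NTP config from that description instead of typing it 30 times. The hostnames, the group names, and the one-line config body are assumptions made up for illustration, and nothing here actually pushes files to a server.

```python
# Hypothetical inventory: 30 servers split into two groups by NTP source.
# Hostnames and group names are invented for this sketch.
NTP_SERVER_BY_GROUP = {
    "internal": "1.2.3.4",
    "external": "time.apple.com",
}


def build_inventory(count=30):
    """Return {hostname: group}; first half 'internal', second half 'external'."""
    hosts = [f"server{i:02d}" for i in range(1, count + 1)]
    half = count // 2
    return {
        host: ("internal" if index < half else "external")
        for index, host in enumerate(hosts)
    }


def render_ntp_conf(group):
    """Render a deliberately minimal ntp.conf body for one group."""
    return f"server {NTP_SERVER_BY_GROUP[group]} iburst\n"


def plan(inventory):
    """Map every host to the config text it should end up with."""
    return {host: render_ntp_conf(group) for host, group in inventory.items()}


if __name__ == "__main__":
    desired = plan(build_inventory())
    for host in ("server01", "server30"):
        print(host, "->", desired[host].strip())
```

The point of the design is that moving a server from one time source to the other becomes a one-word change in the inventory, which is roughly what a quick start guide for any of these tools would want to demonstrate.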
ben
2007-02-08 20:09:55
Our biggest problem is that most of us work in homogeneous environments. Tools take tinkering across environments. Every automated thing we do needs to take into account the input, the output and the impact on any other things we're working on. Change one thing, and you change them all. I can develop one-liners to do this over and over, but not when totally different environments get thrown at me.


The second is that we are sysadmins. Not developers. We are also not application admins. Installing and maintaining an RHN satellite server, and keeping up with everything it's doing on a big network is practically a full time job if you utilize all its features. When I start being a redhat satellite server administrator, I stop doing what I love - which is solving new and different problems that will crop up no matter what I use.


Which brings me to the final point - sysadmins tend to granularize problems, and what seems like a similar problem on the surface ends up being a completely different problem in the end. Because we can see these differences, we have to mold our solutions for them. I'm not going to hit my delicate machine with the fix-it toolbox. We are surgeons, not boxers.


And to address the whole point of your post: we *do* have tools that have been developed and published over time that we all use. cron, nfs, syslog, sendmail, bind, sed, awk, ssh, grep, bash, blah blah blah are all tools that have been developed over time to solve the same problems. A good sysadmin knows how to use these tools, and put them together.


ben
2007-02-08 20:10:57
That should be - most of us work in non-homogeneous environments...

Luke Kanies
2007-02-08 21:25:01
spp, if SysadminOne is rewarded over SysadminTwo, then you've got a problem regardless. You don't want to work at that company anyway, and the job market isn't that bad. I've had those jobs too, and the only regret I have about those pink slips is that they didn't come earlier.


Ernie, I'm constantly working on the documentation, and I think Puppet is the easiest tool to get running right now, but I agree it should be easier to know how to do useful things with it faster. Check the wiki, things are improving.


ben, I agree on the heterogeneity problems, and I've done a lot to address some of those issues in Puppet. All Puppet resources support multiple providers so it can handle portability issues -- I currently have 18 package providers and can manage just about all the major package types, for instance.


I know that there are more differences than that -- for instance, the ssh daemon seems to have a different name on every platform, so Puppet supports multiple names for each resource, so you can have a canonical name to use when specifying relationships and a localized name for each platform.
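
The two ideas above (one canonical resource name with a localized name per platform, and several interchangeable package providers) can be sketched in a few lines of Python. This is not Puppet's implementation or API; the lookup tables, function names, and the particular platform values are assumptions chosen for illustration, and only two package providers are shown.

```python
# Illustrative tables only: a canonical name maps to what each platform calls it.
SERVICE_NAMES = {
    "sshd": {
        "debian": "ssh",
        "redhat": "sshd",
        "solaris": "svc:/network/ssh:default",
    },
}

# Each platform gets a provider; each provider knows how to build a command.
PACKAGE_PROVIDER_BY_PLATFORM = {"debian": "apt", "redhat": "yum"}
INSTALL_COMMANDS = {
    "apt": lambda package: ["apt-get", "install", "-y", package],
    "yum": lambda package: ["yum", "install", "-y", package],
}


def local_service_name(canonical, platform):
    """Translate a canonical service name; fall back to the canonical one."""
    return SERVICE_NAMES.get(canonical, {}).get(platform, canonical)


def install_command(package, platform):
    """Pick the provider for this platform and build its install command."""
    provider = PACKAGE_PROVIDER_BY_PLATFORM[platform]
    return INSTALL_COMMANDS[provider](package)


if __name__ == "__main__":
    for platform in ("debian", "redhat", "solaris"):
        print(platform, "->", local_service_name("sshd", platform))
    print(install_command("openssh-server", "debian"))
```

Relationships between resources can then always refer to the canonical name, while only the final step of talking to the operating system needs to know the local one.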


I'm not convinced that the personality traits of sysadmins have the right causality; I think they're caused by the broken state of affairs, not by the natural demands of the job.


As to the tools you mentioned... The most recent of them is SSH, and I know I was using it in 1997, meaning that the most recent is at least a decade old. Do you really think there haven't been enough changes to computing in the last decade to justify some new tools?

Michael Gorsuch
2007-02-09 07:20:52
Luke, I like the article. It's striking up controversy and conversations all over the place. It should be.


Sysadmins do need to organize themselves better. I really think that a big part of the problem is the lack of community. In my experience, few of us know about LOPSA. I brought it up amongst some of my prior colleagues, and not one of them knew about it.


If we have strong community and better interaction, we might be able to move the industry forward.


In response to your article, I built Simply SysAdmin yesterday. It's my goal to start building a community of sysadmins, and to start talking about the big problems out there facing the community.


Anyone is welcome to write and participate.


Thanks again for stirring up the dust.

Doug
2007-02-09 12:45:27
Hey Michael,


I'm really curious, help me understand something. On the one hand you mention LOPSA (who is trying to build a community of sysadmins), and then you mention starting a new website (Simply Sysadmin) with the goal of building a community of sysadmins. If the goal is to build a community, why not work with the people who are trying to build a community? Many hands make light work. There's something inherently ironic here that I simply do not understand. If everybody tries to build their own community with their own separate blogs or websites or whatever, you end up with a lot of little communities but no real community. I'm confused. Is LOPSA not worth community effort? I know we at LOPSA are dying for people with time and energy (hopefully in equal amounts) to help drive the community forward. Discussion is good!


Luke Kanies
2007-02-09 13:34:22
Michael, it's great that you're taking the initiative to get something going. Now you just have to spend the marketing effort to get people using your site and committed to it.


Doug, I know that I'm not working with LOPSA because it seems like the same basic people have been talking forever and not getting much done. It's not other people's responsibility to use your community, it's your responsibility to make it so compelling they can't afford not to.

Doug
2007-02-09 22:17:36
Communities don't happen because one builds a website. They happen because people want to be a part of them and contribute. LOPSA tries to enable community by providing the resources. We offer free blogs for any member to use. The question that I really, honestly want to know the answer to is why start another Drupal (or other) blog site in the name of sysadmin community when one exists for this purpose?
If I knew the answer, maybe I could do something to address it. So far only a handful of people have blogged. What do we do to make it more attractive? Do we not want a central place for sysadmin resources? It only works if people contribute.
(Sorry, I don't want to hijack the thread, but this seems important enough to ask the question.)

Deb
2007-02-12 18:04:31
So how does one become a SysAdmin and learn how to install an OS on a zillion servers in no time? Here I'm a sysadmin because I convinced the powers that RedHat on Intel beat Solaris on Sun - so now I am the RH sysadmin, as in if it's not Windows it's Deb's. I read, I play, I try stuff in between doing the odd assortment of things that make up my job. Some days I'm sure they'll realize how little I know, but then I remember they know even less than I do.


I built a set of scripty-bits to help me install RH+Oracle faster and with less error. Silly little bits of code really and probably could be done better by the teen next door but I like them, use them and plan to refine and make them better (once I learn to write code:) I'm having fun at work and I want to get better but where to start now that I've been in this pond for a couple of years? And how do any of us find time to work, study, and play with the kids and spouse?

Doug
2007-02-12 20:16:18
Deb, check out System Installer Suite (http://sourceforge.net/projects/sisuite/) or FAI (fully automatic install). SIS is a golden-image solution. You set up one 'gold' server that has everything working, then you image it onto your install server. You can have as many images as you like, and you use PXE to install the machines. PostInstall scripts take care of machine-specific customizations. It turns machine installation into a piece of cake. FAI has a higher learning curve, in my opinion.


Read and research voraciously. Keep the users happy, and they are forgiving when you have *real* problems and tell them that you're very busy right now, but should be able to get back to them soon.


P.S. Solaris10 is really cool, you should give it a chance. dtrace beats any other tool out there for telling you why something isn't working the way you think it should be.

Tom
2007-02-13 05:56:35
I've been sysadmining since '92. I had 1 site w/ over 500 systems and 6 different Unixen. One site was so paranoid about security that monitoring was almost impossible. Then there's the places where the users (engineers) spec/order the equipment & throw it over the wall to you w/o thinking about infrastructure to support it.


Each environment was custom. Much of the time, Unix is an addon to the main Windows systems that people have at their desks. Unix is for servers and engineering labs where you have lots of onesies.


My current environments consist of a Solaris 10 install w/ 5 systems + 2 PCs as X11 terminals. It's off the internet in a locked room. When I got the equipment, I had to add a network switch, DNS server, space for /home, CVS server, fibre optic cable. Everything in/out of that environment needs to be on a CD. Automating that environment probably isn't worth the time investment. I will probably never build those systems again.


My other environment is a lab with users plugging in all kinds of network devices, PCs that hook to the corporate AD services and 2 Solaris systems. For 2 systems, building a jumpstart server isn't worth it. For the PCs, I leverage corporate as much as possible. They're managing over 200 PCs.

Luke Kanies
2007-02-13 09:37:31
Doug, I can't give specifics about what's wrong with the LOPSA stuff, but I know I wouldn't be willing to spend a ton of my time building up the site because I'm not convinced that the group has a chance of success. If the board spent more time producing content and creating a compelling site and less time talking about how to add membership, we might already be there.


The short answer seems easy: You have to create a site where people can talk about the things that matter to them, and then you have to spend a ton of energy marketing that site and getting them to use it. Of course, to do that you have to figure out what things matter to sysadmins, and then you have to find out how to find the sysadmins in order to get them to talk. Both of those are high-energy iterative processes, so the only chance of success is to have a few people who are willing to put out that energy and aren't afraid to make mistakes and be called on them.

Matthew M. Austin
2007-02-15 14:07:05
In my experience the real nuggets of sysadmin wisdom are found at the level of the individual application and OS/distribution sites - their web forums and email lists. The diversity doesn't really lend itself to centralization.
John Warburton
2007-02-15 21:46:10
In much the same theme as Luke's blog, is a presentation from the last LISA - "The Future of System Administration: How to Stop Worrying and Learn to Love Self-Managing Systems" by Alva L. Couch, Associate Professor of Computer Science, Tufts University


http://www.usenix.org/events/lisa06/tech/slides/couch.pdf


People complaining that System Administration is not exciting enough technically may want to muse over this slide:


Can you be replaced?
• Autonomic systems exhibit:
• narrow but substantial technical expertise
• compulsion to protect themselves to the exclusion of other considerations
• limited communications skills
• no social skills


Align yourself with the business, or get out.

Stephen Smoogen
2007-02-16 12:15:12
Luke, let's look at the standard Systems Administrator personality.


We are mostly control freaks... if we aren't by nature when we start the job.. we are within a year or two after some vendor installed piece of software not only removed all the data on the machine, but also somehow deleted the on-line backups. Or the time someone's script crashes the system because they pushed something from development to production without any testing (if we by some chance have a development and production set).


We are paranoid. We see that people are out to get data and control of 'our' systems all the time. After a bit we see how many break-in attempts occur from both inside and out.. and we tend to be less trusting of others because we become aware of how much we (ourselves) fail.


We are cynical. We know the vendor is here to sell us stuff and will tell our managers or customers the most outlandish claims and then leave us with the fixing of getting whatever application working before some deadline. We also know the only time we get recognition is when systems fail... otherwise we are supposed to be invisible in most organizations.


We get very clannish. AIX people wouldn't talk to Solaris people and no-one talked to the HP-UX people. We come up with the oddest reason why some OS is better than the rest.. We know who our 'friends' are and don't trust someone else's system because it uses debs versus rpms versus tar-balls.


We are usually not sysadmins by training. There are very few places one can get a 'real' Bachelors of Science degree in Systems Administration.. maybe an associates or some 'certificate'. Many people end up being systems administrators out of being the one person who knew something about computers at the wrong time :). We aren't programmers or coders by trade.. but usually pick it up over time and are probably never very comfortable with it. A good many sys-admins will install the way they learned the first time because a) they feel lucky they got it working that time, and b) they don't feel they have the time, energy, self-will to experiment and fail. It is the reason why so many of us are afraid of automating ourselves out of a job. We feel lucky we have a job in the first place.. and well how much is the boss joking when he says "Well after we get that puppet thing in.. maybe we won't need as many sysadmins around here."


I say all of the above because I have done every one of them.. and have spent a long time talking to several hundred other Systems Administrators who have done the same thing.


So you have a large audience who are paranoid, cynical, control-freaks who are scared to fail.. because they are not too sure why they are doing this job. This would be the big reason why there are few Systems Administrator organizations.. SAGE tried for years.. but seems to get bogged down in clan wars and 'people getting busy'. I only heard of LOPSA last night when I opened my SysAdmin magazine and saw an ad that I thought was a bunch of guys accosting the school nerd versus trying to be helpful (see cynical above).


I realize that is a bunch of negative stuff, but it really needs to get the light of day before we as sysadmins can move ahead. We have to be able to laugh at ourselves and basically self-check our natural tendencies to not-communicate so that we can share and grow.


Matthew Sporleder
2007-02-16 12:36:16
Consider it contributed. Here's parallel execution of any command you want. Just update the array, or get even smarter and use files, or databases, or whatever you feel like grouping together.



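#! /usr/local/bin/perl -w
use strict;
# Hosts to run the command on (placeholder names).
my @bsd_servers = ('host1', 'host2');
foreach my $s (@bsd_servers) {
    my $pid = fork();
    die "Cannot fork: $!" unless defined($pid);
    if ($pid == 0) {
        # Child: run the command on one host and print its output.
        print `ssh MYUSER\@$s $ARGV[0]`;
        exit(0);
    }
}
# Parent: wait for every child to finish before exiting.
1 while wait() != -1;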
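
For anyone who would rather keep the host list in a file, as suggested above, here is a rough shell sketch of the same idea: one background job per host, then wait for all of them. The function name run_everywhere, the REMOTE_CMD variable and the hosts.txt file are made up for illustration, and it assumes key-based ssh logins already work.

```shell
#!/bin/sh
# Sketch: run one command on every host listed in a file, in parallel.
# REMOTE_CMD is a hypothetical override hook; it defaults to plain ssh.
REMOTE_CMD=${REMOTE_CMD:-ssh}

run_everywhere() {
    hosts_file=$1
    shift
    while IFS= read -r host; do
        # Skip blank lines and comment lines in the host list.
        case $host in ''|'#'*) continue ;; esac
        # One background job per host; stdin comes from /dev/null so the
        # remote command cannot swallow the rest of the host list.
        "$REMOTE_CMD" "$host" "$@" </dev/null &
    done <"$hosts_file"
    # Block until every background job has finished.
    wait
}
```

Something like `run_everywhere hosts.txt uptime` would then run uptime on every listed host at once. Output from different hosts can arrive in any order.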

Rytis Sileika
2007-02-21 04:12:52
Hi,


Very interesting. Should add something to it as well. I have a feeling that SAs somehow managed to escape and avoid various processes and techniques that developers are quite happily using. Configuration management is one of those things that are not standardised or at least well thought of.
Surely, SAs aren't developers, but developers are happy bunnies, when SAs aren't really...
Anyway, some of my thoughts can be found on http://operenv.blogspot.com/

John M.
2007-02-21 21:14:01
I think one of the problems encountered in sharing tools, from my perspective is that they tend to become encumbered if one does not approach it correctly. Many scripts I've written as a sysadmin belong to wherever I worked at the time, same for documentation. I was unable to convince management to license them differently.


Also, the majority of them are not fundamental tools. They are symptom fixers and glue scripts, localized to a particular organization's IT environment, working on top of whatever layers of poor implementation may have been forced at the time.


I think until solving fundamental IT problems becomes the priority that is going to be the state of affairs, generally. Most businesses do not seem to want to pursue things at that level.


What happens when sysadmins go over the fence is things like Plan 9...


Where I'm working now is currently having to go down almost that far, literally back to fundamentals because of certain issues.


For example, I had to draft a very general document on backups just so management could conceptually organize their thinking about data, retention, value and risk. This is before evaluating new tools and hardware to meet their requirements, first we had to be able to intelligently discuss those requirements.


We have come a long way, though. In the 6 months I've been here, with the team turnover at about 75% and a 45% headcount reduction in sysadmins, we have unified authentication, SSO, a cfengine server, PXE/kickstart server, new backup environment, and a nagios/cacti monitoring system on the way. Not bad... Could be better. My scripts are still nothing terribly useful...


I'd hardly call a collection of python & perl scripts to ease some of the burdens of querying databases and creating accounts to be worth sharing since they are tailored to the needs and structure here.


I'd love to see advances in how system administration is done at a fundamental level. Unfortunately to some "advances" means put a gui on it.


- John

John M.
2007-02-21 21:33:50
Oh and it was hard to convince my co-workers that passwordless sudo + ssh-keys + ssh-agent and a for loop was just a stop-gap until I could deploy cfengine. I got some funny looks, and narrowly avoided being burned as a warlock.


But for doing an iterative task over a list of servers, it beats how they were doing it... ssh into each one, type your password... cut and paste the commands... If they were being "efficient"...


I wanted to cry when they showed me that...


so now when I do:
for i in `cat list_of_servers`; do ssh $i "sudo whatever"; done;


I can smile benevolently when they complain how much extra typing sudo inflicts... My eye hardly twitches at all any more.
The fundamental tools are pretty good if you want to bother to think how to use them.
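
To make that concrete, here is a rough sketch of the same loop that also remembers which hosts failed, which starts to matter once the list gets long. The names sudo_on_all and RUN are invented for illustration, and it assumes the passwordless sudo and ssh-agent setup described above.

```shell
#!/bin/sh
# Sketch: run a sudo command on each host in a list, one at a time,
# and report the hosts where it failed.
# RUN is a hypothetical override hook; it defaults to plain ssh.
RUN=${RUN:-ssh}

sudo_on_all() {
    list=$1
    shift
    failed=""
    while IFS= read -r host; do
        # Ignore empty lines in the list.
        [ -n "$host" ] || continue
        # Give the remote command no stdin so it cannot eat the host list.
        if ! "$RUN" "$host" "sudo $*" </dev/null; then
            failed="$failed $host"
        fi
    done <"$list"
    # Name the hosts that need a second look, if any.
    [ -z "$failed" ] || echo "failed:$failed" >&2
    # Return success only when every host succeeded.
    [ -z "$failed" ]
}
```

A call such as `sudo_on_all list_of_servers whatever` does the same job as the one-liner, but the exit status and the "failed:" line on stderr tell you where to look afterwards.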


Another key issue I see is sysadmins who are very used to managing one or two boxes. The need for the techniques and tools you will reach for in large scale environments just isn't pressing. And they tend to gain their formative experiences on small installations. Then eventually they get 20-30 boxes, but it still works, it's not too painful. Until one day you have to do something on 500 systems that aren't set up with that in mind.


Ouch.
I guess I'd say don't stop thinking, don't avoid changing your technique to better suit the circumstances.


A good place for Linux/Unix sysadmins to expand their thinking a bit is http://www.unixtips.org/ -- I tend to point them there when they show signs of growth.

Deb
2007-02-25 11:41:59
Love Stephen Smoogen's paragraph "We are usually not sysadmins by training." and idea we got the job because we knew/appeared to know more than guy in next office. This isn't first job I got that way - for me, my first one had everything to do with cows, sheep and goats and nothing to do with PCs or servers. I answered 'yes' to a question about feeding cows during flu-induced staff shortage and spent next 3 years in the field and loved it:) I could get immediate help with a big oh-no problem but better yet learn signs,symptoms and cures during informal training, question and answers that are part of mentoring relationships.


Problem is, back then I had other folks I could talk with- co-workers as mentors, teachers, knowledge experts who freely shared their tips, tricks, expertise and cow tales. Today I am on my own as the "Linux person" among a whole bunch of Windows folk who avoid *NIXes at all cost. I'm having fun but sometimes wonder if I am doing it all wrong, reinventing the wheel, or some such thing.


I'm not worried about losing my job - only doing it better, more efficiently. And I don't mind making mistakes as long as I'm only user who gets whacked. I am card-carrying member of the 'Click the button, See what happens' club - it may not be the best way to learn but tends to be memorable. But it's lonely, not being able to bounce ideas and problems around with co-workers, and our Linux server count continues to grow.


...makes me miss my cows and my colleagues with their cow tales...

Carwyn Edwards
2007-03-15 08:01:07
I think the fundamental problem for many organisations with respect to employing best practices within systems administration is communication. Specifically being able to communicate the business level advantages of using said best practices to the people that release the time and resources required to make the initial transition to the new working models.


I think the first hurdle is gathering up a stock set of "this is why we should do this" answers that the people in the field can use to start convincing people. Best practices are useless without motivation to use them.


The other more controversial matter I think many sysadmins suffer from is inherent to the type of person that becomes a sysadmin. Many of these people are highly introverted and certainly not exuberant enough to start arguing with the powers that be. Luke mentioned something about pink slips? If you contrast this to the broader software development industry, where artistic flair (e.g. web 2.0 or whatever it is this week) is valued, there are many more people willing to stick their necks out there and argue for different ways of doing things.


There are as many, if not more, people problems than software problems in this area in my mind. The mindset must come before the software will ever be written (or shared).


A related issue that I've only really realised recently is evident in how many systems admin positions are advertised. The classic "BSc in Computer science desired/useful/preferred." or "or relevant experience". Sorry, but someone that's learned their sysadmin skills tinkering away on home linux boxes at high school is not going to appreciate the vastly rich world of theoretical computer science when considering how to edit a few hundred /etc/whatever.conf files. Conversely one could argue that the Computer Scientist isn't going to appreciate the practical implications of systems admin work. Please stop asking for one or the other. You need both.

Harold
2007-03-16 07:30:58
Luke, this was a really inspirational post. Thank you!!
Nancy
2007-04-12 17:32:26
Great post, Luke.
I have to say that you got my brain going on the topic of who are our heroes. I've been a UNIX Sys Admin (Solaris, HP/UX, Linux) for over ten years, and I had to strain myself to come up with a couple of names. Of course, we've all worked with people we'll never forget. When I first started out in this field, I remember that there were about 15 people on our team. And there was this one guy who everyone knew could solve anything. I admired him so much that I decided I wanted to be like him. I wanted to be the one who could figure it out, no matter what it was. So, over the years, I've scripted, I've automated, I've looked for tools. A lot of people are better at scripting than I am, but I ain't bad. And, when it comes to automating repetitive tasks, I'm your gal. Nothing gives me greater satisfaction than knowing that all my ducks are in a row and all I have to do is throw a switch for all the lights to come on.


But who are my heroes? David Chapa came to mind. But his specialty is in NetBackup. Tom Limoncelli also came to the front. Tim Maher came to mind, but he's more of a Perl guru. I got to thinking that maybe our field is really just a collection of areas of expertise. Just as networks are as individual as fingerprints, so are the admins who design and maintain them.


I like the idea of us forming better networking circles. Someone mentioned LOPSA, and there's SAGE. Still, I know what you mean when you say that we're stuck.


Thanks for the thought food.

David Nalley
2007-04-15 21:41:00
Hi Luke,


Several comments from my perspective. System Administration is often ignored - and almost treated as a commodity, even among IT companies. It is the rare company who realizes that it is important to think about such things. It is akin to how people think about HVAC or electrical service - i.e. they don't, they have abstracted all of that away, they see a service that they pay for and can't imagine that someone had to engineer a solution for their specific installation.


As for the comments on Tom, his book seems to differ. His list of the first three things to accomplish includes automated OS/application installs.


There are also a number of automated installation tools, like kickstart, openqrm, unattended, etc.


That being said, I have looked at puppet, and it shows a lot of promise, I plan on trying it out in the coming weeks and I'll let you know my reaction.

Bill, St. Louis, MO USA
2007-05-22 20:23:06
Great post!
I have a point to make, please bear with me here...
I returned to college late in life and earned my BS in computer science at 45. I have since been trying to get experience as a "systems administrator". What I am finding is
1) No one really knows what a systems administrator is.
2) Everyone seems to be keeping their secrets to themselves if they do know.
3) Where does one start. Where is the check list of 'core' things one must know?


I am trying to get my foot in the door of an area of the IT field that seems to have a hard time defining itself. You mentioned passion. Well, I think it is very interesting that when I call a company who has a listing for a systems administrator "they" are not very passionate about me being passionate about wanting to earn experience. Why is this? Do I need to write a tool (Then I'm a programmer) to make regular expressions easier?


The point I'm trying to make is here I am trying to find knowledge and information about how to become a systems administrator and I'm not finding any one overall straight answer. So I think the lack of best practices or published tools is not there because there are no defining edges within the systems administration field.


So here is my .02 worth:
1) Ensure that every single computer in your network is identical (stop shouting).
2) Document always and often.
3) Develop a log of your daily activities and how much time you spent doing them and then share this log with other systems administrators then graph the results. What would that look like?


P.S. How do I become a systems administrator and when am I really there? Thanks for listening.
Bill, St. Louis MO.USA

Luke Kanies
2007-05-26 15:45:53
Hi Bill,


I agree with your general complaints about being a sysadmin, and it's these complaints that have led me where I am now -- trying to build tools and community to push the whole field forward.


There aren't good answers for many of your questions, unfortunately; you have these questions and problems because the field is broken, in my opinion. Companies see sysadmins as a cost center rather than enablers, but that's really our fault, not theirs, and it's our fault that we haven't built a sufficient community to really define what our jobs are and to make it easier for new sysadmins to come into the fold.


Instead, it's like a geek club, and the more esoteric your skills the higher in the club you are, which seems pretty silly.


As to when you know you're a sysadmin... I'd say it's when you're getting paid to maintain computers.

Ken Brush
2007-10-11 14:54:16
I think you miss out on a lot of the community that is out there.


Sysadmins seem to flock together and form these little social groups.


For solaris admins there was the solaris pm mailing list. I believe there were some lists for AIX admins as well. Poor poor HP-UX admins are screwed though.


Anyhow, back to the social group idea. For the most part we create one-off solutions that we then automate and forget about. That is, until a friend goes, "hey do you know how to fix X?" and you go, "Oh yes, I have this script..." (you find sysadmin mailing lists full of these).


It's really like an informal sewing circle. I think because there are so few of us at any one place. Even large datacenters like Bank of America only need about 8 Unix admins. So fewer people doing the job, specialized knowledge that is mostly system specific. That leads to a closed sharing group, just because of the small amount of generalized stakeholders.


Couple that with how many sysadmins are actually *good* at what they do. Small population with smaller population of *great* sysadmins == small social circles.

Roger
2008-01-04 09:34:46
Why do we as Sys Admins continue accepting jobs where we have to support legacy systems? I've been a Sys Admin off and on for 30 years; it's harder now than it ever was.


I have stopped looking at jobs (and turned down offers) that require supporting a legacy system. The hassles are too great.


The whole point of evolution is that the old stuff will die out and new stuff will fill its niche, providing a more hospitable environment in situ. The whole point of economy is to use minimum resources for maximum profit. In this case there is a massive conflict.


Sys Admins should take a specific stance internationally and just tell the world, "These are the minimum levels of systems that you should aspire to, and we won't work on anything else." After the first few million/billion dollar corporations start falling apart, you can bet the rest of the world's economies will start toeing the line.


I met a sys admin the other day who has an MS in Computer Science and worked internships with Unix and Exchange Server 2003 (Active Client) systems for a year. His first post-grad job is working with a WinNT network for a 4 branch bank system. He'd had the job for 8 months and was desperately looking for a way out. Though I really did feel sorry for the guy, it was kind of a "Sucks to be you" moment.


It's sad to see so many kids graduate with that kind of talent and end up either prostituting themselves to necro-legacy systems, just to get brownie (experience) points in the industry, or keep going to school to get a job schlepping (teaching) the info to other newbies.