Java's Operations Problem

by Tim O'Brien

How many of you have the problem summarized in the following statement?

"My application isn't performing well in production because of some heap settings, and there are some configuration changes I need to make in production, but my operations group won't give me access to the systems I need. I'll talk about JNDI configuration of the DataSource, the heap size parameters, how to start the JMX console, but it is like my words are hitting a brick wall. Our platform is Java, but our system administrators don't know the inner workings of the JVM heap or the significance of a stack trace. Every time something happens to a JVM, they escalate it to management as an unacceptable bug, but more often than not it was a misconfiguration problem. To compound the problem, I haven't been able to find Java programmers who are aware of the concerns of our operations group, and the two groups have reached a very inefficient equilibrium. The administrators are always pointing at a stack trace and blaming the developers, and the developers are always blaming the administrators. Every week there is another political blame festival, and I'm getting tired of the conflict."



10 Comments

JCN
2007-01-11 12:30:04
Hell, I'd be happy if we had an operations group.
Tim O'Brien
2007-01-11 12:33:37
@JCN, well there's that problem too.
magoo
2007-01-11 14:30:34
just thinking out loud...


If it wasn't for the applications, there would be no need for a system to administer. Of course, if the applications don't work then the customer goes away :)


Part of the reason for the Admins getting all BOFH is the separation of responsibilities you talk about. It becomes a blame game - and who gets the blame when the config changes that make application_new run better screw up application_old?
1 - Of course, you should have some sort of isolation between the two, but while your example talks about JVM settings, in this day and age it could quite easily be firewalls, network configuration etc.
2 - What's more important - the stability of application_old or application_new?


I like your second fix (the knowledge transfer), but there can be issues with defining how much data is enough.... in my current environment (ERP Administrator for an outsourcer), my group would need the following:
data on at least 4 DBMS from the DBA team,
data on at least 3 OS (not counting the "subtle" variations between AIX 5.3 and DEC (sic)),
and we'd need to provide data on approximately 12 years' worth of releases (SAP R3 2.2 --> NW2004s).
Anyway, I have enough trouble keeping up with MY field, let alone keeping up with everyone else's stuff!!!


You should never be porting critical / important (the only distinction is whether the customer thinks they're important or you think they're important) apps directly to a different architecture - if you develop on Windows for a *nix environment, then there needs to be a *nix staging area for appropriate testing.

Tim O'Brien
2007-01-11 14:45:08
@magoo,


"You should never be porting critical / important (the only distinction is whether the customer thinks they're important or you think they're important) apps directly to a different architecture - if you develop on Windows for a *nix environment, then there needs to be a *nix staging area for appropriate testing."


Right, and I guess here is where I part with Javaland. I'm assuming a controlled production environment. Assume I'm talking about a web site like nytimes.com, forbes.com, or merck.com. If your production platform is, let's say, Linux, then your development team should preferably be coding on the Linux platform. If that's not possible, they should at least be staging to a Linux platform.


The idea that Java's platform independence means your developers don't have to be aware of the target platform is a myth that a lot of organizations hold on to (i.e., we all develop on Windows but deploy on Linux).


2007-01-11 23:42:34
magoo hit the nail on the head.
I've worked in an environment where the blame game was deadly serious.
At some point it started to almost become corporate policy to pick on the development team, even for things we had nothing at all to do with.
Like the time we got the blame for a hard disk failure at a customer site. After all, the customer had called to say that the application wasn't working. (Not joking: the person taking the call assigned it to software development to "fix", and when we found out it was a hardware problem, the ticket wasn't removed from our count. Department performance was judged by the number of reported problems assigned to each department.) Essentially, everything that went right in the company (like a successfully created and deployed piece of custom software) was credited to sales or sysadmin, and everything that went wrong anywhere (like a project overrun when sales had ignored estimates and sold a project calculated to take 500 hours for 100) was blamed on software development.


That company may have been extreme in that, but I've seen the pattern everywhere where people or departments get punished in some way for problem reports against their services/products (which is most companies).

Mike E
2007-01-12 00:34:03
In my experience there's a little of both going on. In fact, you're more likely to run across an administrator who can tune heap sizes, fix config issues in the J2EE container, and do a little heap profiling, memory leak debugging, etc., than a developer who can. That's a little harsh (disclaimer - I come from a system admin background but have worn both hats). Honest opinion - it's a partnership. Give your admins read access to the source and encourage them to use it. Give your developers read access to the boxes and encourage them to use it. Get the admins' opinion during development, and post launch make sure that when the brown stuff hits the fan, the admin and developer work together - they'll both learn something! (In short, I semi-agree.)
Carfield Yim
2007-01-16 23:59:35
Nice article. However, some environments separate development and administration very strictly: developers cannot touch the production machines, and they cannot talk directly to the people who can (the admins).
Kisakookoo
2007-01-23 22:21:14
Hi! Why can't I fill in my info in my profile? Can somebody help me?
My login is Kisakookoo!