Automation, Continuous Integration, and the Follies of SQL Server
by Tim O'Brien
You can't write a large application without automating builds and launches. If your build process is repetitive and tedious, you have two options: pay a few full time employees to do nothing but configuration day in and day out, or pay one dude to write some elegant scripts for a few weeks. If you are lucky, your organization has well-rehearsed scripts and an operations team that speaks fluent Perl, Python, or bash. Your process involves continuous integration, and a product launch is a non-event - no one wonders if it is going to work. It just does. Read more...
Have you ever worked on a project that didn't have automated builds and deployment scripts? Or, if you have worked on a project that was flawless, what technology did you use for your build and deployment process?
Any interesting views on SQL Server? Am I just plain wrong about Windows? If you love SQL Server, tell us why.
Other JDBC drivers
The jTDS project builds on the Free TDS project to create an open-source JDBC driver for SQL Server/Sybase. For the Perl folks you can use DBD::JDBC with one of these to create a JDBC proxy -- not for production servers, but likely good enough for continuous integration/testing.
Note that you can also do some scripting with the 'osql' command-line tool, through which you can feed your SQL scripts to run against whatever you want. (Dunno about DTS stuff though...) I call it from a maven script for some simple drop db/create db/create tables/add data functionality and it seems to work ok.
Do not automate too much
Very nice article, except for one point:
"If you need to run it more than once and it involves more than 3 steps, automate it!"
If you follow that literally, you will fiddle with tools all day long and never get any work done. I think you really mean to say "Don't be afraid to automate if it will save a significant amount of time, or if it will reduce errors in a critical repetitive task."
The real rule is probably something more like this:
- If you have already run it by hand more than twice, AND
- You probably need to run it again at least five more times, AND
- It involves more than three steps,
then run it one more time by hand and automate it.
Other JDBC drivers
Great comment! jTDS and TDS are great tools. I'm currently using an ancient version of JSQLConnect which works, but I can't stand spending money on a JDBC driver. For all I know, I may be already using jTDS unwittingly, I found this statement on the jTDS project page:
"Quite a few of the commercial JDBC drivers out there are based on jTDS (or FreeTDS), even if many of them no longer acknowledge this." - jTDS project page
I'm amazed at the number of people already using Maven release candidate 1. Jakarta Commons is in the process of moving builds over to Maven, and people either love it or hate it. Maven is a great thing, but I'm finding that Ant still reigns supreme on the projects I'm involved with. It would be interesting to poll people about experiences with convincing people to move a build process from Ant to Maven.
Do not automate too much
I know this is a somewhat extreme position. Taking the time to automate the process may not always reasonable, but I've seen the other side. Employees that spend one hour a day executing the same series of commands by hand to rotate a log or run a build script. I guess my extreme position is more appropriate for an organization in need of remedial help.
Maybe we're not going to come up with a series of simple rules - every situation is different. My amended rule is now:
- If you experience frequent bouts of deja vu or if your job reminds you of the movie Groundhag Day, automate.
This sounds like a personnel issue to me.
If the DBA does not feel comfortable with Perl/Python/VBScript, bcp, or osql/DBI/ADO, that is a concern.
However, if you are shipping to less technical clients, and they need to maintain or modify the data flow, it is probably a good idea to look at DTS. Keep in mind that you can programmatically create DTS packages.
If your code is really that large and unmanageable, try collapsing a few steps into a VBScript task, PerlScript task, or more complex TSQL task/stored procedure (procs are faster and more secure, generally, anyway). You can also break packages into more general, reuseable sub-packages, to be called by a master package.
Everyone should know how to script or program, but that will likely require some pretty diverse environments.
Other JDBC drivers
I've been using Maven for about six months or so and like it quite a bit, although I wish it didn't use Jelly for scripting and that the learning curve wasn't quite so steep. And I don't use maven for all the reports, generating websites and such, mostly just to:
- easily track and maintain dependent library versions,
- provide a carrot for standardizing directories and such -- creating a new module (subproject) takes about two minutes since most of the functionality (compile, JAR, deploy) is done elsewhere
Back on track: the task I use to create the database and drop the tables is actually just an Ant 'exec' task that calls osql and passes the appropriate parameters.
Do not automate too much
In a brilliant piece by Tim O'Reilly, the paranoia regarding Google's GMail is discussed shortly and dismissed, point by point.
Tim then goes on and focuses on the real issues regarding Googles take on information and information management. O'Reilly...
Personal Issue Agreed
I appreciate that DTS with SQL Server 2000 isn't the finished article, but I can only say see what SQL 2005 has to offer. I'm sorry to say this article is just another MS v Java "handbags at dawn fight". Grow up!
"you'll have to work with a diagram that ends up looking like a rat's nest of complexity" I thought you were a Java coder and isn't Java OO based? Im not saying for one second that DTS is OO, but all I can say is "Break it down" like I do to all my staff when they don't write object based code. Nest DTS packages within a main dts packages, create stored procedures to do specific tasks and so on. Have packages (object) dealing with specific tasks - encapsulate and re-use!
As for things going wrong e.g. "problem in SQL_DTS_TASK32". Again are you a coder, do you use error handling in Java e.g. Catch?? Have a workflow to traps failures and report that error in a error table or event viewer etc stating what went wrong and where. In my packages I know what step it failed at, the err num, err desc and so on. Easy to track down!
As for VB written critical apps, its one of the most popular RAD enetrprise tools out there isn't it?
I think Java is great and if the coder has a good solid grounding in OO techniques then the world is your oyster (see MS people don't slag off the "other sides stuff"!), but it is a world of "horses for courses". Some companies will adopt the Microsoft route and therefore such tools as VB6, .Net, SQL Server etc etc wil be commonly used.
If you are going to put a technology down, then know it inside out as the arguments that you give against SQL Server DTS are weak at best. I would never dream of writing negative things about Python as I don't know it well enough.
"you'll have to work with a diagram that ends up looking like a rat's nest of complexity" I
thought you were a Java coder and isn't Java OO based? Im not saying for one second that DTS is OO,
but all I can say is "Break it down" like I do to all my staff when they don't write object based
code. Nest DTS packages within a main dts packages, create stored procedures to do specific tasks
and so on. Have packages (object) dealing with specific tasks - encapsulate and re-use!
Sounds good, but reuse through a series of modular DTS diagrams sounds like a recipe for even further disaster. Now you've taken code that should be easy to navigate, easy to version, easy to export to some external text format, and you've essentially created a fancy nested Visio diagram that "captures" your database import/migration logic. You've put probably the most critical piece of code in you entire system into a GUI tool.
Unfortunate A: "Hey, something went wrong in last night's data import, I've got an error message 'Error in Preparser Master List'"
Unfortunate B: "Sure, see the red squiggly line up there in the right hand corner? Right click on that sucker to open up a sub-diagram that should contain a bunch of lines that all connect to a big brown dot. See that? OK, right click on that, you should see a new diagram with a script on it. Doulbe click that script, that's were we had an error?"
My point is that DTS diagrams as a foundation for your business are a less than transparent technology. The tool doesn't provide a sensible mechanism for exporting packages to a text format that can be easily versioned, and once you start using it you tend to be locked into SQL Server and the DTS tool. If as an organization you are putting most of your logic in Stored Procedures than that's a good thing, but relying on DTS as the glue between a series of stored procedures still puts you in the same situation.
Now SQL Server 2005, who knows? Could be a different tool entirely?
That what I was essentially trying to state. By breaking things down into sub DTS packages AND stored procedures etc, you end up with smaller more maneagble chunks. This way you can open sub DTS packages and not be daunted, and from there see stored procedures being called etc.
As for the errors occuring, as I said in my last post, you can trap errors and throw to an error table stating what went wrong and where.
As for "Unfortunate A" it is only like support seeing one of your errors in an event log etc and seeing something like:
"Error Number -123456
Error Description Invalid Procedure Call
How is that different to my error table of:
"Error Number -123456
Error Description Cannot insert null into ID field
DTS package Customer
To anyone who doesn't know a UML class diagram or in my case how to navigate round a DTS package, then anything can look terrifying and strange.
As for "Unfortunate b", it is only like me saying we have an error in class xyz and method abc whilst point at a UML class diagram.
If one is unfamilair with a given environment, then DTS encapsulation and / or OO encapsulation can look daunting.
I totally agree that it is crazy to place all you intelligence in process flows i.e. DTS, but most of my DTS routines have 80-90% of their intelligence in stored procedures. I use Active X scripts within DTS to control processes and set variables etc and use workflows purley as guidance.
But what I cannot understand is what is so bad about being tied into a MS technology? It is only like being tied into Java, Perl and python etc.
A company will make a business decision on which technology to adopt and if it's the MS route, then so be it. It doesn't mean that everything they do will be done with DTS. The business foundation could be (and is most likely to be) built on a .net framework solution(s) and utilising other tools such as DTS, Biz Talk and XML as ways to get certain tasks done e.g. For ETL we'll use DTS and schedule it in SQL Server.
Like all technologies, they go through lifecycles. People who built there systems round COBOL (and tied themselves into it) now have a MS or Java based system which they are tied into.
DTS do have versions, try select * from msdb.sysdtspackages WHERE [Name] = 'dtspackagename'. You will see versions of that package.
You can export the contents based on that statement to a text file, place it SourceSafe (or Harvest or whatever you use) and you have versions. If you want just the latest version, then you could use:
SELECT T1.* FROM MSDB.dbo.SysDTSPackages AS T1
(SELECT [Name], [id], MAX([CreateDate]) AS [CreateDate]
GROUP BY [Name], [id]) AS T2
ON T1.[id] = T2.[id] AND T1.CreateDate = T2.CreateDate
WHERE T1.[Name] = ?
Agreed that this isn't as easy as sticking a dll or class into SourceSafe, but as I said in my first post DTS is still growing up into the entrprise tool that it will be one day. Like everything in its infancy, they will never be a finished article first time round.
I was just thinking on your “Unfortunate B” example. The same could happen in any enterprise n-tier system where errors are not propagated in a system with complex inheritance. We could go through the same scenario in Java based server side N-tier system: "See that human object, that is inherited by the male object which in turn is inherited by the PoilceOfficer object, well in that object in a Method called UseGun is probably where the problem is"
I’m not saying that DTS should be used for every single task known to man, but as an ETL tool, it is a commonly used and in my opinion a good tool.
Like anything, the work is only as good as the coder who wrote it. I've seen some nightmare spaghetti code in VB6, VB.net and C# systems that I've had the pleasure of code reviewing and providing advice to clients on re-writes and re-works. A poorly designed (and complex) DTS package will present and put off people like the example you highlighted.
DTS is just an ETL tool and nothing more. The fact that is comes with a GUI rather than a file from a text editor, should it matter? To some people, seeing a file full of script may appear like a rats nest.
Plus to go back to your original article, it seem like you are using DTS to deploy an application? It sounds like the tool is being used incorrectly; no wonder you don't like it!
Anyway, I'm not here to convert a man who prefers script files, but what happens when your scripts go wrong and the dude who wrote them has moved on to another contract? They in my opinion are far worse to debug and isolate faults and this is coming from someone who has work on UNIX systems.
I don't want top start a slanging match, but I don't like to see a tool talked down when it is proven. Especially when the example you painted is not what it is intended for. DTS is not a deployment tool, it is a Data Transformation Services Tool.