Avoiding CMS Data Lock-in

by Scot Hacker

Related link: http://journalism.berkeley.edu/

When an organization prepares to rebuild a large, static web site, the usual buzz is "Which content management system should we use?" On the webmasters mailing lists to which I subscribe, the question is more or less constantly afloat in one thread or another. The question that's rarely asked is, "Is a CMS right for us?"

After nearly a decade, we're finally getting ready to re-deploy an institutional web site at work. The current site is around 1400 mostly static pages, all with content deeply enmeshed in layout tables, full of font tags and tons of non-validating code. The game plan for the redesign is:

  1. Determine which outdated pages can be discarded

  2. Discard old pages and internal links to those pages

  3. Develop a semi-automated way to make all remaining pages XHTML-compliant

  4. Suck the guts out of each page and into a CMS or centralized templating system

  5. Make sure no critical layouts or formatting have broken in the process

  6. Apply new design template at the CMS or templating system level

  7. Refresh old content, add new content
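Steps 1 and 2 lend themselves to a bit of scripting. What follows is only a rough sketch, in Python rather than anything we actually run, of how one might find the internal links that point at pages slated for removal. The function name, the example paths, and the assumption that pages are identified by site-relative paths are all mine, made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_discarded(page_path, html, discarded):
    """Return the hrefs in `html` that resolve to a path in `discarded`.

    page_path: site-relative path of the page being scanned (hypothetical
               convention, e.g. "/about/index.html")
    discarded: set of site-relative paths that are going away
    """
    collector = LinkCollector()
    collector.feed(html)
    hits = []
    for href in collector.hrefs:
        parsed = urlparse(href)
        if parsed.scheme or parsed.netloc:
            continue  # external or mailto: link, not our problem here
        # Resolve relative links such as "../old/page.html" against the page
        resolved = urlparse(urljoin(page_path, href)).path
        if resolved in discarded:
            hits.append(href)
    return hits
```

Run over every remaining page, something like this would produce the list of links to clean up before the old pages are actually deleted.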

As I looked into and thought about content management systems, I came up with a list of challenges I'd have to overcome:

Data lock-in

With a static site, you can switch HTML authoring tools as often as you like. But once you adopt a CMS, you're committed to a product in a way you weren't before. If you ever decide to switch to a new CMS, getting your content out of the existing one and into a new one will probably be a major chore. Even though we publish a lot of student sites with Movable Type, the idea of moving our entire site to a CMS, i.e. walking into the data lock-in scenario, gave me a slight case of the willies. Fear of commitment, you might call it.

Maintaining existing URLs

With a large and much-linked-to site, breaking incoming links is not an option. Either the CMS we choose has to give us the ability to specify the output location of each database record precisely (keeping in mind that we have a mix of .html and .php pages, even though both extensions are PHP-parsed in our case), or we'll have to do some very intense work with mod_rewrite. I was encountering a lot of mixed reports from CMS adopters on keeping old URLs intact.
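To give a feel for the mod_rewrite route, here is a minimal sketch in Python that turns a table of old-to-new paths into RewriteRule lines and flags any old URL that would be left dangling. It assumes the rules go in a per-directory .htaccess file at the document root, where the pattern is matched without the leading slash. The function names and example paths are hypothetical.

```python
import re


def rewrite_rules(redirects):
    """Build one mod_rewrite RewriteRule line per old -> new path pair.

    Assumes per-directory (.htaccess) context at the document root, where
    the pattern is matched against the path without its leading slash.
    """
    lines = ["RewriteEngine On"]
    for old, new in sorted(redirects.items()):
        # Escape regex metacharacters (notably ".") in the old path
        pattern = re.escape(old.lstrip("/"))
        lines.append(f"RewriteRule ^{pattern}$ {new} [R=301,L]")
    return lines


def unmapped(old_paths, new_paths, redirects):
    """Return old paths that neither survive unchanged nor have a redirect."""
    return sorted(
        p for p in old_paths if p not in new_paths and p not in redirects
    )
```

An empty result from `unmapped` is the thing to aim for before launch: every URL from the old site either still exists or has a permanent redirect.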

Integration with existing editing tools

The few people who have editorial access to our site are dyed-in-the-wool Dreamweaver users. I'm a card-carrying BBEdit freak. Moving to a CMS would mean editing all content through web forms. Any CMS we chose would need the ability to send content out to a desktop editor of choice and round-trip it back into the database. Some CMSs can do this, but most can't. This requirement greatly limited our choices.

Ability to mix existing static and dynamic content

Good CMSs provide a platform for both serving static content and building custom web applications in PHP or other languages. Over the years, I've built a ton of custom PHP/MySQL solutions -- jobs database, alumni database, course catalog, internal contacts database, student and faculty story submission processor, etc. Any CMS we chose would need to let us seamlessly integrate those existing applications into its framework. That's probably possible, but it could be tricky or difficult with many CMSs.

Ability to easily search/replace through content

It's trivial to search and replace text strings through a static site with command-line tools, BBEdit, etc. But not all CMSs offer search/replace functionality inside of databased content.
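For the static case, the whole job fits in a few lines. This is a minimal sketch in Python, assuming the pages are UTF-8 text (an older site may well use another encoding) and that only .html and .php files matter. The function name and defaults are made up for illustration.

```python
from pathlib import Path


def replace_in_site(root, old, new, suffixes=(".html", ".php")):
    """Replace `old` with `new` in every matching file under `root`.

    Returns the list of files that were changed. Assumes UTF-8 content.
    """
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            # Rewrite the file only when there is something to change
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

Doing the same thing against content stored in database rows depends entirely on what the CMS exposes, which is the point of the complaint above.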

While it would probably be possible to find a CMS that satisfied each of those potential problems, it would still leave us with data lock-in and, in some ways, less flexibility than we have with a static site. Given that my main goal in considering CMSs to begin with was to achieve total separation of form and content, I started to question whether a CMS was really what we needed.

After a few days looking at ways to attach templates to pages with simple PHP, I found what I was looking for. Smarty Templates is an off-shoot of the PHP project. Without having to adopt a CMS, Smarty gives me server-side caching for fast execution, an easy way to attach a single template to thousands of pages, a methodology for separating out the logic of web application programming from design (so designers can make changes without having to risk breaking PHP code), a ton of plugins... and it satisfies every one of my design goals. No data lock-in, people can still use their preferred editing tools... I think we've found the perfect fit. The only big CMS feature I would have liked to have had is user-level management, so I could give permissions to various staffers to edit their own content -- we'll still have to rely on filesystem permissions for that. All sites have unique needs, of course, so your conclusions may differ from mine. We're still in the early stages of this project, but it's going to be an interesting summer.
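The "one template, thousands of pages" idea is easy to show in miniature. To be clear, this is not Smarty and not its API. It's a loose analogy using Python's standard string.Template, with a made-up layout and function name, just to illustrate content living apart from the design that wraps it.

```python
from string import Template

# Hypothetical site-wide layout; in the setup described above this role
# would be played by a Smarty template, not by Python.
SITE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    '<body><div id="content">$body</div></body></html>'
)


def render_page(title, body):
    """Wrap one page's content in the shared layout."""
    return SITE_TEMPLATE.substitute(title=title, body=body)
```

Change the layout in one place and every page picks it up, while the page content itself stays in ordinary files that any editor can open.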

Am I missing the boat on this one? Let me know what you think.


2005-12-19 23:10:21
Institutional repositories
Good article, and a nice solution for the more tech-minded members of any organisation.

Does Dreamweaver have a PHP 'mode' like it does for some of the other CMS products? That would make things even easier.

Another option on the same lines is to use SSI (server side includes) in the same way you use PHP - it lacks the power of PHP but can still be pretty good.

You could also add versioning by convincing your users to use CVS or some other versioning system.

As for the preservation issue, I believe a new generation of systems called 'institutional repositories' is working to solve the preservation issues you mention while offering more sophisticated management facilities and better usability for end-users.

AFAIK these systems offer all the facilities of your '90s CMS, while taking advantage of advances in preservation, interoperability standards, and distributed systems.

Examples include DSpace, FEDORA (not the distro; see www.fedora.info), and HIVE (Harvest Road).