How are you syncing files across systems?

by Brian K. Jones

So I've been taking an informal poll of the sysadmins I know to find out how people are managing the synchronization of files across a server farm. Looks like there are three popular ways of handling this, which I'll list in no particular order:

First, there's NFS. There are numerous places out there that have a central file server, and then the server farm mounts, say, /opt or /usr/local or something, and then there are lots of configuration files and stuff underneath those trees somewhere. The benefit of this method is that you can make a change in one place and have it take effect everywhere more or less immediately. The downside, as I see it, is something I call the "Christmas light syndrome": if the file server goes down, any services on any hosts relying on the mounted files become unavailable.

There's another upside to the NFS scenario, which is that your server farms can mount config files read-only, which offers some protection should the machine be compromised in some way.

The second popular method is to use rsync. The upside to this method is that all files are local to the machine, so services on your hosts don't depend on the availability of a file server. The downside is that there is generally some glue code and duct tape involved: you end up maintaining your own scripts to drive it, so there's not really a standard procedure per se for handling file synchronization with rsync. In addition, you don't have the benefit of having your config directories mounted read-only, which means one less protective measure.
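To give a feel for the kind of glue code involved, here is a minimal sketch of a push-style wrapper. It only builds the rsync command lines for a list of hosts; the host names, paths, and function names are made up for illustration, and the only rsync options used are the standard `-a`, `-z`, `-e ssh`, `--delete`, and `--dry-run`.

```python
import shlex


def rsync_command(source_dir, host, dest_dir, delete=False, dry_run=False):
    """Build the argv for pushing source_dir to host:dest_dir over ssh."""
    cmd = ["rsync", "-az", "-e", "ssh"]
    if delete:
        # Remove files on the target that no longer exist on the master.
        cmd.append("--delete")
    if dry_run:
        # Report what would change without touching the target.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{host}:{dest_dir.rstrip('/')}/")
    return cmd


def plan_sync(source_dir, hosts, dest_dir, **options):
    """Return one rsync command per host (hypothetical helper)."""
    return [rsync_command(source_dir, h, dest_dir, **options) for h in hosts]


if __name__ == "__main__":
    # Hypothetical hosts and paths; print the commands instead of running them.
    hosts = ["web1.example.com", "web2.example.com"]
    for cmd in plan_sync("/srv/config/etc", hosts, "/usr/local/etc", dry_run=True):
        print(shlex.join(cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` from cron once the dry-run output looks right. Error handling, locking, and logging are left out, and those are exactly the parts that tend to turn into duct tape.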

The third method is cfengine, which is still hanging on my list of things to make friends with. I tried using it during the version 1.x days, when it was quite a bit more difficult to use. I'm aware that the 2.x versions are much, much easier to use, more robust, support RSA keys, and all that jazz, and I promise that once I get through the three projects currently on my plate, cfengine is number 4.

If you're using other means of handling file synchronization, or you just wanna plug your favorite feature of cfengine, or have a cool rsync hack or something, please share!


12 Comments

Kurt
2006-10-31 01:49:19
I had fun with NFS a couple of years ago... Got this idea from AFS where you can mount your last backup read-only. I wanted a live backup volume on a 2nd server to mirror a group server. If the server were to lose its disk, just pull the backup user disk from the backup server, shove it in the server and boot. The backups needed to maintain permissions and ownerships, but knowing my users, they would see the free space on the backup server and start using it for work. The script I came up with did an rsync to the backup server. The backup disk was mounted under a tree only readable by root. Then I used NFS to export the volume read-only back to an area that users could see. This way, when they accidentally nuke some critical file, they ssh to the server and grab the backup.
Simon Hibbs
2006-10-31 04:55:47
>First, there's NFS. ....The downside, as I see it, is something
>I call the "Christmas light syndrome": if the file server goes
>down, any services on any hosts relying on the mounted files
>become unavailable.


If you use automount, you can specify alternate locations so that if one configuration server goes down, you can fail over your /usr/local mount to access a different server.


Of course that means keeping the configuration files on the two servers synchronised, but nothing's perfect.

William
2006-10-31 07:17:13
I've been using Subversion, with a central server and automated checkouts on the client machines. The clients don't have upload access, so it is a pull-only system, and by running periodic diffs I can see if a machine has had its local files modified and roll back to a known good state. There's still a fair bit of scripting overhead, but I like the control and auditing abilities that it gives me.
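As a rough illustration of the periodic check described above (not William's actual scripts, which aren't shown), the sketch below reads the output of `svn status --xml` and lists paths whose working-copy state has drifted. The function name and the sample paths are assumptions; the element and attribute names (`entry`, `wc-status`, `item`, `props`) are the ones Subversion's XML status output uses.

```python
import xml.etree.ElementTree as ET


def changed_paths(status_xml):
    """Map each drifted path in `svn status --xml` output to its state."""
    root = ET.fromstring(status_xml)
    drift = {}
    for entry in root.iter("entry"):
        wc = entry.find("wc-status")
        if wc is None:
            continue
        item = wc.get("item", "normal")
        props = wc.get("props", "none")
        if item != "normal":
            # e.g. "modified", "unversioned", "missing", "deleted"
            drift[entry.get("path")] = item
        elif props not in ("none", "normal"):
            # Contents unchanged, but the properties were edited locally.
            drift[entry.get("path")] = "props-" + props
    return drift


if __name__ == "__main__":
    # Hand-written sample in the shape of `svn status --xml` output.
    sample = """<status><target path="/etc/app">
    <entry path="/etc/app/app.conf">
      <wc-status item="modified" props="none" revision="12"/>
    </entry>
    <entry path="/etc/app/scratch.tmp">
      <wc-status item="unversioned" props="none"/>
    </entry>
    </target></status>"""
    for path, state in changed_paths(sample).items():
        print(state, path)
```

On a real client the input would come from something like `subprocess.run(["svn", "status", "--xml", checkout], capture_output=True, check=True).stdout`, and a non-empty result could trigger an alert or an `svn revert -R` of the checkout. Note that a revert restores versioned files but leaves unversioned ones in place.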
Jeremy Fluhmann
2006-10-31 07:20:34
I'm not a sysadmin (yet), so I've never managed a server farm. I also don't know what files necessarily need to be synchronized, but I was thinking the other day about how to synchronize config files across multiple servers and keep historical information about them.


I'm currently in a programming position and have started utilizing subversion for version control. My thought was, what if config files (or whatever else) were kept in subversion? A person could modify the config file on one server, make sure it works like it's supposed to, commit the changes, and have the other servers automatically request an 'update' from the repository. Keeping the files in subversion would also provide a way to revert back to a previous configuration should something go wrong with the changes.


Feel free to let me know that I don't know what I'm talking about. As I said, I'm not a sysadmin, so I don't know what's out there and I'm not up-to-speed on best/common practices.

Ronnie
2006-11-01 09:08:28
What I need to sync are the websites, to get a backup, and also an email backup, so I wrote a small script and put it in cron, and it has been working OK for some time now.
For my needs rsync does the job perfectly.


Chaim
2006-11-02 12:14:16
Jeremy, your idea is a good one. So good, in fact, that sysadmins have been doing it for years. :-) They call it configuration management and used CVS before Subversion came along.
Joris Vuffray
2006-11-07 03:48:05
Unison is also a powerful alternative to rsync: http://www.cis.upenn.edu/~bcpierce/unison/
Tom
2006-11-07 06:50:06
I've been using unison over SSH to sync my ~/bin (and a data directory) between my home Linux server and my work desktop running Cygwin. Unlike rsync, it does two-way syncs.


I use Google Browser Sync to do my bookmarks.


At work, I also have a laptop and I have My Documents as a folder redirect. It will sync when I attach to the network.


Ideally, I'd have $HOME as an NFS/Samba mount and My Documents would be in there, but IT keeps Unix and Windows separate.


Andrew
2006-11-10 09:48:21
It's pretty much Linux only, but I have used an NBD patch for the Linux kernel.


This allows you to create a network block device which mirrors writes across machines. Throw in a heartbeat application for failover and you can have a cheap (and dirty) hot-cold NFS cluster.


Cfengine is great for what it was designed for (configuration management), but it is a little slow when it comes to synchronising a larger number of files. Perhaps the lack of performance is because I am using md5 for file checks...

TG
2006-12-01 10:39:59
I make heavy use of the copy action, relying on md5 checksums to determine when a copy is appropriate and activating shellcommands classes when the copies take place to take care of restarting services. It's been a *lifesaver* for our company.
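For readers who haven't seen this pattern, here is a minimal Python sketch of the idea: copy a file only when its md5 checksum differs from the master copy, and fire a callback (for example, a service restart) when a copy actually happens. This is not cfengine syntax or TG's configuration; the function names and the callback are assumptions made purely for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=65536):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_if_changed(master, target, on_change=None):
    """Copy master over target when checksums differ; return True if copied."""
    master, target = Path(master), Path(target)
    if target.exists() and md5sum(master) == md5sum(target):
        return False  # already in sync, nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(master, target)  # copy2 also preserves mtime and mode bits
    if on_change is not None:
        # Hypothetical hook: this is where a service restart would go.
        on_change(target)
    return True
```

Hashing both files on every run is also where the cost Andrew mentions comes from: every byte gets read each time, so large trees take a while even when nothing has changed.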
djlosch
2007-01-24 18:17:35
I wrote a huge write-up on this (http://www.djlosch.com/post_retrieve.php?pid=106). However, my scope is more of a cross device sync. I expect this to become a much bigger issue as phones become handhelds (many functions) rather than single taskers. My design uses something more like rsync than roaming profiles (aka central server profiles) because with portable devices, there is no guarantee that the server will be available when $HOME is requested.
jigar
2007-04-03 23:56:48
Well, I am a newbie at all this server stuff. I have a task in which I need to cross-mount data from 3 servers, and I need to provide only one login to access the data on all 3 servers. I want to know how I can do this.