Continuent: clustering software for databases

by Andy Oram

Related link: http://www.continuent.org/



Because databases are important repositories and the lynchpin of any
application that uses their data, clustering is a critical technology.
Most database vendors provide their own clustering solutions, but those
might not suit average users who simply want to throw
together some systems and say, "Do what you were doing before, but
just replicate everything and execute an automatic failover if
necessary."



Now there's an open-source solution for this simple clustering
configuration.
Continuent
is a database-independent project that handles clustering and provides
simple management interfaces.



I talked last week to Continuent spokesperson Emmanuel Cecchet. The
project has employed eight engineers since January of this year and is
funded by
Emic Networks,
a long-time provider of clusters for MySQL. The code is released under
the Apache License.



The idea behind Continuent is that you can simply run its basic
software, known as Sequoia, on two or more systems that host databases
and have it handle your clustering. Any query directed to one system
is automatically broadcast to the others.



Sequoia handles transaction scheduling and allows all systems to be
updated asynchronously at the speed of the fastest node. Sequoia also
ensures that the user always gets data from a fresh copy where all
updates have been applied. Failover is accomplished automatically.
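
To make the write-everywhere, read-from-a-fresh-copy idea concrete,
here is a toy model in Python. It is a sketch under loud assumptions,
not Sequoia's actual scheduler: the names Backend and ToyController
are invented for illustration, in-memory SQLite databases stand in
for real nodes, and "the fastest node" is simply the first enabled
backend. Writes go into one ordered log and return once a single node
has applied them; the others catch up later, and reads are served
only by a node that has applied the whole log.

```python
import sqlite3


class Backend:
    """One database node; an in-memory SQLite database stands in for it."""

    def __init__(self, name):
        self.name = name
        self.conn = sqlite3.connect(":memory:")
        self.applied = 0     # how many logged writes this node has applied
        self.enabled = True  # set to False to simulate a failed node


class ToyController:
    """Hypothetical controller: one ordered write log, many backends."""

    def __init__(self, backends):
        self.backends = backends
        self.log = []  # every write, in the order the controller accepted it

    def _catch_up(self, backend):
        # Replay, in order, the writes this backend has not applied yet.
        while backend.applied < len(self.log):
            sql, params = self.log[backend.applied]
            backend.conn.execute(sql, params)
            backend.applied += 1

    def write(self, sql, params=()):
        # The first enabled backend plays the part of the fastest node.
        target = next((b for b in self.backends if b.enabled), None)
        if target is None:
            raise RuntimeError("no enabled backend")
        # Log the write, then return as soon as that one node has it.
        self.log.append((sql, params))
        self._catch_up(target)

    def drain(self):
        # Let the slower nodes apply whatever they still have pending.
        for backend in self.backends:
            if backend.enabled:
                self._catch_up(backend)

    def read(self, sql, params=()):
        # Serve the read only from a node that has applied every write.
        for backend in self.backends:
            if backend.enabled and backend.applied == len(self.log):
                rows = backend.conn.execute(sql, params).fetchall()
                return backend.name, rows
        raise RuntimeError("no fresh backend available")
```

With two backends, a write leaves the second one behind until
drain() runs, so read() answers from the first. Once the second has
caught up and the first is marked disabled, the same read() is
answered by the second, which is the failover case in miniature.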



The group communication behind Sequoia replication is based on a
component called Hedera that allows developers to plug in various
implementations. Hedera currently comes with the popular JGroups group
communication library.
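
The plug-in idea is easy to picture as an interface with swappable
implementations. The Python sketch below is only an illustration of
that pattern: GroupChannel, InProcessChannel, and replicate are
hypothetical names of my own, not Hedera's or JGroups' API, and the
single in-process implementation delivers messages by plain function
calls rather than over a network.

```python
from abc import ABC, abstractmethod


class GroupChannel(ABC):
    """Hypothetical interface the replication layer would code against."""

    @abstractmethod
    def join(self, member, on_message):
        """Register a member and the callback that receives broadcasts."""

    @abstractmethod
    def broadcast(self, sender, message):
        """Deliver a message to every member, in the same order for all."""


class InProcessChannel(GroupChannel):
    """Trivial implementation: all members live in the same process."""

    def __init__(self):
        self._members = {}

    def join(self, member, on_message):
        self._members[member] = on_message

    def broadcast(self, sender, message):
        for callback in self._members.values():
            callback(sender, message)


def replicate(channel, sender, statement):
    # The caller sees only GroupChannel, so implementations can be swapped
    # without touching the replication code.
    channel.broadcast(sender, statement)
```

Because replicate() depends only on the abstract class, a different
transport could be dropped in by writing another GroupChannel
subclass.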



The broadcasting is more coarse-grained than the scheduling that
databases do on their own, but it ensures that the software is
database-independent and requires no special hooks into the
databases. It has proven efficient enough for moderately heavy
database use, particularly in the read-heavy applications (about 80%
reads) that are the norm. But it also scales well with heavier write
workloads.



Continuent grew out of a project called c-jdbc, which was hosted at
the ObjectWeb Consortium and proved quite popular, with 50,000
downloads. As the name suggests, the project is written in Java and
started with a Java interface. It is now expanding to offer a C++
interface (called Carob) and to replace its cumbersome ODBC-to-JDBC
bridge with a native ODBC implementation.



Management is through an Eclipse plug-in named Oak. The team hopes to
work with the Eclipse database tools project to do further
integration.



While Sequoia is usually employed with homogeneous database instances,
some sites also use it to ease migration to a new version of a
database. New versions can be dynamically and transparently added to
the cluster while the administrators work out kinks.



A few intrepid sites have also mixed databases from different vendors.
For instance, if they consider it necessary to do sensitive and
mission-critical work on Oracle, they may create a cluster with the
critical data on Oracle and less critical data (such as static
content) on MySQL. Different tables can be stored on different cluster
nodes, and Sequoia directs queries to the appropriate node.
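
The table-based routing described above can be sketched in a few
lines of Python. Everything here is an assumption made for
illustration: the placement dictionary, the node names, and the
helper functions tables_in and route are mine, and the regular
expression is a crude stand-in for real SQL parsing. The point is
only that a query can go to any node hosting every table it touches.

```python
import re

# Crude table extraction: the identifier that follows FROM, JOIN, INTO,
# or UPDATE. A real router would need a proper SQL parser.
TABLE_RE = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def tables_in(sql):
    """Return the lower-cased set of table names a statement mentions."""
    return {name.lower() for name in TABLE_RE.findall(sql)}


def route(sql, placement):
    """Return the nodes that host every table the statement needs.

    placement maps a node name to the set of tables stored on it.
    A statement that names no tables can run on any node.
    """
    needed = tables_in(sql)
    return [node for node, hosted in placement.items() if needed <= hosted]
```

For example, with placement = {"oracle": {"orders", "accounts"},
"mysql": {"pages"}}, a join of orders and accounts is routed to the
"oracle" node, a lookup in pages goes to "mysql", and a query that
spans orders and pages finds no single node that can answer it.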



Although c-jdbc was originally released under the LGPL, its team found
that the Apache License was better suited to this project. This is mainly because
the main interface and library are Java, and it's unclear how to apply
the LGPL to Java code. Cecchet said the team sensed that many
potential contributors were keeping all their code proprietary because
they could not be sure how to split it between free and proprietary
components.