MySQL conference wraps up with a vision

by Andy Oram

Related link: http://www.mysqluc.com



On the last day of the 2005 MySQL conference,
I finally heard a speaker who stretched the audience's assumptions and
pointed toward a liberating path forward. This is the sign of a good
conference, incidentally--most of the sessions deal intensively with
the problems of today, but one or two keynotes prepare the listeners
for tomorrow.



I wrote in my earlier weblog about this conference
that MySQL was becoming conventional. Many people are doing innovative
things with it--I sat in today, for instance, on a session about MySQL
as an embedded server or library--but the largest attendance has been
reserved for traditional topics such as replication and performance
tuning. MySQL AB itself is concerned with catching up to its
competitors in terms of SQL features that centralize more and more
control in the database engine.



Adam Bosworth, in his keynote today, threw all that out and set his
ship headed in a different direction. The problem he found with
centralizing processing--with stored procedures and triggers and so
forth--is that it doesn't scale. His talk also implied that it
restricts users from making innovative connections. Google, the most
recent landing place in Bosworth's long and impressive career,
illustrates an entirely different way to handle data.

Adam Bosworth's view of an open data query protocol



The promise of the Web was to aggregate the contributions of
individuals everywhere and make retrieval easy along any lines one
chose to use. As the volume of content became unmanageable, XQuery was
supposed to provide a Web-aware search mechanism, and Web Services the
infrastructure and protocols to connect sites. XQuery and Web Services
were too big and came too late, however. Nobody actually wants to use
them, even if they know how.



So the gap has been filled with RSS, the model highlighted by Bosworth
for the next stage in search. RSS and Atom are lightweight and easy to
understand. They put control in the hands of the content providers and
the potential viewers.



Bosworth's extended vision is for a protocol that provides raw access
to data, somewhat as XQuery is supposed to do. It would be a very
simple and database-independent protocol that would make all data in
the world open. Then, he says, everybody could do what Google
does. And more--we could provide distributed updates too.

Where to impose structure



The Google approach to data, carried through in Bosworth's vision,
runs head-on up against the ideals of the relational database model.
The entire relational approach, from the canon of Third Normal Form
(three is a holy number) to the enormously complex collection of
analytic functions, subqueries, and other ways to impose structure in
SQL, is an attempt to be as precise as possible about the data chosen
and returned.



Bosworth isn't interested in that. If the user gets a few hundred
results and has to scroll through them a little bit, that's fine. We
don't need no stinkin' metadata or knowledge management.

The philosophical debate underlying relational database design



Bosworth evoked earlier debates that I've found valuable and aired
several concerns of mine; his views of the XML specs and RSS/Atom are
familiar. But his brief critique of the trend toward putting more and
more features into the database engine--a critique that he whisked
through on the way to grander visions--left open a question about the
basic philosophy of SQL.



When MySQL was bare-bones and lightweight (which it still is compared
to commercial database management systems or PostgreSQL), it put
responsibility in the hands of the application programmer. If a value
was supposed to be limited to a particular range or two columns were
supposed to be entered in tandem, it was the application programmer
who made sure of it.



In contrast, traditional database design takes as much control away
from the application as possible and puts it in the database. A
constraint or trigger or stored procedure or foreign key can make sure
that no one gives someone an absurdly high salary or fires an employee
while leaving his phone number in the database.
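To make the database-centered approach concrete, here is a minimal sketch of the salary and phone-number rules. It uses Python's built-in SQLite module rather than MySQL so that it runs anywhere; the table names, column names, and the salary ceiling are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: the rules live in the database, not the application.
conn.executescript("""
CREATE TABLE employee (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    salary INTEGER NOT NULL CHECK (salary BETWEEN 0 AND 500000)
);
CREATE TABLE phone (
    employee_id INTEGER NOT NULL REFERENCES employee(id) ON DELETE CASCADE,
    number      TEXT NOT NULL
);
""")

conn.execute("INSERT INTO employee (id, name, salary) VALUES (1, 'Pat', 60000)")
conn.execute("INSERT INTO phone (employee_id, number) VALUES (1, '555-0100')")


def try_absurd_raise(connection):
    """Attempt a salary far above the ceiling; report whether it was accepted."""
    try:
        connection.execute("UPDATE employee SET salary = 99999999 WHERE id = 1")
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected the update
    return True


raise_accepted = try_absurd_raise(conn)
salary_now = conn.execute("SELECT salary FROM employee WHERE id = 1").fetchone()[0]

# Firing the employee: the foreign key removes the phone number as well.
conn.execute("DELETE FROM employee WHERE id = 1")
phones_left = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
```

No application code decides whether the raise is absurd or remembers to delete the phone number; any program that connects to this database gets the same rules.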



This centralized control is a relic of the 1970s, when corporate staff
would sit at command-line processors and type in SQL to do what they
wanted. Nowadays, when an application and even a Web interface stand
between the user and the database engine, the never-trust-the-user
philosophy is less valid. At the very least, an application has to
know the rules the database is enforcing and translate error messages
into something the user can understand. The wall between application
and database engine is porous, so the application can take on more of
the validation and logic.
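A rough sketch of that porous wall, again using Python's built-in SQLite module as a stand-in for MySQL: the application checks what it cheaply can on its own, and translates whatever the database still rejects. The schema, function names, and messages are hypothetical.

```python
import sqlite3


def friendly_message(exc):
    """Translate a raw constraint error into something a user can act on."""
    if "FOREIGN KEY constraint failed" in str(exc):
        return "No such employee."
    return "The change could not be saved."


def add_phone(connection, employee_id, number):
    # Application-side rule: checked before the database is touched.
    if not number.strip():
        return "Please enter a phone number."
    try:
        connection.execute(
            "INSERT INTO phone (employee_id, number) VALUES (?, ?)",
            (employee_id, number),
        )
    except sqlite3.IntegrityError as exc:
        # Database-side rule: the application must know it exists
        # in order to explain the failure.
        return friendly_message(exc)
    return "Saved."


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phone (
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    number      TEXT NOT NULL
);
INSERT INTO employee (id, name) VALUES (1, 'Pat');
""")

results = [
    add_phone(conn, 1, "555-0100"),   # passes both layers
    add_phone(conn, 1, "   "),        # stopped by the application
    add_phone(conn, 42, "555-0199"),  # stopped by the database, then translated
]
```

Either way the application ends up knowing the rules; the open question is only which layer gets the final say.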



But both philosophies are valid, and now MySQL offers a choice. I
suggested to Arjen Lentz, the organizer of this year's conference,
that he offer a debate next year between the application-aware
philosophy and the database-aware philosophy--when is each
appropriate?



Most of us still need to find that phone number for an employee and do
other everyday tasks; we'll be using a relational database for that,
and MySQL will be providing that service for more and more sites. The
people with day jobs who came this year to find out whether MySQL
could bring home the bacon got their answers. But MySQL can also
support fun applications, and I hope to see more coolness next year.


3 Comments

InsightKnowledge
2005-04-21 14:45:44
MySQL, data, information and knowledge
Dear Mr. Oram,
I read your article with interest and would like to share some thoughts in response.


I agree with Mr. Bosworth that putting more and more features into a database engine is the wrong approach. In order to describe why, I have to take a step back and look at the relation between data, information and knowledge.


Data is managed by DBMS, information by knowledge bases, FAQ systems and the like; as for knowledge, well, there are a few systems out there that actually correlate information to allow for new conclusions.


To build information or even knowledge mgmt. functionality into a database system is to build a system to do something it was not designed for.


Wouldn't it be better to define flexible interfaces and functions (services?) to be able to be used by info/knowl. mgmt. systems ?


For my own experimentation, I am looking at tikiwiki as a system to evaluate knowledge mgmt. with. It is built using MySQL (or other dbs) as an underlying db engine and does a great job.


Knowledge is distributed and unstructured and changing all the time. To discover knowledge, a system would need to be able to be distributed, manage unstructured information and adapt to changes.


Could this be a feature for DBMS ?


Thank you,
Stefan Lafloer
CKO InsightKnowledge Inc.
http://www.insightknowledge.com

ajeru
2005-05-01 07:21:05
google voodoo
I don't quite see how it is useful to apply the requirements of an internet search engine to all types of data processing. Adam Bosworth's only concern seems to be how huge amounts of poorly structured data can be narrowed down to a few hundred links. That's what we get from Google today (thanks by the way) and what we got from Altavista many years ago in roughly the same way.


The technology is actually so primitive that I can't even find a book written _by_ a particular person without finding all the rubbish people have written _about_ this person. I would be very interested to hear how RSS is going to solve this problem. And I would be very interested to hear how people are supposed to analyse their sales figures using text queries on top of Atom.

terris
2005-05-04 18:23:30
google voodoo
"sales figures" ...


RDBMS isn't going anywhere. Sales figures will continue to be computed by a SQL query processor and archived on other systems such as data warehouses.


Has XML lived up to its promise? What were those promises? How about "data everywhere?"


1. Is the data available by a consistent standards-based interface?
2. Is the data consumable by a consistent standards-based interface?


XML has done a good job satisfying the latter but not the former. HTTP is closest to satisfying that goal but it is too open ended. SOAP has failed due to its (very much intended) complexity.


XML has not lived up to its promises. However, if you look at RSS as a publishing mechanism, it has succeeded very well.


How about an RSS feed for updated sales figures? I use RSS to track new CVSNT releases.


Why not highly targeted RSS feeds? And, if they are not highly targeted, why not have RSS feeds that accept a query?


Is this better than using SQL*Plus? Damn right it is. Can I write code to get the data via RSS? I sure can. What would my application do with the information? Who knows. How about making an RSS feed for current stock prices?


Are there other solutions to RSS/ATOM? You bet. Pick your poison (API). The advantage of RSS/ATOM, like ODBC and JDBC, is that the interface is well known and tons of free software, sample code, and tools already support it.