Graying MySQL, and MySQL learns a second language (early conference report )

by Andy Oram

Related link: http://www.mysqluc.com/



MySQL is graying in several metaphorical ways. Of course, it is simply
getting older--aren't we all? But it is by no means over the hill.
More significantly, its adherents are getting less colorful and
reflect instead the grayness of the corporate settings it is
conquering. Finally, MySQL is graying the distinctions that separated
it from Oracle and other heavy-duty database engines. MySQL, in short,
is becoming conventional.



The early achievements of this disruptive technology were to bring a
high-performance relational database down from the top shelf where
only those of means could afford it, and put it in the hands of
students, enterpreneurs working out of their homes, and modest web
site developers. This was a revolution dubbed situated software by Clay Shirky. Although MySQL was already being used by sites that
could afford more expensive databases (and the computer systems and
expert administrators who came in tow), these did not drive its
initial popularity.



Now MySQL AB has built a formidable marketing machine and carried
their product into the database mainstream, following a path similar
to Linux. Their trappings are starting to evince familiar themes. They
have salespeople in at least a dozen cities around North
America. Their new support and update mechanism, MySQL Network,
reminds me of a similarly named support system from Red Hat. MySQL's
development of an online FAQ called a Knowledge Base, and the slogan
"MySQL Everywhere" plastered all around this conference, are
reminiscent of another large software vendor.



But MySQL AB has not forgotten the little guys who want a DBMS that
runs lean and fast, with near-zero administration. These users will
probably continue to be its largest base. Significantly, under the
conventional trappings I mentioned, I believe MySQL AB is still
structured in a fundamentally different way from a conventional
propriety vendor, and is still behaving like a network of brilliant
independent software developers. They have always listened closely to
their users--you can see that at their conferences, where dozens of
developers turn up in distinctive shirts and attract flocks of
petitioners for new features--but they now are listening to paying
customers in the same intense, investigative manner.



For instance, I saw one of their leading engineers walk around an
evening reception recruiting representatives from international
customers to sit in on a session about internationalization, just so
he could hear their perspective on some problems he had been told by
other customers.



It takes a certain financial and time commitment to attend a
conference, so for those who pony up the money to do so, the theme at
this one is "bigger and better." Sessions on Java interfaces,
clustering, scaling, high availability, and replication decorate the
calendar for the next few days. One panel is even called "Challenges
in the Enterprise."



And what are the newest features MySQL is pushing hardest? There are
no breakthroughs here (and I wouldn't expect any, because relational
databases are a mature area in a research sense). The announcements
focus on things that competitors have had for years: stored
procedures, triggers, views. MySQL is not leading the conquest of new
territories. Rather, MySQL is catching up. That's something they're
proud of, and rightfully so.



I attended one session last night on a feature that will implement a
tiny snippet of the SQL standard, XPath support. In effect, MySQL,
which has always understood the SQL language, is learning a second
language--not a natural language (although MySQL offers increasing
support for character sets and other internationalization features)
but the complex world of XPath.



I find this feature an odd way to support XML. Most XML users carry
out XML/database interaction by using Java or some other programming
tool to break down the XML into constituent pieces of test and store
those pieces in a database structure that mirrors the XML. But SQL's
XPath support buries the XML without alteration into a field in the
database.



The idea of XPath support in the database is that you start by storing
a string such as

<p>Why do <em>you</em> want to represent <em>structured text</em>?</p>

bodily in a text column. This text column can be any standard text
datatype in SQL (although MySQL will add a special XML tag eventually,
to support validation and some optimizations).



In itself, this doesn't help deal with XML. But MySQL will also
provide a couple functions such as ExtractValue and UpdateXML that
manipulate the XML with XPath queries. You could tell it to extract or
change, for example, the second <em> entity in the string just
shown. Full text searches can reduce the time it takes to search large
collections of XML by two orders of magnitude, in comparison to
database queries without indexes.



The design of the XPath support is oddly disconnected from the
traditional structures of a relational database. As already shown, the
storage model jams all the XML into a single column, so that the XML
structure is handled independently from the schema of the table.
Furthermore, an XPath query that returns multiple strings from
different parts of the XML document concatenates them together,
space-separated, in a single row. I would have expected them to be
granted individual rows in the results.



There are many uses for XPath support in a database. One could extract
and display all the titles of different documents. One could run a
traditional SELECT to retrieve data from other columns or tables and
join it to XML content. One could find everything within
<price> tags and let the database perform some
calculations such as averaging. The more XML processing you can do in
the database, the less data has to be sent over the wire to the
client.



This new MySQL feature--not planned until 5.1 or even later--is
probably less useful with data-crunching XML (which has many small
pieces of text within multiple tags) than with documents, which are
flatter and have a high ratio of content to tagging. However, one
participant in last night's BOF suggested the feature could be applied
to storing SOAP queries too.



MySQL's turn to the mainstream is being reciprocated by its intended
audience. Attendance at yesterday's tutorials was impressive; a couple
tutorials sold out, and the halls were filled with people at break
time. Today's sessions and exhibitors will draw even more.


9 Comments

merlyn
2005-04-19 07:25:17
MySQL is VHS to Postgresql's Beta though
I'm really sad to see so many people still pushing MySQL, which was definitely the right choice in 2000, and skipping over PostgreSQL, which has become the superior choice in the last few years. {sigh}


Once again, because of marketing, the marketplace loses.

adamsj
2005-04-19 08:05:48
MySQL is to DW what PostgreSQL is to OLTP
I agree with you fifty percent, merlyn. PostgreSQL is the clear choice over MySQL for any sort of OLTP. Only now is MySQL adding features which are needed for OLTP.


However, those same features are of little or no value to MySQL as a DW platform--they're cruft. Why in the world would a properly designed data warehouse require triggers?


I understand why every RDBMS is now adding such featurs--after all, Oracle has them, so everyone else needs them, too. I wouldn't use Oracle for a DW on a bet, though. It's not capable of scaling to a sufficient size for a large enterprise, and it's not a good vslue for a smaller business. I do understand why MySQL, partnering with SAP, might feel the need to match Oracle feature for feature.


That's a shame. MySQL is built nicely for high performance in the DW space. Adding views was absolutely necessary, and even updatable views have their place. (Last year, I had to use updatable views to get around an incredibly stupid, incredibly popular etl product's inability to do what we needed.) But most of what MySQL has picked up in database features is unuseful for DW.

chromatic
2005-04-19 08:56:23
How Vague the "Superior"!

Once, I saw someone writing a small, single-user application to manage recipes. He used MySQL. "My word, man!" I said. "You need updatable views, triggers, transaction logs, and hot-swappable disks! What if by some quantum miracle you split into multiple users and try to update the database simultaneously? What if you sell this application to a bank sometime? Don't you care about the data integrity of customer financial transactions?"


Sadly, he didn't listen.

danielhanks
2005-04-19 11:52:20
MySQL is to DW what PostgreSQL is to OLTP
"I wouldn't use Oracle for a DW on a bet, though. It's not capable of scaling to a sufficient size for a large enterprise".


What???


Can you elaborate on that statement? What do you mean by scale in that context?

adamsj
2005-04-19 12:07:35
Probably I'm showing my bias, or my limited experience
I've had the good (or bad) fortune to never work on a data warehouse of under a terabyte. If you want something really big like, oh, Southwestern Bell has, you need DB2 or Teradata (SWB's choice).


I don't work extensively with Oracle, but in my limited experience, it runs out of horsepower somewhere near a terabyte. Maybe the newer parallelism features will fix this.


One session I really wanted to attend this year was Multi-Terabyte Data Warehouse and MySQL. Having seen the challenges involved in getting systems of that size to perform efficiently, and thus being a skeptic when someone claims they can do so, I'd really like to hear what he's got to say.

danielhanks
2005-04-19 13:45:18
Probably I'm showing my bias, or my limited experience
I see. Where I work we have an 11T data warehouse running on Oracle. I don't work directly with it, but it does provide daily reports, so it must be working ok enough.


Just my $0.02.

adamsj
2005-04-19 14:20:28
Is this a single installation, or do you use datalink to go from machine to machine?
I can't argue with a working installation, and my Oracle skills are rusty at best.


But I'm curious--can a user open up SQLPlus and (assuming appropriate privileges) join two arbitrary tables without intervention by an administrator? If not, by my lights it's not a data warehouse.

rpbourret
2005-04-20 23:25:04
XML support makes sense
Storing XML in a single field whose data type is XML makes sense, just as breaking it into relational pieces makes sense. It really depends on the schema of the XML.


Lots of data-centric XML maps easily to relational tables. On the other hand, document-centric XML does not map well to relational tables, nor does some data-centric XML, such as when documents can contain arbitrary XML.


The real question is how the XML data type and XPath processor will be implemented. One assumes that the XML will be heavily indexed and XPath queries executed against those indexes -- parsing the XML documents at run time would be a performance disaster.


For what it's worth, the other major relational databases (DB2, Oracle, Sybase, SQL Server) have all implemented / are implementing similar functionality, although they are supporting XQuery instead of just XPath.


And even thought it would be nice for MySQL to natively transfer data between XML documents and relational tables, there are plenty of middleware products to do this.

Tabooyah
2005-12-16 11:00:21
Is this a single installation, or do you use datalink to go from machine to machine?
I worked as a business intelligence analyst for a number of years where one of my primary duties was to perform not just 2 joins, but up to a dozen joins in order to collect data and format it into meaningful reports. Oracle has a number of great features that allow you to perform adhoc queries against million and billion-row tables very quickly. One such feature is materialized views which allows you to precompute dimensional summaries much like a MOLAP tool and store them in the database which can be access and just like a normal view. Another nifty Oracle feature is table partitioning which allows localized indexes and logic data grouping which cut query times down significantly. Analytic workspaces are another great Oracle DW feature which require more heavy lifting at the application layer.


Anyone claming that Oracle isn't a well-suited DW application obviously has little experience with the product or isn't using it correctly.