CD Baby rewrite : changes in database structure

by Derek Sivers

When switching CD Baby from MySQL to PostgreSQL, it gave me the perfect opportunity to fix some things in our database structure that needed changing. Among many other changes, here are some interesting ones:

ALBUMS ARE NOW ITEMS:
OLD: one big table called "albums", that has everything we sell
NEW: one common table called "items", with sub-tables depending on the type of item


CD Baby was written to sell only one thing : CDs.
When people ask if we can sell their T-Shirt, we say no.
When they ask if we can sell digital downloads, we say no.
Can we make a bundle of albums sold for a discounted price? No.
Gift certificates? No.

My "albums" table was the rusty axle to this wheel. Everything revolved around it. Every line of code ever written was dependent on this "albums" table and the way it worked.

So - I imagined a future CD Baby that could sell many different types of items, looked for the basic common things that they ALL have (name, price, description), and made it an "items" table.

If item is an ALBUM, it pulls in details from the "albums" table. (An album can be any format: CD, vinyl, download, whatever. It's a collection of songs.)
If item is a BUNDLE, it foreign-key joins a list of other items, and gives it one combined price.
If item is MERCH, it's the "merchant ships it" thing, where we collect the money, the vendor ships it directly to customer, then we pay them after it's proven shipped.
If item is DOWNLOAD, it foreign-key joins to a song: a piece of music. (A download album, then, is a BUNDLE of DOWNLOADs.)


ADDRESSES STAND ALONE
OLD: customer had address, city, state, zip, country. invoice had address, city, state, zip, country
NEW: table called "addresses", linked to by not only customer but lineitems


I always thought it was kinda cool that the big online stores remembered my multiple addresses I've ever used (a one-to-many relationship between customer and addresses). Then I realized that this same "addresses" table could be used to not have an address per-order, but rather an address per-ITEM inside an order!

Yeah I know this sounds obvious, but it's fun figuring out this shit on my own, with no mentor or instruction book telling me this is how the big boys do it. Very satisfying.


SOUNDS-LIKE: A LIST OF FAMOUS ARTISTS
OLD: artists tell us they sound like "bob dylan ani difranco and early zepplin" in a single text field
NEW: album_soundlike joins album_id to list of id#s of famous artists


This is going to be one of the biggest converting challenges.

For 6 years, I've asked 80,000 artists to "tell us three famous artists people say you sound like". This info was entered into a text field, however they gave it to me. Some would even write, "I don't sound like nobody" or "Chili Peppers back before they got lame". When people would search CD Baby, I'd just do a full-text search of this field to see if it returned what they were looking for. Now it's time to try to organize that data, so we really know which famous artists the artists are referring to.

I got a list of famous artists from the brilliant Robert Kaye at MusicBrainz. I stuck these 131,000 artists into a database with only their name and an auto-generated id#. Then a join table to link the album's ID to the multiple IDs of the referenced artists.

UPDATE: passed this part of the project to Robert, directly, since he's the king of music metadata. He's found a great solution using PyLucene that he will open source, too. More on that later, I guess.