Designing a database-driven PHP App? Don't Forget the Data!!

by Brian K. Jones

If you have a sourceforge account, and are on your way to becoming the best thing to happen to the web since Yahoo or Google, then I beg of you to put a call out for people who understand database design fundamentals.

Designing an interface with PHP is one thing. Designing an "application" is quite another, as it includes designing the architecture of the application, how the various components of the application will interact and communicate, and also how the data used by the application will be managed and stored. This last piece is a decidedly un-sexy part of application design, and is often also (and unfortunately) trivialized by developers.


Nick Lo
2006-04-30 16:43:16
"I don't really want you to learn how to design a database. No, really. I don't."

In that case I'd be interested to know which articles/books you would not recommend we read in order that we don't improve our knowledge of database design? I wouldn't want to accidentally learn anything which may just lead to troubling improvements in my design.

Seriously though, It seems as php developers we are moving well ahead with OOP, design patterns, unit testing, etc and there are plenty of sources of information for these, however, I've not yet been able to find a definitive selection of database design books that focus more on the application side rather than the database admin side, so I'd be interested in any recommendations.



Bart van Kuik
2006-05-01 23:43:27
0_o, I couldn't agree more. I've done a lot of Oracle projects and have gotten exposed to a good database design, however I have to say that it takes discipline to see that your database design contains a flaw and then actually fix it along with the PHP code... In the middle and long run, it of course pays off a hundred times.
jeremiah foster
2006-05-02 08:50:20
Good database design is incredibly important if you want your application to scale. This means you also probably want to compile the database yourself, optimize its variables beforehand, practice good SQL coding standards, load balance, replicate, cluster . . .

LAMP has become really easy. But doing it well is hard.

Javier Sanchez-Galan
2006-05-02 23:58:24
Im not a guru in this kind of matters, but i agree with 0_o and Jeremy Foster.

"LAMP has become really easy. But doing it well is hard.".. nowadays everything is auto!

I think what most people fear of db design is relational algebra and normal forms, they are complex, yeah, but if you use them it will be way easier to design a good db.

Ryan Bates
2006-05-03 10:37:15
Great article! There's only one thing I don't quite agree with:

"Here's the story: If you're a PHP developer, I don't really want you to learn how to design a database. No, really. I don't. I want you to write PHP. There are few people who do both things extremely well, because both take a good bit of time."

Although it may be true that few do both extremely well, every PHP developer should learn good database design. If you compare the two (code design and database design), they both have the same basic principles: use consise yet easy to understand names for variables and methods, remove duplication in your code through refactoring, don't let design suffer through premature optimization, optimize later when you know what needs optimizing - don't guess. These principles, and many more I'm sure, are also shared with good database design.

Learning database design helps the developer see these core principles in a simpler context. I am certain that learning database design has improved my code design immensely.

2006-05-17 14:12:59

Your comments are sad, but true. Data modeling is hard, so folks don't want to do it. I, too, have never seen a properly normalized database (aside from small examples) and I keep getting bitten by this. Unfortunately, as databases grow, it's possible to have different database schemas for the same data and yet still have some disagreement over which is "more correct" (one database merging project I know of broke down when the DBAs couldn't agree over how to define "color" in the database). And changing business rules dictate different data layouts, further compounding the problem.

One example is how a customer address is to be stored. This address is almost always kept in the customer table. This becomes problematic if you want to query all addresses for which you don't have a customer. Businesses are constantly buying address lists because they want to send bulk mailings but don't know which addresses they have real customers at. Also, how do you represent that address? Should the street name be in a different table from the street numbers? Theoretically, they should. Recently in Portland we had a street renamed from "Front Street" to "Naito Street". In a properly designed database, you would only have to change the name of the street in one place. I've yet to see a database which implements this, though.

There are all sorts of little design issues which make designing a database far harder than folks think. The address example is where two performance issues come in. One is the programmer overhead of constantly writing more complicated queries and the other is the performance of the queries themselves. Almost always you'll see "123 Washing St, Apt 3" in an "address" field rather than breaking those down to their component parts. Further, while we should always design our databases correctly, real-world data doesn't always map cleanly to the relational structure in the database. Anything which requires a recursive definition shows how painful this can be.

Bah. I'm rambling (I tend to do that). What I meant to say was "thanks!" I'm very glad to see that some programmers are conscious of this issue. We need more like you.

2006-12-09 17:05:46
intertidal zone food web
2006-12-15 07:53:16
Good points. I'm using Splunk Base myself and shared some of these concerns as I began submitting samples. In the end, my fears were allayed and I ended up submitting many raw log samples with out any annonymization. On the other hands, certain logs were obfuscated while preserving the vital bits. This approach provides maximal benefits to the community while still preserving my sometimes paranoid attitude towards systems under my care.


2006-12-19 18:38:46
Actually, it's almost always true that newly written code, at least in the short term, contains more bugs than legacy code. This is purely because the legacy code has had so much testing and so much time fixing corner cases. Rewriting the code from scratch loses a large amount of the knowledge that is gained from all that maintenance work on the legacy code and it is rarely documented accurately or completely enough to avoid the loss.

2007-02-05 17:37:44

2007-02-13 00:25:05
2007-04-13 00:36:45
great article! bytheway, database design is a fun and interesting practice, but i guess it does depend on the person....

and to OVID -

i agree that the correct way to store an address is a situation where there are more than one correct answers to, but storing the numbers in a table separate from street names would never be done.

Also, why wouldn't you just:
update customers
set address = REPLACE(address, 'Front Street', 'Naito Street')
instead of over-normalizing the address just because you might have to change street names? i dunno, but seems like more work to me... ;-)

2007-05-13 09:11:53

John Geewax
2007-05-23 10:36:08
The article brings out a great point: that people don't design their databases for flexibility. I agree that in most cases they should, but it shouldn't be forgotten that when one spends too much time focusing on the generality/flexibility of a database, he/she usually won't get around to building the application. The hole gets deeper due to the fact that as the database gets more flexible, the queries become more complex, taking more time per feature. This leads to a time mis-calculation by 50% when deciding on the level of generality of a database.

I have become a huge fan of short iterative development with many many prototypes and alterations. And this is not just limited to the software, but includes the database design as well.

With PHP's reasonable OOP support, using the correct layering of code, the changes to the database need to be coupled with changes to the second tier of data access (the first being a database query abstraction layer) but each method/function above this second layer should not need to be altered at all except adding new methods that the new level of generality allows.

That being said, it's important to design a database to an appropriate level of generality via the normal forms, but "appropriate" is dependent on the current stage/iteration of the software, not what the software is envisioned to be one day far in the future.

2007-09-05 08:41:21
2007-09-08 18:08:09
2008-03-21 14:51:29