Denormalizing

by Stéphane Faroult

I have been recently involved in trying to improve the performance of an application that primarily handles messages (more and more of them). It is a kind of state machine. Three different types of entities are managed (one of the types happens to be sets of another type), and messages are received that tell what is the most recent status of each entity. So far so good, except that everything revolves around the current status of each entity, and that the main bottleneck seems to be queries that hit repeatedly a table MESSAGE_HISTORY(msg_id, entity1_id, entity2_id, entity3_id, status, changed_by, timestamp), finding out which is the most recent status for a given entity1_id, entity2_id or entity3_id (as you may have guessed, only one of those contains a value in each row), associated with the greatest msg_id for the enity in question.
The solution that looked best was what was presented to me as "denormalizing", and associating its current status to each and every entity.