On the update of primary keys (PK)

by Stéphane Faroult

It's funny how people tend to side naturally with one school of thought or another. Database topics often polarize practitioners, and one of the issues that are hotly debated happens to be the 'natural key vs surrogate key' question, that is whether the primary key should have some significance outside the information system or merely being an internally generated number. I don't see any reason why I would call 357914358 something that I could call 'SPADE', and I use natural keys whenever I can. But a friend had a very interesting reaction to my mentioning that primary keys should not be updated:



We don't update PK's because of the inherent difficulty of propagating changes to related tables (assuming that we are using natural leys). This is one reason people go with surrogates. But what uniquely identifies a row can and will change. For example, what about a Companies table, and a company goes through a name change. Is the PK partly based on the name of the company? ... This idea that we don't update PK's seems rooted more in the difficulty of updating PK's when natural keys are used.



Cough, choke! There seems to me to be a confusion between the identification of a row (what the primary key is about) and the subtly weaker condition of distinguishing one row from another. As it happened, my company changed its name some time ago. Why could we keep our banking accounts without having to close and reopen them? Why could we keep our contracts running? Simply because the registration number attached to the company when it was incorporated didn't change. The name of a company distinguishes it from another. But it's not what truly identifies it. Some might argue that a registration number is a surrogate key of sort; and indeed it might be considered a shorter alias
for a company that was created under a given name at a given place and date by some particular people and as a particular type of company. It's a surrogate key, but it's rooted in the real world. It seems to me quite acceptable to use a surrogate key to identify a company, as long as we
clearly understand that it's merely a short-hand for information such as incorporation details that we have no use for in our model otherwise.


If you were to change what truly identifies something, how would you know that it is the same thing? There is no way to distinguish an update from a delete followed by an insert. "Updating a primary key" implicitly acknowledges that you have some out-model knowledge that the before
update and after update values truly represent the same item. What truly defines the row isn't in your model. Don't blame the theory, blame your model.