Too Many Fields in the Primary Key

Here’s another Database Design Mistake to Avoid

Once I saw a table that clearly had too many fields in the primary key.

Too Many Fields in the Primary Key

For every field that you add to the PK, the maximum possible number of rows in the table increases.

Maximum possible rows in the table =
(Unique Count of Field1) x (Unique Count of Field2) x (Unique Count of Field3)

In the case I saw, the key was intended to be a code, and the time, in seconds. For any one code, there should have been a maximum of 86,400 possible rows (seconds) per day.  In actuality, there would have been only a few dozen rows at the very max each day.

By including a third field in the PK, the maximum possible rows per day now became 86,400 squared.

Which is, approximately 7.4 billion rows, per code, per day, would be allowed!

The problem result:   Duplicate data.

Multiple rows with the same code and timestamp in the first two fields were inserted, that caused problems.

How did the rows get into the table?  Well, the primary key allowed it.

We had to manually delete the duplicate rows.  Which, defeated the purpose of automation.

Correction of Too Many Fields in the Primary Key

The correction:

Remove the second date field from the Primary Key, and make it, Not Null, if it is required to be populated.


2 Responses to Too Many Fields in the Primary Key

  1. Gary says:

    It’s not the number of fields, but the number of values. I’m working on a system where, because someone read that the ideal primary key is one column, they have combined multiple “fields” into a single column, like ‘AAAAAA-AAAAAAA-nn-nnnnnnn’. Oh, and they changed the constituents of the PK on one table so sometimes the old data has a primary key with three components and the new data has four components.

    This four component PK is AFTER recently updating every primary key in the database to remove two redundant components from the key.


    • rodgersnotes says:

      Dude I feel for you.

      Sounds like this someone is not responsible for fast responding queries. Try retrieving on data that will be now be found down the middle of that key. Without adding a whole bunch of function based indexes.

      Lots of code, CPU cycles, and people time necessary in either case. My belief is that a good design should eliminate code, not cause more of it.

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: