MUCK – Massively Unified Code-Key, Generic Three Table Data Model


Another Database Design Mistake To Avoid.   Actually, one of the worst DB design mistakes you can make.

One of the worst things you can do to a database system is to make a generic data model of three tables for the whole system. A variation is to use three tables to combine multiple lookup tables into one.

Instead of standard tables and columns, three tables are used for everything.

One table each for:
entity/table
attribute/column/field
values of the data


Instead of a number of columns in a row, you now have a single column and multiple rows.  One writer, Don Peterson, referred to this method as MUCK, for Massively Unified Code-Key.
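As a sketch, the generic three-table layout looks something like this. All the names here (entities, attributes, attribute_values) are hypothetical, and SQLite via Python's sqlite3 module is used purely for illustration:

```python
import sqlite3

# A minimal, hypothetical MUCK/EAV schema: one table each for
# entities (tables), attributes (columns), and the values themselves.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities (
    entity_id   INTEGER PRIMARY KEY,
    entity_name TEXT NOT NULL            -- e.g. 'customer'
);
CREATE TABLE attributes (
    attribute_id   INTEGER PRIMARY KEY,
    entity_id      INTEGER REFERENCES entities(entity_id),
    attribute_name TEXT NOT NULL         -- e.g. 'first_name'
);
CREATE TABLE attribute_values (
    entity_id    INTEGER,
    row_id       INTEGER,                -- which logical row this value belongs to
    attribute_id INTEGER REFERENCES attributes(attribute_id),
    value        TEXT                    -- everything, regardless of type, as text
);
""")
```

Every fact in the whole system ends up as one row in attribute_values, which is where the trouble described below begins.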

This has proved to be such an incredibly bad idea that it has already been discussed online in a few places.  Unanimously, the articles state why you should never do this.  A few well-written articles can be found here:

http://www.simple-talk.com/sql/database-administration/five-simple-database-design-errors-you-should-avoid/

http://www.simple-talk.com/sql/database-administration/ten-common-database-design-mistakes/

http://www.projectdmx.com/dbdesign/lookup.aspx

Tom Kyte also wrote about this in “Effective Oracle by Design”, on page 35:  “But how does this model perform?  Miserably, terribly, and horribly.”

All these articles reach the same conclusion, and give even more (different) reasons why you shouldn’t use this idea.  Do check them out.

The results of MUCK:

Awful Performance:

This kind of system is actually DESIGNED to go slow.  For even the simplest of queries, you now have to join at least three tables.

Pivots:

Another reason for the bad performance is that each value for each attribute is stored in a separate row.  So, if you have five attributes, the data will be stored in five different rows.  To get them all back on a single row, you have to pivot the data.  This is, of course, more work than if all the data were on a single row.

Multiple Inline Views:

Since each piece of data is on a different row, if you have more than one field to retrieve, you have to create an inline view for each field!  So, to retrieve five fields with their appropriate values on the same row, you actually have to create five inline views!  The queries get really big and complex, fast.
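A sketch of what this looks like in practice, assuming a hypothetical single values table keyed by (row_id, attr). One logical row of five fields is scattered across five physical rows, and the query needs one inline view per field to put it back together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (row_id INTEGER, attr TEXT, value TEXT)")
# One logical customer row, stored as five physical rows.
conn.executemany(
    "INSERT INTO vals VALUES (1, ?, ?)",
    [("first_name", "Ada"), ("last_name", "Lovelace"),
     ("city", "London"), ("country", "UK"), ("age", "36")],
)

# One inline view per field, all joined back together on row_id.
row = conn.execute("""
    SELECT fn.value, ln.value, ci.value, co.value, ag.value
    FROM (SELECT row_id, value FROM vals WHERE attr = 'first_name') fn
    JOIN (SELECT row_id, value FROM vals WHERE attr = 'last_name')  ln ON ln.row_id = fn.row_id
    JOIN (SELECT row_id, value FROM vals WHERE attr = 'city')       ci ON ci.row_id = fn.row_id
    JOIN (SELECT row_id, value FROM vals WHERE attr = 'country')    co ON co.row_id = fn.row_id
    JOIN (SELECT row_id, value FROM vals WHERE attr = 'age')        ag ON ag.row_id = fn.row_id
""").fetchone()
print(row)  # five inline views, just to reassemble one row
```

With a normal table this whole query would be `SELECT first_name, last_name, city, country, age FROM customer`.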

Data Type Problems:

To make this system work, the field that holds the value is made of type VARCHAR.  This will hold a representation of other data types, such as a number.  But the number will be stored as the character string “8”, not as the number 8.  At some point, you will need to convert it from a string back to a number.  But eventually, a non-numeric value will sneak in, such as “A”, that can’t be converted back into a number.  The query will fail, and you will be debugging and fixing it manually.  More people time.
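A minimal sketch of the failure mode, using a hypothetical text-typed values table: the database accepts the bad value at insert time, and the error only surfaces later, when something tries to convert the strings back to numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (row_id INTEGER, attr TEXT, value TEXT)")
# 'age' is conceptually numeric, but the value column is text, so the
# database happily accepts a non-numeric value.
conn.executemany("INSERT INTO vals VALUES (?, 'age', ?)",
                 [(1, "36"), (2, "41"), (3, "A")])

conversion_failed = False
try:
    ages = [int(v) for (v,) in
            conn.execute("SELECT value FROM vals WHERE attr = 'age'")]
except ValueError:
    # The stray "A" only breaks things at query time; no constraint in
    # the generic schema could have rejected it at insert time.
    conversion_failed = True
```

With a proper numeric column, the bad insert would have been rejected immediately instead.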

Really Complex Inserts:

For an insert on a normal system, you would insert one row with the values for (say) five different fields.  To get the equivalent in the MUCK system, you would need to do five separate insert statements!   More people time.
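The contrast can be sketched side by side (table and column names are hypothetical; SQLite via sqlite3 is used just for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normal design: one row, one insert statement.
conn.execute("""CREATE TABLE customer
    (customer_id INTEGER, first_name TEXT, last_name TEXT,
     city TEXT, country TEXT)""")
conn.execute(
    "INSERT INTO customer VALUES (1, 'Ada', 'Lovelace', 'London', 'UK')")

# MUCK equivalent: one insert per attribute -- five statements
# (and five rows) to record the same single fact.
conn.execute("CREATE TABLE vals (row_id INTEGER, attr TEXT, value TEXT)")
for attr, value in [("customer_id", "1"), ("first_name", "Ada"),
                    ("last_name", "Lovelace"), ("city", "London"),
                    ("country", "UK")]:
    conn.execute("INSERT INTO vals VALUES (1, ?, ?)", (attr, value))
```

Updates and deletes multiply in exactly the same way.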

People Time:

Lots more people time is necessary for even the simplest of queries.  For a simple query, you will sometimes have to query the lookup tables first, before you even compose your SQL statement, so that you can find the codes to filter on.  “What is the code for ???”

Too Confusing For Even The Brightest:

Once, my department had a task to double-check a lot of data from these tables and compare it to another system.  With regular tables and columns, we would have just selected the fields and been done in a few minutes.  But we spent a number of hours trying to ensure these complex queries were correct.

And yet, the three of us came up with three different answers!  Which one was correct?  This was in spite of the fact that all three of us knew SQL really well.  How’s that for accuracy?

Cost:

What is the dollar value of the people time to use and maintain this system?  Compared to whatever advantages there were for doing it in the first place?

It’s Cobol:

If you have to look at other fields, or run processing to know what a particular row or field represents, you are basically running a Cobol system.  More on Cobol systems in another post.

Correction:

JUST SAY NO!  Standard database design practice, please.


7 Responses to MUCK – Massively Unified Code-Key, Generic Three Table Data Model

  1. rodgersnotes says:

    I have since found out that MUCK is also known as the Entity Attribute Value (EAV) Model, or Open Schema.

    This wikipedia entry describes it.
    http://en.wikipedia.org/wiki/Entity–attribute–value_model

    But remarkably, the article comes off as if EAV is actually a model to consider! It does mention the problems of pivoting, and comes up with three very costly solutions to them: a separate warehouse, materialized views, and in-memory data structures. More complexity upon complexity.

    But still no real added benefit to just doing it right in the first place.

    Amazing how such an incredibly bad idea is made palatable and sold.

  2. michiel says:

    “doing it right in the first place”… Have you read the article? Medical symptoms come in the range of 300,000 items, each required to be a column in a table describing all symptoms. So to track possible symptoms of a patient, we’d have a table structure like
    PATIENT(PatientID Primary Key, S1, S2, S3…….)
    Most modern databases would collapse on simple queries or inserts.

  3. rodgersnotes says:

    Uh, yes I read the article.

    Actually, I wrote both the article, and the response.

  4. michiel was referring to whether you had read Wikipedia’s article, not yours. EAV and other metadata-based techniques can indeed be overkill and become an anti-pattern, but in some cases they provide solid trade-offs: flexibility, backwards compatibility, etc.

    • rodgersnotes says:

      Re: clinical example on wiki.
      An interesting problem. A few thoughts.

      One idea is that perhaps RDBMS is not the best tool
      for the clinical job. Objects perhaps? Or some of the NewSQL/NoSQL databases? I’ve never tackled 100K+ columns.

      But if EAV was so difficult for a few dozen columns,
      how much more difficult will it be for 100K?

      —-

      An RDBMS idea would be to not have columns at all.
      Certainly not 100K or more.

      Instead, have an intersecting table for the M:N relationship

      Patient
      Patient_Symptom
      Symptom

      In this way, you use the least amount of storage.

      The rows might need to be pivoted into columns.
      What could be problematic: after pivoting, what is the column name to search on?
      Temp tables perhaps?
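      The intersecting-table idea above can be sketched as follows. Patient, Patient_Symptom, and Symptom are the names from the comment; everything else is a hypothetical illustration in SQLite via Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE symptom (symptom_id INTEGER PRIMARY KEY, name TEXT);
-- Intersecting table for the M:N relationship: one row per
-- (patient, symptom) pair that actually occurs, instead of
-- 100K+ mostly-empty columns.
CREATE TABLE patient_symptom (
    patient_id INTEGER REFERENCES patient(patient_id),
    symptom_id INTEGER REFERENCES symptom(symptom_id),
    PRIMARY KEY (patient_id, symptom_id)
);
""")
conn.execute("INSERT INTO patient VALUES (1, 'Pat')")
conn.executemany("INSERT INTO symptom VALUES (?, ?)",
                 [(10, 'fever'), (11, 'cough')])
conn.executemany("INSERT INTO patient_symptom VALUES (1, ?)",
                 [(10,), (11,)])

# Which symptoms does patient 1 have?  A plain join answers it;
# no pivoting is needed for yes/no questions like this.
rows = conn.execute("""
    SELECT s.name FROM symptom s
    JOIN patient_symptom ps ON ps.symptom_id = s.symptom_id
    WHERE ps.patient_id = 1 ORDER BY s.name
""").fetchall()
```

      Unlike the generic MUCK tables, the symptom list here is real reference data, so the design stays normalized.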

      —-

      The 100K columns are 1:1 relationships.

      This could be a good case to split up the many columns into multiple 1:1 tables.

      So, symptoms specific to cancer, or columns that ruled out others, would go into their own tables, still preserving the 1:1 relationship. They would be only queried, inserted, updated, or joined when occasionally needed.

      I do recommend similar for BLOBs:
      http://rodgersnotes.wordpress.com/2010/09/19/combining-identifying-data-transaction-data-and-lobs-in-a-single-table/

      I would like to see comparisons of EAV versus different methods.

      As I’ve said in other posts, it’s very unusual for companies to test multiple designs and prototypes first. Usually, once they have “the design”, they start throwing more and more code on top to make the awful beginner prototype work.

      See my series on Database Design Mistakes To Avoid

      Also, see the technical limits as to the number of columns in a table.

      Discussion at DBMS2.COM, on various databases
      http://www.dbms2.com/2011/03/13/so-how-many-columns-can-a-single-table-have-anyway/

      Oracle 10g has a 1,000-column maximum:
      http://docs.oracle.com/cd/B19306_01/server.102/b14237/limits003.htm

  5. rodgersnotes says:

    That’s a really strange post.
    http://sqlblog.com/blogs/aaron_bertrand/archive/2009/11/19/what-is-so-bad-about-eav-anyway.aspx

    The guy actually recommends going to an EAV model because he does not want to add a column to a table. Huh?

    He should read my series here on DB Design Mistakes to Avoid.
    Starting with the first one:
    http://rodgersnotes.wordpress.com/2010/09/14/database-design-mistakes-to-avoid/
