Introducing a number of posts on database design.
A System’s Database Design is like the Foundation to a Building
Database Design is one of the most critical areas in databases. It is like the foundation to your house. If you have a bad foundation, you either cannot build at all. Or, you have to do all kinds of things to compensate for the bad foundation.
A great example of a bad foundation is the leaning tower of Pisa.
The tower was not built with the right foundation for the local clay. Three floors were built, and the tower began to lean because of the weight of the building. Rather than tearing down the structure and building it again properly, they just decided to wait 100 years for the clay to settle. And then they built four more floors, parallel to the ground, attempting to compensate for the bad angle of the foundation! But more problems arose.
“In 1838, the architect Alessandro Della Gherardesca decided it was a perfectly good idea to dig out a walkway around the tower so that visitors could see its carefully crafted base. Predictably, this only exacerbated the tower’s lean. Benito Mussolini was the next to try his hand at straightening the tower. He ordered the foundation to be filled in with concrete, but the concrete sunk into the wet clay and the leaning tower continued its prolonged descent towards the ground.”
“Several plans have been tried over the years to stop the tower from falling. Some of them have been almost disastrous. In 1934 an Italian engineer drilled 361 holes into the base and filled them with mortar. The tower promptly leaned over some more. In 1993, 650 tons of lead were hung from the North side of the building to try and stop the lean increasing. For a while it worked.
In 1995, they decided to try and increase the foundations under the South side of the building. They froze the ground using liquid nitrogen, to stop it moving, and then started to remove stones, so they could insert metal rods. What they didn’t know was that the stones they were removing were part of the original foundation of the building. That is the nearest the tower has come to disaster. In one night the lean increased as much as it normally increases in two years. They quickly added another 250 tons of lead and decided to rethink the whole thing.
At this point everyone was just about ready to give up. Then a British engineering professor came up with yet another idea. His plan was to remove ground from under the high side, instead of trying to add ground under the low side. In 1999 work began, and was done very slowly, so that the building wouldn’t get a sudden shock. At the beginning of June 2001, the work was complete, and the tower had been straightened up by about 16 inches, which returns it to the position it held in 1838.”
Trying to make an inappropriate design, work:
So many database designs, that I didn’t build, but have had to struggle with, are so much like the Leaning Tower of Pisa. Someone charged ahead and made a rough “design”. Then rushed to put some objects on top. Because the design didn’t work right, they had to use lots of code to correct the design. And then more code. And then someone comes up with a bright idea to fix it. That makes things worse.
DB Designs Are Hardly Ever Corrected Or Refined:
The unfortunate thing in database designs is that unlike programs, they are rarely, if ever, reworked. If a program doesn’t compile, the code is corrected until it does compile. And then if the program doesn’t produce the right output, the code is again corrected until it does. I’ve seen one relatively simple program be refined and put into production at least eight times.
The database design is usually thrown up as quickly as possible, because there are a whole bunch of programmers, business analysts, and managers waiting for “the design”. I’ve even been asked, in interviews, “how do you design for speed?”!
Immediately objects are created on top of the database design. Views, procedures, loads, java code, etc. As time goes on, everyone gets a better understanding; the requirements, the data, the user refines what they really are looking for. An architecture actually emerges. You would think that the DB design would be refined and corrected. Like programs are.
But I’ve never seen that. Why? For a few reasons. Source code would need to be changed to accommodate the different table and field names. Most managers and programmers see this as too much work.
Other reasons are emotions. Like sunk costs in an old car, they feel they have to get their money’s worth out of it now. They have already “invested” in their bad design. Other emotions are probably pride, “look how ingeniously I added all this code to make this strange design work”. If you correct the design to simply the code, the programmer doesn’t have such complex code to show off now. Or laziness. “It’s too much work”. But the amount of work is never quantified. Even if it is only 5 minutes of a simple search and replace. Or politics. One database expert is outnumbered by 5 or more java people.
The Design Is The Database:
In 1992, I learned DB Design on a summer job using DataPerfect. DataPerfect was a great product to learn DB Design with. Really, why? There was essentially no source code. The Design WAS the Database!
If the design didn’t work, then the system didn’t work. So, I didn’t have the “luxury” of a whole bunch of source code to “fall back” on. The design MUST work.
DataPerfect also didn’t use SQL, which is probably one factor that lead to its demise. So, while most programmers today struggle with learning the paradigm of SQL sets of data, I didn’t have to. My whole focus was on the design.
I was the whole computer department that summer. The architect. The designer. The builder. I had no internal politics to worry about.
The system I built was a Contact Management system. Every time someone contacted the organization, they would keep a record. Some of what I thought were good ideas didn’t work. One example was a page I made to list all the key people in an organization; President, Comptroller, VP of ???, etc.
But as I got the data to enter, the data didn’t conform to my preconceived ideas. Few companies had the positions I’d thought of. The business cards had strange titles I had never heard of. Interestingly enough, the titles kept getting longer and longer. I started with 15 characters for the title. Then 20. Then, 30, 40, and eventually 50. And still one title didn’t fit in a 50 character field. Executive Vice President of blah, blah, blah. One thing I concluded that summer was that, really important people have short titles. Such as, president.
So, I refined the design. I exported the data I’d already entered, changed the design, and reimported it. Repeat. Rinse and Repeat again. Through the whole life cycle. Until the system was perfected.
As I did this, I learned database design concepts intuitively by experience. One to many. Many to many. Cool! When I returned to university, I studied intro and advanced database, and aced the classes.
The main point being, that summer, the database design was rebuilt probably dozens of times. For a relatively simple system.
Sometimes I wonder how many database designers have had been able to learn from that kind of experience. Since working in enterprise environments since 1995, I have never seen database designs redesigned to any significant degree. As bad as they are, they remain set in stone, like the Leaning Tower of Pisa.
A Good Database Design Eliminates Thousands of Lines of Code:
Given my experience, I’ve concluded that a good database design should eliminate thousands of lines of code. Not require thousands of lines of code to make it work. In triggers, procedures, middle tier code, front end code, etc.
Database Design Mistakes to Avoid:
Since I started working, I’ve usually come into companies that already had DB designs in place. Bad designs. And really pathetic designs. Designs that I knew were clearly wrong from the moment I saw them. Here, I will document some of these mistakes. And what the ultimate results were.
These mistakes are not necessarily going to be defined from a theoretical standpoint such as “violating third normal form”. Although there may be a theoretical basis for the same conclusion, it is a mistake for more immediate considerations: inaccurate results, cartesian products, really slow performance, dupe data, etc.
Indeed, as I saw these mistakes over the years, I realized that a number of them don’t fit into the theory. Or, the theory doesn’t address it.
Hopefully, by seeing them, you will benefit from the mistakes of others. And learn enough to not do them yourself next time.
I’m sorry that I have taken so long to get around to this. I had hoped to present these concepts at an Oracle User Group meeting first, but have not had the opportunity yet. If you would like me to present, please contact me.
To better systems.