More Architecture Mistakes To Avoid
One thing that should be blatantly obvious to anyone who has ever had to fix some code on the weekend, or in the middle of the night, is the importance of excellent error trapping and logging routines.
When you need to fix something, now, you don’t want to be reverse engineering the code or the whole system. You want to know what happened, and what to do. Now.
You need what I call, instantly digestible error messages. I don’t want to hunt it. I don’t want to kill it. I don’t want to clean it. I don’t want to cook it. I just want to eat it. I want both the cause, and the solution, in the same message.
The error trapping routine must be totally reliable. Let me list some strange things that I’ve seen in production.
Complex and Useless:
I mentioned in another post, that one place actually split the error lookup table into two tables, vertically! If that wasn’t enough, there was no way that errors that hadn’t previously been imagined, would ever make it to the error log. The only errors that could make it to the error log were program logic errors, not underlying Oracle errors.
Any DBA should get chills at the thought. There are all kinds of strange errors that Oracle can suddenly throw at you such as a mysterious Oracle 600 error. There is no way in the world that you can expect, all the unexpected.
The third problem with this alleged error trapping routine, was that they didn’t allow even adding a DBMS_OUTPUT debug statement in the middle of the program to say what was happening.
The fourth problem with this “system”, was that there were in fact, get this, SEVEN OBJECTS to do the error logging. I can’t remember them all, but I do remember counting seven of them. Two lookup tables. At least one logging table. A number of procedures. Nested deeply.
I concluded it wasn’t possible to add more complexity. How about you? Can you think of any more complexity to add to the error trapping?
Nesting Error Trapping Routines, aka Not Doing Structured Programming:
In my early days I once went for an interview. Until that time, I’d mostly worked with ProC, not PLSQL. During the interview, they kept asking me, “What about exceptions?”. I told them that I used Proc’s WHENEVER SQLERROR, and how I used it. But they kept asking the question, which I thought was pretty odd.
When I returned, I looked up PLSQL’s EXCEPTION error handling. It was pretty simple. I kept wondering, did I miss something?
I didn’t get the job. But later I realized what their problem was. It wasn’t how to trap a PLSQL error. It was that they were not doing standard structured programming. They were adding more and more code, into the error handling routines themselves. Nesting them deeply into spaghetti, without GOTO statements. Was this procedure called from the main body of procedure x, or the error trapping routine of procedure y? Or the error trapping routine of procedure z? Or something else? No wonder they had problems.
Writing The Error Routine So It Itself Crashes:
At a big financial firm that I was at, there was a lot of really bad code. One program kept bombing. Not only that, but the error trapping routine itself, bombed, as well as the procedure.
At the beginning of the PLSQL routine, there was:
Then, a procedure was immediately called, which had a commit in it. This of course obliterated the Savepoint, created just a fraction of second before.
There was no error trapping through the program, only a single catch all at the end:
When Others Then
Rollback to A;
When the program gave us trouble, we never knew the cause, or the solution. We weren’t allowed to change code in production. But moving new code through dev and test was problematic, because those environments weren’t maintained, and weren’t running.
One Catch All Error Routine For Everything:
The other mistake is just to use one error handling routine for everything.
When others then
This “works”, but usually leaves lots of questions. Just where in the program did it bomb? What were the parameters passed in? What value was it processing? What row was it on? How do I recreate this error? What’s the cause? What’s the solution?
No Error Trapping At All!:
But the worst thing you can do, is to have no error trapping at all!
This was standard practice at one place. There were a number of application log tables, to log who logged in, and who ran what program. Developed by the front end team. But there were no error trapping routines, and not even a error log table.
I asked one of the people who built it why there was no error trapping. Her response: “We don’t think we’ll get errors”!!!! I said, “What are you, psychic!?”
The errors were not handled cleanly. The programs just bombed. Constant reverse engineering of the crappy code every time there was an error. Poking through both code, and input data. Lots of wasted people time.
Learn error trapping, and do it. Create reliable error handling routines. Do not put more and more complexity on this aspect which needs to be simple and reliable.
Give meaningful, instantly digestible error messages for this fast food culture.
In the error message, list what package and procedure is creating the error message, and where in the program it is. And, the parameters passed in, the values of the variables, what row the program was on when it stopped.
Don’t have just one catch all error trapping routine. Put error trapping around the important parts of the program, and trap the problems there. So, if it’s really important that the big load finishes correctly before subsequent processing occurs, wrap that part up in an error trapping routine, and abort the program if it doesn’t work, before it proceeds.