Creating Cobol Systems Inside the Database


More Database Design Mistakes To Avoid
Architecture Mistakes To Avoid

I haven’t worked a whole lot with Cobol, but I did do enough.

Coming from a background where the first thing I really learned well on the job was how to do good database design, there were some odd things that I noticed about Cobol.

In Cobol, data is stored in files.  To work with Cobol, you have to define the file structures, and their data in the -every- program.  This is in contrast to relational databases, where fields are defined once in the table structures.

Here is a definition from a typical program that I found on the web:
001600 DATA DIVISION.
001700 FILE SECTION.
001800 FD  PHONE-FILE
001900     LABEL RECORDS ARE STANDARD.
002000 01  PHONE-RECORD.
002100     05  PHONE-LAST-NAME      PIC X(20).
002200     05  PHONE-FIRST-NAME     PIC X(20).
002300     05  PHONE-NUMBER         PIC X(15).

It’s sort of hierarchial.   The big picture, 01, says that the following fields represent the Phone_Record.  At the 05 level, the last and first name, and the phone number are detailed.  The actual data is only found only at the lowest level, in this case, the 05 level.

The picture clauses detail the fields in a position based manner.  The first 20 positions are the last name.  And all 20 positions in the file are used.

This may give you some idea why so many programmers are still concerned with the idea that they can “save space”.   During the days of Cobol, disk storage (DASD – Direct Access Storage Device) was very expensive, and so most storage was done on tape, which was much cheaper.  Of course not every last name is 20 characters long.  If a name was only 10 characters long, 10 characters would be blank, and “wasted”.

When both CPU and storage was expensive, this was of course a concern.  But now gigs can be had for dollars.  And the CPU in a little home computer is many multiples more powerful than old big iron.

The other improvement since then, is the datatype, Varchar.  If you define a field as varchar2(20), you can insert up to 20 characters.  And there is an end of field character.  If you insert only 10 characters, just 10 characters are used for storage of that data.

You can see these concepts by importing files into an Excel spreadsheet.  You are given the option to import it in position based manner, or by using a delimiter, such as a comma.

These concepts of CPU and saving space lead to some other strange practices.

Different types of data were sometimes actually stored in the same file!  So, the same file could have both data for customers, and for suppliers!

001600 DATA DIVISION.
001700 FILE SECTION.
001800 FD  CONTACT-FILE
001900     LABEL RECORDS ARE STANDARD.
002000 01  RECORD-IDENTIFIER        PIC X(1)
002100     02  CUSTOMER-RECORD
002200         05  CUSTOMER-NAME    PIC X(20).

004100     02  SUPPLIER-RECORD
004200         05 SUPPLIER-NAME     PIC X(20).

The way it would work, would be that when the row was read, the program would look at the first position, RECORD-IDENTIFIER.  It would have an indicator as to what kind of row it was.  Could be a CUSTOMER.  Could be a SUPPLIER.  Or, depending how complex they wanted to make it, it could have indicated something completely unrelated, with completely different field definitions, say, planetary orbits.

Cobol:  Using The Same Field For Different Data:

The one system that I worked on, Cobol programmer paradigm had a spill over effect.  In the Oracle database, in a particular character field, there was data that would look like this:   20100104

Now, you would think that this might be a date.  And sometimes, that was true.

But other times, it was not a date.  It was actually, Year, Period, Cycle.

How did you know?  You had to look at another field in the table to know!

So you have to write code to make sense of your data.  In SQL, yes, it can be simple enough in the WHERE clause:  WHERE Record_Type = ‘X’.  However, this is already adding more processing.   The more processing, the slower things are.

Another variation was a tax example.  There were initially two taxes, and two fields:
TAX_A
TAX_B

Then the tax laws were changed in certain jurisdictions, and a new tax came into effect.  Database paradigm would intuitively add another column to the database.

TAX_A
TAX_B
TAX_C

However the programmer decided to be creative.  If there was a TAX_C, there was no TAX_B.  So, rather than adding a new field to the table, they used the existing TAX_B field.  And you were “just supposed to know” that in those jurisdictions that even though the tax was labeled TAX_B, it was really TAX_C.

More algorithm was added.  More confusion.

Just think of the tax example.  Some states have a sales tax, and others do not.  Different cities can have sales taxes, but others do not.  City taxes and rate can vary just by driving to the next city.   Certain products or industries, such as hotel rooms, have specific kinds of taxes, that don’t apply to others.

If you look at the transaction record, and you want to know just how much a tax is, intuitively, you just want to look at the appropriate field, and see the tax.  If it is NULL, there is no tax.  Simple?

TAX_A     x
TAX_B     NULL
TAX_C     y

When you consider how very complex the tax system is, does it not make sense to have multiple fields?   Don’t get creative and use them for different, or multiple things.

Cobol:  Using the first field to identify what kind of row it is:

Another example was in finance.  Data was tallied into a number of tables, concerning Portfolios, Indexes, and Differences between the two.  In Cobol fashion, the first field would indicate what kind of data the record contained.  More algorithm.

Cobol:  Requiring Complex Algorithms For The Simplest Of Identification:

Another example in finance, was even worse.  The raw data did in fact come from an old Cobol system.  I needed to determine what kind of financial instrument a row contained.  But I couldn’t just look at the row.  The row did not identify itself clearly at all.

At least in these other examples of using Cobol, you could look at one or two other fields, and understand what data the row contained.  But in this case, a long algorithm that used a process of elimination was needed.

If field_x = a, then, type_1
If field_y = b then, type_2
….
Else
Type_15
Else
Unknown
End if
End if

I can’t remember whether I could use CASE statements in a SQL statement to do this.  Or, if I had to actually go row by row, and use the 3GL code, comparing different fields, as the elimination progressed.  If it was the latter, you begin to see how complexity in the design, leads to complexity in the code.  As I wrote in my tuning presentation, not using batch SQL statements, and using loops is one of the best ways to slow things down.

In a database paradigm, you should be able to just look at a row, and know immediately what kind of data you are looking at.  If it comes from the customer table, then you are looking at a customer record.  If it is the date field, it is the date.  Etc.

As I wrote in the Leaning Tower Of Pisa, a good database design should eliminate code, not require more code, in order to make it work.

Solution:

Don’t do Cobol.

When you see Cobol techniques, tell them that it is Cobol, and that you are too young.

Get database training, and learn the database paradigm.

About these ads

One Response to Creating Cobol Systems Inside the Database

  1. Gary says:

    I don’t think they’ll believe the “I’m too young” excuse in my case :)

    I’m currently dealing with an interface to an external company where the file format is very COBOLish. While a database is great for persistence, for communication I’d like to see this replaced by XML.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: