Building Better Systems Through Better Analysis

From Reqs to Specs.

In my other piece, I wrote about how business requirements documents don’t work very well, how they create a trial-and-error process, and how prototypes make it into production.

Much of the reason is that there is no analysis, general design, or detailed design. Nor is anyone in charge of these areas.

Much of my work in development has been doing the analysis and design that were not done in the first place. Here are some of the techniques I’ve used, and some that have been very useful when others have done them for me.


Say What You Mean:

If you want to communicate effectively with developers, the number one thing you can do is speak to your audience and say what you mean. Sounds simple, doesn’t it?

What does that mean in database terms? Everything in databases ultimately boils down to:

Schema.Table.Field = Value, or Schema.Table.Field IS (NOT) NULL

If you want to communicate well, do not use ambiguous nonsense such as, “the Op code”.

Instead, the superior analyst says exactly what they mean, and writes something much more meaningful, such as:

Schema1.TableA.op_code = 'A'

This names the exact schema, table, field, and value being referred to. It does not leave the developer wasting time and energy wondering. Or searching for the correct table and field. Or translating the vague op_code into something that can also be used by a SQL query.

Once I worked at a shop that was bizarre in many ways, and on many levels. One of the bizarre factors was that in meetings, they constantly referred to the “Op Code”. It turned out that this phrase could mean one of four different fields or combinations, depending on the context! Confusing? How quickly do you think the meetings went? How quickly do you think the systems were built? How good do you think the systems were?

But, some will protest, “the user doesn’t think in those terms.” That may be true. But the document is not for the user. The document is written for the developer. Speak to your audience. If need be, put in both terms side by side.

Of course, detailing the Schema.Table.Field=Value will be no effort at all if you have already done the next step.

Look At The Data At The Lowest Level Possible:

After talking to a user, your first job as a Business Analyst is to: LOOK AT THE DATA!

It’s amazing how many people I’ve come across in IT who don’t look at the data. Business analysts. Java programmers. Even database developers, even though they are working on a database system.

See my other post, Failure To Do Data Analysis.

When you look at the data, you need to look at the data at the lowest level possible.

Another case in point: at the same place, where there were four business analysts in a group of ten, there were problems with the data in the spreadsheets. I asked the BA if he had looked at the data, and was told yes, but they still had problems. Finally I discovered what he meant: he had looked at the CSV file by opening it in Excel! The solution was, of course, to look at the CSV file in Notepad, at the lowest level possible. Duh.
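To see why this matters at the byte level, here is a minimal Python sketch (the file name and contents are hypothetical) that dumps the raw lines of a CSV, the programmatic equivalent of opening it in Notepad. Excel silently reinterprets what it loads; the raw bytes do not lie.

```python
# A tiny sample file standing in for a real export ("data.csv" is a
# hypothetical name). Note the Windows line endings and the leading
# zeros that Excel would silently strip on open.
sample = b"account,amount\r\n007,10.50\r\n042,3.00\r\n"
with open("data.csv", "wb") as f:
    f.write(sample)

# Inspect the raw bytes instead of letting a spreadsheet reinterpret them.
with open("data.csv", "rb") as f:
    lines = [repr(raw) for raw in f]

for line in lines:
    print(line)   # repr() exposes BOMs, \r\n line endings, stray tabs, etc.
```

Running this shows exactly what is in the file, including the `\r\n` endings and the `007` that Excel would have displayed as `7`.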

Look At The Data In The Tables:

What does the data look like?

Select *
From Schema1.tableA
Order by 1, 2

After returning from the meeting with the user, if an analyst does this, they are doing much more than most business analysts!

Is the data the user refers to even in the table? Often, I’ve discovered, it actually is not!

What the user sees, and what the developer sees, are two different perspectives. The user has a GUI. The developer has a query tool.

The superior analyst will use the same query tool that the developer uses.

Is the data where they think it is? Or not? I’ve found that data can actually be in different tables, schemas, and even different databases. Or, perhaps it is calculated and not in the database at all.


Find Objects In The Database:

But where could it be? If you don’t know which object holds the data, find the correct object. See this other post on finding object info, and some of the other SQL scripts I have written.

Search the database catalog for the names of tables, views, columns, stored procedures, and jobs (processes), running and not.

Eventually you should find the object. Then you will be able to look at the data.
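As a sketch of the idea: most engines expose a catalog you can search by name (information_schema.columns in PostgreSQL and MySQL, ALL_TAB_COLUMNS in Oracle). Here is a hypothetical example using Python's built-in sqlite3 and SQLite's own catalog; the table and column names are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (op_code TEXT, amount REAL)")   # hypothetical table
con.execute("CREATE TABLE audit_log (op_code TEXT, note TEXT)")  # hypothetical table

# Search every table's columns for a name fragment -- the SQLite
# equivalent of querying information_schema.columns by column name.
hits = []
for (table,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    for col in con.execute(f"PRAGMA table_info({table})"):
        if "op" in col[1]:                 # col[1] is the column name
            hits.append((table, col[1]))

print(hits)   # every table that actually holds an "op" column
```

One catalog query beats an afternoon of guessing which table the “Op Code” lives in.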

Secret Weapon #1 – GROUP BY:

The GROUP BY clause has often been my best friend and secret weapon!

What many people fail to realize is that a GROUP BY is also a DISTINCT. It will return one row for each distinct combination.

A GROUP BY on two fields will give every combination of values actually present in them:

Select field1, field2, count(*)
From tableA
Group by field1, field2

This will show the distribution of the data in the table.

You can also add more aggregate functions, especially on numeric fields. (Note that MEDIAN is not available in every engine; Oracle and Teradata have it, while SQL Server and PostgreSQL use PERCENTILE_CONT instead.)

Select field1
, field2
, min(field4)
, max(field4)
, sum(field4)
, avg(field4)
, median(field4)
, count(*)
From tableA
Group by field1, field2
Order by field1, field2

You need to do this to see what your data looks like. Are there NULL values in there? How many? How about zero? How about any other strange data, off by multiple decimal places? If so, what do you do with those rows?

Does the same name appear with different spellings? What do you do with that?

The average analyst will rationalize, “oh, that will never happen”. The superior analyst knows that bad data can and will get into tables, and thinks it through: how to prevent it from getting in, and what to do with it when it does.
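The profiling above can be reproduced end to end with Python's built-in sqlite3. This is a sketch with made-up data, not output from any real system; the useful trick is that count(*) counts rows while count(field2) skips NULLs, so the difference between the two is the NULL count.

```python
import sqlite3

# Made-up data standing in for a real table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableA (field1 TEXT, field2 REAL)")
con.executemany("INSERT INTO tableA VALUES (?, ?)",
                [("X", 10.0), ("X", 0.0), ("X", None), ("Y", 12.5)])

# One row per distinct field1 value. count(*) counts rows,
# count(field2) skips NULLs, and sum(field2 = 0) counts zeros.
rows = con.execute("""
    SELECT field1,
           count(*)        AS n_rows,
           count(field2)   AS n_non_null,
           sum(field2 = 0) AS n_zero,
           min(field2), max(field2), avg(field2)
    FROM tableA
    GROUP BY field1
    ORDER BY field1
""").fetchall()

for row in rows:
    print(row)
```

One query and the NULLs, zeros, and outliers that the requirements document never mentioned are staring back at you.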


Secret Weapon #2 – Decision Grids:

The next progression from the GROUP BY is to build two-dimensional decision grids.

Two dimensional decision grids have been the most useful thing that I have remembered and used from university! They have helped very complex projects succeed many times.

Most requirements docs will give pseudo-code rules such as:

If “A” then “Approve”
If “Z” then “Disapprove”

However, if you have actually analyzed the data with GROUP BY statements, you will now know what all the possible values in the different fields are.

Values in Field1 (rows) versus values in Field2 (columns):

Field1 \ Field2 | A       | B          | C
X               | Approve |            |
Y               | Approve |            |
Z               | ???     | Disapprove | Disapprove

In this example, there is a conflict.
“A” says to “Approve”.
“Z” says to “Disapprove”
Which is correct?

By putting data in a 2D grid, you can see:

Exceptions to typical scenarios.
Assumptions so fundamental that they were never stated.
Loss of information between the user, analyst, and developer.
Perhaps even blind spots in the user’s thought process. Users are not infallible.

Once the data is in the 2D grid, you can then think through the rest of the possibilities, and fill in the decision rules.

Field1 \ Field2 | A                        | B                                                   | C
X               | Approve                  | Warning message for data correction or verification | Execute process J
Y               | Approve with condition G | Error message – should not occur                    | Send to Head Office
Z               | Disapprove               | Flag for money laundering                           | Call Police
I’ve used this technique with duration calculations for a bond.

Field1 \ Field2 | A                                                        | B
Y               | Calculate duration using formula a and values in Schema1 | Calculate duration using formula b and values in Schema2
Z               | Copy value from Schema3.table3.field3                    | Calculate duration using formula d and values in Schema4

The analyst had given me prose that just didn’t make any sense. It turned out that the analyst had misinterpreted what she had heard from the user. And, since she didn’t look at the data, she had no way to verify whether what she was writing made any sense.

The beauty of this technique is that it is so simple, clear, concise, and instantly digestible. The analyst, user, project manager, and developer can all read it and understand it at once. Cool!
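A completed grid can also drive code directly. Here is a sketch: the grid as a Python dict keyed by (field1, field2), with every cell explicit and unknown combinations failing loudly rather than defaulting silently. The values and actions are the hypothetical ones from the example grid, not from any real system.

```python
# The decision grid as data: one entry per cell, nothing implied.
GRID = {
    ("X", "A"): "Approve",
    ("X", "B"): "Warn: verify data",
    ("X", "C"): "Execute process J",
    ("Y", "A"): "Approve with condition G",
    ("Y", "B"): "Error: should not occur",
    ("Y", "C"): "Send to Head Office",
    ("Z", "A"): "Disapprove",
    ("Z", "B"): "Flag for money laundering",
    ("Z", "C"): "Call Police",
}

def decide(field1, field2):
    # A combination outside the grid is an analysis gap, not a default.
    try:
        return GRID[(field1, field2)]
    except KeyError:
        raise ValueError(f"no rule for ({field1!r}, {field2!r})")

print(decide("Z", "B"))   # Flag for money laundering
```

Because the grid is data rather than nested ifs, the analyst, the user, and the developer can all review the same nine cells, and any value that slips in without a rule raises an error instead of sliding through a forgotten else branch.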

You will also see 2D grids in many other places:

In a lot of research papers and Masters’ theses
Marketing mixes
Analyst reports from Forrester, McKinsey, Gartner
And so on.


ETL Mappings:

When detailing ETL and data movements, use the exact and specific sources and destinations.

Source: Schema1.Table1.field1
Destination: Schema2.Table2.field2

Do this for every field. Spreadsheets are very useful for this type of mapping.
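The mapping itself can live as plain data and be checked mechanically, for example verifying that no destination field is mapped twice. A small sketch (all names hypothetical, following the Schema1/Schema2 pattern above):

```python
# Source -> destination, one row per field, exactly as in the
# mapping spreadsheet. All names are hypothetical.
MAPPING = [
    ("Schema1.Table1.field1", "Schema2.Table2.field2"),
    ("Schema1.Table1.field3", "Schema2.Table2.field4"),
]

destinations = [dst for _, dst in MAPPING]

# Every destination mapped exactly once; a duplicate here means two
# sources are fighting over the same target field.
assert len(destinations) == len(set(destinations)), "duplicate destination"
print(f"{len(MAPPING)} fields mapped")
```

Keeping the mapping as data means the same list feeds the spec, the review meeting, and a sanity check that runs before the ETL job does.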

Of course, the superior analyst has already looked at the data, and already knows the exact schema.table.field names, so this will naturally be quite easy and no burden at all.

Unix Queries:

If you are working with flat files on Unix, you know there are all kinds of Unix commands for looking at file data: cat, grep, cut, find, file, uniq, awk, sed, and so on. Use them to query and understand your flat-file data.

So there you have a few simple but highly effective tools to do analysis. As I’ve mentioned before:

There are no secrets.
There are no shortcuts.

To better systems.
