Failure To Do Data Analysis


Databases are all about data.

So it’s totally bizarre how so many people working with databases never look at the data!

It’s so fundamental that it’s hard to make a good analogy.  Imagine a tailor making your clothes without ever looking at the fabric!  Can you think of a better analogy?

I’ve met programmers who have charged ahead and written a thousand lines of code without once looking at the data in the tables, the input, or the output!

———-

On one system that I worked on, a programmer wrote a Java program that would take an input file, read it, find the associated row in a table, and flag a column in that row.  He told me on a Friday that he had run it, and it ran fine.  And on Monday, he told me the same thing.

I was curious as to what the results were, so I did a quick GROUP BY on the column.  A strange conversation ensued.

Me:  Hey, this field is NULL in every row of the table.
Java programmer:  Why are they NULL?
Me:  I’m asking you.  It’s your program.  Why are they NULL?
Java programmer: I don’t know.
Me: Did you run a GROUP BY after you ran it to see how many got updated?
Java programmer: No.
Me:  How about before you ran the program?
Java programmer: No.
Me:  Do you have any debug output in here to show how many rows you are updating as you go?
Java programmer:  No.
Me:  What fields are you using to search for the row?
Java programmer: I don’t know, whatever they gave me.
Me:  How do you know you are not updating 100,000 rows instead of a single row when you run the update statement?
Java programmer: I don’t know.
Me:  We got the data in, row by row.   We are going to be flagging them, row by row.  You need to get a single row when you run the update statement.
Java programmer:  Really?

I found out which fields he was using to search on.  I then did a GROUP BY with HAVING COUNT(*) > 1 on those fields to see whether the combination was unique.  Fortunately, it was.  So, why were they not updating?
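
For anyone who has not run these checks before, here is a rough sketch of what they look like.  It is not the actual system: the table `responses`, its columns, and the sample rows are all made up for illustration, and an in-memory SQLite database stands in for the real one so the example can run anywhere.  It shows the GROUP BY on the flag column before and after, the uniqueness check on the search fields, and a look at how many rows one update touched.

```python
import sqlite3

# Hypothetical table, columns, and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (cust_id INTEGER, campaign TEXT, flag INTEGER)"
)
conn.executemany(
    "INSERT INTO responses (cust_id, campaign, flag) VALUES (?, ?, ?)",
    [(1, "A", None), (2, "A", None), (3, "B", None)],
)


def flag_counts(conn):
    """Distribution of the flag column; run it before and after the job."""
    return conn.execute(
        "SELECT flag, COUNT(*) FROM responses GROUP BY flag ORDER BY flag"
    ).fetchall()


def duplicate_keys(conn):
    """Search-field combinations that match more than one row."""
    return conn.execute(
        "SELECT cust_id, campaign, COUNT(*) FROM responses "
        "GROUP BY cust_id, campaign HAVING COUNT(*) > 1"
    ).fetchall()


before = flag_counts(conn)     # what does the column look like now?
dupes = duplicate_keys(conn)   # an empty list means the key is unique

cur = conn.execute(
    "UPDATE responses SET flag = 1 WHERE cust_id = ? AND campaign = ?",
    (1, "A"),
)
updated = cur.rowcount         # a row-by-row job should see exactly 1 here
after = flag_counts(conn)      # did anything actually change?

print(before, dupes, updated, after)
```

If `updated` comes back as 0, the search matched nothing; if it comes back as 100,000, the search fields are not the key you thought they were.  Either way you find out in seconds instead of over a weekend.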

At this point, it went well beyond me advising or helping him.  This was what I call doing his work for him!

The guy had been programming for at least 10 years, and held the title of Senior Java Programmer.

———-

At the same job, there was another Senior Programmer who had also been working for at least 10 years.  He was writing a program that read campaign responses from the database and created an output file, which was sent to an outside company.

We started a second campaign, with fewer responses.  The manager pointed out to me that they were suddenly getting strange results in the output file.  Instead of ones and zeros, they found letters.  He read them to me: “N-U-L-L”!!!!

The programmer had said there was a problem with the database.  Hmm.  A senior programmer who did not understand the concept of NULL.
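
A small sketch of how this kind of thing can be caught and handled.  The table `campaign_responses`, its columns, and the rows are invented, SQLite stands in for the real database, and writing a missing response as 0 is purely my assumption for the example; the right answer depends on what the outside company expects.  The point is to count the NULLs first and then decide on purpose what goes in the file.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaign_responses (cust_id INTEGER, responded INTEGER)"
)
conn.executemany(
    "INSERT INTO campaign_responses (cust_id, responded) VALUES (?, ?)",
    [(1, 1), (2, 0), (3, None)],
)

# Look first: how many rows have no value at all?
null_rows = conn.execute(
    "SELECT COUNT(*) FROM campaign_responses WHERE responded IS NULL"
).fetchone()[0]

# Then decide explicitly what a missing value means in the output.
# Treating it as 0 is an assumption made for this example.
rows = conn.execute(
    "SELECT cust_id, COALESCE(responded, 0) "
    "FROM campaign_responses ORDER BY cust_id"
).fetchall()
lines = [f"{cust_id},{responded}" for cust_id, responded in rows]

print(null_rows)
print("\n".join(lines))
```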

———-

At another job, I made a package for the Java team.  It ran fine.  But one guy kept bothering me.
Java programmer:  Hey, your program isn’t working.
Me (giving him the benefit of the doubt):  Really?  Show me.

He showed me his program.  Note that I said his program, not the input data.

Over an hour later, he saw that the problem was in fact his program.  He had never looked at the data the package was returning, which was the input to his program.

———-

So many people, like those in each of these examples, completely fail to look at the underlying data directly, say with SQL*Plus.  They don’t know what the data looks like, either before or after they run their program.  They don’t know what the data in the various columns looks like, or even whether there is any data in the table at all.  I’ve seen this with Java programmers, project managers, business analysts, DBAs, and even, get this, database developers!
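
To show how little effort a first look takes, here is a minimal sketch of a table profile: the row count, plus the non-NULL count and distinct count for every column.  The `profile` helper and the `customers` table are made up for this example, and it uses SQLite (including its `PRAGMA table_info`) only so that it runs on its own; on another database you would get the column list from that database’s own catalog.

```python
import sqlite3


def profile(conn, table):
    """Return (row_count, {column: (non_null_count, distinct_count)}).

    The table name is pasted into the SQL text, so pass only trusted names.
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # PRAGMA table_info is SQLite-specific; field 1 of each row is the name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    stats = {}
    for col in columns:
        # COUNT(col) skips NULLs, so it tells you how much data is there.
        non_null, distinct = conn.execute(
            f"SELECT COUNT({col}), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        stats[col] = (non_null, distinct)
    return total, stats


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, state TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO customers (cust_id, state, flag) VALUES (?, ?, ?)",
    [(1, "NY", None), (2, "NY", None), (3, "CA", None)],
)

total, stats = profile(conn, "customers")
print(total)
for col, (non_null, distinct) in stats.items():
    print(col, non_null, distinct)
```

A column whose non-NULL count is zero, like `flag` here, is exactly the kind of thing that jumps out the moment you look.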

To build a meaningful system, you MUST look at, and understand your data!  It’s really quite simple.

I’ll write about some of my typical data analysis techniques in subsequent posts.


One Response to Failure To Do Data Analysis

  1. Nice post.

    Like you it never ceases to amaze me how when developing information systems people don’t give data architecture related tasks – modelling, data quality etc – and overall data management appropriate priority.
