Install Sqlite And Query Firefox Log Files Using SQL

September 10, 2014

Firefox saves a lot of log information in SQLite database files that can be queried with sqlite, using standard SQL commands.

SQLite3

To query the files, you need a recent enough version of sqlite. It turns out that an old version was installed with my Red Hat Linux 6.4 installation, and I had to upgrade it before I could see the data. Here is how you can install the software and query the logs.

When I ran sample sqlite queries I found on the web, I got lots of errors:

sqlite3 ./cookies.sqlite 'select count() from moz_cookies'
Error: file is encrypted or is not a database
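
The same kind of check can be run from Python instead of the command line. The block below is only a minimal sketch, not the method used in this post: the helper names count_cookies and make_demo_db are hypothetical, and the demo file is a tiny stand-in with made-up columns rather than a real Firefox profile. It prints the SQLite library version first, since a library that is too old for the file is one possible cause of the error above.

```python
import os
import sqlite3
import tempfile


def count_cookies(db_path):
    """Return the number of rows in moz_cookies for the given database file."""
    conn = sqlite3.connect(db_path)
    try:
        # count(*) is the portable spelling of SQLite's count()
        (total,) = conn.execute("SELECT count(*) FROM moz_cookies").fetchone()
        return total
    finally:
        conn.close()


def make_demo_db(db_path):
    """Create a tiny stand-in for cookies.sqlite (hypothetical columns and rows)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE moz_cookies (id INTEGER PRIMARY KEY, host TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO moz_cookies (host, name) VALUES (?, ?)",
        [("example.com", "session"), ("example.org", "prefs")],
    )
    conn.commit()
    conn.close()


if __name__ == "__main__":
    # The version of the SQLite library that Python is linked against
    print("SQLite library version:", sqlite3.sqlite_version)
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "cookies.sqlite")
        make_demo_db(demo)
        print("rows in moz_cookies:", count_cookies(demo))
```

To try it on a real profile, pass the path of a copy of cookies.sqlite to count_cookies; working on a copy avoids interfering with a running Firefox.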


Read the rest of this entry »


Analyzing Keywords in Enron’s Email

November 24, 2013

You probably can’t read all the 252K email messages in the Enron email dataset by yourself.

But with SQL it’s easy to search for keywords, like “Special Purpose Entity”, “Bankrupt”, “Fraud”, “Shutdown”,  “Talking Points”, “FERC” and so on. They begin to reveal what really went on inside the minds at Enron.
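
A keyword search of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the queries from the full post: it uses Python's sqlite3 module with an in-memory database in place of the real one, the column names (mid, sender, subject, body) are assumed, and the three sample rows are made up for the demo.

```python
import sqlite3

# Keywords taken from the paragraph above
KEYWORDS = [
    "Special Purpose Entity",
    "Bankrupt",
    "Fraud",
    "Shutdown",
    "Talking Points",
    "FERC",
]


def make_demo_db():
    """Build an in-memory stand-in for the message table (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "mid INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    rows = [  # made-up sample rows, not real Enron emails
        ("a@example.com", "notes", "Here are the talking points for the FERC call."),
        ("b@example.com", "re: notes", "Please review the talking points."),
        ("c@example.com", "lunch", "Lunch on Friday?"),
    ]
    conn.executemany(
        "INSERT INTO message (sender, subject, body) VALUES (?, ?, ?)", rows
    )
    return conn


def keyword_counts(conn, keywords):
    """Count the messages whose body contains each keyword."""
    counts = {}
    for kw in keywords:
        # SQLite's LIKE is case-insensitive for ASCII letters by default
        (n,) = conn.execute(
            "SELECT count(*) FROM message WHERE body LIKE ?", ("%" + kw + "%",)
        ).fetchone()
        counts[kw] = n
    return counts


if __name__ == "__main__":
    for kw, n in keyword_counts(make_demo_db(), KEYWORDS).items():
        print(kw, n)
```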

TALKING POINTS:

Many Enron employees took MBA courses at the UC Berkeley Haas School of Business.

After the Enron bankruptcy, classes at the UC Berkeley School of Information began analyzing Enron’s emails as early as 2004, as in this assignment: http://courses.ischool.berkeley.edu/i290-2/f04/assignments/a4_solutions/qu_poon.doc.

In this document, they search for “Talking Points”: especially persuasive points that help support an argument or discussion.

Read the rest of this entry »


Enron – A Few Good Guys

November 19, 2013

Sherron Watkins:

Sherron Watkins is the former Vice President of Enron Corporation who alerted then-CEO Ken Lay in August 2001 to accounting irregularities within the company, warning him that Enron “might implode in a wave of accounting scandals.” From her website:
sherronwatkins.com/sherronwatkins/Sherrons_Bio.html

At the House Hearing on Enron, Sherron Watkins said:
“I wish we could get caught. We are such a crooked company.” Sherron Watkins, former Vice President of Corporate Development at Enron

In the emails made public, what can we find about Sherron Watkins? Unfortunately, not as much as we might hope.

There is no entry matching “Watkin” in the employeelist table. As with other senior executives, there are not many emails from or to Sherron Watkins.

In the entire email set, where the sender or receiver is sherron.watkins@enron.com, there are only 24 unique messages. If a group by is done on the sender and receiver, there are only 46 unique messages.
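
Counts like these can be sketched as follows. This is only an illustration under assumptions, not the original queries: the columns (message.mid, message.sender, recipientinfo.mid, recipientinfo.rtype, recipientinfo.rvalue) are assumed, the sample rows and example.com addresses are made up, and the sketch runs against an in-memory SQLite database, so it does not reproduce the numbers above.

```python
import sqlite3


def make_demo_db():
    """In-memory stand-in for the message and recipientinfo tables (assumed columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (mid INTEGER PRIMARY KEY, sender TEXT)")
    conn.execute(
        "CREATE TABLE recipientinfo ("
        "rid INTEGER PRIMARY KEY, mid INTEGER, rtype TEXT, rvalue TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (mid, sender) VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com"),
         (3, "b@example.com"), (4, "a@example.com")],
    )
    conn.executemany(
        "INSERT INTO recipientinfo (mid, rtype, rvalue) VALUES (?, ?, ?)",
        [(1, "TO", "b@example.com"), (1, "CC", "c@example.com"),
         (2, "TO", "a@example.com"), (3, "TO", "c@example.com"),
         (4, "TO", "b@example.com")],
    )
    return conn


def count_messages_involving(conn, address):
    """Count distinct messages where the address is the sender or a recipient."""
    (n,) = conn.execute(
        "SELECT count(DISTINCT m.mid) "
        "FROM message m LEFT JOIN recipientinfo r ON r.mid = m.mid "
        "WHERE m.sender = ? OR r.rvalue = ?",
        (address, address),
    ).fetchone()
    return n


def pairs_involving(conn, address):
    """Group by sender and receiver for rows that involve the address."""
    return conn.execute(
        "SELECT m.sender, r.rvalue, count(*) "
        "FROM message m JOIN recipientinfo r ON r.mid = m.mid "
        "WHERE m.sender = ? OR r.rvalue = ? "
        "GROUP BY m.sender, r.rvalue "
        "ORDER BY m.sender, r.rvalue",
        (address, address),
    ).fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(count_messages_involving(db, "a@example.com"))
    for row in pairs_involving(db, "a@example.com"):
        print(row)
```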

So, does this indicate:
– Sherron Watkins did not email much
– her assistants did her email for her
– the email list has been overly sanitized
– or something else?

If we search the body of the messages, we find Sherron Watkins in emails that she sent, that others then forwarded along. So, I would tend to think that the email list is a very small subset of actual email data. What we do find is rather interesting.

Sherron’s Resentment In The Behavior Of Others:   Read the rest of this entry »


Enron Email Analysis – Persons Of Interest

November 19, 2013

Persons Of Interest:

Continuing my analysis of the Enron scandal, I looked at some of its key players.

Kenneth Lay

A good list of who played which position at Enron is at:
http://enrondata.org/assets/edo_enron-custodians-data.html
and at:
http://www.infosys.tuwien.ac.at/staff/dschall/email/enron-employees.txt

What can we discover about Enron’s People Of Interest by analyzing their email with SQL?  Among other things, there was some very abusive management at Enron.

Person Of Interest – Andrew Fastow:

Interestingly, there is very little in the emails regarding Andrew Fastow, the CFO of Enron, who was one of the main culprits. He is rather absent in this dataset. Read the rest of this entry »


Analyzing Enron Email Metadata Using SQL

November 10, 2013

What can we discover analyzing Enron emails using SQL? Quite a bit actually.

The Enron scandal in 2001 was huge.  As part of the discovery process, prosecutors started looking at emails to find evidence to convict the guilty. These email sets have since been made public.

Recently, I downloaded a set of Enron emails from USC:
http://www.isi.edu/~adibi/Enron/Enron.htm
and installed them into a MySQL database.

There were over 252K email messages, sent to over 2 million recipients.

+-----------------+-----------+
| Tables_in_enron |     Count |
+-----------------+-----------+
| employeelist    |       151 |
| message         |   252,759 |
| recipientinfo   | 2,064,442 |
| referenceinfo   |    54,778 |
+-----------------+-----------+

Analyzing the emails produced some very interesting findings about what went on inside Enron!

MetaData:

Read the rest of this entry »


Recommendation Engines: RDBMS and SQL, Versus Graph Database

May 30, 2013

For years, I’ve worked with Oracle, doing complex SQL. Recently I’ve been looking at the graph database, NEO4J.

Last night I was watching a NEO4J webinar about Graphs for Gaming.
It presented some interesting Cypher (the NEO4J query language) queries for a gaming company’s recommendation engine. The point was that certain tasks are much easier in Graph DB/Cypher than in RDBMS/SQL.

The more complex query was to take an individual gamer, Rik, and find other users/gamers, who:
– had worked at one of the same companies as Rik
– spoke one of the same languages as Rik
– had NOT gamed with Rik yet

The 12 line Cypher query was:   Read the rest of this entry »


NEO4J: Finding Object Information Using the Web Interface

February 4, 2013

NEO4J has provided a very functional web interface to find information on objects.  If you run the following query,

START n = node(*)
WHERE has(n.name)
and (n.name="Lucy Liu")
return n

you will get one object back, the node for Lucy Liu. With the web interface, you can then click on the node and see much of the information about it. Right click, open in new tab.

Node 1000 Detail Lucy Liu

Read the rest of this entry »


Getting Started With NEO4J: For Database Professionals

February 3, 2013

Why Use the NEO4J Graph Database?

About 10 years ago, I went to a BIO-IT conference, and looked at a spiralling 3D model of a very large molecule made up of hundreds of atoms. I thought, “that does not look like the rows and columns in a relational database”. Read the rest of this entry »


The Parents And The Order Of Operations

December 29, 2011

When you build objects in a new environment, you need to build them in the right order of operations.

Until you have all the objects in place, you can’t create a procedure that references them all. And if those objects require more parent objects, they must be created first too. For example:

procedure reads
    view which reads a
       table which is composed of a
          type

—-

Recently I wrote in my other post about the parent and child dependencies. They give a lot of great information. But they only go one level in either direction. As you know, there can be many levels of objects.

What is the order of operations to build them? I’ve written some complex scripts here to find all the successive parents of an object.
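
The idea can be illustrated outside the database with a small sketch. It is not the script from this post: it assumes the dependencies have already been read into a Python dictionary (the DEPENDENCIES mapping below is a hypothetical stand-in built from the procedure/view/table/type example above) and uses the standard-library graphlib module to produce a build order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each object maps to the parent objects it depends on (hypothetical data)
DEPENDENCIES = {
    "procedure": {"view"},
    "view": {"table"},
    "table": {"type"},
}


def build_order(deps):
    """Return the objects in an order where every parent precedes its children."""
    return list(TopologicalSorter(deps).static_order())


def all_parents(deps, name):
    """Return every successive parent of `name`, nearest level first."""
    seen = []
    queue = sorted(deps.get(name, ()))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited through another path
        seen.append(current)
        queue.extend(sorted(deps.get(current, ())))
    return seen


if __name__ == "__main__":
    print(build_order(DEPENDENCIES))
    print(all_parents(DEPENDENCIES, "procedure"))
```

For the chain above, build_order puts the type first and the procedure last, and all_parents walks up from the procedure one level at a time.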

Read the rest of this entry »


Why I Like The Merge Statement

November 22, 2010

Over the past few years, I’ve stopped using the Update statement and started using the Merge statement for updates instead. While many use it to do both inserts and updates at the same time, you can also use Merge for updates only, or for inserts only.

Say I want to update some fields in the SCOTT schema.  Traditionally, I would first do the analysis by looking at the data.
Read the rest of this entry »