March 4, 2014
Have you seen the map of all the different databases on the market today from 451 Group?
Database Landscape map – June 2013
There used to be just a few relational databases: DB2, MySQL, Oracle, SQL Server, Sybase.
Now, there are actually twenty (20) different categories of databases! And well over 100 different vendors.
How many can you name?
How many have you installed?
How many have you used?
The full article is here:
December 28, 2013
Arterial Pulse Waveform
Recently, I saw a Master’s thesis defense at my alma mater, the University of Winnipeg, in Applied Computer Science. Jingjing Xia analyzed the shape of heartbeats in patients known to have different diseases. By applying math and algorithms, she was able to find strong correlations between the disease and the shape of the heartbeat.
Pulse wave analysis has actually been around since Mahomed wrote about it in 1872 and there has been more research since then. Researchers have studied the shape of the heartbeat, made measurements of the different sections of the graph, and applied math to it.
First, Xia determined an equation to fit the pulse wave: a complex sum of eight sine waves. Then the first and third derivatives of that equation were taken. From the derivatives, the locations of different points of the heartbeat could be identified: the wave foot, the systolic peak, and the reflected point. From these numbers, the Reverse Shoulder Index (RSI) and the Ratio of Distance were determined.
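To make the idea concrete, here is a minimal sketch of the first step only: a pulse modeled as a sum of eight sine waves, its analytic first derivative, and the systolic peak located where that derivative changes sign from positive to negative. The amplitudes, phases, and period are made-up illustrative values, not coefficients from the thesis, and the function names are my own. The wave foot and reflected point, which rely on the third derivative, are not attempted here.

```python
import math

# Hypothetical coefficients for illustration only (NOT from the thesis).
AMPLITUDES = [1.0, 0.5, 0.25, 0.12, 0.06, 0.03, 0.015, 0.008]
PHASES = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8]
PERIOD = 1.0  # assumed length of one heartbeat, in seconds


def pulse(t):
    """Sum of eight sine waves evaluated at time t."""
    total = 0.0
    for k, (a, p) in enumerate(zip(AMPLITUDES, PHASES)):
        w = 2 * math.pi * (k + 1) / PERIOD  # angular frequency of harmonic k+1
        total += a * math.sin(w * t + p)
    return total


def pulse_derivative(t):
    """Analytic first derivative: d/dt a*sin(w*t + p) = a*w*cos(w*t + p)."""
    total = 0.0
    for k, (a, p) in enumerate(zip(AMPLITUDES, PHASES)):
        w = 2 * math.pi * (k + 1) / PERIOD
        total += a * w * math.cos(w * t + p)
    return total


def local_maxima(steps=2000):
    """Find times in one period where the derivative goes from + to -."""
    peaks = []
    for i in range(steps):
        lo = i * PERIOD / steps
        hi = (i + 1) * PERIOD / steps
        if pulse_derivative(lo) > 0 and pulse_derivative(hi) <= 0:
            # Refine the sign change by bisection.
            for _ in range(60):
                mid = (lo + hi) / 2
                if pulse_derivative(mid) > 0:
                    lo = mid
                else:
                    hi = mid
            peaks.append((lo + hi) / 2)
    return peaks


# Treat the tallest local maximum as the systolic peak.
systolic_peak = max(local_maxima(), key=pulse)
print(f"systolic peak at t = {systolic_peak:.4f} s")
```

With real data, the coefficients would come from fitting the equation to a measured waveform rather than from a hand-written list.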
Once all the numbers were known, correlations were run between the different measurements and the known diseases of the patients. A number of correlations were found between the heartbeat’s pulse wave and various cardiovascular conditions: coronary heart disease, hypertension, and chest pain.
One thought is that this has a lot of the ingredients of data science: mathematics & statistics, computer science & algorithms, data and a distinct subject area.
November 24, 2013
You probably can’t read all the 252K email messages in the Enron email dataset by yourself.
But with SQL it’s easy to search for keywords, like “Special Purpose Entity”, “Bankrupt”, “Fraud”, “Shutdown”, “Talking Points”, “FERC” and so on. They begin to reveal what really went on inside the minds at Enron.
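As a rough sketch of what such a keyword search looks like, the snippet below runs a LIKE query for each keyword. It uses Python's built-in sqlite3 module as a stand-in for MySQL, a simplified message table whose column names (mid, sender, subject, body) are assumptions, and three made-up placeholder rows rather than real Enron emails.

```python
import sqlite3

KEYWORDS = ["Special Purpose Entity", "Bankrupt", "Fraud",
            "Shutdown", "Talking Points", "FERC"]


def build_demo_db():
    """Create an in-memory database with placeholder rows (not real emails)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "mid INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (mid, sender, subject, body) VALUES (?, ?, ?, ?)",
        [
            (1, "a@example.com", "Talking points for Monday",
             "Placeholder text about FERC filings."),
            (2, "b@example.com", "Lunch",
             "Placeholder text with no keywords."),
            (3, "c@example.com", "Status",
             "Placeholder text mentioning fraud and FERC."),
        ],
    )
    return conn


def count_matches(conn, keyword):
    """Count messages whose subject or body contains the keyword."""
    pattern = f"%{keyword}%"
    row = conn.execute(
        "SELECT COUNT(*) FROM message WHERE subject LIKE ? OR body LIKE ?",
        (pattern, pattern),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo_db()
    for keyword in KEYWORDS:
        print(keyword, count_matches(conn, keyword))
```

Passing the keyword as a bound parameter, instead of pasting it into the SQL string, keeps quotes inside a search term from breaking the query.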
Many Enron employees took MBA courses at UC Berkeley’s Haas School of Business.
After the Enron bankruptcy, classes at the UC Berkeley School of Information began analyzing Enron’s emails as early as 2004. Here is one example: http://courses.ischool.berkeley.edu/i290-2/f04/assignments/a4_solutions/qu_poon.doc.
In this document, they search for “Talking Points”: especially persuasive points that help support an argument or discussion.
November 19, 2013
Sherron Watkins is the former Vice President of Enron Corporation who alerted then-CEO Ken Lay in August 2001 to accounting irregularities within the company, warning him that Enron ‘might implode in a wave of accounting scandals.’ From her website:
At the House Hearing on Enron, Sherron Watkins said:
“I wish we could get caught. We are such a crooked company.” (Sherron Watkins, former Vice President of Corporate Development at Enron)
In the emails made public, what can we find about Sherron Watkins? Unfortunately, not as much as we might hope.
There is no entry matching “Watkin” in the employeelist table. Again, as with other senior executives, there are not many emails from or to Sherron Watkins.
In the entire email set, where the sender or receiver is firstname.lastname@example.org, there are only 24 unique messages. If a group by is done on the sender and receiver, there are only 46 unique messages.
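For illustration, the two counts can be sketched as follows. This uses Python's built-in sqlite3 as a stand-in for MySQL, simplified message and recipientinfo tables whose column names (mid, sender, rvalue) are assumptions, and placeholder addresses. One possible reason the grouped count comes out higher than the distinct count is that a single message sent to several recipients shows up under several sender/receiver pairs, which the toy data below reproduces.

```python
import sqlite3


def build_demo_db():
    """In-memory database with placeholder rows (not real Enron data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE message (mid INTEGER PRIMARY KEY, sender TEXT);
        CREATE TABLE recipientinfo (
            rid INTEGER PRIMARY KEY, mid INTEGER, rtype TEXT, rvalue TEXT);
    """)
    conn.executemany("INSERT INTO message VALUES (?, ?)", [
        (1, "person.a@example.com"),
        (2, "person.b@example.com"),
        (3, "person.c@example.com"),
    ])
    conn.executemany("INSERT INTO recipientinfo VALUES (?, ?, ?, ?)", [
        (1, 1, "TO", "person.b@example.com"),
        (2, 1, "CC", "person.c@example.com"),
        (3, 2, "TO", "person.a@example.com"),
        (4, 3, "TO", "person.b@example.com"),
    ])
    return conn


def unique_message_count(conn, address):
    """Distinct messages where the address is the sender or a recipient."""
    sql = """
        SELECT COUNT(DISTINCT m.mid)
        FROM message m
        JOIN recipientinfo r ON r.mid = m.mid
        WHERE m.sender = ? OR r.rvalue = ?
    """
    return conn.execute(sql, (address, address)).fetchone()[0]


def pair_counts(conn, address):
    """Rows grouped by (sender, receiver) for the same filter."""
    sql = """
        SELECT m.sender, r.rvalue, COUNT(*)
        FROM message m
        JOIN recipientinfo r ON r.mid = m.mid
        WHERE m.sender = ? OR r.rvalue = ?
        GROUP BY m.sender, r.rvalue
        ORDER BY m.sender, r.rvalue
    """
    return conn.execute(sql, (address, address)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    who = "person.a@example.com"
    print(unique_message_count(conn, who))
    for row in pair_counts(conn, who):
        print(row)
```

In the toy data, the address is involved in two distinct messages but three sender/receiver pairs, because message 1 has two recipients.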
So, does this indicate:
- Sherron Watkins did not email much
- her assistants did her email for her
- the email list has been overly sanitized
- or something else?
If we search the body of the messages, we find Sherron Watkins in emails that she sent and that others then forwarded along. So I tend to think that the email list is a very small subset of the actual email data. What we do find is rather interesting.
Sherron’s Resentment In The Behavior Of Others:
November 19, 2013
Persons Of Interest:
Continuing my analysis of the Enron scandal, I looked at some of its key players.
A good list of who played which position at Enron is at:
What can we discover about Enron’s People Of Interest by analyzing their email with SQL? Among other things, there was some very abusive management at Enron.
Person Of Interest – Andrew Fastow:
Interestingly, there is very little in the emails regarding Andrew Fastow, the CFO of Enron, who was one of the main culprits. He is rather absent in this dataset.
November 10, 2013
What can we discover by analyzing Enron emails using SQL? Quite a bit, actually.
The Enron scandal in 2001 was huge. As part of the discovery process, prosecutors started looking at emails to find evidence to convict the guilty. These email sets have since been made public.
Recently, I downloaded a set of Enron emails from USC:
and installed them into a MySQL database.
There were over 252K email messages, sent to over 2 million recipients.
| Tables_in_enron | Count
| employeelist | 151
| message | 252,759
| recipientinfo | 2,064,442
| referenceinfo | 54,778
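A table of row counts like the one above can be produced with one COUNT(*) query per table. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL and uses tiny placeholder tables, so the numbers it prints are not the real dataset's.

```python
import sqlite3

# Table names taken from the listing above.
TABLES = ["employeelist", "message", "recipientinfo", "referenceinfo"]


def build_demo_db():
    """Create placeholder tables with a few dummy rows each."""
    conn = sqlite3.connect(":memory:")
    for size, name in enumerate(TABLES, start=1):
        conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
        conn.executemany(
            f"INSERT INTO {name} (id) VALUES (?)",
            [(i,) for i in range(size)],
        )
    return conn


def table_counts(conn, tables=TABLES):
    """Return {table name: row count} for each table in the fixed list."""
    counts = {}
    for name in tables:
        # Table names cannot be bound as SQL parameters, so only names
        # from the fixed TABLES list are interpolated here.
        counts[name] = conn.execute(
            f"SELECT COUNT(*) FROM {name}"
        ).fetchone()[0]
    return counts


if __name__ == "__main__":
    for name, count in table_counts(build_demo_db()).items():
        print(f"| {name} | {count:,}")
```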
Analyzing the emails produced some very interesting findings about what went on inside Enron!
August 19, 2013
As I pointed out in my last post, Counting Many Paths Between Nodes In NEO4J, the more complex and interconnected the graph, the faster the number of paths between nodes grows: it goes up exponentially. However, those queries did not use the built-in shortestPath functionality.
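A small sketch of why the path count explodes: in a hypothetical layered graph where every node in one layer connects to every node in the next, the number of start-to-end paths is the layer width raised to the number of layers. This is plain Python rather than Cypher, and the graph shape and function names are my own illustration, not the graph from the earlier post.

```python
def build_layered_graph(layers, width):
    """Directed graph: start -> layer 0 -> ... -> last layer -> end.

    Every node in a layer links to every node in the next layer.
    Requires layers >= 1 and width >= 1.
    """
    graph = {"start": [f"L0_{j}" for j in range(width)], "end": []}
    for i in range(layers):
        for j in range(width):
            if i + 1 < layers:
                graph[f"L{i}_{j}"] = [f"L{i + 1}_{k}" for k in range(width)]
            else:
                graph[f"L{i}_{j}"] = ["end"]
    return graph


def count_paths(graph, node, target):
    """Count distinct directed paths from node to target (graph is acyclic)."""
    if node == target:
        return 1
    return sum(count_paths(graph, nxt, target) for nxt in graph[node])


if __name__ == "__main__":
    for layers in range(1, 6):
        graph = build_layered_graph(layers, width=3)
        print(layers, count_paths(graph, "start", "end"))
```

Each added layer multiplies the count by the width, which is the exponential growth described above.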
Last winter, I watched the video Cypher for SQL Professionals, with Andres Taylor. In the video, there were a number of queries against the Cineasts database that did use the shortestPath functionality.
One of the queries in the video was Bacon Lucy: how many actor and movie nodes separate the performers Kevin Bacon and Lucy Liu?
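To show the idea behind a shortest-path query without Neo4j, here is a minimal breadth-first search over an actor/movie graph in plain Python. It is a stand-in for what shortestPath computes, not how Neo4j implements it, and the actors and movies are placeholders rather than real filmography.

```python
from collections import deque

# Placeholder (actor, movie) pairs: hypothetical data, not from Cineasts.
EDGES = [
    ("Actor A", "Movie 1"),
    ("Actor B", "Movie 1"),
    ("Actor B", "Movie 2"),
    ("Actor C", "Movie 2"),
    ("Actor D", "Movie 3"),
]


def build_adjacency(edges):
    """Undirected adjacency lists: actors link to movies and back."""
    adjacency = {}
    for actor, movie in edges:
        adjacency.setdefault(actor, []).append(movie)
        adjacency.setdefault(movie, []).append(actor)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(shortest_path(graph, "Actor A", "Actor C"))
```

Because breadth-first search explores the graph level by level, the first time it reaches the goal it has found a path with the fewest hops, without enumerating every possible path.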