Creating Connectivity Between RStudio & SQL Server 2017

January 1, 2019

Intro:

For Data Science, SQL Server 2017 has optional built-in R functionality. It also has the ability to call RStudio from within SQL Server. The prerequisite is reliable connectivity between SQL Server and RStudio.

Here are the steps to configure SQL Server and R to work with each other.

——————————————————-

Initial Technology Stack To Install:



Statistical Analysis on Changing Work Values in the USA

June 18, 2014

Abstract:

Statistical analysis of the USA General Social Survey (GSS) data shows that work values changed radically after the 2008 recession. What used to be the most important value, “Work important and feel accomplishment”, became the least important. What used to be the least important values, “Short working hours” and “No danger of being fired”, became the most important in 2012.

 

Core Values


Background:


Database Landscape Map From 451 Group

March 4, 2014

Have you seen the map of all the different databases on the market today from 451 Group?

Database Landscape map – June 2013

There used to be just a few relational databases: DB2, MySQL, Oracle, SQL Server, Sybase.

Now there are actually twenty (20) different categories of databases, and well over 100 different vendors!

How many can you name?
How many have you installed?
How many have you used?

The full article is here:
http://blogs.the451group.com/information_management/2013/06/10/updated-database-landscape-map-june-2013/


Detecting Diseases By Analyzing the Pulse Waves of Heartbeats

December 28, 2013

Arterial Pulse Waveform

Recently, I saw a Master’s thesis defense at my alma mater, the University of Winnipeg, in Applied Computer Science. Jingjing Xia analyzed the shape of heartbeats in patients known to have different diseases. By applying math and algorithms, she was able to find strong correlations between the disease and the shape of the heartbeat.

Pulse wave analysis has actually been around since Mahomed wrote about it in 1872 and there has been more research since then. Researchers have studied the shape of the heartbeat, made measurements of the different sections of the graph, and applied math to it.

Summary:

First, Xia determined an equation to fit the pulse wave: a complex sum of eight sine waves. Then the first and third derivatives of the equation were taken. From the derivatives, the locations of different points of the heartbeat could be identified: the wave foot, systolic peak, and reflected point. From these numbers, the Reverse Shoulder Index (RSI) and Ratio of Distance were determined.
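The derivative step can be illustrated with a small sketch. This is not Xia's fitted model: it uses three sine terms instead of eight, the amplitudes, frequencies, and phases are made-up values, and it locates only the systolic peak (taken here, as an assumption, to be the highest local maximum in one period). The function names are hypothetical.

```python
import math

# Hypothetical (amplitude, angular frequency, phase) triples, NOT fitted values.
TERMS = [
    (1.0, 2 * math.pi, 0.0),
    (0.4, 4 * math.pi, 0.5),
    (0.1, 6 * math.pi, 1.0),
]

def wave(t, n=0):
    """Return the n-th time derivative of the sine sum at time t."""
    # d^n/dt^n sin(w*t + p) = w**n * sin(w*t + p + n*pi/2)
    return sum(a * w**n * math.sin(w * t + p + n * math.pi / 2)
               for a, w, p in TERMS)

def systolic_peak(period=1.0, steps=10_000):
    """Return the time of the highest local maximum within one period."""
    dt = period / steps
    best_t, best_y = None, -math.inf
    for i in range(steps):
        lo, hi = i * dt, (i + 1) * dt
        # A local maximum lies where the first derivative goes from + to -.
        if wave(lo, 1) > 0 >= wave(hi, 1):
            for _ in range(60):  # bisection on the first derivative
                mid = (lo + hi) / 2
                if wave(mid, 1) > 0:
                    lo = mid
                else:
                    hi = mid
            if wave(hi) > best_y:
                best_t, best_y = hi, wave(hi)
    return best_t
```

Because every term is a sine, each derivative is available in closed form, so `wave(t, 3)` gives the third derivative without any numerical differentiation.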

Once all the numbers were known, correlations were run between the different measurements and the known diseases of the patients. A number of correlations were found between the heartbeat’s pulse wave and various cardiovascular diseases: coronary heart disease, hypertension, and chest pain.

Thoughts:

One thought is that this has a lot of the ingredients of data science: mathematics & statistics, computer science & algorithms, data, and a distinct subject area.


Analyzing Keywords in Enron’s Email

November 24, 2013

You probably can’t read all the 252K email messages in the Enron email dataset by yourself.

But with SQL it’s easy to search for keywords, like “Special Purpose Entity”, “Bankrupt”, “Fraud”, “Shutdown”,  “Talking Points”, “FERC” and so on. They begin to reveal what really went on inside the minds at Enron.
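A keyword search of this kind might look like the sketch below. It runs against an in-memory SQLite database so that it is self-contained, whereas the analysis here used MySQL. The `subject` and `body` column names are assumptions about the `message` table, the sample rows are invented placeholders rather than real Enron emails, and the function names are hypothetical.

```python
import sqlite3

def build_demo_db():
    """Create a tiny stand-in for the message table (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE message ("
        "mid INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (sender, subject, body) VALUES (?, ?, ?)",
        [
            ("a@example.com", "Talking points for Monday", "Draft attached."),
            ("b@example.com", "Lunch", "See the FERC filing."),
            ("c@example.com", "FERC update", "New talking points inside."),
        ],
    )
    return conn

def count_keyword_hits(conn, keywords):
    """Return {keyword: number of messages whose subject or body contains it}."""
    sql = "SELECT COUNT(*) FROM message WHERE subject LIKE ? OR body LIKE ?"
    hits = {}
    for kw in keywords:
        pattern = f"%{kw}%"  # bound as a parameter, never spliced into the SQL
        hits[kw] = conn.execute(sql, (pattern, pattern)).fetchone()[0]
    return hits

if __name__ == "__main__":
    print(count_keyword_hits(build_demo_db(), ["Talking Points", "FERC", "Fraud"]))
```

SQLite's `LIKE` ignores case for ASCII letters by default, which is why “Talking Points” also matches “talking points” in this sketch.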

TALKING POINTS:

Many Enron employees took MBA courses at the UC Berkeley Haas School of Business.

After the Enron bankruptcy, classes at the UC Berkeley School of Information began to analyze Enron’s emails, as early as 2004. Like this one: http://courses.ischool.berkeley.edu/i290-2/f04/assignments/a4_solutions/qu_poon.doc.

In this document, they search for “Talking Points”: an especially persuasive point helping to support an argument or discussion.



Enron – A Few Good Guys

November 19, 2013

Sherron Watkins:

Sherron Watkins

Sherron Watkins is the former Vice President of Enron Corporation who alerted then-CEO Ken Lay in August 2001 to accounting irregularities within the company, warning him that Enron ‘might implode in a wave of accounting scandals.’ From her website:
sherronwatkins.com/sherronwatkins/Sherrons_Bio.html

At the House Hearing on Enron, Sherron Watkins said:
“I wish we could get caught. We are such a crooked company.” (Sherron Watkins, former Vice President of Corporate Development at Enron)

In the emails made public, what can we find about Sherron Watkins? Unfortunately, not as much as we might hope.

There is no entry matching “Watkin” in the employeelist table. As with other senior executives, there are not many emails from or to Sherron Watkins.

In the entire email set, where the sender or receiver is sherron.watkins@enron.com, there are only 24 unique messages. If a group by is done on the sender and receiver, there are only 46 unique messages.
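A count like the first one could be produced along these lines. This is only a sketch: it uses an in-memory SQLite database rather than MySQL, the `sender`, `mid`, and `rvalue` column names are assumptions about the `message` and `recipientinfo` tables, and the addresses and rows are invented placeholders. The function names are hypothetical.

```python
import sqlite3

def build_demo_mail_db():
    """Create small stand-ins for message and recipientinfo (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (mid INTEGER PRIMARY KEY, sender TEXT)")
    conn.execute(
        "CREATE TABLE recipientinfo ("
        "rid INTEGER PRIMARY KEY, mid INTEGER, rvalue TEXT)"
    )
    conn.executemany(
        "INSERT INTO message (mid, sender) VALUES (?, ?)",
        [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")],
    )
    conn.executemany(
        "INSERT INTO recipientinfo (mid, rvalue) VALUES (?, ?)",
        [
            (1, "b@example.com"),
            (2, "a@example.com"),
            (2, "c@example.com"),
            (3, "b@example.com"),
        ],
    )
    return conn

def count_messages_involving(conn, address):
    """Count distinct messages sent by, or addressed to, the given address."""
    sql = (
        "SELECT COUNT(DISTINCT m.mid) "
        "FROM message m "
        "LEFT JOIN recipientinfo r ON r.mid = m.mid "
        "WHERE m.sender = ? OR r.rvalue = ?"
    )
    return conn.execute(sql, (address, address)).fetchone()[0]

if __name__ == "__main__":
    print(count_messages_involving(build_demo_mail_db(), "a@example.com"))
```

`COUNT(DISTINCT m.mid)` matters here because the join produces one row per recipient, so a message sent to several people would otherwise be counted more than once.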

So, does this indicate:
– Sherron Watkins did not email much
– her assistants did her email for her
– the email list has been overly sanitized
– or something else?

If we search the body of the messages, we find Sherron Watkins in emails that she sent and that others then forwarded along. So, I would tend to think that the email list is a very small subset of the actual email data. What we do find is rather interesting.

Sherron’s Resentment In The Behavior Of Others:


Enron Email Analysis – Persons Of Interest

November 19, 2013

Persons Of Interest:

Continuing my analysis of the Enron scandal, I looked at some of the key players.

Kenneth Lay

A good list of who played which position at Enron is at:
http://enrondata.org/assets/edo_enron-custodians-data.html
and at:
http://www.infosys.tuwien.ac.at/staff/dschall/email/enron-employees.txt

What can we discover about Enron’s People Of Interest by analyzing their email with SQL?  Among other things, there was some very abusive management at Enron.

Person Of Interest – Andrew Fastow:

Interestingly, there is very little in the emails regarding Andrew Fastow, the CFO of Enron, who was one of the main culprits. He is largely absent from this dataset.


Analyzing Enron Email Metadata Using SQL

November 10, 2013

enron-logo

What can we discover by analyzing Enron emails using SQL? Quite a bit, actually.

The Enron scandal in 2001 was huge.  As part of the discovery process, prosecutors started looking at emails to find evidence to convict the guilty. These email sets have since been made public.

Recently, I downloaded a set of Enron emails from USC:
http://www.isi.edu/~adibi/Enron/Enron.htm
and installed them into a MySQL database.

There were over 252K email messages, sent to over 2 million recipients.

+-----------------+-----------+
| Tables_in_enron |     Count |
+-----------------+-----------+
| employeelist    |       151 |
| message         |   252,759 |
| recipientinfo   | 2,064,442 |
| referenceinfo   |    54,778 |
+-----------------+-----------+

Analyzing the emails produced some very interesting findings about what went on inside Enron!

MetaData:
