DATA SCIENCE MIS0855 | Spring 2016 Storing and Retrieving Data SungYong Um
The Database: A collection of files, organized as tables. Tables are made up of records (rows). Rows are made up of fields (columns). Fields are made up of characters.
Database Types: Flat file database; Relational database
Flat file database

EmpNo  Ename    DeptNo  DeptName
101    Abigail  10      Marketing
102    Bob      20      Purchasing
103    Carolyn  10      Marketing
104    Doug     20      Purchasing
105    Evelyn   10      Marketing

Relational database

EmpNo  Ename    DeptNo
101    Abigail  10
102    Bob      20
103    Carolyn  10
104    Doug     20
105    Evelyn   10

DeptNo  DeptName
10      Marketing
20      Purchasing
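To make the difference concrete, here is a minimal sketch that stores the relational version in SQLite and joins the two tables to rebuild the flat-file rows. The table names Employee and Department and the helper flat_view are assumptions made for illustration; the slide names only the columns.

```python
import sqlite3

# In-memory database. Column names follow the slide; the table names
# "Employee" and "Department" are assumed for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Employee (
        EmpNo  INTEGER PRIMARY KEY,
        Ename  TEXT,
        DeptNo INTEGER REFERENCES Department(DeptNo)
    );
""")
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(10, "Marketing"), (20, "Purchasing")],
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(101, "Abigail", 10), (102, "Bob", 20), (103, "Carolyn", 10),
     (104, "Doug", 20), (105, "Evelyn", 10)],
)


def flat_view(connection):
    """Join the two tables to reproduce the flat-file rows."""
    return connection.execute("""
        SELECT e.EmpNo, e.Ename, e.DeptNo, d.DeptName
        FROM Employee AS e
        JOIN Department AS d ON d.DeptNo = e.DeptNo
        ORDER BY e.EmpNo
    """).fetchall()


rows = flat_view(conn)
for row in rows:
    print(row)
```

Each department name is stored once in the relational version, so renaming a department means changing a single row rather than every employee record that mentions it.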
Back to Big Data: Velocity, Variety, Volume
“Big Data” is a set of technologies. It is not data analytics, or information, or knowledge. It’s a way of processing large amounts of data, not extracting insight from it.
Why does Facebook (or Twitter, or Instagram) care how you feel? Do you think this is a problem?
Reminder: Looking for words… Yelp reviews for Poi Dog Snack Shop. “Positive” Word Library: Great, Fantastic, Best, Wonderful, Outstanding, Amazing
How we’ll do sentiment analysis: Retrieve Tweets using a spreadsheet add-in in Google Drive. Copy the Tweets from Google Drive to a special Excel workbook. Excel classifies the Tweets as positive or negative.
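The word-matching idea behind the classification can be sketched in a few lines of Python. This is not the course's Excel workbook, only an illustration of the approach: the positive words come from the “Positive” Word Library slide, while the negative list, the "neutral" label for ties, and the sample tweets are assumptions added for this sketch.

```python
import re

# Positive words taken from the "Positive" Word Library slide.
POSITIVE_WORDS = {"great", "fantastic", "best", "wonderful",
                  "outstanding", "amazing"}
# Hypothetical negative library; the slides do not list one.
NEGATIVE_WORDS = {"bad", "terrible", "worst", "awful",
                  "disappointing", "horrible"}


def classify(text):
    """Label text by comparing counts of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"  # assumption: ties and no matches are left unlabeled


# Made-up sample tweets, for illustration only.
for tweet in ["The best poke, amazing service!",
              "Worst lunch ever, terrible.",
              "I had lunch."]:
    print(classify(tweet), "|", tweet)
```

Counting words this way ignores context such as negation ("not great") and sarcasm, which is one reason a simple word library can misclassify.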
“Big Data” 101: It’s just a set of technologies… This isn’t the only option, but it is among the most popular… What is their purpose?
What do… Scenario: There is an extremely large database with constantly changing data. My regular database and computer can’t keep up with this amount of data. Things get slow, or break down completely.
What do… Stores big databases in smaller pieces across a network of connected computers. Breaks up the task and gives each connected computer a small piece to work on. Those jobs can be anything!
An example… Stores real-time cable box activity for 5,000,000 customers, by region. Analyzes which programs people are most likely to pause and then skip commercials.
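The split, process, and combine pattern described above can be simulated on one machine. In this sketch each region's list stands in for the piece held by one connected computer; the event records, program names, and function names are all hypothetical, and a real system would run the per-piece step on separate machines rather than in a loop.

```python
from collections import Counter, defaultdict
from functools import reduce

# Hypothetical events: (region, program, paused_then_skipped_commercials)
EVENTS = [
    ("East", "News at 6", True),
    ("East", "Cooking Hour", False),
    ("West", "News at 6", True),
    ("West", "Late Movie", True),
    ("East", "Late Movie", False),
    ("West", "News at 6", False),
]


def partition_by_region(events):
    """Split the data into smaller pieces, one per region."""
    parts = defaultdict(list)
    for event in events:
        parts[event[0]].append(event)
    return parts


def count_skips(partition):
    """Work done on one piece: count pause-then-skip events per program."""
    return Counter(program for _, program, skipped in partition if skipped)


def combine(partial_counts):
    """Merge the partial results from every piece into one total."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


partitions = partition_by_region(EVENTS)
partials = [count_skips(piece) for piece in partitions.values()]
totals = combine(partials)
print(totals.most_common())
```

Because each piece is counted independently and the partial counts are simply added, the per-piece work could be handed to as many computers as there are pieces without changing the result.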