Published by Roxana Voyce. Modified over 9 years ago.
1
Data Science for Tackling the Challenges of Big Data
Dr. Brand Niemann
Director and Senior Data Scientist/Data Journalist
Semantic Community
November 14, 2014
2
Overview
Six-Week MIT Online Course: Started November 4th and completed November 12th.
Mined this MIT Online Course for Data Sets and Ideas: Found a subset of the slides that contained data sets and ideas and were interesting and useful visualizations in themselves.
Professor Karger's Lecture Slides on Visualization User Interfaces Were All About My Heroes: Tukey, Tufte, Shneiderman, and Spotfire. (In fact, it was everything leading up to Spotfire, except Spotfire itself!)
Preserve My Work & Present Tutorial to the Federal Big Data Working Group Meetup: MindTouch Knowledge Base, Excel Spreadsheet Index, and Spotfire Interactive Visualizations.
3
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Assessment
Web Site (private)
4
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Progress
5
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Big Data Storage
Web Site (private)
6
MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Modern Databases
Script Web Site (private) and Script (Public)
7
Courseware: Big Data Storage
I was especially interested in the following since both Professors Stonebraker and Madden presented to our Federal Big Data Working Group Meetup:
This module begins with an overview of a number of these technologies by renowned database professor Mike Stonebraker. In his unique and ardent fashion, Mike expresses his skepticism about many new technologies, particularly Hadoop/MapReduce and NoSQL, and voices support for many new relational technologies, including column stores and main memory databases. After that, Professors Matei Zaharia and Samuel Madden provide a more nuanced view of the tradeoffs between the various approaches, discussing Hadoop and its derivatives, as well as NoSQL and its tradeoffs, in more detail.
Professor Stonebraker expresses a number of strong opinions in this module. Which of them do you agree with? Which do you disagree with? Why?
3.0 Introduction to Big Data Storage and Discussion 3
8
Selected Slides: Professor Sam Madden
What Is This Course Going to Cover? Other Techniques We'll Cover
9
Selected Slides: Professor David Karger
Overview Interaction Strategy
10
Selected Slides: Professor Daniela Rus
Case Study: Transportation in Singapore 1.1 Case Study: Transportation - PDF of Presentation slides (Rus)
11
Google Search: Singapore Taxi Data
12
Think Business: Why can’t I find a taxi when I really need one?
Based on: Labor Supply Decisions of Singaporean Cab Drivers, May 8, 2013
Newer Paper: Labor Supply Decisions of Singaporean Cab Drivers, September 2014
13
Labor Supply Decisions of Singaporean Cab Drivers: Table 1: Summary Statistics by Days
14
MIT Big Data Knowledge Base: Table 1 Spreadsheet
My Note: The table was in an image PDF, so I had to build the spreadsheet by hand! Spreadsheet
15
Singapore Land Transport Authority: Traffic Info Service Providers
16
Singapore Land Transport Authority: MyTransport.sg
Screen Scrape
17
Singapore Land Transport Authority: All Datasets Spreadsheet
18
MIT Big Data Knowledge Base: MindTouch
Labor Supply Decisions of Singaporean Cab Drivers, September 2014, as a Data Science Data Publication
Data Science for Tackling the Challenges of Big Data
19
MIT Big Data: Knowledge Base Spreadsheet
20
MIT Big Data: Course Participant Spreadsheet
My Note: This was mapped in Spotfire after data curation (cleaning of the country names). Spotfire has built-in data curation functions. Spreadsheet
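For readers without Spotfire, the country-name cleaning step mentioned in the note could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the alias table, the function names `clean_country` and `count_by_country`, and the sample spellings are all hypothetical, and they are not taken from the course participant spreadsheet or from Spotfire's own curation functions.

```python
import unicodedata
from collections import Counter

# Hypothetical alias table (an assumption for illustration only):
# maps lower-cased variant spellings to one canonical country name.
COUNTRY_ALIASES = {
    "usa": "United States",
    "u.s.a.": "United States",
    "us": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
}


def clean_country(raw):
    """Return a canonical-looking country name for one raw entry."""
    # Normalize Unicode forms, then trim and collapse runs of whitespace.
    text = unicodedata.normalize("NFKC", raw)
    text = " ".join(text.split())
    # Look up known variants case-insensitively.
    key = text.casefold()
    if key in COUNTRY_ALIASES:
        return COUNTRY_ALIASES[key]
    # Crude fallback: title-case unknown names. This also capitalizes
    # small words such as "and", so a real cleanup would need more care.
    return text.title()


def count_by_country(raw_names):
    """Count participants per cleaned country name, skipping blanks."""
    return Counter(clean_country(n) for n in raw_names if n and n.strip())
```

Calling `count_by_country` on a column of raw country strings gives one count per cleaned name, which is the kind of table that can then be mapped. The alias table would have to be built by inspecting the actual variants in the data.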
21
MIT Big Data: Spotfire Cover Page
Web Player
22
MIT Big Data: Student Enrollment
Web Player
23
MIT Big Data: Singaporean Cab Drivers
Web Player
24
New York City Open Data: Socrata
25
New York City Open Data: Search Results
My Note: Could Only Find Taxi Drivers Data. Web Site
26
New York City Open Data: Data Table
Download: XLSX Web Site and Medallion_Drivers_-_Active.xlsx
27
Visualizing NYC’s Open Data: Socrata Beta
28
MIT Big Data Assessment: Questions and Answers
Big Data Collection
2) Data science requires:
Knowledge of statistics
Knowledge of data management
Knowledge of curation
All of the above - correct
Big Data Systems
13) For which of the following tasks is interactive visualization most useful? (choose all that apply)
Developing a hypothesis about data - correct
Formally confirming a hypothesis
Communicating a conclusion about data - correct
All of the above
Big Data Analytics
13) Big Data means that there's no shortage of useful data.
True
False - correct
Story