Introduction Everything Data CompSci 290.01 Spring 2014.

Slides:



Advertisements
Similar presentations
1 Introduction to the Computer as an Analysis Tool OPIM 101.
Advertisements

CS6501: Text Mining Course Policy
CS112: Course Overview George Mason University. Today’s topics Go over the syllabus Go over resources – Marmoset – Blackboard – Piazza – Textbook Highlight.
CSE 5522: Survey of Artificial Intelligence II: Advanced Techniques Instructor: Alan Ritter TA: Fan Yang.
Inquiry Project: What Can We Learn From Weather Forecasts Online? By: Laura Stokes
Harvard School of Engineering and Applied Sciences Engineering Sciences 96 Engineering Design Seminar Spring 2009 Instructors: Warren Seering and Rob Howe.
COMP171 Data Structures and Algorithm Qiang Yang Lecture 1 ( Fall 2006)
Privacy-Aware Computing Introduction. Outline  Brief introduction Motivating applications Major research issues  Tentative schedule  Reading assignments.
CSCE790: Security and Privacy for Emerging Ubiquitous Communication system Wenyuan Xu Department of Computer Science and Engineering University of South.
Putting the Learning in Distance Learning Mary Beth Orrange AMATYC Math on the Web Themed Session Thursday, November 20, Washington D.C.
Introduction to Data Science Kamal Al Nasr, Matthew Hayes and Jean-Claude Pedjeu Computer Science and Mathematical Sciences College of Engineering Tennessee.
Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Fall, 2014 Room 120 Integrated.
Spring 2008 Mark Fontenot CSE 1341 Principles of Computer Science I Note Set 1 1.
 Mrs. DeBoard’s Contact Information  Phone:   Website: deboardvirtualbio.wikispaces.com  Office Hours:
Grade 10 MCAS OPEN RESPONSE QUESTION SPRING 2001 Exam, #40
Spring 2008 Mark Fontenot CSE Honors Principles of Computer Science I Note Set 1 1.
CS105 Lab 1 – Introduction Section: ??? TA: ??? ??? Announcements CITES Accounts Compass Netfiles Other Administrative Information CS105 Fall
Cpt S 471/571: Computational Genomics Spring 2015, 3 cr. Where: Sloan 9 When: M WF 11:10-12:00 Instructor weekly office hour for Spring 2015: Tuesdays.
CS 150 PERSONAL PRODUCTIVITY USING TECHNOLOGY Instructor: Xenia Mountrouidou.
Welcome to CS 3260 Dennis A. Fairclough. Overview Course Canvas Web Site Course Materials Lab Assignments Homework Grading Exams Withdrawing from Class.
CSE 2337 Introduction to Data Management Introduction.
COMP Introduction to Programming Yi Hong May 13, 2015.
CS6501 Information Retrieval Course Policy Hongning Wang
Introduction to Statistics for the Social Sciences SBS200, COMM200, GEOG200, PA200, POL200, or SOC200 Lecture Section 001, Fall, 2014 Room 120 Integrated.
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, 2.
CSc 2310 Principles of Programming (Java) Dr. Xiaolin Hu.
How to Learn in This Course CS 5010 Program Design Paradigms “Bootcamp” Lesson 0.1 © Mitchell Wand, This work is licensed under a Creative Commons.
Google Apps in Classrooms and Schools 32 Ways to Use Google Apps in 50 Minutes Julia Stiglitz Google Apps for Education
CST 229 Introduction to Grammars Dr. Sherry Yang Room 213 (503)
Spring 2011 ICS321 Data Storage & Retrieval Mon & Wed 12-1:15 PM Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii.
1 Software Systems Development CEN Spring 2011 TR 12:30 PM – 1:45 PM ENB 116 Instructor:Dr. Rollins Turner Dept. of Computer Science and Engineering.
How to start Milestone 1 CSSE 371 Project Info There are only 8 easy steps…
CSCI 51 Introduction to Computer Science Dr. Joshua Stough January 20, 2009.
MSE 101 ON LINE LECTURE INTRODUCTION LECTURE & LABORATORY Professor: Professor Shahriar Manufacturing Systems & Engineering Management Dept.  Faculty.
1 Principles of Computer Science I Note Set 1 CSE 1341.
CS 6961: Structured Prediction Fall 2014 Course Information.
Course Information Sarah Diesburg Operating Systems COP 4610.
Sugartown Science Fair Science Fair What is Science Research? Where Can I Get My Research Project Idea ? How Do I Develop My Idea into an Experiment?
 Instructor: Professor Timothy Burry  Address:  Office Location: Student Hall / 2 nd floor.
Course Information Andy Wang Operating Systems COP 4610 / CGS 5765.
CSE 2337 Introduction to Data Management Introduction.
Principles of Computer Science I Honors Section Note Set 1 CSE 1341 – H 1.
How to Learn in This Course CS 5010 Program Design Paradigms “Bootcamp” Lesson 0.1 © Mitchell Wand, This work is licensed under a Creative Commons.
Jongwook Woo CIS 528 Introduction to Big Data Science (Syllabus) Jongwook Woo, PhD California State University, LA Computer and Information.
INF 117 Project in Software Engineering Lecture Notes -Winter Quarter, 2008 Michele Rousseau Set 1.
1 CAP 4063 Web Application Design Summer 2012 TR 9:30 – 11:40 PM CHE 102 Instructor:Dr. Rollins Turner Dept. of Computer Science and Engineering ENB 336.
Fall 2010 ICS321 Data Storage & Retrieval Mon & Wed 12-1:15 PM Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at.
Internet Studies. Faculty Members The specialty has now 2 faculty members Prof. Ronen Feldman: Text Mining, Data Mining, Social Media Analysis, Information.
AVI/Psych 358/IE 340: Human Factors Section AL1 (MWF 9:00 – 9:50) Fall 2008.
CS112: Course Overview George Mason University. Today’s topics Go over the syllabus Go over resources – Marmoset – Blackboard – Piazza – Textbook Highlight.
CSE 1340 Introduction to Computing Concepts Class 1 ~ Intro.
MM2422 Managing Business Information Systems & Applications — Before we start…
Mining of Massive Datasets Edited based on Leskovec’s from
Adapted from
Spring 2008 Mark Fontenot CSE 1341 – Honors Principles of Computer Science I Note Set 1 1.
DATA MINING: LECTURE 1 By Dr. Hammad A. Qureshi Introduction to the Course and the Field There is an inherent meaning in everything. “Signs for people.
CSE6339 DATA MANAGEMENT AND ANALYSIS FOR COMPUTATIONAL JOURNALISM CSE6339, Spring 2012 Department of Computer Science and Engineering, University of Texas.
Lecture 00: Introduction
CS6501 Advanced Topics in Information Retrieval Course Policy
Welcome to AP Stats!.
CSC207 Fall 2016.
The General Education Core in CLAS
Course Information Mark Stanovich Principles of Operating Systems
Data Structures Algorithms: (Slides to be Adopted from Goodrich and aligned with Weiss' book) Instructor: Ganesh Ramakrishnan
CMPT 409 – Competitive Programming (Spring 2018)
CS190/295 Programming in Python for Life Sciences: Lecture 1
Welcome to CS220/MATH 320 – Applied Discrete Mathematics Fall 2018
MIS2502: Data Analytics Course Introduction
Lecture 00: Introduction
Course Introduction Data Visualization & Exploration – COMPSCI 590
Presentation transcript:

Introduction Everything Data CompSci Spring 2014

About us Instructors – Ashwin Machanavajjhala: data privacy, massive data analytics – Jun Yang: data-intensive systems, computational journalism TAs – Austin Alexander: statistical methods in population statistics – Brett Walenz: computational journalism 2

Let’s talk about $$$ 3 In perspective: 87% of Google’s revenue comes from online ads (as of 2012)

Data and business % clicks vs. editorial one-size-fits-all +79% clicks vs. randomly selected +43% clicks vs. editor selected Recommended links Personalized News Interests Top Searches

Data and science 5 Detecting influenza epidemics using search engine query data e07634.html Red: official numbers from Center for Disease Control and Prevention; weekly Black: based on Google search logs; daily (potentially instantaneously)

Data and government 6 big-data-president/2013/06/14/1d71fe2e-d391-11e2- b05f-3ea3f0e7bb5a_story.html g/Democratizing-Data m/business/economy/democr ats-push-to-redeploy- obamas-voter- database/2012/11/20/d14793 a4-2e83-11e2-89d4- 040c a_story.html 23/edward-snowden-nsa-files-timeline

Data and culture 7 Word frequencies in English-language books in Google’s database ne/2013/03/20/what-are- you-in-the-mood-for- emotional-trends-in-20th- century-books/

Data and ____ ☜ your favorite subject 8 SportsJournalism

Hal Varian Chief Economist, Google I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades… Jan on_how_the_web_challenges_managers 9

How to extract value from data Wrangle data – Get the data you want into the form you need for analysis Analyze data – Explore, query, run models, visualize… Communicate your results – Tell a story – Empower others 10

Data wrangling/munging 11

Data analysis 12 Explore and visualize, e.g., using a spreadsheet Query, e.g., using database systems “80% of analytics is sums and averages.” – Aaron Kimball, wibidata Model, detect, and predict, e.g., using R Scale up, e.g., using MapReduce

Communicating results “The British government spends £13 billion a year on universities.” – So? – Try instead bubbletree-map.html#/~/total/education/university “On average, 1 in every 15 Europeans is totally illiterate.” – True – But about 1 in every 14 is under 7 years old! 13

To finish what Varian said… I think statisticians are part of it, but it’s just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills—of being able to access, understand, and communicate the insights you get from data analysis—are going to be extremely important 14

What skills do you need? Domain expertise – Formulating problem – Interpreting and communicating results Statistics and math – Developing/applying quantitative models and methods to analyze data Computer science – Munging data – Presenting data and results – Developing/applying computational techniques to analyze more data faster and cheaper 15

What happens… … if you ignore some of these skills 16

Why this course? No single course at Duke gave you the overall picture—we decided to fix that! With this course, we hope you will – Develop a holistic, interdisciplinary picture of how to deal with data – View data and results with a critical eye – Learn enough basic building blocks to go from raw data all the way to insights – Know what additional expertise you need for tackling bigger, harder problems 17

Course material Data wrangling Working with different types of data – Text, tabular, graph Working with “big” data – MapReduce Statistics Machine learning – Clustering, classification, etc. Visualization Ethics and privacy (not necessarily in this order) 18

Course format Meetings alternate between lectures and hands-on labs – With weekly homework exercises in between No exams Capstone team project – Open-ended: you propose what dataset(s) you want to “take all the way” – Present your projects to the class at a mini- conference when semester ends 19

Misc. course info Website: – Schedule (with links to lecture slides, labs, homework, and additional readings) – Help (office hours and online docs) Grading – Project: 50% – Homework: 35%, each graded on an X/I/V/E scale – Class participation: 15% Sakai for grades Piazza for discussion 20

Duke Community Standard See course website for link Group discussion for homework/labs is okay (and encouraged), but – Acknowledge help you receive from others – Make sure you “own” your solution All suspected cases of violation will be aggressively pursued 21

Announcements (Thu. Jan. 9) Homework #1 due Monday midnight: see website for details – Short self-intro (submission required) – Learn OpenRefine and regular expressions No submission required for this part unless you want to earn an “E” Top-10 on the wait list should have a good chance of getting in the class – Permission #’s will be given next Tuesday 22

A quick survey for loops Tag clouds Inverted lists Mashup API GROUP BY Standard deviation Bayes’ Rule k-means MapReduce 23 Please raise your hand if you know this term: