Flu and big data Week 10.2.

What is big data?
This doesn't even include the internet yet!

Big data
Big data is a term for data sets that are so large or complex that traditional data-processing software is inadequate to deal with them. Researchers typically use R, even though SPSS can handle about 2 billion cases (rows). Why? SPSS restricts you to the procedures it ships with, whereas R has many add-on packages. Python is also often used to mine data.

Case in point: Facebook's User Experience (UX) team hires psychologists who are competent with R; knowing Python is an additional advantage. No SPSS, SAS, or Stata. Why do you think a psychology major is valued at Facebook? What is Facebook studying that would require behavioural science? Again: think beyond the traditional disciplines in psychology. The world is bigger than that! https://research.fb.com/category/human-computer-interaction-and-ux/

Google Correlate
Google Correlate is a tool on Google Trends that enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter. (https://www.google.com/trends/correlate/faq) https://www.google.com/trends/correlate See also: Google Trends or Google Insights
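The core idea of "finding queries with a similar pattern" can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: it assumes Pearson correlation as the similarity measure, and the query names (`query_a` and so on) and all weekly numbers are made up.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a perfectly flat series has no pattern to correlate with
    return cov / (sx * sy)


def rank_queries(target: list[float], candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (query, r) pairs sorted from most to least similar to the target."""
    scored = [(name, pearson_r(target, series)) for name, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weekly flu case counts (the target) and made-up query volumes.
target = [1, 3, 7, 9, 4, 2]
candidates = {
    "query_a": [2, 6, 14, 18, 8, 4],  # same shape as the target, twice the size
    "query_b": [5, 5, 6, 5, 5, 6],    # mostly flat
    "query_c": [9, 7, 3, 1, 6, 8],    # mirror image of the target
}

for name, r in rank_queries(target, candidates):
    print(f"{name}: r = {r:+.2f}")
```

Note that a high r only says two series rise and fall together. A query can correlate with flu counts simply because both peak in winter, which is exactly why a correlated query is not automatically a valid measure of flu.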

Google Trends
[Charts: "Travel ban" (searched on 31 Oct); "President Trump vs. Kim Jong-un" (searched on 31 Oct)]

Google Trends
Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. Likewise, a score of 0 means the term was less than 1% as popular as the peak.
[Chart: "Trump Impeachment"]
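The 0 to 100 scale above can be reproduced with a small Python sketch. It is an assumption-laden illustration: it takes raw values that are already comparable (Google does not publish its full pipeline, which also samples searches and divides each data point by total searches for that region and time), and it truncates rather than rounds so that anything under 1% of the peak becomes 0, matching the description above. Terms on the same chart share one peak, so the highest value anywhere on the chart becomes 100. The term names and counts are hypothetical.

```python
def trends_index(series: dict[str, list[float]]) -> dict[str, list[int]]:
    """Rescale every series so the highest point on the whole chart is 100."""
    peak = max((v for values in series.values() for v in values), default=0)
    if peak <= 0:
        # No search interest at all: everything scores 0.
        return {name: [0] * len(values) for name, values in series.items()}
    # Truncate toward zero: a value below 1% of the peak becomes 0.
    return {name: [int(v * 100 / peak) for v in values] for name, values in series.items()}


# Hypothetical raw interest for two terms plotted on the same chart.
raw = {"term A": [1000, 500, 250, 5], "term B": [200, 400, 100, 0]}
print(trends_index(raw))
# {'term A': [100, 50, 25, 0], 'term B': [20, 40, 10, 0]}
```

Because the scale is relative, a score of 50 says nothing about how many people searched; it only says "half the peak on this chart".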

Behavioural “residue”
What is a residue? In chemistry, a residue is a substance left behind (e.g., on filter paper) after a process has been performed (e.g., filtration). Think about what happens in a sneeze. What is produced? The sound of sneezing, saliva, mucus, germs, tears in the eyes.

Behavioural “residue”
Behavioural residue is whatever is left behind after we have performed an action. The key words here are “left behind”: you are not directly observing the person’s actions, but inferring something about the person from ‘clues’ that still exist. For example, a trail of destruction (flipped tables, burnt houses, etc.) is a behavioural residue of aggression.

Origins of behavioural residue
Originally conceptualized by personality psychologist Sam Gosling to study traces of personality. How do you know whether someone is X without actually observing this person in action? Where X = Nazi supporter, conscientious, having flu, having a fever, HIV+. Gosling (2009). Snoop: What your stuff says about you. New York: Basic Books.

Would someone explain the logic of the arguments (in terms of behavioural residue) found in the paper?

Here is the catch
Premise 1: Behavioural residue is an inference. Premise 2: Inferences are probabilistic by nature. Conclusion: Therefore, conclusions drawn from behavioural residue can be wrong. Agree?

Here is what I did not tell you…
2009: Google Flu missed the H1N1 pandemic (a nonseasonal flu). Google Flu then updated its algorithm.
2013: Google Flu overshot the actual flu rates by about 100%, predicting roughly double the proportion of doctor visits for influenza-like illness reported by the CDC.
Lazer et al. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.
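"Overshot by 100%" is easy to misread, so here is the arithmetic. Writing $\hat{p}$ for Google Flu's estimated flu rate and $p$ for the reference (CDC) rate, symbols introduced here only for illustration, an estimate that is double the reference is a 100% overshoot:

```latex
\text{overshoot} = \frac{\hat{p} - p}{p} \times 100\%,
\qquad
\hat{p} = 2p \;\Rightarrow\; \text{overshoot} = \frac{2p - p}{p} \times 100\% = 100\%
```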

What went wrong?
1. Big data hubris: the implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis. In other words: quantity ≠ quality.

What makes good data?
Reliability and validity. These are fundamental methodological issues that you must never ignore!

Reliability and validity
Is the measurement capturing the construct of interest, and nothing else (validity)? Is the measurement stable and comparable across people and over time (reliability)? Are measurement errors systematic, and how would you know if there is an error?
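One way to see why "stable" is not the same as "accurate" is to compare an estimate against an independent reference, such as official surveillance figures. The Python sketch below uses made-up numbers for a hypothetical model that always reports double the reference rate: it tracks the reference perfectly (correlation of 1) yet carries a large systematic error (positive mean bias). The function names are illustrative, not from any library.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: do the two series rise and fall together?"""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_bias(estimate: list[float], reference: list[float]) -> float:
    """Average signed error: positive means the estimate overshoots."""
    return sum(e - r for e, r in zip(estimate, reference)) / len(reference)


reference = [2.0, 3.0, 5.0, 4.0]   # hypothetical reference flu rates (%)
estimate = [4.0, 6.0, 10.0, 8.0]   # hypothetical model output: always double

print(f"r = {pearson_r(estimate, reference):.2f}")
print(f"mean bias = {mean_bias(estimate, reference):+.2f} percentage points")
```

Looking only at the correlation, this model seems flawless; only the comparison of levels against an independent reference reveals that its errors are systematic.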

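What went wrong?
2. Algorithm dynamics: changes made by engineers to improve the commercial service, and changes in how consumers use that service. For example, search features that suggest related queries may change what people type, and therefore the very data the flu model reads.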

Relevance to behavioural change
More accurate data about prevalence rates allow for more effective and efficient life-saving interventions.
The challenge: keep the faith in big data, but develop better algorithms, be aware that quantity ≠ quality, and remember that science is always a work in progress.