Flu and big data (Week 10.2)
What is big data? This doesn’t even include the internet yet!
Big data
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Researchers typically use R, even though SPSS can handle 2 billion cases (rows). Why? Because SPSS is too restrictive, whereas R has many packages. Python is also often used to mine data.
Case in point: the Facebook User eXperience (UX) team. They hire psychologists who are competent with R; knowing Python is an additional advantage. No SPSS, SAS, or Stata. Why do you think a psychology major is valued at Facebook? What is Facebook studying that would require behavioural science? Again: think beyond the traditional disciplines in psychology. The world is bigger than that! https://research.fb.com/category/human-computer-interaction-and-ux/
Google Correlate
Google Correlate is a tool on Google Trends which enables you to find queries with a similar pattern to a target data series. The target can either be a real-world trend that you provide (e.g., a data set of event counts over time) or a query that you enter.
Source: https://www.google.com/trends/correlate/faq
Tool: https://www.google.com/trends/correlate
See also: Google Trends or Google Insights
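To make "finding queries with a similar pattern" concrete, here is a minimal sketch that ranks a few candidate queries by how closely their weekly counts track a target series. It assumes Pearson correlation as the similarity measure, which the slide does not specify, and every query name and number below is made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_queries(target, queries):
    """Sort candidate queries from most to least correlated with the target."""
    scores = {name: pearson(target, series) for name, series in queries.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical target: weekly counts of reported flu cases.
target = [10, 20, 40, 30, 15]

# Hypothetical weekly search counts for three queries.
queries = {
    "flu symptoms": [12, 22, 41, 29, 14],
    "beach holiday": [40, 30, 10, 15, 35],
    "basketball scores": [20, 21, 19, 20, 22],
}

ranked = rank_queries(target, queries)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

A high correlation only says that two series rise and fall together. It does not say that the query is about flu, which is why the validity questions later in the lecture matter.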
Google Trends
Example searches: "Travel ban" (searched on 31 Oct); "President Trump" vs. "Kim Jong-un" (searched on 31 Oct)
Google Trends (example search: "Trump impeachment")
Numbers represent search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. Likewise, a score of 0 means the term was less than 1% as popular as the peak.
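The 0 to 100 scale can be sketched in a few lines. This is only an illustration of the rule as described on the slide: the raw counts are hypothetical (Google does not publish them), and the real service may apply further processing that is not modelled here.

```python
def scale_to_peak(counts):
    """Rescale non-negative integer counts so that the largest value becomes 100.

    Integer division truncates, so anything below 1% of the peak becomes 0.
    """
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [(100 * c) // peak for c in counts]

# Hypothetical raw search counts for one term over four weeks.
raw_counts = [200, 100, 1, 0]
interest = scale_to_peak(raw_counts)
print(interest)
```

Because every value is relative to the term's own peak, a score of 100 for one term and 100 for another says nothing about which term was searched more often in absolute terms.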
Behavioural “residue”
What is a residue? In chemistry, a residue is a substance left behind (e.g., in a filter paper) after a process has been performed (e.g., filtration). Think about what happens in a sneeze. What is produced? The sound of sneezing, saliva, mucus, germs, and tears in the eyes.
Behavioural “residue”
Behavioural residue is whatever is left behind after we have performed an action. The key phrase here is “left behind”. That means you are not directly observing the person’s actions, but inferring something about the person based on ‘clues’ that still exist. For example, a trail of destruction (flipped tables, burnt houses, etc.) is a behavioural residue of aggression.
Origins of behavioural residue
Originally conceptualized by the personality psychologist Sam Gosling to study personality traces. How do you know if someone is X, without actually observing this person in action? Where X = a Nazi supporter, conscientious, having flu, having a fever, HIV+.
Gosling (2009). Snoop: What your stuff says about you. New York: Basic Books.
Would someone explain the logic of the arguments (in terms of behavioural residue) found in the paper?
Here is the catch
Premise 1: Behavioural residue is an inference. Premise 2: Inferences are probabilistic by nature. Conclusion: Therefore, inferences from behavioural residue can be wrong. Agree?
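One way to see why such inferences are probabilistic is to put numbers on them with Bayes' rule. The sketch below asks: given that someone searched for a flu-related term, how likely is it that they actually have flu? All three input probabilities are invented for illustration and do not come from any study.

```python
def posterior(prior, hit_rate, false_alarm_rate):
    """P(has flu | searched), computed with Bayes' rule.

    prior            -- P(has flu) in the population
    hit_rate         -- P(searched | has flu)
    false_alarm_rate -- P(searched | does not have flu)
    """
    p_searched = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / p_searched

# Hypothetical values: 5% of people have flu, 60% of them search for a
# flu-related term, and 10% of healthy people search for it anyway.
p_flu_given_search = posterior(prior=0.05, hit_rate=0.60, false_alarm_rate=0.10)
print(f"P(flu | searched) = {p_flu_given_search:.2f}")
```

Under these made-up numbers, most people who leave this particular residue do not have flu, so an inference from the search alone would often be wrong.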
Here is what I did not tell you…
2009: Google Flu missed the H1N1 pandemic (a nonseasonal flu). Google Flu then updated its algorithm.
2013: Google Flu overshot its predicted flu rates by 100%.
Lazer et al. (2014). The parable of Google Flu: Traps in big data analysis. Science.
What went wrong? 1. Big data hubris: The implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis. In other words: Quantity ≠ Quality
What makes good data? Reliability and validity. These are fundamental methodological issues that you must never ignore!
Reliability and validity
Is the measurement capturing the construct of interest (and nothing else)? Is the measurement stable and comparable across people and over time? Are measurement errors systematic – and how would you know if there is an error?
What went wrong? 2. Algorithm dynamics: Changes made by engineers to improve the commercial service and by consumers in using that service.
Relevance to behavioural change
More accurate data about prevalence rates do allow for more effective and efficient life-saving interventions.
The challenge: Keep the faith in big data, but develop better algorithms. Be aware that quantity ≠ quality. Science is always a work in progress.