The Curse of Big Data in Mobile Analytics
Dr. Guodong (Gordon) Gao
M-CERSI Workshop, 9/11/2015
Mobile devices = Big Data
User-generated data
  Facebook ingests 500 terabytes of new data every day.
  Text messages, diet logs, photos, videos, …
System-generated data
  App downloads and usage
  Gestures and touches
  Communications with other wearable devices
Sensor-generated data
  6 billion mobile phones
  Geo-location data, pedometer, heartbeat sensor, and oxygen saturation sensor
Even more data
Causal inference
Most statistical methods measure correlation, not causation.
For actionable knowledge, we need causation!
Does the rooster's crowing cause the sun to rise?
Confusing correlation with causation can be dangerous.
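The rooster-and-sunrise question can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not a model from the talk: a hypothetical hidden factor `z` (think of the approaching dawn) drives both crowing and sunrise, and the sunrise never depends on crowing. Observationally the two are strongly correlated; under an intervention that sets crowing at random, the correlation vanishes.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=10_000, seed=0):
    """Return (observational r, interventional r) for a toy confounded system.

    Assumption (hypothetical model): z is a hidden common cause;
    crow = z + noise and sunrise = z + noise, so crowing has no causal
    effect on sunrise.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]            # hidden common cause
    crow = [zi + rng.gauss(0, 0.5) for zi in z]         # rooster reacts to z
    sunrise = [zi + rng.gauss(0, 0.5) for zi in z]      # sun reacts to z only
    observed_r = pearson(crow, sunrise)

    # Intervention do(crow): force crowing at random, independent of z.
    # The sun's mechanism ignores crowing, so the sunrise values are unchanged.
    crow_forced = [rng.gauss(0, 1) for _ in range(n)]
    intervened_r = pearson(crow_forced, sunrise)
    return observed_r, intervened_r


if __name__ == "__main__":
    obs, do = simulate()
    print(f"observational r = {obs:.3f}, after intervention r = {do:.3f}")
```

With these noise levels the observational correlation is about 1 / 1.25 = 0.8 in expectation, while the interventional one hovers near zero: the association comes entirely from the shared cause.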
Does Anne Hathaway help Warren Buffett get richer?
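The slide leaves the mechanism unstated, so the following is only a generic illustration of how two unrelated series can look strongly related, not an account of this specific example. It assumes both quantities drift over time like independent random walks (a hypothetical stand-in) and compares how often they show |r| > 0.5 against independent white noise of the same length.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def random_walk(n, rng):
    """Cumulative sum of independent +/-1 steps: a trending, non-stationary series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.choice((-1.0, 1.0))
        path.append(level)
    return path


def white_noise(n, rng):
    """Independent standard normal draws: no trend."""
    return [rng.gauss(0, 1) for _ in range(n)]


def share_strong(make_series, trials=500, n=250, threshold=0.5, seed=1):
    """Fraction of independent pairs whose |correlation| exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = make_series(n, rng), make_series(n, rng)
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("random walks:", share_strong(random_walk))
    print("white noise: ", share_strong(white_noise))
```

Every pair is generated independently, yet a sizeable share of the random-walk pairs cross |r| > 0.5, while the trend-free noise pairs essentially never do.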
The curse of big data
Heterogeneity in Treatment Effects (HTE)
Subgroup analysis helps answer:
  Which subgroup will benefit from this treatment?
  Should I prescribe the treatment to this particular patient?
With dozens of variables and thousands of combinations, we can define subgroups in many ways.
  e.g., with 10 variables, each with 3 levels, there are 3^10 = 59,049 combinations!
We are doomed to find something statistically significant in some subgroups.
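The subgroup arithmetic on this slide can be checked directly, and a small simulation shows why significant findings are nearly guaranteed. This is a sketch under simplifying assumptions that are not from the talk: each subgroup is tested independently with a two-sided z-test at alpha = 0.05, outcomes are unit-variance normal, and the treatment has no effect anywhere. The names `count_subgroups` and `false_discoveries` are hypothetical.

```python
import math
import random

ALPHA = 0.05
VARIABLES, LEVELS = 10, 3


def count_subgroups(variables=VARIABLES, levels=LEVELS):
    """Number of cells when every variable is fixed to one of its levels."""
    return levels ** variables


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_discoveries(n_subgroups=1000, n_per_arm=30, alpha=ALPHA, seed=2):
    """Simulate a treatment with NO effect in every subgroup; count 'significant' ones."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_subgroups):
        treated = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        # Known unit variance: the difference in means has SE sqrt(2 / n).
        diff = (sum(treated) - sum(control)) / n_per_arm
        z = diff / math.sqrt(2 / n_per_arm)
        if two_sided_p(z) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    k = count_subgroups()
    print(k)  # 59049
    print("expected false positives:", ALPHA * k)
    print("P(at least one):", 1 - (1 - ALPHA) ** k)
    print("Bonferroni per-test threshold:", ALPHA / k)
    print("null subgroups flagged out of 1000:", false_discoveries())
```

Even with zero true effect, roughly 5% of subgroups come out "significant", about 2,950 of 59,049 in expectation, and at least one is virtually certain. Real subgroup tests overlap and are correlated, so the independence assumption here is a simplification.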
Yet another curse of big data
Do not ignore the fundamentals
Patient #11