CMU SCS Big (graph) data analytics Christos Faloutsos CMU
CMU SCS IC '14 (C. Faloutsos): CONGRATULATIONS!
Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions
Q+A
Are you recruiting? How many? 1 or 2
How many do you have? 6 (+5 postdocs)
How frequently do you meet them? 1/week
What is your advising style? Results
How do you feel about summer internships? Yes/Maybe (FB, MSR, IBM, ++)
Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions
Motivation
Data mining: ~ find patterns (rules, outliers)
What do real graphs look like? Anomalies?
Time series / Monitoring: PA, NY, …
Graphs - why should we care?
Internet Map [lumeta.com]; Food Web [Martinez ’91]
~1B users; $10-$100B revenue
Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions
NELL & concepts (=groups)
Predicates (subject, verb, object) in knowledge base: “Barack Obama is the president of U.S.”, “Eric Clapton plays guitar”
NELL (Never Ending Language Learner) data: (26M) (48M); nonzeros = 144M
Tom Mitchell (CMU/CS-MLD), Vagelis Papalexakis (CMU-CS)
Answer: tensor factorization
Recall: (SVD) matrix factorization finds blocks
[Figure: N users x M products matrix ~ ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’]
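To make "SVD finds blocks" concrete, here is a minimal sketch on a toy users x products matrix. The data, the helper names (`top_singular_triple`, `deflate`, `block`) and the 0.1 threshold are assumptions for illustration only; the leading singular triple is found with plain power iteration rather than a library SVD routine.

```python
import math

def matvec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def normalize(x):
    """Return (unit vector, original Euclidean norm)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x], norm

def top_singular_triple(A, iters=200):
    """Leading (u, sigma, v) of A via power iteration on A^T A."""
    At = transpose(A)
    v, _ = normalize([1.0] * len(A[0]))
    for _step in range(iters):
        v, _ = normalize(matvec(At, matvec(A, v)))
    u, sigma = normalize(matvec(A, v))
    return u, sigma, v

def deflate(A, u, sigma, v):
    """Subtract the rank-one term sigma * u v^T from A."""
    return [[A[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))]

def block(u, v, tol=0.1):
    """Rows and columns with non-negligible weight in (u, v)."""
    rows = [i for i, x in enumerate(u) if abs(x) > tol]
    cols = [j for j, x in enumerate(v) if abs(x) > tol]
    return rows, cols

# Hypothetical toy data: users 0-2 buy products 0-1, users 3-4 buy products 2-3.
A = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]

u1, s1, v1 = top_singular_triple(A)
first_block = block(u1, v1)
u2, s2, v2 = top_singular_triple(deflate(A, u1, s1, v1))
second_block = block(u2, v2)
```

Each singular triple picks out one user group together with its product group, which is the sense in which the factorization "finds blocks".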
Answer: tensor factorization
PARAFAC decomposition
[Figure: (subject, verb, object) tensor = politicians + artists + athletes]
Answer: tensor factorization
PARAFAC decomposition
Results for who-calls-whom-when: 4M x 15 days
[Figure: (caller, callee, time) tensor = sum of components; ??]
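A PARAFAC model writes a three-mode tensor as a sum of rank-one terms, one per group. The sketch below is not the method used for the results on the slide: it swaps in a simpler stand-in (greedy rank-one extraction by higher-order power iteration, followed by deflation) on a tiny synthetic (caller, callee, day) tensor. All data and helper names are hypothetical.

```python
import math

def unit(x):
    """Return (unit vector, original Euclidean norm)."""
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x], n

def rank_one(X, iters=50):
    """Best-effort rank-one term lam * a (x) b (x) c of a dense 3-mode tensor."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    a, _ = unit([1.0] * I)
    b, _ = unit([1.0] * J)
    c, _ = unit([1.0] * K)
    lam = 0.0
    for _step in range(iters):
        # Update each factor in turn, holding the other two fixed.
        a, _ = unit([sum(X[i][j][k] * b[j] * c[k]
                         for j in range(J) for k in range(K)) for i in range(I)])
        b, _ = unit([sum(X[i][j][k] * a[i] * c[k]
                         for i in range(I) for k in range(K)) for j in range(J)])
        c, lam = unit([sum(X[i][j][k] * a[i] * b[j]
                           for i in range(I) for j in range(J)) for k in range(K)])
    return lam, a, b, c

def deflate(X, lam, a, b, c):
    """Subtract the rank-one term from X."""
    return [[[X[i][j][k] - lam * a[i] * b[j] * c[k] for k in range(len(c))]
             for j in range(len(b))] for i in range(len(a))]

def support(x, tol=0.1):
    return [i for i, v in enumerate(x) if abs(v) > tol]

# Hypothetical data: 4 callers x 4 callees x 3 days, with two calling groups.
X = [[[0.0] * 3 for _j in range(4)] for _i in range(4)]
for i in (0, 1):
    for j in (0, 1, 2):
        for k in (0, 1):
            X[i][j][k] = 1.0   # group 1: callers 0-1, callees 0-2, days 0-1
for i in (2, 3):
    X[i][3][2] = 1.0           # group 2: callers 2-3, callee 3, day 2

lam1, a1, b1, c1 = rank_one(X)
group1 = (support(a1), support(b1), support(c1))
lam2, a2, b2, c2 = rank_one(deflate(X, lam1, a1, b1, c1))
group2 = (support(a2), support(b2), support(c2))
```

Each extracted term reads as "these callers call these callees on these days", mirroring the (subject, verb, object) groups on the previous slide. Greedy deflation works here because the two groups do not overlap; real data would typically call for a joint fit of all components.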
Concept Discovery in Knowledge Base
NP1: Internet, file, data
NP2: Protocol, software, suite
Neuro-semantics
Brain Scan Data*: 9 persons, 60 nouns (… airplane, dog)
Questions: 218 questions (‘is it alive?’, ‘can you eat it?’)
[Figure labels: persons, nouns, questions, voxels]
Patterns?
*Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science,
Neuro-semantics
Small items -> Premotor cortex
Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Scalability
Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003]
Yahoo: 5Pb of data [Fayyad, KDD’07]
Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’
Problem: machine failures, on a daily basis
How to parallelize data mining tasks, then?
A: map/reduce – hadoop (open-source clone)
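As a rough illustration of the map/reduce programming model, the sketch below computes node degrees from an edge list in a single process. It only mimics the map, shuffle and reduce phases; it does not use Hadoop, and the function names and edge list are made up for the example.

```python
from collections import defaultdict

def map_edge(edge):
    """Map phase: emit (node, 1) for each endpoint of an edge."""
    src, dst = edge
    yield src, 1
    yield dst, 1

def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_degree(node, counts):
    """Reduce phase: sum the counts for one node."""
    return node, sum(counts)

def degree_count(edges):
    mapped = (pair for edge in edges for pair in map_edge(edge))
    return dict(reduce_degree(k, v) for k, v in shuffle(mapped).items())

# Hypothetical undirected edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_count(edges)
```

Because each map call sees one edge and each reduce call sees one node, both phases can be spread over many machines, and a failed task can simply be re-run on its own slice of the input.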
Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly/fraud detection; Conclusions
App-store fraud
Opinion Fraud Detection in Online Reviews using Network Effects. Leman Akoglu, Rishi Chandy, CF. ICWSM’13
(NSF grant, with Alex Beutel)
Problem
Given: user-product review network; review sign (+/-)
Classify: objects into type-specific classes:
users: ‘honest’ / ‘fraudster’
products: ‘good’ / ‘bad’
reviews: ‘genuine’ / ‘fake’
No side data! (e.g., timestamp, review text)
Formulation: BP
[Figure: user-product network with signed (+/–) edges, before and after; labels: User, Product, honest, good, bad]
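For readers unfamiliar with BP (belief propagation) on a signed review network, here is a minimal sketch. The compatibility values in `PSI`, the value of `EPS`, the priors and the tiny example graph are all assumptions chosen for illustration; they are not taken from the slides. The sketch also assumes at most one review per (user, product) pair and that user and product names do not collide.

```python
EPS = 0.1  # assumed noise level, illustrative only

# PSI[sign][user_state][product_state]
# user states: 0 = honest, 1 = fraudster; product states: 0 = good, 1 = bad
PSI = {
    "+": [[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]],
    "-": [[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]],
}

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def incoming(prior, msgs, node, exclude=None):
    """Prior of `node` times all messages into it, except from `exclude`."""
    out = list(prior)
    for (src, dst), m in msgs.items():
        if dst == node and src != exclude:
            out = [out[t] * m[t] for t in range(2)]
    return out

def run_bp(edges, user_prior, product_prior, iters=20):
    """Synchronous loopy BP; edges are (user, product, sign) triples."""
    msgs = {}
    for u, p, _sign in edges:
        msgs[(u, p)] = [0.5, 0.5]
        msgs[(p, u)] = [0.5, 0.5]
    for _step in range(iters):
        new = {}
        for u, p, s in edges:
            h = incoming(user_prior[u], msgs, u, exclude=p)
            new[(u, p)] = normalize(
                [sum(h[us] * PSI[s][us][ps] for us in range(2)) for ps in range(2)])
            g = incoming(product_prior[p], msgs, p, exclude=u)
            new[(p, u)] = normalize(
                [sum(g[ps] * PSI[s][us][ps] for ps in range(2)) for us in range(2)])
        msgs = new
    beliefs = {}
    for node, prior in list(user_prior.items()) + list(product_prior.items()):
        beliefs[node] = normalize(incoming(prior, msgs, node))
    return beliefs

# Hypothetical example: one product assumed likely good, two users.
edges = [("u0", "p0", "+"), ("u1", "p0", "-")]
user_prior = {"u0": [0.5, 0.5], "u1": [0.5, 0.5]}
product_prior = {"p0": [0.9, 0.1]}
beliefs = run_bp(edges, user_prior, product_prior)
```

In this toy run the user who praises the likely-good product ends up with a belief leaning towards ‘honest’, while the user who pans it leans towards ‘fraudster’. This is the kind of network effect the formulation relies on.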
Top scorers
[Figure: Users x Products; + positive (4-5) rating, o negative (1-2) rating]
‘Fraud-bot’ member reviews
Same developer! Duplicated text! Same day activity!
Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly/fraud detection; Time series, monitoring / forecasting; Conclusions
‘Tycho’ – epidemics analysis
Prof. Yasuko Matsubara
50 states x 46 diseases
Flu? Measles? August? No periodicity?
‘Tycho’ – epidemics analysis
Prof. Yasuko Matsubara; from U. Pitt (epidemiology dept.)
Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug , 2014.
Open research questions
Patterns/anomalies for time-evolving graphs (call graph, 3M people x 6 months)
Spot fraudsters in social networks (e.g., Twitter ‘$10 -> 1000 followers’)
How is the human brain wired?
Contact info
GHC 8019 Ph#: x ic/
FYI: Course: , Tu-Th 3:00-4:20
And, again, WELCOME!