CMU SCS Big (graph) data analytics Christos Faloutsos CMU.


1 CMU SCS Big (graph) data analytics Christos Faloutsos CMU

2 CONGRATULATIONS! (CMU SCS IC '14, C. Faloutsos)

3 Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions

4 Q+A: Are you recruiting? How many? How many do you have? How frequently do you meet them? What is your advising style? How do you feel about summer internships?

5 Q+A, with answers: Are you recruiting? How many? 1 or 2. How many do you have? 6 (+5 postdocs). How frequently do you meet them? 1/week. What is your advising style? Results. How do you feel about summer internships? Yes/Maybe (FB, MSR, IBM, ++).

6 Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions

7 Motivation: Data mining ~ find patterns (rules, outliers). What do real graphs look like? Anomalies? Time series / monitoring: measles @ PA, NY, …

8 Graphs - why should we care?

9 Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; ~1B users; $10-$100B revenue

10 Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly detection; Conclusions

11 NELL & concepts (=groups): Predicates (subject, verb, object) in a knowledge base, e.g. “Barack Obama is the president of U.S.”, “Eric Clapton plays guitar”. NELL (Never Ending Language Learner) data: (26M) (48M), nonzeros = 144M. Tom Mitchell (CMU/CS-MLD), Vagelis Papalexakis (CMU-CS)

12 Answer: tensor factorization. Recall: (SVD) matrix factorization finds blocks. N users x M products ~ ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’
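To make the "finds blocks" remark on slide 12 concrete, here is a minimal sketch in plain Python. It uses power iteration to approximate the leading singular vectors of a tiny users x products matrix; it is not a full SVD, and the matrix, the 0.1 cut-off, and the function names are illustrative assumptions rather than anything from the talk.

```python
import math

def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_singular_pair(A, iters=100):
    """Power iteration for the leading singular triple (u, sigma, v) of A."""
    n, m = len(A), len(A[0])
    v = normalize([1.0] * m)
    for _ in range(iters):
        u = normalize([sum(A[i][j] * v[j] for j in range(m)) for i in range(n)])
        v = normalize([sum(A[i][j] * u[i] for i in range(n)) for j in range(m)])
    sigma = sum(u[i] * A[i][j] * v[j] for i in range(n) for j in range(m))
    return u, sigma, v

# Toy data (assumed): rows 0-2 are 'meat-eaters', rows 3-4 are 'vegetarians';
# columns 0-1 are 'steaks', columns 2-3 are 'plants'.
A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

u, sigma, v = top_singular_pair(A)
# Users and products with non-negligible weight form the strongest block.
block_users = [i for i, w in enumerate(u) if abs(w) > 0.1]
block_products = [j for j, w in enumerate(v) if abs(w) > 0.1]
```

On this toy matrix the leading pair isolates the larger (3 x 2) block. Further blocks would come from the next singular vectors, which a real SVD routine returns directly.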

13 Answer: tensor factorization. PARAFAC decomposition: (subject x object x verb) tensor = politicians + artists + athletes
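The idea on slide 13 can be sketched for a single component. The snippet below fits one rank-1 term of a PARAFAC-style decomposition to a sparse (subject, verb, object) tensor by alternating updates of the three factor vectors. It is a simplified illustration, not the method used for the NELL results: a full PARAFAC fits several components jointly, and every triple except the two quoted on slide 11 is invented toy data.

```python
import math
from collections import defaultdict

def normalize(weights):
    """Scale a {label: weight} factor to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {label: w / norm for label, w in weights.items()}

def rank1_parafac(triples, iters=50):
    """Fit one rank-1 component to a sparse 3-mode tensor.

    triples maps (subject, verb, object) to a nonnegative count.
    Returns three {label: weight} factors, one per mode.
    """
    factors = [normalize({key[m]: 1.0 for key in triples}) for m in range(3)]
    for _ in range(iters):
        for m in range(3):
            p, q = [k for k in range(3) if k != m]
            acc = defaultdict(float)
            for key, x in triples.items():
                # Contract the tensor with the other two factors.
                acc[key[m]] += x * factors[p][key[p]] * factors[q][key[q]]
            factors[m] = normalize(acc)
    return factors

# Toy knowledge base (assumed counts; only two triples come from the slides).
triples = {
    ("Eric Clapton", "plays", "guitar"): 1.0,
    ("Eric Clapton", "plays", "blues"): 1.0,
    ("Jimi Hendrix", "plays", "guitar"): 1.0,
    ("Jimi Hendrix", "plays", "blues"): 1.0,
    ("Barack Obama", "is president of", "U.S."): 1.0,
    ("Angela Merkel", "is chancellor of", "Germany"): 1.0,
}

subjects, verbs, objects = rank1_parafac(triples)
# Labels with non-negligible weight make up the discovered concept.
concept = [sorted(k for k, w in f.items() if w > 0.1)
           for f in (subjects, verbs, objects)]
```

Here the strongest component groups the two musicians with the verb "plays" and the objects they share, which is the kind of concept (an 'artists' group) the slide refers to.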

14 Answer: tensor factorization. PARAFAC decomposition: results for who-calls-whom-when, 4M x 15 days; (caller x callee x time) tensor = ?? + ?? + ??

15 Concept Discovery in Knowledge Base

16 Concept Discovery in Knowledge Base. NP1: Internet, file, data; NP2: Protocol, software, suite

17 Neuro-semantics: Brain scan data*: 9 persons, 60 nouns. Questions: 218 questions (‘is it alive?’, ‘can you eat it?’). *Mitchell et al. Predicting human brain activity associated with the meanings of nouns. Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html

18 Neuro-semantics: Brain scan data*: 9 persons, 60 nouns. Questions: 218 questions (‘is it alive?’, ‘can you eat it?’). Patterns?

19 Neuro-semantics: Brain scan data*: 9 persons, 60 nouns (airplane, dog, …). Questions: 218 questions (‘is it alive?’, ‘can you eat it?’). Modes: persons, nouns, questions, voxels. Patterns?

20 Neuro-semantics [figure]

21 Neuro-semantics: Small items -> Premotor cortex

22 Neuro-semantics: Small items -> Premotor cortex. Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014

23 Scalability: Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003]. Yahoo: 5Pb of data [Fayyad, KDD’07]. Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’. Problem: machine failures, on a daily basis. How to parallelize data mining tasks, then? A: map/reduce; hadoop (open-source clone): http://hadoop.apache.org/
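As a rough illustration of the map/reduce answer on slide 23, the sketch below simulates the three phases (map, shuffle, reduce) in a single Python process and uses them to compute node degrees from an edge list. It does not use the Hadoop API; `run_mapreduce`, `map_edge`, and `reduce_degree` are hypothetical names chosen for this example.

```python
from collections import defaultdict

def map_edge(edge):
    """Map phase: emit (node, 1) for both endpoints of an edge."""
    src, dst = edge
    yield src, 1
    yield dst, 1

def reduce_degree(node, counts):
    """Reduce phase: sum the partial counts collected for one node."""
    return node, sum(counts)

def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for a map/reduce job."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in groups.items())

# Toy undirected graph (assumed data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = run_mapreduce(edges, map_edge, reduce_degree)
```

Because each mapper call looks at one record and each reducer call looks at one key, a real framework can spread both phases over many machines and rerun only the failed pieces, which is what makes the model attractive when machines fail daily.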

24 Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly/fraud detection; Conclusions

25 App-store fraud: Opinion Fraud Detection in Online Reviews using Network Effects, Leman Akoglu, Rishi Chandy, CF, ICWSM’13 (NSF grant, with Alex Beutel)

26 Problem: Given: user-product review network; review sign (+/-). Classify objects into type-specific classes: users: ‘honest’ / ‘fraudster’; products: ‘good’ / ‘bad’; reviews: ‘genuine’ / ‘fake’. No side data! (e.g., timestamp, review text)

27 Formulation: BP (belief propagation). User, Product; honest / bad; honest / good; edge signs – / +; Before / After
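The formulation on slides 26-27 can be sketched as loopy belief propagation on a signed user-product graph with uniform priors (no side data). This is a minimal illustration under stated assumptions, not the authors' implementation: the compatibility values in `PSI`, the value of `EPS`, the synchronous update schedule, and the toy reviews are all chosen for this example.

```python
EPS = 0.1  # assumed noise level for the compatibility tables

# PSI[sign][user_state][product_state]
# user_state: 0 = honest, 1 = fraudster; product_state: 0 = good, 1 = bad
PSI = {
    "+": [[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]],
    "-": [[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]],
}

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def run_bp(reviews, iters=20):
    """reviews: list of (user, product, sign). Returns {node: [p0, p1]}."""
    neighbors, sign = {}, {}
    for user, product, s in reviews:
        u, p = ("user", user), ("product", product)
        neighbors.setdefault(u, []).append(p)
        neighbors.setdefault(p, []).append(u)
        sign[(u, p)] = sign[(p, u)] = s
    # One message per directed edge, indexed by the receiver's state.
    msgs = {edge: [0.5, 0.5] for edge in sign}
    for _ in range(iters):
        new_msgs = {}
        for src, dst in msgs:
            # Combine messages into src from all neighbors except dst.
            incoming = [1.0, 1.0]
            for other in neighbors[src]:
                if other != dst:
                    incoming = [incoming[x] * msgs[(other, src)][x] for x in (0, 1)]
            out = []
            for x_dst in (0, 1):
                total = 0.0
                for x_src in (0, 1):
                    if src[0] == "user":
                        psi = PSI[sign[(src, dst)]][x_src][x_dst]
                    else:
                        psi = PSI[sign[(src, dst)]][x_dst][x_src]
                    total += psi * incoming[x_src]
                out.append(total)
            new_msgs[(src, dst)] = normalize(out)
        msgs = new_msgs
    beliefs = {}
    for node in neighbors:
        b = [1.0, 1.0]
        for other in neighbors[node]:
            b = [b[x] * msgs[(other, node)][x] for x in (0, 1)]
        beliefs[node] = normalize(b)
    return beliefs

# Toy network (assumed): three users praise g and pan b; user f does the opposite.
reviews = [(u, "g", "+") for u in ("u1", "u2", "u3")]
reviews += [(u, "b", "-") for u in ("u1", "u2", "u3")]
reviews += [("f", "g", "-"), ("f", "b", "+")]
beliefs = run_bp(reviews)
```

On this toy input the majority pattern pushes product g toward 'good' and b toward 'bad', and the user who contradicts it ends up with the higher 'fraudster' belief, using only the review signs.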

28-29 Top scorers: Users x Products; + positive (4-5) rating, o negative (1-2) rating

30 ‘Fraud-bot’ member reviews: Same developer! Duplicated text! Same day activity!

31 Outline: Q+A; Problem definition / Motivation; Graphs, tensors and brains; Anomaly/fraud detection; Time series, monitoring / forecasting; Conclusions

32 ‘Tycho’ - epidemics analysis: 50 states x 46 diseases (Yasuko Matsubara)

33 ‘Tycho’ - epidemics analysis (Prof. Yasuko Matsubara)

34-38 ‘Tycho’ - epidemics analysis (Prof. Yasuko Matsubara): Flu? Measles? August? No periodicity?

39 ‘Tycho’ - epidemics analysis (Prof. Yasuko Matsubara): https://www.tycho.pitt.edu/resources.php from U. Pitt (epidemiology dept.). Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
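The "No periodicity?" question on the Tycho slides can be probed with a simple autocorrelation check. The sketch below is not the FUNNEL model from the cited paper; it only looks for the lag at which a count series best matches a shifted copy of itself. The synthetic monthly series and the function names are assumptions made for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    den = sum(d * d for d in dev)
    return num / den

def dominant_period(series, max_lag):
    """Lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1),
               key=lambda lag: autocorrelation(series, lag))

# Synthetic data (assumed): ten years of monthly counts with one yearly spike.
monthly_cases = [10 if month % 12 == 1 else 1 for month in range(120)]
period = dominant_period(monthly_cases, max_lag=60)
```

A series with a yearly cycle yields a clear peak at lag 12 for monthly data, while a series with no seasonal pattern shows no comparable peak at any lag.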

40 Open research questions: Patterns/anomalies for time-evolving graphs (call graph, 3M people x 6mo); Spot fraudsters in soc-net (e.g., Twitter ‘$10 -> 1000 followers’); How is the human brain wired?

41 Contact info: www.cs.cmu.edu/~christos; GHC 8019; Ph#: x8.1457; www.cs.cmu.edu/~christos/TALKS/14-09-ic/ FYI: Course: 15-826, Tu-Th 3:00-4:20. And, again, WELCOME!

