CMU SCS Mining Large Graphs: Fraud Detection, and Algorithms Christos Faloutsos CMU.

1 Mining Large Graphs: Fraud Detection, and Algorithms. Christos Faloutsos, CMU

2 Graphs - why should we care? >$10B; ~1B users. (c) 2015, C. Faloutsos, CMU Cylab

3 Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]

4 Graphs - why should we care? Web-log (‘blog’) news propagation; computer network security: email/IP traffic and anomaly detection; recommendation systems; ... Many-to-many DB relationship -> graph

5 Motivating problems. P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors. [Figure: tensor with modes source, destination, time]

6 Roadmap: Introduction (Motivation: why study (big) graphs?). Part#1: Patterns & fraud detection. Part#2: Time-evolving graphs; tensors. Conclusions.

7 Part 1: Anomalies & fraud detection

8 Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
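
Slide 8 uses triangle counts as a signal for anomalous nodes. As a minimal illustration only (exact, in-memory counting on a small graph, not the technique the cited billion-edge work uses), the Python sketch below counts how many triangles each node belongs to; the function name `triangles_per_node` is hypothetical. Comparing each node's count with its degree is one plausible way to see which nodes deviate from the bulk.

```python
from collections import defaultdict


def triangles_per_node(edges):
    """Exact number of triangles each node participates in.

    edges: iterable of (u, v) pairs of an undirected graph; duplicate
    edges and self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    counts = {}
    for u, nbrs in adj.items():
        # Each neighbour v shares |N(u) & N(v)| common neighbours with u.
        # Every triangle (u, v, w) is seen twice (via v and via w), hence // 2.
        counts[u] = sum(len(nbrs & adj[v]) for v in nbrs) // 2
    return counts
```

For the graph with edges (1,2), (2,3), (1,3), (3,4), nodes 1, 2 and 3 each sit in one triangle and node 4 in none. The set intersections cost memory proportional to the edge count, which is exactly why graphs of Twitter's size call for approximate or distributed methods instead.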

13 Roadmap: Introduction (Motivation). Part#1: Anomaly / fraud detection (CopyCatch; Malware & Belief Propagation). Part#2: Time-evolving graphs; tensors. Conclusions.

14 Fraud. Given: who ‘likes’ what page, and when. Find: suspicious users and suspicious products. [Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos. CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks. WWW 2013.]

18 Our intuition ▪ Lockstep behavior: same Likes, same time. [Figure: Graph Patterns and Lockstep Behavior, with the Suspicious Lockstep Behavior highlighted]

19 MapReduce Overview ▪ Use Hadoop to search for many clusters in parallel: 1. Start with a random seed. 2. Update the set of Pages and the center Like times for each cluster. 3. Repeat until convergence.
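
To make the three steps concrete, here is a toy single-machine Python sketch of the iterative search for one cluster. It is not CopyCatch itself: the membership rule (a user joins if they liked at least `m` of the cluster's pages within `delta` of each page's center time), the median-based center update, and the `min_users` support threshold are assumptions chosen for illustration, and `lockstep_cluster` is a hypothetical name. The seed is passed in explicitly so the example is deterministic; the deployed system starts from random seeds and runs many clusters in parallel.

```python
from statistics import median


def lockstep_cluster(likes, seed_centers, delta, m, min_users, max_iter=20):
    """Toy iterative search for one lockstep cluster (single machine).

    likes:        {user: {page: like_time}}
    seed_centers: {page: center_time}, the starting seed
    delta:        half-width of the time window around each page's center
    m:            a user joins if they liked >= m cluster pages in-window
    min_users:    a page stays if >= min_users members liked it in-window
    Returns (members, centers).
    """
    centers = dict(seed_centers)
    members = set()
    for _ in range(max_iter):
        # Step A: users whose Likes hit at least m pages near the center times.
        new_members = {
            u for u, user_likes in likes.items()
            if sum(1 for p, c in centers.items()
                   if p in user_likes and abs(user_likes[p] - c) <= delta) >= m
        }
        # Step B: re-estimate the page set and each page's center Like time.
        new_centers = {}
        for p in {p for u in new_members for p in likes[u]}:
            times = [likes[u][p] for u in new_members if p in likes[u]]
            c = median(times)
            if sum(1 for t in times if abs(t - c) <= delta) >= min_users:
                new_centers[p] = c
        if new_members == members and new_centers == centers:
            break  # converged: neither members nor centers changed
        members, centers = new_members, new_centers
    return members, centers
```

Seeded with two pages, a group of users who liked three pages within a few time units of each other is recovered after two passes, the third page joins the cluster's page set, and users who liked the same pages at unrelated times are left out. The median keeps a single straggler from dragging a page's center time.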

20 Deployment at Facebook ▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team). [Figure: 3 months of CopyCatch @ Facebook; #users caught vs. time]

22 Deployment at Facebook: Manually labeled 22 randomly selected clusters from February 2013. Most clusters (77%) come from real but compromised users. [Chart: breakdown of the labeled clusters, including 'Fake acct']
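
For scale, the 77% on slide 22 has to be a whole number of the 22 labeled clusters, and only one count rounds to it:

```latex
\frac{16}{22} \approx 0.727, \qquad \frac{17}{22} \approx 0.773 \approx 77\%, \qquad \frac{18}{22} \approx 0.818
```

So about 17 of the 22 sampled clusters came from real but compromised accounts.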

24 Polonium: Tera-Scale Graph Mining and Inference for Malware Detection. Polo Chau (Machine Learning Dept), Carey Nachenberg (Vice President & Fellow), Jeffrey Wilhelm (Principal Software Engineer), Adam Wright (Software Engineer), Prof. Christos Faloutsos (Computer Science Dept). PATENT PENDING. SDM 2011, Mesa, Arizona

25 Polonium: The Data. 60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program: 50+ million machines, 900+ million executable files. Constructed a machine-file bipartite graph (0.2 TB+): 1 billion nodes (machines and files), 37 billion edges.
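
The graph sizes on slide 25 also give a rough sense of density and storage cost. Treating the rounded figures ('1 billion' nodes, '37 billion' edges, '0.2 TB+', read as decimal terabytes) as exact:

```latex
\bar{d} = \frac{2|E|}{|V|} \approx \frac{2 \times 37 \times 10^{9}}{1 \times 10^{9}} = 74,
\qquad
\frac{0.2 \times 10^{12}\ \text{bytes}}{37 \times 10^{9}\ \text{edges}} \approx 5.4\ \text{bytes per edge}
```

That is, each node touches about 74 edges on average, and the stored graph uses at least roughly 5.4 bytes per edge.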

26 Polonium: Key Ideas. Use Belief Propagation to propagate domain knowledge in the machine-file graph to detect malware. Use “guilt-by-association” (i.e., homophily): e.g., files that appear on machines with many bad files are more likely to be bad. Scalability: handles a 37-billion-edge graph.
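
Guilt-by-association via Belief Propagation can be illustrated with a small loopy BP sketch on a machine-file graph. Everything concrete here is an assumption for illustration, not Polonium's actual model: two states per node (good/bad), a symmetric homophily edge potential controlled by a hypothetical parameter `eps`, uniform priors for unlabeled nodes, and synchronous message updates for a fixed number of iterations. Known-good and known-bad files enter only through their priors; their influence reaches unlabeled files through shared machines.

```python
from collections import defaultdict

GOOD, BAD = 0, 1  # state indices


def belief_propagation(edges, priors, eps=0.1, n_iter=10):
    """Loopy BP on an undirected graph with two states per node.

    edges:  iterable of (machine, file) pairs
    priors: {node: (p_good, p_bad)}; unlisted nodes get (0.5, 0.5)
    eps:    homophily strength (0 <= eps < 0.5); neighbours agree with
            weight 0.5 + eps and disagree with weight 0.5 - eps
    Returns {node: (belief_good, belief_bad)}.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    psi = [[0.5 + eps, 0.5 - eps],
           [0.5 - eps, 0.5 + eps]]
    prior = {n: priors.get(n, (0.5, 0.5)) for n in nbrs}
    # msg[(i, j)][x] is i's current message to j about j being in state x
    msg = {(i, j): [1.0, 1.0] for i in nbrs for j in nbrs[i]}

    for _ in range(n_iter):
        new_msg = {}
        for i, j in msg:
            out = [0.0, 0.0]
            for xj in (GOOD, BAD):
                for xi in (GOOD, BAD):
                    term = prior[i][xi] * psi[xi][xj]
                    for k in nbrs[i]:
                        if k != j:  # exclude the recipient's own message
                            term *= msg[(k, i)][xi]
                    out[xj] += term
            z = out[GOOD] + out[BAD]
            new_msg[(i, j)] = [out[GOOD] / z, out[BAD] / z]
        msg = new_msg  # synchronous update: all messages refreshed together

    beliefs = {}
    for i in nbrs:
        b = list(prior[i])
        for k in nbrs[i]:
            b = [b[x] * msg[(k, i)][x] for x in (GOOD, BAD)]
        z = b[GOOD] + b[BAD]
        beliefs[i] = (b[GOOD] / z, b[BAD] / z)
    return beliefs
```

With edges M1-F_bad, M1-F_u1, M2-F_good, M2-F_u2 and strong priors only on F_bad and F_good, the unlabeled file F_u1 ends with a bad-belief of about 0.52 and F_u2 with about 0.48: each is nudged toward the label of the file it shares a machine with. On a tree like this toy graph the updates converge to exact marginals; on graphs with cycles, loopy BP gives approximations and is not guaranteed to converge.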

27 Polonium: One-Iteration Results. 84.9% True Positive Rate at a 1% False Positive Rate. [Figure: True Positive Rate (% of malware correctly identified) vs. False Positive Rate (% of non-malware wrongly labeled as malware), with the 'Ideal' point marked]
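
The two rates on slide 27 follow the standard definitions. Written in terms of true/false positives and negatives (TP, FP, TN, FN), with 'positive' meaning "labeled as malware":

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}
```

The reported operating point therefore flags 84.9% of the malware while wrongly flagging 1% of the non-malware files.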

28 Part 2: Time-evolving graphs; tensors

29 Graphs over time -> tensors! Problem #2.1: Given who calls whom, and when; find patterns / anomalies. [Figure: call graph with nodes such as smith and johnson]

31 Graphs over time -> tensors! (cont.) [Figure: one call graph per day: Mon, Tue, ...]

32 Graphs over time -> tensors! (cont.) [Figure: the daily call graphs stacked into a 3-mode tensor with modes caller, callee, time]
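
A minimal Python sketch of the construction on slides 29-32, assuming call records arrive as (caller, callee, day) tuples: counting repeated triples yields a sparse 3-mode tensor, and fixing the day recovers that day's call graph. Both function names are hypothetical.

```python
from collections import Counter


def calls_to_tensor(records):
    """Sparse 3-mode count tensor from (caller, callee, day) call records.

    Returns {(caller, callee, day): number_of_calls}; triples with no calls
    are absent, so memory grows with the number of distinct triples only.
    """
    return dict(Counter(tuple(r) for r in records))


def day_slice(tensor, day):
    """One frontal slice of the tensor: the who-calls-whom graph on `day`."""
    return {(caller, callee): n
            for (caller, callee, d), n in tensor.items() if d == day}
```

For example, two Monday calls from smith to johnson and one Tuesday call back become the entries ('smith', 'johnson', 'Mon') = 2 and ('johnson', 'smith', 'Tue') = 1, and the Tuesday slice is just {('johnson', 'smith'): 1}.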

33 Answer to both: tensor factorization. Recall: (SVD) matrix factorization finds blocks. [Figure: N users × M products matrix ≈ sum of three rank-1 blocks: ‘meat-eaters’/‘steaks’, ‘vegetarians’/‘plants’, ‘kids’/‘cookies’]
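
The claim that SVD "finds blocks" can be checked on a toy users × products matrix with two disjoint all-ones blocks. The pure-Python sketch below (hypothetical helper `top_singular_triplet`) uses power iteration on AᵀA instead of a full SVD routine such as `numpy.linalg.svd`, and recovers only the leading singular triplet.

```python
import math


def top_singular_triplet(A, n_iter=100):
    """Leading singular value and vectors of a dense matrix A (list of rows),
    by power iteration on A^T A."""
    rows, cols = len(A), len(A[0])
    v = [1.0] * cols  # start vector; must not be orthogonal to the answer
    for _ in range(n_iter):
        u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # A v
        v = [sum(A[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    u = [sum(A[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))  # ||A v|| for unit-norm v
    return sigma, [x / sigma for x in u], v


# Toy users x products matrix: users 0-2 buy products 0-1, users 3-4 buy 2-3.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
sigma, u, v = top_singular_triplet(A)
```

Here sigma ≈ √6 ≈ 2.449, u is about 1/√3 on users 0-2 and essentially zero elsewhere, and v is about 1/√2 on products 0-1: the leading component is exactly the larger block, the analogue of the ‘meat-eaters’/‘steaks’ group. The smaller block (singular value 2) would be the next component.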

34 Answer to both: tensor factorization. PARAFAC decomposition. [Figure: subject × object × verb tensor = sum of three rank-1 components: politicians, artists, athletes]

35 Answer: tensor factorization. PARAFAC decomposition. Results for who-calls-whom-when: 4M x 15 days. [Figure: caller × callee × time tensor = sum of rank-1 components]
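
A rank-1 PARAFAC (CP) fit can be sketched in pure Python on a sparse caller × callee × time tensor stored as a dict. This is a simplified stand-in, not Com2 or a full multi-component PARAFAC: it fits a single rank-1 component by alternating least squares, which for rank 1 amounts to contracting the tensor against two factors, renormalizing, and pulling the scale out as `lam`. The helper names are hypothetical; for several components fitted jointly, a library routine such as TensorLy's `parafac` is one option.

```python
import math
from collections import defaultdict


def _normalize(x):
    """Scale an {index: value} vector to unit L2 norm; also return the norm."""
    n = math.sqrt(sum(val * val for val in x.values()))
    return {k: val / n for k, val in x.items()}, n


def rank1_parafac(T, n_iter=50):
    """Rank-1 PARAFAC/CP fit of a sparse 3-mode tensor T = {(i, j, k): value}.

    Returns (lam, a, b, c) with unit-norm factors such that
    T[i, j, k] is approximated by lam * a[i] * b[j] * c[k].
    """
    a, _ = _normalize({i: 1.0 for i, _, _ in T})
    b, _ = _normalize({j: 1.0 for _, j, _ in T})
    c, _ = _normalize({k: 1.0 for _, _, k in T})
    lam = 0.0
    for _ in range(n_iter):
        # Least-squares update of each factor with the other two held fixed:
        # contract the tensor against them, then normalize.
        new_a = defaultdict(float)
        for (i, j, k), t in T.items():
            new_a[i] += t * b[j] * c[k]
        a, _ = _normalize(new_a)
        new_b = defaultdict(float)
        for (i, j, k), t in T.items():
            new_b[j] += t * a[i] * c[k]
        b, _ = _normalize(new_b)
        new_c = defaultdict(float)
        for (i, j, k), t in T.items():
            new_c[k] += t * a[i] * b[j]
        c, lam = _normalize(new_c)
    return lam, a, b, c
```

On a toy tensor where one caller places 200 calls to each of five receivers on each of four days, on top of a few light background calls, the fitted component puts almost all of `a` on that caller, `b` on the five receivers and `c` on the four days, mirroring the "1 caller, 5 receivers, 4 days" pattern reported on the next slide.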

36 Anomaly detection in time-evolving graphs. Anomalous communities in phone call data: European country, 4M clients, data over 2 weeks. ~200 calls to EACH receiver on EACH day! [Figure: anomalous component: 1 caller, 5 receivers, 4 days of activity]

38 Anomaly detection in time-evolving graphs (cont.). [Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.]
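
The numbers on slide 36 imply how much traffic the single anomalous caller generates, taking "~200 calls" at face value:

```latex
200\ \frac{\text{calls}}{\text{receiver} \cdot \text{day}} \times 5\ \text{receivers} \times 4\ \text{days} \approx 4{,}000\ \text{calls}
```

That is roughly 1,000 calls per active day from one client.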

39 Cast: Akoglu, Leman; Chau, Polo; Kang, U; Prakash, Aditya; Koutra, Danai; Beutel, Alex; Papalexakis, Vagelis; Shah, Neil; Lee, Jay Yoon; Araujo, Miguel

40 CONCLUSION#1 – Big data: Large datasets reveal patterns/outliers that are invisible otherwise.

41 CONCLUSION#2 – Tensors: a powerful tool. [Figure: 1 caller, 5 receivers, 4 days of activity]

42 TAKE HOME MESSAGE: Cross-disciplinarity

43 Cross-disciplinarity. Thank you!

