CMU SCS Mining Large Graphs: Fraud Detection, and Algorithms Christos Faloutsos CMU.

Presentation transcript:

CMU SCS Mining Large Graphs: Fraud Detection, and Algorithms Christos Faloutsos CMU

CMU SCS (c) 2015, C. Faloutsos 2 Graphs - why should we care? >$10B; ~1B users CMU Cylab

Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]

Graphs - why should we care?
web-log (‘blog’) news propagation
computer network security: /IP traffic and anomaly detection
Recommendation systems
....
Many-to-many db relationship -> graph

Motivating problems
P1: patterns? Fraud detection?
P2: patterns in time-evolving graphs / tensors
[Figure: tensor with axes source, destination, time]

Roadmap
Introduction – Motivation
–Why study (big) graphs?
Part#1: Patterns & fraud detection
Part#2: time-evolving graphs; tensors
Conclusions

Part 1: Anomalies & fraud detection

Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
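To make the triangle-counting idea concrete, here is a minimal sketch that computes, for each node, its degree and the number of triangles it participates in. It is a brute-force, in-memory version for intuition only, not the scalable method of the cited paper; the function name and the toy graph are assumptions made for illustration. Nodes whose triangle count is far from what their degree suggests are the kind of candidates one would inspect.

```python
from collections import defaultdict
from itertools import combinations

def triangles_per_node(edges):
    """Return {node: (degree, triangle_count)} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    stats = {}
    for u, nbrs in adj.items():
        # every triangle through u is one pair of u's neighbors that are adjacent
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        stats[u] = (len(nbrs), tri)
    return stats

# Hypothetical toy graph: a 4-clique {a, b, c, d} plus a star centered at h.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("h", "x"), ("h", "y"), ("h", "z"), ("h", "a")]
stats = triangles_per_node(edges)
for node in sorted(stats):
    print(node, stats[node])
```

In this toy graph, nodes a and h have the same degree but very different triangle counts, which is the contrast a degree-vs-triangles plot is meant to expose.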

Roadmap
Introduction – Motivation
Part#1: Anomaly / fraud detection
–CopyCatch
–Malware & Belief Propagation
Part#2: time-evolving graphs; tensors
Conclusions

Fraud
Given
–Who ‘likes’ what page, and when
Find
–Suspicious users and suspicious products
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos. WWW, 2013.

Our intuition
▪ Lockstep behavior: Same Likes, same time
[Figure: Graph Patterns and Lockstep Behavior; Likes; Suspicious Lockstep Behavior]

MapReduce Overview
▪ Use Hadoop to search for many clusters in parallel:
1. Start with random seeds
2. Update the set of Pages and center Like times for each cluster
3. Repeat until convergence
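The three steps above can be sketched for a single cluster as follows. This is a simplified, serial toy version under stated assumptions, not the CopyCatch implementation: it keeps the page set fixed to the seed user's pages (the real method also updates the set of Pages), it takes the seed as an argument instead of drawing it at random, and the names `find_lockstep`, `dt` and `min_frac` are hypothetical.

```python
from collections import defaultdict

def find_lockstep(likes, seed_user, dt, min_frac=1.0, max_iter=50):
    """likes: {(user, page): like_time}. Returns (users, centers) for one cluster."""
    by_user = defaultdict(dict)
    for (user, page), t in likes.items():
        by_user[user][page] = t

    # Step 1: seed the cluster with one user's pages and like times.
    centers = dict(by_user[seed_user])
    users = set()
    for _ in range(max_iter):
        # Users who liked at least min_frac of the pages within +-dt of the centers.
        new_users = set()
        for u, liked in by_user.items():
            hits = sum(1 for p, c in centers.items()
                       if p in liked and abs(liked[p] - c) <= dt)
            if hits >= min_frac * len(centers):
                new_users.add(u)
        # Step 2: move each center to the mean like time of the current members.
        new_centers = {}
        for p, c in centers.items():
            times = [by_user[u][p] for u in new_users
                     if p in by_user[u] and abs(by_user[u][p] - c) <= dt]
            new_centers[p] = sum(times) / len(times) if times else c
        # Step 3: stop once neither the members nor the centers change.
        if new_users == users and new_centers == centers:
            break
        users, centers = new_users, new_centers
    return users, centers

# Hypothetical toy data: u1..u3 like P1 and P2 at nearly the same times.
likes = {("u1", "P1"): 100, ("u1", "P2"): 200,
         ("u2", "P1"): 101, ("u2", "P2"): 202,
         ("u3", "P1"): 99,  ("u3", "P2"): 198,
         ("h1", "P1"): 500,
         ("h2", "P2"): 900, ("h2", "P3"): 50}
users, centers = find_lockstep(likes, seed_user="u1", dt=5)
print(sorted(users), centers)
```

Running many such searches from different seeds is independent work, which is what makes the approach a natural fit for a parallel framework such as Hadoop.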

Deployment at Facebook
▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team)
[Figure: 3 months of Facebook; axes: time and #users caught]

Deployment at Facebook
Manually labeled 22 randomly selected clusters from February 2013
Most clusters (77%) come from real but compromised users
[Figure: breakdown of cluster labels, including ‘Fake acct’]

Roadmap
Introduction – Motivation
Part#1: Anomaly / fraud detection
–CopyCatch
–Malware & Belief Propagation
Part#2: time-evolving graphs; tensors
Conclusions

Polonium: Tera-Scale Graph Mining and Inference for Malware Detection
Polo Chau (Machine Learning Dept); Carey Nachenberg (Vice President & Fellow); Jeffrey Wilhelm (Principal Software Engineer); Adam Wright (Software Engineer); Prof. Christos Faloutsos (Computer Science Dept)
SDM 2011, Mesa, Arizona. PATENT PENDING

Polonium: The Data
60+ terabytes of data anonymously contributed by participants of worldwide Norton Community Watch program
50+ million machines
900+ million executable files
Constructed a machine-file bipartite graph (0.2 TB+)
1 billion nodes (machines and files)
37 billion edges

Polonium: Key Ideas
Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware
Use “guilt-by-association” (i.e., homophily)
–E.g., files that appear on machines with many bad files are more likely to be bad
Scalability: handles 37 billion-edge graph
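The guilt-by-association idea can be illustrated with a small loopy Belief Propagation sketch on a toy machine-file graph. This is an in-memory illustration under assumptions, not the Polonium system: the two-state model (good/bad), the homophily strength `eps`, the prior values and all node names are hypothetical choices made for the example.

```python
from collections import defaultdict

def belief_propagation(edges, priors, eps=0.1, iters=10):
    """Loopy BP with two states (0 = good, 1 = bad). Returns {node: P(bad)}.

    priors: {node: prior P(bad)}; nodes not listed get an uninformative 0.5.
    eps: homophily strength; neighbors agree with probability 0.5 + eps.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    prior = {n: (1 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in nbrs}
    psi = ((0.5 + eps, 0.5 - eps), (0.5 - eps, 0.5 + eps))  # edge compatibility
    msg = {(i, j): (0.5, 0.5) for i in nbrs for j in nbrs[i]}

    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            out = [0.0, 0.0]
            for xi in (0, 1):
                # prior of i times all incoming messages except the one from j
                w = prior[i][xi]
                for k in nbrs[i]:
                    if k != j:
                        w *= msg[(k, i)][xi]
                for xj in (0, 1):
                    out[xj] += w * psi[xi][xj]
            s = out[0] + out[1]
            new[(i, j)] = (out[0] / s, out[1] / s)
        msg = new

    belief = {}
    for n in nbrs:
        b = [prior[n][0], prior[n][1]]
        for k in nbrs[n]:
            b[0] *= msg[(k, n)][0]
            b[1] *= msg[(k, n)][1]
        belief[n] = b[1] / (b[0] + b[1])
    return belief

# Hypothetical toy graph: machine m1 holds two known-bad files, m2 a known-good one.
edges = [("m1", "bad1"), ("m1", "bad2"), ("m1", "unk_a"),
         ("m2", "good1"), ("m2", "unk_b")]
priors = {"bad1": 0.99, "bad2": 0.99, "good1": 0.01}
beliefs = belief_propagation(edges, priors)
print(round(beliefs["unk_a"], 3), round(beliefs["unk_b"], 3))
```

The unlabeled file on the machine with known-bad files ends up with a belief above 0.5, and the one on the clean machine below 0.5, which is the homophily effect described above.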

Polonium: One-Iteration Results
84.9% True Positive Rate at 1% False Positive Rate
[Figure: ROC curve with the ideal point marked; axes: True Positive Rate (% of malware correctly identified) and False Positive Rate (% of non-malware wrongly labeled as malware)]

Part 2: Time evolving graphs; tensors

Graphs over time -> tensors!
Problem #2.1:
–Given who calls whom, and when
–Find patterns / anomalies
[Figure: tensor with axes caller, callee, time]
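As a small illustration of the "graphs over time -> tensors" step, the sketch below turns a list of call records into a sparse 3-mode count tensor indexed by (caller, callee, day). The record layout, the function name and the toy records are assumptions for the example; real call data would need its own parsing and time binning.

```python
from collections import Counter

def build_call_tensor(records):
    """records: iterable of (caller, callee, day) tuples.

    Returns (counts, index_maps): counts maps (i, j, k) -> number of calls,
    and index_maps holds the label -> index dictionaries for the three modes.
    """
    callers, callees, days = {}, {}, {}
    counts = Counter()
    for caller, callee, day in records:
        i = callers.setdefault(caller, len(callers))
        j = callees.setdefault(callee, len(callees))
        k = days.setdefault(day, len(days))
        counts[(i, j, k)] += 1
    return counts, (callers, callees, days)

# Hypothetical toy records.
records = [("smith", "johnson", "Mon"),
           ("smith", "johnson", "Mon"),
           ("johnson", "smith", "Tue")]
counts, (callers, callees, days) = build_call_tensor(records)
shape = (len(callers), len(callees), len(days))
print(shape, dict(counts))
```

Only the nonzero entries are stored, which matters because who-calls-whom data is extremely sparse.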

Answer to both: tensor factorization
Recall: (SVD) matrix factorization: finds blocks
[Figure: N users x M products matrix ~ sum of three blocks: ‘meat-eaters’ x ‘steaks’ + ‘vegetarians’ x ‘plants’ + ‘kids’ x ‘cookies’]
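A minimal NumPy sketch of "SVD finds blocks": a users-by-products matrix made of three planted blocks is factorized, and each of the top singular-vector pairs is read off as one user group and one product group. The block sizes, the tolerance and the helper name `top_blocks` are assumptions; the clean recovery shown here relies on the blocks being disjoint and of different sizes, and real data would be noisier.

```python
import numpy as np

def top_blocks(A, k, tol=1e-6):
    """Return the (rows, cols) support of the top-k singular vector pairs of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    blocks = []
    for r in range(k):
        rows = np.flatnonzero(np.abs(U[:, r]) > tol).tolist()
        cols = np.flatnonzero(np.abs(Vt[r, :]) > tol).tolist()
        blocks.append((rows, cols))
    return blocks, s[:k]

# Hypothetical 9 users x 6 products matrix with three disjoint blocks.
A = np.zeros((9, 6))
A[0:4, 0:3] = 1.0   # 'meat-eaters' x 'steaks'
A[4:7, 3:5] = 1.0   # 'vegetarians' x 'plants'
A[7:9, 5:6] = 1.0   # 'kids' x 'cookies'

blocks, sing_vals = top_blocks(A, k=3)
for (rows, cols), sv in zip(blocks, sing_vals):
    print(f"singular value {sv:.3f}: users {rows}, products {cols}")
```

Each singular value here equals the square root of the number of ones in its block, so larger blocks surface first.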

Answer to both: tensor factorization
PARAFAC decomposition
[Figure: subject x object x verb tensor = sum of three rank-1 components, labeled politicians, artists, athletes]

Answer: tensor factorization
PARAFAC decomposition
Results for who-calls-whom-when
–4M x 15 days
[Figure: caller x callee x time tensor = sum of rank-1 components]
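For readers who want to see the decomposition in code, here is a small dense PARAFAC (CP) sketch fitted with alternating least squares in NumPy. It models X[i, j, k] as the sum over r of A[i, r] * B[j, r] * C[k, r]. This is a textbook-style illustration under assumptions, not the scalable method used on the 4M-client data: the tensor is tiny and dense, the planted blocks, rank and iteration count are arbitrary, and ALS is not guaranteed to reach the global optimum in general.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; row j * C.shape[0] + k holds B[j] * C[k]."""
    return np.einsum("jr,kr->jkr", B, C).reshape(-1, B.shape[1])

def cp_als(X, rank, iters=200, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    X0 = X.reshape(I, J * K)                      # mode-0 unfolding, columns (j, k)
    X1 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # mode-1 unfolding, columns (i, k)
    X2 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # mode-2 unfolding, columns (i, j)
    for _ in range(iters):
        # Each update is the least-squares solution with the other two factors fixed.
        A = X0 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X1 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X2 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Hypothetical caller x callee x day tensor with two planted communities.
X = np.zeros((6, 8, 15))
X[0:2, 0:5, 3:7] = 5.0     # 2 callers, 5 callees, 4 days
X[2:5, 5:8, 0:15] = 3.0    # 3 callers, 3 callees, all 15 days

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The nonzero entries in one column of A, B and C indicate which callers, callees and days take part in that component, which is how a pattern such as a small group active on a few days can be read off.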

Anomaly detection in time-evolving graphs
Anomalous communities in phone call data:
–European country, 4M clients, data over 2 weeks
~200 calls to EACH receiver on EACH day!
[Figure: tensor component = 1 caller, 5 receivers, 4 days of activity]
Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

Cast: Akoglu, Leman; Chau, Polo; Kang, U; Prakash, Aditya; Koutra, Danai; Beutel, Alex; Papalexakis, Vagelis; Shah, Neil; Lee, Jay Yoon; Araujo, Miguel

CONCLUSION#1 – Big data
Large datasets reveal patterns/outliers that are invisible otherwise

CONCLUSION#2 – tensors
powerful tool
[Figure: tensor component = 1 caller, 5 receivers, 4 days of activity]

TAKE HOME MESSAGE: Cross-disciplinarity
[Figure: tensor component = 1 caller, 5 receivers, 4 days of activity]

Thank you!