CMU SCS Patterns, Anomalies, and Fraud Detection in Large Graphs Christos Faloutsos CMU.

Slides:



Advertisements
Similar presentations
CMU SCS Large Graph Mining - Patterns, tools and cascade analysis Christos Faloutsos CMU.
Advertisements

CMU SCS Identifying on-line Fraudsters: Anomaly Detection Using Network Effects Christos Faloutsos CMU.
1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
BiG-Align: Fast Bipartite Graph Alignment
CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU.
CMU SCS I2.2 Large Scale Information Network Processing INARC 1 Overview Goal: scalable algorithms to find patterns and anomalies on graphs 1. Mining Large.
School of Computer Science Carnegie Mellon University Duke University DeltaCon: A Principled Massive- Graph Similarity Function Danai Koutra Joshua T.
School of Computer Science Carnegie Mellon University National Taiwan University of Science & Technology Unifying Guilt-by-Association Approaches: Theorems.
CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU.
CMU SCS : Multimedia Databases and Data Mining Lecture #26: Graph mining - patterns Christos Faloutsos.
CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU.
CMU SCS C. Faloutsos (CMU)#1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman.
N EIGHBORHOOD F ORMATION AND A NOMALY D ETECTION IN B IPARTITE G RAPHS Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos Jimeng Sun, Huiming.
CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU.
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
Neighborhood Formation and Anomaly Detection in Bipartite Graphs Jimeng Sun Huiming Qu Deepayan Chakrabarti Christos Faloutsos Speaker: Jimeng Sun.
Detecting Fraudulent Personalities in Networks of Online Auctioneers Duen Horng (“Polo”) Chau Shashank Pandit Christos Faloutsos School of Computer Science.
RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos Carnegie Mellon University.
SCS CMU Proximity Tracking on Time- Evolving Bipartite Graphs Speaker: Hanghang Tong Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
CMU SCS Graph and stream mining Christos Faloutsos CMU.
CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU.
CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU.
CMU SCS : Multimedia Databases and Data Mining Lecture #28: Graph mining - patterns Christos Faloutsos.
CMU SCS Large Graph Mining - Patterns, Tools and Cascade Analysis Christos Faloutsos CMU.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
Weighted Graphs and Disconnected Components Patterns and a Generator IDB Lab 현근수 In KDD 08. Mary McGlohon, Leman Akoglu, Christos Faloutsos.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
School of Computer Science Carnegie Mellon University National Taiwan University of Science & Technology Unifying Guilt-by-Association Approaches: Theorems.
CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
CMU SCS Mining Billion-Node Graphs Christos Faloutsos CMU.
CMU SCS Mining Billion-Node Graphs: Patterns and Algorithms Christos Faloutsos CMU.
CMU SCS Graph Mining - surprising patterns in real graphs Christos Faloutsos CMU.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
CMU SCS Mining Billion Node Graphs Christos Faloutsos CMU.
Mining Social Networks for Personalized Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, II-Chul Moon [KDD ’09] 1 Advisor: Dr. Koh Jia-Ling Reporter:
EVENT DETECTION IN TIME SERIES OF MOBILE COMMUNICATION GRAPHS
CMU SCS Mining Large Graphs: Fraud Detection, and Algorithms Christos Faloutsos CMU.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P5-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 5: Graphs over time & tensors Faloutsos,
Application 2: Misstatement detection Problem: Given network and noisy domain knowledge about suspicious nodes (flags), which nodes are most risky? Cash.
Du, Faloutsos, Wang, Akoglu Large Human Communication Networks Patterns and a Utility-Driven Generator Nan Du 1,2, Christos Faloutsos 2, Bai Wang 1, Leman.
EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs Zhe Jin.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
Single-Pass Belief Propagation
CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU.
CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks (WWW2013) BEUTEL, ALEX, WANHONG XU, VENKATESAN GURUSWAMI, CHRISTOPHER.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Dynamic Network Analysis Case study of PageRank-based Rewiring Narjès Bellamine-BenSaoud Galen Wilkerson 2 nd Second Annual French Complex Systems Summer.
CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU.
Large Graph Mining: Power Tools and a Practitioner’s guide
15-826: Multimedia Databases and Data Mining
Anomaly detection in large graphs
Anomaly detection in large graphs
PEGASUS: A PETA-SCALE GRAPH MINING SYSTEM
NetMine: Mining Tools for Large Graphs
Kijung Shin1 Mohammad Hammoud1
Part 2: Graph Mining Tools - SVD and ranking
Part 1: Graph Mining – patterns
15-826: Multimedia Databases and Data Mining
Graph and Tensor Mining for fun and profit
Jinhong Jung, Woojung Jin, Lee Sael, U Kang, ICDM ‘16
Graph and Tensor Mining for fun and profit
Graph and Tensor Mining for fun and profit
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
Algorithms for Large Graph Mining
15-826: Multimedia Databases and Data Mining
GANG: Detecting Fraudulent Users in OSNs
Presentation transcript:

CMU SCS Patterns, Anomalies, and Fraud Detection in Large Graphs Christos Faloutsos CMU

CMU SCS (c) 2015, C. Faloutsos 2 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –tools Part#2: time-evolving graphs –Patterns –Tools Conclusions Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 3 Graphs - why should we care? Financial (user-account) Social networks computer network security: /IP traffic and anomaly detection Recommendation systems.... Many-to-many db relationship -> graph Eurlion, Beijing 2015

CMU SCS Motivating problems P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors Eurlion, Beijing 2015(c) 2015, C. Faloutsos 4 time user account

CMU SCS Motivating problems P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors Eurlion, Beijing 2015(c) 2015, C. Faloutsos 5 time Patterns anomalies user account

CMU SCS (c) 2015, C. Faloutsos 6 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –tools Part#2: time-evolving graphs –Patterns –Tools Conclusions Eurlion, Beijing 2015

CMU SCS Eurlion, Beijing 2015(c) 2015, C. Faloutsos 7 Part 1: Static graphs

CMU SCS (c) 2015, C. Faloutsos 8 Laws and patterns Q1: Are real graphs random? Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 9 Laws and patterns Q1: Are real graphs random? A1: NO!! –Diameter (‘6 degrees’; ‘Kevin Bacon’) –in- and out- degree distributions –other (surprising) patterns So, let’s look at the data Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 10 Solution# S.1 Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 11 Solution# S.1 Power law in the degree distribution [Faloutsos x 3 SIGCOMM99; + Siganos] log(rank) log(degree) internet domains att.com ibm.com Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 12 Solution# S.2: Eigen Exponent E A2: power law in the eigenvalues of the adjacency matrix (‘ eig() ’) E = Exponent = slope Eigenvalue Rank of decreasing eigenvalue May 2001 Eurlion, Beijing 2015 A x = x

CMU SCS (c) 2015, C. Faloutsos 13 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 14 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles –Friends of friends are friends Any patterns? –2x the friends, 2x the triangles ? Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 15 Triangle Law: #S.3 [Tsourakakis ICDM 2008] SNReuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n 1.6 triangles Eurlion, Beijing 2015

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 16 Eurlion, Beijing (c) 2015, C. Faloutsos ?? ?

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 17 Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 18 Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 19 Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 20 Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS (c) 2015, C. Faloutsos 21 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns (binary; weighted) –tools Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 22 Observations on weighted graphs? A: yes - even more ‘laws’! M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 23 Observation W.1: Fortification Q: How do the weights of nodes relate to degree? Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 24 Observation W.1: Fortification Double the checks, double the $ ? $10 $5 Eurlion, Beijing 2015 ‘Reagan’ ‘Clinton’ $7

CMU SCS Edges (# donors) In-weights ($) (c) 2015, C. Faloutsos 25 Observation W.1: fortification: Snapshot Power Law Weight: super-linear on in-degree exponent ‘iw’: 1.01 < iw < 1.26 Orgs-Candidates e.g. John Kerry, $10M received, from 1K donors $10 $5 Eurlion, Beijing 2015 Double the checks, double the $ ?

CMU SCS MORE Graph Patterns Eurlion, Beijing 2015(c) 2015, C. Faloutsos 26 ✔ ✔ ✔ RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.

CMU SCS MORE Graph Patterns Eurlion, Beijing 2015(c) 2015, C. Faloutsos 27 Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal) Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool. Graph Mining: Laws, Tools, and Case Studies

CMU SCS (c) 2015, C. Faloutsos 28 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral Label propagation Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Overview of tools Eurlion, Beijing 2015(c) 2015, C. Faloutsos 29 Binary / weights Time- evolving with class- labels Triangle-degreeB oddBallW eigenSpokesW fBoxW netProbe+WY tensorsWY ✔

CMU SCS OddBall: Spotting Ano malies in Weighted Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School of Computer Science PAKDD 2010, Hyderabad, India

CMU SCS Main idea For each node, extract ‘ego-net’ (=1-step-away neighbors) Extract features (#edges, total weight, etc etc) Compare with the rest of the population (c) 2015, C. Faloutsos 31 Eurlion, Beijing 2015

CMU SCS What is an egonet? ego 32 egonet (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS Selected Features  N i : number of neighbors (degree) of ego i  E i : number of edges in egonet i  W i : total weight of egonet i  λ w,i : principal eigenvalue of the weighted adjacency matrix of egonet I 33 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS Near-Clique/Star 34 Eurlion, Beijing 2015(c) 2015, C. Faloutsos

CMU SCS Near-Clique/Star 35 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS Near-Clique/Star 36 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS Andrew Lewis (director) Near-Clique/Star 37 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 38 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (EigenSpokes, CopyCatch) Label propagation Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Overview of tools Eurlion, Beijing 2015(c) 2015, C. Faloutsos 39 Binary / weights Time- evolving with class- labels Triangle-degreeB oddBallW eigenSpokesW fBoxW netProbe+WY tensorsWY ✔ ✔

CMU SCS EigenSpokes B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, June (c) 2015, C. Faloutsos 40 Eurlion, Beijing 2015

CMU SCS EigenSpokes Eigenvectors of adjacency matrix  equivalent to singular vectors (symmetric, undirected graph) 41 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS EigenSpokes Eigenvectors of adjacency matrix  equivalent to singular vectors (symmetric, undirected graph) 42 (c) 2015, C. FaloutsosEurlion, Beijing 2015 N N details

CMU SCS EigenSpokes Eigenvectors of adjacency matrix  equivalent to singular vectors (symmetric, undirected graph) 43 (c) 2015, C. FaloutsosEurlion, Beijing 2015 N N details

CMU SCS EigenSpokes Eigenvectors of adjacency matrix  equivalent to singular vectors (symmetric, undirected graph) 44 (c) 2015, C. FaloutsosEurlion, Beijing 2015 N N details

CMU SCS EigenSpokes Eigenvectors of adjacency matrix  equivalent to singular vectors (symmetric, undirected graph) 45 (c) 2015, C. FaloutsosEurlion, Beijing 2015 N N details

CMU SCS EigenSpokes EE plot: Scatter plot of scores of u1 vs u2 One would expect –Many origin –A few scattered ~randomly (c) 2015, C. Faloutsos 46 u1 u2 Eurlion, Beijing st Principal component 2 nd Principal component

CMU SCS EigenSpokes EE plot: Scatter plot of scores of u1 vs u2 One would expect –Many origin –A few scattered ~randomly (c) 2015, C. Faloutsos 47 u1 u2 90 o Eurlion, Beijing 2015

CMU SCS EigenSpokes - pervasiveness Present in mobile social graph  across time and space Patent citation graph 48 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS EigenSpokes - explanation Near-cliques, or near- bipartite-cores, loosely connected 49 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS EigenSpokes - explanation Near-cliques, or near- bipartite-cores, loosely connected 50 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS EigenSpokes - explanation Near-cliques, or near- bipartite-cores, loosely connected 51 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS EigenSpokes - explanation Near-cliques, or near- bipartite-cores, loosely connected So what?  Extract nodes with high scores  high connectivity  Good “communities” spy plot of top 20 nodes 52 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS Bipartite Communities! magnified bipartite community patents from same inventor(s) `cut-and-paste’ bibliography! 53 (c) 2015, C. FaloutsosEurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 54 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (eigenspokes, CopyCatch) Label propagation Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Fraud Given –Who ‘likes’ what page, and when Find –Suspicious users and suspicious products Eurlion, Beijing 2015(c) 2015, C. Faloutsos 55 CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, 2013.

CMU SCS Fraud Given –Who ‘likes’ what page, and when Find –Suspicious users and suspicious products Eurlion, Beijing 2015(c) 2015, C. Faloutsos 56 CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, Likes

CMU SCS Our intuition ▪ Lockstep behavior: Same Likes, same time Graph Patterns and Lockstep Behavior Likes Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Our intuition ▪ Lockstep behavior: Same Likes, same time Graph Patterns and Lockstep Behavior Likes Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Our intuition ▪ Lockstep behavior: Same Likes, same time Graph Patterns and Lockstep Behavior Suspicious Lockstep Behavior Likes Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS MapReduce Overview ▪ Use Hadoop to search for many clusters in parallel: 1. Start with randomly seed 2. Update set of Pages and center Like times for each cluster 3. Repeat until convergence Likes Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Deployment at Facebook ▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team) 3 months of Facebook #users caught time Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Deployment at Facebook Manually labeled 22 randomly selected clusters from February 2013 Most clusters (77%) come from real but compromised users Fake acct Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS (c) 2015, C. Faloutsos 63 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (EigenSpokes, CopyCatch, fBox) Label propagation Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 64 Problem: Social Network Link Fraud Eurlion, Beijing 2015 Target: find “stealthy” attackers missed by other algorithms Clique Bipartite core 41.7M nodes 1.5B edges

CMU SCS (c) 2015, C. Faloutsos 65 Problem: Social Network Link Fraud Eurlion, Beijing 2015 Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective. ICDM 2014, Shenzhen, China. Target: find “stealthy” attackers missed by other algorithms Takeaway: use reconstruction error between true/latent representation!

CMU SCS Overview of tools Eurlion, Beijing 2015(c) 2015, C. Faloutsos 66 Binary / weights Time- evolving with class- labels Triangle-degreeB oddBallW eigenSpokesW fBoxW netProbe+WY tensorsWY ✔ ✔ ✔ ✔

CMU SCS (c) 2015, C. Faloutsos 67 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (eigenspokes, CopyCatch) Label propagation (NetProbe, Polonium, Snare) Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Eurlion, Beijing 2015(c) 2015, C. Faloutsos 68 E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07]

CMU SCS Eurlion, Beijing 2015(c) 2015, C. Faloutsos 69 E-bay Fraud detection

CMU SCS Eurlion, Beijing 2015(c) 2015, C. Faloutsos 70 E-bay Fraud detection

CMU SCS Eurlion, Beijing 2015(c) 2015, C. Faloutsos 71 E-bay Fraud detection - NetProbe

CMU SCS Popular press Eurlion, Beijing 2015(c) 2015, C. Faloutsos 72

CMU SCS (c) 2015, C. Faloutsos 73 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (eigenspokes, CopyCatch) Label propagation (NetProbe, Polonium, Snare) Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Polo Chau Machine Learning Dept Carey Nachenberg Vice President & Fellow Jeffrey Wilhelm Principal Software Engineer Adam Wright Software Engineer Prof. Christos Faloutsos Computer Science Dept Polonium: Tera-Scale Graph Mining and Inference for Malware Detection PATENT PENDING SDM 2011, Mesa, Arizona

CMU SCS Polonium: The Data 60+ terabytes of data anonymously contributed by participants of worldwide Norton Community Watch program 50+ million machines 900+ million executable files Constructed a machine-file bipartite graph (0.2 TB+) 1 billion nodes (machines and files) 37 billion edges Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Polonium: Key Ideas Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware Use “guilt-by-association” (i.e., homophily) –E.g., files that appear on machines with many bad files are more likely to be bad Scalability: handles 37 billion-edge graph Eurlion, Beijing (c) 2015, C. Faloutsos

CMU SCS Polonium: One-Interaction Results 84.9% True Positive Rate 1% False Positive Rate True Positive Rate % of malware correctly identified 77 Ideal Eurlion, Beijing 2015(c) 2015, C. Faloutsos False Positive Rate % of non-malware wrongly labeled as malware

CMU SCS (c) 2015, C. Faloutsos 78 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (eigenspokes, CopyCatch) Label propagation (NetProbe, Polonium, Snare) Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Network Effect Tools: SNARE 79 Some accounts are sort-of-suspicious – how to combine weak signals? Before Eurlion, Beijing 2015(c) 2015, C. Faloutsos Inventory Acc. payable Revenue L.A.

CMU SCS Network Effect Tools: SNARE 80 Some accounts are sort-of-suspicious – how to combine weak signals? Before Eurlion, Beijing 2015(c) 2015, C. Faloutsos

CMU SCS Network Effect Tools: SNARE 81 A: Belief Propagation. Before Eurlion, Beijing 2015(c) 2015, C. Faloutsos

CMU SCS Network Effect Tools: SNARE 82 A: Belief Propagation. After Before Eurlion, Beijing 2015(c) 2015, C. Faloutsos Mary McGlohon, Stephen Bay, Markus G. Anderle, David M. Steier, Christos Faloutsos: SNARE: a link analytic system for graph labeling and risk detection. KDD 2009:

CMU SCS Network Effect Tools: SNARE 83 Produces improvement over simply using flags –Up to 6.5 lift –Improvement especially for low false positive rate True positive rate Results for accounts data (ROC Curve) Ideal SNARE Baseline (flags only) Eurlion, Beijing 2015(c) 2015, C. Faloutsos False positive rate

CMU SCS Network Effect Tools: SNARE 84 Accurate- Produces large improvement over simply using flags Flexible- Can be applied to other domains Scalable- One iteration BP runs in linear time (# edges) Robust- Works on large range of parameters Eurlion, Beijing 2015(c) 2015, C. Faloutsos

CMU SCS (c) 2015, C. Faloutsos 85 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (eigenspokes, CopyCatch) Label propagation (NetProbe,…, Unification) Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms Danai Koutra U Kang Hsing-Kuo Kenneth Pao Tai-You Ke Duen Horng (Polo) Chau Christos Faloutsos ECML PKDD, 5-9 September 2011, Athens, Greece

CMU SCS Problem Definition: G B A techniques (Guilt By Association) (c) 2015, C. Faloutsos 87 Given: Graph; & few labeled nodes Find: labels of rest (assuming network effects) Eurlion, Beijing 2015

CMU SCS Are they related? RWR (Random Walk with Restarts) –google’s pageRank (‘if my friends are important, I’m important, too’) SSL (Semi-supervised learning) –minimize the differences among neighbors BP (Belief propagation) –send messages to neighbors, on what you believe about them Eurlion, Beijing 2015(c) 2015, C. Faloutsos 88

CMU SCS Are they related? RWR (Random Walk with Restarts) –google’s pageRank (‘if my friends are important, I’m important, too’) SSL (Semi-supervised learning) –minimize the differences among neighbors BP (Belief propagation) –send messages to neighbors, on what you believe about them Eurlion, Beijing 2015(c) 2015, C. Faloutsos 89 YES!

CMU SCS Correspondence of Methods (c) 2015, C. Faloutsos 90 MethodMatrixUnknow n known RWR[I – c AD -1 ]×x=(1-c)y SSL [I + a (D - A)] ×x=y F A BP [I + a D - c ’ A] ×bhbh =φhφh ? ? d1 d2 d3 d1 d2 d3 final labels/ beliefs prior labels/ beliefs adjacency matrix Eurlion, Beijing 2015 DETAILS

CMU SCS Results: Scalability (c) 2015, C. Faloutsos 91 F A BP is linear on the number of edges. # of edges runtime (min) Eurlion, Beijing 2015 DETAILS

CMU SCS Overview of tools Eurlion, Beijing 2015(c) 2015, C. Faloutsos 92 Binary / weights Time- evolving with class- labels Triangle-degreeB oddBallW eigenSpokesW fBoxW netProbe+WY tensorsWY ✔ ✔ ✔ ✔ ✔

CMU SCS Summary of Part#1 *many* patterns in real graphs –Power-laws everywhere –Long (and growing) list of tools for anomaly/fraud detection Eurlion, Beijing 2015(c) 2015, C. Faloutsos 93 Patterns anomalies fBoxNetProbe oddBall …

CMU SCS (c) 2015, C. Faloutsos 94 Roadmap Introduction – Motivation Part#1: Static graphs –Patterns –Tools Local Spectral (eigenspokes, CopyCatch) Label propagation (NetProbe, Polonium, Snare) Part#2: time-evolving graphs Conclusions Eurlion, Beijing 2015

CMU SCS Eurlion, Beijing 2015(c) 2015, C. Faloutsos 95 Part 2: Time evolving graphs

CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 96 smith johnson

CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 97

CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 98 Mon Tue

CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 99 callee caller time

CMU SCS Graphs over time -> tensors! Problem #2’: –Given customer-account-date –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 100 account customer date MANY more settings, with >2 ‘modes’

CMU SCS Graphs over time -> tensors! Problem #2’’: –Given author-keyword-date –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 101 keyword author date MANY more settings, with >2 ‘modes’

CMU SCS Graphs over time -> tensors! Problem #2’’’: –Given subject – verb – object facts –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 102 object subject verb MANY more settings, with >2 ‘modes’

CMU SCS Graphs over time -> tensors! Problem #2’’’’: –Given –Find patterns / anomalies Eurlion, Beijing 2015(c) 2015, C. Faloutsos 103 mode2 mode1 mode3 MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)

CMU SCS Answer to all: tensor factorization Recall: (SVD) matrix factorization: finds blocks Eurlion, Beijing 2015(c) 2015, C. Faloutsos 104 N users M products ‘meat-eaters’ ‘steaks’ ‘vegetarians’ ‘plants’ ‘kids’ ‘cookies’ ~ + +

CMU SCS Answer to all: tensor factorization PARAFAC decomposition Eurlion, Beijing 2015(c) 2015, C. Faloutsos 105 = + + subject object verb politicians artistsathletes

CMU SCS Answer: tensor factorization PARAFAC decomposition Results for who-calls-whom-when –4M x 15 days Eurlion, Beijing 2015(c) 2015, C. Faloutsos 106 = + + caller callee time ??

CMU SCS Anomaly detection in time- evolving graphs Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks ~200 calls to EACH receiver on EACH day! 1 caller5 receivers4 days of activity Eurlion, Beijing (c) 2015, C. Faloutsos =

CMU SCS Anomaly detection in time- evolving graphs Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks ~200 calls to EACH receiver on EACH day! 1 caller5 receivers4 days of activity Eurlion, Beijing (c) 2015, C. Faloutsos =

CMU SCS Anomaly detection in time- evolving graphs Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks ~200 calls to EACH receiver on EACH day! Eurlion, Beijing (c) 2015, C. Faloutsos = Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

CMU SCS (c) 2015, C. Faloutsos 110 Roadmap Introduction – Motivation –Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors Resources and Conclusions Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 111 Project info: PEGASUS Eurlion, Beijing Results on large graphs: with Pegasus + hadoop + M45 Apache license Code, papers, manual, video Prof. U Kang Prof. Polo Chau

CMU SCS (c) 2015, C. Faloutsos 112 Cast Akoglu, Leman Chau, Polo Kang, U Prakash, Aditya Eurlion, Beijing 2015 Koutra, Danai Beutel, Alex Papalexakis, Vagelis Shah, Neil Lee, Jay Yoon Araujo, Miguel

CMU SCS (c) 2015, C. Faloutsos 113 CONCLUSION#0 Patterns Anomalies Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 114 CONCLUSION#1 Many, surprising patterns in real graphs Eurlion, Beijing 2015 Rank-degree Degree-#triangles Degree-weight …

CMU SCS (c) 2015, C. Faloutsos 115 CONCLUSION#2 - tools Many, powerful tools Eurlion, Beijing 2015 fBox oddBall tensors …

CMU SCS Overview of tools Eurlion, Beijing 2015(c) 2015, C. Faloutsos 116 Binary / weights Time- evolving with class- labels Triangle-degreeB oddBallW eigenSpokesW fBoxW netProbe+WY tensorsWY

CMU SCS (c) 2015, C. Faloutsos 117 References D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool ED1V01Y201209DMK006 Eurlion, Beijing 2015

CMU SCS (c) 2015, C. Faloutsos 118 Many, powerful tools Eurlion, Beijing 2015 fBox oddBall tensors … Thank you!