CMU SCS Identifying on-line Fraudsters: Anomaly Detection Using Network Effects Christos Faloutsos CMU
Thanks: Saman Haqqi. IBM-PBGH, June 2013. C. Faloutsos (CMU).
Roadmap. Graph problems: G1: Fraud detection (BP); G2: Botnet detection (spectral); G3: Beyond graphs: tensors and "NELL". Influence propagation and spike modeling: C1: spikeM model. Conclusions.
E-bay fraud detection, with Polo Chau & Shashank Pandit, CMU [WWW'07].
E-bay fraud detection - NetProbe.
E-bay fraud detection - NetProbe (details). Compatibility matrix over the node states F, A, H (fraudster, accomplice, honest), encoding heterophily. [Table residue: only two entries survive, 99% in row F and 49% in row H.]
Background 1: Belief Propagation. Equations: [Pearl '82][Yedidia+ '02] ... [Pandit+ '07][Gonzalez+ '09][Chechetka+ '10]. [Equation image not recovered; the surviving fragment is the belief b_i(x_i), shown next to the F/A/H compatibility matrix.]
Popular press. And less desirable attention: from 'Belgium police' ('copy of your code?').
Roadmap. Graph problems: G1: Fraud detection (BP): Ebay, Symantec, Unification; G2: Botnet detection (spectral); G3: Beyond graphs: tensors and "NELL". Influence propagation and spike modeling. Conclusions.
Polonium: Tera-Scale Graph Mining and Inference for Malware Detection. Polo Chau (Machine Learning Dept), Carey Nachenberg (Vice President & Fellow), Jeffrey Wilhelm (Principal Software Engineer), Adam Wright (Software Engineer), Prof. Christos Faloutsos (Computer Science Dept). Patent pending. SDM 2011, Mesa, Arizona.
Polonium: The Data. 60+ terabytes of data, anonymously contributed by participants of the worldwide Norton Community Watch program: 50+ million machines; 900+ million executable files. Constructed a machine-file bipartite graph (0.2 TB+): 1 billion nodes (machines and files); 37 billion edges.
Polonium: Key Ideas. Use "guilt-by-association" (i.e., homophily): e.g., files that appear on machines with many bad files are more likely to be bad. Scalability: handles a 37-billion-edge graph.
Polonium: One-Interaction Results. 84.9% true positive rate at 1% false positive rate. True positive rate: % of malware correctly identified. False positive rate: % of non-malware wrongly labeled as malware. [Plot: true positive rate vs. false positive rate, with the "Ideal" point marked.]
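The two rates defined on this slide can be computed directly from ground-truth labels and predictions. The following is a minimal sketch, assuming a binary encoding where 1 means malware and 0 means non-malware; the function name `tpr_fpr` and the toy data are illustrative and not taken from Polonium.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate).

    Assumed encoding: 1 = malware, 0 = non-malware.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean file flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean file passed
    tpr = tp / (tp + fn)  # % of malware correctly identified
    fpr = fp / (fp + tn)  # % of non-malware wrongly labeled as malware
    return tpr, fpr


# Toy example: 4 malware files, 6 clean files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

A detector is usually tuned by moving a score threshold until the false positive rate reaches a target (1% on the slide) and then reading off the true positive rate.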
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos. ECML PKDD, 5-9 September 2011, Athens, Greece.
Problem definition: GBA (guilt-by-association) techniques. Given: a graph and a few labeled nodes. Find: the labels of the rest (assuming network effects).
Homophily and heterophily. All methods handle homophily. NOT all methods handle heterophily, BUT the proposed method does!
Are they related? RWR (Random Walk with Restarts): Google's PageRank ('if my friends are important, I'm important, too'). SSL (Semi-supervised learning): minimize the differences among neighbors. BP (Belief propagation): send messages to neighbors, on what you believe about them. Answer: YES!
Correspondence of Methods. Each row reads: Matrix × Unknown = Known.
Method | Matrix | Unknown | Known
RWR | $[I - c\,A D^{-1}]$ | $x$ | $(1-c)\,y$
SSL | $[I + a\,(D - A)]$ | $x$ | $y$
FaBP | $[I + a\,D - c'\,A]$ | $b_h$ | $\varphi_h$
Unknown: final labels/beliefs. Known: prior labels/beliefs. $A$: adjacency matrix. $D$: diagonal matrix of node degrees (d1, d2, d3 in the figure). We know when it converges!
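The correspondence says that all three methods reduce to solving one linear system of the same shape. The sketch below solves each system densely with NumPy on a 4-node path graph. The function names, the default parameter values, and the toy graph are illustrative. The mapping from a "homophily factor" `h` to `a` and `c'` is an assumption (recalled from the FaBP paper, not stated on the slide), and a dense solve is only for illustration, not how the method scales.

```python
import numpy as np


def rwr(A, y, c=0.85):
    """Random walk with restarts: solve [I - c A D^-1] x = (1 - c) y."""
    n = A.shape[0]
    D_inv = np.diag(1.0 / A.sum(axis=0))  # inverse degree matrix
    return np.linalg.solve(np.eye(n) - c * A @ D_inv, (1.0 - c) * y)


def ssl(A, y, a=1.0):
    """Semi-supervised learning: solve [I + a (D - A)] x = y."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    return np.linalg.solve(np.eye(n) + a * (D - A), y)


def fabp(A, phi, h=0.05):
    """FaBP-style system: solve [I + a D - c' A] b = phi.

    Assumed parameterization: a = 4h^2 / (1 - 4h^2), c' = 2h / (1 - 4h^2).
    """
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    a = 4.0 * h**2 / (1.0 - 4.0 * h**2)
    c_prime = 2.0 * h / (1.0 - 4.0 * h**2)
    return np.linalg.solve(np.eye(n) + a * D - c_prime * A, phi)


# Toy graph: path 0 - 1 - 2 - 3, with only node 0 labeled.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])

x_rwr = rwr(A, y)
x_ssl = ssl(A, y)
b = fabp(A, 0.1 * y)  # priors centered around zero; 0.1 = mild positive prior
print(x_rwr, x_ssl, b, sep="\n")
```

In the SSL and FaBP outputs the score decays with distance from the labeled node, which is the "network effect" that guilt-by-association methods exploit.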
Results: Scalability. FaBP is linear in the number of edges. [Plot: runtime (min) vs. # of edges, Kronecker graphs.]
Results: Parallelism. FaBP is ~2x faster and wins/ties on accuracy. [Plot axes: runtime (min), % accuracy.]
Conclusions for BP. 'NetProbe', 'Polonium', and belief propagation: exploit network effects. FaBP: fast and accurate (and it comes with convergence conditions).
EigenSpokes. B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, June.
EigenSpokes (details). Eigenvectors of the adjacency matrix: equivalent to singular vectors (symmetric, undirected graph). [Figure: the N x N adjacency matrix.]
EigenSpokes. EE plot: scatter plot of the scores of u1 vs. u2 (1st vs. 2nd principal component). One would expect: many points at the origin; a few scattered ~randomly. [Plot: u1 vs. u2, showing spokes at 90°.]
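An EE plot can be reproduced on a small synthetic graph. The sketch below is an illustration under stated assumptions, not the authors' code: it builds two cliques joined only through a long chain of low-degree nodes (a hypothetical stand-in for "loosely connected near-cliques"), takes the top two eigenvectors of the adjacency matrix with `numpy.linalg.eigh`, and checks that every node lies close to one of the two axes.

```python
import numpy as np


def add_clique(A, nodes):
    """Connect every pair of distinct nodes in `nodes`."""
    for i in nodes:
        for j in nodes:
            if i != j:
                A[i, j] = 1.0


n = 30
A = np.zeros((n, n))
clique_a = list(range(0, 12))   # 12-node clique
clique_b = list(range(12, 20))  # 8-node clique
add_clique(A, clique_a)
add_clique(A, clique_b)

# Loose connection: chain 11 - 20 - 21 - ... - 29 - 12.
chain = [11] + list(range(20, 30)) + [12]
for u, v in zip(chain[:-1], chain[1:]):
    A[u, v] = A[v, u] = 1.0

# eigh returns eigenvalues in ascending order, so the last two columns
# are the eigenvectors of the two largest eigenvalues.
vals, vecs = np.linalg.eigh(A)
u1, u2 = vecs[:, -1], vecs[:, -2]

# EE-plot coordinates: one (u1, u2) point per node.
ee_points = np.column_stack([u1, u2])

# Nodes with the highest |score| on each eigenvector.
top_u1 = set(np.argsort(-np.abs(u1))[:12].tolist())
top_u2 = set(np.argsort(-np.abs(u2))[:8].tolist())

# Distance of the worst node from the nearest axis (0 = perfect spokes).
off_axis = float(np.max(np.minimum(np.abs(u1), np.abs(u2))))
print(sorted(top_u1), sorted(top_u2), off_axis)
```

Each eigenvector concentrates on one clique, so a node with a high score on u1 has a near-zero score on u2 and vice versa; plotting `ee_points` would therefore show two perpendicular spokes rather than a random cloud.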
EigenSpokes - pervasiveness. Present in the mobile social graph, across time and space. Also present in the patent citation graph.
EigenSpokes - explanation. Near-cliques, or near-bipartite-cores, loosely connected. So what? Extract nodes with high scores: high connectivity, good "communities". [Figure: spy plot of the top 20 nodes.]
Bipartite Communities! Magnified bipartite community: patents from the same inventor(s); 'cut-and-paste' bibliography!
(maybe, botnets?) Victim IPs? Botnet members? Exploring it with Dr. Eric Mao (III-Taiwan).
GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries. U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos. KDD'12.
Background: Tensors. Tensors (= multi-dimensional arrays) are everywhere. Hyperlinks & anchor text [Kolda+, 05]: modes URL 1, URL 2, anchor text (e.g., Java, C++, C#). Sensor stream (time, location, type). Predicates (subject, verb, object) in a knowledge base, e.g., "Barack Obama is president of U.S.", "Eric Clapton plays guitar"; NELL (Never Ending Language Learner) data: mode sizes (26M), (48M); nonzeros = 144M. Anomaly detection in computer networks: (IP-source, IP-destination, time-stamp).
Problem Definition. How to decompose a billion-scale tensor? Corresponds to SVD in the 2D case. [Figure: decomposition into components, labeled 'Politicians' and 'Artists'.]
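To make "decompose a tensor" concrete, here is a small in-memory sketch of a CP/PARAFAC-style decomposition fitted by alternating least squares. Choosing CP/PARAFAC is an assumption about the intended decomposition (the slide only says it corresponds to SVD in the 2D case), and the function `cp_als` and its parameters are illustrative. It is dense and single-machine, so it shows the technique rather than the billion-scale method.

```python
import numpy as np


def cp_als(X, rank, n_iter=50, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r] by alternating least squares.

    Returns the three factor matrices and the relative error after each sweep.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    norm_x = np.linalg.norm(X)
    errors = []
    for _ in range(n_iter):
        # Each update is the least-squares solution for one factor,
        # holding the other two fixed.
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
        X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
        errors.append(np.linalg.norm(X - X_hat) / norm_x)
    return A, B, C, errors


# Toy (subject, verb, object)-shaped tensor built from two hidden components.
rng = np.random.default_rng(1)
A0 = rng.standard_normal((6, 2))
B0 = rng.standard_normal((5, 2))
C0 = rng.standard_normal((4, 2))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C, errors = cp_als(X, rank=2)
print(f"relative error: first sweep {errors[0]:.3g}, last sweep {errors[-1]:.3g}")
```

Each column triple (one column from each of `A`, `B`, `C`) plays the role of one component; in a knowledge-base tensor, the largest entries of a column would list the subjects, verbs, or objects that make up one group such as 'Politicians' or 'Artists'.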
Problem Definition. Q1: Dominant concepts/topics? Q2: Find synonyms to a given noun phrase? (And how to scale up: |data| > RAM.) NELL (Never Ending Language Learner) data: mode sizes (26M), (48M); nonzeros = 144M.
Experiments. GigaTensor solves a 100x larger problem. Number of nonzeros = I / 50. [Plot: GigaTensor vs. Tensor Toolbox over tensor dimensions (I), (J), (K); Tensor Toolbox runs out of memory; GigaTensor reaches 100x larger sizes.]
A1: Concept Discovery. Concept discovery in a knowledge base.
A2: Synonym Discovery.
Rise and Fall Patterns of Information Diffusion: Model and Implications. Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU). KDD'12, Beijing, China.
Rise and fall patterns in social media. Meme (# of mentions in blogs): short phrases sourced from U.S. politics in [...], e.g., "you can put lipstick on a pig", "yes we can".
Rise and fall patterns in social media. Four classes on YouTube [Crane et al. '08]; six classes on Meme [Yang et al. '11]. Can we find a unifying model, which includes these patterns? Answer: YES! We can represent all patterns by a single model.
Main idea - SpikeM. 1. Un-informed bloggers (uninformed about the rumor). 2. External shock at time n_b (e.g., breaking news). 3. Infection (word-of-mouth). Infectiveness of a blog-post at age n combines: strength of infection β (quality of news); a decay function. [Figure: snapshots at time n = 0, n = n_b, and n = n_b + 1.]
[Plot: Prob(RT > x) (log) vs. response time (log); slope -1.5.] J. G. Oliveira & A.-L. Barabási: Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
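The three ingredients above (un-informed bloggers, an external shock, word-of-mouth infection with a decaying infectiveness) can be simulated in a few lines. The sketch below is a simplified reading of the base model without periodicity, not the full SpikeM equation: the update rule ΔB(n+1) = U(n) · Σ_t (ΔB(t) + S(t)) · f(n+1−t) + ε, the decay f(τ) = β · τ^(−1.5), and all parameter names and values (`N`, `beta`, `n_b`, `S_b`, `eps`, `T`) are assumptions made for illustration.

```python
def spikem_sketch(N=1000.0, beta=0.001, n_b=10, S_b=5.0, eps=0.0, T=100,
                  decay=-1.5):
    """Simulate newly informed bloggers per time tick (simplified sketch).

    N     : total number of bloggers, all un-informed at time 0
    beta  : strength of infection
    n_b   : time tick of the external shock
    S_b   : size of the external shock
    eps   : background noise added at every tick
    decay : exponent of the power-law decay of infectiveness with age
    """
    U = N                  # currently un-informed bloggers
    dB = [0.0] * T         # dB[n] = bloggers newly informed at tick n
    for n in range(T - 1):
        influence = 0.0
        for t in range(n_b, n + 1):
            shock = S_b if t == n_b else 0.0
            age = n + 1 - t                       # age of the posts made at t
            influence += (dB[t] + shock) * beta * age ** decay
        new = min(U, U * influence + eps)         # cannot inform more than U
        dB[n + 1] = new
        U -= new
    return dB


dB = spikem_sketch()
peak = max(range(len(dB)), key=lambda n: dB[n])
print(f"first activity at tick 11: {dB[11]:.2f}; peak at tick {peak}")
```

Nothing happens before the shock at `n_b`; afterwards each new post keeps infecting others, but with a strength that fades as a power law of its age, which is what produces a slow tail instead of an exponential drop.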
SpikeM - with periodicity. Full equation of SpikeM. [Equation image not recovered.] Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly). [Plot: activity vs. time n, with a peak at noon and a dip at 3am.]
Analysis (details): exponential rise and power-law fall. Rise part: SI -> exponential; SpikeM -> exponential. Fall part: SI -> exponential; SpikeM -> power law. [Plots in lin-log and log-log scales.]
Tail-part forecasts. SpikeM can capture the tail part.
"What-if" forecasting. E.g., given (1) the first spike, (2) the release date of two sequel movies, and (3) the access volume before the release date (two weeks before release): what do the upcoming spikes look like? SpikeM can forecast the upcoming spikes.
Conclusions for spikes. Exp rise; PL decay. 'spikeM' captures all patterns, with a few parameters; it can also do extrapolation and forecasting.
Roadmap. Graph problems: G1: Fraud detection (BP); G2: Botnet detection (spectral); G3: Beyond graphs: tensors and "NELL". Influence propagation and spike modeling. Future research. Conclusions.
Challenge #1: Time-evolving networks / tensors. Periodicities? Burstiness? What is the 'typical' behavior of a node, over time? Heterogeneous graphs (= nodes w/ attributes).
Challenge #2: 'Connectome' (brain wiring). Which neurons get activated by 'bee'? How does the wiring evolve? Modeling epilepsy. With N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell.
Thanks to: NSF IIS , IIS , CTA-INARC ; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab.
Project info: PEGASUS. Results on large graphs: with Pegasus + hadoop + M45. Apache license. Code, papers, manual, video. Prof. U Kang, Prof. Polo Chau.
Cast: Akoglu, Leman; Chau, Polo; Kang, U; McGlohon, Mary; Tong, Hanghang; Prakash, Aditya; Koutra, Danai; Beutel, Alex; Papalexakis, Vagelis.
References
Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)
Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174
Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD'12, pp. 6-14, Beijing, China, August 2012
Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006:
Overall Conclusions. G1: fraud detection: BP is a powerful method; FaBP is faster, equally accurate, with known convergence. G2: botnets -> EigenSpokes. G3: Subject-Verb-Object -> Tensors/GigaTensor. Spikes: 'spikeM' (exp rise; PL drop).