
1 CMU SCS Identifying on-line Fraudsters: Anomaly Detection Using Network Effects Christos Faloutsos CMU

2 CMU SCS Thanks: Saman Haqqi. IBM-PBGH, June 2013. C. Faloutsos (CMU)

3 Roadmap
Graph problems:
–G1: Fraud detection – BP
–G2: Botnet detection – spectral
–G3: Beyond graphs: tensors and "NELL"
Influence propagation and spike modeling:
–C1: spikeM model
Conclusions

4 E-bay fraud detection, with Polo Chau & Shashank Pandit, CMU [WWW’07]

5 E-bay fraud detection

6 E-bay fraud detection

7 E-bay fraud detection: NetProbe

8 E-bay fraud detection: NetProbe. Compatibility matrix (heterophily) over the three node states F, A and H; the entries shown are 99% in row F and 49% in row H. (details)

9 Background 1: Belief Propagation. Equations: [Pearl ’82][Yedidia+ ’02] … [Pandit+ ’07][Gonzalez+ ’09][Chechetka+ ’10]. [Equation figure; belief term b_i(x_i)]

10 Background 1: Belief Propagation. Same equations and references, shown next to the compatibility matrix over F, A and H (99% in row F, 49% in row H).
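
The belief propagation equations on these two slides were images and are not in the transcript. As a rough illustration only, here is a minimal sum-product sketch of loopy BP with a compatibility matrix. The two-state setup (state 0 = "fraud", state 1 = "honest"), the matrix values and the toy path graph are assumptions made for this example; they are not the NetProbe states or numbers.

```python
import numpy as np

def belief_propagation(edges, priors, psi, n_iters=10):
    """Sum-product loopy BP on an undirected graph (illustrative sketch).

    edges  : list of (i, j) pairs
    priors : (n, k) array, prior belief of each node over k states
    psi    : (k, k) compatibility matrix; psi[a, b] is the affinity of a
             node in state a for a neighbor in state b
    """
    priors = np.asarray(priors, dtype=float)
    n, k = priors.shape
    neighbors = {i: [] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    # msgs[(i, j)] is the message from i to j, a vector over j's states
    msgs = {(i, j): np.ones(k) / k for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for (i, j) in msgs:
            # prior of i times all messages into i, except the one from j
            prod = priors[i].copy()
            for t in neighbors[i]:
                if t != j:
                    prod *= msgs[(t, i)]
            m = psi.T @ prod          # sum over the states of i
            new_msgs[(i, j)] = m / m.sum()
        msgs = new_msgs
    beliefs = priors.copy()
    for i in range(n):
        for t in neighbors[i]:
            beliefs[i] *= msgs[(t, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# Hypothetical heterophily example: neighbors tend to be in opposite states.
psi = np.array([[0.1, 0.9],
                [0.9, 0.1]])
edges = [(0, 1), (1, 2)]              # path graph 0 - 1 - 2
priors = np.array([[0.9, 0.1],        # node 0: strongly suspected
                   [0.5, 0.5],        # nodes 1, 2: no prior knowledge
                   [0.5, 0.5]])
beliefs = belief_propagation(edges, priors, psi)
print(beliefs)
```

With heterophily, suspicion about node 0 pushes its neighbor toward the opposite state and the neighbor's neighbor back toward the same state, which is the kind of network effect the compatibility matrix is meant to encode.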

11 Popular press. And less desirable attention: e-mail from ‘Belgium police’ (‘copy of your code?’)

12 Roadmap
Graph problems:
–G1: Fraud detection – BP: Ebay, Symantec, Unification
–G2: Botnet detection – spectral
–G3: Beyond graphs: tensors and "NELL"
Influence propagation and spike modeling
Conclusions

13 Polonium: Tera-Scale Graph Mining and Inference for Malware Detection. PATENT PENDING. SDM 2011, Mesa, Arizona.
Polo Chau (Machine Learning Dept), Carey Nachenberg (Vice President & Fellow), Jeffrey Wilhelm (Principal Software Engineer), Adam Wright (Software Engineer), Prof. Christos Faloutsos (Computer Science Dept)

14 Polonium: The Data
60+ terabytes of data anonymously contributed by participants of the worldwide Norton Community Watch program
–50+ million machines
–900+ million executable files
Constructed a machine-file bipartite graph (0.2 TB+)
–1 billion nodes (machines and files)
–37 billion edges

15 Polonium: Key Ideas
Use “guilt-by-association” (i.e., homophily)
–E.g., files that appear on machines with many bad files are more likely to be bad
Scalability: handles a 37-billion-edge graph

16 Polonium: One-Interaction Results
84.9% True Positive Rate at 1% False Positive Rate
–True Positive Rate: % of malware correctly identified
–False Positive Rate: % of non-malware wrongly labeled as malware
[Figure: True Positive Rate vs. False Positive Rate, with the “Ideal” point marked]
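
The two rates defined on this slide can be computed as follows. This is a small sketch on made-up labels (1 = malware, 0 = non-malware); the function name `tpr_fpr` and the data are hypothetical and unrelated to the Polonium results.

```python
def tpr_fpr(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # TPR: share of malware correctly identified
    # FPR: share of non-malware wrongly labeled as malware
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical ground truth and predictions
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```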

18 Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos. ECML PKDD, 5-9 September 2011, Athens, Greece

19 Problem Definition: GBA (guilt-by-association) techniques
Given: a graph and a few labeled nodes
Find: labels of the rest (assuming network effects)

20 Homophily and Heterophily
All methods handle homophily. NOT all methods handle heterophily, BUT the proposed method does!
[Figure panels: Step 1, Step 2]

21 Are they related?
RWR (Random Walk with Restarts)
–Google’s PageRank (‘if my friends are important, I’m important, too’)
SSL (Semi-supervised learning)
–minimize the differences among neighbors
BP (Belief propagation)
–send messages to neighbors, on what you believe about them

22 Are they related? (RWR, SSL, BP) YES!

24 Correspondence of Methods

Method | Matrix              | Unknown | Known
RWR    | [I – c A D^{-1}]    | × x     | = (1 – c) y
SSL    | [I + a (D – A)]     | × x     | = y
FaBP   | [I + a D – c’ A]    | × b_h   | = φ_h

Legend: A is the adjacency matrix (0/1 entries); D is the diagonal matrix with entries d1, d2, d3; the unknowns x and b_h are the final labels/beliefs; the knowns y and φ_h are the prior labels/beliefs.

25 Correspondence of Methods (same table): We know when it converges!
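
Read as linear systems, the three rows of the table can each be solved directly on a small graph. The sketch below does this with NumPy on a toy 4-node path graph. The graph, the prior vector and the parameter values c = 0.85, a = 0.5, a = 0.05 and c’ = 0.2 are assumptions picked only so that the systems are well conditioned; the slides do not give parameter values, and how a and c’ are chosen for FaBP is not covered here.

```python
import numpy as np

# Toy path graph 0 - 1 - 2 - 3 (hypothetical example data)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))      # diagonal degree matrix
I = np.eye(len(A))

def rwr(y, c=0.85):
    # [I - c A D^-1] x = (1 - c) y
    return np.linalg.solve(I - c * A @ np.linalg.inv(D), (1 - c) * y)

def ssl(y, a=0.5):
    # [I + a (D - A)] x = y
    return np.linalg.solve(I + a * (D - A), y)

def fabp(phi, a=0.05, c_prime=0.2):
    # [I + a D - c' A] b_h = phi_h
    return np.linalg.solve(I + a * D - c_prime * A, phi)

y = np.array([1.0, 0.0, 0.0, 0.0])   # only node 0 carries a label
x_rwr = rwr(y)
x_ssl = ssl(y)
b_h = fabp(0.1 * y)                  # small prior belief on node 0
print(x_rwr, x_ssl, b_h, sep="\n")
```

In all three cases the score of the labeled node spreads to its neighbors and fades with distance; the methods differ only in the matrix that is inverted.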

26 Results: Scalability. FaBP is linear in the number of edges. [Figure: runtime (min) vs. # of edges (Kronecker graphs)]

27 Results: Parallelism. FaBP is ~2x faster and wins/ties on accuracy. [Figure axes: runtime (min), % accuracy]

28 Conclusions for BP
‘NetProbe’, ‘Polonium’, and belief propagation: exploit network effects
FaBP: fast & accurate (and -> convergence conditions)

30 EigenSpokes. B. Aditya Prakash, Mukund Seshadri, Ashwin Sridharan, Sridhar Machiraju and Christos Faloutsos: EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs, PAKDD 2010, Hyderabad, India, 21-24 June 2010.

31 EigenSpokes
Eigenvectors of adjacency matrix
–equivalent to singular vectors (symmetric, undirected graph)
[Figure: N × N adjacency matrix (details)]

36 EigenSpokes
EE plot: scatter plot of scores of u1 vs u2
One would expect
–Many points @ origin
–A few scattered ~randomly
[Figure axes: u1 (1st principal component), u2 (2nd principal component)]

37 EigenSpokes
EE plot: scatter plot of scores of u1 vs u2, with the same expectation as on the previous slide
[Figure: u1 vs. u2, annotated 90°]

38 EigenSpokes - pervasiveness
Present in mobile social graph
–across time and space
Patent citation graph

39 EigenSpokes - explanation
Near-cliques, or near-bipartite-cores, loosely connected

43 EigenSpokes - explanation
Near-cliques, or near-bipartite-cores, loosely connected
So what?
–Extract nodes with high scores
–high connectivity
–Good “communities”
[Figure: spy plot of top 20 nodes]
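
The explanation above can be checked on a synthetic graph. The following sketch builds two cliques that are only loosely connected through a long path, computes the top two eigenvectors of the adjacency matrix, and extracts the highest-scoring nodes of each. The graph sizes and wiring are assumptions made up for this example; this is not the procedure or data from the EigenSpokes paper.

```python
import numpy as np

n = 100
A = np.zeros((n, n))

def add_clique(A, nodes):
    """Connect every pair of distinct nodes in `nodes`."""
    for i in nodes:
        for j in nodes:
            if i != j:
                A[i, j] = 1.0

add_clique(A, range(0, 10))      # community 1: nodes 0..9
add_clique(A, range(10, 18))     # community 2: nodes 10..17
for i in range(18, n - 1):       # sparse background: path 18..99
    A[i, i + 1] = A[i + 1, i] = 1.0
A[9, 18] = A[18, 9] = 1.0        # loose link: community 1 to background
A[17, 99] = A[99, 17] = 1.0      # loose link: community 2 to background

# A is symmetric, so eigh applies; eigenvalues come back in ascending order.
vals, vecs = np.linalg.eigh(A)
u1, u2 = vecs[:, -1], vecs[:, -2]   # eigenvectors of the two largest eigenvalues

# "Extract nodes with high scores" along each eigenvector
spoke1 = set(np.argsort(-np.abs(u1))[:10].tolist())
spoke2 = set(np.argsort(-np.abs(u2))[:8].tolist())
print(sorted(spoke1), sorted(spoke2))

# In the (u1, u2) scatter plot every node sits near one of the two axes,
# so the product of its two scores is close to zero.
print(np.max(np.abs(u1 * u2)))
```

Each eigenvector concentrates on one near-clique, so the points of the EE plot line up along the axes, and reading off the high-score nodes returns the planted communities.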

44 Bipartite Communities!
[Figure: magnified bipartite community]
–patents from same inventor(s)
–‘cut-and-paste’ bibliography!

45 (maybe, botnets?) Victim IPs? Botnet members? Exploring it with Dr. Eric Mao (III-Taiwan)

47 GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries. U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos. KDD’12

48 Background: Tensors
Tensors (= multi-dimensional arrays) are everywhere
–Hyperlinks & anchor text [Kolda+,05]
[Figure: URL 1 × URL 2 × anchor text tensor; example anchor texts Java, C++, C#; nonzero entries shown as 1]

49 Background: Tensors
Tensors (= multi-dimensional arrays) are everywhere
–Sensor stream (time, location, type)
–Predicates (subject, verb, object) in knowledge base, e.g. “Barack Obama is president of U.S.”, “Eric Clapton plays guitar”
NELL (Never Ending Language Learner) data: mode sizes labeled (26M) and (48M); nonzeros = 144M

50 Background: Tensors
Tensors (= multi-dimensional arrays) are everywhere
–Sensor stream (time, location, type)
–Predicates (subject, verb, object) in knowledge base
Anomaly detection in computer networks: IP-source × IP-destination × time-stamp

51 Problem Definition
How to decompose a billion-scale tensor?
–Corresponds to SVD in 2D case

52 Problem Definition
How to decompose a billion-scale tensor?
–Corresponds to SVD in 2D case
[Figure labels: ‘Politicians’, ‘Artists’]
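
To make "decompose a tensor" concrete at toy scale, here is a sketch of a rank-R CP (PARAFAC) decomposition fitted by alternating least squares in plain NumPy. It is a single-machine illustration on a tiny dense tensor and is not the GigaTensor algorithm, whose contribution is running such decompositions at billion scale. The (subject, verb, object) tensor with two planted groups, and the helper names `khatri_rao` and `cp_als`, are assumptions for this example.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product; result has shape (J*K, R)."""
    R = B.shape[1]
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, R)

def cp_als(X, rank, n_iters=100, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    X1 = X.reshape(I, J * K)                       # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)    # mode-2 unfolding
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)    # mode-3 unfolding
    for _ in range(n_iters):
        # Each update is the least-squares solution with the other two fixed.
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Hypothetical knowledge-base tensor: 6 subjects x 4 verbs x 5 objects,
# with two planted groups that use disjoint verbs and objects.
X = np.zeros((6, 4, 5))
X[0:3, 0:2, 0:2] = 1.0     # group 1: subjects 0..2
X[3:6, 2:4, 2:5] = 1.0     # group 2: subjects 3..5

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Top-3 subjects of each component, i.e. the discovered "concepts"
groups = {frozenset(np.argsort(-np.abs(A[:, r]))[:3].tolist()) for r in range(2)}
print(rel_err, groups)
```

Each component plays the role of one singular triplet in the 2D (SVD) case: its three factor columns pick out a group of subjects together with the verbs and objects they share.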

53 Problem Definition
–Q1: Dominant concepts/topics?
–Q2: Find synonyms to a given noun phrase?
–(and how to scale up: |data| > RAM)
NELL (Never Ending Language Learner) data: mode sizes labeled (26M) and (48M); nonzeros = 144M

54 Experiments
GigaTensor solves 100x larger problem
[Figure: tensor with modes I, J, K; number of nonzeros = I / 50; GigaTensor vs. Tensor Toolbox (out of memory); 100x]

55 A1: Concept Discovery. Concept discovery in knowledge base

56 A1: Concept Discovery

57 A2: Synonym Discovery

59 Rise and Fall Patterns of Information Diffusion: Model and Implications. Yasuko Matsubara (Kyoto University), Yasushi Sakurai (NTT), B. Aditya Prakash (CMU), Lei Li (UCB), Christos Faloutsos (CMU). KDD’12, Beijing, China

60 Rise and fall patterns in social media
Meme (# of mentions in blogs)
–short phrases sourced from U.S. politics in 2008
–“you can put lipstick on a pig”
–“yes we can”

61 Rise and fall patterns in social media
–four classes on YouTube [Crane et al. ’08]
–six classes on Meme [Yang et al. ’11]

62 Rise and fall patterns in social media
Can we find a unifying model, which includes these patterns?
–four classes on YouTube [Crane et al. ’08]
–six classes on Meme [Yang et al. ’11]

63 Rise and fall patterns in social media
Answer: YES! We can represent all patterns by a single model

64 Main idea - SpikeM
1. Un-informed bloggers (uninformed about rumor)
2. External shock at time n_b (e.g., breaking news)
3. Infection (word-of-mouth)
Infectiveness of a blog-post at age n:
–Strength of infection (quality of news)
–Decay function
[Figure panels: time n=0, time n=n_b, time n=n_b+1; label β]
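
The model equation itself is an image that did not make it into the transcript. The sketch below simulates a recurrence built from the three ingredients listed above (a pool of uninformed bloggers, one external shock at time n_b, and word-of-mouth infection whose strength decays with the age of a post). The specific form used here, infectiveness β · age^(-1.5), is an assumption based on the -1.5 slope mentioned elsewhere in the talk and on a reading of the published model; the parameter values, the clamp, and the function name `spikem` are illustrative choices, and periodicity is left out.

```python
import numpy as np

def spikem(N, beta, n_b, S_b, eps=0.0, T=100):
    """Simulate a SpikeM-style rise-and-fall curve (no periodicity).

    N    : total number of bloggers, all uninformed at time 0
    beta : strength of infection
    n_b  : time of the external shock
    S_b  : size of the external shock
    eps  : background noise term
    """
    dB = np.zeros(T)          # newly informed bloggers at each time tick
    U = np.zeros(T)           # bloggers still uninformed
    S = np.zeros(T)
    S[n_b] = S_b              # single external shock
    U[0] = N
    for n in range(T - 1):
        if n >= n_b:
            t = np.arange(n_b, n + 1)
            age = (n + 1 - t).astype(float)
            # every earlier post (and the shock) infects with decaying strength
            infect = np.sum((dB[t] + S[t]) * beta * age ** -1.5)
        else:
            infect = 0.0
        new = U[n] * infect + eps
        new = min(max(new, 0.0), U[n])   # safety clamp, added for this sketch
        dB[n + 1] = new
        U[n + 1] = U[n] - new
    return dB, U

# Hypothetical parameters
N, n_b = 1000.0, 5
dB, U = spikem(N=N, beta=0.001, n_b=n_b, S_b=10.0)
print(int(np.argmax(dB)), dB.max())
```

Nothing happens before the shock; afterwards activity grows while many bloggers are still uninformed, then falls off as the pool of uninformed bloggers empties and older posts lose infectiveness.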

66 -1.5 slope
J. G. Oliveira & A.-L. Barabási, Human Dynamics: The Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005).
[Figure: Prob(RT > x) (log) vs. response time (log), slope -1.5]

67 SpikeM - with periodicity
Full equation of SpikeM [equation figure]
Periodicity: bloggers change their activity over time (e.g., daily, weekly, yearly)
[Figure: activity vs. time n, with a peak at noon and a dip at 3am]

68 Details: Analysis – exponential rise and power-law fall
Rise-part: SI -> exponential; SpikeM -> exponential
[Figure panels: lin-log, log-log]

69 Details: Analysis – exponential rise and power-law fall
Fall-part: SI -> exponential; SpikeM -> power law
[Figure panels: lin-log, log-log]

70 Tail-part forecasts
SpikeM can capture tail part

71 “What-if” forecasting
e.g., given (1) first spike, (2) release date of two sequel movies, (3) access volume before the release date
[Figure: (1) first spike, (2) release date, (3) two weeks before release; upcoming spikes marked “?”]

72 “What-if” forecasting
SpikeM can forecast upcoming spikes
[Figure: (1) first spike, (2) release date, (3) two weeks before release]

73 Conclusions for spikes
Exp rise; PL decay
‘spikeM’ captures all patterns, with a few parms
–And can do extrapolation
–And forecasting

74 Roadmap
Graph problems:
–G1: Fraud detection – BP
–G2: Botnet detection – spectral
–G3: Beyond graphs: tensors and "NELL"
Influence propagation and spike modeling
Future research
Conclusions

75 Challenge #1: Time evolving networks / tensors
–Periodicities? Burstiness?
–What is ‘typical’ behavior of a node, over time?
–Heterogeneous graphs (= nodes w/ attributes)

76 Challenge #2: ‘Connectome’ – brain wiring
–Which neurons get activated by ‘bee’
–How wiring evolves
–Modeling epilepsy
N. Sidiropoulos, George Karypis, V. Papalexakis, Tom Mitchell

77 Thanks
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

78 Project info: PEGASUS
www.cs.cmu.edu/~pegasus
Results on large graphs: with Pegasus + hadoop + M45
Apache license
Code, papers, manual, video
Prof. U Kang, Prof. Polo Chau

79 Cast
Akoglu, Leman; Chau, Polo; Kang, U; McGlohon, Mary; Tong, Hanghang; Prakash, Aditya; Koutra, Danai; Beutel, Alex; Papalexakis, Vagelis

80 References
Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

81 References
Christos Faloutsos, Tamara G. Kolda, Jimeng Sun: Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

82 References
Yasuko Matsubara, Yasushi Sakurai, B. Aditya Prakash, Lei Li, Christos Faloutsos: Rise and Fall Patterns of Information Diffusion: Model and Implications. KDD’12, pp. 6-14, Beijing, China, August 2012

83 References
Jimeng Sun, Dacheng Tao, Christos Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

84 Overall Conclusions
G1: fraud detection
–BP: powerful method
–FaBP: faster; equally accurate; known convergence
G2: botnets -> EigenSpokes
G3: Subject-Verb-Object -> Tensors/GigaTensor
Spikes: ‘spikeM’ (exp rise; PL drop)

