CMU SCS Graph Mining and Influence Propagation Christos Faloutsos CMU.



CMU SCS WICOW 08C. Faloutsos 2 Thank you! Adam Jatowt

Outline
Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay fraud detection)
Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
Problem#1: What do real graphs look like?
Problem#2: How do they evolve?
Problem#3: How to generate realistic graphs
TOOLS
Problem#4: Who is the ‘master-mind’?
Problem#5: Track communities over time

Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]

Graphs - why should we care?
IR: bi-partite graphs (doc-terms)
web: hyper-text graph
... and more
[Figure: bipartite graph of documents D1 ... DN and terms T1 ... TM]

Graphs - why should we care?
network of companies & board-of-directors members
‘viral’ marketing
web-log (‘blog’) news propagation
computer network security: /IP traffic and anomaly detection
....

Problem #1 - network and graph mining
What does the Internet look like?
What does the web look like?
What is ‘normal’/‘abnormal’?
Which patterns/laws hold?

Graph mining
Are real graphs random?

Laws and patterns
Are real graphs random?
A: NO!!
–Diameter
–in- and out-degree distributions
–other (surprising) patterns

Solution#1
Power law in the degree distribution [SIGCOMM99]
[Figure: log(degree) versus log(rank) for internet domains; labeled points: att.com, ibm.com]
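
A power law in the degree distribution appears as a straight line when log(degree) is plotted against log(rank). The slope can be estimated with an ordinary least-squares fit. The sketch below is only an illustration: the function name `rank_exponent` and the synthetic degree sequence are assumptions, not part of the slides.

```python
import numpy as np

def rank_exponent(degrees):
    """Least-squares slope of log(degree) against log(rank)."""
    # Sort in decreasing order so that rank 1 is the highest-degree node.
    sorted_deg = np.sort(np.asarray(degrees, dtype=float))[::-1]
    # Zero-degree nodes cannot be placed on a log axis, so drop them.
    sorted_deg = sorted_deg[sorted_deg > 0]
    ranks = np.arange(1, len(sorted_deg) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(sorted_deg), 1)
    return slope

# Synthetic (assumed) degrees that follow degree = 1000 * rank^(-0.8) exactly.
synthetic = 1000.0 * np.arange(1, 201, dtype=float) ** -0.8
fitted = rank_exponent(synthetic)
```

On real data the points scatter around the line, so a fit of this kind is a first check and not a proof that a power law holds.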

Solution#1’: Eigen Exponent E
A2: power law in the eigenvalues of the adjacency matrix
[Mihail, Papadimitriou ’02]: slope is ½ of rank exponent
[Figure: eigenvalue versus rank of decreasing eigenvalue, May 2001; E = exponent = slope]

But: How about graphs from other domains?

The Peer-to-Peer Topology
Number of adjacent peers follows a power-law [Jovanovic+]
[Figure: count versus degree]

More power laws: citation counts (citeseer.nj.nec.com 6/2001)
[Figure: log(count) versus log(#citations); labeled point: Ullman]

More power laws: web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic, log(count) versus log(in-degree); labels: Zipf, users, sites, ``ebay’’]

epinions.com
who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count versus (out) degree; labeled point: trusts-2000-people user]

Problem#2: Time evolution
with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell – CMU)

Evolution of the Diameter
Prior work on Power Law graphs hints at slowly growing diameter:
–diameter ~ O(log N)
–diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
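
To make the measured quantity concrete, here is a minimal sketch that computes the exact diameter of a small connected, undirected graph by running a breadth-first search from every node. The slides do not say which diameter definition was used on the real datasets, so the exact diameter, the helper names, and the toy graphs are all assumptions made for illustration.

```python
from collections import deque

def undirected(edges):
    """Build an adjacency dict {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eccentricity(adj, source):
    """Largest hop distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Exact diameter; assumes the graph is connected."""
    return max(eccentricity(adj, s) for s in adj)

# Toy example: one extra edge turns a 5-node path into a ring.
path = undirected([(0, 1), (1, 2), (2, 3), (3, 4)])
ring = undirected([(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
```

Here `diameter(path)` is 4 and `diameter(ring)` is 2: adding edges without adding nodes can only keep the diameter or shrink it. All-pairs BFS costs O(N·E), so large graphs need sampling or approximation.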

Diameter – ArXiv citation graph
Citations among physics papers
1992–2003
One graph per year
[Figure: diameter versus time [years]]

Diameter – “Autonomous Systems”
Graph of Internet
One graph per day
1997–2000
[Figure: diameter versus number of nodes]

Diameter – “Affiliation Network”
Graph of collaborations in physics – authors linked to papers
10 years of data
[Figure: diameter versus time [years]]

Diameter – “Patents”
Patent citation network
25 years of data
[Figure: diameter versus time [years]]

Temporal Evolution of the Graphs
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is your guess for E(t+1) =? 2 * E(t)
A: over-doubled!
–But obeying the ``Densification Power Law’’

Densification – Physics Citations
Citations among physics papers
2003:
–29,555 papers, 352,807 citations
[Figure: E(t) versus N(t); fitted slope 1.69; reference slopes: 1 for a tree, 2 for a clique]
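
The densification power law says that E(t) grows as N(t) raised to an exponent between 1 (tree-like) and 2 (clique-like). A minimal sketch of estimating that exponent from a sequence of snapshots follows. The snapshot sizes are synthetic and generated to match the slope 1.69 quoted above. They are not the ArXiv measurements, and the function name is an assumption.

```python
import numpy as np

def densification_exponent(nodes, edges):
    """Slope a of the least-squares line log E = a * log N + b."""
    a, _b = np.polyfit(np.log(nodes), np.log(edges), 1)
    return a

# Synthetic snapshots (assumed): node counts double, edges follow N^1.69.
nodes = np.array([1000.0, 2000.0, 4000.0, 8000.0])
edges = 3.0 * nodes ** 1.69

a = densification_exponent(nodes, edges)
# Factor by which the edge count grows when the node count doubles.
growth_when_nodes_double = 2.0 ** a
```

With a = 1.69 the edge count grows by a factor of about 3.2 each time the node count doubles, which is the "over-doubled" answer from the earlier slide.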

Densification – Patent Citations
Citations among patents granted
1999
–2.9 million nodes
–16.5 million edges
Each year is a datapoint
[Figure: E(t) versus N(t); slope 1.66]

Densification – Autonomous Systems
Graph of Internet
2000
–6,000 nodes
–26,000 edges
One graph per day
[Figure: E(t) versus N(t); slope 1.18]

Densification – Affiliation Network
Authors linked to their publications
2002
–60,000 nodes (20,000 authors, 38,000 papers)
–133,000 edges
[Figure: E(t) versus N(t); slope 1.15]

Problem#3: Generation
Given a growing graph with count of nodes N1, N2, …
Generate a realistic sequence of graphs that will obey all the patterns

Problem Definition
Given a growing graph with count of nodes N1, N2, …
Generate a realistic sequence of graphs that will obey all the patterns
–Static Patterns: Power Law Degree Distribution; Power Law eigenvalue and eigenvector distribution; Small Diameter
–Dynamic Patterns: Growth Power Law; Shrinking/Stabilizing Diameters

Problem Definition
Given a growing graph with count of nodes N1, N2, …
Generate a realistic sequence of graphs that will obey all the patterns
Idea: Self-similarity
–Leads to power laws
–Communities within communities
–…

Kronecker Product – a Graph
[Figure: adjacency matrix; intermediate stage; adjacency matrix]

Kronecker Product – a Graph
Continuing to multiply with G1, we obtain G4 and so on …
[Figure: G4 adjacency matrix]
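
The repeated multiplication can be sketched with `numpy.kron`. The 3-node initiator matrix `g1` below is a hypothetical choice for illustration (a chain with self-loops). The slides do not give the entries of G1.

```python
import numpy as np

def kronecker_power(g1, k):
    """k-th Kronecker power of the initiator adjacency matrix g1."""
    result = g1
    for _ in range(k - 1):
        result = np.kron(result, g1)
    return result

# Hypothetical initiator: 3-node chain with self-loops (7 non-zero entries).
g1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

g2 = kronecker_power(g1, 2)   # 9 x 9
g4 = kronecker_power(g1, 4)   # 81 x 81
```

With an initiator of N1 nodes and E1 non-zero entries, the k-th power has N1^k nodes and E1^k entries, so the edge count grows as a power of the node count (exponent log E1 / log N1, about 1.77 for this `g1`). That is how the construction produces densification.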

Properties:
We can PROVE that
–Degree distribution is multinomial ~ power law
–Diameter: constant
–Eigenvalue distribution: multinomial
–First eigenvector: multinomial
See [Leskovec+, PKDD’05] for proofs

Problem Definition
Given a growing graph with nodes N1, N2, …
Generate a realistic sequence of graphs that will obey all the patterns
–Static Patterns: Power Law Degree Distribution; Power Law eigenvalue and eigenvector distribution; Small Diameter
–Dynamic Patterns: Growth Power Law; Shrinking/Stabilizing Diameters
First and only generator for which we can prove all these properties

Stochastic Kronecker Graphs
Create N1 × N1 probability matrix P1
Compute the k-th Kronecker power Pk
For each entry p_uv of Pk include an edge (u,v) with probability p_uv
[Figure: P1, Kronecker multiplication, Pk, flip biased coins, instance matrix G]
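
The three steps above can be sketched directly. This version materializes the full matrix Pk, which is only practical for small k. The 2 × 2 probability matrix, the seed, and the function name are assumptions for illustration.

```python
import numpy as np

def stochastic_kronecker(p1, k, rng):
    """Return (Pk, G): the k-th Kronecker power of p1 and one sampled graph."""
    pk = p1
    for _ in range(k - 1):
        pk = np.kron(pk, p1)
    # Flip one biased coin per entry: edge (u, v) appears with probability pk[u, v].
    g = (rng.random(pk.shape) < pk).astype(int)
    return pk, g

# Hypothetical 2 x 2 initiator probabilities.
p1 = np.array([[0.9, 0.5],
               [0.5, 0.1]])

rng = np.random.default_rng(0)
pk, g = stochastic_kronecker(p1, 3, rng)   # 8 x 8 probabilities and instance
```

The expected number of edges is the sum of the entries of Pk, which equals the sum of the entries of P1 raised to the k-th power (here 2.0^3 = 8).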

Experiments
How well can we match real graphs?
–Arxiv: physics citations: 30,000 papers, 350,000 citations; 10 years of data
–U.S. Patent citation network: 4 million patents, 16 million citations; 37 years of data
–Autonomous systems – graph of internet: Single snapshot from January ,400 nodes, 26,000 edges
We show both static and temporal patterns

(Q: how to fit the parameters?)
A: Stochastic version of Kronecker graphs + Max likelihood + Metropolis sampling [Leskovec+, ICML’07]

Experiments on real AS graph
[Figure panels: Degree distribution; Hop plot; Network value; Adjacency matrix eigenvalues]

Conclusions
Kronecker graphs have:
–All the static properties: Heavy tailed degree distributions; Small diameter; Multinomial eigenvalues and eigenvectors
–All the temporal properties: Densification Power Law; Shrinking/Stabilizing Diameters
–We can formally prove these results

Problem#4: MasterMind – ‘CePS’
w/ Hanghang Tong, KDD 2006
htong cs.cmu.edu

Center-Piece Subgraph (CePS)
Given Q query nodes
Find Center-piece ( )
App.
–Social Networks
–Law Enforcement, …
Idea:
–Proximity -> random walk with restarts
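
Random walk with restarts scores every node by how often a walker visits it when it starts at a query node, follows edges at random, and jumps back to the query node with a fixed probability at each step. The sketch below computes these scores by power iteration. The restart probability 0.15, the iteration count, and the toy path graph are assumptions, and the code covers the proximity step only, not the full CePS extraction.

```python
import numpy as np

def random_walk_with_restart(adj, query, restart=0.15, iters=100):
    """Proximity of every node to `query`; assumes no isolated nodes."""
    adj = np.asarray(adj, dtype=float)
    # Column-normalize so that column j holds the transition probabilities out of node j.
    w = adj / adj.sum(axis=0)
    e = np.zeros(adj.shape[0])
    e[query] = 1.0
    r = e.copy()
    for _ in range(iters):
        # Follow an edge with prob. (1 - restart), jump back to the query otherwise.
        r = (1.0 - restart) * (w @ r) + restart * e
    return r

# Toy graph (assumed): undirected path 0 - 1 - 2 - 3, query node 0.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
scores = random_walk_with_restart(path, query=0)
```

The scores sum to 1 and fall off with distance from the query along the path (node 1 scores above node 2, which scores above node 3). The degree of a node also matters, so the query node itself need not have the top score.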

Case Study: AND query
[Figure labels: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan]

2_SoftAnd query
[Figure labels: ML/Statistics; databases]

Conclusions
Q1: How to measure the importance?
A1: RWR + K_SoftAnd
Q2: How to do it efficiently?
A2: Graph Partition (Fast CePS)
–~90% quality
–150x speedup (ICDM’06, b.p. award)

Tensors for time evolving graphs
[Jimeng Sun+ KDD’06] [ “, SDM’07] [ CF, Kolda, Sun, SDM’07 tutorial]

Social network analysis
Static: find community structures
Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps
[Figure: Authors × Keywords matrix, DB, 1990]

Application 1: Multiway latent semantic indexing (LSI)
Projection matrices specify the clusters
Core tensors give cluster activation level
[Figure: authors × keyword tensors for DB and DM with projection matrices U_authors and U_keyword; labels: Query, Pattern, Michael Stonebraker, Philip Yu]

Bibliographic data (DBLP)
Papers from VLDB and KDD conferences
Construct 2nd order tensors with yearly windows with …
Each tensor: 4584 × …
… timestamps (years)
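
A minimal sketch of the construction step: one author × keyword count matrix (a 2nd-order tensor) per yearly window. The `(year, author, keyword)` records, the author names, and the function name are hypothetical and are not taken from DBLP.

```python
import numpy as np

# Hypothetical (year, author, keyword) records.
records = [
    (1995, "author_a", "queri"),
    (1995, "author_a", "parallel"),
    (1995, "author_b", "queri"),
    (2004, "author_c", "streams"),
    (2004, "author_c", "pattern"),
    (2004, "author_a", "queri"),
]

def yearly_tensors(records):
    """Return sorted years, authors, keywords and one count matrix per year."""
    years = sorted({year for year, _, _ in records})
    authors = sorted({author for _, author, _ in records})
    keywords = sorted({keyword for _, _, keyword in records})
    author_index = {a: i for i, a in enumerate(authors)}
    keyword_index = {k: i for i, k in enumerate(keywords)}
    tensors = {year: np.zeros((len(authors), len(keywords))) for year in years}
    for year, author, keyword in records:
        tensors[year][author_index[author], keyword_index[keyword]] += 1
    return years, authors, keywords, tensors

years, authors, keywords, tensors = yearly_tensors(records)
# Stacking the yearly windows gives a years x authors x keywords array.
stream = np.stack([tensors[year] for year in years])
```

All windows share the same author and keyword index, so entries are comparable from year to year. That shared index is what lets a tensor method track how clusters drift.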

Multiway LSI
Authors | Keywords | Year
michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995
surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel | distribut, systems, view, storage, servic, process, cache | 2004
jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004
Two groups are correctly identified: Databases (DB) and Data mining (DM)
People and concepts are drifting over time

Network forensics
Directional network flows
A large ISP with 100 POPs, each POP 10Gbps link capacity [Hotnets2004]
–450 GB/hour with compression
Task: Identify abnormal traffic pattern and find out the cause
(with Prof. Hui Zhang and Dr. Yinglian Xie)
[Figure: source versus destination, for normal traffic and for abnormal traffic]

MDL mining on time-evolving graph (Enron emails)
GraphScope [w. Jimeng Sun, Spiros Papadimitriou and Philip Yu, KDD’07]

Conclusions
Tensor-based methods (WTA/DTA/STA): spot patterns and anomalies on time evolving graphs, and on streams (monitoring)

Outline
Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay fraud detection, blogs)
Conclusions

Virus propagation
How do viruses/rumors propagate?
Blog influence?
Will a flu-like virus linger, or will it become extinct soon?

The model: SIS
‘Flu’ like: Susceptible-Infected-Susceptible
Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3, each Infected or Healthy; attack prob. β, recovery prob. δ]

Epidemic threshold τ of a graph: the value of τ such that, if strength s = β/δ < τ, an epidemic cannot happen
Thus, given a graph, compute its epidemic threshold

Epidemic threshold τ
What should τ depend on?
avg. degree? and/or highest degree? and/or variance of degree? and/or third moment of degree? and/or diameter?

Epidemic threshold
[Theorem] We have no epidemic if β/δ < τ = 1/λ_{1,A}
where β is the attack prob., δ the recovery prob., τ the epidemic threshold, and λ_{1,A} the largest eigenvalue of the adj. matrix A
Proof: [Wang+03]
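
The theorem reduces the threshold computation to one eigenvalue problem. A minimal sketch, assuming an undirected graph (symmetric adjacency matrix) so that `numpy.linalg.eigvalsh` applies. The function names and the 4-node clique are illustrative choices.

```python
import numpy as np

def epidemic_threshold(adj):
    """tau = 1 / (largest eigenvalue of the symmetric adjacency matrix)."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(adj, dtype=float))
    # eigvalsh returns eigenvalues in ascending order, so the last is the largest.
    return 1.0 / eigenvalues[-1]

def below_threshold(adj, beta, delta):
    """True when the virus strength beta/delta is under the threshold."""
    return beta / delta < epidemic_threshold(adj)

# Illustrative graph: clique on 4 nodes, whose largest eigenvalue is 3.
k4 = np.ones((4, 4)) - np.eye(4)
tau = epidemic_threshold(k4)
```

For the 4-clique, τ = 1/3. An attack prob. of 0.1 with recovery prob. 0.5 gives strength 0.2, below the threshold. An attack prob. of 0.3 with the same recovery prob. gives strength 0.6, above it.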

Experiments (Oregon)
[Figure curves: β/δ > τ (above threshold); β/δ = τ (at the threshold); β/δ < τ (below threshold)]

E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU

E-bay Fraud detection
lines: positive feedbacks
would you buy from him/her?
or him/her?

E-bay Fraud detection - NetProbe

Blog analysis
with Mary McGlohon (CMU), Jure Leskovec (CMU), Natalie Glance (now at Google), Mat Hurst (now at MSR) [SDM’07]

Cascades on the Blogosphere
[Figure: blogs B1, B2, B3, B4 and posts a, b, c, d, e; panels: Blogosphere (blogs + posts); Blog network (links among blogs); Post network (links among posts)]
Q1: popularity-decay of a post?
Q2: degree distributions?

Q1: popularity over time
Post popularity drops off – exponentially?
POWER LAW!
Exponent? -1.6 (close to -1.5: Barabasi’s stack model)
[Figure: # in links (log) versus days after post (log)]

Q2: degree distribution
44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component.
in-degree slope: -1.7
out-degree: -3
‘rich get richer’
[Figure: count versus blog in-degree]

Outline
Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (Virus propagation, e-bay fraud detection)
–And research directions
Conclusions

Next steps:
edges with
–categorical attributes and/or
–time-stamps and/or
–weights
nodes with attributes [G-Ray, Tong et al]
scalability (cloud computing)

E.g.: self-* CMU
>200 nodes
40 racks of computing equipment
774 kW of power
target: 1 PetaByte
goal: self-correcting, self-securing, self-monitoring, self-...

Cloud computing, D.I.S.C. and hadoop
‘Data Intensive Scientific Computing’ [R. Bryant, CMU]
–‘big data’
– 128.pdf
Yahoo: ~5 PB of data [Fayyad’07]
‘M45’: 4K proc’s, 3 TB RAM, 1.5 PB disk
Hadoop: open-source clone of map-reduce
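
For readers unfamiliar with the map-reduce style, the sketch below counts in-degrees from an edge list in three phases (map, shuffle, reduce), all in plain single-process Python. It does not use the Hadoop API. The function names and the edge list are assumptions chosen only to show the shape of the computation.

```python
from collections import defaultdict

def map_phase(edges):
    """Emit one (destination, 1) pair per edge."""
    for _src, dst in edges:
        yield dst, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values of each key to get the in-degree of each node."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical directed edge list (source, destination).
edges = [("a", "b"), ("c", "b"), ("b", "d"), ("a", "d"), ("c", "d")]
in_degree = reduce_phase(shuffle(map_phase(edges)))
```

The map and reduce steps work on one record or one key at a time, so a framework can spread them over many machines. That property is why this style suits graphs too large for a single node.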

OVERALL CONCLUSIONS
Graphs pose a wealth of fascinating problems
self-similarity and power laws work, when textbook methods fail!
New patterns (shrinking diameter!)
New generator: Kronecker
SVD / tensors / RWR: valuable tools
Scalability / cloud computing -> PetaBytes

References
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
Hanghang Tong, Brian Gallagher, Christos Faloutsos, and Tina Eliassi-Rad. Fast Best-Effort Pattern Matching in Large Attributed Graphs. KDD 2007, San Jose, CA.

References
Jure Leskovec, Jon Kleinberg and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL. ("Best Research Paper" award).
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal, 2005.

References
Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.

References
Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr.
Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos. GraphScope: Parameter-free Mining of Large Time-evolving Graphs. ACM SIGKDD Conference, San Jose, CA, August 2007.

Contact info: www.cs.cmu.edu/~christos (w/ papers, datasets, code, etc)