CMU SCS Data Mining in Streams and Graphs Christos Faloutsos CMU.


CMU SCS, CALD Day '06, C. Faloutsos

Outline
Problem definition / Motivation
Graphs and power laws
Streams and forecasting
Conclusions

Motivation
Data mining: ~ find patterns (rules, outliers)
What do real graphs look like?
What do (numerical) streams look like?

Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

Graphs - why should we care?
(Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01])

Graphs - why should we care?
network of supply-chain companies
network of companies & board-of-directors members
‘viral’ marketing
web-log (‘blog’) news propagation
computer network security: IP traffic and anomaly detection
....

Problem #1 - network and graph mining
What does the Internet look like?
What does the web look like?
What constitutes a ‘normal’ social network?
What is ‘normal’/‘abnormal’?
Which patterns/laws hold?

Graph mining
Are real graphs random?

Laws and patterns
NO!!
Diameter
in- and out-degree distributions
other (surprising) patterns

Laws – degree distributions
Q: the average degree is ~3 - what is the most probable degree?
(Plot: count versus degree, with degree 3 marked)

Solution #1: the plot is linear in log-log scale [FFF’99]
freq = degree^(-2.15)
O = exponent = slope
(Plot: count versus outdegree, Nov’)

Solution
Power law in the degree distribution [SIGCOMM99]
(Plot: log(degree) versus log(rank) for internet domains; labeled points: att.com, ibm.com)
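
The statement that the degree plot is linear in log-log scale can be checked with an ordinary least-squares line fit on the logarithms. The following is a minimal sketch, not the procedure used in [FFF’99]; the function name `fit_loglog_slope`, the constant 1000 and the synthetic counts are assumptions made for illustration only.

```python
import math


def fit_loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in pts)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic (made-up) counts that follow freq = 1000 * degree^(-2.15) exactly.
degrees = list(range(1, 51))
counts = [1000.0 * d ** -2.15 for d in degrees]

slope, intercept = fit_loglog_slope(degrees, counts)
print("estimated exponent:", round(slope, 2))
```

On real, noisy degree counts a plain line fit like this can be biased by the sparse tail, so the estimate should be read as a rough check rather than a definitive exponent.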

But:
Q1: How about graphs from other domains?
Q2: How about temporal evolution?

The Peer-to-Peer Topology
Frequency versus degree
Number of adjacent peers follows a power law [Jovanovic+]

More power laws: citation counts (citeseer.nj.nec.com 6/2001)
(Plot: log(count) versus log(#citations); labeled point: Ullman)

Swedish sex-web
Nodes: people (Females; Males)
Links: sexual relationships
Liljeros et al., Nature
Swedes; 18-74; 59% response rate.
Source: Albert Laszlo Barabasi, Publication%20Categories/04%20Talks/2005-norway-3hours.ppt

More power laws: web hit counts [w/ A. Montgomery]
Web Site Traffic
(Plot: log(count) versus log(in-degree); Zipf; users; sites; labeled point: ``ebay’’)

epinions.com
who-trusts-whom [Richardson + Domingos, KDD 2001]
(Plot: count versus (out) degree; labeled point: trusts-2000-people user)

More power laws
Also hold for other web graphs [Barabasi+], [Tomkins+], with additional ‘rules’ (bipartite cores follow power laws)

But:
Q1: How about graphs from other domains?
Q2: How about temporal evolution?

Time Evolution: rank R
The rank exponent has not changed!
(Plot: log(degree) versus log(rank), domain level; labeled points: att.com, ibm.com)

Any other pattern, over time?

Time evolution
with Jure Leskovec (CMU) and Jon Kleinberg (Cornell)

Evolution of the Diameter
Prior work on power-law graphs hints at a slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
– As the network grows, the distances between nodes slowly decrease

Diameter – ArXiv citation graph
Citations among physics papers, 1992 – 2003
One graph per year
(Plot: diameter versus time [years])

Diameter – “Autonomous Systems”
Graph of Internet
One graph per day, 1997 – 2000
(Plot: diameter versus number of nodes)

Diameter – “Affiliation Network”
Graph of collaborations in physics – authors linked to papers
10 years of data
(Plot: diameter versus time [years])

Diameter – “Patents”
Patent citation network
25 years of data
(Plot: diameter versus time [years])
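
To make "diameter per snapshot" concrete, here is a minimal sketch that computes the exact diameter of a small undirected graph by running breadth-first search from every node. It is an illustration only: the toy edge lists are invented, the helper names are hypothetical, and measurements on graphs of the sizes discussed in these slides would likely need sampling or an approximate definition of diameter instead of the exact all-pairs computation shown here.

```python
from collections import deque


def build_adj(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest-path length over all reachable pairs."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


# Two made-up snapshots: the later one has more nodes AND more shortcut edges.
early_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
later_edges = early_edges + [(0, 4), (1, 3), (4, 5)]

early = build_adj(early_edges)
later = build_adj(later_edges)
print(len(early), "nodes, diameter", diameter(early))
print(len(later), "nodes, diameter", diameter(later))
```

In this toy case the graph gains a node while its diameter drops, which is the qualitative behaviour the diameter slides report for real data.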

Temporal Evolution of the Graphs
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: what is your guess for E(t+1)? Is it 2 * E(t)?
A: over-doubled!
– But obeying the ``Densification Power Law’’

Densification – Physics Citations
Citations among physics papers
2003:
– 29,555 papers, 352,807 citations
(Plot: E(t) versus N(t); fitted slope 1.69; reference slopes: 1 for a tree, 2 for a clique)

Densification – Patent Citations
Citations among patents granted
1999:
– 2.9 million nodes
– 16.5 million edges
Each year is a datapoint
(Plot: E(t) versus N(t); slope 1.66)

Densification – Autonomous Systems
Graph of Internet
2000:
– 6,000 nodes
– 26,000 edges
One graph per day
(Plot: E(t) versus N(t); slope 1.18)

Densification – Affiliation Network
Authors linked to their publications
2002:
– 60,000 nodes: 20,000 authors, 38,000 papers
– 133,000 edges
(Plot: E(t) versus N(t); slope 1.15)
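
Reading the densification power law as E(t) ∝ N(t)^a, with a the slope reported for each dataset, the "over-doubled" answer can be turned into numbers: when N doubles, E grows by a factor of 2^a. The sketch below only evaluates that arithmetic; the function name `edge_growth_factor` is hypothetical, and the slopes are the ones quoted on the slides.

```python
def edge_growth_factor(exponent, node_growth=2.0):
    """Factor by which E grows when N grows by node_growth, assuming E ~ N**exponent."""
    return node_growth ** exponent


# Slopes quoted on the densification slides, plus the tree and clique reference slopes.
slopes = [
    ("tree (reference)", 1.0),
    ("affiliation network", 1.15),
    ("autonomous systems", 1.18),
    ("patent citations", 1.66),
    ("physics citations", 1.69),
    ("clique (reference)", 2.0),
]

for name, a in slopes:
    print(f"{name}: doubling the nodes multiplies the edges by {edge_growth_factor(a):.2f}")
```

Every real dataset lands strictly between the tree (edges exactly double) and the clique (edges quadruple).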

Outline
Problem definition / Motivation
Graphs and power laws
– laws
– algorithms: graph partitioning using MDL
Streams and forecasting
Conclusions

Graph partitioning
Documents x terms
Customers x products
Users x web-sites
Q: HOW MANY PIECES?
A: MDL / compression
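
One way to see how compression can answer "how many pieces?" is to compare the number of bits needed to describe a 0/1 matrix under different groupings of its rows and columns. The sketch below uses a deliberately simplified cost, which is an assumption of this example and not the exact cost function of the cross-association method: each block pays log2(size + 1) bits to state how many ones it holds, plus size times the binary entropy of its density for the cells themselves. The cost of encoding the group assignments is omitted, and the 6x6 matrix is made up.

```python
import math


def entropy_bits(p):
    """Binary entropy in bits; zero for an all-0 or all-1 block."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def description_length(matrix, row_groups, col_groups):
    """Simplified bit cost of a 0/1 matrix encoded block by block."""
    total = 0.0
    for rows in row_groups:
        for cols in col_groups:
            size = len(rows) * len(cols)
            ones = sum(matrix[r][c] for r in rows for c in cols)
            total += math.log2(size + 1)               # model cost: number of ones in the block
            total += size * entropy_bits(ones / size)  # data cost: cells at the block's density
    return total


# Made-up 6x6 matrix with two clean blocks on the diagonal.
matrix = [[1 if (r < 3) == (c < 3) else 0 for c in range(6)] for r in range(6)]

candidates = {
    "1x1": ([[0, 1, 2, 3, 4, 5]], [[0, 1, 2, 3, 4, 5]]),
    "2x2": ([[0, 1, 2], [3, 4, 5]], [[0, 1, 2], [3, 4, 5]]),
    "3x3": ([[0], [1, 2], [3, 4, 5]], [[0], [1, 2], [3, 4, 5]]),
}
costs = {name: description_length(matrix, rg, cg) for name, (rg, cg) in candidates.items()}
best = min(costs, key=costs.get)
for name, bits in costs.items():
    print(name, round(bits, 1), "bits")
print("cheapest grouping:", best)
```

Too few groups leave mixed blocks that are expensive to encode, while too many groups pay extra model cost for no gain, so the cheapest description sits at the natural number of pieces.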

Cross-associations
(Figures: 1x2 and 2x2 groupings)

Cross-associations
(Figures: 2x3, 3x3 and 3x4 groupings)

Cross-associations
(Figure labels: outlier edge; missing edge)

(Figure labels: bio nuclear physics math education)

Conclusions
Real graphs obey some surprising patterns
– which can help us spot anomalies / outliers
MDL helps partition a graph into ‘natural’ groups

Outline
Problem definition / Motivation
Graphs and power laws
Streams and forecasting
Conclusions

Why care about streams?
Sensor devices
– Temperature, weather measurements
– Road traffic data
– Geological observations
– Patient physiological data
Embedded devices
– Network routers
– Intelligent (active) disks

Co-evolving time sequences
Joint work with Jimeng Sun (CMU), Spiros Papadimitriou (CMU/IBM), Dr. Yasushi Sakurai (NTT)

Outline
Problem definition / Motivation
Graphs and power laws
Streams and forecasting
– single stream mining & forecasting
– multiple-stream mining and summarization
Conclusions

Results - Synthetic data
Triangle pulse
Mix (sine + square)
AR captures wrong trend (or none)
Seasonal AR estimation fails
(Panels: AWSOM; AR; Seasonal AR)

Results – real data
Automobile traffic
Daily periodicity
Rush-hour peaks
Bursty “noise” at smaller time scales

Results - Real data
Sunspot intensity
Slightly time-varying “period”
AR captures wrong trend
Seasonal ARIMA
– wrong trend; needs human

Outline
Problem definition / Motivation
Graphs and power laws
Streams and forecasting
– single stream mining & forecasting
– multiple-stream mining and summarization
Conclusions

Motivation
Water distribution network: chlorine concentrations
Hundreds of measurements, possibly correlated.
(Plot: sensors near leak and sensors away from leak, over Phase 1, Phase 2, Phase 3; annotations: normal operation, major leak)

Motivation
Find: “hidden (latent) variables”, to summarize the key trends
actual measurements (n streams): chlorine concentrations
k hidden variable(s)
(Plots: through Phase 1, k = 1; through Phase 2, k = 2; through Phase 3, k = 1)

Stream mining
Solution: SPIRIT [VLDB’05]
– incremental, on-line PCA
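
To give a feel for what "incremental, on-line PCA" means, here is a simplified sketch that tracks a single principal direction and its hidden variable while reading one measurement vector at a time. It is an illustration under stated assumptions, not the SPIRIT implementation: it keeps exactly one hidden variable (no automatic choice of k), the function name `track_hidden_variable`, the forgetting factor 0.96 and the three synthetic sensor streams are all assumptions of this example.

```python
import math
import random


def track_hidden_variable(stream, forgetting=0.96):
    """Track one direction w and the hidden variable y_t = w . x_t, one tick at a time."""
    n = len(stream[0])
    w = [1.0] + [0.0] * (n - 1)   # arbitrary initial direction
    energy = 1e-3                 # running energy of y, kept positive
    hidden = []
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))                    # project onto current direction
        energy = forgetting * energy + y * y                        # exponentially forget old ticks
        error = [xi - y * wi for xi, wi in zip(x, w)]               # reconstruction error
        w = [wi + (y / energy) * ei for wi, ei in zip(w, error)]    # nudge w to reduce the error
        hidden.append(y)
    return w, hidden


# Made-up data: three sensors driven by one shared trend plus small noise.
rng = random.Random(0)
loadings = [1 / 3, 2 / 3, 2 / 3]  # unit-length "true" direction
stream = []
for t in range(500):
    trend = 2.0 + math.sin(0.1 * t)
    stream.append([trend * c + rng.gauss(0.0, 0.01) for c in loadings])

w, hidden = track_hidden_variable(stream)
norm = math.sqrt(sum(wi * wi for wi in w))
cosine = sum(wi * ci for wi, ci in zip(w, loadings)) / norm
print("alignment with the true direction:", round(abs(cosine), 3))
```

Each tick costs time proportional to the number of streams and no past measurements are stored, which is the property that makes this style of update suitable for streams.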

SPIRIT is also used to monitor a data center (‘self-*’ storage project)
See the demo of SPIRIT (needs JVM plugin)

Conclusions
Graphs & streams pose fascinating problems
MDL, PCA/SVD (wavelets): powerful tools
Self-similarity, fractals and power laws work when textbook methods fail!

Other projects
Video data mining [Pan + Yang]
Virus propagation (Wang)
Anomaly detection in network traffic (Wang, Olston, ++)

Books
Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, 1991 (Probably the BEST book on fractals!)

Contact info
Wean Hall 7107
Ph#: x8.1457