SCS CMU Joint Work by Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos Speaker: Hanghang Tong Oct. 26-30, 2008, Napa, CA CIKM 2008.

Presentation transcript:

SCS CMU Fast Mining of Complex Time-Stamped Events. Joint Work by Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos. Speaker: Hanghang Tong. Oct. 26-30, 2008, Napa, CA, CIKM 2008

A Motivating Example: Inputs
Time | Event (e.g., Session) | Entity
Oct. 26 | Link Analysis | Tom, Bob
Oct. 26 | Clustering | Bob, Alan
Oct. 27 | Classification | Bob, Alan
Oct. 27 | Anomaly Detection | Alan, Beck
Oct. 28 | Party | Beck, Dan
Oct. 29 | Web Search | Dan, Jack
Oct. 29 | Advertising | Jack, Peter
Oct. 30 | Enterprise Search | Jack, Peter
Oct. 31 | Q & A | Peter, Smith

A Motivating Example: Outputs
[Figure: the time stamps plotted against the 1st and 2nd eigenvectors.]
– Time cluster, rep. entities: "Tom", "Bob", "Alan"
– Abnormal time, rep. entities: "Beck", "Dan"
– Time cluster, rep. entities: "Jack", "Peter", "Smith"

Problem Definitions (how to understand time in such a complex context)
Given: data sets collected at different time stamps
Find:
– Q1: Time clusters
– Q2: Abnormal time stamps
– Q3: Interpretations
– Q4: The right time granularity

Roadmap: Motivation; T3: Single Resolution Analysis; MT3: Multi Resolution Analysis; Experimental Evaluations; Conclusion

T3: Single Resolution Analysis
Given: the data sets collected at different time stamps
Find:
– (1) Clusters for time stamps
– (2) Abnormal time stamps
– (3) Interpretations

How to represent the data sets?
Time | Event (e.g., Session) | Entity
Oct. 26 | Link Analysis | Tom, Bob
Oct. 26 | Clustering | Bob, Alan
Oct. 27 | Classification | Bob, Alan
Oct. 27 | Anomaly Detection | Alan, Beck
Oct. 28 | Party | Beck, Dan
Oct. 29 | Web Search | Dan, Jack
Oct. 29 | Advertising | Jack, Peter
Oct. 30 | Enterprise Search | Jack, Peter
Oct. 31 | Q & A | Peter, Smith

A: Graph Representation!
[Figure: a graph with three kinds of nodes.]
– Time nodes: Oct. 26, 2008 through Oct. 31, 2008
– Event nodes: Link Analysis, Clustering, Classification, Anomaly Detection, Party, Web Search, Advertising, Enterprise Search, Q & A
– Entity nodes: Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith

A: Graph Representation! (with attributes)
[Figure: the same graph of time, event, and entity nodes, extended with attribute nodes: Prof., CEO, Stu.]
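The graph construction can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes unweighted, undirected edges between each time stamp and its events and between each event and its entities, and the names `records` and `build_graph` are made up for this example. Attribute nodes are left out.

```python
import numpy as np

# Rows of the motivating-example table: (time stamp, event, entities).
records = [
    ("Oct. 26", "Link Analysis", ["Tom", "Bob"]),
    ("Oct. 26", "Clustering", ["Bob", "Alan"]),
    ("Oct. 27", "Classification", ["Bob", "Alan"]),
    ("Oct. 27", "Anomaly Detection", ["Alan", "Beck"]),
    ("Oct. 28", "Party", ["Beck", "Dan"]),
    ("Oct. 29", "Web Search", ["Dan", "Jack"]),
    ("Oct. 29", "Advertising", ["Jack", "Peter"]),
    ("Oct. 30", "Enterprise Search", ["Jack", "Peter"]),
    ("Oct. 31", "Q & A", ["Peter", "Smith"]),
]

def build_graph(records):
    """Return (node_names, adjacency) for a time-event-entity graph.

    Assumption: every edge has weight 1 and is undirected.
    """
    times, events, people = [], [], []
    for t, ev, ents in records:
        if t not in times:
            times.append(t)
        if ev not in events:
            events.append(ev)
        for p in ents:
            if p not in people:
                people.append(p)
    names = times + events + people
    index = {name: i for i, name in enumerate(names)}
    A = np.zeros((len(names), len(names)))
    for t, ev, ents in records:
        A[index[t], index[ev]] = A[index[ev], index[t]] = 1.0
        for p in ents:
            A[index[ev], index[p]] = A[index[p], index[ev]] = 1.0
    return names, A

names, A = build_graph(records)
```

Node order is time stamps first, then events, then entities, so the time and people blocks of any matrix derived from `A` can be sliced by index range.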

Qs: Given the graph,
– How to cluster time nodes?
– How to spot abnormal time nodes?
– How to interpret them?

Q1: How to cluster time nodes?
Step 1: Compute the Time-To-Time (TT) proximity matrix.
[Figure: TT matrix with rows and columns Oct. 26 through Oct. 31.]

Q1: How to cluster time nodes?
Step 2: Cluster the time nodes using the TT matrix.
– Spectral clustering algorithm (among many others)
[Figure: TT matrix with rows and columns Oct. 26 through Oct. 31.]
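As a rough illustration of Step 2, the sketch below runs a small normalized spectral clustering on a toy TT matrix. The matrix values are hypothetical (the slide's actual values are not in the transcript), the function name `spectral_clusters` is made up, and the k-means step uses a simple deterministic initialization chosen for this example. A TT matrix that is not symmetric would need to be symmetrized first, e.g. `(TT + TT.T) / 2`.

```python
import numpy as np

def spectral_clusters(S, k, n_iter=100):
    """Cluster the rows of a symmetric affinity matrix S into k groups."""
    d = S.sum(axis=1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(d, d))  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                       # eigenvalues ascending
    X = vecs[:, :k]                                   # k smallest eigenvectors
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    # Deterministic k-means: farthest-point initialization, then Lloyd updates.
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical TT matrix for Oct. 26 .. Oct. 31 (illustrative values only).
TT = np.array([
    [1.00, 0.80, 0.02, 0.01, 0.01, 0.01],
    [0.80, 1.00, 0.02, 0.01, 0.01, 0.01],
    [0.02, 0.02, 1.00, 0.02, 0.02, 0.02],
    [0.01, 0.01, 0.02, 1.00, 0.80, 0.80],
    [0.01, 0.01, 0.02, 0.80, 1.00, 0.80],
    [0.01, 0.01, 0.02, 0.80, 0.80, 1.00],
])
labels = spectral_clusters(TT, 3)
```

The label numbers themselves are arbitrary; only the grouping of time stamps matters.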

Q2: How to find abnormal time nodes?
Abnormal time = a time cluster that contains a single time stamp (a singleton).
[Figure: TT matrix with rows and columns Oct. 26 through Oct. 31.]
Oct. 28 is abnormal!
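The singleton rule is short enough to write out directly. In this sketch the cluster labels are hypothetical, chosen only to be consistent with the slide's conclusion, and `abnormal_time_stamps` is an illustrative name.

```python
from collections import Counter

def abnormal_time_stamps(time_stamps, labels):
    """Return the time stamps whose cluster has exactly one member."""
    sizes = Counter(labels)
    return [t for t, lab in zip(time_stamps, labels) if sizes[lab] == 1]

time_stamps = ["Oct. 26", "Oct. 27", "Oct. 28", "Oct. 29", "Oct. 30", "Oct. 31"]
labels = [0, 0, 1, 2, 2, 2]  # hypothetical clustering output
abnormal = abnormal_time_stamps(time_stamps, labels)
```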

Q3: How to interpret?
Step 1: Compute the Time-to-People (TP) proximity matrix.
[Figure: TP matrix relating the time stamps Oct. 26 through Oct. 31 to the people Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith.]
E.g., we want to use people to interpret a time cluster/anomaly.
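One way to obtain both the TT and TP matrices from a single graph is to compute random-walk-with-restart scores from every node at once and slice out the relevant blocks. The sketch below assumes the common closed form Q = (1 - c)(I - cW)^(-1) with a column-normalized adjacency matrix W; the value c = 0.9, the toy graph, and the name `proximity_blocks` are all assumptions made for illustration.

```python
import numpy as np

def proximity_blocks(A, time_idx, people_idx, c=0.9):
    """Return (TT, TP) proximity blocks from RWR scores on adjacency A.

    Assumes A is symmetric, nonnegative, and has no isolated nodes.
    """
    n = A.shape[0]
    W = A / A.sum(axis=0, keepdims=True)            # each column sums to 1
    Q = (1 - c) * np.linalg.inv(np.eye(n) - c * W)  # column j: scores from node j
    TT = Q[np.ix_(time_idx, time_idx)].T            # row i: time scores, start at time i
    TP = Q[np.ix_(people_idx, time_idx)].T          # row i: people scores, start at time i
    return TT, TP

# Toy graph: times t0, t1 (0, 1); events e0, e1 (2, 3); people p0, p1, p2 (4, 5, 6).
edges = [(0, 2), (1, 3), (2, 4), (2, 5), (3, 5), (3, 6)]
A = np.zeros((7, 7))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

TT, TP = proximity_blocks(A, time_idx=[0, 1], people_idx=[4, 5, 6])
```

In this toy graph, person p0 only attends t0's event, so `TP[0, 0]` comes out larger than `TP[0, 2]`. The direct inverse is fine for a toy example; large graphs would call for an iterative or approximate solver.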

Q3: How to interpret?
Step 2: Compute the Time Cluster-to-People (TCP) matrix.
[Figure: the time stamps Oct. 26 through Oct. 31, grouped into clusters, against the people Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith.]
E.g., we want to use people to interpret a time cluster/anomaly.

Q3: How to interpret?
Step 3: Find "unique" entity nodes.
[Figure: TCP matrix for the people Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith.]
E.g., "Bob is close to the green cluster on average, but far away from both the red & blue clusters."
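Steps 2 and 3 can be sketched as follows. Two things here are assumptions rather than the authors' definitions: the TCP row for a cluster is taken as the mean of that cluster's TP rows, and an entity's "uniqueness" for a cluster is scored as its TCP value minus its largest TCP value in any other cluster. The TP matrix is a binary stand-in (1 if the person appears at that time stamp in the input table), not real proximity scores, and the cluster labels are hypothetical.

```python
import numpy as np

people = ["Tom", "Bob", "Alan", "Beck", "Dan", "Jack", "Peter", "Smith"]
# Stand-in TP matrix: rows are Oct. 26 .. Oct. 31, columns follow `people`.
TP = np.array([
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
], dtype=float)
labels = np.array([0, 0, 1, 2, 2, 2])  # hypothetical time clusters

def cluster_to_people(TP, labels):
    """Average the TP rows of each time cluster (assumed TCP definition)."""
    clusters = sorted(set(labels.tolist()))
    return np.array([TP[labels == c].mean(axis=0) for c in clusters])

def representative_entities(TCP, people, top=3):
    """Per cluster, entities close to it and far from every other cluster."""
    reps = []
    for c in range(TCP.shape[0]):
        others = np.delete(TCP, c, axis=0).max(axis=0)
        score = TCP[c] - others                   # assumed uniqueness score
        order = [i for i in np.argsort(-score) if score[i] > 0][:top]
        reps.append([people[i] for i in order])
    return reps

TCP = cluster_to_people(TP, labels)
reps = representative_entities(TCP, people)
```

With these stand-in values the three groups come out as {Tom, Bob, Alan}, {Beck, Dan}, and {Jack, Peter, Smith}, which matches the representative entities listed on the outputs slide.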

Summary So Far…
Given the data sets collected at different time stamps, we
– Construct a graph representation
– Get two proximity matrices
– Find time clusters/abnormal time stamps
– Provide the interpretations
Q: How to get the proximity matrices?

How to get the proximity matrices (i.e., the TT/TP matrices)?
Proximity is a.k.a. relevance, closeness, "similarity"…
[Figure: example nodes, e.g., "Oct. 28, 2008" or "John Smith".]

What is a "good" proximity?
– Multiple connections/paths
– Quality of connection
– Direct & indirect connections
– Length, degree, weight…

Random walk with restart

Random walk with restart
[Figure: an example graph and the ranking vector of its nodes for Node 4.]
More red, more relevant: nearby nodes get higher scores.

Computing RWR
[Equation figure, with labels: ranking vector (n x 1), adjacency matrix (n x n), restart probability, starting vector (n x 1).]
Many techniques exist to solve this, e.g., the iterative method.
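The slide's equation did not survive in the transcript. A commonly used formulation of random walk with restart, assumed here, is r = c W r + (1 - c) e, where W is the column-normalized adjacency matrix, e is the starting vector, and 1 - c is the restart probability. The sketch below solves it with the iterative method; the value c = 0.9, the tolerance, and the function name `rwr` are illustrative choices.

```python
import numpy as np

def rwr(A, seed, c=0.9, tol=1e-10, max_iter=1000):
    """Ranking vector of a random walk with restart from node `seed`.

    Assumes A is a symmetric, nonnegative adjacency matrix with no
    isolated nodes; 1 - c is the restart probability.
    """
    n = A.shape[0]
    W = A / A.sum(axis=0, keepdims=True)  # each column sums to 1
    e = np.zeros(n)
    e[seed] = 1.0                         # starting vector
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Path graph 0 - 1 - 2 - 3, walk restarting at node 0.
A_path = np.zeros((4, 4))
for i in range(3):
    A_path[i, i + 1] = A_path[i + 1, i] = 1.0
r = rwr(A_path, seed=0)
```

The scores sum to 1, and beyond the seed's immediate neighbor they decay with distance from the seed. Each iteration costs one matrix-vector product, which is why the iterative method is a natural default for sparse graphs.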

Roadmap: Motivation; T3: Single Resolution Analysis; MT3: Multi Resolution Analysis; Experimental Evaluations; Conclusion

MT3: Multiple Resolution Analysis
Given:
– (1) The data sets collected at different time stamps
– (2) Different time resolutions
Find:
– (1) Clusters for time stamps
– (2) Abnormal time stamps
– (3) Interpretations
at each of the given resolutions, efficiently.

MT3: An example
Given [figure], we want to…
– (At the finest resolution) mine & interpret "Oct 26", "Oct 27", "Oct 28", "Oct 29", "Oct 30", "Oct 31"
– (At the coarser resolution) mine & interpret "Oct 26-27", "Oct 28-29", "Oct 30-31"

Outputs
[Figure: results at the finest resolution and at the coarser resolution.]

MT3: How to (Naïve Solution)
[Figure: the same pipeline run separately at each resolution: TT and TP matrices, then time clusters & anomalies, then annotations/interpretations. The matrices at the second resolution are marked with ~.]
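The naive solution repeats the whole pipeline at every resolution, starting with a rebuilt graph. A minimal sketch of that rebuilding step is shown below, assuming a coarser time node simply inherits the edges of the consecutive time stamps it covers. This illustrates only the naive baseline, not MT3's faster method, and the name `coarsen` is made up for the example.

```python
import numpy as np

def coarsen(time_event, group_size):
    """Merge consecutive time stamps into coarser time nodes.

    time_event: (n_times x n_events) incidence matrix.
    Returns an (n_groups x n_events) matrix whose rows sum the merged rows.
    """
    n_times = time_event.shape[0]
    n_groups = -(-n_times // group_size)  # ceiling division
    G = np.zeros((n_groups, n_times))
    G[np.arange(n_times) // group_size, np.arange(n_times)] = 1.0
    return G @ time_event

# Time-event incidence for the motivating example (6 days, 9 events).
event_day = [0, 0, 1, 1, 2, 3, 3, 4, 5]  # day index of each event
time_event = np.zeros((6, 9))
time_event[event_day, np.arange(9)] = 1.0

coarse = coarsen(time_event, group_size=2)  # Oct 26-27, Oct 28-29, Oct 30-31
```

After this step the naive approach recomputes the proximity matrices, clusters, and interpretations on the coarser graph from scratch, which is the cost MT3 aims to avoid.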

Challenges: Given the mining results at the finest resolution, how can we speed up the analysis at the coarser resolutions?

MT3: Observation
There is a lot of overlap between the two graphs!
[Figure: the graph for the finest resolution next to the graph for the coarser resolution.]

MT3: Solution
[Figure: diagram relating the TT and TP matrices to their counterparts at the other resolution (marked with ~).]

Roadmap: Motivation; T3: Single Resolution Analysis; MT3: Multi Resolution Analysis; Experimental Evaluations; Conclusion

Data Sets
CIKM: from the CIKM proceedings
– Time: publication year (15)
– Event: paper published (952)
– Entities: author (1895) & session (279)
– Attribute: keyword (158)
DeviceScan: from MIT Reality Mining
– Time: the day the scanning happened (1/1/2004-5/5/2005, 294)
– Event: Bluetooth device scanning a person (114,046)
– Entities: device (103) & person (97)
– Attribute: NA

T3 on "CIKM" Data Set
First group:
– Rep. authors: James P. Callan, W. Bruce Croft, James Allan, Philip S. Yu, George Karypis, Charles Clarke
– Rep. keywords: Web, Cluster, Classification, XML, Language, Stream
Second group:
– Rep. authors: Elke Rundensteiner, Daniel Miranker, Andreas Henrich, Il-Yeol Song, Scott B Huffman, Robert J. Hall
– Rep. keywords: Knowledge, System, Unstructured, Rule, Object-oriented, Deductive

MT3 on "DeviceScan" Data Set
– Aggregate by month: Apr. is an anomaly
– Aggregate by day: [figure labels: work day; semester break & holiday]

Evaluation on Speed of MT3 (DeviceScan data set)
[Figure: log time (sec.) vs. aggregation length, for MT3 and the naïve solution.]
120x speedup

Conclusion
T3: Single Resolution Analysis
– Graph representation
– Using proximity to find time clusters/anomalies
– Provide interpretations
MT3: Multiple Resolution Analysis
– Redundancy among different resolutions
– Up to 2 orders of magnitude speedup (same quality)

Thank you!