CMU SCS U Kang (CMU) 1KDD 2012 GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries U Kang Christos Faloutsos School of Computer.

Presentation transcript:

Slide 1: GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries
U Kang, Christos Faloutsos, Evangelos Papalexakis, Abhay Harpale
School of Computer Science, Carnegie Mellon University (CMU SCS)
KDD 2012

Slide 2: Outline
- Problem Definition
- Algorithm
- Discoveries
- Conclusions

Slide 3: Background: Tensor
Tensors (= multi-dimensional arrays) are everywhere
- Hyperlinks and anchor texts in Web graphs
[Figure: 3-mode tensor with axes URL 1, URL 2, and Anchor Text (e.g. Java, C++, C#)]

Slide 4: Background: Tensor
Tensors (= multi-dimensional arrays) are everywhere
- Sensor stream (time, location, type)
- Predicates (subject, verb, object) in a knowledge base, e.g. "Barack Obama is the president of the U.S." and "Eric Clapton plays guitar"
[Figure: NELL (Never Ending Language Learner) data tensor with modes labeled 26M and 48M; nonzeros = 144M]
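As an illustration of how such predicates can be held as a sparse 3-mode tensor, the sketch below maps each (subject, verb, object) triple to integer indices and stores only the nonzero counts. The function name, the index order (subject, object, verb), and the repeated triple are assumptions made for this example; only the two sentences come from the slide.

```python
from collections import Counter

# Toy input: the two example sentences from the slide, one of them repeated
# (the repetition is a hypothetical addition to show counting).
triples = [
    ("Barack Obama", "is the president of", "U.S."),
    ("Eric Clapton", "plays", "guitar"),
    ("Eric Clapton", "plays", "guitar"),
]

def build_sparse_tensor(triples):
    """Assign integer ids per mode and count occurrences of each triple.

    Returns (subject ids, object ids, verb ids, nonzeros), where nonzeros is a
    Counter keyed by (i, j, k) = (subject, object, verb) and holds only nonzeros.
    """
    subj_ids, obj_ids, verb_ids = {}, {}, {}
    nonzeros = Counter()
    for s, v, o in triples:
        i = subj_ids.setdefault(s, len(subj_ids))
        j = obj_ids.setdefault(o, len(obj_ids))
        k = verb_ids.setdefault(v, len(verb_ids))
        nonzeros[(i, j, k)] += 1
    return subj_ids, obj_ids, verb_ids, nonzeros

subj_ids, obj_ids, verb_ids, nonzeros = build_sparse_tensor(triples)
```

Storing only the nonzero coordinates is what makes a tensor with 26M- and 48M-sized modes but 144M nonzeros representable at all; a dense array of that shape would be far too large.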

Slide 5: Problem Definition
Q1: How to decompose a billion-scale tensor?
- Corresponds to SVD in the 2D case

Slide 6: Problem Definition
Q2: What are the important concepts and synonyms in a KB tensor?
- Q2.1: What are the dominant concepts in the knowledge base tensor?
- Q2.2: What are the synonyms of a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data tensor with modes labeled 26M and 48M; nonzeros = 144M]

Slide 7: Outline
- Problem Definition
- Algorithm
- Discoveries
- Conclusions

Slide 8: Algorithm: Problem Definition
Q1: How to decompose a billion-scale tensor?
- Corresponds to SVD in the 2D case

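Slide 9: Challenge (Details)
Alternating Least Squares (ALS) algorithm
[Equation: ALS update rules, not preserved in this transcript. Operators named in the legend: pseudo-inverse, Hadamard product, Khatri-Rao product]
[Figure: tensor with mode sizes I = 26M, J = 26M, K = 48M]
How to design a fast MapReduce algorithm for ALS?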
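Since the slide's equations did not survive, the sketch below shows the textbook ALS update for a 3-mode PARAFAC/CP decomposition, which uses exactly the three operators the legend names. It is a small dense NumPy illustration, not the MapReduce algorithm from the talk; the function names and the unfolding convention are assumptions chosen for this example.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J, _ = B.shape
    return np.einsum("kr,jr->kjr", C, B).reshape(K * J, R)

def unfold(X, mode):
    """Mode-n unfolding; the remaining indices vary first-index-fastest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1, order="F")

def als_update(X_unfolded, C, B):
    """One least-squares update: X_(n) (C kr B) pinv(C^T C * B^T B).

    'kr' is the Khatri-Rao product and '*' the Hadamard (element-wise) product.
    """
    gram = (C.T @ C) * (B.T @ B)  # R x R, cheap to pseudo-invert
    return X_unfolded @ khatri_rao(C, B) @ np.linalg.pinv(gram)

def parafac_als(X, rank, n_iter=50, seed=0):
    """Alternate the three factor updates for a fixed number of sweeps."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    for _ in range(n_iter):
        A = als_update(unfold(X, 0), C, B)
        B = als_update(unfold(X, 1), C, A)
        C = als_update(unfold(X, 2), B, A)
    return A, B, C
```

In this dense form, `khatri_rao(C, B)` has J*K rows, which is the part that becomes infeasible at the mode sizes shown on the slide.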

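Slide 10: Main Idea 1. Ordering of Computation (Details)
[Table: FLOPS on NELL data for the candidate orderings of the computation, with the authors' choice marked "Our choice"; values not preserved in this transcript]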
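To see why the order of multiplication matters, the rough count below compares two ways of parenthesizing the standard ALS update X_(1) (C kr B) pinv(C^T C * B^T B). This is a back-of-the-envelope sketch, not the slide's lost table: the rank R = 10 is an assumption, each multiply-add is counted as 2 FLOPs, and the cost of forming the Khatri-Rao product itself is ignored. The mode sizes and nonzero count are the NELL figures from the slides.

```python
def flops_sparse_product_first(nnz, I, R):
    """[X_(1) (C kr B)] pinv(...): sparse product first, then an (I x R) by (R x R) product."""
    return 2 * nnz * R + 2 * I * R * R

def flops_dense_product_first(nnz, J, K, R):
    """X_(1) [(C kr B) pinv(...)]: a (J*K x R) by (R x R) product first, then the sparse product."""
    return 2 * J * K * R * R + 2 * nnz * R

# NELL sizes from the slides; R is an assumed rank.
I, J, K, NNZ, R = 26_000_000, 26_000_000, 48_000_000, 144_000_000, 10

sparse_first = flops_sparse_product_first(NNZ, I, R)
dense_first = flops_dense_product_first(NNZ, J, K, R)
ratio = dense_first / sparse_first
```

Under these assumptions the sparse-first ordering is cheaper by many orders of magnitude, because it never multiplies a matrix with J*K rows.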

Slide 11: Main Idea 2. Avoiding Intermediate Data Explosion (Details)
Size of intermediate data (NELL)
- Naïve: 100 PB
[Figure: tensor with mode sizes I = 26M, J = 26M, K = 48M]
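The 100 PB figure is consistent with explicitly materializing a dense (J*K) x R Khatri-Rao product. The arithmetic below checks this under two assumptions that are not stated on the slide: rank R = 10 and 8-byte floating-point entries.

```python
J, K = 26_000_000, 48_000_000   # mode sizes from the slide
R = 10                          # assumed rank
BYTES_PER_ENTRY = 8             # assumed double precision

# A dense Khatri-Rao product of a K x R and a J x R matrix has J*K rows and R columns.
naive_bytes = J * K * R * BYTES_PER_ENTRY
naive_petabytes = naive_bytes / 1e15
```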

Slide 12: Main Idea 2. Avoiding Intermediate Data Explosion (Details)
Size of intermediate data (NELL)
- Before (naïve): 100 PB
- After (proposed): 1.5 GB
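One common way to avoid the blow-up on a single machine is to compute X_(1) (C kr B) directly from the nonzeros, so that the (J*K) x R matrix is never formed. The sketch below does this with NumPy; it illustrates the general idea only and is not the MapReduce formulation presented in the talk. The function name and the coordinate-list input format are assumptions.

```python
import numpy as np

def mttkrp_mode1(coords, values, B, C, I):
    """Compute X_(1) (C kr B) using only the nonzeros of X.

    coords: (nnz, 3) integer array of (i, j, k) positions.
    values: (nnz,) array of the corresponding nonzero values.
    Entry [i, r] of the result is sum over nonzeros of x_ijk * B[j, r] * C[k, r].
    """
    # One R-vector per nonzero: intermediate size is nnz x R, not (J*K) x R.
    contrib = values[:, None] * B[coords[:, 1]] * C[coords[:, 2]]
    M = np.zeros((I, B.shape[1]))
    np.add.at(M, coords[:, 0], contrib)  # scatter-add rows that share the same i
    return M
```

The intermediate array here scales with the number of nonzeros rather than with the product of two mode sizes, which is the property the slide's before/after comparison highlights.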

Slide 13: Experiments
GigaTensor solves a 100x larger problem
Setting: number of nonzeros = I / 50
[Figure: GigaTensor vs. Tensor Toolbox as the mode sizes (I, J, K) grow; Tensor Toolbox runs out of memory, while GigaTensor handles tensors 100x larger]

Slide 14: Outline
- Problem Definition
- Algorithm
- Discoveries
- Conclusions

Slide 15: Discoveries: Problem Definition
Q2: What are the important concepts and synonyms in a KB tensor?
- Q2.1: What are the dominant concepts in the knowledge base tensor?
- Q2.2: What are the synonyms of a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data tensor with modes labeled 26M and 48M; nonzeros = 144M]

Slide 16: A2.1: Concept Discovery
Concept discovery in a knowledge base
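The transcript does not preserve how concepts are read off the decomposition. One plausible reading, offered here as an assumption, is that each rank-1 component is treated as a concept and its members are the entries with the largest values in the corresponding factor column. The helper name and the toy labels below are hypothetical.

```python
import numpy as np

def top_k_per_component(factor, labels, k=3):
    """For each column r of a factor matrix, list the k labels with the largest values."""
    order = np.argsort(-factor, axis=0)[:k]  # k x R row indices, largest value first
    return [[labels[i] for i in order[:, r]] for r in range(factor.shape[1])]

# Hypothetical 3 x 2 factor matrix: 3 noun phrases, 2 components.
factor = np.array([[0.9, 0.0],
                   [0.1, 0.8],
                   [0.5, 0.7]])
labels = ["phrase_a", "phrase_b", "phrase_c"]
concepts = top_k_per_component(factor, labels, k=2)
```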

Slide 17: A2.1: Concept Discovery
[Results not preserved in this transcript]

Slide 18: A2.2: Synonym Discovery
Synonym discovery in a knowledge base
[Figure: factor vectors a_1, a_2, …, a_R, with the row for the given noun phrase and the rows for discovered synonym 1 and discovered synonym 2 marked]
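The figure suggests that synonyms are noun phrases whose rows in the factor matrix resemble the row of the given noun phrase. A minimal sketch of that idea follows, assuming cosine similarity as the row-similarity measure; the transcript does not state which measure is used, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def nearest_rows(A, query, k=2):
    """Rank rows of A by cosine similarity to row `query`, excluding the query itself."""
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0          # avoid division by zero for all-zero rows
    unit = A / norms[:, None]
    sims = unit @ unit[query]
    sims[query] = -np.inf            # never return the query as its own synonym
    return np.argsort(-sims)[:k]

# Hypothetical 4 x 2 factor matrix: 4 noun phrases, R = 2 components.
A = np.array([[1.0, 0.0],
              [2.0, 0.1],
              [0.0, 1.0],
              [1.0, 1.0]])
candidates = nearest_rows(A, query=0, k=2)
```

Cosine similarity ignores row magnitude, so a frequent and a rare noun phrase can still match if they load on the same components in the same proportions.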

Slide 19: A2.2: Synonym Discovery
[Results not preserved in this transcript]

Slide 20: Outline
- Problem Definition
- Algorithm
- Discoveries
- Conclusions

Slide 21: Conclusion
GigaTensor: a scalable tensor decomposition algorithm for tensors with billion-length modes
- Algorithm: avoids intermediate data explosion
- Discoveries: concept discovery and contextual synonym detection on a KB tensor

Slide 22: Thank you!