Published by Tiffany Todd. Modified over 9 years ago.
1
GigaTensor: Scaling Tensor Analysis Up By 100 Times - Algorithms and Discoveries
U Kang, Christos Faloutsos, Evangelos Papalexakis, Abhay Harpale
School of Computer Science (SCS), Carnegie Mellon University (CMU)
KDD 2012
2
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
3
Background: Tensor
Tensors (= multi-dimensional arrays) are everywhere.
Example: hyperlinks and anchor texts in Web graphs.
[Figure: a 3-mode tensor with axes URL 1, URL 2, and Anchor Text (e.g. Java, C++, C#); each nonzero entry is 1.]
4
Background: Tensor
Tensors (= multi-dimensional arrays) are everywhere.
Sensor stream (time, location, type).
Predicates (subject, verb, object) in a knowledge base, e.g. “Barack Obama is the president of U.S.” and “Eric Clapton plays guitar”.
[Figure: NELL (Never Ending Language Learner) data tensor, with mode sizes 26M and 48M labeled; nonzeros = 144M.]
5
Problem Definition
Q1: How to decompose a billion-scale tensor?
This corresponds to SVD in the 2D case.
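To make the SVD analogy concrete, here is a sketch in LaTeX, assuming the decomposition meant is the PARAFAC (CP) model; the slide itself does not name the model, and the symbols $R$, $\lambda_r$, $\mathbf{a}_r$, $\mathbf{b}_r$, $\mathbf{c}_r$ are notation chosen for illustration. A rank-$R$ SVD writes a matrix as a sum of $R$ outer products of two vectors; PARAFAC writes a 3-mode tensor as a sum of $R$ outer products of three vectors.

```latex
% 2D case: truncated SVD of a matrix X (I x J)
\mathbf{X} \approx \sum_{r=1}^{R} \sigma_r \, \mathbf{u}_r \mathbf{v}_r^{\top}

% 3D case (assumed PARAFAC/CP model) of a tensor X (I x J x K)
\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r
```

Here $\circ$ denotes the vector outer product. Unlike the singular vectors of the SVD, the factor vectors in the 3D case are not required to be orthogonal.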
6
Problem Definition
Q2: What are the important concepts and synonyms in a KB tensor?
Q2.1: What are the dominant concepts in the knowledge base tensor?
Q2.2: What are the synonyms of a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data tensor, with mode sizes 26M and 48M labeled; nonzeros = 144M.]
7
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
8
Algorithm: Problem Definition
Q1: How to decompose a billion-scale tensor?
This corresponds to SVD in the 2D case.
9
Challenge
Alternating Least Squares (ALS) algorithm.
[Equation: ALS update rule; legend: pseudo-inverse, Hadamard product, Khatri-Rao product; mode sizes I=26M, J=26M, K=48M.]
How to design a fast MapReduce algorithm for ALS?
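The update equation on this slide did not survive extraction. As a hedged reconstruction, the standard ALS update for the first factor matrix of a 3-mode PARAFAC decomposition uses exactly the three operators named in the legend. The symbols $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$ (factor matrices), $\mathbf{X}_{(1)}$ (mode-1 unfolding), and $R$ (rank) are standard notation assumed here, not recovered from the slide.

```latex
% Standard CP-ALS update for A, with B and C held fixed.
%   X_{(1)} : mode-1 unfolding of the tensor, size I x JK
%   \odot   : Khatri-Rao (column-wise Kronecker) product, C \odot B is JK x R
%   \ast    : Hadamard (element-wise) product, result is R x R
%   \dagger : Moore-Penrose pseudo-inverse
\mathbf{A} \leftarrow \mathbf{X}_{(1)} \, (\mathbf{C} \odot \mathbf{B}) \,
  \left( \mathbf{C}^{\top}\mathbf{C} \ast \mathbf{B}^{\top}\mathbf{B} \right)^{\dagger}
```

The updates for $\mathbf{B}$ and $\mathbf{C}$ have the same form with the roles of the modes rotated. With J=26M and K=48M, the matrix $\mathbf{C} \odot \mathbf{B}$ has $JK \approx 1.25 \times 10^{15}$ rows, which is why a direct implementation is impractical.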
10
Main Idea 1: Ordering of Computation
[Figure: FLOPS on the NELL data for the candidate orderings of the computation, with the authors' choice marked.]
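Why the ordering matters can be illustrated with a rough flop count. The sketch below assumes the ALS update is the product of three matrices, X(1) (sparse, I x JK, m nonzeros), the Khatri-Rao product C ⊙ B (JK x R), and an R x R pseudo-inverse G, and compares multiplying left-to-right against right-to-left. The rank R=10 is an assumption (it is not stated on the slide), the function names are hypothetical, and the counts ignore the cost of forming C ⊙ B and G.

```python
def flops_left_first(m, I, R):
    """[X(1) (C ⊙ B)] G: sparse-times-dense product first, then a small dense product."""
    sparse_times_dense = 2 * m * R      # each nonzero touches R columns (multiply + add)
    dense_times_small = 2 * I * R * R   # (I x R) times (R x R)
    return sparse_times_dense + dense_times_small


def flops_right_first(m, J, K, R):
    """X(1) [(C ⊙ B) G]: the huge (JK x R) times (R x R) product comes first."""
    huge_times_small = 2 * J * K * R * R
    sparse_times_dense = 2 * m * R
    return huge_times_small + sparse_times_dense


# NELL sizes from the slides; R = 10 is an assumed rank.
I, J, K = 26_000_000, 26_000_000, 48_000_000
m = 144_000_000
R = 10

left = flops_left_first(m, I, R)
right = flops_right_first(m, J, K, R)
print(f"left-to-right: {left:.2e} flops")
print(f"right-to-left: {right:.2e} flops")
```

Under these assumptions the left-to-right ordering is cheaper by roughly seven orders of magnitude, because it never multiplies the JK-row matrix by anything dense.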
11
Main Idea 2: Avoiding Intermediate Data Explosion
Size of intermediate data (NELL), naïve: 100 PB.
[Figure: tensor with mode sizes I=26M, J=26M, K=48M.]
12
Main Idea 2: Avoiding Intermediate Data Explosion
Size of intermediate data (NELL), naïve (before): 100 PB.
Size of intermediate data (NELL), proposed (after): 1.5 GB.
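The idea of avoiding the intermediate explosion can be sketched on a single machine. The code below is not the MapReduce algorithm from the talk; it is a minimal pure-Python illustration, with hypothetical function names, of computing X(1)(C ⊙ B) by looping over the nonzeros only, so that the JK x R Khatri-Rao product is never materialized. It also estimates the size of that product for the NELL dimensions, assuming rank R=10 and 8-byte values (both assumptions, not stated on the slide).

```python
def khatri_rao(C, B):
    """Explicit Khatri-Rao product C ⊙ B: row (k*J + j) holds C[k][r] * B[j][r]."""
    K, J, R = len(C), len(B), len(B[0])
    return [[C[k][r] * B[j][r] for r in range(R)]
            for k in range(K) for j in range(J)]


def mttkrp_naive(nonzeros, I, B, C):
    """X(1) (C ⊙ B) with the full Khatri-Rao product built first (J*K*R values)."""
    J, R = len(B), len(B[0])
    kr = khatri_rao(C, B)
    M = [[0.0] * R for _ in range(I)]
    for (i, j, k), v in nonzeros.items():
        row = kr[k * J + j]  # column index of X[i, j, k] in the mode-1 unfolding
        for r in range(R):
            M[i][r] += v * row[r]
    return M


def mttkrp_sparse(nonzeros, I, B, C):
    """Same result, but only the rows of B and C hit by a nonzero are ever read."""
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for (i, j, k), v in nonzeros.items():
        for r in range(R):
            M[i][r] += v * B[j][r] * C[k][r]
    return M


def khatri_rao_bytes(J, K, R, bytes_per_value=8):
    """Storage needed to materialize C ⊙ B as a dense JK x R matrix."""
    return J * K * R * bytes_per_value


# Toy tensor stored as {(i, j, k): value}; sizes I=2, J=3, K=2, rank R=2.
X = {(0, 0, 0): 2, (0, 2, 1): 1, (1, 1, 1): 3}
B = [[1, 2], [3, 4], [5, 6]]
C = [[1, 0], [2, 1]]
print(mttkrp_sparse(X, 2, B, C))

# NELL dimensions from the slide, assumed R = 10 and 8-byte values.
nell_bytes = khatri_rao_bytes(26_000_000, 48_000_000, 10)
print(f"{nell_bytes / 1e15:.1f} PB")
```

Under the stated assumptions the explicit Khatri-Rao product comes to roughly 100 PB, in line with the naïve figure on the slide, while the nonzero-driven loop needs work proportional to the 144M nonzeros times R.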
13
Experiments
GigaTensor solves a 100x larger problem.
[Figure: GigaTensor vs. Tensor Toolbox on tensors with mode sizes I, J, K and number of nonzeros = I / 50; Tensor Toolbox runs out of memory, and GigaTensor handles problems 100x larger.]
14
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
15
Discoveries: Problem Definition
Q2: What are the important concepts and synonyms in a KB tensor?
Q2.1: What are the dominant concepts in the knowledge base tensor?
Q2.2: What are the synonyms of a given noun phrase?
[Figure: NELL (Never Ending Language Learner) data tensor, with mode sizes 26M and 48M labeled; nonzeros = 144M.]
16
A2.1: Concept Discovery
Concept discovery in a knowledge base.
17
A2.1: Concept Discovery
18
A2.2: Synonym Discovery
Synonym discovery in a knowledge base.
[Figure: factor vectors a_1, a_2, …, a_R, with entries marked for the (given) noun phrase, (discovered) synonym 1, and (discovered) synonym 2.]
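One plausible reading of this slide is that each noun phrase corresponds to a row across the factor vectors a_1 … a_R, and that synonyms are phrases whose rows point in a similar direction. The sketch below assumes cosine similarity as the measure; the slide does not show the scoring function, and the phrase names, factor values, and function names are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_synonyms(A, phrases, query, n=2):
    """Rank all other phrases by cosine similarity of their factor rows to the query's row."""
    q = A[phrases.index(query)]
    scored = [(cosine(q, row), p) for p, row in zip(phrases, A) if p != query]
    scored.sort(reverse=True)
    return [p for _, p in scored[:n]]


# Hypothetical factor matrix: one row per noun phrase, one column per component (R = 3).
phrases = ["phrase_a", "phrase_b", "phrase_c", "phrase_d"]
A = [
    [1.0, 0.0, 0.5],
    [0.9, 0.1, 0.4],
    [0.0, 1.0, 0.0],
    [0.8, 0.0, 0.6],
]
print(top_synonyms(A, phrases, "phrase_a"))
```

Because the rows summarize the contexts in which a phrase appears, phrases that are similar under this measure are contextual synonyms rather than dictionary synonyms.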
19
A2.2: Synonym Discovery
20
Outline
Problem Definition
Algorithm
Discoveries
Conclusions
21
Conclusion
GigaTensor: a scalable tensor decomposition algorithm for tensors with billion-length modes.
Algorithm: avoid intermediate data explosion.
Discoveries: concept discovery and contextual synonym detection on a KB tensor.
22
Thank you!
www.cs.cmu.edu/~pegasus
www.cs.cmu.edu/~ukang