Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Carlos Ordonez, Yiqun Zhang
University of Houston, USA


Parallel architecture
- Shared-nothing, message-passing
- N nodes
- Data partitioned before computation
- Examples: parallel DBMSs, HDFS, MapReduce, Spark


Old: separate sufficient statistics

New: generalizing and unifying sufficient statistics: Z = [1, X, Y]
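The augmented matrix Z stacks a row of 1s, the data matrix X, and the output variable Y. A minimal NumPy sketch (with a hypothetical small data set; the slides use the column-per-point convention, so X is d x n):

```python
import numpy as np

# Hypothetical toy data: d features, n points, one column per point.
rng = np.random.default_rng(0)
d, n = 3, 5
X = rng.normal(size=(d, n))
Y = rng.normal(size=(1, n))

# Augmented matrix Z = [1; X; Y], shape (d+2) x n
Z = np.vstack([np.ones((1, n)), X, Y])

# Gamma summarization matrix: Gamma = Z Z^T, shape (d+2) x (d+2).
Gamma = Z @ Z.T

# The old, separate sufficient statistics appear as blocks of Gamma:
# Gamma[0, 0] = n, Gamma[0, 1:d+1] = L (feature sums),
# Gamma[1:d+1, 1:d+1] = Q = X X^T.
assert np.isclose(Gamma[0, 0], n)
assert np.allclose(Gamma[0, 1:d+1], X.sum(axis=1))
assert np.allclose(Gamma[1:d+1, 1:d+1], X @ X.T)
```

This illustrates why Gamma unifies the separate statistics: one small matrix holds all of them at once.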

Linear algebra: our main result for parallel and scalable computation: Gamma = Z Z^T = sum over i of z_i z_i^T
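The key identity is that the matrix product Z Z^T equals the sum of n rank-1 vector outer products z_i z_i^T, which is what enables a one-pass, point-at-a-time computation. A sketch with arbitrary example data:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 4
Z = rng.normal(size=(d + 2, n))

# One-pass accumulation: Gamma = sum_i z_i z_i^T
Gamma = np.zeros((d + 2, d + 2))
for i in range(n):
    z = Z[:, [i]]        # column vector z_i, shape (d+2) x 1
    Gamma += z @ z.T     # rank-1 (outer product) update, O(d^2) per point

# The accumulated sum equals the full matrix product Z Z^T.
assert np.allclose(Gamma, Z @ Z.T)
```

Because each point contributes independently, the data can be streamed or partitioned without ever materializing Z^T.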

2-phase algorithm
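A minimal sketch of the two phases on a shared-nothing system, simulating N nodes with array partitions (the partition sizes and node count here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, N = 3, 12, 4   # N simulated worker nodes
Z = rng.normal(size=(d + 2, n))

# Phase 1: each node j computes a partial Gamma on its own partition Z_j,
# with no communication between nodes.
parts = np.array_split(Z, N, axis=1)
partials = [Zj @ Zj.T for Zj in parts]

# Phase 2: the coordinator sums the N small (d+2) x (d+2) partial matrices.
Gamma = sum(partials)

assert np.allclose(Gamma, Z @ Z.T)
```

Only the small partial matrices travel over the network, so communication cost is independent of n.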

Equivalent equations with projections from Gamma (descriptive, predictive)
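As one predictive example, least-squares coefficients can be computed from blocks (projections) of Gamma without a second pass over the data. A hedged sketch with synthetic data (the intercept is omitted for simplicity; block indices assume the Z = [1; X; Y] layout above):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 50
X = rng.normal(size=(d, n))
beta_true = np.array([[1.0], [2.0], [-1.0]])
Y = beta_true.T @ X + 0.01 * rng.normal(size=(1, n))

Z = np.vstack([np.ones((1, n)), X, Y])
Gamma = Z @ Z.T

# Projections from Gamma (no further access to X or Y needed):
Q   = Gamma[1:d+1, 1:d+1]   # X X^T
XYt = Gamma[1:d+1, d+1:]    # X Y^T

# Least-squares coefficients beta = (X X^T)^{-1} X Y^T
beta = np.linalg.solve(Q, XYt)
assert np.allclose(beta, beta_true, atol=0.05)
```

The same Gamma also yields descriptive statistics (means, covariances, correlations) from the n, L, and Q blocks.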

Fundamental properties: non-commutative but distributive
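Both properties can be checked directly: matrix multiplication does not commute (Z Z^T and Z^T Z are different matrices), but the computation distributes over a column partition of Z, which is what justifies the parallel 2-phase algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
Z = rng.normal(size=(4, 6))
Z1, Z2 = Z[:, :3], Z[:, 3:]   # horizontal (per-point) partition

# Distributive: Gamma of the whole = sum of Gammas of the parts.
assert np.allclose(Z @ Z.T, Z1 @ Z1.T + Z2 @ Z2.T)

# Non-commutative: Z Z^T is small ((d+2) x (d+2)) while Z^T Z is n x n,
# so the order of the product matters.
assert (Z @ Z.T).shape != (Z.T @ Z).shape
```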

Parallel theoretical guarantees of Gamma

Dense matrix algorithm: O(d^2 n)

Sparse matrix algorithm: O(d n) for a hyper-sparse matrix
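A sketch of the sparse accumulation, assuming a simple dense-array representation of a sparse Z (a real implementation would use a compressed sparse format): for each point, the rank-1 update loops only over the k nonzero entries, costing O(k^2) instead of O(d^2) per point, which approaches O(d n) overall when k is a small constant (hyper-sparse data).

```python
import numpy as np

def gamma_sparse(Z):
    """Accumulate Gamma = Z Z^T visiting only nonzero entries per point."""
    m, n = Z.shape
    G = np.zeros((m, m))
    for i in range(n):
        nz = np.nonzero(Z[:, i])[0]   # indices of nonzeros in z_i
        for a in nz:
            for b in nz:
                G[a, b] += Z[a, i] * Z[b, i]
    return G

rng = np.random.default_rng(5)
Z = rng.normal(size=(5, 8))
Z[rng.random(Z.shape) < 0.7] = 0.0   # make the matrix mostly zeros

assert np.allclose(gamma_sparse(Z), Z @ Z.T)
```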

Pros: algorithm evaluation with physical array operators
- Since x_i fits in one chunk, joins are avoided (a hash or merge join costs at least 2X the I/O)
- Since x_i x_i^T can be computed in RAM, we avoid an aggregation (no sorting of points by i)
- No need to store X twice as X and X^T: half the I/O, half the RAM space
- No need to transpose X, a costly reorganization even in RAM, especially if X spans several RAM segments
- C++ compiled code is fast; each vector is accessed once; direct assignment (bypassing C++ function calls)

Running on the cloud

Running in the cloud, 100 nodes

Conclusions
- One-pass summarization matrix operator: parallel and scalable
- Algorithm compatible with any parallel shared-nothing system, but better suited to array systems
- Optimization of the outer matrix multiplication as a sum (aggregation) of vector outer products
- Dense and sparse matrix versions of the algorithm are required
- The Gamma matrix must fit in RAM, but n is unlimited
- ML methods reduced to two phases: (1) summarization, (2) computing model parameters
- The summarization matrix can be exploited in many intermediate computations

Future work: theory
- Study Gamma in other models: logistic regression, clustering, factor analysis, HMMs, Kalman filters
- Clustering model; for frequent itemsets
- Higher-order expected moments, co-variates
- Numeric stability with unnormalized sorted data: unlikely to be an issue