Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Carlos Ordonez, Yiqun Zhang
University of Houston, USA
Parallel architecture
- Shared-nothing, message-passing cluster with N nodes
- Data is partitioned across the nodes before computation
- Examples: parallel DBMSs, HDFS, MapReduce, Spark
Old: separate sufficient statistics
New: generalizing and unifying sufficient statistics: Z = [1, X, Y]
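As a minimal sketch of this unification (assuming the usual conventions for the Γ matrix: the d x n data set X is augmented with a row of 1s and the output row Y, so each column of Z is z_i = (1, x_i, y_i)), the single matrix product Z Z^T packs n, L and Q together:

\Gamma = Z Z^T = \sum_{i=1}^{n} z_i z_i^T =
\begin{bmatrix}
  n          & L^T   & \sum_i y_i   \\
  L          & Q     & X Y^T        \\
  \sum_i y_i & Y X^T & \sum_i y_i^2
\end{bmatrix},
\qquad L = \sum_{i=1}^{n} x_i, \quad Q = X X^T .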
Linear Algebra: Our main result for parallel and scalable computation
2-phase algorithm: Phase 1 summarizes the data set into Γ in one parallel pass; Phase 2 computes the model parameters from Γ (see the sketch below)
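In rough notation (a sketch, not the authors' exact formulation), Phase 1 is the only phase that touches the data, and Phase 2 works entirely on the small (d+2) x (d+2) matrix Γ in RAM:

\text{Phase 1: } \Gamma = Z Z^T = \sum_{i=1}^{n} z_i z_i^T \quad \text{(one pass over the data, in parallel)}

\text{Phase 2: } \Theta = f(\Gamma) \quad \text{(model parameters, computed in RAM)}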
Equivalent model equations, rewritten with projections of Γ (descriptive and predictive models)
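As an illustration (hedged: these are the standard textbook forms, and the paper's exact notation may differ), both descriptive and predictive statistics project out sub-blocks of Γ, so no further pass over X is needed:

\mu = \frac{L}{n}, \qquad V = \frac{Q}{n} - \mu \mu^T \quad \text{(descriptive: mean and covariance)}

\hat{\beta} = (X' X'^T)^{-1} X' Y^T, \quad X' = \begin{bmatrix} \mathbf{1} \\ X \end{bmatrix} \quad \text{(predictive: least-squares linear regression; both factors are sub-blocks of } \Gamma\text{)}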
Fundamental properties: matrix multiplication is non-commutative, but distributive
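A one-line sketch of why these two properties matter for the parallel algorithm (using the partition notation assumed above): the order of the product matters, but the product distributes over a horizontal partition of Z into the N nodes' chunks, so the partial Gammas can be added in any order:

Z Z^T \neq Z^T Z \quad \text{(a } (d+2)\times(d+2) \text{ matrix vs. an } n \times n \text{ matrix)},

Z = [\, Z_1 \mid Z_2 \mid \cdots \mid Z_N \,] \;\Rightarrow\; \Gamma = Z Z^T = \sum_{I=1}^{N} Z_I Z_I^T = \sum_{I=1}^{N} \Gamma_I .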
Parallel Theoretical Guarantees of Γ
Dense matrix algorithm: O(d^2 n)
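A minimal single-node sketch of the dense kernel, under assumed conventions not spelled out in the slides (row i of the input holds x_i, Y holds the output values, Γ is stored row-major, and only the lower triangle is accumulated and then mirrored, in line with the "no need to store X twice" point later in the deck). Function and variable names are illustrative, not the authors' operator:

#include <vector>
#include <cstddef>

// Dense Gamma kernel: Gamma += z_i * z_i^T for every point, with z_i = (1, x_i, y_i).
// The two inner loops cost O(d^2) per point, giving the O(d^2 n) total above.
std::vector<double> computeGammaDense(const std::vector<std::vector<double>>& X,
                                      const std::vector<double>& Y) {
    const std::size_t n = X.size();
    const std::size_t d = n ? X[0].size() : 0;
    const std::size_t p = d + 2;                 // dimension of z_i = (1, x_i, y_i)
    std::vector<double> G(p * p, 0.0);           // Gamma, row-major (p x p)
    std::vector<double> z(p);

    for (std::size_t i = 0; i < n; ++i) {
        z[0] = 1.0;                              // intercept entry
        for (std::size_t a = 0; a < d; ++a) z[a + 1] = X[i][a];
        z[p - 1] = Y[i];                         // output variable
        for (std::size_t a = 0; a < p; ++a)      // lower triangle of z * z^T
            for (std::size_t b = 0; b <= a; ++b)
                G[a * p + b] += z[a] * z[b];
    }
    for (std::size_t a = 0; a < p; ++a)          // mirror to the upper triangle once
        for (std::size_t b = a + 1; b < p; ++b)
            G[a * p + b] = G[b * p + a];
    return G;
}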
Sparse matrix algorithm: O(d n) for a hyper-sparse matrix
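A hedged sketch of the sparse variant under an assumed representation (each point is a list of (index, value) pairs for the non-zero entries of z_i, and Γ is kept as a sparse map of its lower triangle): only non-zero entries multiply each other, so the work per point depends on its non-zero count rather than on d, which is how hyper-sparse data reaches the bound above. Names and types are illustrative:

#include <vector>
#include <map>
#include <utility>
#include <cstddef>

// Sparse point: non-zero entries of z_i = (1, x_i, y_i), already including the
// constant 1 at index 0 and y_i at index d+1.
using SparsePoint = std::vector<std::pair<std::size_t, double>>;

// Sparse Gamma kernel: accumulate z_i * z_i^T over non-zero entries only.
// Per point the cost is O(nnz(z_i)^2), so very sparse data stays cheap.
std::map<std::pair<std::size_t, std::size_t>, double>
computeGammaSparse(const std::vector<SparsePoint>& Z) {
    std::map<std::pair<std::size_t, std::size_t>, double> G;
    for (const SparsePoint& z : Z)
        for (const auto& [a, va] : z)
            for (const auto& [b, vb] : z)
                if (b <= a)                      // keep the lower triangle only
                    G[{a, b}] += va * vb;
    return G;
}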
Pros: algorithm evaluation with physical array operators
- Since x_i fits in one chunk, joins are avoided (a hash or merge join would cost at least 2X the I/O)
- Since x_i * x_i^T can be computed in RAM, we avoid an aggregation (no need to sort points by i)
- No need to store X twice as X and X^T: half the I/O and half the RAM space
- No need to transpose X, a costly reorganization even in RAM, especially if X spans several RAM segments
- C++ compiled code is fast: each vector is accessed once, with direct assignment (bypassing C++ calls)
Running in the cloud
Running in the cloud, 100 nodes
Conclusions
- One-pass summarization matrix operator: parallel and scalable
- The algorithm is compatible with any parallel shared-nothing system, but works best in array systems
- Optimization: the outer matrix multiplication is evaluated as a sum (aggregation) of vector outer products
- Dense and sparse versions of the algorithm are required
- The Γ matrix must fit in RAM, but n is unlimited
- ML methods split into two phases: (1) summarization, (2) computing model parameters
- The summarization matrix can be exploited in many intermediate computations
Future work: theory
- Study Γ in other models such as logistic regression, clustering, factor analysis, HMMs, and Kalman filters
- Clustering model: for frequent itemsets
- Higher-order expected moments, covariates
- Numeric stability with unnormalized sorted data: unlikely