Experimental Perspectives on Lasso-related Algorithms on Parallel Computing Frameworks

Presentation transcript:

Experimental Perspectives on Lasso-related Algorithms on Parallel Computing Frameworks Jichuan Zeng

Experimental Perspectives on Lasso-related Algorithms on Parallel Computing Frameworks
Motivation: In the Big Data era (the 4 Vs: Volume, Variety, Variability, Velocity), there is still no systematic comparison of state-of-the-art Big Data frameworks on specific problems over large-scale datasets. Lasso-related algorithms are a natural test case: they must cope with large scale, sparsity, and slow convergence.
Questions:
- What are the main features and differences of these distributed ML frameworks?
- Are these distributed ML frameworks capable of solving lasso-related optimization problems on huge-scale datasets?
- What is the trade-off between framework performance and data sparsity?
- What is the main factor in each framework that retards lasso-related algorithms as the scale of the data soars?
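As background for the slides that follow, a minimal single-machine sketch of the Lasso solved by cyclic coordinate descent with soft-thresholding — the building block the distributed frameworks parallelize. This is an illustrative baseline, not code from any of the frameworks discussed; the function names are our own.

```python
# Lasso objective: minimize (1/2n) * ||y - Xw||^2 + lam * ||w||_1
# Solved by cyclic coordinate descent; pure-Python sketch for small data.

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iters=100):
    """Cyclic coordinate descent for the Lasso on dense lists-of-lists."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            # Partial residual leaving out coordinate j
            r = [y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w
```

The L1 penalty drives exactly-zero coefficients, which is why sparsity (and how each framework exploits it) dominates the experiments below.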

Distributed Machine Learning Frameworks
- GraphLab: graph-based
- Petuum: parameter server with Stale Synchronous Parallel (SSP)
- Spark: general-purpose, built on Resilient Distributed Datasets (RDDs)
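To make the Petuum bullet concrete, a toy sketch of the Stale Synchronous Parallel rule: a worker may run ahead of the slowest worker by at most a fixed staleness bound before it must wait. The class and method names are illustrative assumptions, not Petuum's API.

```python
# Stale Synchronous Parallel (SSP) clock rule, illustrated in-process.
# Each worker holds a logical clock; a worker may proceed only while it is
# no more than `staleness` clocks ahead of the slowest worker.

class SSPClock:
    def __init__(self, n_workers, staleness):
        self.clocks = [0] * n_workers
        self.staleness = staleness

    def tick(self, worker):
        """Worker finished one iteration; advance its logical clock."""
        self.clocks[worker] += 1

    def can_proceed(self, worker):
        """True while this worker is within the staleness bound of the
        slowest worker; otherwise it must block and wait."""
        return self.clocks[worker] - min(self.clocks) <= self.staleness
```

Staleness 0 recovers Bulk Synchronous Parallel (everyone waits at a barrier); a small positive staleness hides stragglers, which matters for the slow-converging Lasso iterations discussed above.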

Lasso on ML Frameworks
Environment: Arcane multi-core servers running VMware. Each virtual machine is configured with 4 cores (2.5 GHz each) and 16 GB of RAM.
Dataset: Arcene. The sparser dataset contains 10K features and 54K non-zero entries.
Finding: GraphLab performs poorly on shared-variable applications compared to Petuum and Spark, which deploy in a master/workers mode.
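Since the Arcene matrix has only 54K non-zeros over 10K features, each coordinate update should touch only the stored entries of one column. A minimal sketch of that sparse-column access pattern (a hypothetical helper, not code from any of the frameworks):

```python
# A sparse column stored as (row_index, value) pairs; the dot product with
# the dense residual costs O(nnz) instead of O(n), which is the advantage
# the sparser Arcene dataset exposes.

def sparse_col_dot(col, residual):
    """Dot product of a sparse column with a dense residual vector."""
    return sum(v * residual[i] for i, v in col)
```

How cheaply a framework exposes this per-column access largely decides the performance/sparsity trade-off raised in the questions above.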

Future Work
More lasso-related models:
- Group lasso
- Elastic net
- Fused lasso
- Graphical lasso
Improving the current distributed ML frameworks:
- Load balancing
- Heuristic updates