Scaling Distributed Machine Learning with the Parameter Server
By M. Li, D. Andersen, J. Park, A. Smola, A. Ahmed, V. Josifovski, J. Long, E. Shekita, B. Su
EECS 582 – W16
Outline
- Motivation
- Parameter Server architecture
- Why is it special?
- Evaluation
Motivation
Parameter Server
How is this different from a traditional KV store?
Key-Value Stores
(diagram: clients interacting with a key-value store)
Differences from a KV store (leveraging domain knowledge):
- Treat KVs as sparse matrices
- Computation occurs on the parameter servers
- Flexible consistency
- Intelligent replication
- Message compression
KVs as Sparse Matrices
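The slides above are figures; as a rough illustration of the idea (not the paper's interface), the sketch below keys parameters by feature index so that a contiguous key range behaves like a segment of a sparse weight vector that can be pulled and pushed in bulk. The names (SparseParamStore, pull_range, push_range) are hypothetical.

```python
# Illustrative sketch only: parameters stored as key -> float, where keys are
# feature indices. A key range then acts like a slice of a sparse weight
# vector, so clients can pull/push whole segments instead of single keys.
# Names (SparseParamStore, pull_range, push_range) are hypothetical.

class SparseParamStore:
    def __init__(self):
        self.params = {}          # key (int feature index) -> float weight

    def pull_range(self, lo, hi):
        """Return the non-zero entries with lo <= key < hi as a dict."""
        return {k: v for k, v in self.params.items() if lo <= k < hi}

    def push_range(self, updates):
        """Apply additive updates {key: delta}, e.g. a gradient segment."""
        for k, delta in updates.items():
            self.params[k] = self.params.get(k, 0.0) + delta


store = SparseParamStore()
store.push_range({3: 0.5, 7: -0.2})      # sparse gradient from a worker
segment = store.pull_range(0, 10)        # pull a key range as a sparse slice
```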
User-defined Functions
Parameter servers aggregate data from the workers: in distributed gradient descent, workers push gradients that the servers use to compute parameter updates.
Users can also supply user-defined functions to run on the server.
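A minimal sketch of that flow, assuming synchronous gradient descent and a server-side update function supplied by the user (names such as l2_update and ParamServer are illustrative, not the paper's API; the gradients are faked just to show the push/pull pattern):

```python
import random

# Hypothetical server-side user-defined function: the server, not the workers,
# decides how a pushed gradient changes the parameters (here, SGD with L2
# regularization applied on the server).
def l2_update(weights, grad, lr=0.1, reg=1e-4):
    for k, g in grad.items():
        w = weights.get(k, 0.0)
        weights[k] = w - lr * (g + reg * w)

class ParamServer:
    def __init__(self, update_fn):
        self.weights = {}
        self.update_fn = update_fn       # user-defined function runs here

    def push(self, grad):
        self.update_fn(self.weights, grad)

    def pull(self, keys):
        return {k: self.weights.get(k, 0.0) for k in keys}

def worker_step(server, data_shard):
    # Each worker computes a gradient on its own shard and pushes it;
    # the gradient here is random, only to exercise the interface.
    grad = {k: random.uniform(-1, 1) for k in data_shard}
    server.push(grad)
    return server.pull(data_shard)

server = ParamServer(l2_update)
for shard in ([1, 2, 3], [2, 3, 4]):     # two "workers", one step each
    worker_step(server, shard)
```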
Flexible Consistency
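The figures for these slides are not reproduced. For context, the paper expresses consistency as dependencies between asynchronous tasks, with bounded delay (a task at iteration t waits only for tasks up to t − τ) covering sequential execution (τ = 0) and eventual consistency (no waiting) as special cases. A toy sketch of a bounded-delay check, with hypothetical names:

```python
# Toy bounded-delay check: a worker may start iteration t only if every
# iteration up to t - tau has finished. tau = 0 gives sequential execution;
# a very large tau never blocks, i.e. eventual consistency. Sketch only.

def may_start(t, finished, tau):
    """finished: set of completed iteration numbers."""
    return all(i in finished for i in range(0, max(0, t - tau)))

finished = {0, 1, 2}
print(may_start(4, finished, tau=1))   # True: only iterations 0..2 are needed
print(may_start(5, finished, tau=1))   # False: iteration 3 has not finished
```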
Intelligent Replication
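Only the slide title survives here. In the paper, servers replicate key ranges to other servers, and replication happens after aggregation: a server combines the pushes from many workers first and replicates the single aggregated result, reducing replication traffic. A rough sketch of that ordering, with hypothetical names:

```python
# Rough sketch of "replication after aggregation": the primary aggregates all
# worker pushes for an iteration, then sends one combined update to its
# replica, instead of replicating every individual push. Hypothetical names.

class ReplicatedServer:
    def __init__(self, replica):
        self.weights = {}
        self.replica = replica           # backup server's weight table (dict)
        self.pending = {}                # aggregated-but-unreplicated updates

    def push(self, grad):
        # Aggregate worker pushes locally; no replication traffic yet.
        for k, g in grad.items():
            self.pending[k] = self.pending.get(k, 0.0) + g

    def finish_iteration(self):
        # Apply the aggregate once and replicate that single combined update.
        for k, g in self.pending.items():
            self.weights[k] = self.weights.get(k, 0.0) + g
            self.replica[k] = self.weights[k]
        self.pending.clear()

backup = {}
primary = ReplicatedServer(backup)
primary.push({1: 0.3})
primary.push({1: 0.2, 2: -0.1})
primary.finish_iteration()   # one replication pass for the whole iteration
```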
Message Compression
- Training data does not change between iterations: only send a hash of the training data on subsequent iterations
- Values are often 0: only send the non-zero values
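A small sketch of both ideas, assuming the receiver caches key lists by hash and that zero values can simply be dropped (function names are illustrative, not the paper's implementation):

```python
import hashlib, json

def key_list_hash(keys):
    # Stable hash of the (unchanging) key list for the training data.
    return hashlib.sha1(json.dumps(sorted(keys)).encode()).hexdigest()

def compress(keys, values, receiver_cache):
    """Build a message: send the key list only if the receiver has not cached
    it yet, and drop zero values entirely."""
    h = key_list_hash(keys)
    msg = {"key_hash": h,
           "values": {k: v for k, v in zip(keys, values) if v != 0.0}}
    if h not in receiver_cache:
        msg["keys"] = keys               # first iteration: send keys once
        receiver_cache[h] = sorted(keys)
    return msg

cache = {}
m1 = compress([4, 7, 9], [0.0, 1.5, 0.0], cache)   # keys + 1 non-zero value
m2 = compress([4, 7, 9], [2.0, 0.0, 0.3], cache)   # hash only + 2 values
```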
Evaluation (logistic regression)
Evaluation (Topic Modeling)
Conclusion
The Parameter Server model is a much better fit for machine learning computation than competing models.
Designing a system around a particular problem exposes optimizations which dramatically improve performance.