Closing Remarks
Cyrus M. Vahid, Principal Solutions Architect @ AWS Deep Learning
cyrusmv@amazon.com
June 2017

The human brain: ~10^14 parameters, running at ~100 Hz.

Brain-sized DNN Today
Today: a p2.16xlarge has 192 GB of GPU RAM. At 32 bits/param, with SGD needing 3x copies, that fits ~12B parameters per instance.
So you'd need ~8,000 p2.16xlarge instances. Cost: ~$115,000/hr.
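The back-of-envelope arithmetic above can be checked with a short script. The 3x factor for SGD working copies and the per-instance GPU RAM are the slide's own assumptions:

```python
# Back-of-envelope sizing for a brain-scale DNN, using the slide's assumptions.
BRAIN_PARAMS = 1e14           # ~10^14 parameters (synapse count)
BYTES_PER_PARAM = 4           # 32 bits/param
SGD_COPIES = 3                # params + gradients + working copies (slide's 3x)
GPU_RAM_PER_INSTANCE = 192e9  # p2.16xlarge: 16 GPUs x 12 GB

params_per_instance = GPU_RAM_PER_INSTANCE / (BYTES_PER_PARAM * SGD_COPIES)
instances_needed = BRAIN_PARAMS / params_per_instance

print(f"params per instance: {params_per_instance / 1e9:.0f}B")
print(f"instances needed:    {instances_needed:,.0f}")
```

This naive calculation gives ~16B parameters per instance and ~6,250 instances; the slide's figures of 12B and ~8,000 presumably include extra memory overheads not modeled here.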

Amdahl’s Law
Amdahl’s law gives the theoretical speedup in latency of a program as a function of the number of processors executing it, for different values of p, the fraction of the program that can be parallelized. The speedup is limited by the serial part: for example, if 95% of the program can be parallelized, the theoretical maximum speedup is 20x, no matter how many processors are added.
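The 20x ceiling follows directly from the formula S(N) = 1 / ((1 - p) + p/N); a minimal sketch:

```python
def amdahl_speedup(p, n):
    """Theoretical speedup with parallel fraction p on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

p = 0.95
# Speedup grows with processor count but saturates at 1/(1 - p).
for n in (8, 64, 1024, 10**6):
    print(f"{n:>8} processors: {amdahl_speedup(p, n):6.2f}x")

max_speedup = 1.0 / (1.0 - p)
print("ceiling:", max_speedup)  # 20x when p = 0.95
```

As N grows the p/N term vanishes, leaving only the serial fraction (1 - p) in the denominator, which is why adding hardware alone cannot beat the 1/(1 - p) limit.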

Moore’s Law
Price/performance doubles every 12-18 months, and we are already on the fast side of that range, driven by custom accelerators: Google’s TPU and chips from IBM and Nervana.

Yann LeCun: “The best neural networks have always taken 3 weeks to train.”

Brain-sized DNN in 2026
256x = 2^8; by Moore’s law, 8 doublings = 8-12 years.
400x p7.256xl* instances at $1,000/hr*.
3 weeks to train: ~$500k.
*Pure speculation, obviously.
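The ~$500k figure can be reproduced from the slide’s self-declaredly speculative numbers, reading $1,000/hr as the total fleet cost:

```python
# Extrapolated training cost for a brain-scale DNN in ~2026,
# using the slide's speculative assumptions.
doublings = 8                    # 256x improvement = 2^8 Moore's-law doublings
assert 2 ** doublings == 256

hourly_cost = 1_000              # $/hr, total fleet cost (slide's assumption)
weeks_to_train = 3               # LeCun's "3 weeks to train"
hours = weeks_to_train * 7 * 24  # 504 hours

total_cost = hourly_cost * hours
print(f"one training run: ${total_cost:,}")  # ~$500k
```

Three weeks is 504 hours, so 504 x $1,000 = $504,000, which the slide rounds to $500k.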

Cyrus M. Vahid cyrusmv@amazon.com