READINGS IN DEEP LEARNING
4 Sep 2013
ADMINISTRIVIA
New course numbers (11-785/786) are assigned
– Should be up on the hub shortly
Lab assignment 1 is up
– Due date: 2 weeks from today
Google group: is everyone on?
Website issues
– WordPress not yet an option (CMU CS setup)
– Piazza?
Poll for next 2 classes
Monday, Sep 9
– The perceptron: A probabilistic model for information storage and organization in the brain (Rosenblatt)
  Not really about the logistic perceptron; more about the probabilistic interpretation of learning in connectionist networks
– The organization of behavior (Donald Hebb)
  About the Hebbian learning rule
Poll for next 2 classes
Wed, Sep 11
– Optimal unsupervised learning in a single-layer linear feedforward neural network (Terence Sanger)
  The generalized Hebbian learning rule
– The Widrow-Hoff learning rule (Widrow and Hoff)
  Will be presented by Pallavi Baljekar
Notices
The success of the course depends on good presentations
Please send in your slides 1-2 days before your presentation
– So that we can ensure they are OK
You are encouraged to discuss your papers with us and your classmates while preparing them
– Use the Google group for discussion
A new project
Distributed large-scale training of neural networks
Looking for volunteers
The problem: Distributed data
Training enormous networks
– Billions of units, trained from large amounts of data
– Billions or trillions of instances
– Data may be localized, or distributed
The problem: Distributed computing
A single computer will not suffice
– Need many processors
– Tens, hundreds, or thousands of computers, possibly of varying types and capacities
Challenge
Getting the data to the computers
– Tons of data to many computers
– Bandwidth problems
Timing issues
– Synchronizing the learning
Logistical challenges
How to transfer vast amounts of data to the processors
Which processor gets how much data, and which data?
– Not all processors are equally fast
– Not all data take equal amounts of time to process
– Data locality
Learning challenges
How to transfer parameters to the processors
– Networks are large: billions or trillions of parameters
– Each processor must have the latest copy of the parameters
How to receive updates from the processors
– Each processor learns on its local data
– Updates from all processors must be pooled (see the sketch below)
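A minimal sketch of what pooling worker updates might look like at a central parameter store, assuming each processor returns a gradient-like update of the same shape. The function names (pool_updates, apply_update) and the per-worker weights are illustrative, not part of any specific system.

```python
import numpy as np

def pool_updates(updates, weights=None):
    """Average per-worker updates into a single update.

    updates : list of np.ndarray, one per worker, all the same shape.
    weights : optional per-worker weights (e.g. number of local examples).
    """
    if weights is None:
        weights = [1.0] * len(updates)
    total = float(sum(weights))
    return sum(w * u for w, u in zip(weights, updates)) / total

def apply_update(params, pooled_update, lr=0.01):
    """Take one gradient-descent step with the pooled update."""
    return params - lr * pooled_update

# Toy usage: three workers, each contributing a gradient on its local data.
params = np.zeros(4)
worker_grads = [np.array([1.0, 0.0, 0.0, 0.0]),
                np.array([0.0, 2.0, 0.0, 0.0]),
                np.array([0.0, 0.0, 3.0, 0.0])]
params = apply_update(params, pool_updates(worker_grads, weights=[100, 200, 300]))
print(params)
```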
Learning challenges
Synchronizing processor updates
– Some processors are slower than others
– Inefficient to wait for the slower ones before updating parameters at all processors
Requires asynchronous updates (see the sketch below)
– Each processor updates when it is done
– Problem: different processors now hold different sets of parameters, since other processors may have already updated theirs
Requires algorithmic changes
– How to update asynchronously
– Which updates to trust
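One way to picture asynchronous updates is a parameter server that applies each worker's gradient as soon as it arrives, while tracking which parameter version the gradient was computed against. This is only a single-process sketch; the class and method names (AsyncParameterServer, pull, push) are illustrative.

```python
import numpy as np

class AsyncParameterServer:
    """Toy parameter server: applies worker updates as they arrive,
    without waiting for slower workers."""

    def __init__(self, dim, lr=0.01):
        self.params = np.zeros(dim)
        self.version = 0          # incremented on every applied update
        self.lr = lr

    def pull(self):
        """Worker fetches the current parameters and their version."""
        return self.params.copy(), self.version

    def push(self, grad, version_seen):
        """Worker pushes a gradient computed against `version_seen`.
        Staleness (current version minus version_seen) measures how
        out of date that gradient is."""
        staleness = self.version - version_seen
        self.params -= self.lr * grad
        self.version += 1
        return staleness

# Toy usage: two "workers" pull the same version, then push at different times.
server = AsyncParameterServer(dim=3)
p1, v1 = server.pull()
p2, v2 = server.pull()
server.push(np.array([1.0, 0.0, 0.0]), v1)      # staleness 0
s = server.push(np.array([0.0, 1.0, 0.0]), v2)  # staleness 1: computed on old params
print(server.params, "staleness of second update:", s)
```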
Current solutions
Faster processors: GPUs
– GPU programming required
Large simple clusters
– Simple distributed programming
Large heterogeneous clusters
– Techniques for asynchronous learning
Current solutions
Still assume that data distribution is not a major problem
Assume relatively fast connectivity
– Gigabit Ethernet
Fundamentally cluster-computing based
– Local area network
New project
Distributed learning
Wide-area network
– Computers distributed across the world
New project
Supervisor/worker architecture
One or more supervisors
– May be a hierarchy
A large number of workers
Supervisors are in charge of resource and task allocation, gathering and redistributing updates, and synchronization (see the sketch below)
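A rough, single-machine stand-in for the supervisor/worker message flow: the supervisor hands out data shards and collects one update per shard, and workers loop until they see a sentinel. Queues and threads here stand in for the wide-area transport; all names and the placeholder "learning" step are illustrative.

```python
import queue
import threading

def supervisor(task_q, result_q, shards, n_workers):
    """Hand out one data shard per task, then collect one result per shard."""
    for shard in shards:
        task_q.put(shard)
    for _ in range(n_workers):
        task_q.put(None)                       # sentinel: no more work
    return [result_q.get() for _ in shards]    # per-shard updates to be pooled

def worker(task_q, result_q):
    """Repeatedly take a shard, 'learn' on it, and report an update."""
    while True:
        shard = task_q.get()
        if shard is None:
            break
        result_q.put(sum(shard) / len(shard))  # placeholder for local learning

# Toy usage: two workers, three shards.
task_q, result_q = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(task_q, result_q)) for _ in range(2)]
for t in workers:
    t.start()
print(supervisor(task_q, result_q, [[1, 2, 3], [4, 5, 6], [7, 8, 9]], n_workers=2))
for t in workers:
    t.join()
```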
New project: Challenges
Data allocation
– Optimal policy for data distribution: minimal latency, maximum locality
New project: Challenges
Computation allocation
– Optimal policy for learning: compute load proportional to compute capacity (see the sketch below)
– Reallocation of data/tasks as required
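For "compute load proportional to compute capacity", a proportional split could look like the sketch below. The capacity numbers are assumed to come from some benchmark or self-reported throughput, and allocate_instances is a made-up name, not part of any existing system.

```python
def allocate_instances(n_instances, capacities):
    """Split n_instances across workers in proportion to their capacities.

    capacities : dict mapping worker id -> relative speed (e.g. instances/sec).
    Returns a dict mapping worker id -> number of instances to assign.
    """
    total = float(sum(capacities.values()))
    alloc = {w: int(n_instances * c / total) for w, c in capacities.items()}
    # Hand any rounding remainder to the fastest workers first.
    remainder = n_instances - sum(alloc.values())
    for w in sorted(capacities, key=capacities.get, reverse=True)[:remainder]:
        alloc[w] += 1
    return alloc

# Toy usage: a fast GPU box, a mid-range server, and a slow laptop.
print(allocate_instances(1_000_000, {"gpu-box": 10.0, "server": 4.0, "laptop": 1.0}))
```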
New project: Challenges
Parameter allocation
– Do we have to distribute all parameters?
– Can learning be local?
New project: Challenges
Trustable updates
– Different processors/LANs have different speeds
– How do we trust their updates? Incorporate or reject? (see the sketch below)
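One plausible incorporate-or-reject policy is to weight each incoming update by how stale it is and drop it entirely past some threshold. This is only an assumption about how trust might be handled; the threshold and decay values below are illustrative.

```python
def update_weight(staleness, max_staleness=10, decay=0.5):
    """Return a trust weight in [0, 1] for an update that is `staleness`
    parameter versions behind the current model.

    Updates fresher than max_staleness are downweighted geometrically;
    older ones are rejected outright (weight 0).
    """
    if staleness < 0:
        raise ValueError("staleness cannot be negative")
    if staleness > max_staleness:
        return 0.0                     # reject: too far behind to trust
    return decay ** staleness          # incorporate, but trust it less

# Toy usage: fresh updates count fully, stale ones shrink, very stale ones are dropped.
for s in (0, 1, 3, 12):
    print(s, update_weight(s))
```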
New project
Optimal resynchronization: how much do we transmit?
– Should not have to retransmit everything (see the sketch below)
– Entropy coding?
– Bit-level optimization?
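A minimal sketch of not retransmitting everything: send only the parameters that changed by more than a threshold since the last synchronization, and compress the message. Here zlib merely stands in for whatever entropy coder is actually chosen, and the threshold is an illustrative value.

```python
import json
import zlib

import numpy as np

def encode_delta(old_params, new_params, threshold=1e-3):
    """Encode only the entries that changed by more than `threshold`."""
    diff = new_params - old_params
    idx = np.nonzero(np.abs(diff) > threshold)[0]
    payload = {"idx": idx.tolist(), "val": diff[idx].tolist()}
    return zlib.compress(json.dumps(payload).encode())

def apply_delta(params, blob):
    """Apply a compressed sparse delta to a parameter vector."""
    payload = json.loads(zlib.decompress(blob).decode())
    out = params.copy()
    out[np.array(payload["idx"], dtype=int)] += np.array(payload["val"])
    return out

# Toy usage: only 3 of 10,000 parameters changed, so the message stays small.
old = np.zeros(10_000)
new = old.copy()
new[[5, 17, 4242]] = [0.5, -0.2, 1.0]
blob = encode_delta(old, new)
print(len(blob), "bytes;", np.allclose(apply_delta(old, blob), new))
```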
Possibilities
Massively parallel learning
Never-ending learning
Multimodal learning
GAIA..
Asking for volunteers
Will be an open-source project
Write to Anders
Today
Bain’s theory: Lars Mahler
– Linguist, mathematician, philosopher
– One of the earliest people to propose a connectionist architecture
– Anticipated many modern ideas
McCulloch and Pitts: Kartik Goyal
– Early model of the neuron: threshold gates
– Earliest model to consider excitation and inhibition