READINGS IN DEEP LEARNING
4 Sep 2013
ADMINISTRIVIA
New course numbers (11-785/786) are assigned
– Should be up on the hub shortly
Lab assignment 1 is up
– Due date: 2 weeks from today
Google group: is everyone on?
Website issues
– WordPress not yet an option (CMU CS setup)
– Piazza?
Poll for next 2 classes
Monday, Sep 9
– The perceptron: A probabilistic model for information storage and organization in the brain (Rosenblatt)
  Not really about the logistic perceptron; more about the probabilistic interpretation of learning in connectionist networks
– The organization of behavior (Donald Hebb)
  About the Hebbian learning rule
Poll for next 2 classes
Wed, Sep 11
– Optimal unsupervised learning in a single-layer linear feedforward neural network (Terence Sanger)
  The generalized Hebbian learning rule
– The Widrow-Hoff learning rule (Widrow and Hoff)
  Will be presented by Pallavi Baljekar
Notices
The success of the course depends on good presentations
Please send in your slides 1-2 days before your presentation
– So that we can ensure they are OK
You are encouraged to discuss your papers with us and your classmates while preparing them
– Use the Google group for discussion
A new project
Distributed large-scale training of neural networks
Looking for volunteers
The problem: Distributed data
Training enormous networks
– Billions of units, trained from large amounts of data
– Billions or trillions of instances
– Data may be localized, or distributed
The problem: Distributed computing
A single computer will not suffice
– Need many processors
– Tens, hundreds, or thousands of computers, possibly of varying types and capacities
Challenge
Getting the data to the computers
– Tons of data to many computers
– Bandwidth problems
Timing issues
– Synchronizing the learning
Logistical challenges
How to transfer vast amounts of data to the processors
Which processor gets how much data, and which data?
– Not all processors are equally fast
– Not all data take equal amounts of time to process
– Data locality
Learning challenges
How to transfer parameters to the processors
– Networks are large: billions or trillions of parameters
– Each processor must have the latest copy of the parameters
How to receive updates from the processors
– Each processor learns on its local data
– Updates from all processors must be pooled (see the sketch below)
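A minimal sketch of what pooling worker updates might look like at a central parameter store, assuming each processor returns a gradient-like update of the same shape. The function names (pool_updates, apply_update) and the per-worker weights are illustrative, not part of any specific system.

```python
import numpy as np

def pool_updates(updates, weights=None):
    """Average per-worker updates into a single update.

    updates : list of np.ndarray, one per worker, all the same shape.
    weights : optional per-worker weights (e.g. number of local examples).
    """
    if weights is None:
        weights = [1.0] * len(updates)
    total = float(sum(weights))
    return sum(w * u for w, u in zip(weights, updates)) / total

def apply_update(params, pooled_update, lr=0.01):
    """Take one gradient-descent step with the pooled update."""
    return params - lr * pooled_update

# Toy usage: three workers, each contributing a gradient on its local data.
params = np.zeros(4)
worker_grads = [np.array([1.0, 0.0, 0.0, 0.0]),
                np.array([0.0, 2.0, 0.0, 0.0]),
                np.array([0.0, 0.0, 3.0, 0.0])]
params = apply_update(params, pool_updates(worker_grads, weights=[100, 200, 300]))
print(params)
```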
Learning challenges
Synchronizing processor updates
– Some processors are slower than others
– Inefficient to wait for the slower ones before updating parameters at all processors
Requires asynchronous updates (see the sketch below)
– Each processor updates when it is done
– Problem: different processors now hold different sets of parameters, since other processors may have already updated theirs
Requires algorithmic changes
– How to update asynchronously
– Which updates to trust
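One way to picture asynchronous updates is a parameter server that applies each worker's gradient as soon as it arrives, while tracking which parameter version the gradient was computed against. This is only a single-process sketch; the class and method names (AsyncParameterServer, pull, push) are illustrative.

```python
import numpy as np

class AsyncParameterServer:
    """Toy parameter server: applies worker updates as they arrive,
    without waiting for slower workers."""

    def __init__(self, dim, lr=0.01):
        self.params = np.zeros(dim)
        self.version = 0          # incremented on every applied update
        self.lr = lr

    def pull(self):
        """Worker fetches the current parameters and their version."""
        return self.params.copy(), self.version

    def push(self, grad, version_seen):
        """Worker pushes a gradient computed against `version_seen`.
        Staleness (current version minus version_seen) measures how
        out of date that gradient is."""
        staleness = self.version - version_seen
        self.params -= self.lr * grad
        self.version += 1
        return staleness

# Toy usage: two "workers" pull the same version, then push at different times.
server = AsyncParameterServer(dim=3)
p1, v1 = server.pull()
p2, v2 = server.pull()
server.push(np.array([1.0, 0.0, 0.0]), v1)      # staleness 0
s = server.push(np.array([0.0, 1.0, 0.0]), v2)  # staleness 1: computed on old params
print(server.params, "staleness of second update:", s)
```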
Current solutions
Faster processors: GPUs
– GPU programming required
Large simple clusters
– Simple distributed programming
Large heterogeneous clusters
– Techniques for asynchronous learning
Current solutions
Still assume that data distribution is not a major problem
Assume relatively fast connectivity
– Gigabit Ethernet
Fundamentally cluster-computing based
– Local area network
New project
Distributed learning
Wide-area network
– Computers distributed across the world
New project
Supervisor/worker architecture
One or more supervisors
– May be a hierarchy
A large number of workers
Supervisors are in charge of resource and task allocation, gathering and redistributing updates, and synchronization (see the sketch below)
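A rough, single-machine stand-in for the supervisor/worker message flow: the supervisor hands out data shards and collects one update per shard, and workers loop until they see a sentinel. Queues and threads here stand in for the wide-area transport; all names and the placeholder "learning" step are illustrative.

```python
import queue
import threading

def supervisor(task_q, result_q, shards, n_workers):
    """Hand out one data shard per task, then collect one result per shard."""
    for shard in shards:
        task_q.put(shard)
    for _ in range(n_workers):
        task_q.put(None)                       # sentinel: no more work
    return [result_q.get() for _ in shards]    # per-shard updates to be pooled

def worker(task_q, result_q):
    """Repeatedly take a shard, 'learn' on it, and report an update."""
    while True:
        shard = task_q.get()
        if shard is None:
            break
        result_q.put(sum(shard) / len(shard))  # placeholder for local learning

# Toy usage: two workers, three shards.
task_q, result_q = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(task_q, result_q)) for _ in range(2)]
for t in workers:
    t.start()
print(supervisor(task_q, result_q, [[1, 2, 3], [4, 5, 6], [7, 8, 9]], n_workers=2))
for t in workers:
    t.join()
```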
New project: Challenges
Data allocation
– Optimal policy for data distribution: minimal latency, maximum locality
New project: Challenges
Computation allocation
– Optimal policy for learning: compute load proportional to compute capacity (see the sketch below)
– Reallocation of data/tasks as required
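For "compute load proportional to compute capacity", a proportional split could look like the sketch below. The capacity numbers are assumed to come from some benchmark or self-reported throughput, and allocate_instances is a made-up name, not part of any existing system.

```python
def allocate_instances(n_instances, capacities):
    """Split n_instances across workers in proportion to their capacities.

    capacities : dict mapping worker id -> relative speed (e.g. instances/sec).
    Returns a dict mapping worker id -> number of instances to assign.
    """
    total = float(sum(capacities.values()))
    alloc = {w: int(n_instances * c / total) for w, c in capacities.items()}
    # Hand any rounding remainder to the fastest workers first.
    remainder = n_instances - sum(alloc.values())
    for w in sorted(capacities, key=capacities.get, reverse=True)[:remainder]:
        alloc[w] += 1
    return alloc

# Toy usage: a fast GPU box, a mid-range server, and a slow laptop.
print(allocate_instances(1_000_000, {"gpu-box": 10.0, "server": 4.0, "laptop": 1.0}))
```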
New project: Challenges
Parameter allocation
– Do we have to distribute all parameters?
– Can learning be local?
New project: Challenges
Trustable updates
– Different processors/LANs have different speeds
– How do we trust their updates? Incorporate or reject? (see the sketch below)
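One plausible incorporate-or-reject policy is to weight each incoming update by how stale it is and drop it entirely past some threshold. This is only an assumption about how trust might be handled; the threshold and decay values below are illustrative.

```python
def update_weight(staleness, max_staleness=10, decay=0.5):
    """Return a trust weight in [0, 1] for an update that is `staleness`
    parameter versions behind the current model.

    Updates fresher than max_staleness are downweighted geometrically;
    older ones are rejected outright (weight 0).
    """
    if staleness < 0:
        raise ValueError("staleness cannot be negative")
    if staleness > max_staleness:
        return 0.0                     # reject: too far behind to trust
    return decay ** staleness          # incorporate, but trust it less

# Toy usage: fresh updates count fully, stale ones shrink, very stale ones are dropped.
for s in (0, 1, 3, 12):
    print(s, update_weight(s))
```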
New project
Optimal resynchronization: how much do we transmit?
– Should not have to retransmit everything (see the sketch below)
– Entropy coding?
– Bit-level optimization?
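A minimal sketch of not retransmitting everything: send only the parameters that changed by more than a threshold since the last synchronization, and compress the message. Here zlib merely stands in for whatever entropy coder is actually chosen, and the threshold is an illustrative value.

```python
import json
import zlib

import numpy as np

def encode_delta(old_params, new_params, threshold=1e-3):
    """Encode only the entries that changed by more than `threshold`."""
    diff = new_params - old_params
    idx = np.nonzero(np.abs(diff) > threshold)[0]
    payload = {"idx": idx.tolist(), "val": diff[idx].tolist()}
    return zlib.compress(json.dumps(payload).encode())

def apply_delta(params, blob):
    """Apply a compressed sparse delta to a parameter vector."""
    payload = json.loads(zlib.decompress(blob).decode())
    out = params.copy()
    out[np.array(payload["idx"], dtype=int)] += np.array(payload["val"])
    return out

# Toy usage: only 3 of 10,000 parameters changed, so the message stays small.
old = np.zeros(10_000)
new = old.copy()
new[[5, 17, 4242]] = [0.5, -0.2, 1.0]
blob = encode_delta(old, new)
print(len(blob), "bytes;", np.allclose(apply_delta(old, blob), new))
```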
Possibilities
Massively parallel learning
Never-ending learning
Multimodal learning
GAIA..
Asking for volunteers
Will be an open-source project
Write to Anders
Today
Bain’s theory: Lars Mahler
– Linguist, mathematician, philosopher
– One of the earliest people to propose a connectionist architecture
– Anticipated many modern ideas
McCulloch and Pitts: Kartik Goyal
– Early model of the neuron: threshold gates
– Earliest model to consider excitation and inhibition