cs540 - Fall 2016 (Shavlik©), Lecture 20, Week 11


1 cs540 - Fall 2016 (Shavlik©), Lecture 20, Week 11
Today's Topics
- More on DEEP ANNs: Convolution, Max Pooling, Drop Out, Auto Association
- Final ANN Wrapup
FYI: Some Resources
- (Google)
- deep-learning-modules-for-torch/ (Facebook; also see Microsoft etc)
Read: Chapters 7-9 of Russell & Norvig; skim Section 7.7 and Sections - we're going to cover math logic for AI next, delaying SVMs until after that

2 Back to Deep ANNs - Convolution & Max Pooling
C = Convolution, MP = Max Pooling (next) (ie, a CHANGE OF REP)

3 Look for 8’s in all the Right Places
Imagine we have a great 8 detector expressed as an 8x8 array of 0-1's (see upper right).
We want to find all the 8's in a 1024x1024 image of 0-1's.
Q: What might we do?
A: 'Slide' the detector across the image and count the # of matching bits between the detector and the 'overlaid' image; if the count is greater than some threshold, say an '8' is there.
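A minimal sketch of this sliding bit-count idea, assuming NumPy and 0/1 arrays; the random detector, image, and threshold below are placeholders, not the ones from the slide.

```python
import numpy as np

def find_eights(image, detector, threshold):
    """Slide a binary detector over a binary image; report positions
    where the number of matching bits exceeds the threshold."""
    d_rows, d_cols = detector.shape
    hits = []
    for r in range(image.shape[0] - d_rows + 1):
        for c in range(image.shape[1] - d_cols + 1):
            window = image[r:r + d_rows, c:c + d_cols]
            matches = np.sum(window == detector)  # count of agreeing bits
            if matches > threshold:
                hits.append((r, c))
    return hits

# Toy usage: a random 'detector' and image, just to exercise the code.
rng = np.random.default_rng(0)
detector = rng.integers(0, 2, size=(8, 8))
image = rng.integers(0, 2, size=(1024, 1024))
print(len(find_eights(image, detector, threshold=60)))  # require 61+ of 64 bits to match
```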

4 Look for 8’s in all the Right Places
Q: What about 8's in the image larger than 8x8 bits?
A: Use 'detectors' of, say, 16x16, 32x32, 64x64, etc.
PS: Could also 'slide' slightly rotated 8's of various sizes (too much rotation and it becomes the infinity symbol!)

5 Back to Deep ANNs - Convolution (cont.)
(Figure: input units rep'ing the image, with hidden units HU1 and HU2 shown.)
The 'sliding window' is the basic idea of convolution, but
- each 'template' is an HU and the wgts are learned
- some HUs are coupled: each group of HUs learns what to 'look for'
- we do hard-code the 'size' of the 'template'
Our code would employ weight sharing, ie the corresponding weights in each HU (eg, the two thicker lines in the figure) would always have the same value.
BUT HU65, say, would connect to the same INPUTS as HU1 but would have different wgts, ie it would be a different 'feature detector'.
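A rough NumPy sketch of the weight-sharing idea: one small learned template (one 'feature detector') is applied at every position of the input, so every position reuses the same weights; a second, independent template corresponds to a different detector, like the HU65 mentioned above. Sizes and names here are illustrative, not from the slide.

```python
import numpy as np

def convolve_feature(image, template, bias=0.0):
    """Apply ONE shared weight template at every position of the image.
    All positions reuse the same weights -- that is the weight sharing."""
    t_rows, t_cols = template.shape
    out_rows = image.shape[0] - t_rows + 1
    out_cols = image.shape[1] - t_cols + 1
    out = np.empty((out_rows, out_cols))
    for r in range(out_rows):
        for c in range(out_cols):
            window = image[r:r + t_rows, c:c + t_cols]
            out[r, c] = np.sum(window * template) + bias  # weighted sum, like an HU
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))
template_a = rng.standard_normal((5, 5))  # one feature detector
template_b = rng.standard_normal((5, 5))  # a different detector (different wgts, same inputs)
maps = [convolve_feature(image, t) for t in (template_a, template_b)]
print(maps[0].shape)  # (24, 24): one score per position of the 5x5 template
```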

6 A Possibly Helpful Slide on Convolution from the Web

7 Back to Deep ANNs - Max Pooling
Researchers have empirically found it helpful to 'clean up' the convolution scores by creating the next layer of HUs, where each HU holds the MAX score in an N × N array, for various values of N and across various locations.
This is called MAX POOLING (example on next slide).
Advanced note: only BP through the max-pooling node with the max value (ie, assign all error from the pooled nodes to the one with the max value).

8 Back to Deep ANNs - Max Pooling Example (connections not shown)
Hidden Layer i (a 4x4 array of HU outputs):
  -4   5   4   6
  -3   2   7   8
  -5   9   3   1
   5   6   8   9
Possible nodes in Hidden Layer i+1:
- 9 (4x4 max)
- 2x2 max, non-overlapping
- 2x2 max, overlapping (contains non-overlapping, so no need for both)
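A small NumPy sketch of the example above: 4x4 max, non-overlapping 2x2 max, and overlapping 2x2 max over the Hidden-Layer-i array. The pooled values shown in the comments are simply what these computations produce, not values copied from the slide.

```python
import numpy as np

layer_i = np.array([[-4,  5,  4,  6],
                    [-3,  2,  7,  8],
                    [-5,  9,  3,  1],
                    [ 5,  6,  8,  9]])

def max_pool(x, size, stride):
    """Max over size x size windows, moved by `stride` (stride == size gives
    non-overlapping pooling; stride == 1 gives overlapping pooling)."""
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = x[r*stride:r*stride+size, c*stride:c*stride+size].max()
    return out

print(max_pool(layer_i, size=4, stride=4))  # [[9.]] -- the single 4x4 max
print(max_pool(layer_i, size=2, stride=2))  # non-overlapping: [[5. 8.] [9. 9.]]
print(max_pool(layer_i, size=2, stride=1))  # overlapping: a 3x3 array of local maxima
```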

9 Back to Deep ANNs - Drop Out (from Hinton’s Group)
Each time one example is processed (forward + back prop) during TRAINING:
- Randomly turn off (ie, 'drop out') a fraction p (say, p = ½) of the input and hidden units
During TUNING & TESTING:
- Scale all weights by (1 - p), since that is the fraction of time each unit was present during training (ie, so on average, weighted sums are the same)
Adds ROBUSTNESS: the network needs to learn multiple ways to compute the function being learned
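A bare-bones sketch of the drop-out recipe just described, assuming p = ½ and a single layer of weights; it only illustrates the random masking during training and the (1 - p) weight scaling used at tuning/test time.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5  # fraction of input/hidden units dropped during training

def forward_train(x, W):
    """Training-time forward pass: randomly zero out a fraction p of the inputs."""
    keep_mask = rng.random(x.shape) >= p  # 1 with probability (1 - p)
    return (x * keep_mask) @ W            # dropped units contribute nothing

def forward_test(x, W):
    """Tuning/test-time forward pass: no dropping, but scale weights by (1 - p)
    so the expected weighted sums match what was seen during training."""
    return x @ (W * (1.0 - p))

x = rng.random(10)                 # one example with 10 input units
W = rng.standard_normal((10, 3))   # weights into 3 hidden units
print(forward_train(x, W))         # changes from call to call (different random masks)
print(forward_test(x, W))          # deterministic
```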

10 Back to Deep ANNs - Drop Out as an Ensemble
Drop Out can be viewed as training an ensemble of 'thinned' ANNs
- ie, consider all possible ANNs that one can construct by 'thinning' the non-output nodes in the original ANN
- in each Drop Out step we are training ONE of these (but note that ALL are affected, since wgts are shared)
We implicitly store O(2^N) networks in one, where N = # of non-output nodes

11 cs540 - Fall 2016 (Shavlik©), Lecture 20, Week 11
Auto Association
Previously, the early/lower HUs in Deep ANNs were trained by auto association (but with lots of data, BP suffices)
- Train the ANN to predict its input (so unsupervised ML)
- The HUs might learn a good, new rep of the inputs
- Use early stopping to reduce overfitting
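A tiny auto-association (autoencoder) sketch in NumPy, under common assumptions not stated on the slide: one sigmoid hidden layer, squared reconstruction error, plain gradient descent, arbitrary layer sizes and learning rate, and early stopping omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 8))          # 100 unlabeled examples, 8 inputs each
n_hidden, lr = 3, 0.5             # squeeze 8 inputs through 3 HUs

W1 = rng.standard_normal((8, n_hidden)) * 0.1   # input  -> hidden
W2 = rng.standard_normal((n_hidden, 8)) * 0.1   # hidden -> reconstructed input

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(2000):
    H = sigmoid(X @ W1)           # hidden rep (the 'good, new rep' of the inputs)
    X_hat = H @ W2                # the network's prediction of its own input
    err = X_hat - X
    # Backprop of squared reconstruction error through both weight layers
    W2 -= lr * H.T @ err / len(X)
    W1 -= lr * X.T @ ((err @ W2.T) * H * (1 - H)) / len(X)

print(np.mean((sigmoid(X @ W1) @ W2 - X) ** 2))  # reconstruction error after training
```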

12 Warning: At the Research Frontier
Research on Deep ANNs is changing rapidly, with a lot of IT-industry money dedicated to it.
Until recently, people used unsupervised ML to train all the HU layers except the final one (surprisingly, BP works through many levels when there is much data!)
So this 'slide deck' is likely to be out of date soon, if not already.

13 cs540 - Fall 2016 (Shavlik©), Lecture 20, Week 11
Neural Network Wrapup
- ANNs compute weighted sums to make decisions
- Use (stochastic) gradient descent to adjust weights in order to reduce error (or cost)
- Only find local minima, though (but good enough!)
- Impressive testset accuracy, especially Deep ANNs on (mainly) vision tasks and natural language tasks
- Slow training (GPUs, parallelism, advanced optimization methods, etc help)
- Learned models hard to interpret
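To make the 'weighted sums + gradient descent' summary concrete, here is a generic sketch of one stochastic gradient step for a single linear unit with squared error; it is not any specific network from the lecture.

```python
import numpy as np

def sgd_step(w, x, y, lr=0.1):
    """One SGD step for a linear unit: prediction is the weighted sum w.x,
    error is 0.5*(y_hat - y)^2, and the update is w <- w - lr * dError/dw."""
    y_hat = w @ x                 # weighted sum
    grad = (y_hat - y) * x        # gradient of the squared error w.r.t. w
    return w - lr * grad

w = np.zeros(3)
w = sgd_step(w, x=np.array([1.0, 2.0, 0.5]), y=1.0)
print(w)
```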

14 cs540 - Fall 2016 (Shavlik©), Lecture 20, Week 11
ANN Wrapup (cont.)
Some Ideas
- Use a window to deal with variable-length sequences (NetTalk)
- Preprocess raw data to improve learning (Uberbacher & Mural, 1991; Craven & Shavlik, 1993)
  Or do all the layers of DEEP ANNs do this for us? Might be the case
- Use prior knowledge to structure the network - later (Towell & Shavlik, 1994; Sec 12.3 and 12.4 of Mitchell)
- Extract rules to understand what was learned (Trepan)
- Use early stopping (or wgt decay) to avoid overfitting
- Can interpret output as a probability distribution (with some tweaks to standard algo)
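One common 'tweak' for reading the output units as a probability distribution is a softmax output layer (usually paired with cross-entropy error); a minimal sketch follows, though the slide does not say this is the specific tweak intended.

```python
import numpy as np

def softmax(scores):
    """Turn raw output-unit scores into a probability distribution:
    non-negative values that sum to 1."""
    shifted = scores - np.max(scores)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([2.0, 1.0, -1.0])))  # roughly [0.71, 0.26, 0.04]; sums to 1
```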

