Neural Networks: An Introduction and Overview 1/13/2018 Jim Ries NLM Predoctoral Fellow JimR@acm.org 6/13/2000
Introduction
- Provide an intuitive feel for what NNs are and the problems for which they are an appropriate tool.
- NOT overwhelm you with mathematics.
- Caveat: I’m not an NN researcher, just an interested outsider (like most of you).
Topics of Discussion
- What are Neural Networks?
- Training
- History
- Alternative Methods
- Applications
- Conclusions
- Questions
What are Neural Nets?
- A mechanism for approximating a function, given some sample or “training” data.
- A mechanism for classifying, clustering, or recognizing patterns in data.
- These two broad applications are essentially the same (e.g., imagine a function that outputs a discrete number indicating a cluster).
What are Neural Nets? (cont.)
- Rosenblatt’s Perceptron: a network of processing elements (PEs).
[Figure: network diagram with inputs x1, x2, x3, ..., xn; processing elements a1, ..., am; outputs Y1, ..., Yp]
- Notice that the weights determine how much each x affects each a.
- The weights are updated via a learning rule each time an input is presented.
- Notice that networks need NOT be fully connected as shown here.
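To make the role of the weights concrete, here is a minimal sketch of one layer of processing elements. The threshold ("step") activation, the bias terms, and the specific weight values are assumptions chosen for illustration; the slide itself only says that weights determine how much each x affects each a. A weight of zero stands in for a missing connection.

```python
def step(z):
    """Threshold activation (an assumption): fire when the weighted sum is positive."""
    return 1 if z > 0 else 0


def layer_output(weights, biases, x):
    """weights[j][i] is the weight on the link from input x[i] to unit a[j]."""
    return [
        step(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical weights for three inputs and two units.
# a1 computes AND(x1, x2) and ignores x3; a2 computes OR(x2, x3) and ignores x1.
weights = [[1.0, 1.0, 0.0],
           [0.0, 1.0, 1.0]]
biases = [-1.5, -0.5]

for x in ([1, 1, 0], [1, 0, 0], [0, 0, 1]):
    print(x, "->", layer_output(weights, biases, x))
```

Changing a single weight changes how strongly one input influences one unit, which is exactly what a learning rule adjusts.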
What are Neural Nets? (cont.)
- Additional layer(s) can be added.
[Figure: network diagram with inputs x1, x2, x3, ..., xn; hidden units h1, ..., hm; processing elements a1, ..., am; outputs Y1, ..., Yp]
- We can add an arbitrary number of hidden layers.
- Additional hidden layers tend to increase the ability of the network to learn complex functions, but also increase the learning time required.
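A small sketch of why a hidden layer helps: XOR cannot be computed by a single threshold unit, but two hidden units plus one output unit can do it. The step activation, biases, and hand-picked weights below are all illustrative assumptions, not values from the slides.

```python
def step(z):
    """Threshold activation (an assumption): fire when the weighted sum is positive."""
    return 1 if z > 0 else 0


def layer(weights, biases, inputs):
    """Compute the outputs of one layer; weights[j][i] links input i to unit j."""
    return [
        step(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical hand-picked weights.
hidden_w = [[1.0, 1.0], [1.0, 1.0]]
hidden_b = [-0.5, -1.5]          # h1 acts as OR, h2 acts as AND
output_w = [[1.0, -1.0]]
output_b = [-0.5]                # fires when OR is true but AND is not


def network(x):
    """Two inputs -> two hidden units -> one output."""
    h = layer(hidden_w, hidden_b, x)
    return layer(output_w, output_b, h)[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", network(x))
```

Here the weights are set by hand; finding them automatically is the training problem discussed next.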
What are Neural Nets? (cont.)
- A “node” (PE) is typically represented as a function.
- Simple functions can quickly be “trained” or updated to fit a curve to data, but are unable to fit complex data well (e.g., linear functions can never approximate quadratics).
- With a suitable node function (typically a Radial Basis Function), the network becomes a Universal Approximator!
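The contrast between a simple node function and a radial basis function can be sketched on five samples of y = x^2. Everything specific here is an assumption made for illustration: the sample points, the Gaussian form of the basis function, its width, and the choice to place one basis function on each sample and solve for the output weights directly rather than train them iteratively.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def rbf(x, center, width=1.0):
    """Gaussian radial basis function (the width is an assumed value)."""
    return math.exp(-((x - center) ** 2) / (2 * width ** 2))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

# Best straight line through the samples.
slope, intercept = fit_line(xs, ys)
line_err = max(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

# One Gaussian unit per sample; output weights chosen to reproduce the samples.
rbf_weights = solve([[rbf(x, c) for c in xs] for x in xs], ys)


def rbf_net(x):
    return sum(w * rbf(x, c) for w, c in zip(rbf_weights, xs))


rbf_err = max(abs(rbf_net(x) - y) for x, y in zip(xs, ys))
print("worst error, straight line:", line_err)
print("worst error, RBF units:    ", rbf_err)
```

Note that this only compares the fit at the sample points; matching the samples exactly says nothing by itself about behavior between or beyond them.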
Training
- With the simple Perceptron model, we can train by adjusting the weights on inputs when the output does not match the desired output in the training data.
- The size of the adjustment we make at each training iteration is scaled by the “learning rate”.
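One common way to adjust the weights is the classic perceptron learning rule, sketched below on the logical AND function. The slide does not spell out the update formula, so the rule used here (add learning_rate * error * input to each weight), the bias term, the zero starting weights, and the AND data set are assumptions for illustration.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 when the weighted sum is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, learning_rate=0.5, max_epochs=100):
    """Adjust weights only when the output does not match the target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:      # every sample is classified correctly
            break
    return weights, bias


# Logical AND: linearly separable, so a single unit can learn it.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
for x, target in and_samples:
    print(x, "target", target, "output", predict(weights, bias, x))
```

A larger learning rate makes each correction bigger; a smaller one makes training more gradual.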
Training (cont.)
- With one or more hidden layers, training requires some sort of “propagation algorithm”.
- Backpropagation is commonly used and is an extension to the “Minimum Disturbance Algorithm”:
Training (cont.)
Minimum Disturbance Algorithm
1) Apply an example, propagate inputs to output
2) Count # of incorrect output units
3) For output units, do a number of times:
   - Select unselected units closest to zero activation
   - Change weights
   - If fewer errors, use new weights, else old
4) Repeat step #3 for all layers
See also handout on backpropagation algorithm
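For comparison with the steps above, here is a rough sketch of backpropagation (not the Minimum Disturbance Algorithm) on a network with one hidden layer. The sigmoid activation, squared-error loss, network size, learning rate, epoch count, and XOR data are all assumptions chosen to keep the example small; the handout may present the algorithm differently.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_net(n_inputs, n_hidden, rng):
    """Small random weights; one hidden layer and one output unit (assumed layout)."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }


def forward(net, x):
    """Propagate inputs forward; return hidden activations and the output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"])
    return h, y


def sample_loss(net, x, target):
    """Squared error on one example."""
    return 0.5 * (forward(net, x)[1] - target) ** 2


def gradients(net, x, target):
    """Propagate the output error backward to get a gradient for every weight."""
    h, y = forward(net, x)
    delta_out = (y - target) * y * (1 - y)
    delta_hid = [delta_out * w * hj * (1 - hj) for w, hj in zip(net["W2"], h)]
    return {
        "W1": [[d * xi for xi in x] for d in delta_hid],
        "b1": delta_hid,
        "W2": [delta_out * hj for hj in h],
        "b2": delta_out,
    }


def train(net, data, learning_rate=0.5, epochs=2000):
    """Update the weights after each example (online gradient descent)."""
    for _ in range(epochs):
        for x, target in data:
            g = gradients(net, x, target)
            for j in range(len(net["W2"])):
                for i in range(len(x)):
                    net["W1"][j][i] -= learning_rate * g["W1"][j][i]
                net["b1"][j] -= learning_rate * g["b1"][j]
                net["W2"][j] -= learning_rate * g["W2"][j]
            net["b2"] -= learning_rate * g["b2"]


xor_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
net = init_net(2, 3, random.Random(0))
loss_before = sum(sample_loss(net, x, t) for x, t in xor_data)
train(net, xor_data)
loss_after = sum(sample_loss(net, x, t) for x, t in xor_data)
print("total loss before training:", loss_before)
print("total loss after training: ", loss_after)
```

The loss typically falls during training, though a network this small can stall in a poor solution depending on its starting weights.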
Training (cont.)
- Overfitting: the network fits a function to the training data, but that function does not approximate the real world.
- Ways to avoid overfitting:
  - Regularization (assumes the real function is “smooth”)
  - Early stopping
  - Curvature-driven
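Early stopping can be sketched as follows: track the error on held-out validation data after each training epoch, and stop once it has not improved for a while. The function name, the `patience` parameter, and the validation-loss numbers are hypothetical, made up purely to illustrate the idea.

```python
def early_stopping_epoch(validation_losses, patience=3):
    """Return (best_epoch, best_loss), stopping once `patience` epochs pass
    without an improvement in validation loss."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation loss has stopped improving
    return best_epoch, best_loss


# Made-up validation losses: they fall at first, then rise as the
# network starts fitting noise in the training data.
validation_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.61]
best_epoch, best_loss = early_stopping_epoch(validation_losses)
print("keep the weights from epoch", best_epoch, "with validation loss", best_loss)
```

In practice the weights saved at the best epoch are the ones kept, rather than the weights from the final epoch.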
History
- Early 1960s: Rosenblatt’s Perceptron (Rosenblatt, F., Principles of Neurodynamics, New York: Spartan Books, 1962).
- Late 1960s: Minsky (Minsky, M. and Papert, S., Perceptrons, MIT Press, Cambridge, 1969).
- 1970s and early 1980s: largely empty of NN activity, following Minsky and Papert’s analysis of the Perceptron’s limitations.
History (cont.)
- Late 1980s: NNs re-emerge with Rumelhart and McClelland (Rumelhart, D., McClelland, J., Parallel Distributed Processing, MIT Press, Cambridge, 1988).
- Since PDP there has been an explosion of NN literature.
Alternative Methods
- Classical statistical methods
  - Fail in on-line scenarios
  - Not universal approximators (e.g., linear regression)
  - Assume normal distribution
- Symbolic approach
  - Expert Systems
  - Mathematical Logic (e.g., Prolog)
  - Schemas, Frames, or Scripts
Alternative Methods (cont.)
- NNs are the “Connectionist” approach.
- Encoding of data can be a “creative” endeavor
- Ensemble Approach
- Bayesian Networks
- Fuzzy NN
Applications
- Control
- Forecasting
- Provide faster approximations compared to exact algorithms (e.g., NeuroBlast).
- Compression
- Cognitive Modeling
Conclusions
- NNs are useful for a wide variety of tasks, but care must be taken to choose the correct algorithms for a given problem domain.
- NNs are not a panacea, and other approaches may be appropriate for given problems.
Questions?