Chapter 4: Artificial Neural Networks


Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. The BACKPROPAGATION algorithm uses gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training examples, and has been applied to interpreting visual scenes, speech recognition, and learning robot control strategies.

Biological motivation: similarities to biological neurons include parallel computation and distributed representation; a notable difference is the output of each processing unit (neuron), since an artificial unit outputs a single constant value rather than a complex time series of spikes.

ALVINN system: a network that learns to steer an autonomous vehicle from camera images of the road.

Problems appropriate for neural network learning: the phenomenon to be learned is described by many attributes; the target output can be whatever type of value the problem requires; the training examples may contain errors (noise); long training times are acceptable; fast evaluation of the learned function is needed once training is done; human readability of the learned result is not required.

Perceptrons: take a vector of real-valued inputs and compute a weighted sum compared against a threshold; learning consists of choosing values for the weights.

Hypothesis space of perceptron learning: the set of all possible real-valued weight vectors (of dimension n + 1, where n is the dimension of the input vector and the extra weight is the threshold weight).

Representational power of perceptrons: a hyperplane decision surface, correct only for linearly separable examples; many boolean functions (except XOR); m-of-n functions; representing arbitrary disjunctive normal form requires multiple units.

Perceptron rule: to find correct weights after a finite number of updates, two conditions must be met: the training examples are linearly separable, and the learning rate is sufficiently small. A sketch of the rule in code follows below.
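
A minimal Python sketch of the perceptron training rule (the function name and NumPy usage are illustrative, not from the original slides). The update w_i <- w_i + eta * (t - o) * x_i is applied whenever the thresholded output o disagrees with the target t:

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    X: (m, n) real-valued inputs; t: (m,) targets in {+1, -1}.
    Converges in a finite number of updates only if the data are
    linearly separable and eta is sufficiently small.
    """
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend x_0 = 1 for the threshold weight w_0
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(X, t):
            o = 1 if np.dot(w, x) > 0 else -1  # thresholded output sgn(w . x)
            if o != target:
                w += eta * (target - o) * x    # update only on misclassified examples
                mistakes += 1
        if mistakes == 0:                      # every example classified correctly
            break
    return w
```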

Gradient descent and the delta rule: used when the training examples are not linearly separable; the rule trains an unthresholded linear unit, whose output o_d is a differentiable function of the weight vector w.
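
The error being minimized is the standard sum of squared errors over the training set D, as in Mitchell's Chapter 4:

E(\vec{w}) \equiv \tfrac{1}{2} \sum_{d \in D} (t_d - o_d)^2

where t_d is the target output and o_d = \vec{w} \cdot \vec{x}_d is the linear unit's output for training example d.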

Hypothesis space

Gradient descent: the gradient of E points in the direction of steepest increase in E, so the weights are moved in the direction of the negative gradient.
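
Written out, the gradient and the resulting update for the linear unit are:

\nabla E(\vec{w}) \equiv \Big[ \tfrac{\partial E}{\partial w_0}, \tfrac{\partial E}{\partial w_1}, \dots, \tfrac{\partial E}{\partial w_n} \Big], \qquad \Delta \vec{w} = -\eta\, \nabla E(\vec{w}), \qquad \tfrac{\partial E}{\partial w_i} = \sum_{d \in D} (t_d - o_d)(-x_{id})

so each weight is updated as \Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id}.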

Gradient descent (cont'd): it converges to the single global minimum of E regardless of whether the training examples are linearly separable. If the learning rate is too large the search can overstep the minimum, so the learning rate is sometimes reduced gradually as training proceeds.
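
A minimal batch gradient-descent sketch for the unthresholded linear unit (illustrative names, NumPy assumed), computing the full-gradient update once per epoch:

```python
import numpy as np

def gradient_descent_train(X, t, eta=0.05, epochs=1000):
    """Batch gradient descent for a linear unit o = w . x,
    minimizing E(w) = 1/2 * sum_d (t_d - o_d)^2."""
    X = np.hstack([np.ones((len(X), 1)), X])  # x_0 = 1 carries the threshold weight w_0
    w = np.zeros(X.shape[1])
    t = np.asarray(t, dtype=float)
    for _ in range(epochs):
        o = X @ w                             # outputs for every training example
        grad = -(X.T @ (t - o))               # dE/dw_i = -sum_d (t_d - o_d) * x_id
        w -= eta * grad                       # step opposite the gradient direction
    return w
```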

Stochastic approximation to gradient descent. Gradient descent applies whenever the hypothesis space is continuously parameterized and the error is differentiable with respect to the hypothesis parameters. Its drawbacks: convergence can take a long time, and when multiple local minima exist the search may not find the global minimum.

Stochastic approximation to gradient descent (cont'd): compute E for a single training example and update the weights immediately. This approximates the true gradient descent, uses a smaller learning rate, and can sometimes avoid getting trapped in one of multiple local minima. The resulting per-example update is the delta rule.
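
The incremental (stochastic) variant applies the delta rule \Delta w_i = \eta (t - o)\, x_i after every example; a sketch under the same illustrative conventions as the batch version above:

```python
import numpy as np

def delta_rule_train(X, t, eta=0.01, epochs=100):
    """Stochastic approximation to gradient descent (delta / LMS rule):
    weights are updated immediately after each training example."""
    X = np.hstack([np.ones((len(X), 1)), X])  # x_0 = 1 for the threshold weight
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = np.dot(w, x)               # unthresholded linear output
            w += eta * (target - o) * x    # immediate per-example update
    return w
```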

Remarks. Perceptron rule: uses the thresholded output, finds exactly correct weights, and requires linearly separable data. Delta rule: uses the unthresholded output, converges only asymptotically toward the minimum-error weights, and works even when the data are not linearly separable.

Multilayer networks: can represent nonlinear decision surfaces.

Differentiable threshold unit: the sigmoid unit, whose squashing function is nonlinear and differentiable.
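
The sigmoid unit computes a smooth approximation to a threshold:

o = \sigma(\vec{w} \cdot \vec{x}), \qquad \sigma(y) = \frac{1}{1 + e^{-y}}, \qquad \frac{d\sigma(y)}{dy} = \sigma(y)\,\big(1 - \sigma(y)\big)

The simple form of the derivative is what makes the backpropagation weight updates on the following slides easy to compute.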

[Network diagram: input units i feed hidden units j (h), which feed output units k; each unit j forms the weighted sum net_j of its inputs x_ji through weights w_ji and outputs o_j = sigma(net_j).]

BACKPROPAGATION algorithm: the error E is redefined to sum over all of the network's output units.
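
Following Mitchell's Chapter 4, the redefined error sums squared errors over every output unit k and every training example d:

E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2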

BACKPROPAGATION algorithm (cont'd): the error surface may contain multiple local minima. Termination conditions: a fixed number of iterations, the error falling below a threshold, or the error on a separate validation set.
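
A compact sketch of stochastic-gradient BACKPROPAGATION for a network with one hidden layer of sigmoid units (function and variable names are illustrative, not from the slides; NumPy assumed). It uses the error terms and weight updates stated on the slides that follow:

```python
import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def backprop_train(X, T, n_hidden=3, eta=0.05, epochs=1000):
    """Stochastic-gradient BACKPROPAGATION for a two-layer sigmoid network.
    X: (m, n_in) inputs; T: (m, n_out) target outputs in (0, 1)."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    # small random initial weights near zero, with one extra column for the bias
    W_h = rng.uniform(-0.05, 0.05, (n_hidden, n_in + 1))
    W_o = rng.uniform(-0.05, 0.05, (n_out, n_hidden + 1))
    for _ in range(epochs):                              # termination: fixed number of epochs
        for x, t in zip(X, T):
            x = np.append(1.0, x)                        # x_0 = 1 for the bias weight
            h = sigmoid(W_h @ x)                         # hidden-unit outputs
            h_b = np.append(1.0, h)
            o = sigmoid(W_o @ h_b)                       # network outputs
            delta_o = o * (1 - o) * (t - o)              # output-unit error terms
            delta_h = h * (1 - h) * (W_o[:, 1:].T @ delta_o)  # hidden-unit error terms
            W_o += eta * np.outer(delta_o, h_b)          # Delta w_ji = eta * delta_j * x_ji
            W_h += eta * np.outer(delta_h, x)
    return W_h, W_o
```

Termination here is a fixed number of epochs; an error threshold or a validation-set check can be substituted, as the slide notes.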

BACKPROPAGATION algorithm (cont'd). Adding momentum: each weight update is influenced by the weight update from the previous iteration of the loop. The algorithm also generalizes to learning in arbitrary acyclic networks, where a unit's error term sums over the units in downstream(r).
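
With momentum, the update at iteration n keeps a fraction 0 <= \alpha < 1 of the update from iteration n - 1:

\Delta w_{ji}(n) = \eta\, \delta_j\, x_{ji} + \alpha\, \Delta w_{ji}(n-1)

which helps the search roll through small local minima and flat regions of the error surface.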

BACKPROPAGATION rule

BACKPROPAGATION rule (cont'd): training rule for output units.
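
For an output unit k, the error term and weight update are:

\delta_k = o_k\,(1 - o_k)\,(t_k - o_k), \qquad \Delta w_{kj} = \eta\, \delta_k\, x_{kj}

where x_{kj} is the j-th input to unit k and o_k(1 - o_k) is the derivative of the sigmoid.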

[Network diagram repeated: the same three-layer network of inputs x, weights w, weighted sums net, and sigmoid outputs o, used to illustrate the backpropagation rule.]

BACKPROPAGATION rule (cont'd): training rule for hidden units.
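
For a hidden unit h, the error term propagates the output-unit error terms back through the weights w_{kh} that connect h to each output unit k:

\delta_h = o_h\,(1 - o_h) \sum_{k \in outputs} w_{kh}\, \delta_k, \qquad \Delta w_{hi} = \eta\, \delta_h\, x_{hi}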

Convergence and local minima: gradient descent over the network's error surface guarantees convergence only to a local minimum, but in practice this problem is not severe and the algorithm is highly effective. The more weights the network has, the less troublesome local minima tend to be, and the weights are initialized to values near zero. Remedies: momentum, stochastic gradient descent, and training multiple networks from different initial weights.

Representational power of feedforward networks: every boolean function can be represented by a network with two layers of units (e.g. in disjunctive normal form, with one hidden unit per input vector); every bounded continuous function can be approximated with two layers; arbitrary functions can be approximated with three layers, as a linear combination of many small localized functions.

Hypothesis space search: the hypothesis space is continuous, which makes gradient-based search more useful than searching a space of discrete hypotheses. Inductive bias: difficult to characterize precisely; roughly, smooth interpolation between data points.

Hidden layer representations: the network can invent intermediate features of its own, which is more flexible than using only features specified in advance by a human designer and is useful for discovering regularities that were not known beforehand.

Generalization, overfitting, and stopping criterion. Terminating when the training error drops below a threshold is risky, because generalization accuracy must also be considered. Techniques: weight decay, a separate validation data set, and cross-validation approaches such as k-fold cross-validation. A sketch of validation-based early stopping follows below.
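
A generic early-stopping sketch using a separate validation set; `fit_one_epoch` and `error` are hypothetical callables supplied by the caller (e.g. one backpropagation pass and mean squared error), not functions defined in the original slides:

```python
def train_with_early_stopping(fit_one_epoch, error, weights, train_set, val_set,
                              max_epochs=5000, patience=50):
    """Keep the weights that minimize error on a held-out validation set,
    instead of terminating on a training-error threshold (which risks overfitting)."""
    best_err, best_weights, since_best = float("inf"), weights, 0
    for _ in range(max_epochs):
        weights = fit_one_epoch(weights, train_set)  # one pass of weight tuning
        err = error(weights, val_set)                # estimate of generalization error
        if err < best_err:
            best_err, best_weights, since_best = err, weights, 0
        else:
            since_best += 1
            if since_best >= patience:               # no improvement for `patience` epochs
                break
    return best_weights
```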

Face recognition

Face recognition (cont'd): the input image is reduced from 120x128 to 30x32 pixels to lower the computational cost, using the mean value of each region (cf. ALVINN). A 1-of-n output encoding is used: it provides many weights and helps resolve ambiguity, with target vectors such as <0.9, 0.1, 0.1, 0.1>. A network with 2 layers and 3 hidden units reaches about 90% success, and the learned hidden units form interesting representations.
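
A small illustrative helper for the 1-of-n output encoding with 0.9/0.1 targets (the function name is hypothetical; 0.9/0.1 is used because sigmoid outputs can never actually reach 0 or 1):

```python
import numpy as np

def one_of_n_targets(labels, n_classes, on=0.9, off=0.1):
    """Encode integer class labels as 1-of-n target vectors,
    e.g. class 0 of 4 becomes [0.9, 0.1, 0.1, 0.1]."""
    T = np.full((len(labels), n_classes), off)
    T[np.arange(len(labels)), labels] = on
    return T
```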

Alternative error functions: used to add new constraints to the weight-tuning rule. Examples: a penalty term on weight magnitude, which reduces the risk of overfitting; terms involving the derivative of the target function; minimizing cross-entropy when the network is to output probabilities; weight sharing, as used in speech recognition.
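
For example, weight decay adds a penalty proportional to the sum of squared weights, biasing the search toward smaller weights and hence smoother decision surfaces:

E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2 + \gamma \sum_{i,j} w_{ji}^2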

Alternative error-minimization procedures. Line search: the search direction is the same as in backpropagation, but the distance moved is chosen to minimize the error function along that line, and may therefore be very large or very small. Conjugate gradient: each new search direction is chosen so that the component of the error gradient that was just made zero remains zero.

Recurrent networks

Dynamically modifying network structure: the goal is to improve generalization accuracy and training efficiency. Growing the network, starting without hidden units, as in CASCADE-CORRELATION: shortens training time, but raises the overfitting issue. Pruning the network, as in "optimal brain damage": also shortens training time.