
1 Research on Advanced Training Algorithms of Neural Networks. Hao Yu, Ph.D. Defense, August 17, 2011. Supervisor: Bogdan Wilamowski. Committee Members: Hulya Kirkici, Vishwani D. Agrawal, Vitaly Vodyanoy. University Reader: Weikuan Yu

2 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

3 What is a Neural Network? Classification: separate the two groups (red circles and blue stars) of twisted points [1].

4 What is a Neural Network? Interpolation: given the 25 points (red), find the values at points A and B (black).

5 What is a Neural Network? Figure: human solutions vs. neural network solutions.

6 What is a Neural Network? Recognition: restore the noisy digit images (left) to the original images (right).

7 What is a Neural Network? "Learn to behave": build any relationship between inputs and outputs [2]. Figure: learning process followed by the "behave" phase.

8 Why Neural Networks: what makes neural networks different. Given patterns: 5×5 = 25; testing patterns: 41×41 = 1,681.

9 Different Approximators. Test results of different approximators: Mamdani fuzzy, TSK fuzzy, neuro-fuzzy, SVM-RBF, SVM-polynomial, nearest, linear, spline, and cubic interpolation (Matlab function: interp2), and a neural network.

10 Comparison: neural networks potentially behave as the best approximators. Sum of squared errors for different methods of computational intelligence:
  Fuzzy inference system – Mamdani: 319.7334
  Fuzzy inference system – TSK: 35.1627
  Neuro-fuzzy system: 27.3356
  Support vector machine – RBF kernel: 28.9595
  Support vector machine – polynomial kernel: 176.1520
  Interpolation – nearest: 197.7494
  Interpolation – linear: 28.6683
  Interpolation – spline: 11.0874
  Interpolation – cubic: 3.2791
  Neural network – 4 neurons in FCC network: 2.3628
  Neural network – 5 neurons in FCC network: 0.4648

11 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

12 A Single Neuron: two basic computations: (1) the weighted sum of the inputs, net = Σ_i w_i x_i + w_0, and (2) the nonlinear activation, out = f(net).
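A minimal sketch of these two computations in Python (the tanh activation and the function name are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def single_neuron(weights, bias, x):
    """Two basic computations of a single neuron:
    (1) net: weighted sum of the inputs plus the bias weight
    (2) out: nonlinear activation applied to net."""
    net = np.dot(weights, x) + bias   # computation (1)
    out = np.tanh(net)                # computation (2)
    return out
```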

13 Network Architectures. The multilayer perceptron (MLP) network is the most popular architecture. Networks with connections across layers, such as bridged multilayer perceptron (BMLP) networks and fully connected cascade (FCC) networks, are much more powerful than MLP networks. B. M. Wilamowski, D. Hunter and A. Malinowski, "Solving parity-N problems with feedforward neural networks," Proc. 2003 IEEE IJCNN, pp. 2546-2551, IEEE Press, 2003. M. E. Hohil, D. Liu, and S. H. Smith, "Solving the N-bit parity problem using neural networks," Neural Networks, vol. 12, pp. 1321-1323, 1999. Example: smallest networks for solving the parity-7 problem (analytical results); figure: MLP, FCC, and BMLP networks.

14 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

15 Error Back Propagation (EBP) Algorithm. The most popular algorithm for neural network training. Update rule of the EBP algorithm [3]: w(k+1) = w(k) − α·g, where g is the gradient of the error with respect to the weights and α is the learning constant. Developed from gradient-descent optimization. Advantages: easy to implement; stable. Disadvantages: very limited search power; slow convergence.

16 Improvements of EBP: improved gradient direction using momentum [4]; adaptive (adjusted) learning constants [5-6].
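As a hedged illustration of the momentum idea in [4] (a minimal sketch; the variable names and the 0.9 momentum constant are assumptions, not values from the thesis):

```python
import numpy as np

def ebp_update_with_momentum(w, grad, velocity, alpha=0.1, momentum=0.9):
    """One EBP weight update with momentum: the previous update direction
    (velocity) is blended with the current negative gradient, which damps
    oscillations and can speed up convergence along shallow valleys."""
    velocity = momentum * velocity - alpha * grad
    return w + velocity, velocity
```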

17 Newton Algorithm. The Newton algorithm uses the derivative of the gradient (the Hessian matrix H) to evaluate how the gradient changes and then selects a proper step in each direction [7]: w(k+1) = w(k) − H⁻¹·g. Advantages: fast convergence. Disadvantages: not stable; requires computation of second-order derivatives.

18 Gauss-Newton Algorithm. The Gauss-Newton algorithm eliminates the second-order derivatives of the Newton method by introducing the Jacobian matrix J, approximating the Hessian as JᵀJ and the gradient as JᵀE: w(k+1) = w(k) − (JᵀJ)⁻¹JᵀE. Advantages: fast convergence. Disadvantages: not stable (JᵀJ can be singular or ill-conditioned).

19 Levenberg-Marquardt (LM) Algorithm. The LM algorithm blends the EBP algorithm and the Gauss-Newton algorithm [8-9]: w(k+1) = w(k) − (JᵀJ + μI)⁻¹JᵀE. When the evaluation error increases, μ is increased and the LM algorithm moves toward the EBP algorithm; when the evaluation error decreases, μ is decreased and the LM algorithm moves toward the Gauss-Newton method. Advantages: fast convergence; stable training. Compared with first order algorithms, the LM algorithm has much more powerful search ability, but it also requires more complex computation. A sketch of one LM training step is shown below.
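A minimal sketch of one LM weight update, assuming a hypothetical helper jacobian_and_errors(w) that returns the Jacobian of the error vector and the error vector itself for the current weights (the helper and all names are illustrative, not the NBN implementation):

```python
import numpy as np

def lm_step(w, jacobian_and_errors, mu):
    """One Levenberg-Marquardt update: delta_w = -(J^T J + mu*I)^-1 J^T e."""
    J, e = jacobian_and_errors(w)          # J: (P*M) x N Jacobian of e, e: (P*M,) errors
    A = J.T @ J + mu * np.eye(len(w))      # damped quasi-Hessian
    g = J.T @ e                            # gradient vector
    return w - np.linalg.solve(A, g)       # updated weight vector
```

In a full trainer the new weights are accepted only if the total error decreases; otherwise μ is increased (for example by a factor of 10) and the step is recomputed, which is exactly the switching behavior described above.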

20 Comparison of Different Algorithms: training XOR patterns using different algorithms.
  EBP (α=0.1 / α=10): success rate 100% / 18%; average iterations 17,845.44 / 179.00; average time 3,413.26 ms / 46.83 ms
  EBP with momentum (α=0.1 / α=10, m=0.5): success rate 100% / 100%; average iterations 18,415.84 / 187.76; average time 4,687.79 ms / 39.27 ms
  EBP with adjusted learning constant: success rate 100%; average iterations 170.23; average time 41.19 ms
  Gauss-Newton algorithm: success rate 6%; average iterations 1.29; average time 2.29 ms
  LM algorithm: success rate 100%; average iterations 5.49; average time 4.35 ms

21 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

22 How to Design Neural Networks. Traditional design: the most popular training algorithm (EBP) combined with the most popular network architecture (MLP). Results: large neural networks, poor generalization ability, and many engineers moving to other methods, such as fuzzy systems.

23 How to Design Neural Networks. B. M. Wilamowski, "Neural Network Architectures and Learning Algorithms: How Not to Be Frustrated with Neural Networks," IEEE Ind. Electron. Mag., vol. 3, no. 4, pp. 56-63, 2009: the over-fitting problem is a mismatch between the number of training patterns and the network size. Recommended design policy: compact networks benefit generalization ability – a powerful training algorithm (LM) plus efficient network architectures (BMLP and FCC networks). Figure: fitting results for networks with 2 to 9 neurons.

24 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

25 Problems in Second Order Algorithms. Matrix inversion: it is in the nature of second order algorithms; the size of the matrix is proportional to the size of the network; as the network grows, second order algorithms may not be as efficient as first order algorithms.

26 Problems in Second Order Algorithms. Architecture limitation: M. T. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. on Neural Networks, vol. 5, no. 6, pp. 989-993, 1994 (2,474 citations) – only developed for training MLP networks, so it is not suitable for designing compact networks. Neuron-by-Neuron (NBN) algorithm: B. M. Wilamowski, N. J. Cotton, O. Kaynak and G. Dundar, "Computing Gradient Vector and Jacobian Matrix in Arbitrarily Connected Neural Networks," IEEE Trans. on Industrial Electronics, vol. 55, no. 10, pp. 3784-3790, Oct. 2008 – uses SPICE-like computation routines, is capable of training arbitrarily connected neural networks, and enables compact network design (NBN algorithm + BMLP/FCC networks), but the computation is very complex.

27 Problems in Second Order Algorithms. Memory limitation: the Jacobian matrix J has (P×M) rows and N columns, where P is the number of training patterns, M is the number of outputs, and N is the number of weights. In practice, the number of training patterns is huge and is encouraged to be as large as possible. MNIST handwritten digit database [10]: 60,000 training patterns, 784 inputs, and 10 outputs. Using the simplest network architecture (1 neuron per output, 7,850 weights), the required memory is nearly 35 GB, far beyond what most Windows compilers can address.
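A back-of-envelope check of that figure, as a sketch assuming double-precision (8-byte) Jacobian elements:

```python
P, M = 60_000, 10                # MNIST training patterns and network outputs
N = M * (784 + 1)                # weights: 10 neurons, each with 784 inputs + 1 bias
jacobian_bytes = P * M * N * 8   # (P*M) rows x N columns, 8 bytes per element
print(jacobian_bytes / 2**30)    # roughly 35 GiB
```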

28 Problems in Second Order Algorithms. Computational duplication: forward computation calculates the errors; backward computation backpropagates them. In second order algorithms, in both the Hagan-Menhaj LM algorithm and the NBN algorithm, the error backpropagation process has to be repeated for each output, which is very complex and inefficient for networks with multiple outputs.

29 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

30 Proposed Second Order Computation – Basic Theory. Matrix algebra [11]: in neural network training, each pattern contributes one row of the Jacobian matrix, and the patterns are independent of each other, so JᵀJ and JᵀE can be accumulated row by row (column-row multiplication) instead of being formed from the full Jacobian (row-column multiplication). See the equations below.

Memory comparison (elements to store):
  Row-column multiplication:  (P × M) × N + N × N + N
  Column-row multiplication:  N × N + N
  Difference:                 (P × M) × N

Computation comparison (number of additions and of multiplications):
  Row-column multiplication:  (P × M) × N × N
  Column-row multiplication:  N × N × (P × M)
(the operation counts are the same; only the memory requirement differs)
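A hedged restatement of the column-row idea in standard notation, with j_pm denoting the 1×N Jacobian row for pattern p and output m, and e_pm the corresponding error:

\[
\mathbf{J}^{T}\mathbf{J} \;=\; \sum_{p=1}^{P}\sum_{m=1}^{M} \mathbf{j}_{pm}^{T}\,\mathbf{j}_{pm},
\qquad
\mathbf{J}^{T}\mathbf{e} \;=\; \sum_{p=1}^{P}\sum_{m=1}^{M} \mathbf{j}_{pm}^{T}\, e_{pm}
\]

so only an N×N matrix and an N-element vector ever need to be kept in memory.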

31 Proposed Second Order Computation – Derivation. Figure: derivation comparing the Hagan-Menhaj LM algorithm (or NBN algorithm) with the improved computation.

32 Proposed Second Order Computation – Pseudo Code. Properties: no need for Jacobian matrix storage; vector operations instead of matrix operations. Main contributions: significant memory reduction; the memory reduction also benefits computation speed; no tradeoff. The memory limitation caused by Jacobian matrix storage in second order algorithms is solved. Again considering the MNIST problem, the memory cost for storing Jacobian elements is reduced from more than 35 gigabytes to nearly 30.7 kilobytes. A sketch of the per-pattern accumulation is given below.
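A minimal sketch of that per-pattern accumulation, assuming a hypothetical helper jacobian_row(w, pattern, m) that returns the 1×N Jacobian row and the error for one pattern and one output (the helper and all names are illustrative, not the thesis pseudo code):

```python
import numpy as np

def accumulate_quasi_hessian(w, patterns, n_outputs, jacobian_row):
    """Build Q = J^T J and g = J^T e one Jacobian row at a time,
    so the full (P*M) x N Jacobian matrix is never stored."""
    N = len(w)
    Q = np.zeros((N, N))                 # quasi-Hessian
    g = np.zeros(N)                      # gradient vector
    for pattern in patterns:
        for m in range(n_outputs):
            j_row, err = jacobian_row(w, pattern, m)
            Q += np.outer(j_row, j_row)  # column-row multiplication
            g += j_row * err
    return Q, g
```

The LM update then becomes w ← w − (Q + μI)⁻¹g, exactly as before, but the required memory is proportional to N×N instead of (P×M)×N.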

33 Proposed Second Order Computation – Experimental Results.

Memory comparison (parity-N problems):
                          N=14           N=16
  Patterns                16,384         65,536
  Structures              15 neurons     17 neurons
  Jacobian matrix size    5,406,720      27,852,800
  Weight vector size      330            425
  Average iterations      99.2           166.4
  Success rate            13%            9%
  Actual memory cost:
    Traditional LM        79.21 MB       385.22 MB
    Improved LM           3.41 MB        4.30 MB

Time comparison (parity-N problems):
                          N=9       N=11      N=13        N=15
  Patterns                512       2,048     8,192       32,768
  Neurons                 10        12        14          16
  Weights                 145       210       287         376
  Average iterations      38.51     59.02     68.08       126.08
  Success rate            58%       37%       24%         12%
  Averaged training time (s):
    Traditional LM        0.78      68.01     1,508.46    43,417.06
    Improved LM           0.33      22.09     173.79      2,797.93

34 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

35 Traditional Computation – Forward Computation. For each training pattern p: calculate the net value for neuron j; calculate the output of neuron j; calculate the slope (derivative of the activation) for neuron j; calculate the value at network output m; calculate the error at output m. A sketch of this forward pass is given below.
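A minimal sketch of this forward pass for a fully connected cascade network with tanh neurons (the data layout, the single output, and the tanh choice are assumptions for illustration):

```python
import numpy as np

def forward_pass(inputs, weights, biases, desired):
    """Forward computation for one training pattern in an FCC network:
    neuron k sees the network inputs plus the outputs of all earlier neurons;
    the last neuron is taken as the single network output."""
    outs, slopes = [], []
    for w, b in zip(weights, biases):
        x = np.concatenate([inputs, outs])   # inputs + outputs of earlier neurons
        net = np.dot(w, x) + b               # net value of the neuron
        out = np.tanh(net)                   # neuron output
        outs.append(out)
        slopes.append(1.0 - out ** 2)        # slope (derivative of tanh at net)
    error = desired - outs[-1]               # error at the network output
    return np.array(outs), np.array(slopes), error
```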

36 Traditional Computation – Backward Computation. For first order algorithms: calculate the delta (backpropagated error) for each neuron [12], then build the gradient vector. For second order algorithms: calculate the delta for each neuron and each output, then build the Jacobian elements (formulas below).
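In standard notation (a hedged restatement; sign conventions depend on how the error is defined), with δ_p,j the backpropagated factor at neuron j for pattern p, δ_m,j the factor backpropagated from output m to neuron j, and x_p,j,i the i-th input of neuron j:

\[
g_{j,i} \;=\; \frac{\partial E}{\partial w_{j,i}} \;=\; \sum_{p} \delta_{p,j}\, x_{p,j,i},
\qquad
J_{(p,m),(j,i)} \;=\; \frac{\partial e_{p,m}}{\partial w_{j,i}} \;=\; \delta_{m,j}\, x_{p,j,i}
\]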

37 Proposed Forward-Only Algorithm. Extend the concept of the backpropagation factor δ. Original definition: δ is backpropagated from output m to neuron j. Our definition: δ is backpropagated from neuron k to neuron j.

38 Proposed Forward-Only Algorithm. Regular table: lower triangular elements (k ≥ j) hold the δ values, so the δ matrix has a triangular shape; diagonal elements are δ_k,k = s_k (the slope of neuron k); upper triangular elements hold the weight connections between neurons. A sketch of building this table is shown below.
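A minimal sketch of how such a δ table could be filled during the forward pass of a fully connected cascade network (indexing and data layout are assumptions for illustration; the recursion follows directly from the chain rule):

```python
import numpy as np

def build_delta_table(slopes, w):
    """Triangular delta array for an FCC network.
    w[k][i] is the weight from the output of neuron i to neuron k (i < k);
    slopes[k] is the activation slope s_k of neuron k.
    delta[k, j] is the signal gain d(out_k)/d(net_j):
        delta[k, k] = s_k
        delta[k, j] = s_k * sum_{i=j..k-1} w[k][i] * delta[i, j]   for j < k
    """
    n = len(slopes)
    delta = np.zeros((n, n))
    for k in range(n):
        delta[k, k] = slopes[k]
        for j in range(k):
            delta[k, j] = slopes[k] * sum(w[k][i] * delta[i, j] for i in range(j, k))
    return delta
```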

39 Proposed Forward-Only Algorithm Train arbitrarily connected neural networks

40 Proposed Forward-Only Algorithm: training networks with multiple outputs. The more outputs a network has, the more efficient the forward-only algorithm becomes. Figure: example networks with 1, 2, 3, and 4 outputs.

41 Proposed Forward-Only Algorithm. Pseudo code of the two different algorithms: in the forward-only computation, the backward computation (bold in the left figure) is replaced by extra computation in the forward process (bold in the right figure). Figures: traditional forward-backward algorithm vs. forward-only algorithm.

42 Proposed Forward-Only Algorithm. Computation cost estimation. Properties of the forward-only algorithm: simplified computation, organized in a regular table with a general formula; easy to adapt for training arbitrarily connected neural networks; improved computation efficiency for networks with multiple outputs. Tradeoff: extra memory is required to store the extended δ array.

Hagan-Menhaj computation (forward part / backward part):
  +/−:  nn×nx + 3nn + no   /  no×nn×ny
  ×/÷:  nn×nx + 4nn        /  no×nn×ny + no×(nn − no)
  exp:  nn                 /  0

Forward-only computation (forward part / backward part):
  +/−:  nn×nx + 3nn + no + nn×ny×nz     /  0
  ×/÷:  nn×nx + 4nn + nn×ny + nn×ny×nz  /  0
  exp:  nn                              /  0

Difference (traditional minus forward-only):
  +/−:  nn×ny×(no − 1)
  ×/÷:  nn×ny×(no − 1) + no×(nn − no) − nn×ny×nz
  exp:  0

Figure: comparison for MLP networks with one hidden layer and 20 inputs.

43 Proposed Forward-Only Algorithm. Experiments: training compact neural networks with good generalization ability.

  Neurons   Success rate (EBP / FO)   Average iterations (EBP / FO)   Average time, s (EBP / FO)
  8         0% / 5%                   failing / 222.5                 failing / 0.33
  9         0% / 25%                  failing / 214.6                 failing / 0.58
  10        0% / 61%                  failing / 183.5                 failing / 0.70
  11        0% / 76%                  failing / 177.2                 failing / 0.93
  12        0% / 90%                  failing / 149.5                 failing / 1.08
  13        35% / 96%                 573,226 / 142.5                 624.88 / 1.35
  14        42% / 99%                 544,734 / 134.5                 651.66 / 1.76
  15        56% / 100%                627,224 / 119.3                 891.90 / 1.85

Figures: 8 neurons, FO: SSE_train = 0.0044, SSE_verify = 0.0080; 8 neurons, EBP: SSE_train = 0.0764, SSE_verify = 0.1271 (under-fitting); 12 neurons, EBP: SSE_train = 0.0018, SSE_verify = 0.4909 (over-fitting).

44 Proposed Forward-Only Algorithm. Experiments: comparison of computation efficiency on three problems – ASCII to images, error correction (8-bit signal), and forward kinematics [13]. Time cost is given in ms/iteration for the forward and backward parts, with the relative total time.

  ASCII to images:            Traditional: 8.24 forward + 1,028.74 backward (100.0%);  Forward-only: 61.13 + 0.00 (5.9%)
  Error correction (8-bit):   Traditional: 40.59 + 468.14 (100.0%);                    Forward-only: 175.72 + 0.00 (34.5%)
  Forward kinematics [13]:    Traditional: 0.307 + 0.771 (100.0%);                     Forward-only: 0.727 + 0.00 (67.4%)

45 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

46 Software. The tool NBN Trainer is developed in Visual C++ and is used for training neural networks: pattern classification and recognition, and function approximation. Available online (currently free): http://www.eng.auburn.edu/~wilambm/nnt/index.htm

47 Parity-2 Problem. Parity-2 (XOR) patterns: (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0.

48 Outline: Why Neural Networks · Network Architectures · Training Algorithms · How to Design Neural Networks · Problems in Second Order Algorithms · Proposed Second Order Computation · Proposed Forward-Only Algorithm · Neural Network Trainer · Conclusion & Recent Research

49 Conclusion. Second order algorithms are more efficient than first order algorithms for training neural networks. The proposed second order computation removes Jacobian matrix storage and multiplication, solving the memory limitation. The proposed forward-only algorithm simplifies the computation process in second order training to a regular table plus a general formula, can handle arbitrarily connected neural networks, and offers a speed benefit for networks with multiple outputs.

50 Recent Research. RBF networks: the ErrCor algorithm, a hierarchical training algorithm in which the network size grows based on the training information, so network sizing is no longer trial-and-error. Applications of neural networks (future work): dynamic controller design, smart grid distribution systems, pattern recognition in EDA software design.

51 References
[1] J. X. Peng, K. Li and G. W. Irwin, "A New Jacobian Matrix for Optimal Learning of Single-Layer Neural Networks," IEEE Trans. on Neural Networks, vol. 19, no. 1, pp. 119-129, Jan. 2008.
[2] K. Hornik, M. Stinchcombe and H. White, "Multilayer Feedforward Networks Are Universal Approximators," Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[3] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
[4] V. V. Phansalkar and P. S. Sastry, "Analysis of the back-propagation algorithm with momentum," IEEE Trans. on Neural Networks, vol. 5, no. 3, pp. 505-506, March 1994.
[5] M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," Proc. International Conference on Neural Networks, San Francisco, CA, 1993, pp. 586-591.
[6] S. E. Fahlman, "Faster-learning variations on back-propagation: An empirical study," in D. S. Touretzky, G. E. Hinton and T. J. Sejnowski, eds., 1988 Connectionist Models Summer School, Morgan Kaufmann, San Mateo, CA, 1988.
[7] M. R. Osborne, "Fisher's method of scoring," International Statistical Review, vol. 86, pp. 271-286, 1992.
[8] K. Levenberg, "A method for the solution of certain problems in least squares," Quarterly of Applied Mathematics, vol. 5, pp. 164-168, 1944.
[9] D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM J. Appl. Math., vol. 11, no. 2, pp. 431-441, June 1963.
[10] L. J. Cao, S. S. Keerthi, C.-J. Ong, J. Q. Zhang, U. Periyathamby, X. J. Fu and H. P. Lee, "Parallel sequential minimal optimization for the training of support vector machines," IEEE Trans. on Neural Networks, vol. 17, no. 4, pp. 1039-1049, April 2006.
[11] D. C. Lay, Linear Algebra and Its Applications, 3rd ed., Addison-Wesley, p. 124, July 2005.
[12] R. Hecht-Nielsen, "Theory of the Back Propagation Neural Network," Proc. 1989 IEEE IJCNN, pp. 1593-1605, IEEE Press, New York, 1989.
[13] N. J. Cotton and B. M. Wilamowski, "Compensation of Nonlinearities Using Neural Networks Implemented on Inexpensive Microcontrollers," IEEE Trans. on Industrial Electronics, vol. 58, no. 3, pp. 733-740, March 2011.

52 Prepared Publications – Journals
H. Yu, T. T. Xie, Stanisław Paszczyñski and B. M. Wilamowski, "Advantages of Radial Basis Function Networks for Dynamic System Design," IEEE Trans. on Industrial Electronics (accepted; scheduled for publication in December 2011).
H. Yu, T. T. Xie and B. M. Wilamowski, "Error Correction – A Robust Learning Algorithm for Designing Compact Radial Basis Function Networks," IEEE Trans. on Neural Networks (major revision).
T. T. Xie, H. Yu, J. Hewllet, Pawel Rozycki and B. M. Wilamowski, "Fast and Efficient Second Order Method for Training Radial Basis Function Networks," IEEE Trans. on Neural Networks (major revision).
A. Malinowski and H. Yu, "Comparison of Various Embedded System Technologies for Industrial Applications," IEEE Trans. on Industrial Informatics, vol. 7, no. 2, pp. 244-254, May 2011.
B. M. Wilamowski and H. Yu, "Improved Computation for Levenberg Marquardt Training," IEEE Trans. on Neural Networks, vol. 21, no. 6, pp. 930-937, June 2010 (14 citations).
B. M. Wilamowski and H. Yu, "Neural Network Learning Without Backpropagation," IEEE Trans. on Neural Networks, vol. 21, no. 11, pp. 1793-1803, Nov. 2010 (5 citations).
Pierluigi Siano, Janusz Kolbusz, H. Yu and Carlo Cecati, "Real Time Operation of a Smart Microgrid via FCN Networks and Optimal Power Flow," IEEE Trans. on Industrial Informatics (under review).

53 Prepared Publications – Conferences
H. Yu and B. M. Wilamowski, "Efficient and Reliable Training of Neural Networks," IEEE Human System Interaction Conference, HSI 2009, Catania, Italy, May 21-23, 2009, pp. 109-115 (best paper award in the Computational Intelligence section; 11 citations).
H. Yu and B. M. Wilamowski, "C++ Implementation of Neural Networks Trainer," 13th IEEE Intelligent Engineering Systems Conference, INES 2009, Barbados, April 16-18, 2009, pp. 237-242 (8 citations).
H. Yu and B. M. Wilamowski, "Fast and efficient training of neural networks," in Proc. 3rd IEEE Human System Interaction Conf., HSI 2010, Rzeszow, Poland, May 13-15, 2010, pp. 175-181 (2 citations).
H. Yu and B. M. Wilamowski, "Neural Network Training with Second Order Algorithms," monograph by Springer on Human-Computer Systems Interaction: Background and Applications, October 31, 2010 (accepted).
H. Yu, T. T. Xie, M. Hamilton and B. M. Wilamowski, "Comparison of Different Neural Network Architectures for Digit Image Recognition," in Proc. IEEE Human System Interaction Conf., HSI 2011, Yokohama, Japan, May 19-21, 2011, pp. 98-103.
N. Pham, H. Yu and B. M. Wilamowski, "Neural Network Trainer through Computer Networks," 24th IEEE International Conference on Advanced Information Networking and Applications, AINA 2010, Perth, Australia, April 20-23, 2010, pp. 1203-1209 (1 citation).
T. T. Xie, H. Yu and B. M. Wilamowski, "Replacing Fuzzy Systems with Neural Networks," in Proc. 3rd IEEE Human System Interaction Conf., HSI 2010, Rzeszow, Poland, May 13-15, 2010, pp. 189-193.
T. T. Xie, H. Yu and B. M. Wilamowski, "Comparison of Traditional Neural Networks and Radial Basis Function Networks," in Proc. 20th IEEE International Symposium on Industrial Electronics, ISIE 2011, Gdansk, Poland, June 27-30, 2011 (accepted).

54 Prepared Publications – Chapters for the IE Handbook (2nd Edition)
H. Yu and B. M. Wilamowski, "Levenberg Marquardt Training," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, chapter 12, pp. 12-1 to 12-16, CRC Press, 2010.
H. Yu and M. Carroll, "Interactive Website Design Using Python Script," Industrial Electronics Handbook, vol. 4 – INDUSTRIAL COMMUNICATION SYSTEMS, 2nd Edition, chapter 62, pp. 62-1 to 62-8, CRC Press, 2010.
B. M. Wilamowski, H. Yu and N. Cotton, "Neuron by Neuron Algorithm," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, chapter 13, pp. 13-1 to 13-24, CRC Press, 2010.
T. T. Xie, H. Yu and B. M. Wilamowski, "Neuro-fuzzy System," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, chapter 20, pp. 20-1 to 20-9, CRC Press, 2010.
B. M. Wilamowski, H. Yu and K. T. Chung, "Parity-N problems as a vehicle to compare efficiency of neural network architectures," Industrial Electronics Handbook, vol. 5 – INTELLIGENT SYSTEMS, 2nd Edition, chapter 10, pp. 10-1 to 10-8, CRC Press, 2010.

55 Thanks

