Artificial Neural Networks

Artificial Neural Networks CSC 600: Data Mining Class 24

Today… Artificial Neural Networks (ANN): inspiration, the perceptron, hidden nodes, learning.

Inspiration Artificial neural networks attempt to simulate biological neural systems. Animal brains have complex learning systems consisting of closely interconnected sets of neurons.

Human Brain Neurons: nerve cells. Neurons are linked (connected) to other neurons via axons; a neuron receives these connections through its dendrites. Dendrites gather inputs from other neurons.

Learning A neuron uses its dendrites to gather inputs from other neurons, combines the input information, and outputs a response, "firing" when some threshold is reached. The human brain learns by changing the strength of the connections between neurons upon repeated stimulation by the same impulse.

Inspiration The human brain contains approximately 10^11 neurons, each connected on average to 10,000 other neurons, for a total of about 10^15 (1,000,000,000,000,000) connections.

Output Y is 1 if at least two of the three inputs are equal to 1.
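A minimal sketch of how a single perceptron-style unit can implement this rule. The specific weights (0.3 on each input) and bias (0.4) are illustrative assumptions, not values taken from the slide's figure; any weights satisfying 2w > t > w would work.

# Hypothetical perceptron for "Y is 1 if at least two of the three inputs are 1".
def perceptron_majority(x1, x2, x3, w=0.3, t=0.4):
    s = w * x1 + w * x2 + w * x3 - t   # weighted sum minus bias factor t
    return 1 if s > 0 else 0

for inputs in [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)]:
    print(inputs, "->", perceptron_majority(*inputs))   # 0, 1, 1, 0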

We will begin with the simplest model…

Perceptron A perceptron has two types of nodes: input nodes (for the input attributes) and an output node (for the model's output). Nodes in a neural network are commonly known as neurons.

Perceptron Each input node is connected to the output node via a weighted link. The weighted link represents the strength of the connection between neurons. Idea: learn the optimal weights.

Perceptron – Output Value Take the weighted sum of the inputs, subtract a bias factor t, and examine the sign of the result.

Perceptron – General Model The model is an assembly of interconnected nodes and weighted links. The output node sums up its input values according to the weights of its links, and the result is compared against some threshold t.

Learning Perceptron Model

Weight Update Formula w_j(k+1) = w_j(k) + λ (y_i − yhat_i(k)) x_ij, where x_ij is attribute j of observation i, w_j(k) is the weight for attribute j after k iterations, w_j(k+1) is the new weight for attribute j, (y_i − yhat_i(k)) is the prediction error, and λ is the learning rate parameter, between 0 and 1. Closer to 0: SLOW – the new weight is mostly influenced by the value of the old weight. Closer to 1: FAST – more sensitive to the error in the current iteration.
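A minimal sketch of this update rule inside a training loop, assuming a 0/1-output perceptron as described above; the example data and learning rate are illustrative assumptions.

# Perceptron training sketch: w_j(k+1) = w_j(k) + lam * (y - yhat) * x_j
def predict(weights, bias, x):
    s = sum(w * xi for w, xi in zip(weights, x)) - bias
    return 1 if s > 0 else 0

def train_perceptron(data, lam=0.1, epochs=25):
    n = len(data[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            error = y - predict(weights, bias, x)          # prediction error
            weights = [w + lam * error * xi for w, xi in zip(weights, x)]
            bias = bias - lam * error                      # bias acts like a weight on a constant -1 input
    return weights, bias

# Illustrative, linearly separable data: y = 1 when at least two inputs are 1
data = [((0,0,0), 0), ((0,0,1), 0), ((0,1,1), 1), ((1,0,1), 1), ((1,1,1), 1), ((1,0,0), 0)]
weights, bias = train_perceptron(data)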

Weight Update Formula If y = 1 and yhat = 0: the prediction error (y – yhat) = 1. To compensate for the error: increase the value of the predicted output by increasing the weights of all links with positive inputs and decreasing the weights of all links with negative inputs. If y = 0 and yhat = 1: the prediction error (y – yhat) = -1. To compensate for the error: decrease the value of the predicted output by decreasing the weights of all links with positive inputs and increasing the weights of all links with negative inputs.

Perceptron Weight Convergence The perceptron learning algorithm is guaranteed to converge to an optimal solution (the weights stop changing)… for linearly separable classification problems. If the problem is not linearly separable, the algorithm fails to converge (e.g., the XOR problem). The decision boundary of a perceptron is a linear hyperplane.

Multilayer Artificial Neural Network More complex than the perceptron model: it also contains one or more intermediary layers between the input and output layers, called hidden nodes. Allows for modeling more complex relationships.

Think of each hidden node as a perceptron. A perceptron “learns” / “creates” one hyperplane. The XOR problem can be classified with two hyperplanes, as sketched below.
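A small illustration of the two-hyperplane idea, with weights picked by hand rather than learned (the specific values are assumptions for illustration): one hidden node fires when x1 OR x2 is 1, the other when both are 1, and the output fires only when the first does and the second does not.

# XOR via two hand-picked hyperplanes (illustrative weights, not learned).
def step(s):
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hyperplane 1: fires for x1 OR x2
    h2 = step(x1 + x2 - 1.5)      # hyperplane 2: fires for x1 AND x2
    return step(h1 - h2 - 0.5)    # output: h1 but not h2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), "->", xor_net(x1, x2))   # 0, 1, 1, 0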

Multilayer Artificial Neural Network More complex than the perceptron model: it also contains one or more intermediary layers between the input and output layers (hidden nodes), allowing for modeling more complex relationships. It may also use activation functions other than the sign function; a common alternative is the sigmoid (logistic) function.

Why Sigmoid Function? The sigmoid function σ(s) = 1 / (1 + e^(−s)) combines nearly linear behavior, curvilinear behavior, and nearly constant behavior, depending on the value of the input. Input: any real value. Output: between 0 and 1.

Feed-Forward Neural Network Nodes in one layer are connected only to the nodes in the next layer. Completely connected: every node in layer i is connected to every node in layer i+1. (The perceptron is a single-layer, feed-forward neural network.) Other types: a recurrent neural network may connect nodes within the same layer, or to nodes in a previous layer.

Input Encoding Possible drawback: all attribute values must be numeric and normalized between 0 and 1, even categorical variables. Numeric variables: apply min-max normalization. This works as long as the min and max are known. What if a new value (in the testing set) is outside of the range? Potential solution: assign the value to either the min or max, as in the sketch below.
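A minimal sketch of min-max normalization with the clamping fix just described; the training-set min and max values are assumptions for illustration.

# Min-max normalization to [0, 1], clamping test-set values outside the training range.
def min_max_normalize(value, train_min, train_max):
    value = max(train_min, min(value, train_max))      # assign out-of-range values to min or max
    return (value - train_min) / (train_max - train_min)

print(min_max_normalize(72, train_min=50, train_max=90))   # 0.55
print(min_max_normalize(95, train_min=50, train_max=90))   # clamped to the max -> 1.0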

Input Encoding Possible drawback: all attribute values must be numeric and normalized between 0 and 1. Categorical variables: use flag (binary 0/1) variables to represent each category when there are more than 2 categories (and the number of possible categories is not too large). 2 categories can be represented by a single 0/1 numeric variable. In general: k−1 indicator variables are needed for a categorical variable with k classes, as in the sketch below.
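A small sketch of the k−1 indicator encoding, using a hypothetical three-category attribute; the last category serves as the implicit baseline.

# k-1 indicator (flag) variables for a categorical attribute with k categories.
def indicator_encode(value, categories):
    return [1 if value == c else 0 for c in categories[:-1]]   # k-1 flags

categories = ["low", "medium", "high"]          # hypothetical attribute values
print(indicator_encode("medium", categories))   # [0, 1]
print(indicator_encode("high", categories))     # [0, 0]  (baseline category)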

Output Encoding Neural networks output a continuous value between 0 and 1. Binary problems: use some threshold, such as 0.5. Ordinal example (see the sketch below): if 0 <= output < 0.25, classify first-grade reading level; if 0.25 <= output < 0.50, second-grade; if 0.50 <= output < 0.75, third-grade; if output >= 0.75, fourth-grade. Multi-class classification: Ideas? Use 1-of-n output encoding with multiple output nodes.
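A tiny sketch of the ordinal scheme above, mapping a single 0-1 output to a reading level.

# Map a 0-1 network output to an ordinal class using the thresholds above.
def reading_level(output):
    levels = ["first-grade", "second-grade", "third-grade", "fourth-grade"]
    return levels[min(int(output * 4), 3)]   # 0.25-wide bins; clamp output == 1.0

print(reading_level(0.61))   # third-grade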

1-of-n Output Encoding Example: assume a marital status target variable with outputs {divorced, married, separated, single, widowed, unknown}, one output node per category. Each output node gets a value between 0 and 1; choose the node with the highest value. Additional benefit: a measure of confidence, the difference between the highest-value output node and the second-highest-value output node (see the sketch below).
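A minimal sketch of choosing the winning output node and computing the confidence measure described above; the example output values are made up for illustration.

# 1-of-n decoding: pick the highest-valued output node; confidence = gap to the runner-up.
def decode_1_of_n(outputs):
    ranked = sorted(outputs.items(), key=lambda kv: kv[1], reverse=True)
    label, best = ranked[0]
    confidence = best - ranked[1][1]
    return label, confidence

outputs = {"divorced": 0.05, "married": 0.82, "separated": 0.10,
           "single": 0.61, "widowed": 0.03, "unknown": 0.02}
print(decode_1_of_n(outputs))   # ('married', ~0.21)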

Output For numerical output problems: the neural net output is between 0 and 1, so it may need to be transformed to a different scale via the inverse of min-max normalization (sketched below).
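A one-line sketch of the inverse min-max transformation; the target-scale min and max are assumptions for illustration.

# Inverse of min-max normalization: map a 0-1 network output back to the original scale.
def denormalize(output, target_min, target_max):
    return output * (target_max - target_min) + target_min

print(denormalize(0.8750, target_min=0, target_max=200000))   # 175000.0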

Neural Network Structure # of input nodes: depends on the number and type of attributes in the dataset. # of output nodes: depends on the classification task. # of hidden nodes: configurable by the data analyst; more nodes increase the power and flexibility of the network, but too many nodes lead to overfitting and too few lead to poor learning. # of hidden layers: usually 1, for computational reasons.

Neural Network Example: Predicted Value [Slide figure: a table of data inputs and weights for input attributes x1, x2, x3, yielding a predicted value of 0.8750. The table itself is not reproduced in this transcript.]
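Since the slide's table is not reproduced, the following sketch uses hypothetical inputs and weights purely to show how the predicted value is computed: each hidden node applies the sigmoid to a weighted sum of the inputs (plus a bias), and the output node does the same over the hidden-node outputs.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def node_output(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Hypothetical inputs and weights (not the values from the slide's table).
x = [0.4, 0.2, 0.7]
hidden = [([0.6, 0.9, 0.4], 0.5),    # (weights, bias) for hidden node 1
          ([0.8, 0.1, 0.7], 0.2)]    # (weights, bias) for hidden node 2
output_weights, output_bias = [1.1, 0.9], 0.3

h = [node_output(w, b, x) for w, b in hidden]
prediction = node_output(output_weights, output_bias, h)
print(round(prediction, 4))   # roughly 0.86 with these made-up values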

Learning the ANN Model The goal is to determine a set of weights w that minimizes the total sum of squared errors (SSE).

Gradient Descent Method No closed-form solution exists for minimizing the SSE. Gradient descent gives the direction in which the weights should be adjusted (see the sketch below); it may converge without finding the optimal weights. Back propagation takes the prediction error and propagates it back through the network, so the weights of hidden nodes can also be adjusted.
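A minimal sketch of one gradient-descent step for a single sigmoid output node under the squared-error criterion; back propagation applies the same idea layer by layer so that hidden-node weights are adjusted too. The data and learning rate below are illustrative assumptions.

import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# One gradient-descent update for a single sigmoid unit minimizing squared error.
# Up to a constant factor, d(error)/dw_j = -(y - yhat) * yhat * (1 - yhat) * x_j,
# so each weight moves a small step (lam) against that gradient.
def gradient_step(weights, x, y, lam=0.5):
    yhat = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    delta = (y - yhat) * yhat * (1 - yhat)            # error times sigmoid derivative
    return [w + lam * delta * xi for w, xi in zip(weights, x)]

# Illustrative single example: repeated steps shrink its squared error.
weights, x, y = [0.1, -0.2, 0.4], [1.0, 0.5, 0.25], 1.0
for _ in range(100):
    weights = gradient_step(weights, x, y)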

Learning the ANN Model Keep adjusting weights until some stopping criterion is met: SSE reduced below some threshold Weights are not changing anymore Elapsed training time exceeds limit Number of iterations exceeds limits

Non-Optimal Local Minimum The algorithm may discover weights that result in a local minimum rather than the global minimum. Potential solutions: adjust the learning rate parameter; add a momentum term.

Characteristics of Artificial Neural Networks It is important to choose an appropriate network topology. Very expressive hypothesis space. Relatively lengthy training time, but fast classification time. Can handle redundant features: weights for redundant features tend to be very small. Gradient descent for learning the weights may converge to a local minimum; remedies include using a momentum term or learning multiple models (remember, the initial weights are random). Interpretability: what do the weights of hidden nodes mean?

Sensitivity Analysis Measures the relative influence each attribute has on the output result: (1) Generate a new observation x_mean, with each attribute in x_mean equal to the mean of that attribute. (2) Find the network output for input x_mean. (3) Attribute by attribute, vary x_mean to the min and max of that attribute; find the network output for each variation and compare it to (2). This reveals which attributes the network is more sensitive to (see the sketch below).
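A sketch of this procedure, assuming the trained network is available as a predict(x) function and the data as a list of numeric rows; both names are placeholders for illustration.

# Sensitivity analysis sketch. `predict` and `data` are assumed placeholders:
# predict(x) returns the network output for input vector x, and data is a
# list of observations (lists of numeric attribute values).
def sensitivity_analysis(predict, data):
    n_attrs = len(data[0])
    x_mean = [sum(row[j] for row in data) / len(data) for j in range(n_attrs)]
    baseline = predict(x_mean)                       # step (2): output at the mean observation
    sensitivities = []
    for j in range(n_attrs):
        lo = [min(r[j] for r in data) if k == j else x_mean[k] for k in range(n_attrs)]
        hi = [max(r[j] for r in data) if k == j else x_mean[k] for k in range(n_attrs)]
        swing = abs(predict(hi) - baseline) + abs(predict(lo) - baseline)
        sensitivities.append(swing)                  # larger swing = more influential attribute
    return sensitivities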

References Data Science from Scratch, 1st Edition, Grus Introduction to Data Mining, 1st edition, Tan et al. Discovering Knowledge in Data, 2nd edition, Larose et al.