Artificial Neural Networks


1 Artificial Neural Networks
CSC 600: Data Mining Class 24

2 Today… Artificial Neural Networks (ANN): Inspiration, Perceptron, Hidden Nodes, Learning

3 Inspiration Attempts to simulate biological neural systems
Animal brains have complex learning systems consisting of closely interconnected sets of neurons

4 Human Brain Neurons: nerve cells
Neurons are linked (connected) to other neurons via axons. A neuron connects to the axons of other neurons via its dendrites. Dendrites gather inputs from other neurons.

5 Learning Neurons use dendrites to gather inputs from other neurons, combine the input information, and output a response, "firing" when some threshold is reached. The human brain learns by changing the strength of the connections between neurons upon repeated stimulation by the same impulse.

6 Inspiration The human brain contains approximately 10^11 neurons
Each is connected on average to 10,000 (10^4) other neurons, for a total of roughly 10^15 = 1,000,000,000,000,000 connections

7 Output Y is 1 if at least two of the three inputs are equal to 1.
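A minimal sketch of such a unit, assuming equal weights of 0.3 and a bias of 0.4; these are one choice that realizes the rule, not necessarily the values in the slide's figure:

```python
# A perceptron that outputs 1 when at least two of three binary inputs are 1.
# Weights of 0.3 each and a bias of 0.4 are one assumed choice that works.

def perceptron(x1, x2, x3, weights=(0.3, 0.3, 0.3), bias=0.4):
    total = weights[0] * x1 + weights[1] * x2 + weights[2] * x3
    return 1 if total - bias > 0 else 0

# Check all 8 input combinations.
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            print((x1, x2, x3), "->", perceptron(x1, x2, x3))
```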

8 Going to begin with the simplest model…

9 Perceptron A perceptron has two types of nodes:
Input nodes (for the input attributes) and an output node (for the model's output). Nodes in a neural network are commonly known as neurons.

10 Perceptron Each input node is connected to the output node via a weighted link. The weighted link represents the strength of the connection between neurons. Idea: learn the optimal weights.

11 Perceptron – Output Value
Take the weighted sum of the inputs, subtract a bias factor t, and examine the sign of the result: ŷ = sign(Σ_j w_j x_j − t).

12 Perceptron – General Model
The model is an assembly of interconnected nodes and weighted links. The output node sums its input values according to the weights of its links, and the sum is compared against some threshold t.

13 Learning Perceptron Model

14 Weight Update Formula w_j^(k+1) = w_j^(k) + λ (y_i − ŷ_i) x_ij
Here i indexes the observation and j the attribute; w_j^(k) is the weight for attribute j after k iterations, w_j^(k+1) is the new weight, (y_i − ŷ_i) is the prediction error, and λ is the learning rate parameter, between 0 and 1.
λ closer to 0: SLOW – the new weight is mostly influenced by the value of the old weight.
λ closer to 1: FAST – more sensitive to the error in the current iteration.
A runnable sketch follows.
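A minimal sketch of this update rule; the toy dataset (logical OR) and the learning rate are assumptions for illustration:

```python
# Perceptron learning with the update rule
#   w_j <- w_j + lambda * (y - y_hat) * x_j
# The bias t is treated as a weight on a constant input of -1.

def predict(weights, bias, x):
    total = sum(w * xi for w, xi in zip(weights, x)) - bias
    return 1 if total > 0 else 0

def train_perceptron(data, n_attrs, lam=0.1, epochs=20):
    weights = [0.0] * n_attrs        # random initialization is also common
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            error = y - predict(weights, bias, x)   # y - y_hat
            weights = [w + lam * error * xi for w, xi in zip(weights, x)]
            bias -= lam * error      # update for the bias's -1 "input"
    return weights, bias

# Linearly separable example: logical OR.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(data, n_attrs=2)
print(weights, bias, [predict(weights, bias, x) for x, _ in data])
```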

15 Weight Update Formula
If y = +1 and ŷ = 0: prediction error = (y − ŷ) = 1. To compensate for the error: increase the value of the predicted output by increasing the weights of all links with positive inputs and decreasing the weights of all links with negative inputs.
If y = 0 and ŷ = 1: prediction error = (y − ŷ) = −1. To compensate for the error: decrease the value of the predicted output by decreasing the weights of all links with positive inputs and increasing the weights of all links with negative inputs.

16 Perceptron Weight Convergence
The perceptron learning algorithm is guaranteed to converge to an optimal solution (the weights stop changing) … for linearly separable classification problems. If the problem is not linearly separable, the algorithm fails to converge (e.g., the XOR problem). The decision boundary of a perceptron is a linear hyperplane.

17 Multilayer Artificial Neural Network
More complex than the perceptron model: it also contains one or more intermediary layers between the input and output layers, whose nodes are called hidden nodes. This allows modeling of more complex relationships.

18 Think of each hidden node as a perceptron. A perceptron “learns” / “creates” one hyperplane. The XOR problem can be classified with two hyperplanes, as in the sketch below.
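A sketch of this idea, assuming one particular pair of hyperplanes (an OR unit and an AND unit); the thresholds are illustrative choices, not the only solution:

```python
# XOR solved with two "hidden" perceptrons (two hyperplanes).
# h1 fires for x1 OR x2; h2 fires for x1 AND x2; the output fires for
# "OR but not AND".

def step(total):
    return 1 if total > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hyperplane 1: logical OR
    h2 = step(x1 + x2 - 1.5)    # hyperplane 2: logical AND
    return step(h1 - h2 - 0.5)  # output: h1 AND NOT h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", xor_net(x1, x2))
```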

19 Multilayer Artificial Neural Network
In addition, multilayer networks may use activation functions other than the sign function. A common alternative: the sigmoid (logistic) function.

20 Why the Sigmoid Function? σ(z) = 1 / (1 + e^(−z)) combines nearly linear behavior (around z = 0), curvilinear behavior, and nearly constant (saturated) behavior, depending on the value of the input.
Input: any real value. Output: between 0 and 1.
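A small demonstration of these regimes; the sample inputs are arbitrary:

```python
import math

# Sigmoid (logistic) activation: near-linear around 0, nearly constant
# (saturated) for large |z|, always between 0 and 1.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-10, -2, -0.5, 0, 0.5, 2, 10):
    print(z, round(sigmoid(z), 4))
```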

21 Feed-Forward Neural Network
Nodes in one layer are connected only to the nodes in the next layer, and the layers are completely connected (every node in layer i is connected to every node in layer i+1). (The perceptron is a single-layer, feed-forward neural network.) Other types: a recurrent neural network may connect nodes within the same layer, or to nodes in a previous layer.

22 Input Encoding Possible drawback: all attribute values must be numeric and normalized between 0 and 1, even categorical variables.
Numeric variables: apply min-max normalization, x' = (x − min) / (max − min), which works as long as the min and max are known. What if a new value (in the testing set) is outside the range? Potential solution: assign the value to either the min or the max, as in the sketch below.
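A sketch of min-max normalization with that workaround; the training range [20, 80] and the test value 95 are assumed examples:

```python
# Min-max normalization to [0, 1], clamping test values that fall outside
# the training range to the min or max.
def min_max_normalize(x, x_min, x_max):
    x = max(x_min, min(x, x_max))       # clamp out-of-range values
    return (x - x_min) / (x_max - x_min)

print(min_max_normalize(50, 20, 80))    # 0.5
print(min_max_normalize(95, 20, 80))    # clamped to 1.0
```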

23 Input Encoding Possible drawback: all attribute values must be numeric and normalized between 0 and 1.
Categorical variables: use flag (binary 0/1) variables to represent each category (if the number of possible categories is not too large). Two categories can be represented by a single 0/1 numeric variable. In general, k − 1 indicator variables are needed for a categorical variable with k classes, as sketched below.
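A sketch of the k − 1 indicator encoding; the color categories are an assumed example:

```python
# Encode a categorical attribute with k categories as k-1 indicator (0/1)
# variables; the last category serves as the baseline (all zeros).
def indicator_encode(value, categories):
    reference = categories[:-1]          # k-1 indicators
    return [1 if value == c else 0 for c in reference]

categories = ["red", "green", "blue"]    # k = 3 -> 2 indicators
for v in categories:
    print(v, "->", indicator_encode(v, categories))
# red -> [1, 0], green -> [0, 1], blue -> [0, 0]
```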

24 Output Encoding Neural networks output a continuous value between 0 and 1.
Binary problems: use some threshold, such as 0.5.
Ordinal example:
If 0 <= output < 0.25, classify as first-grade reading level
If 0.25 <= output < 0.50, classify as second-grade reading level
If 0.50 <= output < 0.75, classify as third-grade reading level
If output >= 0.75, classify as fourth-grade reading level
Multiclass classification: ideas? Use 1-of-n output encoding with multiple output nodes.

25 1-of-n Output Encoding Example:
Assume a marital status target variable with outputs {divorced, married, separated, single, widowed, unknown}. Each output node gets a value between 0 and 1; choose the node with the highest value.
Additional benefit: a measure of confidence, the difference between the highest-value output node and the second-highest-value output node, as in the sketch below.
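A sketch of decoding 1-of-n output with that confidence measure; the output-node values are assumed for illustration:

```python
# Decode 1-of-n output: pick the node with the highest value, and use the
# gap to the runner-up as a confidence measure.
labels = ["divorced", "married", "separated", "single", "widowed", "unknown"]
outputs = [0.05, 0.72, 0.10, 0.61, 0.02, 0.03]

ranked = sorted(zip(outputs, labels), reverse=True)
(best_val, best_label), (second_val, _) = ranked[0], ranked[1]
print("prediction:", best_label)
print("confidence:", round(best_val - second_val, 2))   # 0.72 - 0.61 = 0.11
```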

26 Output For numerical output problems:
The neural net output is between 0 and 1, so it may need to be transformed to a different scale using the inverse of min-max normalization: y = output × (max − min) + min.
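A sketch of that inverse transformation; the target range [10, 200] is an assumed example:

```python
# Inverse of min-max normalization: map a network output in [0, 1] back
# to the target's original scale.
def denormalize(output, y_min, y_max):
    return output * (y_max - y_min) + y_min

print(denormalize(0.35, 10, 200))   # 76.5
```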

27 Neural Network Structure
# of input nodes: depends on the number and type of attributes in the dataset
# of output nodes: depends on the classification task
# of hidden nodes: configurable by the data analyst; more nodes increase the power and flexibility of the network, but too many nodes lead to overfitting and too few to poor learning
# of hidden layers: usually 1, for computational reasons

28 Neural Network Example: Predicted Value
[Figure: data inputs and weights for a small network with input attributes x1, x2, x3, and the resulting predicted value]
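Since the figure's numbers are not in the transcript, here is a hedged sketch of the same computation; all weights, biases, and input values below are assumptions, not the slide's figures:

```python
import math

# Feed-forward prediction for a small network: 3 inputs, 2 hidden nodes
# with sigmoid activations, 1 sigmoid output node.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [0.4, 0.2, 0.7]                    # input attributes x1, x2, x3

W_hidden = [[0.6, -0.1, 0.2],          # weights into hidden node 1
            [0.3, 0.5, -0.4]]          # weights into hidden node 2
b_hidden = [0.1, -0.2]                 # assumed hidden-node biases

w_output = [0.8, -0.5]                 # weights into the output node
b_output = 0.3

hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(W_hidden, b_hidden)]
predicted = sigmoid(sum(w * h for w, h in zip(w_output, hidden)) + b_output)
print(round(predicted, 4))
```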

29 Learning the ANN Model The goal is to determine a set of weights w that minimize the total sum of squared errors, SSE = Σ_i (y_i − ŷ_i)².

30 Gradient Descent Method
No closed-form solution exists for minimizing the SSE.
Gradient descent: gives the direction in which the weights should be adjusted; it may converge without finding the optimal weights.
Back-propagation: takes the prediction error and propagates it back through the network, so the weights of hidden nodes can also be adjusted. A minimal sketch follows.
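A minimal sketch of gradient descent with back-propagation on a tiny sigmoid network minimizing SSE; the network size, toy data, and learning rate are all assumptions for illustration:

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
b1 = [0.0, 0.0]                        # hidden-layer weights and biases
w2 = [random.uniform(-0.5, 0.5) for _ in range(2)]
b2 = 0.0                               # output-node weights and bias
lam = 0.5                              # learning rate

data = [((0.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]   # toy training data

for _ in range(1000):
    for x, y in data:
        # Forward pass.
        h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(2)]
        y_hat = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

        # Backward pass: error terms for the SSE term (y - y_hat)^2.
        delta_out = (y_hat - y) * y_hat * (1 - y_hat)
        delta_hid = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(2)]

        # Gradient descent step: move each weight against its gradient.
        for j in range(2):
            w2[j] -= lam * delta_out * h[j]
            for i in range(2):
                W1[j][i] -= lam * delta_hid[j] * x[i]
            b1[j] -= lam * delta_hid[j]
        b2 -= lam * delta_out

for x, y in data:
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
         for j in range(2)]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
    print(x, "target", y, "predicted", round(y_hat, 3))
```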

31 Learning the ANN Model Keep adjusting the weights until some stopping criterion is met:
SSE is reduced below some threshold
Weights are no longer changing
Elapsed training time exceeds a limit
Number of iterations exceeds a limit

32 Non-Optimal Local Minimum
The algorithm may discover weights that result in a local minimum rather than the global minimum.
Potential solutions: adjust the learning rate parameter; add a momentum term, as sketched below.
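A sketch of a momentum term, assuming a simple velocity update; the momentum factor, learning rate, and gradient sequence are illustrative values:

```python
# Momentum blends in a fraction of the previous update so the search can
# roll through shallow local minima instead of stalling in them.
def momentum_step(weight, gradient, velocity, lam=0.1, beta=0.9):
    velocity = beta * velocity - lam * gradient   # remember past direction
    return weight + velocity, velocity

w, v = 1.0, 0.0
for g in [0.5, 0.4, 0.4, 0.3]:        # assumed gradient sequence
    w, v = momentum_step(w, g, v)
    print(round(w, 4), round(v, 4))
```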

33 Characteristics of Artificial Neural Networks
Important to choose an appropriate network topology
Very expressive hypothesis space
Relatively lengthy training time, but fast classification time
Can handle redundant features: weights for redundant features tend to be very small
Gradient descent for learning the weights may converge to a local minimum; use a momentum term, or learn multiple models (remember that the initial weights are random)
Interpretability: what do the weights of hidden nodes mean?

34 Sensitivity Analysis Measures the relative influence each attribute has on the output result:
1. Generate a new observation x_mean, with each attribute in x_mean equal to the mean of that attribute.
2. Find the network output for input x_mean.
3. Attribute by attribute, vary x_mean to the min and max of that attribute; find the network output for each variation and compare it to the output from (2).
This reveals which attributes the network is more sensitive to; a sketch follows.
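A sketch of this recipe; the predict function stands in for any trained network, and the attribute statistics and toy model below are assumed values:

```python
# Sensitivity analysis: perturb one attribute at a time around the mean
# observation and measure how much the network output swings.

def predict(x):                        # hypothetical trained network
    return 0.7 * x[0] + 0.1 * x[1] + 0.2 * x[2]

# Assumed per-attribute (min, max, mean) statistics.
attr_stats = [(0.0, 1.0, 0.5), (0.0, 1.0, 0.4), (0.0, 1.0, 0.6)]

x_mean = [mean for _, _, mean in attr_stats]
baseline = predict(x_mean)             # step (2): output at the mean point

for j, (lo, hi, _) in enumerate(attr_stats):
    x_lo, x_hi = list(x_mean), list(x_mean)
    x_lo[j], x_hi[j] = lo, hi          # step (3): vary attribute j only
    swing = abs(predict(x_hi) - predict(x_lo))
    print(f"attribute {j}: output swing {swing:.2f} (baseline {baseline:.2f})")
```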

35 References
Data Science from Scratch, 1st edition, Grus
Introduction to Data Mining, 1st edition, Tan et al.
Discovering Knowledge in Data, 2nd edition, Larose et al.

