CS 416 Artificial Intelligence
Lecture 22: Statistical Learning (Chapter 20.5)

Perceptrons
Each input is binary and has a weight associated with it.
The inner product of the input vector and the weight vector is computed.
If this sum exceeds a threshold, the perceptron fires.
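As a concrete illustration of the firing rule above, here is a minimal sketch in Python; the function name and the particular weights and threshold are illustrative choices, not values from the slides.

```python
def perceptron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# Example: two binary inputs with hand-picked weights and threshold.
print(perceptron_fires([1, 1], [0.6, 0.6], threshold=1.0))  # fires -> 1
print(perceptron_fires([1, 0], [0.6, 0.6], threshold=1.0))  # does not fire -> 0
```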

Perceptrons are linear classifiers
Consider a two-input neuron.
Two weights are "tuned" to fit the data.
The neuron uses the quantity w1*x1 + w2*x2 to decide whether to fire.
–This is like the line equation mx + b - y = 0: the decision boundary is a straight line in the input plane.

Linearly separable
These single-layer perceptron networks can classify linearly separable systems.
Consider a system like AND:
x1  x2  x1 AND x2
0   0   0
0   1   0
1   0   0
1   1   1

Linearly separable - AND
Consider a system like AND (truth table above).
[Diagram: inputs x1 and x2, with weights w1 and w2, feed a unit that fires according to Θ(x●w).]
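A quick sketch of one weight/threshold choice that realizes AND; the specific values w1 = w2 = 1 and theta = 1.5 are just one working assumption, not taken from the slides.

```python
def threshold_unit(x1, x2, w1, w2, theta):
    """Fire (return 1) when the weighted sum w1*x1 + w2*x2 exceeds theta."""
    return 1 if (w1 * x1 + w2 * x2) > theta else 0

# With w1 = w2 = 1 and theta = 1.5, only the input (1, 1) fires: this is AND.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, threshold_unit(x1, x2, w1=1.0, w2=1.0, theta=1.5))
```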

Linearly separable - XOR
Consider a system like XOR:
x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0
[Diagram: inputs x1 and x2, with weights w1 and w2, feed a unit that fires according to Θ(x●w).]

Linearly separable - XOR
IMPOSSIBLE!

2nd Class Exercise
Let x3 = ~x1 and x4 = ~x2.
Find w1, w2, w3, w4, and theta such that Theta(x1*w1 + x2*w2 + x3*w3 + x4*w4) = x1 xor x2,
or prove that it can't be done.

3rd Class Exercise
Find w1, w2, and f() such that f(x1*w1 + x2*w2) = x1 xor x2,
or prove that it can't be done.

Limitations of Perceptrons
Minsky & Papert published "Perceptrons" (1969), stressing the limitations of perceptrons.
Single-layer perceptrons cannot solve problems that are not linearly separable (e.g., XOR).
Most interesting problems are not linearly separable.
This killed funding for neural nets for years.

A brief aside about Marvin Minsky
Attended Bronx H.S. of Science
Served in the U.S. Navy during WW II
B.A. Harvard and Ph.D. Princeton
MIT faculty since 1958
First graphical head-mounted display (1963)
Co-inventor of Logo (1968)
Nearly killed during the making of 2001: A Space Odyssey, but survived to write a book critical of neural networks
Turing Award 1970
(From wikipedia.org)

Single-layer networks for classification
Single output with 0.5 as the dividing line for binary classification.
Single output with n-1 dividing lines for n-ary classification.
n outputs, each with 0.5 as the dividing line, for n-ary classification.

Recent History of Neural Nets
1969: Minsky & Papert "kill" neural nets
1974: Werbos describes back-propagation
1982: Hopfield reinvigorates neural nets
1986: Parallel Distributed Processing

Multi-layered Perceptrons
Input layer, output layer, and "hidden" layers.
Eliminates some concerns of Minsky and Papert.
Modification rules are more complicated!

Why are modification rules more complicated?
We can calculate the error of the output neuron by comparing to the training data.
We could use the previous update rule to adjust W3,5 and W4,5 to correct that error.
But how do W1,3, W1,4, W2,3, and W2,4 adjust?

First consider error in single-layer neural networks
Sum of squared errors (across the training data); for one sample:
How can we minimize the error?
–Set the derivative equal to zero and solve for the weights.
–How is that error affected by each of the weights in the weight vector?
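The per-sample error expression was an image on the original slide; the standard form used in the textbook's treatment, with Err = y - h_W(x), is:

```latex
E \;=\; \tfrac{1}{2}\,\mathrm{Err}^2 \;=\; \tfrac{1}{2}\bigl(y - h_{\mathbf{W}}(\mathbf{x})\bigr)^2
```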

Minimizing the error
What is the derivative?
The gradient,
–composed of the partial derivatives of the error with respect to each weight.

Computing the partial
By the Chain Rule (g( ) = the activation function):
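The expanded expression was an image on the original slide; for a single unit with output h_W(x) = g(Σ_k W_k x_k), the standard chain-rule expansion it refers to is:

```latex
\frac{\partial E}{\partial W_j}
  \;=\; \mathrm{Err}\cdot\frac{\partial\,\mathrm{Err}}{\partial W_j}
  \;=\; -\,\mathrm{Err}\cdot g'\!\Bigl(\sum_{k} W_k\,x_k\Bigr)\,x_j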

Computing the partial
g'(in) = derivative of the activation function = g(in)(1 - g(in)) in the case of the sigmoid.
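A small numeric check of the identity above (a sketch; the helper names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    g = sigmoid(x)
    return g * (1.0 - g)   # g'(x) = g(x) * (1 - g(x))

# Compare against a finite-difference approximation of the derivative.
x, h = 0.3, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(sigmoid_derivative(x), numeric)  # the two values agree closely
```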

Minimizing the error
Gradient descent
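A sketch of the resulting gradient-descent weight update for a single sigmoid unit, assuming a learning rate alpha (the variable names and the example values are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update_weights(weights, inputs, target, alpha=0.1):
    """One gradient-descent step: W_j <- W_j + alpha * Err * g'(in) * x_j."""
    in_total = sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(in_total)
    err = target - output
    g_prime = output * (1.0 - output)          # sigmoid derivative g(1 - g)
    return [w + alpha * err * g_prime * x for w, x in zip(weights, inputs)]

# One training step on a single example.
print(update_weights([0.2, -0.1], inputs=[1.0, 0.0], target=1.0))
```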

What changes in multilayer?
The output is not one value y; the output is a vector.
We do not know the correct outputs for the hidden layers.
We will have to propagate errors backwards: back propagation (backprop).

Multilayer

Backprop at the output layer
Output-layer error is computed as in the single-layer case, and weights are updated in the same fashion.
Let Err_i be the i-th component of the error vector y - h_W.
–Let Δ_i = Err_i · g'(in_i).

Backprop in the hidden layer
Each hidden node is responsible for some fraction of the error Δ_i in each of the output nodes to which it is connected.
Δ_i is divided among all the hidden nodes that connect to output i, according to their connection strengths.
Error at hidden node j: Δ_j = g'(in_j) Σ_i W_j,i Δ_i

Backprop in the hidden layer
Error is: Δ_j = g'(in_j) Σ_i W_j,i Δ_i
Correction is: W_k,j ← W_k,j + α · a_k · Δ_j

Summary of backprop
1. Compute the Δ values for the output units using the observed error.
2. Starting with the output layer, repeat the following for each layer until done:
–Propagate the Δ values back to the previous layer.
–Update the weights between the two layers.
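Putting the whole procedure together, here is a minimal sketch of backprop for a small fully connected network with one hidden layer, trained on XOR. The network shape (2 inputs, 2 hidden units, 1 output), the learning rate, and the iteration count are my own choices for illustration, not the architecture used later in the slides.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# Weights (with biases) for a 2-2-1 network, initialized randomly.
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2 hidden units: [w_x1, w_x2, bias]
w_output = [random.uniform(-1, 1) for _ in range(3)]                      # output unit: [w_h1, w_h2, bias]

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR
alpha = 0.5

for _ in range(20000):
    for x, y in data:
        # Forward pass.
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
        o = sigmoid(w_output[0] * h[0] + w_output[1] * h[1] + w_output[2])

        # Delta at the output unit: Err * g'(in).
        delta_o = (y - o) * o * (1 - o)
        # Deltas at the hidden units: g'(in_j) * W_j,o * delta_o.
        delta_h = [h[j] * (1 - h[j]) * w_output[j] * delta_o for j in range(2)]

        # Weight updates: W <- W + alpha * activation * delta.
        for j in range(2):
            w_output[j] += alpha * h[j] * delta_o
        w_output[2] += alpha * delta_o
        for j in range(2):
            for i in range(2):
                w_hidden[j][i] += alpha * x[i] * delta_h[j]
            w_hidden[j][2] += alpha * delta_h[j]

for x, y in data:
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    o = sigmoid(w_output[0] * h[0] + w_output[1] * h[1] + w_output[2])
    print(x, y, round(o, 3))  # outputs should approach the XOR targets (training can occasionally stall in a local minimum)
```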

4th Class Exercise
Find w1, w2, w3, w4, w5, theta1, and theta2 such that the output is x1 xor x2,
or prove that it can't be done.

Back-Propagation (xor)
Initial weights are random.
The threshold is now sigmoidal (the function should be differentiable).
Initial weights: w1 = 0.90, w2 = -0.54, w3 = 0.21, w4 = -0.03, w5 = 0.78

Back-Propagation (xor)
Input layer: two units
Hidden layer: one unit
Output layer: one unit
Output is related to input by:
Performance is defined as:
(summed over all samples in the training set T)
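The output formula and the performance formula were both images on the original slide. A performance measure consistent with the later worked numbers (d = desired output, o = actual network output) is the negative summed squared error over the training set T; treat the exact form as an assumption:

```latex
P \;=\; -\tfrac{1}{2}\sum_{t \in T}\bigl(d_t - o_t\bigr)^2
```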

Back-Propagation (xor)
Error at the last layer (hidden → output): Δ_o = g'(in_o) · (d - o)
Error at the previous layer (input → hidden): Δ_h = g'(in_h) · w_h,o · Δ_o
Change in weight: Δw = α · a · Δ
where α is the learning rate and a is the activation feeding into the weight.

Back-Propagation (xor)
(0,0) → 0 – 1st example
Input to the hidden unit is 0; sigmoid(0) = 0.5
Input to the output unit is (0.5)(-0.03) = -0.015
sigmoid(-0.015) ≈ 0.496, so the error ≈ -0.496
So, this example's contribution to the weight change is about -0.0062.
Why are we ignoring the other weight changes?

Back-Propagation (xor)
(0,1) → 1 – 2nd example
i_h = -0.54, so o_h = sigmoid(-0.54) ≈ 0.368
i_o = (0.368)(-0.03) + 0.78 ≈ 0.769, so o_o = sigmoid(0.769) ≈ 0.683
etc.
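A quick numeric check of these two forward passes. The wiring is inferred from the numbers on the slides (w1 and w2 feed the hidden unit; w3 and w5 connect x1 and x2 directly to the output; w4 connects the hidden unit to the output), and the learning rate of 0.1 used to reproduce the -0.0062 figure is an assumption:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights from the slide.
w1, w2, w3, w4, w5 = 0.90, -0.54, 0.21, -0.03, 0.78

def forward(x1, x2):
    """Assumed wiring: x1,x2 -> hidden (w1,w2); x1 -> out (w3); hidden -> out (w4); x2 -> out (w5)."""
    o_h = sigmoid(w1 * x1 + w2 * x2)
    o_o = sigmoid(w3 * x1 + w4 * o_h + w5 * x2)
    return o_h, o_o

# 1st example: (0,0) -> 0
o_h, o_o = forward(0, 0)
print(o_h, o_o)                        # 0.5, sigmoid(-0.015) ~ 0.496
delta_o = (0 - o_o) * o_o * (1 - o_o)  # error times sigmoid derivative
print(0.1 * delta_o * o_h)             # ~ -0.0062 with an assumed learning rate of 0.1

# 2nd example: (0,1) -> 1
print(forward(0, 1))                   # o_h ~ 0.368, o_o ~ 0.683
```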

Back-Propagation (xor)
Initial performance =
After 100 iterations: w = (0.913, , 0.036, , 0.288), performance =
After 100K iterations: w = (15.75, , 7.146, , ), performance =
After 1M iterations: w = (21.38, , 9.798, , ), performance =

Some general artificial neural network (ANN) info
The entire network is a function g(inputs) = outputs.
–These functions frequently contain sigmoids.
–These functions are frequently differentiable.
–These functions have coefficients (weights).
Backpropagation networks are simply ways to tune the coefficients of a function so that it produces the desired output.

Function approximation
Consider fitting a line to data:
–Coefficients: slope and y-intercept
–Training data: some samples
–Use a least-squares fit
This is what an ANN does.
[Plot: sample points and a fitted line in the x-y plane.]
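For comparison with what the network does, here is a minimal closed-form least-squares line fit (plain Python; the sample data are made up for illustration):

```python
# Fit y = m*x + b by ordinary least squares to a handful of sample points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.9, 3.1, 5.0, 7.2, 8.9]   # roughly y = 2x + 1 with noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - m * mean_x
print(m, b)   # slope close to 2, intercept close to 1
```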

Function approximation
A function of two inputs…
Fit a smooth curve to the available data:
–Quadratic
–Cubic
–n-th order
–ANN!

Curve fitting
A neural network should be able to reproduce the input/output pairs from the training data.
You'd like it to be smooth (and well-behaved) in the voids between the training data.
There are risks of overfitting the data.

When using ANNs
Sometimes the output layer feeds back into the input layer – recurrent neural networks.
The backpropagation will tune the weights.
You determine the topology.
–Different topologies have different training outcomes (consider overfitting).
–Sometimes a genetic algorithm is used to explore the space of neural network topologies.

What is the Purpose of NN?
To create an Artificial Intelligence?
–Although not an invalid purpose, many people in the AI community think neural networks do not provide anything that cannot be obtained through other techniques.
–It is hard to unravel the "intelligence" behind why an ANN works.
To study how the human brain works?
–Ironically, those studying neural networks with this in mind are more likely to contribute to the previous purpose.

Some Brain Facts
Contains ~100,000,000,000 (10^11) neurons
The hippocampus CA3 region contains ~3,000,000 neurons
Each neuron is connected to ~10,000 other neurons
~10^15 connections in total!
Consumes ~20-30% of the body's energy
Contains about 2% of the body's mass