Advanced information retrieval Chapter 02: Modeling - Neural Network Model

Neural Network Model A neural network is an oversimplified representation of the neuron interconnections in the human brain: –nodes are processing units –edges are synaptic connections –the strength of a propagating signal is modelled by a weight assigned to each edge –the state of a node is defined by its activation level –depending on its activation level, a node might issue an output signal

Neural Networks –Complex learning systems recognized in animal brains –A single neuron has a simple structure –Interconnected sets of neurons perform complex learning tasks –The human brain has a vast number of synaptic connections –Artificial Neural Networks attempt to replicate the non-linear learning found in nature [Figure: biological neuron with dendrites, cell body, and axon]

Neural Networks (cont’d) –Dendrites gather inputs from other neurons and combine the information –A non-linear response is then generated when a threshold is reached –The signal is sent to other neurons via the axon –The artificial neuron model is similar –Data inputs (xi) are collected from upstream neurons and passed to a combination function (sigma)

Neural Networks (cont’d) –The activation function reads the combined input and produces a non-linear response (y) –The response is channeled downstream to other neurons What problems are Neural Networks suited to? –Quite robust with respect to noisy data –Can learn and work around erroneous data –Results are opaque to human interpretation –Often require long training times

Input and Output Encoding –Neural Networks require attribute values encoded to [0, 1] Numeric –Apply min-max normalization to continuous variables –Works well when Min and Max are known –Also assumes new data values occur within the Min-Max range –Values outside the range may be rejected or mapped to Min or Max
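A minimal sketch of the min-max encoding step described above; the function name and the choice to clamp out-of-range values (rather than reject them) are illustrative, not part of the original slides.

```python
def min_max_encode(value, min_val, max_val):
    """Scale a numeric attribute value into [0, 1] using min-max normalization.

    Values outside the [min_val, max_val] range seen during training are
    clamped to the nearest endpoint (one of the two options mentioned on
    the slide; the other option is to reject the record).
    """
    if max_val == min_val:
        return 0.0  # degenerate attribute with no spread: map everything to 0
    scaled = (value - min_val) / (max_val - min_val)
    return min(1.0, max(0.0, scaled))


print(min_max_encode(25, min_val=18, max_val=65))   # ~0.149
print(min_max_encode(70, min_val=18, max_val=65))   # out of range -> clamped to 1.0
```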

Input and Output Encoding (cont’d) Output –Neural Networks always return continuous values in [0, 1] –Many classification problems have two outcomes –Solution: use a threshold, established a priori, on the single output node to separate the classes –For example, the target variable is “leave” or “stay” –Threshold rule: “leave if output >= 0.67” –A single output node value of 0.72 classifies the record as “leave”
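A small illustration of the a-priori output threshold described above, using the slide's “leave”/“stay” example; the 0.67 cutoff and the 0.72 output value are the ones given on the slide, while the function name is illustrative.

```python
THRESHOLD = 0.67  # established a priori, as in the slide

def classify(network_output):
    """Map the single continuous output node value in [0, 1] to a class label."""
    return "leave" if network_output >= THRESHOLD else "stay"

print(classify(0.72))  # -> "leave", matching the slide's example
```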

Simple Example of a Neural Network –A Neural Network consists of a layered, feedforward, completely connected network of nodes –Feedforward restricts network flow to a single direction –Flow does not loop or cycle –The network is composed of two or more layers [Figure: input layer with Nodes 1, 2, 3; hidden layer with Nodes A, B; output layer with Node Z; connection weights W1A, W1B, W2A, W2B, W3A, W3B, WAZ, WBZ and bias weights W0A, W0B, W0Z]

Simple Example of a Neural Network (cont’d) –Most networks have Input, Hidden, and Output layers –A network may contain more than one hidden layer –The network is completely connected: each node in a given layer is connected to every node in the next layer –Every connection has a weight (Wij) associated with it –Weight values are randomly assigned between 0 and 1 by the algorithm –The number of input nodes depends on the number of predictors –The number of hidden and output nodes is configurable

Simple Example of a Neural Network (cont’d) –The combination function produces a linear combination of node inputs and connection weights, reduced to a single scalar value: net_j = Σ i Wij · xij –For node j, xij is the ith input –Wij is the weight associated with the ith input to node j –There are I + 1 inputs to node j –x1, x2, ..., xI are inputs from upstream nodes –x0 is a constant input with value 1.0 –Each node therefore has an extra constant input W0j · x0j = W0j

Simple Example of a Neural Network (cont’d) –With the input values and weights below, the scalar value computed for hidden layer Node A equals net_A = 0.5(1.0) + 0.6(0.4) + 0.8(0.2) + 0.6(0.7) = 1.32 –For Node A, net_A = 1.32 is the input to the activation function –Neurons “fire” in biological organisms –Signals are sent between neurons when the combination of inputs crosses a threshold

x0 = 1.0   W0A = 0.5   W0B = 0.7   W0Z = 0.5
x1 = 0.4   W1A = 0.6   W1B = 0.9   WAZ = 0.9
x2 = 0.2   W2A = 0.8   W2B = 0.8   WBZ = 0.9
x3 = 0.7   W3A = 0.6   W3B = 0.4
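The combination function for Node A, written out with the values from the table above; this reproduces the net_A = 1.32 quoted on the slide.

```python
# Inputs (x0 is the constant bias input) and the weights into hidden Node A.
x = [1.0, 0.4, 0.2, 0.7]          # x0, x1, x2, x3
w_A = [0.5, 0.6, 0.8, 0.6]        # W0A, W1A, W2A, W3A

# Combination function: linear combination of inputs and connection weights.
net_A = sum(w * xi for w, xi in zip(w_A, x))
print(round(net_A, 2))            # 0.5 + 0.24 + 0.16 + 0.42 = 1.32
```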

Simple Example of a Neural Network (cont’d) –The firing response is not necessarily linearly related to the increase in input stimulation –Neural Networks model this behavior using a non-linear activation function –The sigmoid function is most commonly used –In Node A, the sigmoid function takes net_A = 1.32 as input and produces output f(1.32) = 1 / (1 + e^(-1.32)) ≈ 0.7892
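The sigmoid activation applied to net_A; the ≈ 0.7892 value follows directly from the definition f(x) = 1 / (1 + e^(-x)).

```python
import math

def sigmoid(x):
    """Logistic (sigmoid) activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(1.32))  # ~0.7892, the output of Node A
```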

Simple Example of a Neural Network (cont’d) –Node A’s output travels along its connection to Node Z and becomes a component of net_Z –Before net_Z can be computed, the contribution from Node B is also required –Node Z then combines the outputs from Node A and Node B through net_Z = W0Z(1.0) + WAZ · f(net_A) + WBZ · f(net_B)

Simple Example of a Neural Network (cont’d) –The inputs to Node Z are not data attribute values –Rather, they are the sigmoid outputs of the upstream nodes –Applying the sigmoid to net_Z gives the value output by the Neural Network on this first pass –It represents the predicted value for the target variable, given the first observation
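A complete first forward pass through the small example network, using the weight table given earlier; the intermediate values for Node B and Node Z are not printed on the transcript but follow arithmetically from the same weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Inputs (x0 = 1.0 is the constant bias input) and all weights from the table.
x = [1.0, 0.4, 0.2, 0.7]
w_A = [0.5, 0.6, 0.8, 0.6]   # weights into hidden Node A
w_B = [0.7, 0.9, 0.8, 0.4]   # weights into hidden Node B
w_Z = [0.5, 0.9, 0.9]        # W0Z (bias), WAZ, WBZ into output Node Z

# Hidden layer: combination function followed by sigmoid activation.
out_A = sigmoid(sum(w * xi for w, xi in zip(w_A, x)))   # f(1.32) ~ 0.7892
out_B = sigmoid(sum(w * xi for w, xi in zip(w_B, x)))   # f(1.50) ~ 0.8176

# Output layer: Node Z combines the bias and the two hidden-node outputs.
net_Z = w_Z[0] * 1.0 + w_Z[1] * out_A + w_Z[2] * out_B
prediction = sigmoid(net_Z)
print(prediction)  # ~0.875, the network's predicted value on the first pass
```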

Sigmoid Activation Function –Sigmoid function combines nearly linear, curvilinear, and nearly constant behavior depending on input value –Function nearly linear for domain values -1 < x < 1 –Becomes curvilinear as values move away from center –At extreme values, f(x) is nearly constant –Moderate increments in x produce variable increase in f(x), depending on location of x –Sometimes called “Squashing Function” –Takes real-valued input and returns values [0, 1]
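A quick numeric look at the three regimes described above: roughly linear near the centre, curvilinear further out, and nearly constant at the extremes. The sample x values are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-10, -2, -1, 0, 1, 2, 10):
    print(f"f({x:>3}) = {sigmoid(x):.4f}")
# f(-10) ~ 0.0000 and f(10) ~ 1.0000: nearly constant at the extremes
# f(-1) ~ 0.2689, f(0) = 0.5000, f(1) ~ 0.7311: roughly linear near the centre
```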

Back-Propagation –Neural Networks are a supervised learning method –They require a target variable –Each observation passed through the network produces an output value –The output value is compared to the actual value of the target variable –(Actual - Output) = Error –Prediction error is analogous to the residuals in regression models –Most networks use the Sum of Squared Errors (SSE) to measure how well predictions fit the target values

Back-Propagation (cont’d) –Squared prediction errors are summed over all output nodes and all records in the data set –Model weights are constructed so as to minimize SSE –The weight values that minimize SSE are unknown in advance –The weights are therefore estimated from the data set
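A minimal sketch of the SSE criterion the weights are fitted to minimize; the second record's values are made up purely for illustration.

```python
def sum_of_squared_errors(actuals, predictions):
    """SSE over all records (and, implicitly, over all output nodes for a
    single-output network): the sum of (actual - predicted) squared."""
    return sum((a - p) ** 2 for a, p in zip(actuals, predictions))

# First pair uses the example values from these slides (actual 0.8, output 0.875);
# the second pair is illustrative only.
print(sum_of_squared_errors([0.8, 0.3], [0.875, 0.25]))  # 0.075^2 + 0.05^2 = 0.008125
```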

Back-Propagation Rules –Back-propagation percolates the prediction error for a record back through the network –Responsibility for the prediction error is partitioned and assigned to the various connections –The back-propagation rules follow Mitchell

Back-Propagation Rules (cont’d) –Error responsibility is computed using the partial derivative of the sigmoid function with respect to net_j –The values take one of two forms: for an output node, δj = outputj (1 - outputj)(actualj - outputj); for a hidden node, δj = outputj (1 - outputj) Σ downstream Wjk δk –These rules show why input values require normalization –Large input values xij would dominate the weight adjustment –Error propagation would be overwhelmed, and learning stifled
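The two forms of the error responsibility written out as code; these are the standard back-propagation rules for a sigmoid network (after Mitchell), not expressions shown explicitly on the transcript.

```python
def delta_output(output, actual):
    """Error responsibility for an output node: the sigmoid derivative,
    output * (1 - output), multiplied by the prediction error."""
    return output * (1.0 - output) * (actual - output)

def delta_hidden(output, downstream_weights, downstream_deltas):
    """Error responsibility for a hidden node: the sigmoid derivative multiplied
    by the weighted sum of the error responsibilities of all downstream nodes."""
    back = sum(w * d for w, d in zip(downstream_weights, downstream_deltas))
    return output * (1.0 - output) * back
```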

Example of Back-Propagation –Recall that the first pass through the network yielded output ≈ 0.875 –Assume the actual target value = 0.8, and the learning rate = 0.01 –Prediction error = 0.8 - 0.875 = -0.075 –Neural Networks use stochastic back-propagation –Weights are updated after each record is processed by the network –Adjusting the weights using back-propagation is shown next –Error responsibility for Node Z, an output node, is found first

Example of Back-Propagation (cont’d) –Now adjust the “constant” (bias) weight w0Z using the rules –Move upstream to Node A, a hidden layer node –The only node downstream from Node A is Node Z

Example of Back-Propagation (cont’d) –Adjust weight wAZ using the back-propagation rules –The connection weight between Node A and Node Z is adjusted downward slightly from its initial value of 0.9 –Next, Node B, also a hidden layer node –The only node downstream from Node B is Node Z

Example of Back-Propagation (cont’d) –Adjust weight wBZ using the back-propagation rules –The connection weight between Node B and Node Z is likewise adjusted downward slightly from its initial value of 0.9 –Application of the back-propagation rules continues similarly back to the input layer nodes –The weights {w1A, w2A, w3A, w0A} and {w1B, w2B, w3B, w0B} are updated by the same process

Example of Back-Propagation (cont’d) –Now all the network weights in the model have been updated –Each iteration is based on a single record from the data set Summary –The network calculates a predicted value for the target variable –The prediction error is derived –The prediction error is percolated back through the network –The weights are adjusted to generate a smaller prediction error –The process repeats record by record
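A sketch of the single stochastic back-propagation step summarized above, using the forward-pass values from the earlier weight table and the learning rate of 0.01 stated on the slide; the adjusted weights it prints are simply what those inputs imply, not values quoted on the transcript.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eta = 0.01                      # learning rate as stated on the slide
actual = 0.8                    # actual target value for this record

x = [1.0, 0.4, 0.2, 0.7]        # bias input plus the three data inputs
w_A = [0.5, 0.6, 0.8, 0.6]      # weights into Node A
w_B = [0.7, 0.9, 0.8, 0.4]      # weights into Node B
w_Z = [0.5, 0.9, 0.9]           # W0Z, WAZ, WBZ into Node Z

# Forward pass.
out_A = sigmoid(sum(w * xi for w, xi in zip(w_A, x)))
out_B = sigmoid(sum(w * xi for w, xi in zip(w_B, x)))
out_Z = sigmoid(w_Z[0] + w_Z[1] * out_A + w_Z[2] * out_B)

# Error responsibilities: output node first, then the hidden nodes.
delta_Z = out_Z * (1 - out_Z) * (actual - out_Z)
delta_A = out_A * (1 - out_A) * w_Z[1] * delta_Z
delta_B = out_B * (1 - out_B) * w_Z[2] * delta_Z

# Weight updates: w <- w + eta * delta * (input carried by that connection).
w_Z = [w_Z[0] + eta * delta_Z * 1.0,
       w_Z[1] + eta * delta_Z * out_A,
       w_Z[2] + eta * delta_Z * out_B]
w_A = [w + eta * delta_A * xi for w, xi in zip(w_A, x)]
w_B = [w + eta * delta_B * xi for w, xi in zip(w_B, x)]

print(w_Z)  # W0Z, WAZ, WBZ after one record; the process repeats record by record
```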

Termination Criteria –Many passes through the data set are performed –The weights are constantly adjusted to reduce the prediction error –When to terminate? –Stopping criterion may be computational (“clock”) time? –Short training times are likely to result in a poor model –Terminate when SSE reaches a threshold level? –Neural Networks are prone to overfitting –Memorizing patterns rather than generalizing –And …

Learning Rate –Recall that the learning rate (Greek “eta”) is a constant –It helps adjust the weights toward the global minimum of SSE Small Learning Rate –With a small learning rate, weight adjustments are small –The network takes an unacceptably long time to converge to a solution Large Learning Rate –Suppose the algorithm is close to the optimal solution –With a large learning rate, the network is likely to “overshoot” the optimal solution

Neural Network for IR: From the work by Wilkinson & Hingston, SIGIR’91 [Figure: three-layer network with query term nodes (ka, kb, kc), document term nodes (k1, ..., ka, kb, kc, ..., kt), and document nodes (d1, ..., dj, dj+1, ..., dN)]

Neural Network for IR A three-layer network Signals propagate across the network First level of propagation: –Query terms issue the first signals –These signals propagate across the network to reach the document nodes Second level of propagation: –Document nodes might themselves generate new signals which affect the document term nodes –Document term nodes might respond with new signals of their own

Quantifying Signal Propagation Normalize signal strength (MAX = 1) Query terms emit an initial signal equal to 1 Weight associated with an edge from a query term node ki to a document term node ki: Wiq = wiq / sqrt( Σi wiq² ) Weight associated with an edge from a document term node ki to a document node dj: Wij = wij / sqrt( Σi wij² )

Quantifying Signal Propagation After the first level of signal propagation, the activation level of a document node dj is given by: Σi Wiq · Wij = ( Σi wiq · wij ) / ( sqrt( Σi wiq² ) · sqrt( Σi wij² ) ) –which is exactly the ranking of the Vector model New signals might be exchanged among document term nodes and document nodes in a process analogous to a feedback cycle A minimum threshold should be enforced to avoid spurious signal generation
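A small sketch of the first level of signal propagation described above, under the assumption that wiq and wij are the usual term weights of the vector model; the normalized edge weights and the resulting document-node activations reproduce the cosine (Vector model) ranking. The term and document weights in the example are purely illustrative.

```python
import math

def normalize(weights):
    """Normalize a weight vector so signal strength is bounded:
    Wi = wi / sqrt(sum_i wi^2)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm if norm else 0.0 for w in weights]

def first_level_activation(query_weights, doc_weights_by_doc):
    """Activation of each document node after the first propagation:
    sum_i Wiq * Wij, i.e. the vector-model (cosine) ranking."""
    Wq = normalize(query_weights)
    scores = []
    for doc_weights in doc_weights_by_doc:
        Wd = normalize(doc_weights)
        scores.append(sum(q * d for q, d in zip(Wq, Wd)))
    return scores

# Toy example: 3 index terms, one query, two documents.
query = [1.0, 0.0, 1.0]
docs = [[0.5, 0.2, 0.8],   # d1
        [0.0, 0.9, 0.1]]   # d2
print(first_level_activation(query, docs))  # d1 ranks above d2
```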

Conclusions The model provides an interesting formulation of the IR problem The model has not been tested extensively It is not clear what improvements the model might provide