
1 CS-424 Gregory Dudek Today’s Lecture Administrative Details Learning –Decision trees: cleanup & details –Belief nets –Sub-symbolic learning Neural networks

2 CS-424 Gregory Dudek Administrivia Assignment 3 is now available. Due in one week. The game for the final project has been defined. Its description can be found via the course home page. Examine the game now, try it, and think about what good strategies might be. Note: the normal late policy will not apply to the project. –You **must** submit the electronic (executable) version on time, or it may not be evaluated (i.e. you get zero)! –It must run on Linux. Be certain to compile and test it on one of the Linux machines in the lab well before it is due. If you are developing on another platform, test regularly on Linux during development.

3 CS-424 Gregory Dudek ID3 (more…) Last class, we discussed using entropy to select a question when building a decision tree. This idea was first developed by Quinlan (1979) in the ID3 system and later improved, resulting in C4.5. Recap: –The entropy for a classification into positive and negative examples with proportions p+ and p- is I(p+, p-). –Consider the information gain per attribute A. For each subtree A_i, the distribution of cases on that subtree contributes I(p_i+, p_i-) bits. To fully classify all sub-cases, we need Remainder(A) = Σ_i weight(i) · I(p_i+, p_i-). Thus, the information gain is what's left: Gain(A) = I(p+, p-) - Remainder(A).
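A minimal sketch (not from the slides) of how the entropy and remainder formulas above could be computed; the counts in the example at the bottom are purely illustrative:

```python
import math

def entropy(pos, neg):
    """I(p+, p-): entropy in bits of a positive/negative split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(pos, neg, subsets):
    """Gain(A) = I(p+, p-) - sum_i weight(i) * I(p_i+, p_i-).

    `subsets` is a list of (pos_i, neg_i) counts, one per value of attribute A.
    """
    total = pos + neg
    remainder = sum(((p + n) / total) * entropy(p, n) for p, n in subsets)
    return entropy(pos, neg) - remainder

# Hypothetical example: 9 positive / 5 negative cases, split by an attribute
# whose three values produce the subsets (2,3), (4,0), (3,2).
print(information_gain(9, 5, [(2, 3), (4, 0), (3, 2)]))  # ~0.247 bits
```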

4 CS-424 Gregory Dudek Final thoughts on entropy... Provoked by the seminar yesterday. Idea: –Entropy tells you about unpredictability, or randomness. –When selecting a question, the one with the highest entropy carries the most information with respect to what you already knew, because its answer is hardest to predict. –When asking how much we know about something, high entropy is bad. –Consider the PDF of a robot's position estimate: more entropy means more uncertainty.

5 CS-424 Gregory Dudek Training & testing When constructing a learning system, we want it to generalize to new examples (already discussed ad nauseam). How can we tell if it's working? –Look at how well we do on our training data. But… what if we just learned a "quirk" of the data? Overfitting? Bad features? –(tank classification example; table lookup) –Look at how we do on a set of examples never used for any training: a "test set". But… what if we can't afford the data? –Cross validation (leave-one-out): for learner L on training set X, e(L; X) = Σ_{i in X} error(L(X - {i}) on case i)^2.
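A sketch of the leave-one-out cross-validation estimate above, assuming hypothetical `train` and `error` callables supplied by the caller:

```python
def leave_one_out_error(train, dataset, error):
    """Leave-one-out cross-validation.

    train(cases)       -> a fitted model (the learner L)
    dataset            -> list of cases (the training set X)
    error(model, case) -> real-valued error of the model on one held-out case

    Returns the sum of squared errors over all held-out cases, matching
    e(L; X) = sum_i error(L(X - {i}) on case i)^2 from the slide.
    """
    total = 0.0
    for i, held_out in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]   # X - {i}
        model = train(rest)                    # L trained without case i
        total += error(model, held_out) ** 2
    return total
```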

6 CS-424 Gregory Dudek Simple functions? Is there a fixed circuit network topology that can be used to represent a family of functions? Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed topology networks.

7 CS-424 Gregory Dudek Belief networks (ch. 15) - briefly We will cover only R&N Sections 15.1 & 15.2 (briefly), and then segue to chapter 19. You should read 15.3. This will be cursory coverage only. A belief net (a.k.a. a Bayes net) is a formalism for describing probabilistic relationships. It is a graph (in fact a DAG) G(V,E). Nodes are random variables. Directed edges indicate that a node has direct influence on another node. Each node has an associated conditional probability table quantifying the effects of its "parents". No directed cycles.
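A small illustrative sketch (not from the slides; all probabilities are made up) of how a belief net's conditional probability tables factor the joint distribution along the DAG, and how a query is answered by summing out unobserved variables:

```python
# Tiny belief net: Rain -> WetGrass <- Sprinkler.
P_rain = 0.2                      # P(Rain = true)
P_sprinkler = 0.1                 # P(Sprinkler = true)

# CPT for WetGrass given (Rain, Sprinkler)
P_wet_given = {
    (True,  True):  0.99,
    (True,  False): 0.90,
    (False, True):  0.85,
    (False, False): 0.01,
}

def joint(rain, sprinkler, wet):
    """P(Rain, Sprinkler, WetGrass) factored along the DAG:
    P(R) * P(S) * P(W | R, S)."""
    p = P_rain if rain else 1 - P_rain
    p *= P_sprinkler if sprinkler else 1 - P_sprinkler
    p_wet = P_wet_given[(rain, sprinkler)]
    p *= p_wet if wet else 1 - p_wet
    return p

# Query: P(Rain = true | WetGrass = true), summing out the sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(num / den)   # ~0.71 with these made-up numbers
```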

8 CS-424 Gregory Dudek Why? Objective: –Compute probabilities of variables of interest, query variables –Given observations of specific phenomena in the world, evidence variables. The net you get depends on how you construct it, not just the problem and probabilities. Seek compactness (fewer links, tighter clusters): called locally structured nets.

9 CS-424 Gregory Dudek See overheads...

10 CS-424 Gregory Dudek Issues Where do the probabilities come from? –They can be learned or inferred from data. Where does the causal structure come from (the topology)? –It’s (sometimes) very hard to learn. –Problem: lots of alternative topologies are possible. What’s really cause and what’s effect? Did it really rain because I brought my umbrella? Can a computer infer this (or the opposite) just from weather data? Both these topics are current research areas. Not in text

11 CS-424 Gregory Dudek Neural Networks? Artificial Neural Nets a.k.a. Connectionist Nets (connectionist learning) a.k.a. Sub-symbolic learning a.k.a. Perceptron learning (a special case)

12 CS-424 Gregory Dudek Networks that model the brain? Note there is an interesting connection to Bayes nets; it isn't considered in the book. –Something to reflect on. Idea: model intelligence without "jumping ahead" to symbolic representations. Related to the earliest work on cybernetics.

13 CS-424 Gregory Dudek The idealized neuron Artificial neural networks come in several "flavors". –Most are based on a simplified model of a neuron: a set of (many) inputs and one output. The output is a function of the sum of the inputs. –Typical functions: Weighted sum Threshold Gaussian

14 CS-424 Gregory Dudek Why neural nets? Motives: –We wish to create systems with abilities akin to those of the human mind. The mind is usually assumed to be a direct consequence of the structure of the brain. –Let's mimic the structure of the brain! –By using simple computing elements, we obtain a system that might scale up easily to parallel hardware. –Avoids (or solves?) the key unresolved problem of how to get from the "signal domain" to symbolic representations. –Fault tolerance Not in text

15 CS-424 Gregory Dudek Not in text

16 CS-424 Gregory Dudek Not in text

17 CS-424 Gregory Dudek Real and fake neurons Signals in neurons are coded by "spike rate". In ANNs, inputs can be either: –0 or 1 (binary) –[0,1] –[-1,1] –R (real) Each input I_i has an associated real-valued weight w_i. Learning happens by changing the weights at synapses.

18 CS-424 Gregory Dudek Not in text

19 CS-424 Gregory Dudek Brains The brain seems divided into functional areas These are often seen as analogous to modules in a software system. Why would it be like this? (2 possible answers) –Evolution: incremental improvement easier in a modular system. –Advantage of combining complementary solutions. –It isn’t!

20 CS-424 Gregory Dudek Inductive bias? Where’s the inductive bias? –In the topology and architecture of the network. –In the learning rules. –In the input and output representation. –In the initial weights. Not in text

21 CS-424 Gregory Dudek Simple neural models The oldest ANN model is the McCulloch-Pitts neuron [1943]. –Inputs are +1 or -1 with real-valued weights. –If the sum of the weighted inputs is > 0, then the neuron "fires" and gives +1 as output. –Showed you can compute logical functions. –The relation to learning was proposed (later!) by Donald Hebb [1949]. Perceptron model [Rosenblatt, 1958]. –Single-layer network with the same kind of neuron, firing when the input is above a threshold: Σ_i x_i w_i > t. –Added a learning rule to allow weight selection. Not in text
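A minimal sketch of a threshold unit in the McCulloch-Pitts spirit, computing logical functions over {-1, +1} inputs; the weights and thresholds below are hand-picked illustrative values, not from the slides:

```python
def threshold_unit(inputs, weights, threshold=0.0):
    """Fire (+1) iff the weighted sum of the inputs exceeds the threshold,
    otherwise output -1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else -1

# Logical AND and OR over inputs in {-1, +1}; these particular weights and
# thresholds are one choice among many that work.
AND = lambda a, b: threshold_unit([a, b], [1.0, 1.0], threshold=1.0)
OR  = lambda a, b: threshold_unit([a, b], [1.0, 1.0], threshold=-1.0)

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, AND(a, b), OR(a, b))
```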

22 CS-424 Gregory Dudek Perceptron nets

23 CS-424 Gregory Dudek Perceptron learning Perceptron learning: –Have a set of training examples (TS) encoded as input values (i.e. in the form of binary vectors). –Have a set of desired output values associated with these inputs. This is supervised learning. –Problem: how to adjust the weights to make the actual outputs match the training examples. NOTE: we do not allow the topology to change! [You should be thinking of a question here.] Intuition: when a perceptron makes a mistake, its weights are wrong. –Modify them to make the output bigger or smaller, as desired.

24 CS-424 Gregory Dudek Learning algorithm Desired output T_i, actual output O_i. Weight update formula (weight from unit j to unit i): W_j,i = W_j,i + k * x_j * (T_i - O_i), where k is the learning rate. If the examples can be learned (encoded), then the perceptron learning rule will find the weights. –How? Gradient descent. The key thing to prove is the absence of local minima.
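A sketch of the weight update rule above for a single output unit; the appended bias input, the learning rate, and the OR training set are illustrative assumptions, not from the slides:

```python
def train_perceptron(examples, epochs=100, k=0.1):
    """Perceptron learning rule: w_j <- w_j + k * x_j * (T - O).

    `examples` is a list of (inputs, target) pairs with target in {0, 1};
    a bias weight is handled by appending a constant 1 input.
    """
    n = len(examples[0][0])
    w = [0.0] * (n + 1)                       # last weight is the bias
    for _ in range(epochs):
        for inputs, target in examples:
            x = list(inputs) + [1.0]          # append bias input
            output = 1 if sum(xi * wi for xi, wi in zip(x, w)) > 0 else 0
            for j in range(len(w)):           # no change when output is correct
                w[j] += k * x[j] * (target - output)
    return w

# Learn logical OR (linearly separable, so the rule converges).
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(examples))
```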

25 CS-424 Gregory Dudek Perceptrons: what can they learn? Only linearly separable functions [Minsky & Papert, 1969]. In N dimensions, the decision boundary is a hyperplane (of dimension N-1).
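A quick illustrative check (not from the slides) of what linear inseparability means: no single threshold unit reproduces XOR, so a coarse brute-force search over weights and thresholds finds nothing:

```python
from itertools import product

# XOR truth table over {0, 1} inputs.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def separates(w1, w2, t):
    """Does the single threshold unit  x1*w1 + x2*w2 > t  reproduce XOR?"""
    return all((x1 * w1 + x2 * w2 > t) == bool(y) for (x1, x2), y in XOR.items())

# Search a coarse grid; no setting works, because XOR is not linearly separable.
grid = [i / 4 for i in range(-8, 9)]
print(any(separates(w1, w2, t) for w1, w2, t in product(grid, repeat=3)))  # False
```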

26 CS-424 Gregory Dudek More general networks Generalize in 3 ways: –Allow continuous output values [0,1] –Allow multiple layers. This is key to learning a larger class of functions. –Allow a more complicated function than thresholded summation [why??] Generalize the learning rule to accommodate this: let’s see how it works.

27 CS-424 Gregory Dudek The threshold The key variant: –Change the threshold into a differentiable function. –The sigmoid, known as a "soft non-linearity" (silly). M = Σ_i x_i w_i, O = 1 / (1 + e^(-kM))
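A minimal sketch of the sigmoid unit defined above, plus a hand-wired two-layer network (all weights and biases are made-up illustrative values) showing how an extra layer lets the net represent XOR, which a single perceptron cannot:

```python
import math

def sigmoid(m, k=1.0):
    """O = 1 / (1 + e^(-k*M)) -- the differentiable 'soft' threshold."""
    return 1.0 / (1.0 + math.exp(-k * m))

def unit(inputs, weights, bias=0.0, k=1.0):
    """One sigmoid unit: weighted sum M = sum_i x_i * w_i (+ bias), then squash."""
    m = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(m, k)

def xor_net(x1, x2):
    """Two-layer net approximating XOR with hand-picked weights."""
    h1 = unit([x1, x2], [10, 10], bias=-5)     # roughly OR
    h2 = unit([x1, x2], [10, 10], bias=-15)    # roughly AND
    return unit([h1, h2], [10, -10], bias=-5)  # roughly OR AND (NOT AND)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b), 3))   # ~0, ~1, ~1, ~0
```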

