Learning 3: Bayesian Networks, Reinforcement Learning, Genetic Algorithms (based on material from Ray Mooney, Daphne Koller, Kevin Murphy)


Project Teams: 2-3. Two components: Agent/World Program, Learning Agent.

Agent/World Program Implements two functions: State x Action -> Reward and State x Action -> State'. Representation: State – 8 bits (may think of this as 8 binary features, 4 features with 4 possible values, etc.); Action – 4 possible actions; Reward – integer in the range -10…10.
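A minimal sketch of what such an agent/world program might look like in Python; the class name, method names, and the random reward rule are illustrative assumptions, not part of the assignment spec:

    import random

    class AgentWorld:
        """Hypothetical agent/world program matching the interface above:
        8-bit states, 4 actions, integer rewards in -10..10."""

        NUM_ACTIONS = 4

        def reward(self, state, action):
            # State x Action -> Reward (here: an arbitrary illustrative rule)
            return random.randint(-10, 10)

        def next_state(self, state, action):
            # State x Action -> State' (here: flip the bit indexed by the action)
            return state ^ (1 << action)

    world = AgentWorld()
    s = 0b10110010                 # an 8-bit state
    r = world.reward(s, 2)         # reward for taking action 2 in s
    s2 = world.next_state(s, 2)    # resulting state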

Example: Boring_Agent. State: mood (happy, sad, mad, bored); physical (hungry, sleepy); personality (optimist, pessimist). Action: smile, hit, tell-joke, tickle. State x Action -> Reward: s x a -> 0. State x Action -> State: s x a -> bored = T, all other features same as s.
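As a sketch, the Boring_Agent spec above could be written against the hypothetical interface shown earlier; the field names and encoding are assumptions:

    from dataclasses import dataclass, replace

    @dataclass
    class State:
        mood: str         # happy, sad, mad, bored
        physical: str     # hungry, sleepy
        personality: str  # optimist, pessimist

    ACTIONS = ["smile", "hit", "tell-joke", "tickle"]

    class BoringAgent:
        def reward(self, state, action):
            # Every state/action pair yields reward 0.
            return 0

        def next_state(self, state, action):
            # The agent always ends up bored; everything else stays the same.
            return replace(state, mood="bored")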

Example: Agent_with_Personality. State: mood (happy, sad, mad, bored); physical (hungry, sleepy); personality (optimist, pessimist). Action: smile, hit, tell-joke, tickle. State x Action -> Reward: mood = ?, physical <> sleepy, smile -> 5; mood = ?, physical <> sleepy, tell-joke -> 10 with prob. 0.8, -10 with prob. 0.2; etc. State x Action -> State: s x a -> mood = happy if reward is positive; s x a -> mood = mad if action is hit; etc.

Example: Robot Navigation. State: location. Action: forward, back, left, right. State x Action -> Reward: define the rewards of states in your grid. State x Action -> State: defined by the movements.

Learning Agent: calls the Agent Program to get a training set; learns a function; calls the Agent Program to get an evaluation set; computes an optimal set of actions; calls the Agent Program to evaluate the set of actions.

Schedule: Thursday, Dec. 4 – in class, hand in a 1-page description of your agent and some notes on your learning approach. Friday, Dec. 5 – electronically submit your agent/world program. Thursday, Dec. 12 – submit your learning agent.

Learning, cont.

Learning Bayesian networks: an Inducer takes Data + Prior information and produces a network. [Slide figure: an example network over variables B, E, A, R, C with a conditional probability table for P(A | E, B).] We won't cover this in this class…

Naïve Bayes (aka Idiot Bayes): a particularly simple BN; makes overly strong independence assumptions, but works surprisingly well in practice…

Bayesian Diagnosis: suppose we want to make a diagnosis D and there are n possible mutually exclusive diagnoses d_1, …, d_n; suppose there are m boolean symptoms, E_1, …, E_m. How do we make a diagnosis? We need P(d_i | e_1, …, e_m) for each diagnosis, which by Bayes' rule requires P(d_i) and P(e_1, …, e_m | d_i).

Naïve Bayes Assumption: assume each piece of evidence (symptom) is independent given the diagnosis. Then what is the structure of the corresponding BN?
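Under this assumption the diagnosis posterior factors into per-symptom terms. In LaTeX (standard notation, not written out on the slide):

    P(d_i \mid e_1, \ldots, e_m) \;\propto\; P(d_i) \prod_{k=1}^{m} P(e_k \mid d_i)

so the corresponding BN has the diagnosis node as the single parent of every symptom node.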

Naïve Bayes Example: possible diagnoses: Allergy, Cold, and OK; possible symptoms: Sneeze, Cough, and Fever. [Slide table: P(d), P(sneeze|d), P(cough|d), and P(fever|d) for each of Well, Cold, and Allergy.] My symptoms are: sneeze & cough; what is the diagnosis?

Learning the Probabilities (aka parameter estimation): we need P(d_i) – the prior – and P(e_k | d_i) – the conditional probabilities; use training data to estimate them.

Maximum Likelihood Estimate (MLE): use frequencies in the training set to estimate P(d_i) = n_{d_i} / N and P(e_k | d_i) = n_{e_k, d_i} / n_{d_i}, where n_x is shorthand for the count of event x in the training set and N is the total number of training examples.

Example training set (some cell values are blank as extracted):

D        Sneeze  Cough  Fever
Allergy  yes     no
Well     yes     no
Allergy  yes     no     yes
Allergy  yes     no
Cold     yes
Allergy  yes     no
Well     no
Well     no
Allergy  no
Allergy  yes     no

What is: P(Allergy)? P(Sneeze | Allergy)? P(Cough | Allergy)?

Laplace Estimate (smoothing): use smoothing to eliminate zero counts, e.g. P(d_i) = (n_{d_i} + 1) / (N + n), where n is the number of possible values for d, and P(e_k | d_i) = (n_{e_k, d_i} + 1) / (n_{d_i} + 2), since e is assumed to have 2 possible values; many other smoothing schemes exist…
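A minimal sketch of Laplace-smoothed estimation and diagnosis in Python; the tiny data set, variable names, and function names are illustrative assumptions:

    from collections import Counter, defaultdict
    from math import prod

    # Hypothetical training data: (diagnosis, {symptom: True/False}) -- made-up values
    data = [
        ("Allergy", {"sneeze": True,  "cough": False, "fever": False}),
        ("Well",    {"sneeze": False, "cough": False, "fever": False}),
        ("Cold",    {"sneeze": True,  "cough": True,  "fever": True}),
    ]
    diagnoses = sorted({d for d, _ in data})
    N = len(data)
    n_d = Counter(d for d, _ in data)
    n_ed = defaultdict(int)                  # count of (symptom present, diagnosis)
    for d, obs in data:
        for e, present in obs.items():
            if present:
                n_ed[(e, d)] += 1

    def prior(d):                            # Laplace: (n_d + 1) / (N + number of diagnoses)
        return (n_d[d] + 1) / (N + len(diagnoses))

    def likelihood(e, value, d):             # Laplace: (n_{e,d} + 1) / (n_d + 2), boolean symptom
        p_true = (n_ed[(e, d)] + 1) / (n_d[d] + 2)
        return p_true if value else 1.0 - p_true

    def diagnose(observed):                  # pick the diagnosis with the highest posterior score
        return max(diagnoses,
                   key=lambda d: prior(d) * prod(likelihood(e, v, d)
                                                 for e, v in observed.items()))

    print(diagnose({"sneeze": True, "cough": True}))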

Comments: generally works well despite the blanket assumption of independence; experiments show it is competitive with decision trees on some well-known test sets (UCI); handles noisy data.

Learning more complex Bayesian networks: two subproblems. Learning structure: combinatorial search over the space of networks. Learning parameter values: easy if all of the variables are observed in the training set; harder if there are 'hidden variables'.

Clustering Aka ‘unsupervised’ learning Find natural partitions of the data

Reinforcement Learning: supervised learning is the simplest and best-studied type of learning. Another type of learning task is learning behaviors when we don't have a teacher to tell us how. The agent has a task to perform; it takes some actions in the world; at some later point it gets feedback telling it how well it did on performing the task. The agent performs the same task over and over again. It gets carrots for good behavior and sticks for bad behavior. Called reinforcement learning because the agent gets positive reinforcement for tasks done well and negative reinforcement for tasks done poorly.

Reinforcement Learning The problem of getting an agent to act in the world so as to maximize its rewards. Consider teaching a dog a new trick: you cannot tell it what to do, but you can reward/punish it if it does the right/wrong thing. It has to figure out what it did that made it get the reward/punishment, which is known as the credit assignment problem. We can use a similar method to train computers to do many tasks, such as playing backgammon or chess, scheduling jobs, and controlling robot limbs.

Reinforcement Learning examples: for blackjack, for robot motion, for a controller.

Formalization: we have a state space S; we have a set of actions a_1, …, a_k; we want to learn which action to take at every state in the space. At the end of a trial, we get some reward, positive or negative. We want the agent to learn how to behave in the environment: a mapping from states to actions. Example: ALVINN. State: configuration of the car; learn a steering action for each state.

Reactive Agent Algorithm (accessible or observable state):
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← choose action (given s)
  Perform a

Policy (Reactive/Closed-Loop Strategy): a policy π is a complete mapping from states to actions.

Reactive Agent Algorithm (with policy π):
Repeat:
  s ← sensed state
  If s is terminal then exit
  a ← π(s)
  Perform a

Approaches: learn the policy directly (a function mapping from states to actions), or learn utility values for states (the value function).

Value Function An agent knows what state it is in and it has a number of actions it can perform in each state. Initially it doesn't know the value of any of the states. If the outcome of performing an action at a state is deterministic then the agent can update the utility value U() of a state whenever it makes a transition from one state to another (by taking what it believes to be the best possible action and thus maximizing): U(oldstate) = reward + U(newstate) The agent learns the utility values of states as it works its way through the state space.

Exploration The agent may occasionally choose to explore suboptimal moves in the hopes of finding better outcomes. Only by visiting all the states frequently enough can we guarantee learning the true values of all the states. A discount factor is often introduced to prevent utility values from diverging and to promote the use of shorter (more efficient) sequences of actions to attain rewards. The update equation using a discount factor gamma is: U(oldstate) = reward + gamma * U(newstate) Normally gamma is set between 0 and 1.

Q-Learning: augments value iteration by maintaining a utility value Q(s,a) for every action at every state. The utility of a state, U(s) or Q(s), is simply the maximum Q value over all the possible actions at that state.
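In symbols (standard notation, not written out on the slide), the state utility and the update used in the algorithm below are:

    U(s) = \max_a Q(s, a)
    Q(s, a) \leftarrow Q(s, a) + \alpha \big( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big)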

Q-Learning algorithm:

  for each state s, for each action a: Q(s,a) = 0
  s = current state
  do forever:
    a = select an action
    do action a
    r = reward from doing a
    t = resulting state from doing a
    Q(s,a) += alpha * (r + gamma * Q(t) - Q(s,a))    (where Q(t) = max over actions a' of Q(t,a'))
    s = t

Notice that a learning coefficient, alpha, has been introduced into the update equation. Normally alpha is set to a small positive constant less than 1.
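A minimal runnable version of this loop in Python; the environment function env_step(s, a), the state/action spaces, the constants, and the step cutoff are illustrative assumptions:

    import random
    from collections import defaultdict

    ACTIONS = [0, 1, 2, 3]
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

    Q = defaultdict(float)                   # Q[(state, action)], initialized to 0

    def env_step(s, a):
        """Hypothetical world: returns (reward, next_state). Replace with a real environment."""
        return random.randint(-10, 10), (s + a) % 16

    def select_action(s):
        # epsilon-greedy: mostly exploit, occasionally explore (see 'Exploration policy' below)
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(s, a)])

    s = 0
    for _ in range(10000):                   # "do forever", cut off for the sketch
        a = select_action(s)
        r, t = env_step(s, a)
        best_next = max(Q[(t, a2)] for a2 in ACTIONS)   # Q(t) = max_a' Q(t, a')
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = t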

Selecting an Action: simply choose the action with the highest expected utility? Problem: an action has two effects: it gains reward on the current sequence, and it yields information that is used in learning for future sequences. Trade-off: immediate good versus long-term well-being; getting stuck in a rut versus jumping off a cliff just because you've never done it before…

Exploration policy: wacky approach: act randomly in hopes of eventually exploring the entire environment. Greedy approach: act to maximize utility using the current estimate. Need to find some balance: act more wacky when the agent has little idea of the environment and more greedy when the model is close to correct. Example: one-armed bandits…
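One common way to get that balance (not named on the slide) is an exploration rate that decays with experience; a minimal sketch, where the schedule, constants, and the assumption that Q is a dict keyed by (state, action) are all illustrative:

    import random

    def select_action(Q, s, actions, step, eps_start=1.0, eps_end=0.05, decay=0.001):
        # Exploration probability shrinks as the agent gains experience:
        # wacky (random) early on, greedy (max-Q) later.
        epsilon = eps_end + (eps_start - eps_end) * (1.0 / (1.0 + decay * step))
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])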

RL Summary: an active area of research both in OR and AI; there are several more sophisticated algorithms that we have not discussed; applicable to game playing, robot controllers, and other domains.

Genetic Algorithms: use an evolution analogy to search for 'successful' individuals. Individuals may be a policy (as in RL), a computer program (in this case called genetic programming), a decision tree, a neural net, etc. Success is measured in terms of a fitness function. Start with a pool of individuals and use selection and reproduction to evolve the pool.

Basic Algorithm
1. [Start] Generate a random population of n individuals (suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each individual
3. [New population] Create a new population by repeating the following steps until the new population is complete:
   4. [Selection] Select two parents from the population according to their fitness (the better the fitness, the bigger the chance of being selected)
   5. [Crossover] With a crossover probability, cross over the parents to form new offspring (children); if no crossover is performed, the offspring are exact copies of the parents
   6. [Mutation] With a mutation probability, mutate the new offspring at each locus (position in the chromosome)
   7. [Accepting] Place the new offspring in the new population
8. [Loop] Go to step 2
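A minimal sketch of this loop in Python for a toy bit-string problem (maximize the number of 1 bits); the problem, fitness function, and parameter values are illustrative assumptions:

    import random

    GENOME_LEN, POP_SIZE, P_CROSS, P_MUT = 20, 30, 0.7, 0.01

    def fitness(ind):                        # [Fitness] toy objective: count of 1 bits
        return sum(ind)

    def select(pop):                         # [Selection] fitness-proportionate (roulette wheel)
        return random.choices(pop, weights=[fitness(i) + 1 for i in pop], k=1)[0]

    def crossover(p1, p2):                   # [Crossover] single-point, with probability P_CROSS
        if random.random() < P_CROSS:
            cut = random.randint(1, GENOME_LEN - 1)
            return p1[:cut] + p2[cut:]
        return p1[:]

    def mutate(ind):                         # [Mutation] flip each locus with probability P_MUT
        return [b ^ 1 if random.random() < P_MUT else b for b in ind]

    # [Start] random population
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    for generation in range(100):            # [Loop]
        # [New population] build the next generation
        pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP_SIZE)]
    print(max(fitness(i) for i in pop))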

GA Issues: what is the fitness function? How is an individual represented? How are individuals selected? How do individuals reproduce?

GA summary: easy to apply to a wide range of problems; results are good on some problems and not so hot on others. "Neural networks are the second best way of doing just about anything… and genetic algorithms are the third."