Announcements
Homework 3 due today (grace period through Friday)
Midterm Friday!

Fundamental question for this lecture (and really this whole class!): How do you turn a real-world problem into an AI solution?

AI – Agents and Environments
Much (though not all!) of AI is concerned with agents operating in environments.
Agent – an entity that perceives and acts
Environment – the problem setting

Fleshing it out
Performance – measuring desired outcomes
Environment – what populates the task's world?
Actuators – what can the agent act with?
Sensors – how can the agent perceive the world?

What makes an Agent?
Agent – an entity that perceives its environment through sensors, and acts on it with actuators.
[Diagram: the agent sits in an environment; percepts flow from the environment through the agent's sensors, and actions flow back out through its actuators.]
Percepts are constrained by Sensors + Environment
Actions are constrained by Actuators + Environment
Agent Function – how does it choose the action?
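
To make the abstraction concrete, here is a minimal Python sketch of this agent structure; the class and method names are illustrative assumptions, not from any particular library.

class Agent:
    """Maps the percept sequence delivered by sensors to an action."""

    def __init__(self):
        self.percept_history = []  # complete sequence of percepts so far

    def agent_function(self, percept_history):
        """The agent function: choose an action given all percepts so far."""
        raise NotImplementedError

    def step(self, percept):
        # Sensors deliver a percept; the chosen action goes to the actuators.
        self.percept_history.append(percept)
        return self.agent_function(self.percept_history)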

What have we done so far? State-based search
Determining an optimal sequence of actions to reach the goal
Choose actions using knowledge about the goal
Assumes a deterministic problem with known rules
Single agent only

Uninformed search: BFS/DFS/UCS
Breadth-first search
Good: optimal, works well when many options, but not many actions required
Bad: assumes all actions have equal cost
Depth-first search
Good: memory-efficient, works well when few options, but lots of actions required
Bad: not optimal, can run infinitely, assumes all actions have equal cost
Uniform-cost search
Good: optimal, handles variable-cost actions
Bad: explores all options, no information about goal location
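
As a reminder of the mechanics, here is a minimal breadth-first search sketch; DFS and UCS differ mainly in the frontier (a stack for DFS, a priority queue ordered by path cost for UCS). It assumes hashable states and a successors(state) function yielding (action, next_state) pairs; all names here are illustrative.

from collections import deque

def breadth_first_search(start, is_goal, successors):
    """Return a list of actions reaching a goal, or None. Optimal only
    when every action costs the same (the weakness noted above)."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, path + [action]))
    return None  # no goal reachable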

Informed search: A*
A* uses both backward costs and (estimates of) forward costs
A* is optimal with admissible / consistent heuristics
Heuristic design is key: often use relaxed problems
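
A minimal A* sketch along the same lines, assuming successors(state) yields (action, next_state, step_cost) triples and heuristic(state) is admissible; the interface is an assumption for illustration.

import heapq
import itertools

def a_star_search(start, is_goal, successors, heuristic):
    """Expand by f(n) = g(n) (backward cost so far) + h(n) (estimated
    forward cost); optimal when the heuristic is admissible/consistent."""
    counter = itertools.count()  # tie-breaker so states are never compared
    frontier = [(heuristic(start), next(counter), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for action, next_state, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(next_state, float('inf')):
                best_g[next_state] = g2
                heapq.heappush(frontier, (g2 + heuristic(next_state),
                                          next(counter), g2,
                                          next_state, path + [action]))
    return None  # no path to a goal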

What have we done so far? Adversarial state-based search
Determining the best next action, given what opponents will do
Choose actions using knowledge about the goal
Assumes a deterministic problem with known rules
Multiple agents, but in a zero-sum competitive game

Adversarial Search (Minimax)
Minimax search:
A state-space search tree
Players alternate turns
Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary
Use alpha-beta pruning for efficiency
Can have multiple minimizing opponents
Choose actions that yield the best subtrees!
[Figure: a small game tree. Terminal values (8, 2, 5, 6) are part of the game; the min nodes' values (2, 5) and the max node's value (5) are computed recursively.]
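
A minimal sketch of minimax with alpha-beta pruning for a two-player zero-sum game; the game interface assumed here (is_terminal, utility, moves, result) is illustrative.

def minimax_value(state, game, alpha=float('-inf'), beta=float('inf'),
                  maximizing=True):
    """The best achievable utility against an optimal adversary."""
    if game.is_terminal(state):
        return game.utility(state)  # terminal values are part of the game
    if maximizing:
        value = float('-inf')
        for move in game.moves(state):
            child = game.result(state, move)
            value = max(value, minimax_value(child, game, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # prune: MIN would never let play reach this branch
        return value
    else:
        value = float('inf')
        for move in game.moves(state):
            child = game.result(state, move)
            value = min(value, minimax_value(child, game, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break  # prune: MAX already has a better option elsewhere
        return value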

What have we done so far? Knowledge-based agents
Using existing knowledge to infer new things about the world
Determining the best next action, given changes to the world
Choose actions using knowledge about the world
Assumes a deterministic problem; may be able to infer rules
Any number of agents, but limited to KB contents

Logical agents
[Diagram: the agent architecture from before, with a Knowledge Base added inside the agent between its sensors and actuators.]
Knowledge Base:
Contains sentences describing the state of the world
Supports inference and derivation
Dynamic; changes as a result of agent interactions with the environment!

Summary: Knowledge-based agents
Use knowledge about the world to choose actions
Inference with existing knowledge + new observations
Resolution, forward-backward chaining, instantiation and unification, etc.
Knowledge represented in knowledge bases
Contain statements about the world
Structured with an ontology
Represents how different kinds of objects/events/etc. are categorized
Supports higher-level inference
Designed for a particular set of problems
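
As one concrete example of the inference methods listed above, here is a minimal forward-chaining sketch over propositional Horn clauses; the (premises, conclusion) rule representation is an assumption.

def forward_chaining(rules, facts, query):
    """rules: list of (premises, conclusion) pairs, with premises a set of
    symbols. Fire any rule whose premises are all known until nothing new
    can be derived, then check whether the query was derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)  # derived a new fact
                changed = True
    return query in known

# Usage: from the rule "p and q imply r" and facts {p, q}, infer r.
assert forward_chaining([({'p', 'q'}, 'r')], {'p', 'q'}, 'r')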

What have we done so far? Reinforcement learning agents
Iteratively update estimates of state/action pair expected utilities
Adapting to random outcomes with expectation
Choose actions using learned information from the world
Handles stochastic and unknown problems
Focused on the learning process of a single agent

Reinforcement Learning
[Diagram: the agent receives a state s and reward r from the environment, and sends actions a back to it.]
Basic idea:
Receive feedback in the form of rewards
Agent's utility is defined by the reward function
Must (learn to) act so as to maximize expected rewards
All learning is based on observed samples of outcomes!
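
A minimal sketch of this agent-environment loop, loosely in the style of a Gym-like interface; the method names (reset, step, choose_action, learn) are assumptions for illustration.

def run_episode(agent, env, max_steps=1000):
    """One episode: the agent acts, observes reward and next state, and
    learns only from these observed samples."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.choose_action(state)
        next_state, reward, done = env.step(action)
        agent.learn(state, action, reward, next_state)  # update estimates
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward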

Generalized Policy Iteration
Evaluation: For the fixed current policy π_i, use N starts from random (s_0, a_0) pairs and run the policy until it halts. Then, for each (s, a), go through the N episodes and find the M episodes that use (s, a). Update Q(s, a) with the average of the discounted rewards starting at (s, a) in each such episode S_j:

$$Q^{\pi_i}_{k+1}(s, a) \leftarrow R(s) + \frac{1}{M} \sum_{j=1}^{M} \mathrm{DiscountedReward}(s, a \mid S_j)$$

Improvement: With the sampled Q values, get a better policy using policy extraction:

$$\pi_{i+1}(s) = \operatorname*{argmax}_{a \in A(s)} Q^{\pi_i}(s, a)$$
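
A minimal Monte Carlo policy-iteration sketch in this spirit (a standard every-visit variant of the update above); run_episode_from and the other names are assumptions, and each episode is taken to be a list of (state, action, reward) triples generated from a random exploring start.

from collections import defaultdict

def mc_policy_iteration(states, actions, run_episode_from, gamma,
                        n_episodes, n_rounds):
    """actions(s) -> list of actions; run_episode_from(policy) -> one
    episode [(s, a, r), ...] started from a random (s0, a0) pair."""
    policy = {s: actions(s)[0] for s in states}  # arbitrary initial policy
    for _ in range(n_rounds):
        # Evaluation: collect discounted returns for every (s, a) visited.
        returns = defaultdict(list)
        for _ in range(n_episodes):
            episode = run_episode_from(policy)
            G = 0.0
            for s, a, r in reversed(episode):
                G = r + gamma * G  # discounted return from this step on
                returns[(s, a)].append(G)
        # Average over the episodes/visits that used (s, a).
        Q = {sa: sum(gs) / len(gs) for sa, gs in returns.items()}
        # Improvement: policy extraction from the sampled Q values.
        for s in states:
            policy[s] = max(actions(s),
                            key=lambda a: Q.get((s, a), float('-inf')))
    return policy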

On-policy vs. off-policy learning
In Sarsa, we choose a specific next action a' for updating Q(s, a). This is called on-policy learning, because we're using the current policy to help decide the action we learn from. (Like policy iteration last time.)
We can also look at all possible next actions A(s') and use the best one to update Q(s, a); this is Q-learning. It is off-policy learning, because the action we use for updating is separate from the action we actually take. (Similar to value iteration, but active.)
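
Side by side, the two updates look roughly like this; alpha is the learning rate, gamma the discount, and Q is assumed to be a dict (e.g., defaultdict(float)) keyed by (state, action).

def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    """On-policy: bootstrap from a2, the next action the current policy
    actually chose."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s2, actions, alpha, gamma):
    """Off-policy: bootstrap from the best available next action, separate
    from the action the agent actually takes."""
    best = max(Q[(s2, b)] for b in actions(s2))
    Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])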

Key things to know about
Intelligent Agents:
How do we formulate AI problems?
What is the structure of an agent?
Search:
How do uninformed search methods work?
How do informed search methods work? (Compare to uninformed.)
What makes a good heuristic?
How do we deal with opposing agents?

Key things to know about
Logic:
What methods are available to us for inference?
What different formalisms do we know for representing knowledge?
Why is structuring our knowledge useful?
Decision Processes:
How do we handle stochastic outcomes of actions?
How do we learn from observed experiences?