CS 188: Artificial Intelligence, Fall 2008
Lecture 27: Conclusion
12/9/2008
Dan Klein – UC Berkeley
Announcements
Check your grades on glookup!
P1-4 up, 5-6 soon
W1-8 up, 9 soon
Email the staff list with any questions
Final prep page up on the web
Main review sessions: 12/15 and 12/17 in 120 Latimer
We'll have extra office hours and review sessions as well
Autonomous Robotics
Policy Search
Problem: often the feature-based policies that work well aren't the ones that approximate V / Q best
E.g. your value functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions
Same distinction between modeling and prediction showed up in classification (where?)
Solution: learn the policy that maximizes rewards rather than the value that predicts rewards
This is the idea behind policy search, such as what controlled the upside-down helicopter
[demo]
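A minimal Python sketch of this idea, not from the lecture: search directly over the weights of a feature-based policy and keep a change only if the rollout reward improves. The environment interface (reset / legal_actions / step) and the feature function are assumptions made for illustration.

import random

def rollout_return(env, weights, features, horizon=100):
    # Run one episode with the greedy feature-based policy and return its total reward.
    # Assumed interface: env.reset() -> s, env.legal_actions(s), env.step(s, a) -> (s', r, done).
    s = env.reset()
    total = 0.0
    for _ in range(horizon):
        a = max(env.legal_actions(s),
                key=lambda act: sum(w * f for w, f in zip(weights, features(s, act))))
        s, r, done = env.step(s, a)
        total += r
        if done:
            break
    return total

def policy_search(env, features, n_weights, iters=200, noise=0.1, episodes=5):
    # Hill-climbing in weight space: judge each candidate by the reward it earns,
    # not by how well it predicts future rewards.
    def score(w):
        return sum(rollout_return(env, w, features) for _ in range(episodes)) / episodes
    weights = [0.0] * n_weights
    best = score(weights)
    for _ in range(iters):
        candidate = [w + random.gauss(0, noise) for w in weights]
        value = score(candidate)
        if value > best:
            weights, best = candidate, value
    return weights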
POMDPs
Up until now:
Search / MDPs: decision making when the world is fully observable (even if the actions are non-deterministic)
Probabilistic reasoning: computing beliefs in a static world
Learning: discovering how the world works
What about acting under uncertainty?
In general, the problem formalization is the partially observable Markov decision process (POMDP)
A simple case: value of information
POMDPs
MDPs have:
States S
Actions A
Transition function P(s'|s,a) (or T(s,a,s'))
Rewards R(s,a,s')
POMDPs add:
Observations O
Observation function P(o|s) (or O(s,o))
POMDPs are MDPs over belief states b (distributions over S)
[Diagram: the MDP expectimax tree over s, a, s' alongside the POMDP tree over beliefs b, a, o, b']
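To make "MDPs over belief states" concrete, here is a small Python sketch (not from the slides) of the belief update: after taking action a and seeing observation o, b'(s') is proportional to O(s', o) times the probability of reaching s' from the old belief. The transition model T(s, a, s') and the observation model O(s, o) are assumed to be given as functions.

def update_belief(b, a, o, T, O, states):
    # b maps each state to its probability under the current belief.
    # Unnormalized update: weight each successor s' by how likely it is to be
    # reached under action a and how likely it is to emit the observation o.
    new_b = {s2: O(s2, o) * sum(T(s, a, s2) * b[s] for s in states)
             for s2 in states}
    total = sum(new_b.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / total for s2, p in new_b.items()}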
Example: Ghostbusters
In (static) Ghostbusters:
Belief state determined by evidence to date {e}
Tree really over evidence sets
Probabilistic reasoning needed to predict new evidence given past evidence
Solving POMDPs
One way: use truncated expectimax to compute approximate value of actions (sketch below)
What if you only considered busting or one sense followed by a bust?
You get the VPI agent from project 4!
[Diagram: truncated expectimax tree comparing U(bust, {e}) against a sense action, the resulting evidence e', and then U(bust, {e, e'})]
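A sketch of that truncated lookahead (hypothetical helper names, not the project 4 code): compare the expected utility of busting now against sensing once, updating the belief for each possible observation, and then busting.

def bust_or_sense(belief, sense_actions, bust_utility, obs_model, update):
    # bust_utility(belief): expected utility of busting under a belief.
    # obs_model(belief, sense): iterable of (observation, probability) pairs.
    # update(belief, sense, obs): the belief update from the sketch above.
    best_value, best_plan = bust_utility(belief), ("bust",)
    for sense in sense_actions:
        value = sum(p * bust_utility(update(belief, sense, o))
                    for o, p in obs_model(belief, sense))
        if value > best_value:
            best_value, best_plan = value, (sense, "bust")
    return best_plan, best_value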
More Generally
General solutions map belief functions to actions
Can divide regions of belief space (the set of belief functions) into policy regions (gets complex quickly)
Can build approximate policies using discretization methods (sketch below)
Can factor belief functions in various ways
Overall, POMDPs are very (actually PSPACE-) hard
Most real problems are POMDPs, but we can rarely solve them in general!
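For instance, a crude discretization scheme (a sketch under assumed helpers, not a specific algorithm from the course): round each belief to a coarse grid cell and store one action per cell, so the policy becomes a finite lookup table over the belief simplex.

def discretize_belief(belief, states, resolution=10):
    # Map a belief (dict: state -> probability) to a coarse grid cell.
    return tuple(round(belief[s] * resolution) for s in states)

def act(belief, states, policy_table, default_action):
    # Look up the action stored for this cell; building policy_table
    # (e.g. by approximate value iteration over the cells) is the hard part.
    return policy_table.get(discretize_belief(belief, states), default_action)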
Pacman Contest
Team A: Royal Chan, Long Cheng, Wei Hua Peng, Jonathan Kotker, Adam Lin, Jillian Moore
Team B: Willy Wong, Kevin Lin
Team C: Yiding Jia
Team D: Larry Pezzaglia, Gerard Sunga, Samuel Zats
Team E: Steven Schlansker, Jinna Lei, Sally Ahn, Tina Yau, Stewart Liu, Wayne Lin
Team F: Niels Joubert, Rohit Nambiar, Tim Chen, Michael Ngo
Team G: William Li, York Wu
Team H: Dan Kinder, Evan Rosky, John Wang, Aaron Hong
Pacman Contest
8 teams, 26 people qualified
3rd place: Niels Joubert, Michael Ngo, Rohit Nambiar, Tim Chen
What they did: split offense / defense
Strong offense: feature-based balance between eating dots and helping defend
Pacman Contest
Blue team: Yiding Jia
Red team: William Li, York Wu
What they did (Yiding): reflex plus tracking!
Probabilistic inference, particle filtering; consider direct ghost observations and dot vanishings
Defense: move toward distributions, hope to get better info and hunt, stay near remaining food
Offense: move toward guard-free dots, flee guard clouds
What they did (William, York): … ??
[DEMO]
Example: Stratagus [DEMO]
Stratagus: Example of a large RL task
Stratagus is hard for reinforcement learning algorithms:
> 10^100 states
> 10^30 actions at each point
Time horizon ≈ 10^4 steps
Stratagus is hard for human programmers:
Typically takes several person-months for game companies to write a computer opponent
Still, no match for experienced human players
Programming involves much trial and error
Hierarchical RL:
Humans supply high-level prior knowledge using a partial program
The learning algorithm fills in the details
[From Bhaskara Marthi's thesis]
Partial "Alisp" Program

;; Top level: repeatedly choose between gathering wood and gathering gold.
(defun top ()
  (loop (choose (gather-wood) (gather-gold))))

;; Gather wood: choose a forest, get wood, bring it back to the base.
(defun gather-wood ()
  (with-choice (dest *forest-list*)
    (nav dest)
    (action 'get-wood)
    (nav *base-loc*)
    (action 'dropoff)))

;; Gather gold: choose a goldmine, get gold, bring it back to the base.
(defun gather-gold ()
  (with-choice (dest *goldmine-list*)
    (nav dest)
    (action 'get-gold)
    (nav *base-loc*)
    (action 'dropoff)))

;; Navigate to dest; which direction to move at each step is a learned choice.
(defun nav (dest)
  (until (= (pos (get-state)) dest)
    (with-choice (move '(N S E W NOOP))
      (action move))))
Hierarchical RL
Define a hierarchical Q-function which learns a linear feature-based mini-Q-function at each choice point (sketch below)
Very good at balancing resources and directing rewards to the right region
Still not very good at the strategic elements of these kinds of games (i.e. the Markov game aspect)
[DEMO]
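A rough sketch of a linear mini-Q-function at a choice point (the actual hierarchical algorithm decomposes Q into task-local components, which is omitted here; the feature function is an assumption):

def q_value(weights, features, state, choice):
    # Linear Q at a choice point: Q(state, choice) = w . f(state, choice).
    return sum(w * f for w, f in zip(weights, features(state, choice)))

def q_update(weights, features, state, choice, reward, next_state, next_choices,
             alpha=0.05, gamma=1.0):
    # One approximate Q-learning step on the weights at a choice point.
    target = reward + gamma * max(
        (q_value(weights, features, next_state, c) for c in next_choices),
        default=0.0)
    error = target - q_value(weights, features, state, choice)
    return [w + alpha * error * f
            for w, f in zip(weights, features(state, choice))]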
Bugman
AI = Animal Intelligence?
Wim van Eck at Leiden University
Pacman controlled by a human
Ghosts controlled by crickets
Vibrations drive the crickets toward or away from Pacman's location
http://pong.hku.nl/~wim/bugman.htm
[DEMO]
Where to go next?
Congratulations, you've seen the basics of modern AI!
More directions:
Robotics / vision / IR / language: cs194
There will be a web form to get more info, from the 188 page
Machine learning: cs281a
Cognitive modeling: cog sci 131
NLP: 288
Vision: 280
… and more; ask if you're interested
That's It!
Help us out with some course evaluations
Have a good break, and always maximize your expected utilities!