CS 188: Artificial Intelligence, Fall 2008
Lecture 27: Conclusion
12/9/2008
Dan Klein – UC Berkeley
Announcements
Check your grades on glookup!
P1-4 up, 5-6 soon
W1-8 up, 9 soon
Email the staff list with any questions
Final prep page up on the web
Main review sessions: 12/15 and 12/17 in 120 Latimer
We'll have extra office hours and review sessions as well
Autonomous Robotics
Policy Search
Problem: often the feature-based policies that work well aren't the ones that approximate V / Q best
E.g. your value functions from project 2 were probably horrible estimates of future rewards, but they still produced good decisions
Same distinction between modeling and prediction showed up in classification (where?)
Solution: learn the policy that maximizes rewards rather than the value that predicts rewards
This is the idea behind policy search, such as what controlled the upside-down helicopter
[demo]
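A minimal Python sketch of this idea, not from the lecture: search directly over the weights of a feature-based policy and keep a change only if the rollout reward improves. The environment interface (reset / legal_actions / step) and the feature function are assumptions made for illustration.

import random

def rollout_return(env, weights, features, horizon=100):
    # Run one episode with the greedy feature-based policy and return its total reward.
    # Assumed interface: env.reset() -> s, env.legal_actions(s), env.step(s, a) -> (s', r, done).
    s = env.reset()
    total = 0.0
    for _ in range(horizon):
        a = max(env.legal_actions(s),
                key=lambda act: sum(w * f for w, f in zip(weights, features(s, act))))
        s, r, done = env.step(s, a)
        total += r
        if done:
            break
    return total

def policy_search(env, features, n_weights, iters=200, noise=0.1, episodes=5):
    # Hill-climbing in weight space: judge each candidate by the reward it earns,
    # not by how well it predicts future rewards.
    def score(w):
        return sum(rollout_return(env, w, features) for _ in range(episodes)) / episodes
    weights = [0.0] * n_weights
    best = score(weights)
    for _ in range(iters):
        candidate = [w + random.gauss(0, noise) for w in weights]
        value = score(candidate)
        if value > best:
            weights, best = candidate, value
    return weights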
POMDPs
Up until now:
Search / MDPs: decision making when the world is fully observable (even if the actions are non-deterministic)
Probabilistic reasoning: computing beliefs in a static world
Learning: discovering how the world works
What about acting under uncertainty?
In general, the problem formalization is the partially observable Markov decision process (POMDP)
A simple case: value of information
POMDPs
MDPs have:
States S
Actions A
Transition function P(s'|s,a) (or T(s,a,s'))
Rewards R(s,a,s')
POMDPs add:
Observations O
Observation function P(o|s) (or O(s,o))
POMDPs are MDPs over belief states b (distributions over S)
[Diagram: the MDP expectimax tree over s, a, s' alongside the POMDP tree over beliefs b, a, o, b']
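To make "MDPs over belief states" concrete, here is a small Python sketch (not from the slides) of the belief update: after taking action a and seeing observation o, b'(s') is proportional to O(s', o) times the probability of reaching s' from the old belief. The transition model T(s, a, s') and the observation model O(s, o) are assumed to be given as functions.

def update_belief(b, a, o, T, O, states):
    # b maps each state to its probability under the current belief.
    # Unnormalized update: weight each successor s' by how likely it is to be
    # reached under action a and how likely it is to emit the observation o.
    new_b = {s2: O(s2, o) * sum(T(s, a, s2) * b[s] for s in states)
             for s2 in states}
    total = sum(new_b.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / total for s2, p in new_b.items()}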
Example: Ghostbusters
In (static) Ghostbusters:
Belief state determined by evidence to date {e}
Tree really over evidence sets
Probabilistic reasoning needed to predict new evidence given past evidence
Solving POMDPs
One way: use truncated expectimax to compute approximate value of actions (sketch below)
What if you only considered busting or one sense followed by a bust?
You get the VPI agent from project 4!
[Diagram: truncated expectimax tree comparing U(bust, {e}) against a sense action, the resulting evidence e', and then U(bust, {e, e'})]
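A sketch of that truncated lookahead (hypothetical helper names, not the project 4 code): compare the expected utility of busting now against sensing once, updating the belief for each possible observation, and then busting.

def bust_or_sense(belief, sense_actions, bust_utility, obs_model, update):
    # bust_utility(belief): expected utility of busting under a belief.
    # obs_model(belief, sense): iterable of (observation, probability) pairs.
    # update(belief, sense, obs): the belief update from the sketch above.
    best_value, best_plan = bust_utility(belief), ("bust",)
    for sense in sense_actions:
        value = sum(p * bust_utility(update(belief, sense, o))
                    for o, p in obs_model(belief, sense))
        if value > best_value:
            best_value, best_plan = value, (sense, "bust")
    return best_plan, best_value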
More Generally
General solutions map belief functions to actions
Can divide regions of belief space (the set of belief functions) into policy regions (gets complex quickly)
Can build approximate policies using discretization methods (sketch below)
Can factor belief functions in various ways
Overall, POMDPs are very (actually PSPACE-) hard
Most real problems are POMDPs, but we can rarely solve them in general!
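For instance, a crude discretization scheme (a sketch under assumed helpers, not a specific algorithm from the course): round each belief to a coarse grid cell and store one action per cell, so the policy becomes a finite lookup table over the belief simplex.

def discretize_belief(belief, states, resolution=10):
    # Map a belief (dict: state -> probability) to a coarse grid cell.
    return tuple(round(belief[s] * resolution) for s in states)

def act(belief, states, policy_table, default_action):
    # Look up the action stored for this cell; building policy_table
    # (e.g. by approximate value iteration over the cells) is the hard part.
    return policy_table.get(discretize_belief(belief, states), default_action)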
Pacman Contest
Team A: Royal Chan, Long Cheng, Wei Hua Peng, Jonathan Kotker, Adam Lin, Jillian Moore
Team B: Willy Wong, Kevin Lin
Team C: Yiding Jia
Team D: Larry Pezzaglia, Gerard Sunga, Samuel Zats
Team E: Steven Schlansker, Jinna Lei, Sally Ahn, Tina Yau, Stewart Liu, Wayne Lin
Team F: Niels Joubert, Rohit Nambiar, Tim Chen, Michael Ngo
Team G: William Li, York Wu
Team H: Dan Kinder, Evan Rosky, John Wang, Aaron Hong
Pacman Contest
8 teams, 26 people qualified
3rd place: Niels Joubert, Michael Ngo, Rohit Nambiar, Tim Chen
What they did: split offense / defense
Strong offense: feature-based balance between eating dots and helping defend
Pacman Contest
Blue team: Yiding Jia
Red team: William Li, York Wu
What they did (Yiding): reflex plus tracking!
Probabilistic inference, particle filtering; consider direct ghost observations and dot vanishings
Defense: move toward distributions, hope to get better info and hunt, stay near remaining food
Offense: move toward guard-free dots, flee guard clouds
What they did (William, York): … ??
[DEMO]
Example: Stratagus [DEMO]
Stratagus: Example of a large RL task
Stratagus is hard for reinforcement learning algorithms:
> 10^100 states
> 10^30 actions at each point
Time horizon ≈ 10^4 steps
Stratagus is hard for human programmers:
Typically takes several person-months for game companies to write a computer opponent
Still, no match for experienced human players
Programming involves much trial and error
Hierarchical RL:
Humans supply high-level prior knowledge using a partial program
The learning algorithm fills in the details
[From Bhaskara Marthi's thesis]
Partial "Alisp" Program

;; Top level: repeatedly choose between gathering wood and gathering gold.
(defun top ()
  (loop (choose (gather-wood) (gather-gold))))

;; Gather wood: choose a forest, get wood, bring it back to the base.
(defun gather-wood ()
  (with-choice (dest *forest-list*)
    (nav dest)
    (action 'get-wood)
    (nav *base-loc*)
    (action 'dropoff)))

;; Gather gold: choose a goldmine, get gold, bring it back to the base.
(defun gather-gold ()
  (with-choice (dest *goldmine-list*)
    (nav dest)
    (action 'get-gold)
    (nav *base-loc*)
    (action 'dropoff)))

;; Navigate to dest; which direction to move at each step is a learned choice.
(defun nav (dest)
  (until (= (pos (get-state)) dest)
    (with-choice (move '(N S E W NOOP))
      (action move))))
Hierarchical RL
Define a hierarchical Q-function which learns a linear feature-based mini-Q-function at each choice point (sketch below)
Very good at balancing resources and directing rewards to the right region
Still not very good at the strategic elements of these kinds of games (i.e. the Markov game aspect)
[DEMO]
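A rough sketch of a linear mini-Q-function at a choice point (the actual hierarchical algorithm decomposes Q into task-local components, which is omitted here; the feature function is an assumption):

def q_value(weights, features, state, choice):
    # Linear Q at a choice point: Q(state, choice) = w . f(state, choice).
    return sum(w * f for w, f in zip(weights, features(state, choice)))

def q_update(weights, features, state, choice, reward, next_state, next_choices,
             alpha=0.05, gamma=1.0):
    # One approximate Q-learning step on the weights at a choice point.
    target = reward + gamma * max(
        (q_value(weights, features, next_state, c) for c in next_choices),
        default=0.0)
    error = target - q_value(weights, features, state, choice)
    return [w + alpha * error * f
            for w, f in zip(weights, features(state, choice))]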
Bugman
AI = Animal Intelligence?
Wim van Eck at Leiden University
Pacman controlled by a human
Ghosts controlled by crickets
Vibrations drive the crickets toward or away from Pacman's location
http://pong.hku.nl/~wim/bugman.htm
[DEMO]
Where to go next?
Congratulations, you've seen the basics of modern AI!
More directions:
Robotics / vision / IR / language: cs194
There will be a web form to get more info, from the 188 page
Machine learning: cs281a
Cognitive modeling: cog sci 131
NLP: 288
Vision: 280
… and more; ask if you're interested
That's It!
Help us out with some course evaluations
Have a good break, and always maximize your expected utilities!