12/16 Projects
Write-up
– Generally, 1-2 pages
– What was your idea?
– How did you try to accomplish it?
– What worked?
– What could you have done differently?
Video
– Show off your idea working
– Target audience: Junior/Senior EECS students who haven't taken this class
Presentation
– minutes per group
– Summarize your write-up, show off your prototype

Bei, Derrick, Nan
The AR Drone should be able to detect someone's face and follow the detected person. We will use OpenCV to implement it.
Possible challenges:
– The features of a face will likely be harder to detect than an object of a single, specific color, as we did in the lab. Different people have different facial features, such as hair style or whether they wear glasses, so we have to choose key features that are easy for the AR Drone to detect and develop a face-detection algorithm the AR Drone can actually run.
– A moving face may be a challenge, since the AR Drone has to detect the face dynamically and follow it, which is functionality we never developed in our labs.
– Keeping a safe distance from the person's face, which will involve calculating the AR Drone's distance from the face while still being able to move and follow the person.
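A minimal sketch of the detection step, assuming OpenCV's bundled Haar cascade and a BGR frame already grabbed from the drone's camera (the helper name and parameters are illustrative, not part of the proposal):

import cv2

# Pretrained frontal-face detector that ships with OpenCV (the cv2.data path
# is an assumption about how OpenCV was installed).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def largest_face(frame_bgr):
    """Return (x, y, w, h) of the biggest detected face, or None if none found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(
        gray, scaleFactor=1.2, minNeighbors=5, minSize=(40, 40))
    if len(faces) == 0:
        return None
    # The face's offset from the image centre could then drive the follow behaviour.
    return max(faces, key=lambda f: f[2] * f[3])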

Duy, Justin, Jordan Search for a hoop or ring positioned in the room and fly through it.

Joshua Clark
I propose to develop a way to read road signs. For the purposes of this class I will test with three signs: two speed limit signs (which look similar to each other) and a stop sign. All signs will be US DOT standard signs. I intend to have the robot react accordingly to each sign. This will be developed with the turtlebot in mind; however, if technical issues prevent the use of the turtlebot, I will use the ardrone. The end result will be a robot that adjusts its speed based on speed limit signs and stops at stop signs.
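As one illustrative piece (not necessarily how the final system will work), a stop sign could be flagged by looking for a large red region; telling the two speed limit signs apart would need something more, such as template matching or reading the numerals:

import cv2

def looks_like_stop_sign(frame_bgr, min_area=2000):
    """Rough check: is there a big red blob in view? (thresholds are guesses)"""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Red wraps around the hue axis, so combine two ranges.
    mask = cv2.inRange(hsv, (0, 100, 100), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 100, 100), (180, 255, 255))
    # findContours returns different tuples in OpenCV 3 vs 4; [-2] works for both.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    return any(cv2.contourArea(c) > min_area for c in contours)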

Kyle, Brooks, James
We would like to design, in simulation, a path-following program for the ARDrone. In Gazebo, we will use vision processing to follow a path of a given color until the drone reaches some sort of flag image that triggers the command to land.
Secondary goal: forcing the drone to search for the path rather than starting on the path. Upon successful simulation of the above goals, we would then attempt the same goals in the lab with the physical drones.
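A rough sketch of the core loop under assumptions: the path is a single bright color, the drone's downward camera publishes on /ardrone/bottom/image_raw, and velocity commands go to /cmd_vel (the topic names, color range, and gains would all need adjusting for the actual Gazebo setup):

import rospy
import cv2
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist

bridge = CvBridge()
cmd_pub = None

def image_cb(msg):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (25, 80, 80), (35, 255, 255))  # e.g. a yellow path
    m = cv2.moments(mask)
    cmd = Twist()
    if m["m00"] > 0:                              # some of the path is visible
        cx = m["m10"] / m["m00"]                  # centroid column of the path
        err = cx - frame.shape[1] / 2.0           # pixels off-centre
        cmd.linear.x = 0.3                        # creep forward
        cmd.angular.z = -0.005 * err              # yaw back toward the path
    cmd_pub.publish(cmd)                          # hovers in place if path is lost

if __name__ == "__main__":
    rospy.init_node("path_follower")
    cmd_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    rospy.Subscriber("/ardrone/bottom/image_raw", Image, image_cb)
    rospy.spin()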

Andrew, Nick, Josh Woldstad
For the final project, our team is considering using the AR Drone and OpenCV. Our plan is to use the AR Drone to simulate a pick-up and delivery system. We will put a tag on the object to be picked up, and use the camera with OpenCV to detect the mark. The AR Drone will find, pick up, and then drop off the marked object in a designated drop-off zone. This project will allow us to create a small-scale version of the systems that companies such as Amazon and other delivery services are testing.

Nick Hayward
For my final project, I'd like to continue using the AR Drone and OpenCV. I'm not 100% sure exactly what I want to do, but I know the direction I would like to take. After doing some research online as well as discussing it with my group, I would like to try to design something with delivery or pickup in mind. I'll use the camera with OpenCV to detect the object I will pick up, or use it to find the "drop off zone." Either way, I think this will be a fun project and will help me continue building my skills with OpenCV as well as ROS.

Chris, Karan, Mike
Our proposal is that the turtlebot simulates a person or thing that needs to be tracked for some reason (maybe lost, or under surveillance). The turtlebot moves in an arbitrary way, while the ardrone locates it and follows it around. We plan on doing this by putting a tag on top of the turtlebot and using the built-in tag detection on the ardrone to keep track of the tag. Then we'll simply try to keep the tag at the midpoint of the ardrone's bottom camera image (moving the drone to make this the case). Our idea is that this will be kind of like a police helicopter following a car chase, or tracking a person stuck in some kind of disaster situation.
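A sketch of just the "keep the tag at the midpoint" step, assuming something (the ardrone's built-in detection or our own OpenCV code) already reports the tag centre in pixels; the gains, image size, and sign conventions are guesses to be tuned on the real drone:

from geometry_msgs.msg import Twist

IMG_W, IMG_H = 640, 360   # assumed bottom-camera resolution
KP = 0.002                # proportional gain, to be tuned by trial and error

def follow_tag(tag_x, tag_y):
    """Velocity command nudging the drone so the tag drifts back to image centre.

    Signs depend on how the bottom camera is mounted relative to the body frame,
    so they may need flipping on the real ardrone.
    """
    cmd = Twist()
    cmd.linear.y = -KP * (tag_x - IMG_W / 2.0)   # slide sideways toward the tag
    cmd.linear.x = -KP * (tag_y - IMG_H / 2.0)   # move forward/back toward the tag
    return cmd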

CptS 580: Reinforcement Learning
Graduate or Undergraduate && (CptS 450 || Matt's permission)
How can an agent learn to act in an environment?
No absolute truth
Trial and error

Example: Mario
Agent = Mario
Actions = Right, Left, Jump, Run, Shoot fireball
Reward = Score
Learning = Maximize score
State = ???
Can look at last year's webpage for possible topics:

Reinforcement Learning
Basic idea:
– Receive feedback in the form of rewards
– Agent's utility is defined by the reward function
– Must (learn to) act so as to maximize expected rewards

Sequential tasks
How could we learn to play tic-tac-toe?
– What do we want to optimize (i.e., what's the reward)?
– What are the actions?
– How would we represent state?
– How would we figure out how to act (to maximize the reward)?
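One concrete (and entirely illustrative) way to answer those questions: state = the board itself, actions = the empty cells, reward = +1/-1/0 at the end of a game, and action values learned with a tabular Q-learning update:

import random
from collections import defaultdict

# Q[(state, action)] -> estimated value; state is the board as a tuple of
# nine cells ('X', 'O', or ' '), action is the index of an empty cell.
Q = defaultdict(float)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def choose_action(state, actions):
    """Epsilon-greedy: mostly pick the best-looking move, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    """One-step Q-learning backup after observing a transition."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])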

Reinforcement Learning (RL)
RL algorithms attempt to find a policy that maximizes cumulative reward for the agent over the course of the problem. The problem is typically represented as a Markov Decision Process. RL differs from supervised learning in that correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected.

Grid World
– The agent lives in a grid
– Walls block the agent's path
– The agent's actions do not always go as planned:
  – 80% of the time, the action North takes the agent North (if there is no wall there)
  – 10% of the time, North takes the agent West; 10% East
  – If there is a wall in the direction the agent would have been taken, the agent stays put
– Small "living" reward each step
– Big rewards come at the end
– Goal: maximize sum of rewards*
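The same noisy motion model written out as code (a sketch; the grid coordinates and the wall test are placeholders):

# Intended move 80% of the time, slip to each perpendicular direction 10% of the
# time; a move into a wall leaves the agent where it is.
MOVES = {'N': (-1, 0), 'S': (1, 0), 'E': (0, 1), 'W': (0, -1)}
SLIPS = {'N': ('W', 'E'), 'S': ('E', 'W'), 'E': ('N', 'S'), 'W': ('S', 'N')}

def transitions(state, action, is_wall):
    """List of (next_state, probability) pairs for T(s, a, s')."""
    outcomes = []
    for direction, prob in [(action, 0.8),
                            (SLIPS[action][0], 0.1),
                            (SLIPS[action][1], 0.1)]:
        dr, dc = MOVES[direction]
        nxt = (state[0] + dr, state[1] + dc)
        outcomes.append((state if is_wall(nxt) else nxt, prob))
    return outcomes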

Grid Futures (figure: successor states of one grid cell under a deterministic grid world vs. a stochastic grid world)

Markov Decision Processes
An MDP is defined by:
– A set of states s ∈ S
– A set of actions a ∈ A
– A transition function T(s, a, s')
  – Prob that a from s leads to s', i.e., P(s' | s, a)
  – Also called the model
– A reward function R(s, a, s')
  – Sometimes just R(s) or R(s')
– A start state (or distribution)
– Maybe a terminal state
– A discount factor: γ
MDPs are a family of non-deterministic search problems
– Reinforcement learning: MDPs where we don't know the transition or reward functions
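One way to hold those pieces in code (a sketch; the field names are ours, not notation from the slides):

from collections import namedtuple

MDP = namedtuple("MDP", [
    "states",   # iterable of states s in S
    "actions",  # actions(s) -> actions a in A available in s
    "T",        # T(s, a) -> list of (s_next, probability) pairs
    "R",        # R(s, a, s_next) -> immediate reward
    "gamma",    # discount factor
    "start",    # start state (or a distribution over states)
])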

What is Markov about MDPs?
Andrey Markov (1856–1922)
"Markov" generally means that given the present state, the future and the past are independent.
For Markov decision processes, "Markov" means:
P(S_{t+1} = s' | S_t = s_t, A_t = a_t, S_{t-1} = s_{t-1}, A_{t-1}, …, S_0 = s_0) = P(S_{t+1} = s' | S_t = s_t, A_t = a_t)

Solving MDPs
In deterministic single-agent search problems, we want an optimal plan, or sequence of actions, from start to a goal
In an MDP, we want an optimal policy π*: S → A
– A policy π gives an action for each state
– An optimal policy maximizes expected utility if followed
– Defines a reflex agent
Optimal policy when R(s, a, s') = for all non-terminal s
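A minimal value-iteration sketch for computing such a policy, assuming an MDP packaged as in the earlier namedtuple; this is one standard solver, shown for illustration rather than as the lecture's prescribed method:

def value_iteration(mdp, iterations=100):
    """Iteratively back up state values; states with no actions keep value 0."""
    V = {s: 0.0 for s in mdp.states}
    for _ in range(iterations):
        V = {s: max((sum(p * (mdp.R(s, a, s2) + mdp.gamma * V[s2])
                         for s2, p in mdp.T(s, a))
                     for a in mdp.actions(s)), default=0.0)
             for s in mdp.states}
    return V

def extract_policy(mdp, V):
    """Greedy policy: pi*(s) = argmax_a sum_s' T(s,a,s') [R(s,a,s') + gamma V(s')]."""
    return {s: max(mdp.actions(s),
                   key=lambda a: sum(p * (mdp.R(s, a, s2) + mdp.gamma * V[s2])
                                     for s2, p in mdp.T(s, a)),
                   default=None)
            for s in mdp.states}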

Example Optimal Policies (figure: the optimal grid-world policy for different living rewards: R(s) = -2.0, R(s) = -0.4, R(s) = -0.03, R(s) = )