
Exploration Strategies for Learned Probabilities in Smart Terrain
Dr. John R. Sullins, Youngstown State University

Slide 2: Problem Definition
Agent given a map of the world
Map gives locations where goals may possibly be
Different categories of locations have different probabilities

Slide 3: Learned Probabilities
Problem: Agent does not know these probabilities
Agent must learn them from examples [a, b] of that category
– a_i = number of past examples of category C_i where goal has been present
– b_i = number of past examples of category C_i where goal has not been present

Slide 4: Learning with Costs
Agent must physically move to a target to know whether it meets goal
Cost usually proportional to distance traveled

Slide 5: Learning with Costs
Tradeoff: knowledge gained by exploring a target vs. cost of exploring the target
Requires a rational strategy for exploration

Slide 6: Outline
Learning as reducing future costs
Beta functions and probabilistic smart terrain
Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
Creating an influence map for agent movement
Benchmark and empirical testing

Slide 7: Exploration Strategy
Main idea: Exploration now reduces travel time in future
– t_1 is instance of category C_1 with prior knowledge [a_1, b_1]
– t_2 is instance of category C_2 with prior knowledge [a_2, b_2]
(Diagram: agent equidistant, at distance d, from t_1 and t_2)

Slide 8: Value of Information
Rational action: Move to target in more probable category first
Problem: Agent must estimate probabilities from examples
Fewer examples → greater likelihood estimate is wrong
(Diagram: agent equidistant, at distance d, from t_1 and t_2)

Slide 9: Value of Information
Probabilities estimated from limited data: p_1 estimate = 0.15, p_2 estimate = 0.2
– Agent will move towards t_2
Suppose actual probabilities different: p_1 actual = 0.25, p_2 actual = 0.1
Would have been better to move to t_1 first
(Diagram: agent between t_1 and t_2)

Slide 10: Value of Information
Agent will have to backtrack to t_1 if goal not met by t_2
Expected distance traveled will be greater than if moved towards t_1 first
Better estimates of probabilities → less travel time
(Diagram: agent between t_1 and t_2)

Slide 11: Outline
Learning as reducing future costs
Beta functions and probabilistic smart terrain
Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
Creating an influence map for agent movement
Benchmark and empirical testing

Slide 12: Beta Distribution
Estimate of probability category meets goal given examples [a, b] of category:
beta[a, b](θ) = α θ^(a−1) (1 − θ)^(b−1)
"Likelihood" the actual probability is θ, given [a, b]
Best estimate of actual probability = Exp(beta[a, b](θ))
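The following is a minimal Python sketch of this density and its expectation (not from the slides; beta_pdf and expected_probability are illustrative names, and the normalizing constant α is computed with the gamma function):

```python
import math

def beta_pdf(theta, a, b):
    """beta[a, b](theta) = alpha * theta^(a-1) * (1-theta)^(b-1),
    where alpha = Gamma(a+b) / (Gamma(a) * Gamma(b)) normalizes the density."""
    alpha = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return alpha * theta ** (a - 1) * (1.0 - theta) ** (b - 1)

def expected_probability(a, b):
    """Exp(beta[a, b](theta)) -- the mean of the beta distribution."""
    return a / (a + b)

# Example: [2, 6] gives a best estimate of 0.25; [4, 4] gives 0.5
print(expected_probability(2, 6), expected_probability(4, 4))
```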

Slide 13: Beta Distribution
"Narrows" as more examples explored
More examples → less error in estimate of θ

Slide 14: Probabilistic Smart Terrain
Agent movement in worlds where targets have probability of meeting goal
– p_i: probability target i meets goal
– d_i: distance (in moves) from agent to target i
– Based on targets within d_max moves
For each adjacent tile, computes expected distance to some target that meets goal

Slide 15: Probabilistic Smart Terrain
Expected number of moves character must travel from x to a target that meets goal:
Dist(x) = Σ_{d = 1..d_max} ∏_{i : d_i < d} (1 − p_i)
– ∏_{i : d_i < d} (1 − p_i): probability no target within d moves of x meets goal (assumption of conditional independence)
– Summed over all distances up to some maximum d_max (otherwise sum could be infinite)

Slide 16: Probabilistic Smart Terrain
Compute expected distance Dist(x) for all tiles x
Agent moves to adjacent tile with lowest Dist(x)
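A sketch of slides 15 and 16, assuming the distances and probabilities of nearby targets have already been computed per tile (expected_distance and best_adjacent_tile are illustrative names, not from the paper):

```python
def expected_distance(dists, probs, d_max):
    """Dist(x): expected number of moves from a tile to some target that
    meets the goal, truncated at d_max.

    dists[i]: distance in moves from the tile to target i
    probs[i]: probability target i meets the goal
    """
    total = 0.0
    for d in range(1, d_max + 1):
        # Probability that no target within d moves meets the goal
        # (conditional-independence assumption).
        p_none = 1.0
        for d_i, p_i in zip(dists, probs):
            if d_i < d:
                p_none *= (1.0 - p_i)
        total += p_none
    return total

def best_adjacent_tile(adjacent, d_max):
    """Agent moves to the adjacent tile with the lowest Dist(x).
    adjacent: dict mapping tile -> list of (distance, probability) pairs
    for every target within d_max moves of that tile."""
    def dist_of(tile):
        pairs = adjacent[tile]
        ds = [d for d, _ in pairs]
        ps = [p for _, p in pairs]
        return expected_distance(ds, ps, d_max)
    return min(adjacent, key=dist_of)
```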

Slide 17: Outline
Learning as reducing future costs
Beta functions and probabilistic smart terrain
Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
Creating an influence map for agent movement
Benchmark and empirical testing

Slide 18: Simple Two-target Case
Simple case where agent must "choose" between two targets to explore
– t_i is instance of category C_i with prior knowledge [a_i, b_i]
– t_j is instance of category C_j with prior knowledge [a_j, b_j]
Targets equidistant at distance d
d is average distance between targets in world
(Diagram: agent at distance d from both t_i and t_j)

Slide 19: Estimating Distance Traveled
Assume t_i has higher estimated probability: Exp(beta[a_i, b_i](θ_i)) > Exp(beta[a_j, b_j](θ_j))
Expected distance traveled:
Dist(θ_i, θ_j) = d + 2d(1 − θ_i) + (d_max − 3d)(1 − θ_i)(1 − θ_j)
– d: move to t_i
– 2d(1 − θ_i): backtrack to t_j if t_i does not meet goal
– (d_max − 3d)(1 − θ_i)(1 − θ_j): case where neither t_i nor t_j meets goal
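As a small illustration of this formula (expected_two_target_distance is an illustrative name):

```python
def expected_two_target_distance(theta_i, theta_j, d, d_max):
    """Dist(theta_i, theta_j): expected distance when the agent moves to
    t_i first and backtracks to t_j, both targets at distance d."""
    return (d                                                    # move to t_i
            + 2 * d * (1 - theta_i)                              # backtrack to t_j if t_i fails
            + (d_max - 3 * d) * (1 - theta_i) * (1 - theta_j))   # neither target meets the goal
```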

Slide 20: Defining an Error Function
θ_i, θ_j may take on many values
– Likelihood of a particular θ defined by beta[a, b](θ)
Moving to t_i first is an error in cases where θ_i < θ_j
(Diagram: overlapping beta distributions for C_i and C_j)

Slide 21: Defining an Error Function
Amount of error for given (θ_i, θ_j) defined as:
Err_Dist(θ_i, θ_j) = Dist(θ_i, θ_j) − Dist(θ_j, θ_i)
                   = 2d(θ_j − θ_i) if θ_j > θ_i, 0 otherwise
– Dist(θ_i, θ_j): expected distance if move to t_i first
– Dist(θ_j, θ_i): expected distance if move to t_j first

Slide 22: Defining an Error Function
Error weighted by likelihood of θ_i, θ_j (as defined by beta function):
Err_Pair([a_i, b_i], [a_j, b_j]) = ∫_0^1 ∫_0^1 Err_Dist(θ_i, θ_j) beta[a_i, b_i](θ_i) beta[a_j, b_j](θ_j) dθ_i dθ_j
– Total error possible given these examples of C_i and C_j
– Summed over all possible combinations of θ_i, θ_j, weighted by their likelihoods
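Err_Pair can be approximated numerically; the sketch below uses a simple midpoint grid over [0, 1] x [0, 1] and reuses beta_pdf from the earlier sketch (err_dist, err_pair, and the grid resolution are illustrative choices):

```python
def err_dist(theta_i, theta_j, d):
    """Extra distance incurred by moving to t_i first when t_j was actually better."""
    return 2 * d * (theta_j - theta_i) if theta_j > theta_i else 0.0

def err_pair(a_i, b_i, a_j, b_j, d, steps=200):
    """Err_Pair: err_dist weighted by the likelihood of each (theta_i, theta_j)
    pair, approximated on a steps x steps grid."""
    h = 1.0 / steps
    total = 0.0
    for m in range(steps):
        theta_i = (m + 0.5) * h          # midpoint rule avoids the endpoints,
        for n in range(steps):           # where the beta pdf can diverge
            theta_j = (n + 0.5) * h
            total += (err_dist(theta_i, theta_j, d)
                      * beta_pdf(theta_i, a_i, b_i)
                      * beta_pdf(theta_j, a_j, b_j)) * h * h
    return total
```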

Slide 23: Value of Information
Additional values of [a, b] narrow the beta distributions
Narrow distributions allow less error
P(θ_i < θ_j) much smaller
(Diagram: two narrow, well-separated beta distributions for C_i and C_j)

Slide 24: Value of Information
Categories with similar [a, b] may still overlap
However, θ_i and θ_j will likely be similar even if θ_i < θ_j
Err_Dist(θ_i, θ_j) will be very small
(Diagram: two overlapping beta distributions for C_i and C_j)

Slide 25: Outline
Learning as reducing future costs
Beta functions and probabilistic smart terrain
Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
Creating an influence map for agent movement
Benchmark and empirical testing

Slide 26: Category Prevalence
Prioritize instances of more prevalent categories
– t_i ∈ category C_i with |C_i| instances in world
– t_j ∈ category C_j with |C_j| instances in world
– |C_i| >> |C_j| (many more instances of C_i)
More benefit to be gained by exploring t_i
(Diagram: agent between t_i and t_j)

Slide 27: Category Pair Likelihood
Agent is between two targets in different categories
What is likelihood those categories are C_i and C_j?
L(C_i, C_j) = |C_i||C_j| / (|C_total|(|C_total| − |C_i|)) + |C_i||C_j| / (|C_total|(|C_total| − |C_j|))
C_total = total number of targets in all categories
(Diagram: agent at distance d from both t_i and t_j)
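A sketch of this likelihood (pair_likelihood is an illustrative name; the exact form of the two denominators follows the formula as written above and should be treated as an assumption):

```python
def pair_likelihood(n_i, n_j, n_total):
    """L(C_i, C_j): likelihood that two nearby targets come from categories
    C_i and C_j, given |C_i| = n_i, |C_j| = n_j, and n_total targets overall."""
    return (n_i * n_j / (n_total * (n_total - n_i))
            + n_i * n_j / (n_total * (n_total - n_j)))
```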

Slide 28: Category Error Measure
Total error measure for category C_i based on relationship to all other categories C_j:
– Error Err_Pair([a_i, b_i], [a_j, b_j]) relative to that category (based on overlap of their beta functions)
– Likelihood L(C_i, C_j) agent must choose between two targets in those categories
Err_Cat(C_i, [a_i, b_i]) = Σ_{j ≠ i} Err_Pair([a_i, b_i], [a_j, b_j]) L(C_i, C_j)
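Combining the pieces above, a sketch of Err_Cat (err_cat is an illustrative name; knowledge and prevalence are assumed to be per-category lists):

```python
def err_cat(i, knowledge, prevalence, d):
    """Err_Cat(C_i): pairwise error against every other category, weighted by
    the likelihood the agent must choose between targets of those categories.

    knowledge[k]  = (a_k, b_k) prior examples for category k
    prevalence[k] = |C_k|, number of instances of category k in the world
    """
    n_total = sum(prevalence)
    a_i, b_i = knowledge[i]
    total = 0.0
    for j, (a_j, b_j) in enumerate(knowledge):
        if j == i:
            continue
        total += (err_pair(a_i, b_i, a_j, b_j, d)
                  * pair_likelihood(prevalence[i], prevalence[j], n_total))
    return total
```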

Slide 29: Defining Information Gain
Information gain from exploring instance of C_i: how incrementing [a_i, b_i] would decrease Err_Cat(C_i, [a_i, b_i]) by narrowing the beta function
Gain(C_i, [a_i, b_i]) = Err_Cat(C_i, [a_i, b_i]) − Err_Cat(C_i, [a_i′, b_i′])
– Err_Cat(C_i, [a_i, b_i]): current error before target explored
– Err_Cat(C_i, [a_i′, b_i′]): estimated error if target were explored

Slide 30: Defining Information Gain
Problem: Do not know whether given target meets goal until explored
– Do not know whether it would increment a_i or b_i
Solution: Estimate from current expected value Exp(beta[a_i, b_i](θ_i))
[a_i′, b_i′] = [a_i + Exp(beta[a_i, b_i](θ_i)), b_i + (1 − Exp(beta[a_i, b_i](θ_i)))]
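A sketch of the gain computation using this expected-value update (information_gain is an illustrative name; it reuses expected_probability and err_cat from the sketches above):

```python
def information_gain(i, knowledge, prevalence, d):
    """Gain(C_i): how much exploring one instance of C_i is expected to
    reduce Err_Cat, using the expected-value update for [a_i, b_i]."""
    a_i, b_i = knowledge[i]
    p_i = expected_probability(a_i, b_i)           # Exp(beta[a_i, b_i](theta_i))
    updated = list(knowledge)
    updated[i] = (a_i + p_i, b_i + (1.0 - p_i))    # [a_i', b_i']
    return (err_cat(i, knowledge, prevalence, d)
            - err_cat(i, updated, prevalence, d))
```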

Slide 31: Example of Information Gain
Example: Information gain for [2, 6] and [4, 4]
– Same prevalence, average distance = 10
(Chart: information gain vs. number of new examples, for category [4, 4] and category [2, 6])

Slide 32: Prior Category Knowledge
More existing examples → less valuable future examples become
Preference given to categories about which less is known

Slide 33: Outline
Learning as reducing future travel costs
Beta functions and probabilistic smart terrain
Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
Creating an influence map for agent movement
Benchmark and empirical testing

Slide 34: Influence Maps
Targets influence nearby agents
– Influence = information gain of target category
Influence decreases with distance from target
Agent moves in direction of increasing influence

Slide 35: Falloff Function
Inverse function used to decrease influence over distance:
Influence(t) = Gain(C_i, [a_i, b_i]) / (1 + t / d)
– t = distance in tiles
– d = average distance between targets

Slide 36: Combining Influences
Question: How should influences from multiple targets be combined?
Goal: Prioritize exploring groups of targets
– |C_i| ≈ |C_j| ≈ |C_k|
– |[a_i, b_i]| ≈ |[a_j, b_j]| ≈ |[a_k, b_k]|
Can quickly explore both t_i and t_k by moving left
(Diagram: agent among targets t_i, t_j, t_k; prior information and prevalence similar)

Slide 37: Additive Combined Influences
Influences from targets in different categories added to compute total influence at a tile
Inverse falloff function chosen to minimize possibility of local maxima in influence map

Slide 38: Influences in Single Category
Information gain decreases for each target explored in same category
Decrease must be factored into influence map
(Diagram: agent near several targets t_i1, t_i2, t_i3 of the same category)

Slide 39: Computing Total Influence
Influence at tile t from all targets in category C_i:
TotalInfluence(t, i) = Σ_k Gain(C_i, [a_i, b_i], k) / (1 + t_k / d)
– t_k = distance to k-th nearest target
– Gain(C_i, [a_i, b_i], k) = expected information gain from k-th example
Influence at tile t from targets in all categories:
TotalInfluence(t) = Σ_i Σ_k Gain(C_i, [a_i, b_i], k) / (1 + t_k / d)
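A sketch of the total influence at a tile (total_influence, targets_by_cat, gains_by_cat, and dist are illustrative names; gains_by_cat[i] is assumed to hold the decreasing marginal gains for successive targets of category i):

```python
def total_influence(tile, targets_by_cat, gains_by_cat, d_avg, dist):
    """TotalInfluence(t): falloff-weighted gains summed over all targets in
    all categories.  The k-th nearest target of a category contributes the
    k-th marginal gain, since gains shrink as more of the category is explored.

    targets_by_cat[i]: list of target positions in category i
    gains_by_cat[i][k]: expected gain from exploring the (k+1)-th instance of i
    dist(a, b): distance in tiles between two positions
    """
    influence = 0.0
    for i, targets in enumerate(targets_by_cat):
        # Sort this category's targets by distance so the nearest one
        # receives the largest (first) marginal gain.
        for k, target in enumerate(sorted(targets, key=lambda p: dist(tile, p))):
            t_k = dist(tile, target)
            influence += gains_by_cat[i][k] / (1.0 + t_k / d_avg)
    return influence
```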

Slide 40: Updating the Influence Map
Influence map computed for all tiles in area of agent
Agent moves in direction of increasing influence until some target t_i reached
Agent determines whether target meets goal, and either increments a_i or b_i for category C_i
Information gain recomputed for all categories
Influence map recomputed (with t_i removed)
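One cycle of this loop might look as follows; all of the world helpers (compute_influence_map, hill_climb_to_target, target_meets_goal, remove_target) are hypothetical names used only for illustration:

```python
def exploration_step(agent, world):
    """One cycle of the exploration loop: follow the influence map to a
    target, update the category's [a, b], and retire the target.
    The `world` interface here is hypothetical, not from the paper."""
    influence_map = world.compute_influence_map()               # all tiles near agent
    target = world.hill_climb_to_target(agent, influence_map)   # follow rising influence
    i = target.category
    a_i, b_i = world.knowledge[i]
    if world.target_meets_goal(target):
        world.knowledge[i] = (a_i + 1, b_i)                     # increment a_i
    else:
        world.knowledge[i] = (a_i, b_i + 1)                     # increment b_i
    world.remove_target(target)                                 # t_i no longer explorable
    # information gains and the influence map are recomputed on the next cycle
```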

Slide 41: Updating the Influence Map
(Illustration of the recomputed influence map)

Slide 42: Outline
Learning as reducing future travel costs
Beta functions and probabilistic smart terrain
Defining an information gain function
– Estimating extra distances traveled due to errors
– Factoring in category prevalence
Creating an influence map for agent movement
Benchmark and empirical testing

Slide 43: Prior Knowledge Benchmark
Instance of category with knowledge [1, 2]
Instance of category with knowledge [2, 4]
– Category prevalence similar
Agent should move towards instance of category with less knowledge

Slide 44: Category Prevalence Benchmark
Instance of category with two instances
Instance of category with single instance
– Prior knowledge of both = [1, 2]
Agent should move towards instance of category with greater prevalence

Slide 45: Much Closer Distance Benchmark
Knowledge = [10, 15] and prevalence = 7
Knowledge = [8, 12] and prevalence = 8
Even though further target has better information gain and prevalence, agent should move towards significantly closer targets

Slide 46: Large-scale Testing
30 x 20 world (with obstacles)
4 categories of targets
Targets placed randomly for each trial
Probability tile contains target = 0.05

Slide 47: Category Data
Category   Prevalence   Actual probability   Prior knowledge [a, b]
A          0.2          0.1                  [10, 90]
B          0.2          0.1                  [1, 3]
C                       0.25                 [1, 5]
D                       0.25                 [25, 75]
High priority due to information gain
Somewhat high priority due to category prevalence

Slide 48: Importance of Learning
Limited category data can cause errors in estimated probabilities
This can lead to incorrect decisions about which target to move to next
Category   Actual P   Prior knowledge
A          0.1        [10, 90]
B          0.1        [1, 3]
C          0.25       [1, 5]
D          0.25       [25, 75]
Overestimates probability of B (estimate 0.25 from [1, 3] vs. actual 0.1) – moves towards instances too often
Underestimates probability of C (estimate ~0.17 from [1, 5] vs. actual 0.25) – ignores instances too often

Slide 49: Does the Learning Strategy Work?
100 trials with targets randomly placed
For each trial, agent given 50 moves for learning
– Influence map generated
– Agent followed influence map to target
– Actual probabilities used to update [a, b] for that category
– Information gains updated and map recomputed
Question: Which categories were explored most?

Slide 50: Does the Learning Strategy Work?
Average number of each category explored per trial:
Category   Average explored per trial
A          1.17
B          2.52
C          3.66
D          2.10
Greater information gain
Higher prevalence

Slide 51: Is the Learning Strategy Useful?
Does the information gain strategy reduce future search time for targets that meet goals?
Comparison of results to simpler "naïve" strategy
– During learning phase, simply move to closest target instead of computing information gains

Slide 52: Training and Testing
Training phase:
– Learning strategy (information gain or naïve) used to move agent 50 moves
– Each time target in category C_i reached, update its [a_i, b_i] based on actual category probabilities
– Product of learning: estimated probabilities p_i for each category, computed as Exp(beta[a_i, b_i](θ_i))

Slide 53: Training and Testing
Testing phase:
– Agent placed at every location in world (536 non-wall tiles)
– Existing probabilistic smart terrain algorithm used to search for a target that meets goal from that point
– Based on estimated probabilities from training phase
Question: How many moves were required on average to find a goal?
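A sketch of this testing phase, reusing best_adjacent_tile from the earlier sketch; the world interface (non_wall_tiles, neighbors, targets_within, is_goal_at) is hypothetical:

```python
def average_search_length(world, estimated_probs, d_max):
    """Average number of moves to reach a goal-meeting target, starting from
    every non-wall tile and following the probabilistic smart terrain rule
    (move to the adjacent tile with the lowest Dist)."""
    lengths = []
    for start in world.non_wall_tiles():
        tile, moves = start, 0
        while not world.is_goal_at(tile) and moves < d_max:
            # Dist(x) for each adjacent tile, using the targets within d_max
            # moves of it and the probabilities estimated during training.
            adjacent = {n: [(dist, estimated_probs[cat])
                            for dist, cat in world.targets_within(n, d_max)]
                        for n in world.neighbors(tile)}
            tile = best_adjacent_tile(adjacent, d_max)
            moves += 1
        lengths.append(moves)
    return sum(lengths) / len(lengths)
```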

Slide 54: Results of Testing
100 trials using both naïve and information gain learning
Information gain learning focused on categories about which less was known (B and C)
– More accurate estimated probabilities
– Less travel time due to moving to wrong targets
Strategy           Average tiles explored until goal found
Information gain
Naive              6.473

Slide 55: Ongoing Work
Learning while acting to meet goals
– Agent must meet current needs (which presumably have some urgency)
– Agent must also explore to learn knowledge to better meet future needs
Tradeoff:
– Costs of not meeting current needs while exploring
– Costs of extra travel in future if exploration not done now

Slide 56: Ongoing Work
Learning in hierarchical worlds
– Agent does not know exact location of all targets
– Agent only knows expected number in a given region
– Will not know what a region actually contains until it moves to it
(Diagram: regions labeled with expected counts, e.g. Exp(C_1) = 3.2, Exp(C_2) = 2.4 in one region; Exp(C_1) = 1.7, Exp(C_2) = 4.5 in another)
