UCT for Tactical Assault Battles in Real-Time Strategy Games
Radha-Krishna Balla
19 February 2009
Overview
I. Introduction
II. Related Work
III. Method
IV. Experiments & Results
V. Conclusion
I. Introduction
Domain
- RTS games
  - Resource production
  - Tactical planning
    - Tactical Assault battles
RTS Game – Wargus
[Figure: screenshot of a typical battle scenario in Wargus]
Planning Problem
- Large state space
- Temporal actions
- Spatial reasoning
- Concurrency
- Stochastic actions
- Changing goals
II. Related Work
Related Work
- Board games (bridge, poker, Go): Monte Carlo simulations
- RTS games
  - Resource production: means-ends analysis (Chan et al.)
  - Tactical planning: Monte Carlo simulations (Chung et al.), Nash strategies (Sailer et al.), reinforcement learning (Wilson et al.)
- Bandit-based problems and Go: UCT (Kocsis et al., Gelly et al.)
Our Approach
- Monte Carlo simulations with the UCT algorithm
- Advantages
  - Complex plans from simple abstract actions
  - Exploration/exploitation tradeoff
  - Handles changing goals
III. Method
Method
- Planning architecture
- UCT algorithm
- Search space formulation
- Monte Carlo simulations
- Challenges
Planning Architecture
- Online planner
- State space abstraction: grouping of units
- Abstract actions (see the sketch below)
  - Join(G): merge the friendly groups in set G into a single group
  - Attack(f,e): send friendly group f to attack enemy group e
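To make the abstraction concrete, here is a minimal sketch of how the abstract state and the two abstract actions could be represented. The class and field names (Group, AbstractState, join, attack) are illustrative assumptions, not the planner's actual code.

```python
from dataclasses import dataclass, field

@dataclass
class Group:
    """An abstract group of units (fields are illustrative)."""
    units: list                 # unit ids in this group
    hit_points: int             # summed hit points of the units
    location: tuple             # centroid (x, y) of the unit positions
    current_action: str = "idle"

@dataclass
class AbstractState:
    friendly: dict              # group id -> Group
    enemy: dict                 # group id -> Group
    time: float = 0.0           # current game time

def join(G):
    """Abstract action Join(G): merge the friendly groups in G into one."""
    return ("join", tuple(sorted(G)))

def attack(f, e):
    """Abstract action Attack(f, e): friendly group f attacks enemy group e."""
    return ("attack", f, e)
```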
UCT Algorithm
- Exploration/exploitation tradeoff
- Monte Carlo simulation to get subsequent states
- Search tree
  - Root node: current state
  - Edges: available actions
  - Intermediate nodes: subsequent states
  - Leaf nodes: terminal states
- Rollout-based construction
- Value estimates
UCT Algorithm – Pseudo Code 1
At each interesting time point in the game:
    build_UCT_tree(current state)
    choose argmax action(s) based on the UCT policy
    execute the aggregated actions in the actual game
    wait until one of the actions gets executed

build_UCT_tree(state):
    for each UCT pass do
        run UCT_rollout(state)
(continued)
UCT Algorithm – Pseudo Code 2
UCT_rollout(state):   (recursive)
    if leaf node reached then
        estimate final reward
        propagate reward up the tree and update value functions
        return
    populate possible actions
    if all actions explored at least once then
        choose the action with the best value function
    else
        choose an unexplored action by random sampling
    run Monte Carlo simulation to get the next state from the current state and action
    call UCT_rollout(next state)
UCT Algorithm – Formulae
Action selection (the standard UCB1 rule):
$$\pi(s) = \arg\max_a \left[ Q(s,a) + c \sqrt{\frac{\ln n(s)}{n(s,a)}} \right]$$
Value update (incremental mean of rollout rewards):
$$n(s,a) \leftarrow n(s,a) + 1, \qquad Q(s,a) \leftarrow Q(s,a) + \frac{R - Q(s,a)}{n(s,a)}$$
Here $n(s)$ is the visit count of state $s$, $n(s,a)$ the visit count of action $a$ in $s$, $Q(s,a)$ the value estimate, $R$ the sampled rollout reward, and $c$ the exploration constant.
Search Space Formulation
- Abstract state
  - Friendly and enemy groups
  - Hit points
  - Location
  - Current actions
  - Current time
- Group hit points: sum of the hit points of the units in the group
- Mean location: centroid of the units' positions (see the sketch below)
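A minimal sketch of the two group-level calculations named above, assuming each unit is a (hit_points, x, y) tuple; the function names are illustrative.

```python
def group_hit_points(units):
    """Group hit points: sum of the members' hit points."""
    return sum(hp for hp, _, _ in units)

def group_location(units):
    """Mean location: centroid of the members' positions."""
    xs = [x for _, x, _ in units]
    ys = [y for _, _, y in units]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

units = [(60, 2, 3), (40, 4, 5)]   # invented two-unit group
print(group_hit_points(units))     # 100
print(group_location(units))       # (3.0, 4.0)
```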
Monte Carlo Simulations
- Domain-specific, based on actual game play in Wargus
- Join actions
- Attack actions
- Reward calculation via the objective function (sketched below)
  - Time
  - Hit points
- Note: partial simulations (time cutoff)
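The two reward signals drive the UCT(t) and UCT(hp) planners introduced later. A hedged sketch, reusing the illustrative AbstractState/Group classes from the planning-architecture sketch; the exact scaling used in the original work may differ.

```python
def reward_time(final_state, time_cap=10000):
    """UCT(t)-style objective: win as quickly as possible.
    Higher reward for shorter winning battles (scaling assumed)."""
    if sum(g.hit_points for g in final_state.enemy.values()) > 0:
        return 0.0                        # battle lost or cut off
    return 1.0 - final_state.time / time_cap

def reward_hit_points(final_state, total_hp):
    """UCT(hp)-style objective: win while preserving friendly hit points."""
    remaining = sum(g.hit_points for g in final_state.friendly.values())
    return remaining / total_hp           # fraction of hit points kept
```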
Domain-specific Challenges
- State space abstraction: grouping of units (proximity-based)
- Concurrency: aggregation of actions
  - Join actions: simple
  - Attack actions: complex (partial simulations)
Planning Problem – Revisited
- Large state space → abstraction
- Temporal actions → Monte Carlo simulations
- Spatial reasoning → Monte Carlo simulations
- Concurrency → aggregation of actions
- Stochastic actions → UCT (online planning)
- Changing goals → UCT (different objective functions)
IV. Experiments & Results
Experiments

| # | Scenario | Friendly groups | Friendly composition | Enemy groups | Enemy composition | 'Join' actions | 'Attack' actions | Total actions |
|---|----------|-----------------|----------------------|--------------|-------------------|----------------|-------------------|---------------|
| 1 | 2vs2 | 2 | {6,6} | 2 | {5,5} | 1 | 4 | 5 |
| 2 | 3vs2 | 3 | {6,2,4} | 2 | {5,5} | 3 | 6 | 9 |
| 3 | 4vs2_1 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 4 | 4vs2_2 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 5 | 4vs2_3 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 6 | 4vs2_4 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 7 | 4vs2_5 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 8 | 4vs2_6 | 4 | {2,4,2,4} | 2 | {5,5} | 6 | 8 | 14 |
| 9 | 4vs2_7 | 4 | {3,3,6,4} | 2 | {5,9} | 6 | 8 | 14 |
| 10 | 4vs2_8 | 4 | {3,3,3,6} | 2 | {5,8} | 6 | 8 | 14 |
| 11 | 2vs4_1 | 2 | {9,9} | 4 | {4,5,5,4} | 1 | 8 | 9 |
| 12 | 2vs4_2 | 2 | {9,9} | 4 | {5,5,5,5} | 1 | 8 | 9 |
| 13 | 2vs4_3 | 2 | {9,9} | 4 | {5,5,5,5} | 1 | 8 | 9 |
| 14 | 2vs5_1 | 2 | {9,9} | 5 | {5,5,5,5,5} | 1 | 10 | 11 |
| 15 | 2vs5_2 | 2 | {10,10} | 5 | {5,5,5,5,5} | 1 | 10 | 11 |
| 16 | 3vs4 | 3 | {12,4,4} | 4 | {5,5,5,5} | 3 | 12 | 15 |

Table 1: Details of the different game scenarios
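The action counts in Table 1 follow a simple pattern: with n_f friendly and n_e enemy groups there are C(n_f, 2) pairwise Join actions and n_f × n_e Attack actions. The check below is my inference from the table rows, not a formula stated on the slide.

```python
from math import comb

# (scenario, friendly groups, enemy groups, expected total actions)
rows = [("2vs2", 2, 2, 5), ("3vs2", 3, 2, 9), ("4vs2_1", 4, 2, 14),
        ("2vs5_1", 2, 5, 11), ("3vs4", 3, 4, 15)]

for name, nf, ne, expected in rows:
    joins = comb(nf, 2)        # join any pair of friendly groups
    attacks = nf * ne          # any friendly group can attack any enemy group
    assert joins + attacks == expected, name
    print(f"{name}: {joins} joins + {attacks} attacks = {joins + attacks}")
```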
Planners
- UCT planners
  - UCT(t): optimizes battle time
  - UCT(hp): optimizes remaining friendly hit points
- Number of rollouts: 5000
- Results averaged over 5 runs
Planners
- Baseline planners
  - Random
  - Attack-Closest
  - Attack-Weakest
  - Stratagus-AI
  - Human
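For contrast with UCT, the two scripted baselines are simple greedy policies. A minimal sketch, reusing the illustrative Group and attack() definitions from the planning-architecture sketch; the Stratagus-AI and Human baselines cannot be reduced to a few lines.

```python
def attack_closest(friendly, enemy):
    """Attack-Closest: each friendly group attacks the nearest enemy group."""
    def dist2(a, b):
        return ((a.location[0] - b.location[0]) ** 2
                + (a.location[1] - b.location[1]) ** 2)
    return [attack(f_id, min(enemy, key=lambda e_id: dist2(f, enemy[e_id])))
            for f_id, f in friendly.items()]

def attack_weakest(friendly, enemy):
    """Attack-Weakest: every friendly group attacks the enemy group with the fewest hit points."""
    weakest = min(enemy, key=lambda e_id: enemy[e_id].hit_points)
    return [attack(f_id, weakest) for f_id in friendly]
```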
Video – Planning in Action
- Simple scenario
- Complex scenario
Results
Figure 1: Time results for UCT(t) and baselines.
Figure 2: Hit point results for UCT(t) and baselines.
Figure 3: Time results for UCT(hp) and baselines.
Figure 4: Hit point results for UCT(hp) and baselines.
Results – Comparison
Figures 1–4 compared as a 2×2 grid: rows UCT(t) and UCT(hp); columns time results and hit point results.
Results
Figure 5: Time results for UCT(t) with varying rollouts.
V. Conclusion
Conclusion
- Hard planning problem tackled with little expert knowledge
- Supports different objective functions
Future Work
- Reducing computational time (engineering aspects)
- Machine learning techniques
- Beyond Tactical Assault
Thank you