
By: Kenny Raharjo

Agenda
- Problem scope and goals
- Game development trends
- Multi-armed bandit (MAB) introduction
- Integrating MAB into game development
- Project findings and results

Background
- Video games are developing rapidly.
- The current generation has a shorter attention span (thanks, technology!) and easily gives up on games that are too hard, repetitive, or boring.
- Challenge: develop engaging games.
- Use MAB to optimize player satisfaction; satisfied players drive revenue.

Why Games?
People play video games to:
- De-stress
- Pass time
- Learn (edutainment: educational entertainment)
- Boost self-esteem

Gaming Trends
- Technology is advancing rapidly, and gaming has become the norm.
- Smartphones are available to all ages; 32% of phone time is spent on games.
- Even Nintendo is shifting from consoles toward mobile.

Gaming Trends
- Offline games: collecting data and fixing bugs is hard; glitches ship and stay, and developers pay the price.
- Online games: data collection is easy, and bug fixes and downloadable content (DLC) can be delivered as patches.
- Consumers are picky, with attention spans shorter than a goldfish's, and complain when a game is bad.
- What's the solution?

Multi-Armed Bandit
- Origin: a gambler in a casino wants to maximize winnings across several slot machines, balancing exploration and exploitation.
- Objective: given a set of K distinct arms, each with an unknown reward distribution, maximize the total reward collected.
- Example: 3 slot machines.
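The 3-slot-machine example can be made concrete with a tiny simulator. The payout probabilities below are made up for illustration; in a real bandit problem they are unknown to the player:

```python
import random

# Hypothetical payout probabilities, hidden from the player.
PAYOUT_PROBS = [0.2, 0.5, 0.35]

def pull(arm):
    """Play slot machine `arm`: reward 1 on a win, 0 otherwise."""
    return 1 if random.random() < PAYOUT_PROBS[arm] else 0
```

A bandit algorithm's job is to discover, purely by pulling and observing rewards, that arm 1 pays best.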

Multi-Armed Bandit
- Restaurant example. Arms: restaurants; reward: happiness.
- Should I try a new place (explore) or stick with my best option so far (exploit)?

Multi-Armed Bandit
- Arms: the parameter choices, i.e., what you want to optimize over.
- Rewards: the observed outcomes, i.e., what you want to maximize; a reward may be a directly quantifiable value or a measure of happiness.
- Regret (opportunity cost): the additional reward you would have earned by playing the optimal arm from the start; this is what you want to minimize.
- MAB algorithms come with guarantees that keep regret low.
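The regret described above is usually written as follows (this is the standard textbook definition, not spelled out on the slide):

```latex
R(T) \;=\; T\,\mu^{*} \;-\; \sum_{t=1}^{T} \mu_{a_t}
```

where \(\mu^{*}\) is the mean reward of the best arm, \(a_t\) is the arm played at round \(t\), and \(T\) is the number of rounds; good bandit algorithms keep \(R(T)\) growing sublinearly in \(T\).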

Multi-Armed Bandit
- Bandit algorithm: the basic idea.

Multi-Armed Bandit
- Bandit algorithm: ε-greedy.
- With probability ε, explore the choices; with probability 1 − ε, exploit the best choice so far.
- The random exploration guarantees every arm keeps being sampled.
- Common practice: seed each arm with some initial data.
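The ε-greedy rule above can be sketched as follows. This is a minimal illustration, not the study's actual code; arm statistics are kept as simple running counts and reward totals:

```python
import random

def epsilon_greedy(counts, totals, epsilon=0.1):
    """Return the index of the arm to play.

    With probability epsilon, explore: pick a uniformly random arm.
    Otherwise exploit: pick the arm with the highest observed mean reward.
    """
    if random.random() < epsilon:
        return random.randrange(len(counts))  # explore
    means = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    return max(range(len(means)), key=means.__getitem__)  # exploit

def update(counts, totals, arm, reward):
    """Record one observed reward for `arm`."""
    counts[arm] += 1
    totals[arm] += reward
```

Seeding each arm with initial data, as the slide recommends, just means calling `update` a few times per arm before trusting the observed means.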

Multi-Armed Bandit
- Pros: arms with unlucky early draws get re-evaluated; no supervision is needed.
- Cons: purely result-focused; does not explain the trends behind the outcomes.

Applying MAB to Game Development
- Dynamically changing content (e.g., Infinite Mario, Mario Kart).
- MAB continually improves the game (difficulty or design) over time.
- Goal: maximize satisfaction; happy customers mean revenue.

Study Outline
- A simple, short retro RPG (5-7 minutes per playthrough).
- Arms: different skins (aesthetics); one skin per game.
- Reward: game time.
- Goal: compute the optimal skin; data is saved in a database.
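Under this setup, an ε-greedy skin picker might look like the sketch below. The skin names and the "play every skin at least once before exploiting" seeding rule are illustrative assumptions, not details from the study:

```python
import random

SKINS = ["forest", "desert", "cave"]  # illustrative skin names
counts = [0] * len(SKINS)             # games played per skin
totals = [0.0] * len(SKINS)           # total play seconds per skin

def choose_skin(epsilon=0.1):
    """Pick a skin for the next game session (epsilon-greedy)."""
    if 0 in counts or random.random() < epsilon:
        return random.randrange(len(SKINS))       # seed / explore
    means = [t / c for t, c in zip(totals, counts)]
    return max(range(len(SKINS)), key=means.__getitem__)  # exploit

def record_session(skin, play_seconds):
    """Store the observed reward (game time) for the chosen skin."""
    counts[skin] += 1
    totals[skin] += play_seconds
```

Each finished game session calls `record_session`, and the next player is served the skin returned by `choose_skin`.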

Results and Summary
- ε-greedy algorithm; plotted frequency vs. iteration number over 42 data points.
- Winner: Variant 0 (Forest); all options were re-evaluated along the way.
- Possible improvements: more samples, and a reward defined as a function of both game time and score.

Thank You!
Q&A
Contact: