Experiments & Statistics

Presentation transcript:

Experiments & Statistics

Experiment Design
- Playtesting
- Experiments don’t have to be “big”: many game design experiments take only 30 minutes to design and conduct, and the results are obvious.
- Two approaches (both can be used in the same experiment):
  1. Measure a quantity
  2. Test a hypothesis
- Experiments are much weaker than proofs!

Control Group
- Establish a baseline.
- Detect any outside factors that might influence the experiment, e.g., location, the testing process itself, temperature, day of week, recent events.

Countering Bias
- Your bias: predict and then test against new data; don’t just fit a theory to existing data.
- Sample bias: Did you select playtesters who actually represent your target market? Is your experiment designed to reveal their true preferences? (Beware of giving them an incentive to “make you happy” or to seek outcomes that they don’t actually desire.) Did you prevent them from “cheating”?
- Community bias: use anonymous (blind) reviews.

Measurement (and Statistics)

Example: Measuring Time
Play N turns of a game, measuring the time per turn. We can now predict how long the game will run without further testing, even after we change the rules. (How large should N be?)

Accuracy vs. Precision
- Experiments estimate values; they are never exact.
- Accuracy is how close your measurement is to the true value (significant digits).
- Precision is the number of decimal places in your measurement.

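Population vs. Sample
Population statistics (truth):
- μ = mean (“average” or “expected value”)
- σ = standard deviation
Sample statistics (measured from samples $x_1, \dots, x_N$):
- N = number of samples
- m = mean, $m = \frac{1}{N}\sum_{i=1}^{N} x_i$
- s = sample deviation, $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - m)^2}$
Note the N-1 where you expected to see N.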
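
The N-1 divisor can be checked directly. The sketch below computes m and s by hand and compares them with Python's standard `statistics` module; the turn times are hypothetical values chosen only for illustration.

```python
import math
import statistics


def sample_stats(samples):
    """Return (N, m, s) using the N-1 divisor for the sample deviation."""
    n = len(samples)
    m = sum(samples) / n
    s = math.sqrt(sum((x - m) ** 2 for x in samples) / (n - 1))
    return n, m, s


# Hypothetical turn times in seconds (not data from the slides).
turn_times = [18.0, 21.0, 19.5, 22.0, 20.5]
n, m, s = sample_stats(turn_times)

# statistics.stdev divides by N-1; statistics.pstdev divides by N.
print(n, m, s, statistics.stdev(turn_times), statistics.pstdev(turn_times))
```

Dividing by N-1 always gives a slightly larger value than dividing by N, which compensates for measuring spread around the sample mean m rather than the unknown μ.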

Is the Mean Accurate?
Let N = sample size, m = sample average, and s² = sample variance. Assume a normal distribution. The multipliers below come from the t distribution with N-1 degrees of freedom.

N      95%        99%
3      4.303 s    9.925 s
4      3.182 s    5.841 s
5      2.776 s    4.604 s
10     2.262 s    3.250 s
20     2.093 s    2.861 s
50     2.010 s    2.680 s
100    1.984 s    2.626 s

For N = 10, the true population mean is on the interval m ± 3.250 s with 99% probability.
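
A minimal sketch of the lookup, assuming the multipliers are two-sided Student t critical values with N-1 degrees of freedom (the hard-coded numbers are standard t-table entries). `slide_interval` applies the multiplier to s as the slide writes it. `mean_interval` is the conventional confidence interval for a mean, which divides s by √N and is the appropriate form when s is the standard deviation of individual samples. Both function names and the measurements in the usage lines are hypothetical.

```python
import math

# Two-sided Student t critical values keyed by sample size N (N-1 degrees
# of freedom). Standard t-table entries, hard-coded to avoid a third-party
# dependency.
T_TABLE = {
    3: {0.95: 4.303, 0.99: 9.925},
    4: {0.95: 3.182, 0.99: 5.841},
    5: {0.95: 2.776, 0.99: 4.604},
    10: {0.95: 2.262, 0.99: 3.250},
    20: {0.95: 2.093, 0.99: 2.861},
    50: {0.95: 2.010, 0.99: 2.680},
    100: {0.95: 1.984, 0.99: 2.626},
}


def slide_interval(m, s, n, confidence=0.95):
    """Interval m +/- t*s, the form written on the slide."""
    half_width = T_TABLE[n][confidence] * s
    return m - half_width, m + half_width


def mean_interval(m, s, n, confidence=0.95):
    """Conventional t-interval for the mean: m +/- t*s/sqrt(N)."""
    half_width = T_TABLE[n][confidence] * s / math.sqrt(n)
    return m - half_width, m + half_width


# Hypothetical measurements: N = 10 turns, m = 30 seconds, s = 2 seconds.
wide = slide_interval(30.0, 2.0, 10, confidence=0.99)
narrow = mean_interval(30.0, 2.0, 10, confidence=0.99)
print(wide, narrow)
```

The first interval is always the wider of the two, so it is a safe but conservative statement about the mean.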

Exercise
Experimental results*:
- Played N = 20 turns of Carcassonne
- Average turn time was m = 20 seconds
- Sample deviation was s = 1.9 seconds
What range are you 95% confident contains the true mean?
95% confidence interval: m ± 2.093 s ≈ 20 ± 4 seconds (2.093 is the 95% t multiplier for N = 20)
Conclusion: More than 95% confident that the true average turn time is between 16 and 24 seconds.
*Artificial results to make computation easier
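
As a worked check of the arithmetic, take the two-sided 95% t critical value for N-1 = 19 degrees of freedom (2.093, a standard t-table value). The first line reproduces the 16 to 24 second range. The second line is an added comparison, not from the slides: if s is read as the standard deviation of individual turn times, the conventional interval for the mean uses the standard error s/√N and is considerably narrower.

$$
\begin{aligned}
m \pm t_{0.975,\,19}\, s &= 20 \pm 2.093 \times 1.9 \approx 20 \pm 3.98 \\
m \pm t_{0.975,\,19}\, \frac{s}{\sqrt{N}} &= 20 \pm 2.093 \times \frac{1.9}{\sqrt{20}} \approx 20 \pm 0.89
\end{aligned}
$$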

Extrapolation
We usually want to measure a relatively small fraction of the population and then generalize, e.g., political polling data.
- Any distribution: at least (1 - 1/k²) × 100% of the values are within μ ± kσ (Chebyshev’s inequality).
- Normal distribution: see table.

Percent within μ ± kσ:
k    Normal (=)      Any distribution (≥)
1    68%             0%
2    95%             75%
3    99.7%           89%
4    99.99%          94%
6    99.9999998%     97%
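
Both columns of the table can be recomputed. This sketch uses the Chebyshev bound exactly as stated above and, for the normal column, the standard identity P(|X - μ| ≤ kσ) = erf(k/√2); the function names are illustrative.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum fraction of any distribution within mu +/- k*sigma."""
    return max(0.0, 1.0 - 1.0 / k ** 2)


def normal_coverage(k):
    """Fraction of a normal distribution within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 4):
    print(k, f"{normal_coverage(k):.4%}", f"{chebyshev_lower_bound(k):.0%}")
```

The gap between the columns is the price of not knowing the shape of the distribution: Chebyshev's bound holds for any distribution with a finite variance, so it has to be pessimistic.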

Is the Variance Accurate?
The previous slide assumed that we knew the population parameters μ and σ! We know how to tell if m is accurate... but is s accurate? Good question. In this class, we’ll just assume that it is...

Exercise
We estimated that for Carcassonne, the turn time was m = 20 seconds with s = 1.9 seconds. There are 71 turns in the game. Assume turn times are normally distributed.
1. How many turns per game do you expect to take more than 22 seconds?
2. What is the range of total play times you expect for 99.9% of all games?

Turns longer than 22 seconds:
- About 68% of turns fall within [18, 22] (m ± s, rounding s up to 2), so 32% fall outside [18, 22].
- Half of the 32% are on the high side, so each turn has a 16% chance of running long.
- Conclusion: 71 turns × 16% ≈ 11 turns.

Total play time (treating turns as independent, so variances add):
- m_game = 71 × m = 71 × 20 seconds = 1,420 seconds ≈ 23.7 minutes
- s_game² = 71 × s² = 71 × 1.9² ≈ 256; s_game ≈ 16 seconds
- Normal distribution, so 99.7% of games are within 3 standard deviations (48 seconds) of the mean.
- Conclusion: about 99.7% of games (close to the 99.9% asked for) last 1,420 ± 48 seconds, i.e., between roughly 22.9 and 24.5 minutes.
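
The same exercise can be run without the rules of thumb. This sketch uses `statistics.NormalDist` from the Python standard library and assumes, as above, that turn times are independent and normally distributed. It uses the exact normal tail beyond 22 seconds and the exact 99.9% multiplier instead of 3, so its answers differ slightly from the hand calculation.

```python
import math
from statistics import NormalDist

TURNS = 71   # turns per game
M = 20.0     # mean turn time in seconds
S = 1.9      # sample deviation of one turn in seconds

# Expected number of turns longer than 22 seconds.
turn = NormalDist(mu=M, sigma=S)
p_long = 1.0 - turn.cdf(22.0)
expected_long_turns = TURNS * p_long

# Total game time: means add, and variances add for independent turns.
game_mean = TURNS * M
game_sd = math.sqrt(TURNS) * S

# Central 99.9% range: leave 0.05% in each tail.
z = NormalDist().inv_cdf(0.9995)
low, high = game_mean - z * game_sd, game_mean + z * game_sd

print(round(expected_long_turns, 1), game_mean, round(game_sd, 1))
print(round(low / 60, 1), round(high / 60, 1))  # range in minutes
```

The exact tail gives about 10 long turns rather than 11, because 22 seconds is slightly more than one sample deviation above the mean.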

Hypothesis Testing
1. Form a hypothesis
2. Design an experiment to test it
   - Analyze the statistical validity of the test
3. Run the experiment
4. Evaluate results
5. (Often... go back to step 1)

Objective and Quantitative
Bad: “People played our game and said that it was fun, therefore it was engaging.”
Better: “On average, our game was 2nd in a ranking from ‘most fun’ to ‘least fun’ of ten other commercial games in a survey of 100 players. 20% of subjects rated our game #1.”
Good: “100 subjects were randomly assigned to play our game or a hand-made version of Pit. They then decided individually which game to play again. 82% of respondents chose to play our game, so we conclude that it is about 4 times more engaging than Pit.”
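
One way to analyze the statistical validity of the “Good” result is an exact binomial test. This is a sketch, not a method taken from the slides: it assumes a null hypothesis in which players have no preference, so each of the 100 respondents independently picks either game with probability 0.5. The name `binomial_tail` is illustrative.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 82 of 100 respondents chose to replay our game.
p_value = binomial_tail(82, 100)

# Ratio of players choosing our game to players choosing Pit.
preference_ratio = 82 / (100 - 82)

print(p_value, round(preference_ratio, 2))
```

A very small p-value means an 82-to-18 split is hard to explain by chance alone. The ratio of about 4.6 is where the “about 4 times” figure comes from.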

Exercises
Design experiments to test the following hypotheses:
- “Our new rules increased engagement in the game.”
- “The chance of drawing an unplayable tile in Carcassonne is less than 0.1%.”
- “Experienced players usually choose the highest resource intersection first and then maximize resource distribution second in Settlers of Catan.”
- “In Guitar Hero, the intro for More Than a Feeling is harder than the chorus for most players.”
