Factors Affecting Diminishing Returns for Searching Deeper Matej Guid and Ivan Bratko CGW 2007.

Introduction  Deep-search behaviour and diminishing returns for additional search in chess have long been burning issues in the game-playing research community.  Two different approaches have dominated the rich history of research on this topic: self-play and go-deep. Self-play experiments  Two otherwise identical programs are matched, with one given a handicap.  Usually the handicap is search depth. Go-deep experiments  Best-move changes resulting from different search depths on a set of positions are observed.

Go-deep approach  Go-deep experiments were introduced to determine the probability of a new best move being discovered by searching one ply deeper.  Based on Newborn's discovery: the results of self-play experiments are closely correlated with the rate of best-move changes. Newborn’s hypothesis (1985): RI(d+1) ≈ c · BC(d+1), where  RI(d+1) is the rating improvement gained by increasing the search depth by one ply, and  BC(d+1) is the probability of finding a new best move at the next ply.  Although some objections were raised against the above equation, measuring best-move changes has consistently been used in several experiments.
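The best-change rate BC can be estimated directly from recorded engine output. The sketch below (all names are illustrative, not from the paper) computes the fraction of positions whose best move changes when the search goes one ply deeper:

```python
# Hypothetical helper: estimate BC(d+1), the fraction of positions
# whose best move at depth d+1 differs from the one at depth d.

def best_change_rate(best_moves, d):
    """best_moves: one dict per analysed position, mapping
    search depth -> best move reported at that depth."""
    changed = sum(1 for bm in best_moves if bm[d + 1] != bm[d])
    return changed / len(best_moves)

positions = [
    {4: "e2e4", 5: "e2e4"},   # best move stable from depth 4 to 5
    {4: "d2d4", 5: "g1f3"},   # best move changes at depth 5
]
print(best_change_rate(positions, 4))  # 0.5
```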

Go-deep experiments  In 1997, Phoenix (Schaeffer) and The Turk (Junghanns et al.) were used to record best-move changes at iteration depths of up to 9 plies.  In the same year, Hyatt and Newborn let Crafty search up to 14 plies.  Heinz (1998) repeated their go-deep experiment with DarkThought. Diminishing returns for additional search effort  All these experiments were performed on somewhat limited datasets.  They did NOT provide any conclusive empirical evidence that best-move changes decrease continuously with increasing search depth.

Search and Knowledge  An interesting go-deep experiment was performed by Sadikov and Bratko.  Very deep searches were made possible by concentrating on chess endgames with a limited number of pieces.  The results confirmed the existence of diminishing returns in chess.  More importantly: they showed that the amount of knowledge a program has influences when diminishing returns start to manifest themselves.

Going deeper  A remarkable follow-up on previous work on deep-search behaviour using chess programs was published in 2005 by Steenhuisen.  Crafty was used to repeat go-deep experiments on positions taken from previous experiments, pushing the search horizon to 20 plies.  A further set of positions was searched up to 18 plies.  The results showed that the chance of new best moves being discovered: decreases exponentially when searching to higher depths, decreases faster for positions closer to the end of the game.  Steenhuisen also reported that the speed with which the best-change rate decreases depends on the test set used.
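Steenhuisen's finding can be pictured with a simple decay model; the sketch below uses BC(d) = a · exp(−b · d) with invented coefficients, purely for illustration (no values here come from the paper):

```python
import math

# Toy model of an exponentially decreasing best-change rate.
# The coefficients a and b are made up for demonstration only.
def bc_model(d, a=0.9, b=0.15):
    return a * math.exp(-b * d)

# The modelled rate shrinks monotonically as the search goes deeper.
rates = [round(bc_model(d), 3) for d in range(6, 20, 4)]
print(rates)
```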

Different test sets – different results  How can one rely on statistical evidence from different go-deep experiments if they obviously depend on the dataset used?  We address this issue and investigate the hypothesis that the rate at which returns diminish depends on the value of the position. Diminishing returns revisited again  A large dataset of more than 40,000 positions taken from real games has been used.  Go-deep experiments with the programs Crafty and Rybka were conducted.  We show that the chance of new best moves being discovered at higher depths depends on: the values of positions in the dataset, the quality of the evaluation function of the program used, and to some extent also on the phase of the game and the amount of material on the board.

Go-deep design  The chess programs Crafty and Rybka were used to analyse more than 40,000 positions from real games played in World Championship matches.  Each position occurring in these games after move 12 was searched to plies ranging from 2 to 12.  For the measurements we use the same definitions as provided by Heinz and Steenhuisen: Best Change: B(d) ≠ B(d−1); Fresh Best: B(d) ≠ B(j) for all j < d; (d−2) Best: B(d) = B(d−2) and B(d) ≠ B(d−1); (d−3) Best: B(d) = B(d−3) and B(d) ≠ B(d−2) and B(d) ≠ B(d−1).  The estimated probabilities (in %) have been obtained for each measurement of best change.  In each experiment, the original test set was divided into subsets based on the values of positions.
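The four measurements can be written down almost verbatim as predicates over B, the list of best moves per search depth. This is an illustrative sketch, not the authors' code:

```python
# B[d] is the best move reported at search depth d.

def best_change(B, d):
    return B[d] != B[d - 1]

def fresh_best(B, d):
    # The depth-d best move never appeared as best at any shallower depth.
    return all(B[d] != B[j] for j in range(d))

def d2_best(B, d):
    return B[d] == B[d - 2] and B[d] != B[d - 1]

def d3_best(B, d):
    return B[d] == B[d - 3] and B[d] != B[d - 2] and B[d] != B[d - 1]

B = ["a", "b", "a", "c"]   # toy best moves at depths 0..3
print(best_change(B, 3))   # True: "c" differs from "a"
print(fresh_best(B, 3))    # True: "c" is entirely new
print(d2_best(B, 2))       # True: depth 2 reverts to the depth-0 move
```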

Crafty goes deep  Several researchers have used Crafty for their go-deep experiments; however, none had such a large set of test positions at their disposal.  Steenhuisen observed the deep-search behaviour of Crafty on different test sets and reported different best-change rates and best-change rate decreases for different test sets. Division into subsets  We divided the original test set into six subsets, based on the values of positions.  Evaluations obtained at depth 12 served as the best possible approximations of the “real” values of positions. With x denoting the evaluation: Group 1: x < −2 (4,011 positions); Group 2: −2 ≤ x < −1 (3,571); Group 3: −1 ≤ x < 0 (10,169); Group 4: 0 ≤ x < 1 (18,038); Group 5: 1 ≤ x < 2 (6,008); Group 6: x ≥ 2 (6,203).
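The grouping by evaluation can be expressed as a small bucketing function; the helper below is hypothetical, but uses the group boundaries from the slide (evaluations in pawn units at depth 12):

```python
def eval_group(x):
    """Map a depth-12 evaluation x (in pawns) to groups 1..6,
    using the boundaries from the slide."""
    bounds = [-2, -1, 0, 1, 2]
    for i, b in enumerate(bounds):
        if x < b:
            return i + 1
    return 6  # x >= 2

print(eval_group(-2.5))  # 1: clearly lost positions
print(eval_group(0.3))   # 4: approximately equal positions
print(eval_group(2.0))   # 6: clearly won positions
```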

Best Change and Fresh Best behaviour (1) [Table: Best Change in % (with standard errors, 0.36 down to 0.31) and Fresh Best in % per search depth; most numeric values lost in extraction; Fresh Best at the deepest level: 32.26.]  Results of Crafty for the approximately equal positions of Group 4. The rates for Fresh Best are given as conditional on the occurrence of Best Change.  Both Best Change and Fresh Best rates decrease consistently with increasing search depth.

Best Change and Fresh Best behaviour (2) [Table: Best Change in % (with standard errors, 0.61 down to 0.49) and Fresh Best in % per search depth; most numeric values lost in extraction; Fresh Best at the deepest level: 29.27.]  Results of Crafty for the won positions of Group 6.  The Best Change and Fresh Best rates again decrease consistently with increasing search depth; however, they decrease faster than in the subset of approximately equal positions.

Rybka goes deep  Rybka is currently the strongest chess program according to the SSDF rating list, rated more than 250 points higher than Crafty. The results  Confirm that best-change rates depend on the values of positions.  Demonstrate that the chance of new best moves being discovered at higher depths is lower at all depths than with Crafty.  Subsets with positions in different ranges of evaluations x obtained with Rybka at depth 12: Group 1: x < −2 (1,263 positions); Group 2: −2 ≤ x < −1 (1,469); Group 3: −1 ≤ x < 0 (9,808); Group 4: 0 ≤ x < 1 (22,644); Group 5: 1 ≤ x < 2 (3,152); Group 6: x ≥ 2 (2,133).

Diminishing returns and phase of the game  The experiments in this and the following section were performed with Crafty, on more or less balanced positions with depth-12 evaluations in the range between −0.50 and 0.50. The results  There is no obvious correlation between move number and the chance of new best moves being discovered at higher depths.  In positions very close to the end of the game it nevertheless decreases faster than in the positions of the other groups.  Subsets of positions from different phases of the game, with evaluations between −0.50 and 0.50 obtained at search depth 12 (x is the move number): Group 1: x < 20 (7,580 positions); Group 2: 20 < x < 30 (5,316); Group 3: 30 < x < 40 (2,918); Group 4: 40 < x < 50; Group 5: x > 50 (the last two position counts were truncated in the source).

Diminishing returns and material  The phase of the game is closely correlated with the amount of material on the board, so one would expect the best-change rates to be lower in positions with fewer pieces on the board.  Pawns are counted in, and the commonly accepted piece values were used (queen = 9, rook = 5, bishop = 3, knight = 3, pawn = 1). The results  Material and best-move changes are NOT clearly correlated.  Only the curve for positions with a total piece value of less than 15 points of material (for each of the players) slightly deviates from the others.  Six subsets of positions with different amounts of material on the board (each player starts with 39 points), obtained at depth 12, where x is the total piece value per player: Group 1: x < 15 (3,236 positions); Group 2: 15 ≤ x < 20 (1,737); Group 3: 20 ≤ x < 25 (2,322); Group 4: 25 ≤ x < 30 (2,612); Group 5: 30 ≤ x ≤ 35 (5,882); Group 6: x > 35 (4,112).
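Counting material with these piece values is straightforward; the sketch below (illustrative, not from the paper) reproduces the 39-point starting total mentioned on the slide:

```python
# Commonly accepted piece values from the slide; the king is excluded.
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}

def material(pieces):
    """Total piece value for one side; pieces is an iterable of piece
    letters: queen 'q', rook 'r', bishop 'b', knight 'n', pawn 'p'."""
    return sum(PIECE_VALUES[p] for p in pieces)

# One player's starting army: 1 queen, 2 rooks, 2 bishops, 2 knights, 8 pawns.
start = ["q"] + ["r"] * 2 + ["b"] * 2 + ["n"] * 2 + ["p"] * 8
print(material(start))  # 39
```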

Possible applications  The new discoveries are not only of theoretical importance – in particular, knowing that the quality of the evaluation function influences diminishing returns can be very useful for practical purposes. Comparing evaluation functions of programs  While there are obvious ways to compare the strength of programs, there has so far been no way to evaluate the strength of their evaluation functions.  Observing best-change rates in the evaluations of different programs seems to provide such a possibility. Adjusting weights of attributes in evaluation functions  Instead of computationally expensive self-play approaches, evaluating an appropriate set of positions at different depths could lead to the desired results.  Possible approach: optimisation of the weights guided by a score function based on diminishing returns.

Conclusions  Deep-search behaviour and the phenomenon of diminishing returns for additional search effort have been studied by several researchers, with different results obtained on the different datasets used in go-deep experiments.  In this contribution we studied some factors that affect diminishing returns for searching deeper.  The results, obtained on a large set of more than 40,000 positions from real chess games using the programs Crafty and Rybka, show that diminishing returns depend on: the values of positions in the dataset, the quality of the evaluation function of the program used, and to some extent also on the phase of the game and the amount of material on the board.