
Experiment Databases: Towards better experimental research in machine learning and data mining Hendrik Blockeel Katholieke Universiteit Leuven

Motivation

- Much research in ML / DM involves experimental evaluation
- Interpreting the results is more difficult than it may seem
- Typically, a few specific implementations of algorithms, with specific parameter settings, are compared on a few datasets, and general conclusions are drawn from that
- How generalizable are these results, really?
- There is evidence that overly general conclusions are often drawn
  - E.g., Perlich & Provost: the relative performance of techniques differs depending on the size of the dataset

Very sparse evidence

[Figure: the dataset space (DS) plotted against the algorithm parameter space (AP), with a handful of experiments marked as x's]

A few points in an N-dimensional space, where N is very large: very sparse evidence!

An improved methodology

We argue in favour of an improved experimental methodology:
- Perform many more experiments: better coverage of the algorithm-dataset space
- Store the results in an "experiment database": better reproducibility
- Mine that database for patterns: more advanced analysis becomes possible

The approach shares characteristics of inductive databases: the database will be mined for specific kinds of patterns (inductive queries, constraint-based mining).

Classical setup of experiments

Currently, performance evaluations of algorithms rely on a few specific instantiations of algorithms (implementations, parameter settings), tested on a few datasets (with specific properties), often focusing on specific evaluation criteria and carried out with a specific research question in mind.

Disadvantages:
- Limited generalisability (see before)
- Limited reusability of experiments: if we want to test another hypothesis, we need to run new experiments, with a different setup, recording other information

Setup of an experiment database

- The ExpDB is filled with the results of random instantiations of algorithms, run on random datasets
- Algorithm parameters and dataset properties are recorded
- Performance criteria are measured and stored
- These experiments cover the whole DS x AP space

[Diagram: choose an algorithm (CART, C4.5, Ripper, ...) -> choose parameters (leaf size > 2, heuristic = gain, ...) -> generate a dataset (#examples = 1000, #attr = 20, ...) -> run -> store the algorithm parameters, dataset properties, and results]

Setup of an experiment database

When experimenting with a single learner, e.g., C4.5, one table suffices:

  MLS | heur | Ex  | Attr | Compl | TP  | FP  | RT  | ...
  2   | gain | ... | ...  | ...   | ... | ... | ... | ...

(MLS and heur are algorithm parameters; Ex, Attr and Compl are dataset characteristics; TP, FP and RT are performance measures. A sketch of how such a table could be built and populated follows below.)
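
A minimal sketch (assumed, not taken from the slides) of how such a single-learner ExpDB table could be created and populated: random C4.5 parameter settings are run on randomly generated datasets, and the parameters, dataset characteristics and performance are stored as one row per experiment. The column names follow the abbreviations above; generate_dataset() and run_c45() are hypothetical helpers.

import random
import sqlite3

conn = sqlite3.connect("expdb.sqlite")
conn.execute("""
    CREATE TABLE IF NOT EXISTS ExpDB (
        MLS   INTEGER,  -- algorithm parameter: minimum leaf size
        heur  TEXT,     -- algorithm parameter: split heuristic
        Ex    INTEGER,  -- dataset characteristic: number of examples
        Attr  INTEGER,  -- dataset characteristic: number of attributes
        Compl REAL,     -- dataset characteristic: concept complexity
        TP    REAL,     -- performance: true positive rate
        FP    REAL,     -- performance: false positive rate
        RT    REAL      -- performance: runtime (seconds)
    )""")

def generate_dataset(n_examples, n_attr):
    """Hypothetical: build a random dataset; return (dataset, complexity)."""
    raise NotImplementedError

def run_c45(dataset, mls, heur):
    """Hypothetical: run C4.5 on the dataset; return (TP, FP, runtime)."""
    raise NotImplementedError

for _ in range(10000):                    # many experiments, not a handful
    mls = random.choice([2, 5, 10, 20])
    heur = random.choice(["gain", "gainratio"])
    ex = random.choice([100, 1000, 10000])
    attr = random.randint(5, 50)
    data, compl = generate_dataset(ex, attr)
    tp, fp, rt = run_c45(data, mls, heur)
    conn.execute("INSERT INTO ExpDB VALUES (?,?,?,?,?,?,?,?)",
                 (mls, heur, ex, attr, compl, tp, fp, rt))
conn.commit()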

Setup of an experiment database

When experimenting with multiple learners, the setting is more complicated and will not be considered in detail here.

[Schema sketch from the slide: one parameter-instance table per algorithm, e.g., C4.5-ParInst with columns PI, MLS, heur and a row (C45-1, 2, gain), and CART-ParInst with columns PI, BS, heur and a row (CA-1, yes, Gini); plus a central ExpDB table with columns Alg, Inst, PI, Ex, Attr, Compl, TP, FP, RT and rows such as (DT, C4.5, C45-1, ...) and (DT, CART, CA-1, ...). One possible rendering in SQL follows below.]
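
A hedged sketch of what such a multi-learner schema could look like (the slide only hints at it): one parameter-instance table per learner and a central experiments table that references them. All table and column names are assumptions; a separate database file is used so the single-learner sketch above is left untouched.

import sqlite3

conn = sqlite3.connect("expdb_multi.sqlite")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS C45_ParInst (
        PI   TEXT PRIMARY KEY,  -- parameter instance, e.g. 'C45-1'
        MLS  INTEGER,           -- minimum leaf size, e.g. 2
        heur TEXT               -- split heuristic, e.g. 'gain'
    );
    CREATE TABLE IF NOT EXISTS CART_ParInst (
        PI   TEXT PRIMARY KEY,  -- parameter instance, e.g. 'CA-1'
        BS   TEXT,              -- binary splits, e.g. 'yes'
        heur TEXT               -- split heuristic, e.g. 'Gini'
    );
    CREATE TABLE IF NOT EXISTS ExpDB (
        Alg TEXT, Inst TEXT, PI TEXT,           -- which learner was run
        Ex INTEGER, Attr INTEGER, Compl REAL,   -- dataset characteristics
        TP REAL, FP REAL, RT REAL               -- performance
    );
""")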

Experimental questions and hypotheses

Example questions:
- What is the effect of parameter X on runtime?
- What is the effect of the number of examples in the dataset on TP and FP?
- ...

With the classical methodology, a different set of experiments is needed for each question (unless all questions are known in advance and the experiments are designed to answer all of them).

With the ExpDB approach, we just query the ExpDB table for the answer: a new question means one new query, not new experiments (see the example below).
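
A small sketch (assuming the hypothetical single-learner ExpDB table built earlier) of answering a new research question -- how do TP and FP vary with the number of examples? -- with one query over the existing experiments instead of a new experimental study.

import sqlite3

conn = sqlite3.connect("expdb.sqlite")
for ex, avg_tp, avg_fp in conn.execute(
        """SELECT Ex, AVG(TP), AVG(FP)
           FROM ExpDB
           GROUP BY Ex
           ORDER BY Ex"""):
    print(ex, avg_tp, avg_fp)   # one line per dataset size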

Inductive querying

- To find the right patterns in the ExpDB, we need a suitable query language
- Many queries can be answered with standard SQL, but (probably) not all of them (easily)
- We illustrate this with some simple examples

Investigating a simple effect

The effect of the number of items (NItems) on Runtime for frequent itemset mining algorithms:

  SELECT NItems, Runtime
  FROM ExpDB
  ORDER BY NItems

  SELECT NItems, AVG(Runtime)
  FROM ExpDB
  GROUP BY NItems
  ORDER BY NItems

[Figure: scatter plot of Runtime against NItems]

Investigating a simple effect

Note: setting all parameters randomly creates more variance in the results.
- In the classical approach, these other parameters would simply be kept constant
- This leads to clearer, but possibly less generalisable, results
- This can be simulated easily in the ExpDB setting:

  SELECT NItems, Runtime
  FROM ExpDB
  WHERE MinSupport = 0.05
  ORDER BY NItems

- Advantage (+): the condition is explicit in the query
- Drawback (-): we use only a part of the ExpDB, so the ExpDB needs to contain many experiments

Investigating interaction of effects

E.g., does the effect of NItems on Runtime change with MinSupport and NTrans?

  FOR a = 0.01, 0.02, 0.05, 0.1 DO
    FOR b = 10^3, 10^4, 10^5, 10^6, 10^7 DO
      PLOT
        SELECT NItems, Runtime
        FROM ExpDB
        WHERE MinSupport = $a AND NTrans >= $b AND NTrans < 10*$b
        ORDER BY NItems

(A runnable version of this loop is sketched below.)
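
A sketch (not from the slides) of how the nested-loop pseudocode above could be run against a hypothetical ExpDB table with columns NItems, Runtime, MinSupport and NTrans, plotting one averaged curve per (MinSupport, NTrans-range) combination.

import sqlite3
import matplotlib.pyplot as plt

conn = sqlite3.connect("expdb.sqlite")

for a in (0.01, 0.02, 0.05, 0.1):
    for b in (10**3, 10**4, 10**5, 10**6, 10**7):
        rows = conn.execute(
            """SELECT NItems, AVG(Runtime)
               FROM ExpDB
               WHERE MinSupport = ? AND NTrans >= ? AND NTrans < ?
               GROUP BY NItems
               ORDER BY NItems""",
            (a, b, 10 * b)).fetchall()
        if rows:
            x, y = zip(*rows)
            plt.plot(x, y, label=f"minsup={a}, NTrans in [{b}, {10*b})")

plt.xlabel("NItems")
plt.ylabel("Runtime")
plt.legend()
plt.show()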

Direct questions instead of repeated hypothesis testing ("true" data mining)

Which algorithm parameter has the strongest influence on the runtime of my decision tree learner?

  SELECT ParName, Var(A)/Avg(V) AS Effect
  FROM AlgorithmParameters,
       (SELECT $ParName, Var(Runtime) AS V, Avg(Runtime) AS A
        FROM ExpDB
        GROUP BY $ParName)
  GROUP BY ParName
  ORDER BY Effect

This is not (easily) expressible in standard SQL! Pivoting is possible by hardcoding all attribute names in the query, but that is not very readable or reusable. (A sketch of the same analysis outside SQL follows below.)
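
A sketch (not from the slides) of the same "strongest parameter" question answered outside SQL, against the hypothetical single-learner table from earlier: for each assumed parameter column, group the experiments by the parameter's value, compute the mean runtime per value, and score the parameter by the variance of those means relative to the mean within-group variance -- the Var(A)/Avg(V) measure from the query above.

import sqlite3
from collections import defaultdict
from statistics import mean, pvariance

conn = sqlite3.connect("expdb.sqlite")
param_columns = ["MLS", "heur"]          # assumed algorithm-parameter columns

effects = {}
for par in param_columns:
    groups = defaultdict(list)
    for value, runtime in conn.execute(f"SELECT {par}, RT FROM ExpDB"):
        groups[value].append(runtime)
    avgs = [mean(g) for g in groups.values()]        # A per parameter value
    vars_ = [pvariance(g) for g in groups.values()]  # V per parameter value
    effects[par] = pvariance(avgs) / mean(vars_)     # Var(A) / Avg(V)

# strongest effect first
for par, effect in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(par, effect)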

A comparison

Classical approach:
1) Experiments are goal-oriented
2) Experiments seem more convincing than they are
3) New experiments are needed when new research questions pop up
4) Conditions under which results are valid are unclear
5) Relatively simple analysis of results
6) Mostly repeated hypothesis testing, rather than direct questions
7) Low reusability and reproducibility

ExpDB approach:
1) Experiments are general-purpose
2) Experiments seem as convincing as they are
3) No new experiments needed when new research questions pop up
4) Conditions under which results are valid are explicit in the query
5) Sophisticated analysis of results possible
6) Direct questions possible, given suitable inductive query languages
7) Better reusability and reproducibility

Summary

The ExpDB approach:
- Is more efficient: the same set of experiments is reusable and reused
- Is more precise and trustworthy: the conditions under which the conclusions hold are explicitly stated
- Yields better documented experiments: precise information on all experiments is kept, and experiments are reproducible
- Allows more sophisticated analysis of results: interaction of effects, true data mining capacity

Note: this is also interesting for meta-learning!

The challenges... (*)

- Good dataset generators are necessary
  - Generating truly varying datasets is not easy
  - One could start from real-life datasets and build variations (a sketch follows below)
- Extensive descriptions of datasets and algorithms
  - Vary as many potentially relevant properties as possible
- A database schema for a multi-algorithm ExpDB
- Suitable inductive query languages

(*) Note: even without solving all these problems, some improvement over the current situation is feasible and easy to achieve.
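
A minimal sketch (an assumption, not part of the slides) of one way to build dataset variations starting from a real-life dataset: subsample the examples, drop random attributes, and inject a little label noise, recording the resulting dataset properties so they can be stored alongside each experiment.

import random

def make_variation(X, y, frac_examples=0.5, frac_attrs=0.8, label_noise=0.05):
    """X: list of feature lists, y: list of labels. Returns a perturbed copy."""
    n, m = len(X), len(X[0])
    rows = random.sample(range(n), max(1, int(frac_examples * n)))
    cols = random.sample(range(m), max(1, int(frac_attrs * m)))
    labels = sorted(set(y))
    X_new, y_new = [], []
    for i in rows:
        X_new.append([X[i][j] for j in cols])
        if random.random() < label_noise:
            y_new.append(random.choice(labels))   # flip label with small prob.
        else:
            y_new.append(y[i])
    props = {"Ex": len(X_new), "Attr": len(cols)}  # dataset characteristics
    return X_new, y_new, props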