Statistical Methods in Computer Science Hypothesis Life-cycle Ido Dagan.

Statistical Methods in Computer Science © 2006-now Gal Kaminka / Ido Dagan. Portions © Carla Ellis at Duke University

Why experiment?
– W. Tichy, “Should Computer Scientists Experiment More?” (on course web page)
– System/model/theory testing
  – Identify incorrectness or incompleteness in your “theory”/assumptions. This can save money and lives!
  – E.g. underlying assumptions that are violated by reality
  – Can lead to revising the model and/or the system
– Exploration
  – Find new phenomena
  – E.g. unknown user behaviors in using systems

Empirical Research Cycle
– Established methodology with a very long tradition in the natural sciences and social sciences
– Cycle:
  1. Form theory/model (e.g. a search engine ranking function)
  2. Hypothesize based on the theory (e.g. more relevant pages are ranked higher than less relevant ones)
  3. Experiment, when possible (e.g. ask people to judge relevance: binary, score, relative, …)
  4. Observe results
  5. Find discrepancies between hypothesized predictions and results
  6. Revise theory (and publish results)
– This course covers especially the stages from hypothesis to discrepancy
– Heavy use of statistics and analytical skills (a bit of art)

Common Practice
– Vague idea
– No preliminary investigation
– No articulation of precise hypothesis
– Bad experimental design
– No iterations

Lots of Ways to Attack Experimentation
– Not general: only applies to the “system/setting under test”
  – E.g. general claims on user behavior that are true only for one system
– Not forward-looking: motivations and observations are based on the past
– Lack of representative comparison
  – Inadequate benchmarks (users are happy with my system…)
  – Difficult/costly to implement comparisons
– Not enabling independent replication of experiments
– Real data can be messy: difficult to choose which data to gather
  – E.g. which aspects of user behavior (speed, satisfaction, success, …)

Experimental Lifecycle (diagram)
Stages shown: vague idea; “groping around” experiences; model/theory; hypothesis; initial observations; experiment; data, analysis, interpretation; results and final presentation.
Annotation: 1. Understand the problem, frame the questions, articulate the goals. A problem well-stated is half-solved.

A Systematic Approach
1. Understand the problem, frame the questions, articulate the goals. A problem well-stated is half-solved.
   – Be able to answer “why” as well as “what”
   – E.g. why do people search? To find a website? To find information?
2. Select metrics that will help answer the questions.
   – E.g. rank of the correct website / percentage of relevant pages in the top 10
3. Identify the parameters that affect behavior.
   – System parameters (e.g. HW config, search speed)
   – Workload parameters (e.g. user request patterns)
   – Data parameters (e.g. long/short documents)
4. Decide which parameters to study (vary in the experiment).
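The two example metrics in step 2 can be made concrete with a short sketch. This is an illustration only, not part of the original slides: the function names, document IDs, and relevance judgments below are all hypothetical.

```python
from typing import Optional, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k results that are relevant.

    Divides by k even when fewer than k results were returned,
    so missing results count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def rank_of_first_relevant(ranked: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first relevant result, or None if there is none."""
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return position
    return None


# Hypothetical ranking and relevance judgments, for illustration only.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4"}
print(precision_at_k(ranked, relevant, k=5))     # 0.4
print(rank_of_first_relevant(ranked, relevant))  # 3
```

Which of the two metrics is appropriate depends on the answer to the “why” question in step 1: rank of the correct website suits a user looking for one site, while percentage of relevant pages suits a user gathering information.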

What can go wrong at this stage?
– Never understanding the problem well enough: can we crisply articulate the goals / hypothesis?
– Having no clear goal, but building an apparatus
– Getting invested in a solution before verifying that a problem exists
– Getting invested in any desired result
– Not being unbiased enough to follow proper methodology
– Fishing expeditions (groping around forever)

Experimental Lifecycle (diagram)
Annotation: 2. Select metrics that will help answer the questions. 3. Identify the parameters that affect behavior.

A Systematic Approach
1. Understand the problem, frame the questions, articulate the goals. A problem well-stated is half-solved.
   – Must remain objective
   – Be able to answer “why” as well as “what”
2. Select metrics that will help answer the questions.
3. Identify the parameters that affect behavior.
   – Those become part of your model, theory

Behavior Parameters/Variables
Example: software performance
– Hardware parameters
  – CPU model and organization, cache organization, latencies in the system (these will affect running time)
– System parameters
  – Memory availability, usage
  – CPU running time (sometimes approximated by wall-clock time)
  – Communication bandwidth, usage
– Program characteristics
  – Requires floating-point, heavy disk usage, integer math, graphics

Additional Behavior Variables
– Algorithm parameters
  – Algorithm choice, correctness/accuracy of results (may compromise)
  – Performance curves (accuracy vs. run-time)
  – Size of input
  – Worst case, best case, average case
– Other
  – Development/QA person-hours (e.g. expected bugs)
  – User (programmer) satisfaction, productivity
  – Lines of code, number of components, ...
– Robotics
  – Speed of movement, accuracy of positioning

Now build a model (theory)
– Mathematically precise
  – Memory = 2*sizeof(input) + 3
  – Runtime = *sizeof(input) + 20
– Asymptotically correct
  – Memory = O(sizeof(input)) in worst case
  – Runtime = O(log(sizeof(input))) in best case
  – Accuracy is proportional to run-time
– Qualitative
  – User performance is increased with reduced cognitive load
  – Number of bugs discovered is monotonically decreasing if the same programmer is used, otherwise it increases
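One way to check a mathematically precise model such as Memory = 2*sizeof(input) + 3 against measurements is to fit a straight line to the data and compare the fitted coefficients with the model's. The sketch below assumes ordinary least squares as the fitting method, which the slides do not prescribe, and the measurements are made up so that they match the example model exactly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up measurements (input size, memory used), for illustration only.
input_sizes = [10, 20, 30, 40]
memory_used = [23, 43, 63, 83]

slope, intercept = fit_line(input_sizes, memory_used)
print(slope, intercept)  # 2.0 3.0

# Residuals show how far each observation is from the fitted model.
residuals = [y - (slope * x + intercept) for x, y in zip(input_sizes, memory_used)]
print(residuals)
```

With real data the residuals would not be zero; large or systematically patterned residuals are the kind of discrepancy that leads to revising the model.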

Now form hypothesis
– Translate qualitative into quantitative
– Use of the new system will (these are different hypotheses):
  – Increase operator accuracy (compared to not using it) by X
  – Decrease failures by Y
  – Decrease performance time by Z
– Introducing link information to the relevance score will increase ranking quality by 10%
– Operationalize the hypothesis
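Operationalizing the ranking hypothesis means fixing a metric, a comparison, and a threshold in advance. The sketch below is one hypothetical reading, not the course's definition: it takes “ranking quality” to be the mean of a per-query score and “increase by 10%” to be a relative change in that mean. The scores are made up.

```python
from statistics import mean


def relative_improvement(baseline_scores, new_scores):
    """Relative change in the mean score; 0.10 means a 10% increase."""
    base = mean(baseline_scores)
    if base == 0:
        raise ValueError("baseline mean is zero; relative change is undefined")
    return (mean(new_scores) - base) / base


def meets_target(baseline_scores, new_scores, target=0.10):
    """True if the observed relative improvement reaches the target."""
    return relative_improvement(baseline_scores, new_scores) >= target


# Made-up per-query quality scores, without and with link information.
baseline = [0.40, 0.50, 0.60]
with_links = [0.50, 0.60, 0.70]
print(f"{relative_improvement(baseline, with_links):.1%}")
print(meets_target(baseline, with_links))
```

Reaching the target on one sample of queries does not by itself support the hypothesis for the whole population of queries; that is the role of hypothesis testing later in the lifecycle.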

What can go wrong at this stage?
– Wrong metrics (they don’t address the questions at hand)
  – E.g. ad click-through rather than purchase
– Bad metrics: too difficult to measure, too costly
– Overlooking significant parameters that affect the system
– Not clear about where the “system under test” boundaries are
  – E.g. poor ad content rather than poor ad matching
– Unrepresentative test setting, not predictive of real usage
  – Just what everyone else uses (adopted blindly)
  – NOT what anyone else uses (no comparison possible)

Experimental Lifecycle (diagram)
Annotation: 1. Decide which parameters to vary. 2. Select technique. 3. Select measurements.

A Systematic Approach
1. Decide which parameters to study (vary).
2. Select measurement technique:
   – Can we directly measure what we want?
   – Intrusive (invasive) versus unobtrusive measurement: how invasive? Can we quantify the interference of monitoring? E.g. should the user mark relevance, or do we just follow clicks?
   – Simulation: how detailed? Validated against what?
   – Benchmarks
   – Repeatability
3. Experiment design:
   – Lesion studies / ablation tests (with and without component)
   – Iron-man (e.g. human performance), straw-man
   – Baseline, ceilings and floors
   – Factorial design
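A full factorial design tests every combination of the chosen factor levels, and an ablation test is the special case of a two-level factor (component on or off). The sketch below enumerates such a design; the factor names and levels are hypothetical, loosely based on the search examples in earlier slides.

```python
from itertools import product


def full_factorial(factors):
    """Return one dict per experimental condition, covering every
    combination of the given factor levels."""
    names = list(factors)
    return [
        dict(zip(names, levels))
        for levels in product(*(factors[name] for name in names))
    ]


# Hypothetical factors and levels, for illustration only.
factors = {
    "link_info": [False, True],  # ablation: without and with the component
    "doc_length": ["short", "long"],
    "query_type": ["navigational", "informational"],
}

design = full_factorial(factors)
print(len(design))  # 8 conditions = 2 * 2 * 2
for condition in design:
    print(condition)
```

The number of conditions is the product of the level counts, so it grows quickly as factors are added. This is one reason the choice of which parameters to vary comes first.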

Experimental Lifecycle (diagram)
Annotation: 1. Run experiments. 2. Analyze and interpret data. 3. Data presentation.

A Systematic Approach
1. Run experiments
   – How many trials?
   – How many combinations of parameter settings? (e.g. users’ age groups)
   – Practically limited
2. Analyze and interpret data
   – Descriptive statistics
   – Dealing with variability, outliers
   – Hypothesis testing: sample vs. population
     – Potentially infinite population (e.g. software runs)
     – Claims on variable values for the population based on sample variables
     – Statistical significance
3. Data presentation
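To illustrate making a claim about a population from samples, the sketch below uses a Monte Carlo permutation test for a difference in means. This is one possible technique chosen for its simplicity, not necessarily the test used in the course. The function name, the number of permutations, and the run times are all assumptions.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up run times (seconds) for two alternative algorithms.
algorithm_a = [10.2, 9.8, 10.5, 10.1, 9.9]
algorithm_b = [12.0, 11.7, 12.3, 11.9, 12.1]
print(permutation_test(algorithm_a, algorithm_b))
```

A small p-value indicates that a difference this large would rarely arise if the two samples came from the same population. It does not say how large or how important the difference is.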

What can go wrong at this stage?
– Not choosing to study the parameters that matter most (factors)
– Choosing the wrong values for parameters you aren’t going to vary
– Not considering the effect of other values (sensitivity analysis)
– Wrong experimental technique
  – E.g. testing the run time of alternative algorithms in Java in the same process, where memory accumulates
– Mistake in data processing (!!!)

What can go wrong at this stage?
– One trial: data from a single run when variation can arise
– Multiple runs: reporting average but not variability
– Tricks of statistics
– No interpretation of what the results mean
– Ignoring errors and outliers
– Over-generalizing conclusions
– Omitting assumptions and limitations of study
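Reporting variability alongside the average can be as simple as the sketch below. It assumes a normal approximation for the confidence interval, which is rough for a handful of runs (a t-distribution would give a wider interval). The function name and the run times are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def summarize_runs(values, confidence=0.95):
    """Mean, sample standard deviation, and an approximate confidence
    interval for the mean (normal approximation)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two runs to estimate variability")
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * s / n ** 0.5
    return {"n": n, "mean": m, "stdev": s, "ci": (m - half_width, m + half_width)}


# Made-up run times (seconds) from five repeated runs.
summary = summarize_runs([10.2, 9.8, 10.5, 10.1, 9.9])
low, high = summary["ci"]
print(f"mean={summary['mean']:.2f} stdev={summary['stdev']:.2f} "
      f"95% CI=({low:.2f}, {high:.2f}) n={summary['n']}")
```

Refusing to summarize a single run is deliberate here: one trial gives no information about variation.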

A Systematic Approach
1. Run experiments
   – How many trials?
   – How many combinations of parameter settings?
   – Sensitivity analysis on other parameter values
2. Analyze and interpret data
   – Statistics, dealing with variability, outliers
3. Data presentation
4. Where does it lead us next?
   – New hypotheses, new questions, a new round of experiments

Experimental Lifecycle (diagram)
Stages shown: vague idea; “groping around” experiences; model/theory; hypothesis; initial observations; experiment; data, analysis, interpretation; results and final presentation.