Experimental Evaluation in Computer Science: A Quantitative Study Paul Lukowicz, Ernst A. Heinz, Lutz Prechelt and Walter F. Tichy Journal of Systems and.

Slides:



Advertisements
Similar presentations
Critical Reading Strategies: Overview of Research Process
Advertisements

On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach Author: Steven L. Salzberg Presented by: Zheng Liu.
CSE594 Fall 2009 Jennifer Wong Oct. 14, 2009
1 A Systematic Review of Cross- vs. Within-Company Cost Estimation Studies Barbara Kitchenham Emilia Mendes Guilherme Travassos.
Quantitative vs. Qualitative Research Method Issues Marian Ford Erin Gonzales November 2, 2010.
Welcome to Survey of Research Methods and Statistics! PS 510.
Dissemination and Critical Evaluation of Published Research Peg Bottjen, MPA, MT(ASCP)SC.
Experimental Evaluation in Computer Science: A Quantitative Study Paul Lukowicz, Ernst A. Heinz, Lutz Prechelt and Walter F. Tichy Journal of Systems and.
How to Read a CS Research Paper? Philip W. L. Fong.
Aims Correlation between ISI citation counts and either Google Scholar or Google Web/URL citation counts for articles in OA journals in eight disciplines.
Introduction to Communication Research
Jukka-Pekka Suomela 2014 Ethics and quality in research and publishing.
University of Florida Mechanical and Aerospace Engineering 1 Useful Tips for Presenting Data and Measurement Uncertainty Analysis Ben Smarslok.
1 CS 178H Introduction to Computer Science Research What is CS Research?
Section 2: Science as a Process
Research Methods Key Points What is empirical research? What is the scientific method? How do psychologists conduct research? What are some important.
© Copyright McGraw-Hill CHAPTER 1 The Nature of Probability and Statistics.
The Research on Credibility of Knowledge Management System Wang FanLin Department of Accounting Capital University of Economic Business Beijing, China.
Program Evaluation. Program evaluation Methodological techniques of the social sciences social policy public welfare administration.
A Taxonomy of Evaluation Approaches in Software Engineering A. Chatzigeorgiou, T. Chaikalis, G. Paschalidou, N. Vesyropoulos, C. K. Georgiadis, E. Stiakakis.
Research in Computing สมชาย ประสิทธิ์จูตระกูล. Success Factors in Computing Research Research Computing Knowledge Scientific MethodAnalytical Skill Funding.
Evaluation of software engineering. Software engineering research : Research in SE aims to achieve two main goals: 1) To increase the knowledge about.
Class Starter Please list the first five words or phrases that come to your mind when you hear the word : CHEMISTRY.
Intelligent Database Systems Lab Presenter : NENG-KAI, HONG Authors : CÉSAR DOMÍNGUEZ, ARTURO JAIME 2014, CE Database design learning: A project-based.
Chapter 2 Section 1. Objectives Be able to define: science, scientific method, system, research, hypothesis, experiment, analysis, model, theory, variable,
The student will demonstrate an understanding of how scientific inquiry and technological design, including mathematical analysis, can be used appropriately.
The Scientific Method in Psychology.  Descriptive Studies: naturalistic observations; case studies. Individuals observed in their environment.  Correlational.
1 Literature review. 2 When you may write a literature review As an assignment For a report or thesis (e.g. for senior project) As a graduate student.
Scientific Method for a controlled experiment. Observation Previous data Previous results Previous conclusions.
Assessing the Frequency of Empirical Evaluation in Software Modeling Research Workshop on Experiences and Empirical Studies in Software Modelling (EESSMod)
Experimentation in Computer Science (Part 1). Outline  Empirical Strategies  Measurement  Experiment Process.
Experiments in computer science Emmanuel Jeannot INRIA – LORIA Aleae Kick-off meeting April 1st 2009.
1 CS 391L: Machine Learning: Experimental Evaluation Raymond J. Mooney University of Texas at Austin.
How to Read Research Papers? Xiao Qin Department of Computer Science and Software Engineering Auburn University
What is Computer Science?  Three paradigms (CACM 1/89) Theory (math): definitions, theorems, proofs, interpretations Abstraction (science): hypothesize,
ICOM 6115: Computer Systems Performance Measurement and Evaluation August 11, 2006.
Research Methods in Computational Informatics IST 501 Fall 2014 Dongwon Lee, Ph.D.
CS529 Multimedia Networking Experiments in Computer Science.
LITERATURE REVIEW  A GENERAL GUIDE  MAIN SOURCE  HART, C. (1998), DOING A LITERATURE REVIEW: RELEASING THE SOCIAL SCIENCE RESEARCH IMAGINATION.
The Nature of Science p. 33 of Worksheet Packet Fill in the blanks, please.
MODES-650 Advanced System Simulation Presented by Olgun Karademirci VERIFICATION AND VALIDATION OF SIMULATION MODELS.
HiST Tacitly developed research methods in ICS Ivar Tormod Berg Ørstavik.
Review of the Scientific Method Chapter 1. Scientific Method – –Organized, logical approach to scientific research. Not a list of rules, but a general.
Introduction to Research Methods PSY 101. How to read a journal reference Courtney, K.E. & Polich, J. (2009). Binge drinking in young adults: Data, definitions,
Quantitative Wildlife Ecology Thinking Quantitatively Fear of mathematics Uncertainty & the art Sampling, experimental design, & analyses Presentation.
Experimentation in Computer Science (Part 2). Experimentation in Software Engineering --- Outline  Empirical Strategies  Measurement  Experiment Process.
Basics and Principles of Scientific Research By Ass. Prof. Dr. Majid S. Naghmash Diglah University College Department of Computer Engineering Techniques.
The Process of Conducting Research. What is a theory? a set of general principles that explains the how and why of phenomena. Theories are not directly.
Lecture №4 METHODS OF RESEARCH. Method (Greek. methodos) - way of knowledge, the study of natural phenomena and social life. It is also a set of methods.
Research Methodology II Term review. Theoretical framework  What is meant by a theory? It is a set of interrelated constructs, definitions and propositions.
Scientific Methodology Vodcast 1.1 Unit 1: Introduction to Biology.
Data Analysis. Warm Up: Finish up the notes on Health and Stress. You will have 20 minutes. StandardObjective IA-3.1 Describe the elements of an experiment.
??? Steps The Are What  1. OBSERVATION (or problem): Develop a question based on the observation/problem  2. GATHER INFORMATION: You need to get educated.
D10A Metode Penelitian MP-04b Metodologi Penelitian di dalam Ilmu Komputer/Informatika Program Studi S-1 Teknik Informatika FMIPA Universitas.
HiST Tacitly developed research methods in ICS Ivar Tormod Berg Ørstavik.
BPK 304W Scientific Method
Chapter 1: Introduction to Econometrics
Scientific Methodology
CSC 682: Advanced Computer Security
Software Engineering Experimentation
How to Read Research Papers?
Quantitative vs. Qualitative Research Method Issues
Scientific Method and Graphing
สมชาย ประสิทธิ์จูตระกูล
Copyright Cmassengale
A Template for Producing IT Research and Publication
Financial Econometrics Fin. 505
CS 791Graduate Topics in Computer Science [Software Engineering]
Presentation transcript:

Experimental Evaluation in Computer Science: A Quantitative Study Paul Lukowicz, Ernst A. Heinz, Lutz Prechelt and Walter F. Tichy Journal of Systems and Software January 1995

Outline Motivation Related Work Methodology Observations Accuracy Conclusions Future work!

Introduction Large part of CS research new designs –systems, algorithms, models Objective study needs experiments Hypothesis –Experimental study often neglected in CS If accepted, CS inferior to natural sciences, engineering and applied math Paper ‘scientifically’ tests hypothesis

Related Work 1979 surveys say experiments lacking –1994 say experimental CS under funded 1980, Denning defines experimental CS –“ Measuring an apparatus in order to test a hypothesis ” –“If we do not live up to traditional science standards, no one will take us seriously” Articles on role of experiments in various CS disciplines 1990 experimental CS seen as growing, but 1994 –“Falls short of science on all levels” No systematic attempt to assess research

Methodology Select Papers Classify Results Analysis Dissemination (this paper)

Select CS Papers Sample broad set of CS publications (200 papers) –ACM Transactions on Computer Systems (TOCS), volumes 9-11 –ACM Transactions on Programming Languages and Systems (TOPLAS), volumes –IEEE Transactions on Software Engineering (TSE), volume 19 –Proceedings of 1993 Conference on Programming Language Design and Implementation Random Sample (50 papers) –74 titles by ACM via INSPEC (24 discarded) +30 refereed

Select Comparison Papers Neural Computing (72 papers) –Neural Computation, volume 5 –Interdsciplinary: bio, CS, math, medicine … –Neural networks, neural modeling … –Young field (1990) and CS overlap Optical Engineering (75 papers) –Optical Engineering, volume 33, no 1 and 3 –Applied optics, opto-mech, image proc. –Contributors from: ee, astronomy, optics… –Applied, like CS, but longer history

Classify Same person read most Two read all, save NC

Major Categories Formal Theory –Formally tractable: theorem’s and proofs Design and Modeling –Systems, techniques, models –Cannot be formally proven  require experiments Empirical Work –Analyze performance of known objects Hypothesis Testing –Describe hypotheses and test Other –Ex: surveys

Subclasses of Design and Modeling Amount of physical space for experiments –Setups, Results, Analysis 0-10%, 11-20%, 21-50%, 51%+ To shallow? Assumptions: –Amount of space proportional to importance by authors and reviewers –Amount of space correlated to importance to research Also, concerned with those that had no experimental evaluation at all

Assessing Experimental Evaluation Look for execution of apparatus, techniques or methods, models validated Tables, graphs, section headings… No assessment of quality But count only ‘true’ experimental work –Repeatable –Objective (ex: benchmark) No demonstrations, no examples Some simulations –Supplies data for other experiments –Trace driven

Outline Motivation Related Work Methodology Observations Accuracy Conclusions Future work!

Observation of Major Categories Majority is design and modeling The CS samples have lower percentage of empirical work than OE and NC Hypothesis testing is rare (4 articles out of 403!)

Observation of Major Categories Combine hypothesis testing with empirical

Observation of Design Sub-Classes Higher percentage with no evaluation for CS vs. NC+OE (43% vs. 14%)

Observation of Design Sub-Classes Many more NC+OE with 20%+ than in CS Software engineering (TSE and TOPLAS) worse than random

Observation of Design Sub-Classes Shows percentage that have 20%+ or more to experimental evaluation

Groupwork: How Experimental is WPI CS? Take 2 papers: KDDRG, PEDS, SERG, DSRG, AIDG, GTRG Read abstract, flip through Categorize: –Formal Theory –Design and Modelling +Count pages for experiments –Empirical –Hypothesis Testing –Other Swap with another group

Outline Motivation Related Work Methodology Observations Accuracy Conclusions Future work

Accuracy of Study Deals with humans, so subjective Psychology techniques to get objective measure –Large number of users  Beyond resources (and a lot of work!) –Provide papers, so other can provide data Systematic errors –Classification errors –Paper selection bias

Systematic Error: Classification Classification differences between 468 article classification pairs

Systematic Error: Classification Classification ambiguity –Large between Theory and Design-0% (26%) –Design-0% and Other (10%) –Design-0% with simulations (20%) Counting inaccuracy –15% from counting experiment space differently

Systematic Error: Paper Selection Journals may not be representative of CS –PLDI proceedings is a ‘case study’ of conferences Random sample may not be “random” –Influenced by INSPEC database holdings –Further influenced by library holdings Statistical error if selection within journals do not represent journals

Overall Accuracy (Maximize Distortion) No Experimental Evaluation 20%+ Space for Experiments

Conclusion 40% of CS design articles lack experiments –Non-CS around 10% 70% of CS have less than 20% space –NC and OE around 40% CS conferences no worse than journals! Youth of CS is not to blame Experiment difficulty not to blame –Harder in physics –Psychology methods can help Field as a whole neglects importance

Guidelines Higher standards for design papers Recognize empirical as first class science Need more publicly available benchmarks Need rules for how to conduct repeatable experiments Tenure committees and funding orgs need to recognize work involved in experimental CS Look in the mirror