Replicating Results- Procedures and Pitfalls June 1, 2005.



The JMCB Data Storage and Evaluation Project
Project summary:
- Part 1: In July 1982 the JMCB started requesting programs/data from authors
- Part 2: Attempt replication of published results based on the submissions
Review of results from Part 2: Dewald, Thursby, and Anderson, "Replication in Empirical Economics: The Journal of Money, Credit and Banking Project," The American Economic Review, Sept 1986

The JMCB Data Storage and Evaluation Project / Dewald et al
The paper focuses on Part 2:
- How people responded to the request
- Quality of the data that were submitted
- The actual success (or lack thereof) of replication efforts

The JMCB Data Storage and Evaluation Project / Dewald et al
Three groups:
- Group 1: Papers submitted and published prior to [...]. These authors did not know upon submission that they would subsequently be asked for programs/data.
- Group 2: Authors whose papers were accepted for publication beginning July 1982
- Group 3: Authors whose papers were under review beginning July 1982

Summary of Responses/Datasets Submitted (Dewald et al, p 591)
                                  Group 1    Group 2    Group 3
Requests
Responses: Total
Responses: Percent                   %          %          %
Mean response time in days
Datasets Submitted                22 (35%)   21 (78%)   47 (72%)
Datasets Not Submitted:              40          6         18
  Confidential Data                   2          1          0
  Lost or Destroyed Data             14          2          1
  Data Available, but not Sent        4          2          1
  Nonrespondents                     20          1         16

Summary of Examined Datasets (Dewald et al, p [...])
                                                Group 1   Group 2   Group 3
Total Datasets Submitted
Data Sets Examined
No Problems                                        1         3         4
Problems by type:
  Incomplete Submission                            6         3         5
  Sources Cited Incorrectly                        0         4         4
  Sources Cited Imprecisely                       11         7        10
  Data Transformations Described Incompletely      3         4         1
  Data Element Not Clearly Defined                 2         3         2
  Other                                            0         3         1
  Total                                           22        24        23

“Our findings suggest that inadvertent errors in published empirical articles are a commonplace rather than a rare occurrence.” - Dewald et al, page [...]
“We found that the very process of authors compiling their programs and data for submission reveals to them ambiguities, errors, and oversights which otherwise would be undetected.” - Dewald et al, page 589

Raw data to finished product
Raw data -> Analysis data -> Runs/results -> Finished product

Raw Data -> Analysis Data
- Always keep two distinct data files: the raw data and the analysis data
- A program should completely re-create the analysis data from the raw data
- NO interactive changes!! Final changes must go in a program!!
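A minimal sketch of the rule above, written in Python rather than the Stata/SAS used in the talk. The column names (id, income) and the drop rule are hypothetical; the point is only that the raw file is read, never edited, and the analysis file is rebuilt from scratch on every run.

```python
import csv
from pathlib import Path


def build_analysis_data(raw_path, analysis_path):
    """Re-create the analysis file from the raw file; return the rows written.

    The raw file is opened read-only and is never modified.
    """
    raw_path, analysis_path = Path(raw_path), Path(analysis_path)
    kept = 0
    with raw_path.open(newline="") as src, analysis_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["id", "income"])
        writer.writeheader()
        for row in reader:
            # Hypothetical, documented rule: drop records with missing income.
            if row["income"].strip() == "":
                continue
            writer.writerow({"id": row["id"], "income": float(row["income"])})
            kept += 1
    return kept
```

Because every change lives in the function, deleting the analysis file and re-running the program gives back the same file.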

Raw Data -> Analysis Data
Document all of the following:
- Outliers?
- Errors?
- Missing data?
- Changes to the data?
Remember to check:
- Consistency across variables
- Duplicates
- Individual records, not just summary stats
- “Smell tests”
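The checks listed above could be scripted along these lines. This is an illustrative sketch only: the field names (id, retired, wage) and the consistency rule are assumptions, not taken from the talk.

```python
from collections import Counter


def check_records(records, key="id"):
    """Return the ids that fail each check; empty lists mean the check passed."""
    counts = Counter(r[key] for r in records)
    return {
        # Duplicates: the same id appears more than once.
        "duplicate_ids": sorted(k for k, n in counts.items() if n > 1),
        # Missing data: any field in the record is None.
        "missing_values": sorted(
            {r[key] for r in records if any(v is None for v in r.values())}
        ),
        # Consistency across variables (hypothetical rule):
        # a retired respondent should not report a positive wage.
        "inconsistent": sorted(
            {r[key] for r in records
             if r.get("retired") == 1 and (r.get("wage") or 0) > 0}
        ),
    }
```

The function reports individual record ids rather than counts, in line with the advice to look at individual records and not just summary statistics.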

Analysis Data -> Results
- All results should be produced by a program
- The program should use the analysis data (not the raw data)
- Have a “translation” of raw variable names -> analysis variable names -> publication variable names
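One possible form for the variable-name "translation" is a small lookup table kept alongside the programs. The names below are made up for illustration.

```python
# Hypothetical names, for illustration only: raw -> analysis -> publication.
VARIABLE_MAP = [
    {"raw": "q12a", "analysis": "hh_income", "publication": "Household income"},
    {"raw": "q03", "analysis": "age_years", "publication": "Age (years)"},
]


def translate(name, source="raw", target="publication"):
    """Look up a variable name in one naming scheme and return it in another."""
    for entry in VARIABLE_MAP:
        if entry[source] == name:
            return entry[target]
    raise KeyError(f"{name!r} is not a known {source} variable")
```

For example, translate("q12a") gives the publication label, and translate("age_years", source="analysis", target="raw") traces a published variable back to the raw file.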

Analysis Data -> Results
Document:
- How were variances estimated? Why?
- What algorithms were used and why? Were results robust?
- What starting values were used? Was convergence sensitive?
- Did you perform diagnostics? Include in programs/documentation.

Thinking ahead
- Delete or archive old files as you go
- Use a meaningful directory structure (/raw, /data, /programs, /logfiles, /graphs, etc.)
- Use relative pathnames
- Use meaningful variable names
- Use a script to sequentially run programs
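As a rough sketch of the directory and pathname advice, the layout named on the slide can be created by a small helper. The function name and the two file names are hypothetical.

```python
from pathlib import Path

# Subdirectories named on the slide.
SUBDIRS = ("raw", "data", "programs", "logfiles", "graphs")


def make_project_tree(root):
    """Create the standard subdirectories under root and return their paths."""
    root = Path(root)
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Relative pathnames: programs refer to files relative to the project root,
# so the whole tree can be moved or unzipped elsewhere without edits.
RAW_FILE = Path("raw") / "survey.csv"          # hypothetical file name
ANALYSIS_FILE = Path("data") / "analysis.csv"  # hypothetical file name
```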

Example script to sequentially run programs
    #! /bin/csh
    # File location: /u/machine/username/project/scripts/myproj.csh
    # Author: your name
    # Date: 9/21/04
    # This script runs a do-file in Stata which produces and saves a .dta file
    # in the data directory. Stat-transfer converts the .dta file to .sas7bdat
    # and saves the file in the data folder. The program analyze.sas is run on
    # the new SAS data file.
    cd /u/machine/username/project/
    # Base name of the converted file. The original slide uses $file without
    # setting it; the value below is an assumption.
    set file = H00x_B
    stata -b do programs/cleandata.do
    st data/H00x_B.dta data/$file.sas7bdat
    sas programs/analyze.sas
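The same idea can be sketched in Python with a runner that stops at the first failing step, so later programs never run on stale output. The helper name run_pipeline is hypothetical. The two commands in STEPS are copied from the script above and need Stata and SAS installed; the Stat/Transfer step is left out of this sketch.

```python
import subprocess


def run_pipeline(steps, cwd="."):
    """Run each command in order from cwd; stop at the first failure.

    check=True makes subprocess.run raise CalledProcessError when a
    command exits with a non-zero status.
    """
    completed = []
    for cmd in steps:
        subprocess.run(cmd, cwd=cwd, check=True)
        completed.append(cmd)
    return completed


# Commands taken from the example script; paths are relative to the project root.
STEPS = [
    ["stata", "-b", "do", "programs/cleandata.do"],
    ["sas", "programs/analyze.sas"],
]
```

Calling run_pipeline(STEPS, cwd=project_root) from the project directory would then play the role of the csh script.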

Log files
- Your log file should tell a story to the reader.
- As you print results to the log file, include words explaining the results.
- Don't output everything to the log file; use quietly and noisily in a meaningful way.
- Include not only what your code is doing, but also your reasoning and thought process.
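Stata's quietly and noisily have no direct Python equivalent. As a loose analogue, this sketch uses the standard logging module with two levels: INFO for the narrative the reader should always see, DEBUG for detail that is normally suppressed. The logger name and the messages in the usage note are invented.

```python
import logging


def make_logger(stream, verbose=False):
    """Return a logger that writes plain messages to stream.

    INFO messages carry the story; DEBUG messages appear only when
    verbose is True.
    """
    logger = logging.getLogger("replication_example")
    logger.handlers.clear()      # avoid duplicate output on repeated calls
    logger.propagate = False
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger
```

A call such as log.info("Dropped 3 of 120 records: income was missing (survey refusals).") records the reasoning along with the action, while log.debug(...) can hold the record-level detail.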

Project Clean-up
- Create a zip file that contains everything necessary for complete replication
- Delete/archive unused or old files
- Include any referenced files in the zip
- When you have a final zip archive containing everything:
  - Open it in its own directory and run the script
  - Check that all the results match
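The clean-up steps above might be sketched as follows, using only the standard library. The function names are hypothetical, and comparing result files by SHA-256 hash is one possible way to "check that all the results match"; it assumes the programs write byte-identical output on a re-run.

```python
import hashlib
import zipfile
from pathlib import Path


def build_archive(project_dir, zip_path):
    """Zip every file under project_dir; return the archived relative names.

    Keep zip_path outside project_dir so the archive does not include itself.
    """
    project_dir = Path(project_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(project_dir).as_posix()
                zf.write(path, rel)
                names.append(rel)
    return names


def unpack_fresh(zip_path, dest):
    """Extract the archive into a directory that must not already exist."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=False)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest


def results_match(original, rerun):
    """Compare two result files byte for byte via their SHA-256 hashes."""
    digest = lambda p: hashlib.sha256(Path(p).read_bytes()).hexdigest()
    return digest(original) == digest(rerun)
```

After unpacking, the run script would be executed inside the fresh directory and each regenerated result file compared with the original using results_match.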

When there are data restrictions…
Consider releasing:
- the subset of the raw data used
- your analysis data as opposed to raw data
- (at a minimum) notes on the process from raw to analysis data
PLUS everything pertaining to the data analysis
Consider “internal” and “external” versions of your log file:
- Do this via a variable at the top of your log files:
    local internal = 1
    …
    list if `internal' == 1
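The internal/external switch shown in Stata above can be mirrored in Python with a single flag. This is a sketch with a hypothetical helper: summary lines always go to the log, record-level listings only when the flag is set.

```python
def log_listing(rows, internal):
    """Return log lines; record-level rows appear only in the internal version."""
    lines = [f"N = {len(rows)}"]         # safe to release: summary only
    if internal:
        lines += [str(r) for r in rows]  # restricted: individual records
    return lines
```

Setting the flag once at the top of a program, as the slide suggests, keeps the two versions of the log from drifting apart.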

Ethical Issues
- All authors are responsible for proper clean-up of the project
- Extremely important whether or not you plan on releasing data and programs
- Motivation:
  - self-interest
  - honest research
  - the scientific method
  - allowing others to be critical of your methods/results
  - furthering your field

Ethical Issues – for discussion
- What if third-party redistribution of data is not allowed?
- Solutions for releasing data while protecting your time investment in data collection
- Is it unfair to ask people to release data after a huge time investment in the collection?