
Documenting Research Project Process for Reproducibility. Larry Hoyle, Institute for Policy & Social Research, University of Kansas. 10/22/2012, Dagstuhl Presentation.


1 Documenting Research Project Process for Reproducibility
Larry Hoyle, Institute for Policy & Social Research, University of Kansas
10/22/2012 Dagstuhl Presentation 2012 - Larry Hoyle

2 The challenges
Large (or complex) multi-disciplinary projects
– Multiple sites, data streams, standards, and practices
– Complex data preparation procedures
Point and click software used
Documenting as overhead

3 Example Project
Farmers' land use decisions related to climate change (e.g. biofuel-related crops)
One component of a larger NSF grant
Multiple teams, multiple universities
– The two main sites are 135 km apart
Multi-disciplinary
– Economists, geographers, agronomists, biologists, engineers, climate scientists, anthropologist, sociologist, political scientists, urban planner, GIS experts, photographer

4 Example Project
Data
– Develop substantial geodatabase (ArcSDE): ground cover, soils, crop statistics, facility locations (e.g. purchaser, processing plant). Weather, climate, watershed and aquifer models. Sub-(farmer’s) field geographic level
– Climate models at different scales
– Focus groups and multi-wave survey (geocoded)
– Interviews coded in NVivo (geocoded)
– Photographs
– Large proprietary dataset with time-limited use
Challenge: put it all together and document how it was done and how everything relates.
Other example: IASSIST posting

5 Spatial Aspects
Reconciling different spatial schemes at multiple scales across time
– Raster images
– Model grids at different scales
– Weather point sources, other point locations (e.g. biorefineries)
– Political entity polygons (state, county)
– Farm field and sub-field polygons
– Attribute data at all these levels, imputed and aggregated data
Harmonizing data from different geographic schemes
Producing new spatial objects
– E.g. corners as separate from the circle with center-pivot irrigation

6 New Polygons
Polygons to be extracted from remote sensing imagery
Subfield areas sometimes grow different crops (corners are 21% of the square)
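
The 21% figure is consistent with simple geometry, assuming the center-pivot circle is inscribed in a square field (an assumption; the slide states only the percentage). A square of side 2r has area 4r² and the circle has area πr², so the corners make up 1 − π/4 ≈ 0.2146 of the square. A minimal check in Python:

```python
import math


def corner_fraction() -> float:
    """Fraction of a square field lying outside an inscribed circle.

    Assumption (not stated on the slide): the irrigated circle touches
    all four sides of the square, so the ratio is independent of size.
    """
    # circle area / square area = (pi * r**2) / (2 * r)**2 = pi / 4
    return 1.0 - math.pi / 4.0


# Rounded to a whole percent this matches the figure on the slide.
print(f"corners: {corner_fraction():.0%} of the square")
```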

7 Need to Capture Process: Example 1
A project member with expertise volunteered to process data to produce a spatial dataset (soils data).
Users of the dataset discover anomalies.
The expert is no longer available, can’t remember quite what he did, and has no documentation (used point and click tools).
Ouch

8 Process Example 2
Qualitative analysis
– Transcription
– Multiple coders, common coding scheme
– Coding scheme evolves (capture this?)
– Training
– Paired coders code each interview
– Testing of coder reliability
Integrate this after the fact with geodatabase
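
The slide does not say which reliability statistic the project used. As an illustration only, one common choice for a pair of coders is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The function name and the example codes below are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who coded the same segments.

    Illustrative sketch; not necessarily the measure used in the project.
    """
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must code the same non-empty set of segments")
    n = len(coder_a)
    # Share of segments on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes assigned to four interview segments by a coder pair.
pair_a = ["risk", "risk", "water", "water"]
pair_b = ["risk", "water", "water", "water"]
print(cohens_kappa(pair_a, pair_b))
```

Because the coding scheme evolves, a record of which version of the scheme each score was computed under would be needed to interpret the scores later.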

9 Point and Click
Some tools are only point and click and don’t create a log.
– E.g. some procedures in ArcGIS
How do you document process?
– Screen capture pasted into Word?
– Action recording software
– Discoverable? Machine actionable?

10 An ArcGIS process (different project)
NSFCHEMAnnualDataProcedure.docx
AnnualLinksByTime4.avi

11 Need Tools
There is a need for tools built on top of standards that make it easy to capture and annotate process.

12 Need Tools to Capture Process
One example – SAS Enterprise Guide
Can modify nodes during development.
Can run the process from any point.
But the overall process may involve multiple tools, in this case also R and ArcGIS. In other cases, multiple people in different settings.
Scott Long - The Workflow of Data Analysis Using Stata http://www.indiana.edu/~jslsoc/web_workflow/wf_home.htm
Datasets – Permanent and temporary

13 Capturing Process as it is Being Developed
False starts and blind alleys
– Does the whole process matter or only a process that reproduces the final result? (learn from my mistakes?)
– Description of process gets edited as it evolves
Adding minimal overhead
– If the tool requires a lot of attention it won’t get used.
Combining sub-processes
Filling in pieces of overall planned project
Parallel parts
Time as ordinal or interval (or ratio?)

14 Tools – The Fantasy
Annotated screen capture
– Works on top of any software
– Text (or audio/video?) annotation
– Dealing with IP in captured images
– Flow diagram with popups?
– Editable
– Time stamped
Sub process edited separately
Planned overall process
Persistent identifiers allow (re-)linking
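
Several items on this wish list (time stamps, text annotation, separately edited sub-processes, persistent identifiers for re-linking) can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `Step`, `ProcessLog`, and `parent_id`, and the example ArcGIS step, are hypothetical and do not come from the presentation or from any existing tool.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Step:
    """One captured action in a research process (hypothetical schema)."""
    tool: str                        # e.g. "ArcGIS", "R", "SAS Enterprise Guide"
    action: str                      # what was done
    annotation: str = ""             # free-text note added by the researcher
    parent_id: Optional[str] = None  # links a sub-process step to the planned step
    # Persistent identifier, so steps can be (re-)linked after editing.
    step_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Time stamp in UTC, ISO 8601.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class ProcessLog:
    steps: List[Step] = field(default_factory=list)

    def record(self, tool: str, action: str, annotation: str = "",
               parent_id: Optional[str] = None) -> Step:
        step = Step(tool, action, annotation, parent_id)
        self.steps.append(step)
        return step

    def children(self, step_id: str) -> List[Step]:
        """Steps recorded as part of the given (sub-)process."""
        return [s for s in self.steps if s.parent_id == step_id]

    def to_json(self) -> str:
        """Machine-actionable export of the whole log."""
        return json.dumps([asdict(s) for s in self.steps], indent=2)


# Hypothetical usage: a planned overall step with one captured sub-step.
log = ProcessLog()
overall = log.record("plan", "Prepare soils dataset")
clip = log.record("ArcGIS", "Clip soils layer to study area",
                  annotation="Done by point and click; screenshot kept separately",
                  parent_id=overall.step_id)
print(log.to_json())
```

Keeping the identifier separate from the position in the list is what lets a sub-process be edited on its own and still link back to the planned overall process.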

15 Final thoughts
Metadata for the audience
– Documentation for reproducibility
– Documentation in cases of disputed results
Sometimes the researcher is the audience
– One researcher commented that having documentation at this level would be very helpful in writing methods sections of papers.
– Teaching tool: critique students' process
– Assists refining methods
– Also useful in future similar projects

