Published by Primrose Hubbard. Modified over 9 years ago.
1
Make the World a Better Place through Reproducible Research
Roger D. Peng
Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Wall of Wonder 2006-05-12
2
Trends in Scientific Research
Signal-to-noise in many investigations is getting smaller
Smaller relative risks
–e.g. relative risk of mortality is 1.005 per 10 ppb of ozone
High-throughput measurement technologies
Powerful computers
3
Trends in Computing: Then...
4
...And Now
5
The Result?
Large databases for investigating subtle associations
Interactive computing with advanced statistical algorithms
Sophisticated searches across models and variables to identify important risks
Bigger and better studies
6
Replication: The Standard
Scientific evidence is strengthened when important findings are replicated by independent investigators, data, methods, laboratories, instruments, etc.
Replication is often not possible because of time and funding constraints
Policy decisions must often be made with the evidence at hand
7
Reproducible Research: A Minimum Standard
Published research where the following are made available:
–Analytic data
–Computer code implementing methods
–Documentation about code/data
All are distributed using standard means
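As a rough illustration of the minimum standard above, the three components could be checked mechanically before a research bundle is distributed. This is only a sketch: the directory layout (`data`, `code`, `README.txt`) and the function names are hypothetical and do not come from the talk.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: the talk names the three components but not
# how they are stored, so these paths are illustrative assumptions.
REQUIRED_COMPONENTS = {
    "analytic data": "data",
    "computer code": "code",
    "documentation": "README.txt",
}


def missing_components(bundle_dir):
    """Return the names of required components absent from bundle_dir."""
    root = Path(bundle_dir)
    return [
        name
        for name, relative_path in REQUIRED_COMPONENTS.items()
        if not (root / relative_path).exists()
    ]


def demo():
    """Build a throwaway bundle that lacks documentation and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data").mkdir()
        (root / "code").mkdir()
        return missing_components(root)


if __name__ == "__main__":
    print(demo())
```

A bundle would pass this check only when `missing_components` returns an empty list.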
8
Benefits of Reproducible Research
Published findings can be verified
Alternative analyses can be conducted
Uninformed criticisms can be challenged (“put up or shut up”)
Exchange of ideas among investigators is expedited
9
Challenges to RR
“If I give away my data, others will publish results and scoop me”
“I own my data and ideas, other people don’t necessarily have any rights to them”
Why should I just give away my intellectual property?
10
Ya see, it’s what I call the “ownership society”
11
Property
[Automatt] [JRodrigues] [james.thompson] [nervsappy]
12
“Intellectual Property”
“the intangible value created by human creativity and invention”
–from JHSPH Office of Technology Transfer (emphasis added)
How can something that is intangible be property?
13
There’s No Such Thing as “Intellectual Property”
If I copy your book, you still have your book
If I use your idea, you still have your idea
If I copy your data, you still have your data
If I use your statistical model, you still have your statistical model
If I implement your algorithm, you still have your algorithm
etc.
14
What are the Potential Gains and Losses from Sharing Data?
[Figure: research output under two scenarios, “Don’t share” and “Share”, with outcomes labeled Y(0) and Y(1). Segment labels: research done by you regardless of sharing; research done by others; research you would have done if you hadn’t shared data.]
D = Y(1) - Y(0)
(a) D = 0
(b) D < 0
(c) D > 0
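The comparison on this slide can be written out as a small calculation. The sketch below assumes Y(1) is the total research output when the data are shared and Y(0) the total when they are not; the function names and the example numbers are hypothetical and are not taken from the talk.

```python
def sharing_difference(y_share, y_no_share):
    """Return D = Y(1) - Y(0).

    Assumption: y_share is the research output when data are shared, Y(1),
    and y_no_share is the output when they are not, Y(0).
    """
    return y_share - y_no_share


def classify(d):
    """Map D onto the three cases listed on the slide."""
    if d == 0:
        return "(a) D = 0"
    if d < 0:
        return "(b) D < 0"
    return "(c) D > 0"


if __name__ == "__main__":
    # Illustrative numbers only: output measured in some arbitrary unit.
    d = sharing_difference(y_share=12, y_no_share=10)
    print(d, classify(d))
```

Under this reading, case (c) is the one in which sharing yields more research overall than withholding the data.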
15
What is a Dataset?
Represents already published findings and ideas
Contains potential findings and ideas yet to be discovered and exploited
Datasets do not fit well into the framework of copyrights and patents
16
What Do We Need?
Infrastructure
–Tools for researchers, developers
–Repositories for datasets
–Rights framework for datasets
Privacy preservation
Handle computer language Babel
Structured research modularity
17
“WWKD”
18
“WWKD”
What Would Karl Do?
19
Models for Reproducibility
25
An Example
http://www.biostat.jhsph.edu/MCAPS/
26
Partial Rights for Data? A First Cut
Full access: the data can be used for any purpose
Attribution: the data can be used for any purpose with a specific citation
Share-alike: the data can be used for any purpose but any “improvements” must be made available under the same license
Reproduction-only: the data can only be used for reproducing published results and commenting via a letter to the editor
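One way to make the four tiers concrete is to encode them as a small data structure. This is a sketch under stated assumptions: the class name, the purpose strings (`"reproduce"`, `"new_analysis"`), and the wording of the conditions are hypothetical paraphrases of the slide, not part of any existing license framework.

```python
from enum import Enum


class DataLicense(Enum):
    """The four tiers listed on the slide."""
    FULL_ACCESS = "full access"
    ATTRIBUTION = "attribution"
    SHARE_ALIKE = "share-alike"
    REPRODUCTION_ONLY = "reproduction-only"


# Conditions attached to each tier, paraphrased from the slide.
CONDITIONS = {
    DataLicense.FULL_ACCESS: [],
    DataLicense.ATTRIBUTION: ["include the specified citation"],
    DataLicense.SHARE_ALIKE: ["release improvements under the same license"],
    DataLicense.REPRODUCTION_ONLY: [
        "use only to reproduce published results",
        "comment only via a letter to the editor",
    ],
}


def is_permitted(data_license, purpose):
    """Return True if the (hypothetical) purpose is allowed under the tier.

    Only the reproduction-only tier restricts the purpose; the other three
    allow any purpose, subject to their conditions.
    """
    if data_license is DataLicense.REPRODUCTION_ONLY:
        return purpose == "reproduce"
    return True


if __name__ == "__main__":
    for tier in DataLicense:
        print(tier.value, is_permitted(tier, "new_analysis"), CONDITIONS[tier])
```

Keeping purpose checks separate from conditions mirrors the slide: three tiers differ only in what the user owes in return, while the last one limits what the data may be used for at all.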
27
Thank you!