Methods Reproducibility & Results Reproducibility

1 Methods Reproducibility & Results Reproducibility
Courtney Soderberg Jennifer Freeman Smith

2 Recap What content was covered during the first two days of the course?

3 What are different forms of reproducibility?
Computational reproducibility: If we took your data and code/analysis scripts and reran them, we could reproduce the numbers and graphs in your paper.
Methods reproducibility: We have enough information to rerun the experiment or survey the way it was originally conducted.
Methods reproducibility refers to the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated. Operationally, this can mean different things in different sciences.
Results reproducibility (previously described as replicability) refers to obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible.
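To make the computational definition concrete, here is a minimal Python sketch of rerunning an analysis and checking that the numbers match. The data values, the seed, and the helper names `run_analysis` and `fingerprint` are assumptions made up for illustration; they are not part of the course materials.

```python
import hashlib
import json
import random
import statistics


def run_analysis(data, seed=2017):
    """Toy analysis: a mean plus a bootstrap standard error, with a fixed seed."""
    rng = random.Random(seed)  # local seeded generator, so reruns draw the same resamples
    boot_means = []
    for _ in range(200):
        resample = [rng.choice(data) for _ in data]
        boot_means.append(statistics.mean(resample))
    return {
        "mean": round(statistics.mean(data), 6),
        "bootstrap_se": round(statistics.stdev(boot_means), 6),
    }


def fingerprint(result):
    """Hash the results so two runs can be compared at a glance."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]  # made-up values for illustration only
first_run = run_analysis(data)
second_run = run_analysis(data)
print(fingerprint(first_run) == fingerprint(second_run))
```

Without the fixed seed, the bootstrap standard error would differ slightly on every run, which is one common reason reported numbers cannot be reproduced exactly from shared code and data.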

4 What do cupcakes have to do with reproducibility?

5 Methods Reproducibility
INGREDIENTS
3⅓ c. cake flour (not self-rising)
¾ c. (12 Tbl.) unsalted butter, room temperature
2¼ c. sugar
3 large eggs, room temperature
2 Tbl. liquid red food coloring
3 Tbl. unsweetened cocoa powder
1½ tsp. vanilla extract
1½ tsp. salt
1½ c. buttermilk
1½ tsp. white vinegar
1½ tsp. baking soda
Cream Cheese Frosting:
1 c. (2 sticks) unsalted butter, room temperature
1 (8 oz.) pkg. cream cheese, room temperature
¼ tsp. salt
2 tsp. vanilla extract
4½ c. powdered sugar
1 Tbl. milk, plus more if needed
INSTRUCTIONS
Preheat oven to 350 degrees. Line a muffin tin with paper liners and spray with cooking spray; set aside.
In the bowl of a stand mixer, combine butter and sugar and mix on medium speed until very light and fluffy, about 5 minutes.
Add the eggs, one at a time, beating well after each addition.
In a small bowl, whisk the food coloring, cocoa powder and vanilla together; add to the butter/sugar mixture and mix well.
Stir the salt into the buttermilk and add to the batter in three parts, alternating with the cake flour, starting and ending with flour.
In a small bowl, stir together the vinegar and baking soda; add to the batter and mix well.
Fill cupcake liners ⅔ full with batter and bake in preheated oven for minutes or until a toothpick inserted in the center comes out clean. Do not over bake. Repeat with remaining cupcakes.
Cool completely and top with Cream Cheese Frosting. Makes about 3½ dozen cupcakes.
For the Cream Cheese Frosting: In a large bowl, mix together butter, cream cheese, salt and vanilla until smooth. Add powdered sugar, one cup at a time, beating well after each addition. If frosting is too thick, add a little milk. If you are planning to pipe the frosting onto the cupcakes, you want it thick enough to hold its shape. This makes enough frosting to pipe a big swirl on the top of each cupcake. I used a Wilton 1M piping tip.
*If you do not want to pipe a large swirl onto each cupcake and prefer to spread a small layer on top instead, the amount of frosting can be cut in half.
With the recipe and instructions, we should be able to reproduce the cupcakes pictured in the cookbook. In practice this may not be easy, because factors that go beyond the recipe instructions can affect the result (e.g., gas vs. electric oven, two racks of cupcakes baking vs. a single rack, brand of flour or cocoa powder).

6 What are methods?

7 Methods reproducibility differs across disciplines
Biomedical sciences: detailed study protocol, measurement procedures, data, descriptive metadata, analysis software and code, final analytical results
Laboratory science: how key reagents and biological materials were created or obtained (level of detail is key and contested)
Social sciences: exact survey questions, sampling plans, computer programs that ran experiments, scripts for confederates, etc.
EXPAND: Examples of what might be considered methods in different disciplines: protocol, reagents, recipe, etc.
“In the biomedical sciences, this means, at minimum, a detailed study protocol, a description of measurement procedures, the data gathered, the data used for analysis with descriptive metadata, the analysis software and code, and the final analytical results. In laboratory science, how key reagents and biological materials were created or obtained can be critical. In theory, these requirements are clear, but in practice, the level of procedural detail needed to describe a study as ‘methodologically reproducible’ does not have consensus. For example, the detection of batch effects, which have been responsible for a number of high-visibility claims and retractions, can require information on exactly which samples were tested on which machine in what order and on what day, together with calibration data. This level of detail is typically not provided in publications and is not always retained by the investigator. In the clinical sciences, the definition of which data need to be examined to ensure reproducibility can be contentious. The relevant data could be anywhere along the continuum from the initial raw measurement (such as a pathology slide or image), to the interpretation of those data (the pathologic diagnosis), to the coded data in the computer analytic file. Many judgments and choices are made along this path and in the processes of data cleaning and transformation that can be critical in determining analytical results.
Last, even if there is consensus on the appropriate analytical data set, methodologic reproducibility requires an understanding of which and how many analyses were performed and how the particular analyses reported in a published paper were chosen. So, whether a particular study is to be considered methodologically reproducible is contingent on whether there is general agreement about the level of detail needed in the description of the measurement process, the degree of processing of the raw data, and the completeness of the analytic reporting.”
From Goodman et al., “What does research reproducibility mean?”

8 Resources exist to enable researchers to share study protocols, but the level of detail provided is key to reproducibility / replicability

9 Example Project: Methods Section from Paper
Plaks, Stroessner, Dweck, & Sherman (2001). Person theories and attention allocation: Preferences for stereotypic versus counterstereotypic information.

10 Activity Recreate Methods of Example
What were all survey items used from the Implicit Person Theory Measures in Experiment 1?
What was the order of the items?
What was the response scale?
Be prepared to report back on your efforts, including any challenges you might have encountered.
Introduce study, distribute copies (or have them download?)
Individual exercise
LARGE GROUP DISCUSSION: How did it go? Were you successful? What were some issues? How confident do you feel that your project’s methods are identical to those of the original researchers?

11 Barriers to methods reproducibility
Lack of documentation
Incomplete surveys
Don’t know all the questions asked
Don’t have the exact wording of questions
Don’t know the scales used
Unknown study population
Lack of detail about how it was administered
What else?
Here are some of the known issues with the study example. Walk through these, and invite participants to share more. What does this all mean for researchers?

12 Why don’t people share their complete methods?

13 3 data sets, many potential projects
Set 1: American National Election Studies (ANES)
Set 2: Behavioral Risk Factor Surveillance System (BRFSS) Survey
Set 3: General Social Survey (GSS)
Explain why these three data sets were chosen and briefly describe each.
1. “To serve the research needs of social scientists, teachers, students, policy makers and journalists, the ANES produces high quality data from its own surveys on voting, public opinion, and political participation. Central to this mission is the active involvement of the ANES research community in all phases of the project.”
2. “The Behavioral Risk Factor Surveillance System (BRFSS) is the nation's premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services. Established in 1984 with 15 states, BRFSS now collects data in all 50 states as well as the District of Columbia and three U.S. territories. BRFSS completes more than 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. By collecting behavioral health risk data at the state and local level, BRFSS has become a powerful tool for targeting and building health promotion activities.”
3. “The GSS gathers data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes. Hundreds of trends have been tracked since In addition, since the GSS adopted questions from earlier surveys, trends can be followed for up to 70 years. The GSS contains a standard core of demographic, behavioral, and attitudinal questions, plus topics of special interest. Among the topics covered are civil liberties, crime and violence, intergroup tolerance, morality, national spending priorities, psychological well-being, social mobility, and stress and traumatic events.”

14 What are we doing with these example materials?
3 separate data sets
Download the files associated with your assigned data set from
You will be developing a demonstration project with this data “you” collected, working through decisions about how to organize, document, and share your work
Have groups count off to assign data sets

15 Activity
Review questionnaire files
Develop research question
Create one new OSF project per group and add your partner(s)

16 Open Science Framework
Free, open source scientific commons
Have people create projects based on their example data set. Demonstrate the wiki and how to edit it. Break up into groups; each group uses one of the three example data sets to review the questionnaires and develop a research question.

17 Activity Update wiki with your research question
Upload files to OSF storage

18 How to organize a project on the OSF

19 Activity Add structure to your OSF project based on your files
Each project example will have a questionnaire, data file, and code book
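One way to check that these three files stay in sync is to confirm that every column in the data file has an entry in the code book. The Python sketch below assumes both files are CSVs and that the code book has a `variable` column; the file contents and column names are hypothetical, not taken from the example data sets.

```python
import csv
import io

# Hypothetical file contents, inlined so the sketch runs on its own.
data_csv = "resp_id,age,q1_trust\n1,34,4\n2,51,2\n"
codebook_csv = "variable,label\nresp_id,Respondent identifier\nage,Age in years\n"


def undocumented_columns(data_text, codebook_text):
    """Return data-file columns that have no matching row in the code book."""
    data_columns = next(csv.reader(io.StringIO(data_text)))  # header row only
    documented = {
        row["variable"] for row in csv.DictReader(io.StringIO(codebook_text))
    }
    return [column for column in data_columns if column not in documented]


print(undocumented_columns(data_csv, codebook_csv))
```

Here the check flags `q1_trust`, a variable someone reusing the data could not interpret without going back to the questionnaire.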

20 It takes some effort to organize your research to be reproducible…the principal beneficiary is generally the author herself. – Schwab & Claerbout
“In a recent survey of 704 principal investigators for National Science Foundation biology grants, the majority said their most important unmet data needs were not software or infrastructure, but training in data integration and data management.[1]”
“Most researchers will need to interact with large datasets at some point in their careers. When they do, many realize they’re unprepared for the challenge. Being unfamiliar with computational tools and workflows, they may find themselves carrying out repetitive and error-prone tasks by hand. If they write in-house scripts for cleaning or analyzing their data, they may fail to document their code in a way that allows it to be checked and used by other researchers. If using code written by others, they may not properly test its utility for their dataset. They may fail to document the parameters they select and the software version they use, information that is important for other researchers seeking to replicate their results.”
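The last point in the quote, documenting parameters and software versions, can be partly automated. The following is a minimal Python sketch, not a prescribed workflow: the parameter names and the helper `build_run_record` are hypothetical, and a real project would also record the versions of the analysis packages it uses.

```python
import json
import platform
from datetime import datetime, timezone


def build_run_record(parameters):
    """Bundle the analysis parameters with basic software/version details."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "parameters": parameters,
    }


# Hypothetical analysis settings, chosen only for illustration.
params = {"exclude_incomplete": True, "alpha": 0.05}
record = build_run_record(params)

# Saving this JSON next to the results keeps the settings with the output.
print(json.dumps(record, indent=2, sort_keys=True))
```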

21 We checklists. Required by many publishers, such as Nature (as of 31 May 2017).

22 Standardizing documentation practices
Reminds you to record all the necessary information
Makes it easier to search your materials and find what you need
Helps your colleagues understand your workflow
Aids collaboration
What happens if you leave a lab?
Can be personalized to fit into your workflow
“Standardizing recorded information about your data helps you in several ways. First, it reminds you to record all of the necessary information about your data. Second, it helps you find datasets because it’s easier to search through organized information. Thirdly, standardization helps your colleagues understand your data, which is useful during collaboration and should you leave a laboratory. Finally, standardization can be personalized and doesn’t have to be rigid. Standardization should easily fit into your workflow and should be adaptable enough to respond to any changes in your research.”

23 How would a project look in your discipline?
What types of ‘methods’ files from your particular field do you think are important to document?

24 Results Reproducibility/Replicability
We use your exact methods and analyses, but collect new data, and we get the same statistical conclusion

25 Reproducibility vs. replicability
A statistical definition for reproducibility and replicability Prasad Patil, Roger D. Peng, Jeffrey Leek doi:

26 Did we reproduce the results?

27 Barriers to results reproducibility
Theoretical, methodological, and statistical barriers
Some include:
Lack of methodological detail
Biased literature; the file drawer problem

28 Positive results by discipline
Fanelli D (2010) “Positive” Results Increase Down the Hierarchy of the Sciences. PLOS ONE 5(4): e doi: /journal.pone

29 Barriers to results reproducibility
Theoretical, methodological, and statistical barriers
Some include:
Lack of methodological detail
Biased literature; the file drawer problem
Researcher degrees of freedom

30 Researcher degrees of freedom
All data processing and analytical choices made after seeing and interacting with your data
Should I collect more data?
Which observations should I exclude?
Which conditions should I compare?
What should be my main DV?
Should I look for an interaction effect?

31 False positive inflation
Simmons, Nelson, & Simonsohn (2011)
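A small simulation can illustrate how one researcher degree of freedom inflates false positives. The Python sketch below is a simplified illustration, not the procedure used by Simmons, Nelson, & Simonsohn: it assumes two independent DVs, no true effect, a known standard deviation of 1 (so a z-test applies), and made-up sample sizes. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05
STD_NORMAL = NormalDist()


def p_value(group_a, group_b):
    """Two-sided z-test for a mean difference, assuming equal n and known SD of 1."""
    n = len(group_a)
    z = (mean(group_a) - mean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate(n_sims=4000, n_per_group=20, seed=1):
    """Return false positive rates for one planned DV vs. 'report whichever DV works'."""
    rng = random.Random(seed)
    single_hits = 0
    flexible_hits = 0
    for _ in range(n_sims):
        p_values = []
        for _dv in range(2):  # two DVs, both pure noise: there is no true effect
            group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(p_value(group_a, group_b))
        single_hits += p_values[0] < ALPHA    # only the DV chosen in advance
        flexible_hits += min(p_values) < ALPHA  # whichever DV reaches significance
    return single_hits / n_sims, flexible_hits / n_sims


single_rate, flexible_rate = simulate()
print(f"one planned DV: {single_rate:.3f}; best of two DVs: {flexible_rate:.3f}")
```

With the planned DV, the false positive rate should sit near the nominal 5%. Picking the better of two independent DVs pushes it toward 1 - 0.95² ≈ 9.75%, and combining several such choices compounds the inflation.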

32 Solutions to barriers to results reproducibility / replicability?

33 Preregistration
Documenting your research plan in a read-only public repository before you conduct the study
Practice originated in clinical research and is now expanding more broadly
Helps decrease the file drawer

34 Preregistration
Benefits of preregistering your study depend on how much information you include.
At a minimum, a preregistration should include the “what” of the study:
Research question
Population and sample size
General design
Variables you’ll be collecting, or dataset you’ll be using

35 Pre-analysis plan
Details the analyses planned for hypothesis testing
Sample size
Data processing and cleaning procedures
Exclusion criteria
Statistical analyses
Including a pre-analysis plan in your preregistration helps decrease researcher degrees of freedom
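One way to keep a pre-analysis plan from drifting is to write its decisions down as code before looking at the data. The Python sketch below is illustrative only: the field names, the exclusion rules, and the example rows are all hypothetical, and a written preregistration remains the document of record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the plan cannot be edited after it is created
class PreAnalysisPlan:
    target_sample_size: int
    min_age: int
    require_complete_responses: bool
    planned_test: str


def apply_exclusions(rows, plan):
    """Keep only the rows that pass the exclusion criteria fixed in the plan."""
    kept = []
    for row in rows:
        if row["age"] < plan.min_age:
            continue
        if plan.require_complete_responses and row["response"] is None:
            continue
        kept.append(row)
    return kept


# Hypothetical plan, written before data collection.
plan = PreAnalysisPlan(
    target_sample_size=200,
    min_age=18,
    require_complete_responses=True,
    planned_test="independent-samples t-test",
)

# Made-up rows for illustration.
rows = [
    {"id": 1, "age": 25, "response": 4},
    {"id": 2, "age": 17, "response": 5},     # excluded: under the minimum age
    {"id": 3, "age": 40, "response": None},  # excluded: incomplete response
]
kept = apply_exclusions(rows, plan)
print([row["id"] for row in kept])
```

Because the exclusion rules are applied mechanically, the question "Which observations should I exclude?" is answered before the results are known.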

36 “The first principle is that you must not fool yourself, and you are the easiest person to fool. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.”
Richard P. Feynman, Cargo Cult Science, 1974

37 Why preregister?
Preregistration helps reduce the “file drawer effect” by increasing discoverability of unpublished studies.
Preregistered analysis plans help improve study accuracy and replicability by guarding against unintended false positive inflation.

38 Exploratory vs. confirmatory analyses
Exploratory: Interested in exploring possible patterns/relationships in data to develop hypotheses
Confirmatory: Have a specific hypothesis you want to test

39 Exploratory vs. confirmatory analyses
Exploratory: Interested in exploring possible patterns/relationships in data to develop hypotheses
Confirmatory: Have a specific hypothesis you want to test
Preregistration of analyses clarifies which are exploratory and which are confirmatory

40 How to preregister on the OSF?

41 Activity
Practice filling out a preregistration using the AsPredicted template
Save as a “draft registration”

42 Discussion How does preregistration apply to your particular research?
Do you see any barriers to applying preregistration or pre-analysis plans to your work?

43 Activity Complete analyses specified in pre-analysis plan

44 Pluses and wishes

