May 2016 © 2016 IEEE Importance and Challenges of Reproducible Research Vladimir Kanchev


Slide 2 [figure; source link truncated: research-impossible-dream]

Slide 3 Agenda 1. Personal introduction 2. Introduction to Reproducible Research (RR) 3. Software tools 4. The context – personal experience 5. The situation in Bulgaria and abroad 6. Additional resources for RR 7. Discussion


Slide 5 Personal Introduction Defense of my Ph.D. thesis at TU-Sofia is pending; research in image/MR image segmentation; publications in peer-reviewed journals; some experience in industry.


Slide 7 Introduction to Reproducible Research Definitions Reproducible Research (RR) is an approach that aims to complement classical printed scientific articles with everything required to independently reproduce the results they present *. "Everything" covers: data; computer code; a precise description of how the code was applied to the data. * Delescluse, Matthieu, et al. "Making neurophysiological data analysis reproducible: Why and how?" Journal of Physiology-Paris (2012).

Slide 8 Introduction to Reproducible Research Definitions Another definition (signal processing): "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures" * (D. Donoho). * D. Donoho et al., "Reproducible Research in Computational Harmonic Analysis," Computing in Science & Eng., vol. 11, no. 1, 2009, pp. 8–18

Slide 9 Introduction to Reproducible Research Definitions Replication: independent people go out and collect new data to verify research * (Roger Peng). It is considered the scientific gold standard. Reproduction: independent people analyze the same data and produce the same result * (Roger Peng). The focus is on the validity of the data analysis. * computational-science/

Slide 10 Introduction to Reproducible Research Definitions [figure] * Peng, R. D. (2011). Reproducible research in computational science. Science (New York, N.Y.), 334(6060).

Slide 11 Introduction to Reproducible Research History The RR "movement" grew out of what economists have called replication since the early 1980s and developed into what is now called reproducible research in computational data analysis. It is currently influenced by the open science and open source movements.

Slide 12 Introduction to Reproducible Research Relation to scientific method Steps of the scientific method *: 1. Define a question 2. Observe: gather information and resources 3. Form an explanatory hypothesis 4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner 5. Analyze the data 6. Interpret the data and draw a conclusion 7. Publish results 8. Retest (reproduction by other researchers) * Crawford S, Stucki L (1990), "Peer review and the changing research record", J Am Soc Info Science, vol. 41, pp. 223–228 The steps related to Reproducible Research are set in italic type on the original slide.


Slide 14 Introduction to Reproducible Research Relation to scientific method Principles of the scientific method: 1. Empirically testable 2. Replicable 3. Objective 4. Transparent 5. Falsifiable 6. Logically consistent

Slide 15 Introduction to Reproducible Research Scheme [figure, modified from its source; the source link was not preserved]

Slide 16 Introduction to Reproducible Research Current situation Current situation with RR in different fields: medicine (cancer research), social sciences (psychology), etc., where there is a replication/reproducibility crisis: the results of many scientific experiments prove difficult or impossible to replicate; natural sciences; computer science.

Slide 17 [figure] * Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604).

Slide 18 Introduction to Reproducible Research Current situation Reproducibility in medical imaging, computer vision, and machine learning: public test sets are available; most method code is available (papers from major conferences and journals); there is high pressure and workload on researchers to make their work reproducible.

Slide 19 Introduction to Reproducible Research Current situation Reproducibility in medical imaging, computer vision, and machine learning (cont.): benchmark comparison with other methods is compulsory; experiment automation; differences between the medical imaging field and the computer vision and machine learning fields. Example: IPOL journal.

Slide 20 Introduction to Reproducible Research Reasons Reasons for the reproducibility/replication crisis: a "publish or perish" culture, with pressure to obtain publishable results; reluctance to make method code public, because improving its quality takes additional time and effort; most non-CS graduate students are not taught software engineering or statistics.

Slide 21 [figure] * Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604).

Slide 22 Introduction to Reproducible Research Reasons Other problems: insufficient description of the experiment in publications; test datasets and method code for the paper are not publicly available (common in the social sciences); the statistical methods used are prone to malpractice: p-hacking (data dredging), failing to report non-significant tests, and including or excluding data points or results until the desired result is achieved.
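To make the p-hacking point concrete, the following is a minimal sketch, not taken from the slides, of how quickly false positives accumulate when many tests are run and only the significant ones are reported. It assumes the tests are independent; the function names are hypothetical, and the Bonferroni correction is added here only as one well-known remedy.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction.

    Testing each hypothesis at alpha / n_tests keeps the family-wise
    error rate at or below alpha.
    """
    return alpha / n_tests


if __name__ == "__main__":
    # Chance of at least one spurious "significant" result for growing test counts.
    for n in (1, 5, 20):
        print(n, round(familywise_error_rate(0.05, n), 4))
```

With 20 independent tests at a 0.05 threshold, the chance of at least one spurious "significant" result is about 0.64, which is why unreported non-significant tests undermine a published p-value.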

Slide 23 Introduction to Reproducible Research Reasons Problems with method code: reproducibility issues (missing method data and code, errors in the method code, not all figures and tables are reproduced); documentation issues (missing README file, bad code documentation); programming style issues (bad coding style).

Slide 24 [figure] * Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in global change research require open science by individual researchers. Global Change Biology, 18(7).

Slide 25 Introduction to Reproducible Research Guidance (Biostatistics journal) Authors should provide all data and code needed to reproduce all results, images, and tables, together with: a README file; consistent coding style and documentation; test data sets; simulations and random numbers; general advice. * Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics, 10(3).
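The "simulations and random numbers" item usually comes down to fixing and reporting the random seed. The sketch below is an illustration under that assumption, not the journal's own procedure; the function name and the seed value are hypothetical.

```python
import random
import statistics


def simulate_mean(seed, n=1000):
    """Draw n standard-normal samples from a seeded generator and return their mean."""
    # A local generator avoids hidden dependence on the global random state.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(samples)


if __name__ == "__main__":
    first = simulate_mean(seed=2016)
    second = simulate_mean(seed=2016)
    # The same seed yields the same stream of numbers, so both runs agree exactly.
    print(first == second)
```

Passing the seed as an explicit parameter, rather than seeding once at import time, lets the value be recorded next to the results it produced.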


Slide 27 Software tools Recommended programs for achieving reproducibility: LaTeX (TeX editor); version control systems, e.g. Git; Make, for building the analysis pipeline; the literate programming concept (Knuth).
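The idea behind using Make for a pipeline is that a result is regenerated only when it is missing or older than the files it depends on. The following is a simplified sketch of that rule in Python, not Make itself; the function names and file names are hypothetical.

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


def build(target, dependencies, action):
    """Run action() only when the target is out of date; report whether it ran."""
    if needs_rebuild(target, dependencies):
        action()
        return True
    return False


if __name__ == "__main__":
    # Hypothetical file names: the check reports True when figure1.png does not exist yet.
    print(needs_rebuild("figure1.png", ["data.csv", "plot.py"]))
```

Chaining such rules (raw data to cleaned data to figures to the compiled paper) gives a single command that regenerates every result from scratch.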

Slide 28 Software tools MATLAB programming language: MATLAB File Exchange; proprietary MATLAB toolboxes (a disadvantage); examples of RR toolboxes: WaveLab, SparseLab; MATLAB publish: no literate programming support.

Slide 29 Software tools R programming language: RStudio, a development environment for the R programming language; graphics packages such as ggplot2; packages such as knitr or rmarkdown for literate programming support.

Slide 30 Software tools Python programming language: many open scientific libraries are available (SciPy, NumPy, etc.); IPython notebook; the Sumatra package, which saves parameter values, code state, output results, and files.
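The kind of record Sumatra keeps (parameter values, code state, outputs) can be illustrated with the standard library alone. This sketch does not use Sumatra's API; every function name and field name in it is a hypothetical stand-in, and a file hash is used here as a simple proxy for "code state".

```python
import hashlib
import json
import platform
import sys
import time


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, used as a fingerprint of the code."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_run_record(script_path, parameters, outputs):
    """Collect parameters, a code fingerprint, and environment details for one run."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "script": script_path,
        "script_sha256": file_sha256(script_path),
        "parameters": parameters,
        "outputs": outputs,
    }


def save_run_record(record, path):
    """Write the record as sorted, indented JSON so that records diff cleanly."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
```

Storing one such file per run next to the generated figures makes it possible to tell later which parameter values and which version of the script produced each result.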

Slide 31 [figure] * ISMB/ECCB 2013 Keynote


Slide 33 The context – personal experience Making a current research project reproducible at the end of the process is not the best way ...

Slide 34 The context – personal experience Difficulties with: exact reproduction of all figures and results; setting exact parameter values; finding time to improve code quality and add documentation.

Slide 35 The context – personal experience Motivation for achieving reproducibility: better visibility of research; more citations and higher impact; increased trust in research quality (outside academia, e.g. from industry); help from readers of the publication in improving the developed method.


Slide 37 The situation in Bulgaria and abroad RR in Bulgaria: its introduction in the scientific community is still at an early stage; its principles need to be taught at undergraduate and graduate level; paper code and test datasets are generally not available online in most fields.

Slide 38 The situation in Bulgaria and abroad Advances in RR implementation would: increase the impact abroad of research conducted by Bulgarian researchers; improve its reputation and applicability, especially for people from industry; make quality work faster to distinguish and steadily improve lower-quality papers.

Slide 39 The situation in Bulgaria and abroad Advances in RR implementation (cont.): profit from the fast development of scientific computing, machine learning, data science, and AI; attract more bright young people to research (open source movement and open data).

Slide 40 The situation in Bulgaria and abroad RR abroad: a major issue in the social and biomedical sciences; an important criterion for reviewers evaluating manuscripts in many CS fields; one of the major requirements of funding agencies abroad when evaluating project proposals.


Slide 42 Additional resources for research and RR methods MOOC courses: 1. Data Science Specialization (Johns Hopkins University), course 5: Reproducible Research 2. Methods and Statistics in Social Sciences Specialization (University of Amsterdam) 3. Research Methods: An Engineering Approach (Wits University) 4. Research Data Management and Sharing (The University of North Carolina at Chapel Hill & The University of Edinburgh)

Slide 43 Additional resources for research and RR methods Software tools for RR: 1. Software Carpentry: basic computing skills for researchers 2. Bootcamps: one- or two-day courses teaching coding and professional skills for researchers 3. MOOC courses for programming skills in R, Python, and Matlab

Slide 44 Additional resources for research and RR methods Books: 1. Stodden, V., Leisch, F., & Peng, R. D. (Eds.) (2014). Implementing Reproducible Research. CRC Press 2. Gandrud, C. (2013). Reproducible Research with R and RStudio. CRC Press 3. Subramanian, G. (2015). Python Data Science Cookbook. Packt Publishing Ltd 4. Milovanovic, I., Foures, D., & Vettigli, G. (2015). Python Data Visualization Cookbook. Packt Publishing Ltd


Slide 46 Discussion Topics for discussion: What do you think about reproducibility in general? Have you already encountered RR in your work? How might the application of reproducibility affect your work as researchers, engineers, or programmers?

Slide 47 End