Published by Thomas Carroll, modified over 8 years ago
1
Slide 1: Importance and Challenges of Reproducible Research
Vladimir Kanchev, vladimir.kanchev@ieee.org
May 2016, © 2016 IEEE
2
Slide 2: (Image)
* Source: http://www.software.ac.uk/blog/2014-03-21-reproducible-research-impossible-dream
3
Slide 3: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
4
Slide 4: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
5
Slide 5: Personal Introduction
- Defense of my Ph.D. thesis at TU-Sofia is pending
- Research in image/MR image segmentation
- Publications in peer-reviewed journals
- Some experience in industry
6
Slide 6: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
7
Slide 7: Introduction to Reproducible Research – Definitions
Reproducible Research (RR) is an approach that aims to complement classical printed scientific articles with everything required to independently reproduce the results they present*. “Everything” covers:
- data
- computer code
- a precise description of how the code was applied to the data
* Delescluse, M., et al. (2012). Making neurophysiological data analysis reproducible: Why and how? Journal of Physiology-Paris, 106(3), 159–170.
8
Slide 8: Introduction to Reproducible Research – Definitions
Another definition (signal processing):
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.”* (D. Donoho)
* D. Donoho et al., “Reproducible Research in Computational Harmonic Analysis,” Computing in Science & Eng., vol. 11, no. 1, 2009, pp. 8–18.
9
Slide 9: Introduction to Reproducible Research – Definitions
- Replication – independent people go out and collect new data to verify the research* (Roger Peng). It is considered the scientific gold standard.
- Reproduction – independent people analyze the same data and obtain the same result*. The focus is on the validity of the data analysis (Roger Peng).
* http://simplystatistics.org/2011/12/02/reproducible-research-in-computational-science/
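A minimal sketch of what a reproduction check can look like, assuming the original authors share both their data and the numbers they report: an independent researcher re-runs the same analysis code on the same data and compares the outputs. The `analysis` function, the toy data, and the tolerance are hypothetical choices for illustration, not part of any cited workflow.

```python
import math


def analysis(data):
    """Hypothetical analysis step: sample mean and sample variance of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return {"mean": mean, "variance": variance}


def reproduces(published, reproduced, rel_tol=1e-9):
    """Return True if every published value is matched by the re-run within rel_tol."""
    return all(
        key in reproduced and math.isclose(value, reproduced[key], rel_tol=rel_tol)
        for key, value in published.items()
    )


# Toy data standing in for a shared dataset (hypothetical values).
shared_data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Values as the original authors might report them in the article.
published = {"mean": 5.0, "variance": 32.0 / 7.0}

# An independent researcher re-runs the same code on the same data.
reproduced = analysis(shared_data)
print(reproduces(published, reproduced))  # True
```

A small relative tolerance is used instead of exact equality because floating-point results can differ in the last digits across platforms and library versions.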
10
Slide 10: Introduction to Reproducible Research – Definitions
(Figure)
* Source: Peng, R. D. (2011). Reproducible research in computational science. Science, 334(6060), 1226.
11
Slide 11: Introduction to Reproducible Research – History
The RR “movement” began with what economists have called replication since the early 1980s and has grown into what is now called reproducible research in computational data analysis. It is currently influenced by the open science and open source movements.
12
Slide 12: Introduction to Reproducible Research – Relation to the scientific method
Steps of the scientific method*:
1. Define a question
2. Observe – gather information and resources
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and collecting data in a reproducible manner
5. Analyze the data
6. Interpret the data and draw a conclusion
7. Publish the results
8. Retest (reproduce) by other researchers
The steps related to Reproducible Research are in italic type.
* Crawford, S., & Stucki, L. (1990). Peer review and the changing research record. Journal of the American Society for Information Science, 41, 223–228.
13
Slide 13: (Figure)
* Source: https://scischol102.wordpress.com/category/science/
14
Slide 14: Introduction to Reproducible Research – Relation to the scientific method
Principles of the scientific method:
1. Empirically testable
2. Replicable
3. Objective
4. Transparent
5. Falsifiable
6. Logically consistent
15
Slide 15: Introduction to Reproducible Research – Scheme
(Figure)
* Source: http://www.biostat.jhsph.edu/~rpeng/research.html (modified)
16
Slide 16: Introduction to Reproducible Research – Current situation
Current situation with RR in different fields:
- Medicine (cancer research), social sciences (psychology), etc.: a replication/reproducibility crisis – many published experimental results cannot be replicated
- Natural sciences
- Computer science
17
Slide 17: (Figure)
* Source: Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454.
18
Slide 18: Introduction to Reproducible Research – Current situation
Reproducibility in medical imaging, computer vision, and machine learning:
- Public test sets are available
- Code is available for most methods (in papers from major conferences and journals)
- High pressure/workload on researchers to make their work reproducible
19
Slide 19: Introduction to Reproducible Research – Current situation
Reproducibility in medical imaging, computer vision, and machine learning (cont.):
- Benchmark comparison with other methods is compulsory
- Experiment automation
- Differences between medical imaging on one side and computer vision and machine learning on the other
- Example: the IPOL (Image Processing On Line) journal
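As a rough illustration of benchmark comparison and experiment automation, the sketch below runs several methods on the same public test set and reports one score per method. It assumes binary segmentation masks and uses the Dice coefficient, 2|A ∩ B| / (|A| + |B|), as the metric; the threshold "methods" and the tiny test set are hypothetical stand-ins for real segmentation algorithms and data.

```python
def dice(pred, truth):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks given as lists of 0/1.

    By convention, two empty masks are treated as a perfect match (score 1.0).
    """
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


def make_threshold_method(threshold):
    """Hypothetical segmentation method: label every pixel brighter than `threshold`."""
    def segment(image):
        return [1 if value > threshold else 0 for value in image]
    return segment


def run_benchmark(methods, test_set):
    """Apply every method to every test case; return the mean Dice score per method."""
    scores = {}
    for name, method in methods.items():
        per_case = [dice(method(image), truth) for image, truth in test_set]
        scores[name] = sum(per_case) / len(per_case)
    return scores


# Toy "public test set": flattened 1-D images with ground-truth masks (hypothetical).
test_set = [
    ([0.1, 0.2, 0.8, 0.9, 0.7, 0.3], [0, 0, 1, 1, 1, 0]),
    ([0.6, 0.9, 0.95, 0.2, 0.1, 0.4], [1, 1, 1, 0, 0, 0]),
]
methods = {
    "threshold_0.50": make_threshold_method(0.50),
    "threshold_0.75": make_threshold_method(0.75),
}
scores = run_benchmark(methods, test_set)
for name in sorted(scores):
    print(f"{name}: mean Dice = {scores[name]:.3f}")
```

Because every method sees exactly the same inputs and the same metric, adding a new method to the comparison is a one-line change to `methods`, which is the point of automating the experiment.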
20
Slide 20: Introduction to Reproducible Research – Reasons
Reasons for the reproducibility/replication crisis:
- “Publish or perish” culture – pressure to obtain publishable results
- Reluctance to make method code public – improving its quality takes additional time and effort
- Most non-CS graduate students receive no training in software engineering and statistics
21
Slide 21: (Figure)
* Source: Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454.
22
Slide 22: Introduction to Reproducible Research – Reasons
Other problems:
- Insufficient description of the experiment in publications
- Test datasets and method code are not publicly available – common in the social sciences
- The statistical methods used are prone to malpractice – p-hacking (data dredging), failing to report non-significant tests, including/excluding data points or results until the desired result is achieved
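To see why running many tests and reporting only the significant ones is a problem, consider m independent tests of hypotheses that are all actually null, each at level α. The chance of at least one false positive is 1 − (1 − α)^m, roughly 0.64 for m = 20 and α = 0.05. The sketch below is a hypothetical simulation, not an analysis of real studies: it checks this formula by drawing p-values directly from a uniform distribution, which is their distribution under a true null hypothesis for a continuous test statistic.

```python
import random


def family_wise_error(num_tests, alpha=0.05):
    """Probability of at least one p < alpha among independent tests of true null hypotheses."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_p_hacking(num_tests, trials=10_000, alpha=0.05, seed=42):
    """Fraction of simulated studies with at least one 'significant' result when
    every tested hypothesis is actually null.

    Under a true null hypothesis with a continuous test statistic the p-value is
    uniform on [0, 1], so each p-value is drawn directly with rng.random().
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_values = [rng.random() for _ in range(num_tests)]
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


for m in (1, 5, 20):
    print(f"{m:2d} tests: simulated {simulate_p_hacking(m):.3f}, "
          f"analytic {family_wise_error(m):.3f}")
```

A researcher who silently tries 20 analyses and reports only the one that "worked" is therefore more likely than not to publish a false positive, which is why the full set of tests should be reported.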
23
Slide 23: Introduction to Reproducible Research – Reasons
Problems with method code:
- Reproducibility issues – missing data and code, errors in the code, not all figures and tables can be reproduced
- Documentation issues – missing README file, poor code documentation
- Programming style issues – bad coding style
24
Slide 24: (Figure)
* Source: Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in global change research require open science by individual researchers. Global Change Biology, 18(7), 2102–2110.
25
Slide 25: Introduction to Reproducible Research – Guidance (Biostatistics journal)*
Authors should provide all data and code needed to reproduce all results, figures, and tables. The guidance covers:
- README file
- Consistent coding style and documentation
- Test data sets
- Simulations and random numbers
- General advice
* Peng, R. D. (2009). Reproducible research and biostatistics. Biostatistics, 10(3), 405–408.
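For the "simulations and random numbers" point, a common practice is to fix and report the random seed so that a stochastic result can be regenerated exactly. A minimal sketch, assuming a toy Monte Carlo estimate of π as the "simulation" and an arbitrary seed value (both hypothetical):

```python
import random


def estimate_pi(n_samples, seed):
    """Hypothetical Monte Carlo 'experiment': estimate pi from random points in the unit square.

    A dedicated random.Random(seed) generator is used instead of the module-level one,
    so the result depends only on the arguments, not on other code that draws random numbers.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


SEED = 2016  # arbitrary choice; report it together with the results
first_run = estimate_pi(100_000, SEED)
second_run = estimate_pi(100_000, SEED)
print(first_run == second_run)  # True: same seed, same random numbers, same estimate
print(first_run, estimate_pi(100_000, SEED + 1))  # a different seed usually gives a slightly different estimate
```

Python's documentation guarantees that `random()` returns the same sequence for the same seed across Python versions, but not every function in the `random` module carries that guarantee, so recording the Python version next to the seed is a sensible precaution.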
26
Slide 26: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
27
Slide 27: Software tools
Recommended tools for achieving reproducibility:
- LaTeX (typesetting system)
- Version control systems, e.g., Git
- Make – build pipelines
- Literate programming (concept introduced by Knuth)
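The core idea behind Make is to rebuild an output only when it is missing or older than one of its inputs, so the whole analysis can be regenerated with one command without redoing up-to-date steps. The sketch below imitates that timestamp rule in Python for a two-step toy pipeline; it is a hypothetical illustration, not a replacement for Make, and the step functions and file names are made up.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Make's rule: rebuild if the target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(source) > target_time for source in sources)


def run_pipeline(steps):
    """Run (target, sources, action) steps in the given order, skipping up-to-date targets.

    Returns the list of targets that were (re)built during this call.
    """
    built = []
    for target, sources, action in steps:
        if needs_rebuild(target, sources):
            action(target, sources)
            built.append(target)
    return built


def read_numbers(path):
    """Read one number per non-empty line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def clean(target, sources):
    """Hypothetical cleaning step: drop negative values (e.g., sensor errors)."""
    values = [v for v in read_numbers(sources[0]) if v >= 0]
    with open(target, "w") as f:
        f.write("\n".join(str(v) for v in values) + "\n")


def summarize(target, sources):
    """Hypothetical analysis step: write the mean of the cleaned values."""
    values = read_numbers(sources[0])
    with open(target, "w") as f:
        f.write(f"mean={sum(values) / len(values)}\n")


with tempfile.TemporaryDirectory() as workdir:
    raw = os.path.join(workdir, "raw.txt")
    cleaned = os.path.join(workdir, "clean.txt")
    summary = os.path.join(workdir, "summary.txt")
    with open(raw, "w") as f:
        f.write("1.0\n-5.0\n3.0\n")
    steps = [(cleaned, [raw], clean), (summary, [cleaned], summarize)]
    first_run = run_pipeline(steps)   # both targets are built
    second_run = run_pipeline(steps)  # nothing changed, so nothing is rebuilt
    with open(summary) as f:
        summary_text = f.read()

print(len(first_run), second_run, summary_text.strip())
```

Editing `raw.txt` would make it newer than `clean.txt`, so the next run would rebuild both downstream files; a real Makefile expresses the same dependencies declaratively.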
28
Slide 28: Software tools
Matlab programming language:
- Matlab File Exchange
- Proprietary Matlab toolboxes – disadvantages
- Examples of RR toolboxes – WaveLab, SparseLab
- Matlab publish – no literate programming support
29
Slide 29: Software tools
R programming language:
- RStudio – development environment for R
- Graphics packages, such as ggplot2
- Packages such as knitr or rmarkdown – literate programming support
30
Slide 30: Software tools
Python programming language:
- Many open scientific libraries available – SciPy, NumPy, etc.
- IPython Notebook
- Sumatra package – saves parameter values, code state, output results, and files
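Sumatra's own API is not reproduced here. Instead, the sketch below shows, under stated assumptions, the kind of provenance record such a tool keeps: parameter values, the software environment, a code version identifier, a timestamp, and a fingerprint of the output. The function name `record_run`, the JSON layout, and the `code_version` string (which in practice might be a Git commit hash) are all hypothetical.

```python
import hashlib
import json
import os
import platform
import tempfile
from datetime import datetime, timezone


def record_run(params, output_text, record_path, code_version="unknown"):
    """Write a JSON provenance record for one computational run (hypothetical format).

    Stores the parameter values, a code version identifier supplied by the caller,
    the Python version and platform, a UTC timestamp, and a SHA-256 fingerprint
    of the output, so a later run can be compared against this one.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "parameters": params,
        "code_version": code_version,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "output_sha256": hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
    }
    with open(record_path, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return record


params = {"threshold": 0.5, "iterations": 100}  # hypothetical method parameters
output_text = "toy output\n"                     # stands in for a real result file

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "run_record.json")
    record_run(params, output_text, path, code_version="abc1234")  # hypothetical commit id
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(json.dumps(loaded, indent=2, sort_keys=True))
```

Keeping such a record next to each result addresses two of the difficulties listed later in the talk: knowing the exact parameter values used and knowing which code version produced a figure.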
31
Slide 31: (Figure)
* Source: ISMB/ECCB 2013 keynote
32
Slide 32: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
33
Slide 33: The context – personal experience
Making a current research project reproducible at the end of the process is not the best approach…*
* Source: http://www.idiap.ch/~marcel/professional/BTAS_SS_2015.html
34
Slide 34: The context – personal experience
Difficulties with:
- Exact reproduction of all figures and results
- Setting exact parameter values
- Time to improve code quality and add documentation
35
Slide 35: The context – personal experience
Motivation for achieving reproducibility:
- Better visibility of research
- More citations and higher impact
- Increased trust in research quality (outside academia, e.g., from industry)
- Help from readers of the publication in improving the developed method
36
Slide 36: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
37
Slide 37: The situation in Bulgaria and abroad
RR in Bulgaria:
- Its introduction in the scientific community is still at an early stage
- Its principles need to be taught at the undergraduate and graduate levels
- In most fields, paper code and test datasets are generally not available online
38
Slide 38: The situation in Bulgaria and abroad
Advances in RR implementation would:
- Increase the impact abroad of research conducted by Bulgarian researchers
- Improve reputation and applicability – especially for people from industry
- Allow faster recognition of quality work and steady improvement of lower-quality papers
39
Slide 39: The situation in Bulgaria and abroad
Advances in RR implementation (cont.):
- Benefit from the rapid development of scientific computing, machine learning, data science, and AI
- Attract more bright young people to research (open source movement and open data)
40
Slide 40: The situation in Bulgaria and abroad
RR abroad:
- A major issue in the social and biomedical sciences
- An important criterion for manuscript evaluation by reviewers in many CS fields
- One of the major requirements of funding agencies abroad when evaluating project proposals
41
Slide 41: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
42
Slide 42: Additional resources for research and RR methods
MOOC courses:
1. Data Science Specialization (www.coursera.org) (Johns Hopkins University) – course 5: Reproducible Research
2. Methods and Statistics in Social Sciences Specialization (www.coursera.org) (University of Amsterdam)
3. Research Methods: An Engineering Approach (www.edx.org) (Wits University)
4. Research Data Management and Sharing (www.coursera.org) (The University of North Carolina at Chapel Hill & The University of Edinburgh)
43
Slide 43: Additional resources for research and RR methods
Software tools for RR:
1. Software Carpentry (www.software-carpentry.org) – basic computing skills for researchers
2. Bootcamps – one- or two-day courses teaching coding and professional skills to researchers
3. MOOC courses – www.coursera.org, www.edx.org, www.udacity.com – for programming skills in R, Python, and Matlab
44
Slide 44: Additional resources for research and RR methods
Books:
1. Stodden, V., Leisch, F., & Peng, R. D. (Eds.). (2014). Implementing Reproducible Research. CRC Press.
2. Gandrud, C. (2013). Reproducible Research with R and RStudio. CRC Press.
3. Subramanian, G. (2015). Python Data Science Cookbook. Packt Publishing Ltd.
4. Milovanovic, I., Foures, D., & Vettigli, G. (2015). Python Data Visualization Cookbook. Packt Publishing Ltd.
45
Slide 45: Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
46
Slide 46: Discussion
Topics for discussion:
- What do you think about reproducibility in general?
- Have you already encountered RR in your work?
- How might applying reproducibility principles affect your work as researchers, engineers, or programmers?
47
Slide 47: End