Nature Reviews/2012. Next-Generation Sequencing (NGS): Data Generation NGS will generate more broadly applicable data for various novel functional assays.


1 Nature Reviews/2012

2 Next-Generation Sequencing (NGS): Data Generation
NGS will generate more broadly applicable data for various novel functional assays
– Protein-DNA binding
– Histone modification
– Transcript levels
– Spatial interactions
– Combination of applications into larger studies
1000 Genomes Project

3 Next-Generation Sequencing (NGS): Data Interpretation
Meaningful interpretation of sequencing data is important
Interpretation relies heavily on complex computation
Major problems
– Low adoption of existing practices
– Difficulty of reproducibility

4 Problem 1: Low Adoption of Existing Practices
Example: Variant discovery
A series of accepted and accessible practices from the “1000 Genomes Project”
– 299 articles in 2011 cited this project
– Only 10 studies used the recommended tools
– Only 4 studies used the full workflow
Not following tested practices undermines the quality of biomedical research
Why low adoption?
– Overcomplicated logistics (e.g. re-sorting input data)
– Limited applicability of the toolkit (e.g. only a handful of well-annotated genomes)
– Little agreement on what is considered to be the “best practice”

5 Problem 2: Difficulty of Reproducibility
Example: Read mapping
To repeat a mapping experiment, one needs: primary data, software and its version, parameter settings, name of the reference genome
– 19 studies citing the “1000 Genomes Project”: only 6 provide all of these details
– 50 randomly selected papers using the Burrows-Wheeler Aligner: only 7 provide all of these details
Most results in today’s publications cannot be accurately verified, reproduced, adopted or used
Why difficult?
– Lack of a mechanism for documenting analytical steps
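
The four details listed on this slide can be treated as a checklist. The sketch below is only an illustration of that idea, not a tool from the presentation: the class `MappingRecord`, its field names, and the helper `missing_details` are all hypothetical, and the record splits "software and its version" into two fields.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MappingRecord:
    """Hypothetical record of what is needed to repeat a read-mapping run."""
    primary_data: Optional[str] = None      # where the raw reads can be obtained
    software: Optional[str] = None          # name of the mapper
    software_version: Optional[str] = None  # exact version that was run
    parameters: Optional[dict] = None       # parameter settings that were used
    reference_genome: Optional[str] = None  # name/build of the reference genome


def missing_details(record: MappingRecord) -> list:
    """Return the names of fields that are undocumented (None or empty string)."""
    return [
        f.name
        for f in fields(record)
        if getattr(record, f.name) is None or getattr(record, f.name) == ""
    ]


def is_reproducible(record: MappingRecord) -> bool:
    """A record counts as complete only when no detail is missing."""
    return not missing_details(record)
```

A methods section that names only the mapper and the reference genome would be flagged as missing `primary_data`, `software_version` and `parameters`, which is the kind of gap the slide counts.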

6 Solution: Democratization of Biomedical Computation
To achieve democratization
– Developing best practices
– Removing obstacles associated with heterogeneous software
– Facilitating the interactive exploration of analysis parameters
– Promoting the concepts of analysis transparency and reproducibility

7 Potential of Integrative Frameworks
Combinations of diverse tools under the umbrella of a unified interface
– E.g. BioExtract, Galaxy, GenePattern, GeneProf
Advantages
1. Making data analysis transparent and reproducible
2. Making use of high-performance computing infrastructure
3. Improving long-term archiving

8 1. Promoting Transparency and Reproducibility
Automatic tracking, recording and dissemination of all details of computational analyses
– GenePattern: embeds details into Microsoft Word documents while a publication is being prepared
– Galaxy: creates interactive Web-based supplements with analysis details
Allows readers to inspect the described analysis in detail
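
To make "automatic tracking" concrete, here is a minimal sketch of the general idea: every analysis step records its own tool name, version and parameters as a side effect of being run. It does not reproduce how GenePattern or Galaxy work internally; `HISTORY`, `tracked`, `filter_reads` and `export_history` are hypothetical names chosen for illustration.

```python
import functools
import json

HISTORY = []  # hypothetical in-memory log of executed analysis steps


def tracked(tool_version):
    """Decorator that records each call's tool name, version and keyword parameters."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # Record the step only after it completed successfully.
            HISTORY.append({
                "step": len(HISTORY) + 1,
                "tool": func.__name__,
                "version": tool_version,
                "parameters": dict(kwargs),
            })
            return result
        return wrapper
    return decorator


@tracked(tool_version="0.1")
def filter_reads(reads, *, min_length):
    """Toy analysis step: keep reads at least min_length bases long."""
    return [r for r in reads if len(r) >= min_length]


def export_history():
    """Serialize the recorded steps so they can accompany a publication."""
    return json.dumps(HISTORY, indent=2, sort_keys=True)
```

Because the parameters are keyword-only, the user cannot run the step without the setting being captured, so the exported history needs no manual note-taking.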

9 2. Using High-performance Computing Infrastructure
High-performance computing resources
– Computing clusters at institutions or nationwide efforts, e.g. XSEDE
– Private and public clouds
Not accessible to the broad biomedical community
– Virtual machines or application-programming interfaces
With integrative frameworks, anyone can deploy a solution on any type of resource
– E.g. CloudMan: user interface for managing computing clusters on cloud resources

10 3. Improving Long-term Archiving
General vulnerability of centralized resources: longevity of hosted analysis services
– Depends on various external factors, e.g. funding climate
With integrative frameworks
– Create snapshots of a particular analysis
– Compose virtual machine images from an analysis to be stored as an archival resource, e.g. in the Dryad system or Figshare
– Export the complete collection of analyses automatically for archiving
Anyone can recreate a new virtual instance from this archive
– Improved reproducibility
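
The snapshot idea can be sketched at a much smaller scale than a virtual machine image: serialize the description of an analysis deterministically and store a checksum next to it, so that anyone restoring the archive can confirm it is unchanged. This is an assumption-laden illustration only; `make_snapshot`, `verify_snapshot` and `restore_analysis` are hypothetical helpers and do not use any Dryad, Figshare or Galaxy interface.

```python
import hashlib
import json


def make_snapshot(analysis: dict) -> dict:
    """Freeze an analysis description as canonical JSON plus a SHA-256 checksum."""
    # sort_keys makes the serialization independent of dictionary key order.
    payload = json.dumps(analysis, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"payload": payload, "sha256": digest}


def verify_snapshot(snapshot: dict) -> bool:
    """Check that the archived payload still matches its recorded checksum."""
    digest = hashlib.sha256(snapshot["payload"].encode("utf-8")).hexdigest()
    return digest == snapshot["sha256"]


def restore_analysis(snapshot: dict) -> dict:
    """Recreate the analysis description, refusing a corrupted archive."""
    if not verify_snapshot(snapshot):
        raise ValueError("snapshot checksum mismatch")
    return json.loads(snapshot["payload"])
```

The resulting dictionary could be written to a file and deposited alongside the data; the checksum gives later readers a cheap way to confirm that what they restore is what was archived.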

11 Future Directions: Tool Distribution
Current practice
– Tools need to be compiled, installed and supplied with associated data, e.g. a short-read mapper requires genome indices
Better practice
– Digital platforms providing a set of tools to be automatically installed into users’ integrative framework environment
  Pioneering work: e.g. Gparc, Galaxy Tool Shed
– Allow sharing of analysis workflows, data sets, visualizations and any other analysis artifacts

12 Future Directions: Integrating Analysis and Visualization
Current practice
– Visualization is the last step of an analysis
Better practice
– Visualization as an active component during analysis
Advantages
– Users can see directly, in real time, how parameter changes affect the final result
– In the context of publication, it helps readers evaluate and inspect the results

13 Conclusion
To sustain the growing application of NGS, data interpretation must be as accessible as data generation
Necessary to bridge the gap between experimentalists and computational scientists
– For experimentalists: embrace the unavoidable computational components
– For computational scientists: ensure the software is appealing to use
Emergence of integrative frameworks
– Tracking details precisely
– Ensuring transparency and reproducibility

