Published by Julian Ball. Modified over 8 years ago.
1
Transforming Science Through Data-Driven Discovery: Genomics in Education. University of Delaware – February 2016. Jason Williams, Education, Outreach, Training Lead; Joslynn Lee, Data Science Educator. Cold Spring Harbor Laboratory. williams@cshl.edu, @JasonWilliamsNY
2
CyVerse Evolution. iPlant 2008: Empowering a New Plant Biology. iPlant 2013: Cyberinfrastructure for Life Science. CyVerse 2016: Transforming Science Through Data-Driven Discovery.
3
CyVerse Evolution. We are funded by the National Science Foundation (DBI-0735191 and DBI-1265383). We are your colleagues and collaborators! $100 million in investment. Freely available to the community. Spur national/international collaboration. Cite CyVerse: CyVerse.org/acknowledge-cite-cyverse
4
CyVerse 2016: Transforming Science Through Data-Driven Discovery. Vision: Transforming science through data-driven discovery. Mission: Design, develop, deploy, and expand a national cyberinfrastructure for life science research, and train scientists in its use. More than 30,000 users, petabytes of data, and hundreds of publications, courses, and discoveries.
5
What is Cyberinfrastructure? Data storage, software, high-performance computing, and people, organized into systems that solve problems of size and scope that would not otherwise be solvable.
6
What is Cyberinfrastructure? Platforms, tools, datasets; storage and compute; training and support.
7
CyVerse supports all domains of life science: plant/microbial, animal, biomedical, ecological/climate. CyVerse is built for data.
8
CyVerse product stack. Layers: Ready-to-use Platforms; Foundational Capabilities; Established CI Components; Extensible Services. Axis labels: Ease of Use; Flexibility.
9
Genomics in Education
10
Big data biology – Education and Research. 100,000-fold decrease in sequencing costs. Hand-held sequencers. Drones. Biological sensors. Biology is swimming in data. Image credits: Genome sequencing costs: http://www.genome.gov/images/content/costpergenome2015_4.jpg; Oxford Nanopore sequencer: https://www.nanoporetech.com/; Agricultural drone: http://purdue.imodules.com/s/1461/images/gid1001/editor/alumnus/2014_mar/drones_main.jpg; Fitbit: http://www.fitbit.com/force
11
Big data biology – Too fast to keep up? “Essentially, all models are wrong, but some are useful” – George E.P. Box
12
Big data biology – Too fast to keep up?
13
1866 – Mendel publishes work on inheritance; 1869 – DNA discovered; 1915 – Thomas Hunt Morgan describes linkage and recombination; 1953 – Structure of DNA described; 1956 – Human chromosome number determined; 1968 – First gene mapped to an autosome; 1977 – Dideoxy sequencing; 1983 – PCR; 1986 – Human Genome Project proposed
14
Big data biology – Too fast to keep up? 1993 – First microRNAs described; 2003 – First ‘Gold Standard’ human genome sequence; 2005 – First draft of human haplotype map (HapMap); 2007 – ENCODE project
15
Big data biology – Too fast to keep up?
16
Challenge – bringing students into the fold. How do scientists share their data and make it publicly available? How do scientists extract maximum value from the datasets they generate? How can students and educators (who will need to come to grips with data-intensive biology) be brought into the fold? Research and Education: Students can work with the same data at the same time and with the same tools as research scientists.
17
Can you navigate the tools? What are your challenges in teaching bioinformatics in the classroom?
18
Take the Subway
19
DNA Subway: Classroom-friendly bioinformatics. Faculty identified guiding requirements that shaped the development of CyVerse educational platforms. Mix lecture and lab – have a wet bench “hook”. Student-scientist partnerships – someone has to care about the data. Co-investigation – projects should potentially lead to publications. Scale – platforms should support projects multiple classrooms can join.
20
DNA Subway: Classroom-friendly bioinformatics. More than 13,000 users. More than 28,000 student projects in 2015.
21
DNA Subway Red Line: Genome annotation. Analyze up to 150 KB of DNA sequence. De novo gene prediction. Construct evidence-based gene models. Visualize genome sequence in browser.
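A minimal sketch of checking a sequence against the Red Line size limit before uploading. It assumes "150 KB" means 150,000 sequence characters and that the input is FASTA text; the constant and function names are hypothetical and are not part of DNA Subway.

```python
# Assumption: "150 KB" is read here as 150,000 bases; adjust if the
# platform defines the limit differently.
MAX_BASES = 150_000


def sequence_length(fasta_text: str) -> int:
    """Count sequence characters in FASTA text, skipping '>' header lines."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def fits_size_limit(fasta_text: str, limit: int = MAX_BASES) -> bool:
    """Return True if the total sequence length is within the limit."""
    return sequence_length(fasta_text) <= limit
```

For example, `fits_size_limit(open("contig.fasta").read())` would tell a student whether a (hypothetical) file `contig.fasta` is small enough under this reading of the limit.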
22
DNA Subway Yellow Line: Genome prospecting. Analyze DNA or protein sequence. Search plant genomes using TARGeT. Explore gene duplications, transposons, and non-coding sequences not detectable in conventional BLAST searches.
23
DNA Subway Blue Line: DNA barcoding and phylogenetics. Analyze DNA or protein sequence. Search plant genomes using TARGeT. Explore gene duplications, transposons, and non-coding sequences not detectable in conventional BLAST searches.
24
DNA Subway Green Line: Transcriptome analysis. Examine RNA-Seq data for differential expression. Use high-performance computing to analyze complete datasets. Generate lists of genes and fold-changes; add results to Red Line projects.
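To illustrate what a "list of genes and fold-changes" looks like, here is a minimal sketch that computes log2 fold change from mean expression values. It is not the Green Line pipeline's method; the function names, the input layout, and the pseudocount of 1 are assumptions chosen for classroom illustration.

```python
import math


def log2_fold_change(control: float, treated: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to control; the pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def fold_change_table(counts: dict, pseudocount: float = 1.0) -> list:
    """counts maps gene name -> (control_mean, treated_mean).

    Returns (gene, log2_fold_change) pairs, largest absolute change first.
    """
    rows = [
        (gene, log2_fold_change(control, treated, pseudocount))
        for gene, (control, treated) in counts.items()
    ]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)
```

A positive value means higher expression in the treated sample, a negative value means lower. Real differential-expression tools also estimate statistical significance, which this sketch omits.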
25
Transforming Science Through Data-Driven Discovery. CyVerse Executive Team: Parker Antin, Nirav Merchant, Eric Lyons, Matt Vaughn, Doreen Ware, Dave Micklos. CyVerse is supported by the National Science Foundation under Grant Nos. DBI-0735191 and DBI-1265383.