iPlant Genomics in Education Workshop: Genome Exploration in Your Classroom
Major Workshop Concepts:
Biology is becoming a “Data Unlimited” science.
Genomes are dynamic.
Genomes are more than just protein-coding genes.
DNA sequence is information.
Gene annotation adds “meaning” to DNA sequence.
Biological concepts like “genes” and “species” are continually evolving.
DNA barcoding bridges molecular genetics, evolution, and ecology.
The Problem of Big Data in Biology
The abundance of biological data generated by high-throughput sequencing creates challenges as well as opportunities:
How do scientists share their data and make it publicly available?
How do scientists extract maximum value from the datasets they generate?
How can students and educators (who will need to come to grips with data-intensive biology) be brought into the fold?
The iPlant Collaborative
A 5-10 year project to develop a computer infrastructure that applies computational thinking to solve biological problems:
High-performance computing
Data and data analysis
Virtual organization
Learning and workforce
Bringing Genomics into the Classroom
[Figure: visualization of the Pectobacterium atrosepticum genome]
1866 – Mendel publishes work on inheritance
1869 – DNA discovered
1915 – Thomas Hunt Morgan describes linkage and recombination
1953 – Structure of DNA described
1956 – Human chromosome number determined
1968 – First gene mapped to an autosome
1977 – Dideoxy sequencing
1983 – PCR
1986 – Human Genome Project proposed
1993 – First microRNAs described
2003 – First “Gold Standard” human genome sequence
2005 – First draft of human haplotype map (HapMap)
2007 – ENCODE project
Timeline: Wellcome Trust
“Essentially, all models are wrong, but some are useful” – George E.P. Box
From This…
To This…
Majority of genome is transcribed
~50% transposons
~25% protein-coding genes / 1.3% exons
~23,700 protein-coding genes
~160,000 transcripts
Average gene: ~36,000 bp (7 exons of ~300 bp, 6 introns of ~5,700 bp)
7 alternatively spliced products (95% of genes)
RefSeq: ~34,600 “reference sequence” genes (includes pseudogenes and known RNA genes)
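The average-gene figures above can be checked with a little arithmetic. The sketch below reads the slide's numbers as 7 exons of ~300 bp and 6 introns of ~5,700 bp per gene, which is an interpretation of the slide rather than something it states outright. The genome size of about 3.1 Gb is an outside assumption (the slide presumably describes the human genome but gives no total size), and all variable names are illustrative.

```python
# Figures quoted on the slide; reading them as exons and introns is an interpretation.
EXONS_PER_GENE = 7
MEAN_EXON_BP = 300
INTRONS_PER_GENE = 6
MEAN_INTRON_BP = 5_700
PROTEIN_CODING_GENES = 23_700

# Assumption, not from the slide: a human genome of roughly 3.1 billion bp.
ASSUMED_GENOME_BP = 3_100_000_000

# Length of an "average" gene: all exons plus all introns.
exon_bp = EXONS_PER_GENE * MEAN_EXON_BP
intron_bp = INTRONS_PER_GENE * MEAN_INTRON_BP
gene_bp = exon_bp + intron_bp

# Share of the assumed genome covered by genes, and by exons alone.
genes_share_of_genome = PROTEIN_CODING_GENES * gene_bp / ASSUMED_GENOME_BP
exons_share_of_genome = PROTEIN_CODING_GENES * exon_bp / ASSUMED_GENOME_BP

print(f"Average gene length: {gene_bp:,} bp")
print(f"Genes as share of genome: {genes_share_of_genome:.1%}")
print(f"Exons as share of genome: {exons_share_of_genome:.1%}")
```

Under these assumptions the gene length comes out at 36,300 bp, matching the slide's ~36,000 bp, and the two shares land in the same ballpark as the ~25% and 1.3% quoted. This is only a rough consistency check, since real gene and exon lengths vary widely around the averages.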
Using Plants to Explore Genomics
There are a large number of plant genomes available for analysis.
“Plant genomes range from simple to exceptionally complex” – Richard Cronn, USDA Forest Service
It is this diversity within plant genomes that provides a rich platform for examining the genome as a phenomenon.
Genlisea margaretae: 63 Mb
Paris japonica: 150 Gb
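To make the size range concrete, the two genome sizes quoted above can be compared directly. This is a minimal sketch using only the slide's two figures; the dictionary and variable names are illustrative.

```python
# Genome sizes as quoted on the slide, converted to base pairs.
MB = 1_000_000
GB = 1_000_000_000

genome_sizes_bp = {
    "Genlisea margaretae": 63 * MB,
    "Paris japonica": 150 * GB,
}

# Pick out the smallest and largest genomes by size.
smallest = min(genome_sizes_bp, key=genome_sizes_bp.get)
largest = max(genome_sizes_bp, key=genome_sizes_bp.get)

# How many times larger the biggest genome is than the smallest.
fold_difference = genome_sizes_bp[largest] / genome_sizes_bp[smallest]

print(f"{largest} is about {fold_difference:,.0f}x the size of {smallest}")
```

The result is a difference of roughly 2,400-fold between the two species.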
The “weirdness” of plant genomes on your dinner plate
Triticum aestivum: allohexaploid
[Figure labels: Brachypodium, Sorghum, Oryza]
[Figure: phylogeny of monocots and dicots plotted against time (million years to present), with genome duplication events marked.
Taxa shown: Oryza (rice), Avena (oats), Hordeum (barley), Triticum (wheat), Setaria (foxtail millet), Pennisetum (pearl millet), Sorghum, Zea (maize), Arabidopsis, Brachypodium, Glycine max (soy).
Genome sizes shown, in the order extracted (pairing with taxa is not recoverable from the text): 2,500 Mb; 750 Mb; 20,000 Mb; 270 Mb; 430 Mb; 145 Mb; 1,115 Mb; ?? Mb; 5,200 Mb; >20,000 Mb; ?? Mb.]