Summary of Current Assembly

“Blueprints for Blueberry”: The current status of assembly and annotation of the blueberry genome

Robert W. Reid1, Ying-Chen Lin4, Raad Gharaibeh1, Cory Brouwer1, Jeannie Rowland2, Dorrie Main3, Rachel Walstead1, Mary Ann Lila4, Allan Brown5

1 Dept. of Bioinformatics and Genomics, University of North Carolina Charlotte, Charlotte, NC; Dept. of Horticulture, Washington State University, WA; USDA-ARS, MD; Plants for Human Health Institute, North Carolina State University, Kannapolis, NC; International Institute of Tropical Agriculture (IITA), CGIAR, Arusha, Tanzania

Abstract

Here we report on the current state of the diploid blueberry genome assembly and describe some of the resources currently available. While many sequencing efforts are underway across the plant kingdom, the genus Vaccinium is relatively underrepresented. Blueberry is unique in that it has a long development period (juvenile period of 3 years) and is unable to self-propagate. Our goal is to generate a reference genome that serves as a resource for more efficient breeding. Marker-guided breeding benefits from a genome resource that associates gene regulation with desirable plant traits. These efforts should ultimately lead to improvements in cultivation, harvesting, processing and human nutrition.

Summary of Current Assembly

Table 1: Assembly improves with additional sequencing and BAC end sequences. To aid in scaffolding, Sanger BAC ends were sequenced and incorporated into the assembly using a modified version of SSPACE.

Repeat Analysis: Summary of repeat content in the genome.

Annotations

Augustus gene prediction produced 113,003 gene predictions. All predictions are available and viewable in a genome browser at vaccinium.org and in the Integrated Genome Browser (IGB, http://bioviz.org/igb). Of these gene predictions, 79,096 protein predictions were longer than 100 residues; these proteins were spread across 6,707 scaffolds.
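The 100-residue cutoff applied to the Augustus protein predictions amounts to a length filter over FASTA records. The sketch below is a minimal illustration, assuming the predicted proteins have been exported to a FASTA file; the function names and the demo records are hypothetical, and this is not the authors' pipeline code.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]

def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (identifier, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def longer_than(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep records strictly longer than min_len residues (a trailing '*' stop is ignored)."""
    for name, seq in records:
        if len(seq.rstrip("*")) > min_len:
            yield name, seq

# Hypothetical two-protein example: one 150-residue and one 80-residue prediction.
demo = io.StringIO(">g1.t1 augustus\n" + "M" * 75 + "\n" + "M" * 75 + "\n>g2.t1\n" + "M" * 80 + "*\n")
kept = list(longer_than(read_fasta(demo), 100))
print([name for name, _ in kept])  # ['g1.t1']
```

Because the filter is a generator, it can stream over a large prediction set without holding every sequence in memory; counting the survivors gives a figure comparable to the 79,096 reported above.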
InterProScan annotated 38,217 proteins (48%); of these, 19,506 proteins were assigned GO terms spanning 1,483 distinct GO identifiers. Figure 3 summarizes the blueberry gene ontologies detected by Blast2GO for both biological processes and molecular functions. The top processes detected were DNA metabolic process (GO:0006259), cellular protein modification process (GO:0006464) and RNA metabolic process (GO:0016070). Repeat analysis was performed with RepeatModeler, MISA, RepeatScout and RepeatMasker and is summarized in the repeat analysis table. Transcription factor analysis identified 1,889 transcription factors via a reciprocal best hit BLAST search against the Plant Transcription Factor Database v3.0.

Figure 1: Assembly numbers for the latest blueberry genome assembly, including all scaffolds and contigs with no length cutoff. There are 104,711 assembled sequences in total, of which 13,860 contigs/scaffolds are longer than 1,000 nucleotides (nt).

Materials & methods

Figure panels: Core genes that align to genome (77%); Improved blueberry genetic linkage map.

Figure 2: Overview of the blueberry variety used for linkage mapping and genome assembly. Diploid W85-20 is a wild selection from New Jersey that was chosen for its cold hardiness. Actively growing leaves were used as the source material for sequencing. For marker linkage analysis (paper in preparation; Rowland 2014; see Fig. 4), a screening population was produced using W85-20 as a grandparent. All tissue was provided by Jeannie Rowland (USDA-ARS).

Sequencing used Illumina, Sanger and 454 GS FLX pyrosequencing technologies. For paired-end library construction, genomic DNA was sheared to 3, 8 and 20 kb using HydroShear™. Illumina GA2 and HiSeq mate-pair sequences were produced, and one additional lane of Illumina HiSeq was sequenced using the Nextera mate pair preparation kit with an average insert size of 7,000 base pairs.
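The reciprocal best hit (RBH) search used for transcription factor discovery keeps a pair (a, b) only when b is a's top BLAST hit and a is b's top hit in the reverse search. The sketch below is a minimal version, assuming both searches were written in BLAST tabular format (`-outfmt 6`, bitscore in column 12) and ranking hits by bitscore alone; the poster does not state the thresholds or tie-breaking used, and all identifiers here are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Set, TextIO, Tuple

def parse_outfmt6(handle: TextIO) -> Iterator[List[str]]:
    """Yield the columns of each non-empty, non-comment BLAST tabular line."""
    for row in csv.reader(handle, delimiter="\t"):
        if row and not row[0].startswith("#"):
            yield row

def best_hits(rows: Iterable[List[str]]) -> Dict[str, str]:
    """Map each query to its highest-bitscore subject (first hit wins on ties)."""
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: s for q, (_, s) in best.items()}

def reciprocal_best_hits(forward: Dict[str, str], reverse: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}

def fake_hit(q: str, s: str, bits: float) -> str:
    # Only query, subject and bitscore matter here; the other columns are filler.
    return "\t".join([q, s, "90.0", "100", "0", "0", "1", "100", "1", "100", "1e-50", str(bits)])

# Hypothetical blueberry proteins (vc*) searched against hypothetical database TFs (tf*).
fwd_text = "\n".join([fake_hit("vc1", "tfA", 500), fake_hit("vc1", "tfB", 300), fake_hit("vc2", "tfB", 200)])
rev_text = "\n".join([fake_hit("tfA", "vc1", 500), fake_hit("tfB", "vc3", 250), fake_hit("tfB", "vc2", 200)])
fwd = best_hits(parse_outfmt6(io.StringIO(fwd_text)))
rev = best_hits(parse_outfmt6(io.StringIO(rev_text)))
rbh = reciprocal_best_hits(fwd, rev)
print(sorted(rbh))  # [('vc1', 'tfA')]
```

Here vc2's best hit is tfB, but tfB's best hit back is vc3, so that pair is rejected; this symmetry requirement is what makes RBH more conservative than a one-directional best-hit search.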
Contigs were assembled with MaSuRCA and Newbler, and the resulting assemblies were merged using GARM.

Figure 5: To assess genome completeness, we aligned 458 core genes from Arabidopsis (described in CEGMA) to our assembly using exonerate and found that 354 (77%) aligned. BLAST searches of the same core genes against the available blueberry transcriptome produced 450 hits (98%). We plan to incorporate BUSCO (busco.ezlab.org) for further assessment. The assembly size is 484 Mb, whereas flow cytometry estimates the genome size at 600 Mb.

To learn more about our future efforts or about joining the blueberry consortium, please contact us.

Annotation pipeline for gene ontology

Figure 4: Depiction of the latest diploid blueberry map. The current diploid map contains 318 markers, including 92 markers (in red) added since the previous report (Rowland, Molecular Breeding, December 2014, Volume 34, Issue 4, pp 2033-2048). Of the newly added SSR markers, more than 95% align to the assembly. There is an average of 26.5 markers per linkage group.

Available online resources for blueberry

Figure 3: Annotation highlights. Left panel: pipeline summary of GO annotations generated from automated gene predictions.

Future directions of development: Like many other plant genomes, the blueberry genome requires longer read lengths to resolve repeat regions and to anchor the numerous small contigs generated so far. The blueberry consortium has begun PacBio sequencing and will employ Dovetail Genomics' Chicago sequencing strategy. We plan to add optical mapping to integrate the linkage maps and scaffolds in the future.

Funds for the blueberry genome project have been provided by the North Carolina General Assembly, NC State University, USDA-ARS, University of Florida, UNC Charlotte and the P2EP (p2ep.org).

Figure 6: Online resources for blueberry.
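The percentages quoted above can be rechecked from the raw counts with a few lines of arithmetic. This sketch only restates numbers reported on the poster; the one derived quantity, the number of linkage groups implied by 318 markers at 26.5 per group, is an inference rather than a figure the authors state.

```python
def pct(part: float, whole: float) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts as reported on the poster.
interpro = pct(38_217, 79_096)       # proteins > 100 residues annotated by InterProScan
cegma_genome = pct(354, 458)         # CEGMA core genes aligned to the assembly
cegma_transcriptome = pct(450, 458)  # core genes with a hit in the transcriptome
size_ratio = pct(484, 600)           # assembly size vs. flow-cytometry estimate (Mb)

# Inferred, not stated: 318 markers at 26.5 markers per group implies this many groups.
linkage_groups = 318 / 26.5

print(interpro, cegma_genome, cegma_transcriptome, size_ratio, linkage_groups)
# 48.3 77.3 98.3 80.7 12.0
```

The rounded values agree with the 48%, 77% and 98% quoted in the text, and the assembly spans roughly 81% of the estimated genome size.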
(Upper left) Vaccinium.org hosts an online BLAST server. (Lower left) An annotated browser generated using GenSAS 2, which is currently in development, will soon be added to vaccinium.org. (Right) The Integrated Genome Browser (IGB) is freely available at bioviz.org and provides an annotated view including automated gene predictions, repeat regions and berry transcriptome alignments at various stages of berry development. Improved linkage maps are also in development.

Poster P1131