Sequence and Analysis of the Maize B73 Genome

Doreen Ware 1,2, Joshua Stein 1, Apurva Narechania 1, Shiran Pasternak 1, Linda McMahan 1, Chengzhi Liang 1, Wei Zhao 1, Sharon Wei 1, William Spooner 1, Ben Faga 1, and The Maize Genome Sequencing Consortium 3

1 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
2 USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, USA
3 Genome Sequencing Center, Washington University, St. Louis, MO 63108; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724; Arizona Genomics Institute, University of Arizona, Tucson, AZ 85721; and Iowa State University, Ames, IA 50011

Summary

From its domestication 8,000 years ago in Central America to its position today as the world’s leading harvested grain, Zea mays has played an important role in human civilization, providing food, animal feed, and biofuel. Maize also enjoys a long and distinguished history as a model organism owing to its rich diversity and tractable genetics. The complete sequence of the maize genome would therefore propel advances in basic research as well as in agriculture and other industries. The Maize Genome Sequencing Consortium was launched with a three-year grant from NSF to produce a complete sequence of the maize (B73) genome. At 2.5 Gb, the maize genome rivals mammalian genomes in size and is six times larger than that of rice, owing to its high content of retrotransposable elements. To meet the challenge of producing an assembled sequence we took a BAC-by-BAC approach, selecting a minimal tiling path of clones from a 20X fingerprint map. Now in its third year, the project has produced complete sequences of 15,200 BAC clones comprising approximately 2 billion non-redundant bases, all available via GenBank. Annotation of this first draft, using both ab initio gene prediction and evidence-based approaches, gives preliminary estimates of the number of genes, many of which produce alternative transcripts.
Comparison to rice, and a detailed analysis of a 22 Mb contig on chromosome 4, reveals that the maize genome has been largely shaped by its history of tetraploidization, subsequent rearrangement and duplicate gene loss. Gene annotations and comparative maps generated by this project are available at the Gramene Genome Browser (maizesequence.org).

Survey of Gene Statistics

Type        Count
WH          58,511
NH          64,716
TE          254,608
Evidence    66,525

Genes were called on a freeze of the maize data containing 14,042 BACs.
WH: with homology (alignment to non-TE)
NH: no homology
TE: alignment to TEs in a curated DB
Evidence: prediction on masked sequence with Gramene Ensembl using same-species and cross-species evidence

Survey of Retroelements

Class I retroelements are the most abundant in the genome. Class II DNA transposons, simple repeats, and other repeats are far less abundant. 78% of the maize BAC sequence is repetitive.

Retroelement Composition

Family      Share of genome sequence
ji          11.9%
giepum      1.5%
other       1.8%
opie        8.6%
milt        1.0%
tekay       1.4%
xilon       3.0%
grande      3.3%
prem1       4.3%
cinful      4.5%
zeon        5.0%
other       11.2%
huck        11.9%

Retroelements comprise 76% of the genome sequence. DNA transposons, SSRs, and other repeats comprise less than 3%. The ji and huck families together occupy 24% of the genome sequence.

Maize Genome Gene Densities

Evidence-based genes and transposons called on maize BACs were projected to the maize FPC map to illustrate contiguous, chromosome-level gene frequency. Virtual Core Bins were generated via IBM2 anchors to the FPC map. The boxed area corresponds to the maize accelerated region. This region appears to be gene rich relative to the rest of the genome. The high gene density of the accelerated region (22.1 evidence-genes/500 kb) is shown in further detail. Clusters of Gramene Genes (GeneBuilder) and Fgenesh Models (ab initio) seem to mirror each other, a trend that is more apparent at 1 Mb magnification.

Maize Accelerated Region Duplication

Rice Chr2 from positions 29 Mb to 36 Mb aligns to maize Chromosomes 4 and 5 in equal measure, indicating a duplication event. Alignments were made to maize BAC contigs and mapped to Chromosomes 4 and 5 using the FPC map. The majority of Chr4 hits were on FPC ctg182, corresponding to the accelerated region. The majority of NETS on Chr5 were on contigs 250, 251, 253, and 254, in agreement with marker-based studies (PLoS Genet. 2007 Jul 20;3(7):e123).

Gene Level Synteny With Rice

[Figure: maize 22 Mb region aligned to rice chr 2 (29.0 – 35.8 Mb); labels: deletion, 68 genes; legend: syntenic, non-syntenic]

The inversion is associated with a deletion, resulting in an unmatched span of 68 genes in rice.
The region shows a 2.7-fold expansion in length in maize.
476/961 (49.5%) maize genes are syntenic.
380/1150 (33.0%) rice genes are syntenic.
This is suggestive of extensive gene movement as well as loss in maize.

Maize Accelerated Region Synteny Analysis

The maize accelerated region contains blocks syntenic to rice chr2 and sorghum chr4.
Maize: max gap between NETS 100,000 residues; min NET size 5,000 residues.
Rice and sorghum: max NET gap 50,000 residues; min NET size 2,000 residues.

Syntenic blocks are defined in two steps. First, NETS are grouped if the distance between them is smaller than twice the max gap parameter and there are no NETS breaking the synteny. Second, these groups are arranged into syntenic blocks up to 30 times the max gap parameter, with two synteny-breaking groups allowed. The rice assembly is courtesy of TIGR (version 5), and early access to the sorghum assemblies is courtesy of JGI. Maize-sorghum and maize-rice synteny illustrate two large-scale inversions, one on maize Chr4 (the accelerated region) and the other on sorghum Chr4.

[Figure: Sorghum, Maize, and Rice tracks shown at 22 Mb and 1 Mb scales; labels: Maize Inversion, Sorghum Inversion]
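The first step of the syntenic block definition (grouping NETS that lie closer together than twice the max gap, with no synteny-breaking NET between them) can be illustrated with a minimal Python sketch. This is not the pipeline used for the poster: the `Net` record, the `group_nets` function, and the rule that a NET hitting a different target chromosome "breaks" synteny are all assumptions made for illustration, and the second step (merging groups into blocks up to 30 times the max gap) is not shown.

```python
from typing import List, NamedTuple


class Net(NamedTuple):
    """Hypothetical alignment NET on one query chromosome."""
    start: int
    end: int
    target: str  # chromosome hit in the other species, e.g. "rice_chr2"


def group_nets(nets: List[Net], max_gap: int, min_size: int) -> List[List[Net]]:
    """Group neighbouring NETS (step one of the block definition).

    Assumptions: NETS shorter than min_size are discarded first, and a
    NET pointing at a different target chromosome breaks the synteny.
    """
    kept = sorted(
        (n for n in nets if n.end - n.start >= min_size),
        key=lambda n: n.start,
    )
    groups: List[List[Net]] = []
    for net in kept:
        if groups:
            last = groups[-1][-1]
            same_target = net.target == last.target
            # Join the current group only if the gap is under 2 x max_gap.
            if same_target and net.start - last.end < 2 * max_gap:
                groups[-1].append(net)
                continue
        groups.append([net])
    return groups


# Example with the maize parameters quoted in the text.
example = [
    Net(0, 10_000, "rice_chr2"),
    Net(150_000, 160_000, "rice_chr2"),
    Net(400_000, 410_000, "rice_chr2"),
    Net(420_000, 421_000, "rice_chr2"),  # below min NET size, dropped
    Net(430_000, 440_000, "rice_chr5"),  # different target, starts a new group
]
groups = group_nets(example, max_gap=100_000, min_size=5_000)
print([len(g) for g in groups])
```

With the maize settings (max gap 100,000; min NET size 5,000), the first two NETS fall into one group because they are 140,000 residues apart, while the third starts a new group because its gap of 240,000 exceeds twice the max gap.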
BACs at MaizeSequence.org

Tracks shown for each BAC:
MIPS repeats
Gene Predictions
    FgenesH (ab initio)
    Gramene Genes (evidence-based)
Alignments to proteins in NR
Cereal sequence alignments
Phred quality scores of BAC sequence
Mathematically defined repeats (20-mer frequencies in a 0.45X WGS maize library)

Ensembl tracks are configurable, and data sets can be toggled on and off according to user preference.

Mean gene densities across chromosomes

* Densities calculated in 500 kb windows.
* Standard deviations provided in parentheses.
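A windowed density of the kind described in the notes above (gene counts per 500 kb window, summarized as a mean with a standard deviation) could be computed along these lines. This is a sketch only: the function names, the use of 0-based gene start coordinates, and the choice of the sample standard deviation are assumptions, since the poster does not say how its values were derived.

```python
from statistics import mean, stdev
from typing import List, Tuple

WINDOW = 500_000  # 500 kb windows, as stated in the notes


def window_counts(gene_starts: List[int], chrom_length: int,
                  window: int = WINDOW) -> List[int]:
    """Count genes per fixed-size window along one chromosome.

    Assumes 0-based start coordinates strictly below chrom_length.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in gene_starts:
        counts[pos // window] += 1
    return counts


def density_summary(counts: List[int]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the window counts."""
    sd = stdev(counts) if len(counts) > 1 else 0.0
    return mean(counts), sd


# Toy chromosome of 1.5 Mb with five hypothetical gene starts.
counts = window_counts([10, 20, 499_999, 500_000, 1_200_000], 1_500_000)
avg, sd = density_summary(counts)
print(counts, round(avg, 2), round(sd, 2))
```

Reporting the result as `mean (sd)` per chromosome would match the layout the notes describe.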