Cassava Genome from Ancestor to Cultivar
GCP21-II S3
Wenquan Wang, Ph.D., Chinese Cassava Genomics Consortium, Institute of Tropical Biosciences & Biotechnology, CATAS
Uganda, June 19, 2012
Biological characteristics of cassava: high photosynthetic rate; high starch accumulation; extreme tolerance to drought and barren soil; heterozygosity and somatic propagation.
Genetic bottlenecks in developing the cassava industry: little is known about the genetic diversity generated during evolution; the mechanisms behind high photosynthesis and starch metabolism are poorly understood; the genetic basis of tolerance to drought and barren soil remains to be uncovered; the adaptation of cassava to diseases and pests is poorly understood; genotyping tools for cassava breeding are lacking.
Genotypes used for whole-genome sequencing: W14 (Manihot esculenta ssp. flabellifolia), semi-wild; KU50 (Manihot esculenta Crantz), cultivar (starchy); S1.600 (Manihot esculenta Crantz), cultivar (sugary).
Characteristics of the three genotypes used for sequencing
Regeneration: W14 mainly from seeds; cultivars from stems
Tuber root size: W14 small; KU50 large; S1.600 very large
Photosynthesis: W14 middle; KU50 high
Fresh root yield: W14 low; KU50 10-fold; S1.600 5-10-fold
Starch content: W14 4-5%; KU50 30% (5-6-fold); S1.600 5%, plus 12-15% sugar (2-3-fold)
Net photosynthesis rate of W14 and KU50 across developmental stages
Genome assembly of W14 and KU50 (columns: W14 all contigs/scaffolds; W14 contigs/scaffolds >10 kb; KU50 all contigs/scaffolds; KU50 contigs/scaffolds >10 kb)
Fold genome coverage (Gb): W14 97.88; KU50 45.31
Number of contigs/scaffolds: 54,426; 15,234; 62,763; 7,441
Total span: 475 Mb; 302 Mb; 416 Mb; 167 Mb
N50: 14 kb; 21 kb; 12 kb; 23 kb
Largest contig/scaffold: W14 183 kb; KU50 123 kb
Average scaffold length: 9 kb; 20 kb; 7 kb; 22 kb
GC content: 34.63%; 34.47%; 36.02%; 36.07%
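The N50 and average-length rows follow standard definitions: the average is total span divided by count (for example, 475 Mb / 54,426 is about 8.7 kb, shown as 9 kb), and the N50 is the length at which the longest contigs together reach half the span. The sketch below assumes a plain list of contig or scaffold lengths in bp (a hypothetical input; the slide does not describe the assembly pipeline) and recomputes these statistics, including a ">10 kb"-style filtered subset.

```python
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the total span."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def assembly_stats(lengths: List[int], min_len: int = 0) -> Dict[str, float]:
    """Count, span, N50 and mean length of contigs longer than min_len bp."""
    kept = [x for x in lengths if x > min_len]
    return {
        "count": len(kept),
        "span": sum(kept),
        "n50": n50(kept),
        "mean": sum(kept) / len(kept),
    }


if __name__ == "__main__":
    # Toy lengths in bp (hypothetical), not the cassava assemblies.
    toy = [100, 80, 50, 30, 20, 10]
    print(assembly_stats(toy))              # analogue of the "all" columns
    print(assembly_stats(toy, min_len=25))  # analogue of the ">10 kb" columns
```

Filtering out short pieces raises both N50 and mean length while shrinking the span, which is the pattern seen in the W14 and KU50 ">10 kb" columns.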
Repeat content and divergence rates in W14 and KU50: repeats account for 40.2% of the AM560 genome, 36.8% of W14 and 25.7% of KU50; divergence rates shown: 12%, 22%, 17%
In situ hybridization of LTR (long terminal repeat) probes to all chromosomes
Assembly coverage of gene regions: transcript coverage, KU50 80.5% and W14 97.1%; EST coverage, KU50 73.4% and W14 91.5%
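The slide does not say how coverage was measured. One common definition, assumed here, is the fraction of transcript (or EST) bases covered by at least one alignment block to the assembly; overlapping blocks must be merged so that no base is counted twice. A minimal sketch with hypothetical half-open (start, end) blocks:

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open [start, end) on the transcript, in bp


def covered_bases(blocks: List[Block]) -> int:
    """Total bases covered by the union of possibly overlapping blocks."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(blocks):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_fraction(transcript_len: int, blocks: List[Block]) -> float:
    return covered_bases(blocks) / transcript_len


if __name__ == "__main__":
    # Hypothetical alignment blocks of one 100-bp transcript against an assembly.
    print(coverage_fraction(100, [(0, 50), (40, 80), (90, 95)]))
```

Summing these covered bases over all transcripts and dividing by total transcript length would give a genome-wide percentage comparable in spirit to the figures above.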
Evaluation of the W14 assembly: mismatch rate 4.9/10,000; mismatch and gap rate 3.9/1,000
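The two rates use different denominators (per 10,000 and per 1,000 bases). Assuming they are the number of mismatched bases, and of mismatched plus gap bases, divided by the number of aligned bases (the slide does not state which validation sequences were aligned), the arithmetic is sketched below with hypothetical counts.

```python
def rate_per(events: int, aligned_bases: int, scale: int) -> float:
    """Events per `scale` aligned bases, e.g. scale=10_000 for 'x/10000'."""
    return events / aligned_bases * scale


if __name__ == "__main__":
    # Hypothetical counts from aligning validation sequence to the assembly.
    aligned, mismatches, gap_bases = 50_000, 12, 30
    print(rate_per(mismatches, aligned, 10_000))             # mismatches per 10 kb
    print(rate_per(mismatches + gap_bases, aligned, 1_000))  # mismatch + gap per 1 kb
```

Put on a common scale, 3.9/1,000 is 39/10,000, so under this reading gaps would account for roughly 34 of every 39 errors per 10,000 bases.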
Gene prediction in the W14 and KU50 genomes (values given as W14; KU50, or a single value where the slide shows one)
Gene number: 43,986; 31,480
Gene length: 70 Mb; 39 Mb
Coding region length: 46 Mb; 28 Mb
Gene density: 9.94%; 9.95%
Mean intergenic length: 4 kb
Maximum intergenic length: 52 kb; 44 kb
Exon number: 181,158; 124,694
Exons per gene: 4.13; 3.97
Mean exon length (bp): 252.73; 225.44
Maximum exon length: 9 kb; 6 kb
Exon GC content: 44.09%; 42.65%
Intron number: 137,266; 93,287
Introns per gene: 3.13; 2.97
Intron length: 31 Mb
Mean intron length (bp): 336.12; 327.71
Maximum intron length: 11 kb; 14 kb
Intron GC content: 32.81%; 33.40%
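Because every gene has one intron fewer than exons, the average introns per gene equals the average exons per gene minus one, as in the table (4.13 vs 3.13; 3.97 vs 2.97). A sketch of how such per-gene statistics could be derived from gene models, assuming each gene is given as a list of 1-based inclusive exon coordinates (a hypothetical input format, not the prediction pipeline used here):

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical input: exon coordinates as 1-based, inclusive (start, end) pairs.
Exon = Tuple[int, int]


def gene_structure_stats(genes: Dict[str, List[Exon]]) -> Dict[str, float]:
    """Per-gene exon/intron counts and mean exon/intron lengths."""
    exon_lengths: List[int] = []
    intron_lengths: List[int] = []
    for exons in genes.values():
        ordered = sorted(exons)
        exon_lengths.extend(end - start + 1 for start, end in ordered)
        # Each gap between consecutive exons of the same gene is one intron.
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
            intron_lengths.append(next_start - prev_end - 1)
    n_genes = len(genes)
    return {
        "exons_per_gene": len(exon_lengths) / n_genes,
        "introns_per_gene": len(intron_lengths) / n_genes,
        "mean_exon_length": mean(exon_lengths),
        "mean_intron_length": mean(intron_lengths) if intron_lengths else 0.0,
    }


if __name__ == "__main__":
    toy_genes = {
        "g1": [(1, 100), (201, 300), (401, 450)],  # three exons, two introns
        "g2": [(1, 200)],                          # single-exon gene
    }
    print(gene_structure_stats(toy_genes))
```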
Annotation of predicted genes (W14; KU50, as count and percentage of predicted genes)
Predicted genes: 43,892; 31,407
Swissprot: 28,808 (65.63%); 19,240 (61.26%)
TrEMBL: 38,784 (88.36%); 26,723 (85.09%)
InterPro/GO: 39,918 (90.95%); 24,344 (77.51%)
KEGG: 35,451 (80.77%); 24,247 (77.20%)
COG: 18,205 (41.48%); 12,162 (38.72%)
NR/NT: 38,802 (88.40%); 26,739 (85.14%)
Total annotated: 41,934 (95.54%); 27,549 (87.72%)
Un-annotated: 1,958 (4.46%); 3,858 (12.28%)
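Each percentage is relative to the number of predicted genes (e.g. 28,808 / 43,892 = 65.63%), and "Total annotated" is presumably the number of genes with a hit in at least one database (the union), since it exceeds every single row but is far below their sum. A minimal sketch, assuming database hits are available as sets of gene IDs (hypothetical data):

```python
from typing import Dict, Set, Tuple


def annotation_summary(hits: Dict[str, Set[str]], n_genes: int) -> Dict[str, Tuple[int, float]]:
    """Count and percentage of predicted genes with a hit in each database,
    plus the union ("Total annotated") and the remainder ("Un-annotated")."""
    def pct(n: int) -> float:
        return round(100 * n / n_genes, 2)

    summary = {db: (len(ids), pct(len(ids))) for db, ids in hits.items()}
    annotated = set().union(*hits.values())  # a gene counts once however many hits it has
    summary["Total annotated"] = (len(annotated), pct(len(annotated)))
    unannotated = n_genes - len(annotated)
    summary["Un-annotated"] = (unannotated, pct(unannotated))
    return summary


if __name__ == "__main__":
    # Hypothetical hits for four predicted genes.
    hits = {"Swissprot": {"g1"}, "TrEMBL": {"g1", "g2"}}
    print(annotation_summary(hits, n_genes=4))
```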
BAC library and physical map constructed for W14
BAC library coverage: 93,000 clones, >10x
BAC insert size: 130 kb
BAC clones fingerprinted: 30,000
High-quality fingerprints used for assembly: ?
Contigs: 2,484
Singletons: 984
Total contig length: 675.93 Mb
Contig N50: 336.38 kb
Longest contig: 1,981.98 kb
Average clones per contig: 2.16
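Library depth is usually estimated as clones × mean insert size / genome size, and the Clarke-Carbon formula gives the probability that a given locus is present in at least one clone as 1 - (1 - f)^N, with f = insert size / genome size and N clones. The genome size below (about 760 Mb) is an assumption for illustration; it is not stated on this slide.

```python
def library_coverage(n_clones: int, insert_bp: int, genome_bp: int) -> float:
    """Genome equivalents contained in a clone library."""
    return n_clones * insert_bp / genome_bp


def prob_locus_covered(n_clones: int, insert_bp: int, genome_bp: int) -> float:
    """Clarke-Carbon: P = 1 - (1 - f)^N, with f = insert/genome."""
    f = insert_bp / genome_bp
    return 1 - (1 - f) ** n_clones


if __name__ == "__main__":
    ASSUMED_GENOME_BP = 760_000_000  # hypothetical haploid genome size
    print(library_coverage(93_000, 130_000, ASSUMED_GENOME_BP))
    print(prob_locus_covered(93_000, 130_000, ASSUMED_GENOME_BP))
```

Under that assumption the library holds roughly 16 genome equivalents, consistent with the ">10x" entry.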
Genome diversity decreases from wild ancestor to cultivar: heterozygosity of the W14, KU50 and AM560 genomes
Columns: SNPs; SNP density (1 SNP per n bp); SNPs in genes; gene SNP density; SNPs in exons; exon SNP density
W14: 1,377,370; 1/257; 295,358; 1/270; 220,600; 1/272
KU50: 806,271; 1/286; 109,701; 1/336; 43,610; 1/422
AM560 (S3): 506,746; 1/693; 73,628; 1/6170; 46,524; 1/5583
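This slide reports density as "1 SNP per n bp", while the SNP divergence slide uses SNPs per 1,000 bp. The two forms are reciprocal: n = span / SNP count, and the per-kb density is 1,000 / n. A small sketch of the conversion; the span argument is whatever sequence length the SNPs were called over, which the slide does not give.

```python
def bp_per_snp(n_snps: int, span_bp: int) -> float:
    """Density in the '1 SNP per n bp' form used on this slide."""
    return span_bp / n_snps


def snps_per_kb(n_snps: int, span_bp: int) -> float:
    """Density in the 'x SNPs per 1,000 bp' form."""
    return n_snps / span_bp * 1_000


def per_kb_from_one_in(n: float) -> float:
    """Convert a '1/n' density into SNPs per 1,000 bp."""
    return 1_000 / n


if __name__ == "__main__":
    # Genome-wide heterozygous SNP densities from the table: W14 1/257, AM560 1/693.
    for sample, n in [("W14", 257), ("AM560", 693)]:
        print(sample, round(per_kb_from_one_in(n), 2), "SNPs per kb")
```

By this conversion W14's 1/257 is about 3.9 SNPs per kb and AM560's 1/693 about 1.4 per kb.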
SNP divergence in the genomes of the wild ancestor W14 and the cultivars KU50 and S1.600
Columns: SNPs; SNP density (SNPs per 1,000 bp); SNPs in genes; gene SNP density; SNPs in exons; exon SNP density; intergenic SNPs; intergenic SNP density
W14: 4,812,287; 6.94; 1,574,460; 1/294; 563,588; 1/676; 3,237,827; 1/160
KU50: 3,620,860; 4.57; 516,278; 1/894; 187,122; 1/1947; 3,104,582; 1/229
S1.600: 2,977,198; 4.10; 517,321; 1/893; 186,413; 1/1935; 2,459,877; 1/255
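In this table the gene and intergenic SNP counts add up exactly to each sample's total (for W14, 1,574,460 + 3,237,827 = 4,812,287), so the two classes partition the SNP set, and exon SNPs are presumably a subset of gene SNPs. A quick consistency check over the slide's own numbers:

```python
from typing import Dict, List, Tuple

# (total, in genes, in exons, intergenic) copied from the table above.
SNP_COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "W14": (4_812_287, 1_574_460, 563_588, 3_237_827),
    "KU50": (3_620_860, 516_278, 187_122, 3_104_582),
    "S1.600": (2_977_198, 517_321, 186_413, 2_459_877),
}


def check_partition(counts: Dict[str, Tuple[int, int, int, int]]) -> List[str]:
    """Samples whose gene + intergenic SNPs differ from the total,
    or whose exon SNPs exceed gene SNPs."""
    bad = []
    for sample, (total, genic, exonic, intergenic) in counts.items():
        if genic + intergenic != total or exonic > genic:
            bad.append(sample)
    return bad


if __name__ == "__main__":
    print(check_partition(SNP_COUNTS))  # an empty list means every row is consistent
```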
Shared SNPs and their distribution
Columns: total SNPs; unique SNPs (exclusive to that sample or pair); SNPs in genes; SNPs in exons; intergenic SNPs; SNPs in repeat regions
W14: 4,812,287; 4,065,298; 1,574,460; 563,588; 3,237,827; 1,751,276
KU50: 3,620,860; 1,976,538; 516,278; 187,122; 3,104,582; 2,142,290
S1.600: 2,977,198; 1,375,917; 517,321; 186,413; 2,459,877; 1,737,544
W14-KU50: 570,695; 219,335; 200,908; 75,356; 369,787; 184,454
W14-S1.600: 527,654; 176,294; 205,509; 76,873; 322,145; 162,873
KU50-S1.600: 1,424,987; 1,073,627; 281,464; 101,783; 1,143,523; 770,687
W14-KU50-S1.600: 351,360; n/a; 143,721; 53,735; 207,639; 98,970
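The "unique" column can be reproduced from the inclusive counts by inclusion-exclusion: a pair's unique SNPs are its shared SNPs minus the 351,360 shared by all three, and a sample's unique SNPs are its total minus every overlap it takes part in (for W14, 4,812,287 - 570,695 - 527,654 + 351,360 = 4,065,298). A sketch that recovers every region of the three-way Venn diagram from the slide's numbers:

```python
from itertools import combinations
from typing import Dict, FrozenSet

# Inclusive counts copied from the table above.
TOTALS = {"W14": 4_812_287, "KU50": 3_620_860, "S1.600": 2_977_198}
PAIRS = {
    frozenset(("W14", "KU50")): 570_695,
    frozenset(("W14", "S1.600")): 527_654,
    frozenset(("KU50", "S1.600")): 1_424_987,
}
SHARED_BY_ALL = 351_360


def venn3_exclusive(totals: Dict[str, int],
                    pairs: Dict[FrozenSet[str], int],
                    shared_all: int) -> Dict[FrozenSet[str], int]:
    """Convert inclusive overlap counts into exclusive Venn-region counts."""
    regions: Dict[FrozenSet[str], int] = {}
    samples = list(totals)
    # Shared by exactly this pair: remove the part also shared with the third sample.
    for a, b in combinations(samples, 2):
        key = frozenset((a, b))
        regions[key] = pairs[key] - shared_all
    # Unique to one sample: remove its pair-only regions and the three-way region.
    for s in samples:
        pair_only = sum(v for k, v in regions.items() if len(k) == 2 and s in k)
        regions[frozenset((s,))] = totals[s] - pair_only - shared_all
    regions[frozenset(samples)] = shared_all
    return regions


if __name__ == "__main__":
    for region, count in venn3_exclusive(TOTALS, PAIRS, SHARED_BY_ALL).items():
        print("-".join(sorted(region)), count)
```

Every exclusive count this produces matches the "unique" column, which supports reading that column as Venn-region sizes.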
Indel divergence in the genomes of the wild ancestor W14 and the cultivars KU50 and S1.600
Columns: indels; indel density; insertions; deletions; average length (bp)
W14: 390,652; 0.80/1000; 159,467; 231,080; 3.59
KU50: 275,639; 0.79/1000; 132,396; 143,200; 3.65
S1.600: 217,226; 0.64/1000; 103,964; 113,207; 4.07
SNPs and indels among four cassava genomes
Transcriptomes related to photosynthesis and starch metabolism in the cultivar Arg7 and the wild ancestor W14. Sequenced samples:
C1: Arg7, early root
C2: Arg7, middle root
C3: Arg7, late root
C4: W14, middle root
C5: Arg7, developing stem
C6: Arg7, functional leaf
C7: W14, functional leaf
C8: W14, developing stem
Expression profiling of genes in the starch and photosynthesis pathways
Fold change in expression of photosynthesis genes (Arg7/W14)
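Fold changes such as Arg7/W14 are commonly displayed on a log2 scale so that up- and down-regulation are symmetric around zero. The sketch assumes normalized expression values (for example RPKM or FPKM) and adds a pseudocount of 1 to avoid division by zero; the slides do not state the normalization or pseudocount used, and the gene names are hypothetical.

```python
import math
from typing import Dict


def log2_fold_change(treated: float, reference: float, pseudocount: float = 1.0) -> float:
    """log2((treated + pc) / (reference + pc))."""
    return math.log2((treated + pseudocount) / (reference + pseudocount))


def fold_changes(arg7: Dict[str, float], w14: Dict[str, float]) -> Dict[str, float]:
    """log2(Arg7/W14) for every gene measured in both samples."""
    return {g: log2_fold_change(arg7[g], w14[g]) for g in arg7.keys() & w14.keys()}


if __name__ == "__main__":
    arg7 = {"geneA": 15.0, "geneB": 3.0}  # hypothetical expression values
    w14 = {"geneA": 3.0, "geneB": 15.0}
    print(fold_changes(arg7, w14))
```

A value of +2 means four times higher in Arg7 and -2 four times higher in W14, which is how red/green heat maps of such ratios are usually read.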
Cell wall metabolism (Arg7/W14); red indicates higher expression in W14 roots
Sucrose glycolysis is weaker in Arg7 roots than in W14 roots
Fold change in expression of starch-metabolism genes in leaf and storage root (Arg7/W14)
Fold change in expression of starch-accumulation genes in the tuber root of KU50 relative to W14
Phylogenetic tree of sucrose synthase (SuSy) and invertase (INV)
A model of efficient starch biosynthesis in the cassava tuber root
miRNAs and drought tolerance in cassava: a set of 148 cassava miRNAs (41 novel and 107 conserved) was predicted by sequencing 14 small-RNA samples and mapping the reads to the genome. miRNAs and targets related to drought and to leaf and tuber root development were identified.
Eleven drought- and cold-inducible miRNAs with interesting targets were identified and confirmed by qPCR (miRNA: targets):
miR1045125: protein binding, zinc ion binding; transcription factor, DNA binding
miR1230481: zinc ion binding, protein serine/threonine kinase, ATP binding
miR3747522: enzyme inhibitor activity, pectinesterase activity; DNA-binding protein-related; functions in transcription factor activity
miR5178028: encodes an H3/H4 histone acetyltransferase; encodes a eukaryotic translation initiation factor
miR3615546: oxidative phosphorylation uncoupler activity, binding, oxidoreductase activity, iron ion binding
miR1229496: protein kinase family (kinase activity), small-molecular-weight G-protein
miR4806982: kinase activity; nuclear protein required for early embryogenesis
miR5815094: auxin-induced gene (IAA1) encoding a short-lived, nuclear-localized transcriptional regulator protein; acetylglucosaminyltransferase; transferase activity
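The slide does not say how qPCR expression was quantified. A widely used choice, assumed here, is the Livak 2^(-ΔΔCt) method: normalize the miRNA's Ct to a reference gene within each condition, then compare the stressed sample with the control. The Ct values below are hypothetical.

```python
def relative_expression(ct_target_stress: float, ct_ref_stress: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change (stress vs control) by the 2^-ddCt method.

    Assumes close to 100% amplification efficiency for target and reference.
    """
    d_ct_stress = ct_target_stress - ct_ref_stress
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_stress - d_ct_ctrl
    return 2 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values for one miRNA and a reference small RNA.
    print(relative_expression(22.0, 18.0, 25.0, 18.0))
```

A miRNA that reaches threshold three cycles earlier under stress, with an unchanged reference, comes out about eight times more abundant, which is the kind of induction such a table would summarize.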
ABA biosynthesis pathway in tuber root of cassava
Expression of genes in carotene and ABA synthesis pathways
Comparative genomics of cassava, Jatropha and castor bean: unique gene families, cassava 2,043, Jatropha 532, castor bean 826; shared gene families: 12,041
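Counts like these come from clustering genes into families (for example with an orthology clustering tool) and then checking which species each family contains. Assuming "shared" means present in all three species and "unique" means present in only one, the tally can be sketched as follows; the family memberships are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def family_overlap(families: Dict[str, Set[str]], species: Set[str]) -> dict:
    """Count families unique to one species and families shared by all species."""
    unique: Counter = Counter()
    shared_all = 0
    for members in families.values():
        present = members & species
        if len(present) == 1:
            unique[next(iter(present))] += 1
        elif present == species:
            shared_all += 1
        # Families in exactly two of three species fall in neither category.
    return {"unique": dict(unique), "shared_by_all": shared_all}


if __name__ == "__main__":
    species = {"cassava", "jatropha", "castor bean"}
    families = {
        "F1": {"cassava"},
        "F2": {"cassava", "jatropha", "castor bean"},
        "F3": {"jatropha"},
        "F4": {"cassava", "castor bean"},
    }
    print(family_overlap(families, species))
```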
Coherence in biological processes among cassava, Jatropha and castor bean (Euphorbiaceae): gene families in the virion part and reproduction categories were found only in cassava
Database: Cassava-genome.cn
Ongoing work: fine mapping of the genome by integrating the assembly with the physical map, BAC-end sequences and pooled-BAC sequences; assigning assembled BACs and scaffolds to chromosomes based on in situ hybridization; functional verification of genes in important pathways; development of SNP markers and molecular design breeding.
Summary: draft genomes of a wild ancestor and a cultivar of cassava were assembled and annotated. Genome diversity decreased from the wild ancestor to the cultivar during domestication. Millions of SNPs were discovered and are recommended for genotyping in cassava. A model of an efficient starch biosynthesis pathway in the cassava tuber root was proposed.
Acknowledgements
CATAS: BX Feng, Z Xia, XC Zhou, KM Li, PH Li, M Peng, WQ Wang
BIG-CAS: JF Xiao, JX Liu, SN Hu
SCBG-CAS: Gong Xiao, Chi Song, Ying Wang
EMBRAPA, Brazil: Luiz C
UC Davis: Mingcheng Luo
XJIEG-CAS: Bin Liu, Binxiao Feng
SIS-CAS: Jun Yang, Peng Zhang
Fudan U: Zhicheng Wu, Ruiqi Liao, Shuigen Zhou
Copenhagen U, Denmark: Rubini, Birger Muller
Nanjing Agricultural U: Qunfeng Lu