Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters
Dallery et al. 2017
Colletotrichum higginsianum
  Max Planck Institute
  Pathogenic fungus
  Affects cultivated Brassicaceae in tropical and subtropical regions, and also infects the model plant Arabidopsis thaliana
  Important model pathosystem for studying the molecular basis of fungal pathogenicity and host response
  O’Connell et al. 2012
Rationale
  Affects crop yields
  Previous genome assembly was highly fragmented
  Looking for role of transposable elements (TEs) in gene and genome evolution
  Better understanding of genome structure of pathogenic fungi
Methods
  10 μg genomic DNA → ~20 kb size-selected library
  Sequenced on PacBio RS II platform
  De novo assembly with the Hierarchical Genome Assembly Process (HGAP) approach
    Reads filtered for min. 500 bp length
  Genome consensus sequence polished with Quiver
  Assembly validated w/ PCR
  Illumina sequencing w/ 100 bp paired-end reads
    Used only to detect sequence polymorphisms
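The read-length filter on this slide can be sketched in a few lines. This is only an illustration of the idea of a minimum-length cutoff, not how HGAP implements it. The read names and sequences are made-up toy data, and `filter_reads` is a hypothetical helper.

```python
from typing import Iterable, Iterator, Tuple

MIN_READ_LENGTH = 500  # bp, the cutoff quoted on the slide


def filter_reads(
    reads: Iterable[Tuple[str, str]], min_length: int = MIN_READ_LENGTH
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs whose sequence is at least min_length bp long."""
    for name, seq in reads:
        if len(seq) >= min_length:
            yield name, seq


# Toy input: hypothetical read names and synthetic sequences, not real data.
toy_reads = [
    ("read_a", "A" * 499),    # one base too short, dropped
    ("read_b", "C" * 500),    # exactly at the cutoff, kept
    ("read_c", "G" * 12000),  # long read, kept
]

kept = [name for name, _ in filter_reads(toy_reads)]
print(kept)
```

The cutoff is inclusive here (reads of exactly 500 bp are kept). The slide does not say whether the real filter is inclusive, so that choice is an assumption.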
Even More Methods
  REPET pipelines to detect and classify TEs and simple sequence repeats
  Analysis of repeat-induced point mutations
  Gene predictions with MAKER2, SNAP, Augustus from Illumina reads
  Functional annotations via BLASTp and Blast2GO and predictions from SMURF, antiSMASH v.3.0, SMIPS, and CASSIS
  Phylogenetic analysis of secondary metabolism key genes (MEGA6 and Treedyn)
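The repeat-induced point mutation bullet can be made concrete with two dinucleotide ratios that are commonly used to flag RIP, which converts C to T and so depletes CpA/TpG and enriches TpA. The slides do not say which tool or indices the authors used, so the choice of indices is an assumption. `rip_indices` is a hypothetical helper and the sequence is toy data.

```python
def dinucleotide_counts(seq: str) -> dict:
    """Count overlapping dinucleotides in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def rip_indices(seq: str):
    """Return (TpA/ApT, (CpA+TpG)/(ApC+GpT)); NaN when a denominator is zero."""
    c = dinucleotide_counts(seq)
    at = c.get("AT", 0)
    product = c.get("TA", 0) / at if at else float("nan")
    denom = c.get("AC", 0) + c.get("GT", 0)
    substrate = (c.get("CA", 0) + c.get("TG", 0)) / denom if denom else float("nan")
    return product, substrate


# Toy sequence, not real TE data: TA x3, AT x2, AC x1, GT x1, no CA or TG.
product_index, substrate_index = rip_indices("TATATAACGT")
print(product_index, substrate_index)
```

Values of TpA/ApT above roughly 0.89 and of (CpA+TpG)/(ApC+GpT) below roughly 1.03 are often quoted as suggestive of RIP. Treat these as rule-of-thumb cutoffs rather than the thresholds used in the paper.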
Final Methods Slide
  Analysis of distance of TEs to genes and gene clusters
  Segmental duplication analysis (SDDetector w/ PacBio unitigs)
  Transcriptome analysis (previous RNA-Seq data)
  Basically a bunch of experimental validation of transcriptome data
Genome Assembly
  7.8 Gb of raw sequence reads
  92,834 error-corrected reads
    N50 length 16,193 bp
  Final edited assembly = 28 unitigs (unitigs = high-confidence contigs)
  12 largest unitigs = chromosomes, 99.14% of the genome assembly
  Total length = 50.82 Mb
  Not actually gapless = gap on Chr 7 (liars)
Genome Assembly
  Genome assembly compared to previous 2009 assembly
  Genome physical size: 53.35 Mb

  Assembly Statistics      2012 Assembly    2017 Assembly
  PacBio read coverage     _                133x
  Sanger read coverage     0.2x             _
  Illumina read coverage   76x              _
  454 read coverage        25x              _
  Assembly length          49.05 Mb         50.72 Mb
  Alignable sequence       77.14 kb         50.38 Mb
  Number of contigs        10,259           28
  Largest contig           49.23 kb         6.04 Mb
  N50 contig length        6.15 kb          5.20 Mb
  Complete genes           2946 (79%)       3616 (97%)
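The N50 figures in the table follow the standard definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly. A minimal sketch with made-up contig lengths (not the real assemblies):

```python
def n50(lengths):
    """Return the N50: the contig length at which the cumulative sum of
    lengths, taken longest first, reaches at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy assemblies with the same total length (20 units) but different contiguity.
contiguous = [6, 5, 4, 3, 2]  # 6 + 5 = 11 >= 10, so N50 = 5
fragmented = [1] * 20         # twenty tiny contigs, so N50 = 1

print(n50(contiguous), n50(fragmented))
```

Two assemblies of the same total size can have very different N50 values, which is why the jump from 6.15 kb to 5.20 Mb matters more than the modest change in assembly length.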
Results
  2699 MAKER2 genes match previous gene models
  2289 new genes w/ no match in previous annotation
    Includes 132/133 genes on Chr 12
Results
  Mini-chromosomes 11 & 12 have half the gene content of the ‘core’ chromosomes
  Lower gene expression
  Much higher TE content
Results
Results Secondary Metabolite (SM) Gene Clusters
Results
  Genes in SM clusters + genes encoding candidate secreted effector proteins were found significantly closer to TEs than random genes across the whole genome
  Many copies of large TEs
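One way to test a "closer to TEs than random genes" claim is to measure each gene's distance to its nearest TE and then compare the focal gene set against random gene sets of the same size. The slides do not describe the authors' exact statistical test, so the permutation scheme below is an assumption. All coordinates, gene names and helper functions are hypothetical, on a single toy chromosome.

```python
import random
from statistics import median


def distance_to_nearest_te(gene, tes):
    """Distance in bp from a gene (start, end) to the nearest TE; 0 if they overlap."""
    g_start, g_end = gene
    return min(max(0, t_start - g_end, g_start - t_end) for t_start, t_end in tes)


def permutation_p_value(focal_ids, distances, n_perm=1000, seed=0):
    """Empirical one-sided p-value that random gene sets of the same size have
    a median TE distance at least as small as the focal set."""
    rng = random.Random(seed)
    observed = median(distances[g] for g in focal_ids)
    ids = sorted(distances)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(ids, len(focal_ids))
        if median(distances[g] for g in sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two TEs and six genes on one chromosome (made-up coordinates).
tes = [(1000, 2000), (50000, 52000)]
genes = {
    "g1": (2100, 3000), "g2": (48000, 49500), "g3": (10000, 11000),
    "g4": (20000, 21000), "g5": (30000, 31000), "g6": (40000, 41000),
}
distances = {name: distance_to_nearest_te(pos, tes) for name, pos in genes.items()}
observed, p_value = permutation_p_value(["g1", "g2"], distances)
print(distances, observed, p_value)
```

The brute-force nearest-TE search is fine for a toy example. On a real genome one would sort TEs per chromosome and use a binary search or an interval tool instead.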
Results
  Found 6 segmental duplications
  4 of these are at chrom ends and/or regions of highly similar repeats
Actually Cool Results
  Some TE families subject to Repeat-Induced Point (RIP) mutations
    Occurs during meiosis (sexual reproduction)
    This fungus is asexual
    Either RIP occurred during an ancestral sexual state or cryptic meiosis is happening
  ~30% of TEs appear active
  60% of expressed SM clusters are expressed only during plant infection
Conclusions
  A complete genome assembly is key to analysis of TEs, telomeres, structural rearrangements, and large gene clusters
  The mini-chromosomes differ dramatically from the core genome in gene and repeat content
    Resemble conditionally dispensable chroms.
    Pathogenicity-related (?) genes
  Repeat-mediated segmental duplication likely accelerated the evolution of pathogenicity-related genes, e.g. via ectopic recombination
  SM gene cluster inventory will help to ID novel bioactive molecules and their biosynthetic pathways
Questions
  Would the Illumina library have helped the genome assembly if it had been included?
  Are unitigs as good as scaffolds when used in their place?
  If the fungus lost its mini-chromosomes, would it be significantly less pathogenic?