The genome sequence of Melampsora larici-populina, the causal agent of poplar rust disease
M. larici-populina Transcriptome
Mlp Summer workshop – INRA Nancy, August
Duplessis Sébastien (INRA Nancy)
Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM
Mlp Transcriptome – Goals and Means
Goals
Gene Expression
- Identify genetic determinants involved in Mlp biology
- Identify sets of genes involved in development of infection structures (secretion, effectors, avirulence,...)
- Identify sets of genes involved in biotrophy (nutrition, transport)
- Identify expression profiles during plant-fungal interaction
Gene Models Annotation
- Validation of Gene Model predictions
- Detection of new Gene Models
Mlp Transcriptome – Goals and Means
Means
EST sequencing
- Sanger ESTs from specific cDNA library (cDNA cloning / s ESTs)
- Pyrosequencing from specific tissue (no cDNA cloning / k reads)
- 454: 80 Mb in 1 run for 10K€ vs. 1000s of Sanger ESTs for much more
=> Genes expressed in a given tissue (specific and ubiquitous)
=> No a priori gene prediction required
Array-based expression profiling
- DNA Chips – NimbleGen Systems oligonucleotide arrays
=> Expression of all predicted genes represented on the array
=> A priori gene prediction or EST sequencing required
Mlp Transcriptome – EST sequencing I
cDNA library of Mlp 98AG31 urediniospores and germlings
- 250 µg of DNase-free RNA were isolated from Mlp 98AG31 urediniospores and germlings (urediniospores grown for less than 12 h on agar) and sent to JGI
- Mlp is an obligate biotroph, so spores are the only source of uncontaminated ESTs
cDNA library => 29,081 cDNA clones
5'/3' sequencing => 52,269 ESTs (including ~4,500 ESTs previously obtained at INRA Nancy)
EST assembly => 11,535 consensus sequences (mean size 780 nt; range 100 to 5,052 nt)
- 6,599 singletons
- 4,936 clusters
- 119 consensus sequences contain > 50 ESTs
Best BLAST hits of the most abundant ESTs consisted of:
- stress response TF rds1, HSP, glycosidase, ubiquitin, fruiting body protein, cyclin, SOD, Ras, antibiotic resistance, protease, laccase, tubulin
- dehydrogenases and cytP450 from Uromyces fabae
- predicted gene models from P. graminis
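The assembly summary on this slide (singletons, clusters, consensus sequences with more than 50 ESTs) is a simple tally over cluster sizes. The sketch below shows one way to compute it. The function name `summarize_assembly` and the input layout (a mapping from consensus ID to member EST IDs) are hypothetical and chosen for illustration; they are not the format produced by the JGI pipeline.

```python
def summarize_assembly(members, abundant_threshold=50):
    """Tally singletons, clusters and highly abundant consensus sequences.

    `members` maps a consensus ID to the list of EST IDs assembled into it
    (hypothetical layout, assumed for this example only).
    """
    sizes = {cid: len(ests) for cid, ests in members.items()}
    # A singleton is a consensus built from exactly one EST.
    singletons = sum(1 for n in sizes.values() if n == 1)
    # A cluster is a consensus built from two or more ESTs.
    clusters = sum(1 for n in sizes.values() if n > 1)
    # Consensus sequences containing more than `abundant_threshold` ESTs.
    abundant = sorted(cid for cid, n in sizes.items() if n > abundant_threshold)
    return {
        "consensus": len(sizes),
        "singletons": singletons,
        "clusters": clusters,
        "abundant": abundant,
    }


# Toy input: one singleton, one small cluster, one abundant cluster.
toy = {
    "c1": ["est_a"],
    "c2": ["est_b", "est_c"],
    "c3": [f"est_{i}" for i in range(60)],
}
summary = summarize_assembly(toy)

# Consistency check on the slide's figures: singletons + clusters = consensus.
slide_total = 6_599 + 4_936
```

With the slide's numbers, `slide_total` equals the 11,535 consensus sequences reported.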
Mlp Transcriptome – EST Sequencing I
Comparison to released Pucciniales ESTs (e-value < )
Phakopsora pachyrhizi (soybean rust) ESTs => germinated/non-germinated spores, infected tissues
Puccinia graminis f. sp. tritici (wheat stem rust) ESTs => germinated/non-germinated urediniospores and teliospores
[Figure: comparison of Mlp, Pp and Pgt ESTs. Values shown: 46,411 / 28,536 / 5,858 and 45,812 / 56,753 / 6,483; 4,045 Pgt spore ESTs; 5,738 Pp spore ESTs]
Mlp Transcriptome – EST Sequencing I
Mlp 98AG31 ESTs for gene prediction and gene model support
ESTs were used in JGI and EuGene predictions
=> 27% of Gene Models supported
=> 4,507 Gene Models supported
ESTs to support gene curation
=> ESTs and clusters are shown on the JGI Melampsora website
Mlp Transcriptome – EST Sequencing II
cDNA libraries from various Melampsora spp. (Feau, Joly, Hamelin, CFS, Canada)
M. medusae f. sp. deltoidae (MMD)
- Multiple isolates, different growth stages (field)
M. larici-populina (MLP and MLP-H)
- Multiple isolates, different growth stages (field)
- Single isolate, haustoria-enriched (in vitro)
M. medusae f. sp. tremuloidae (MMT)
- Single isolate, 13 days growth (in vitro)
M. occidentalis (MO)
- Single isolate, 13 days growth (in vitro)
Mlp Transcriptome – EST Sequencing II
cDNA libraries from various Melampsora spp.
Library   Construction kit   # clones sequenced   # readable sequences   # contigs   # singletons
MMD       Stratagene         5,541                3,
MLP       Stratagene         3,008                2,
MLP-H     Clontech           3,708                3,   ,034
MMT       Clontech           3,008                2,
MO        Clontech           3,008                2,   ,285
Mlp Transcriptome – EST Sequencing II
Annotation of Melampsora spp. ESTs
Feau et al., Can. J. Bot.
Mlp Transcriptome – EST Sequencing III: 454-pyrosequencing
454-pyrosequencing of infected poplar leaf tissues
- Melampsora is an obligate biotroph => specialized infection structures (haustoria) form after 16 h post-inoculation (hpi) and uredinia form after 7 days post-inoculation (dpi), only in the plant host
- Strong Mlp invasion of plant tissues was observed at 4 dpi (Rinaldi et al., 2007)
- Pyrosequencing allows the generation of 100,000s of sequences from isolated transcripts
=> 200,000 ESTs from transcripts isolated from infected poplar leaves at 4 and 7 dpi with 454 GS FLX (Roche) by Cogenix
- Transcripts expressed during plant infection
- Transcripts involved in infection structure development, maintenance and biotrophy
- Transcripts involved in spore formation and maturation
- Identification of plant infection-specific transcripts by comparison with Sanger ESTs
Mlp Transcriptome – 454-pyrosequencing (From Ellegren, Mol. Ecol. 2008)
Mlp Transcriptome – 454-pyrosequencing 454-sequencing at JGI
Mlp Transcriptome – 454-pyrosequencing
1. µg of total RNA were isolated from infected poplar leaves ('Beaupré') at 4 dpi and 7 dpi with Mlp 98AG31
2. cDNA synthesis with SMART cDNA synthesis kit from 60 ng purified mRNA
3. µg cDNA recovered and sent to Cogenix for 454-pyrosequencing on GS FLX (Roche)
[Figure panels] 4 dpi: infection hyphae, haustoria; 7 dpi: infection hyphae, haustoria, uredinia, spore-forming cells
Pictures by S Hacquard & S Duplessis (2008), confocal microscopy with PI/Uvitex staining
Mlp Transcriptome – 454-pyrosequencing
Cogenix report on 454-sequencing
- 454-pyrosequencing can generate > 400,000 sequences, or 2 x 200,000 sequences, in 1 run
- Infected poplar tissues => ~185,663 sequences
- 454 sequences are short (mean length 203 nt) and require assembly for transcript reconstruction
- Assembly by Newbler => 148,688 reads assembled into 10,629 contigs & 36,975 remaining reads (= singletons?)
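The figures in the Cogenix report can be cross-checked with a little arithmetic, and the mean read length is easy to recompute from a FASTA file. The sketch below is illustrative only: `read_lengths` is a hypothetical helper written for this example, and the base-count estimate simply multiplies the read count by the reported mean length.

```python
def read_lengths(fasta_text):
    """Return the length of each sequence in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


# Figures quoted on the slide.
ASSEMBLED_READS = 148_688
REMAINING_READS = 36_975
MEAN_READ_LENGTH_NT = 203

total_reads = ASSEMBLED_READS + REMAINING_READS
approx_total_bases = total_reads * MEAN_READ_LENGTH_NT

# Toy FASTA input to show the helper on two short reads.
toy_lengths = read_lengths(">r1\nACGT\nAC\n>r2\nGG\n")
toy_mean = sum(toy_lengths) / len(toy_lengths)
```

The assembled and remaining reads sum to the 185,663 sequences reported, and the estimated yield is about 37.7 Mb, roughly half of the 80 Mb per run quoted earlier in the deck.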
Mlp Transcriptome – 454-pyrosequencing
Newbler assembly vs. MIRA assembly
- Newbler is a de novo assembler designed for genomic sequences (not transcripts) that works in flow space (flowgrams), not nucleotide space
- Newbler tends to eliminate reads for no obvious reason (> 38,000 reads are lost)
- Cogenix recommended using another de novo assembler dedicated to transcript assembly; CAP3 is not recommended
- MIRA is an EST assembler recently updated for 454 data
=> MIRA generates more contigs than Newbler
=> contigs (including 2,600 singletons)
- MIRA provides information on the overall quality of sequences (tag 'too short' = low-quality sequences)
- GenomeThreader (Gth) maps transcript sequences to a genome sequence
- MIRA contigs are mapped to the Mlp and poplar genomes to identify fungal and plant transcripts
Mlp Transcriptome – 454-pyrosequencing
Newbler vs. MIRA
[Figure panels: Mlp sequences; Poplar sequences]
Singleton reads from Newbler are mostly low-quality sequences
Mlp Transcriptome – 454-pyrosequencing
Final MIRA assembly vs. poplar and Mlp genomes
- Contigs with a Gth score < 0.9 were dissolved into singletons
- Contigs attributed to both genomes with Gth scores > 0.9 were manually resolved
- Contigs attributed to one genome but containing reads attributed to the other genome were manually inspected with Consed => new contigs/singletons
- Singletons with Gth scores < 0.9 were not retained
5,956 contigs & 9,562 singletons attributed to Mlp
6,414 contigs & 21,400 singletons attributed to poplar
PASA (Program to Assemble Spliced Alignments)
- PASA is a tool designed for curation of gene catalogs using sets of ESTs and FL-cDNAs. It relies on stringent alignment to the genome sequence with GMAP, assembly into clusters based on position on the genome sequence, and comparison to the current catalog of gene models => curation
- PASA was used in several published 454 analyses, and by the Arabidopsis community for gene curation
PASA => Mlp ESTs (Sanger & 454 contigs) vs. Mlp genome/gene models
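The Gth-score rules listed on this slide amount to a small decision procedure, sketched below. This is an illustration, not the pipeline actually used: the function name `assign_sequence`, the use of `None` for "no alignment", the handling of a score of exactly 0.9 (treated as passing) and the "manual" outcome for singletons matching both genomes are all assumptions. The slide's third rule (contigs containing reads attributed to the other genome, inspected with Consed) needs per-read information and is not modeled.

```python
def assign_sequence(mlp_score, poplar_score, is_contig, threshold=0.9):
    """Apply the slide's Gth-score rules to one contig or singleton.

    Scores are the best Gth alignment scores against each genome, or None
    when there is no alignment (assumed convention for this sketch).
    Returns "Mlp", "Poplar", "manual", "dissolve" or "discard".
    """
    mlp_hit = mlp_score is not None and mlp_score >= threshold
    poplar_hit = poplar_score is not None and poplar_score >= threshold

    if mlp_hit and poplar_hit:
        # Attributed to both genomes: resolved by hand on the slide.
        return "manual"
    if mlp_hit:
        return "Mlp"
    if poplar_hit:
        return "Poplar"
    # Below threshold on both genomes: contigs are dissolved into
    # singletons, singletons are not retained.
    return "dissolve" if is_contig else "discard"


# Hypothetical scores, for illustration only.
examples = [
    assign_sequence(0.95, None, is_contig=True),
    assign_sequence(0.50, 0.97, is_contig=False),
    assign_sequence(0.95, 0.93, is_contig=True),
    assign_sequence(0.50, 0.60, is_contig=True),
    assign_sequence(None, 0.20, is_contig=False),
]
```

In a full pipeline, the reads of a dissolved contig would be re-scored individually as singletons, which is why "dissolve" and "discard" are kept as separate outcomes.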
Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp 454 contigs
PASA was run using all 454 reads against the Mlp genome, and a similar number of gene models was supported
Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp Sanger contigs
Total of 6,294 Mlp Gene Models supported (38%)
Mlp Transcriptome – 454-pyrosequencing
Examples of gene model curation proposed by PASA, based on Mlp 454 contigs
Mlp Transcriptome – 454-pyrosequencing
Most abundant transcripts supporting Mlp Gene Models identified through 454-sequencing
4,010 Gene Models supported by 454 ESTs
- 935 no hits in nr/Swiss-Prot
  - specific to Pucciniales
  - specific to Mlp
- 265 encode SSPs (small secreted proteins)
  => 166 no hits in nr/Swiss-Prot
  - 34 specific to Pucciniales
  - specific to Mlp
Mlp Transcriptome – NimbleGen Systems oligonucleotide arrays
NimbleGen Systems expression oligonucleotide arrays
- ~390, mer oligoprobes evenly distributed on a 2 cm² array
- 4-plex arrays = 80,000 to 90,000 probes per array (+ controls)
- Set of 8 oligoprobes/gene, duplicated, in Laccaria bicolor
- 16,694 JGI models + new EuGene models with 454 support [All 454-supported new CDS?]
- 17,000 to 20,000 Mlp Gene Models => 4 probes/gene => no duplicated probes => Populus filtered
10 x 4-plex NimbleGen arrays ordered – Design ASAP
Mlp gene expression during time-course infection
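The probe numbers on this slide can be read as a capacity budget: the number of gene models times the probes per gene must fit within the 80,000 to 90,000 probes of one 4-plex sub-array. The sketch below works through that arithmetic. The helper names are hypothetical, and reading the choice of 4 probes/gene as a capacity constraint is an interpretation of the slide, not something it states.

```python
def probes_required(n_gene_models, probes_per_gene):
    """Total number of probes needed for a design without duplicated probes."""
    return n_gene_models * probes_per_gene


def fits_on_array(n_gene_models, probes_per_gene, capacity):
    """True if the design fits within the given per-array probe capacity."""
    return probes_required(n_gene_models, probes_per_gene) <= capacity


# Ranges quoted on the slide.
CAPACITY_RANGE = (80_000, 90_000)      # probes per 4-plex sub-array
GENE_MODEL_RANGE = (17_000, 20_000)    # Mlp gene models to represent

for probes_per_gene in (4, 8):
    low = probes_required(GENE_MODEL_RANGE[0], probes_per_gene)
    high = probes_required(GENE_MODEL_RANGE[1], probes_per_gene)
    ok = fits_on_array(GENE_MODEL_RANGE[1], probes_per_gene, CAPACITY_RANGE[0])
    print(f"{probes_per_gene} probes/gene: {low:,} to {high:,} probes, fits: {ok}")
```

Under these assumptions, 4 probes/gene needs 68,000 to 80,000 probes and fits even the lower capacity bound, whereas 8 probes/gene needs 136,000 to 160,000 and does not.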
Mlp Transcriptome – Conclusions
Conclusions
- 52,269 Mlp 98AG31 ESTs support 27% of JGI Mlp Gene Models
- ESTs from other Melampsora spp. to help in annotation (+ polymorphism study)
- 185, reads were assembled into 12,370 contigs & 30,962 singletons
  5,956 contigs & 9,562 singletons attributed to Mlp by Gth
  6,414 contigs & 21,400 singletons attributed to poplar by Gth
- PASA identified a total of 6,294 Mlp Gene Models supported by 454 and Sanger EST contigs = 38% of Mlp Gene Models (an increase of 11 percentage points)
- MIRA identified many Gene Models that may need annotation
- MIRA also identified more than 2,500 putative new genes (to be verified)
- Among the 4,010 Gene Models expressed in planta
  => 519 are specific to Mlp and 391 to Pucciniales
  => 265 encode SSPs and 128 SSPs are specific to Mlp
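The support percentages quoted in the conclusions can be reproduced with a short calculation. The sketch below assumes the denominator is the 16,694 JGI gene models mentioned on the NimbleGen slide. The deck does not say which total was used, so this is an assumption, although it does reproduce both the 27% and the 38% figures.

```python
# Assumption: total number of gene models used as the denominator.
ASSUMED_TOTAL_GENE_MODELS = 16_694


def percent_supported(supported, total=ASSUMED_TOTAL_GENE_MODELS):
    """Percentage of gene models supported by EST evidence."""
    return 100.0 * supported / total


sanger_pct = percent_supported(4_507)     # Sanger ESTs alone
combined_pct = percent_supported(6_294)   # Sanger + 454 contigs (PASA)
gain_points = combined_pct - sanger_pct   # difference in percentage points

print(f"Sanger only:  {sanger_pct:.1f}%")
print(f"Sanger + 454: {combined_pct:.1f}%")
print(f"Gain:         {gain_points:.1f} percentage points")
```

Rounded to whole numbers this gives 27%, 38% and a gain of 11 points, which suggests the "11% increase" on the slide is a difference in percentage points rather than a relative increase.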
Mlp Transcriptome – Conclusions
Ongoing…
- Curation of Gene Models supported by 454 contigs
- Prediction/curation of putative new genes with 454 contig support
- Design of NimbleGen Systems Oligoarray Mlp v1.0
To come…
- Alternative splicing
- Presence of SNPs (transcripts expressed in both nuclei?)
- Profiles of candidate genes during time-course infection of poplar leaves
Mlp 98AG31 the 'bad guy' genomic team at INRA UMR 1136 IAM
Stéphane Hacquard (INRA Nancy): Mlp effectors
Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp Bioinfo
Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families
Marie-Pierre Oudot-Le Secq (INRA Nancy): EST annotation
Duplessis Sébastien & Francis Martin