The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease M. larici-populina Transcriptome Mlp Summer workshop – INRA.

The genome sequence of Melampsora larici-populina, the causal agent of the poplar rust disease
M. larici-populina Transcriptome
Mlp Summer workshop – INRA Nancy, August
Duplessis Sébastien (INRA Nancy), Tree/Microbe Interactions Joint Unit, INRA/University Nancy, UMR 1136 IAM

Mlp Transcriptome – Goals and Means
Goals
Gene expression
- Identify genetic determinants involved in Mlp biology
- Identify sets of genes involved in the development of infection structures (secretion, effectors, avirulence, ...)
- Identify sets of genes involved in biotrophy (nutrition, transport)
- Identify gene expression profiles during the plant-fungus interaction
Gene model annotation
- Validate gene model predictions
- Detect new gene models

Mlp Transcriptome – Goals and Means
Means
EST sequencing
- Sanger ESTs from a specific cDNA library (cDNA cloning / s ESTs)
- Pyrosequencing from a specific tissue (no cDNA cloning / k reads)
  454: 80 Mb in 1 run for 10K€ vs. 1000s of Sanger ESTs for much more
=> Genes expressed in a given tissue (specific and ubiquitous)
=> No a priori gene prediction needed
Array-based expression profiling
- DNA chips: NimbleGen Systems oligonucleotide arrays
=> Expression of all predicted genes represented on the array
=> Requires a priori gene prediction or EST sequencing

Mlp Transcriptome – EST sequencing I
cDNA library of Mlp 98AG31 urediniospores and germlings
- 250 µg of DNase-treated RNA were isolated from Mlp 98AG31 urediniospores and germlings (urediniospores grown for less than 12 h on agar) and sent to JGI
- Mlp is an obligate biotroph, so spores are the only source of uncontaminated ESTs
- cDNA library => 29,081 cDNA clones
- 5'/3' sequencing => 52,269 ESTs (including ~4,500 ESTs previously obtained at INRA Nancy)
- EST assembly => 11,535 consensus sequences (mean size 780 nt; range 100 to 5,052 nt)
  - 6,599 singletons
  - 4,936 clusters
  - 119 consensus sequences contain > 50 ESTs
- Best BLAST hits of the most abundant ESTs:
  - stress response TF rds1, HSP, glycosidase, ubiquitin, fruiting body protein, cyclin, SOD, Ras, antibiotic resistance, protease, laccase, tubulin
  - dehydrogenases and cytochrome P450 from Uromyces fabae
  - predicted gene models from P. graminis
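The assembly figures on this slide can be cross-checked: each consensus sequence is either a singleton or a multi-EST cluster, so the two counts should add up to the consensus total. A minimal Python sketch of that check, using only the numbers reported above (the function names are hypothetical helpers, not part of any assembly pipeline):

```python
# Counts reported for the Mlp 98AG31 Sanger EST library (values from the slide).
ESTS = 52_269          # includes ~4,500 ESTs previously obtained at INRA Nancy
CONSENSUS = 11_535     # consensus sequences after EST assembly
SINGLETONS = 6_599     # consensus sequences built from a single EST
CLUSTERS = 4_936       # consensus sequences built from two or more ESTs


def consensus_is_consistent(singletons: int, clusters: int, consensus: int) -> bool:
    """A consensus set is the union of singletons and multi-EST clusters."""
    return singletons + clusters == consensus


def mean_ests_per_consensus(ests: int, consensus: int) -> float:
    """Average number of ESTs collapsed into each consensus sequence."""
    return ests / consensus


if __name__ == "__main__":
    print(consensus_is_consistent(SINGLETONS, CLUSTERS, CONSENSUS))  # True
    print(round(mean_ests_per_consensus(ESTS, CONSENSUS), 2))        # 4.53
```

On average, about 4.5 ESTs were collapsed into each consensus sequence. The true distribution is highly skewed, since 119 consensus sequences alone contain more than 50 ESTs each.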

Mlp Transcriptome – EST Sequencing I
Comparison to released Pucciniales ESTs (e-value < )
- Phakopsora pachyrhizi (soybean rust) ESTs => germinated/non-germinated spores, infected tissues
- Puccinia graminis f. sp. tritici (wheat stem rust) ESTs => germinated/non-germinated urediniospores and teliospores
[Figure: Venn diagram of EST overlap between Mlp, Pp and Pgt; 4,045 Pgt spore ESTs; 5,738 Pp spore ESTs]

Mlp Transcriptome – EST Sequencing I
Mlp 98AG31 ESTs for gene prediction and gene model support
- ESTs were used in the JGI and EuGene predictions
  => 27% of gene models supported
  => 4,507 gene models supported
- ESTs to support gene curation
  => ESTs and clusters are shown on the JGI Melampsora website

Mlp Transcriptome – EST Sequencing II
cDNA libraries from various Melampsora spp. (Feau, Joly, Hamelin, CFS, Canada)
- M. medusae f. sp. deltoidae (MMD): multiple isolates, different growth stages (field)
- M. larici-populina (MLP and MLP-H):
  - multiple isolates, different growth stages (field)
  - single isolate, haustoria-enriched (in vitro)
- M. medusae f. sp. tremuloidae (MMT): single isolate, 13 days of growth (in vitro)
- M. occidentalis (MO): single isolate, 13 days of growth (in vitro)

Mlp Transcriptome – EST Sequencing II
cDNA libraries from various Melampsora spp.

Library  Construction kit  # clones sequenced  # readable sequences  # contigs  # singletons
MMD      Stratagene        5,541               3,…                   …          …
MLP      Stratagene        3,008               2,…                   …          …
MLP-H    Clontech          3,708               3,…                   …          …
MMT      Clontech          3,008               2,…                   …          …
MO       Clontech          3,008               2,…                   …          …

Mlp Transcriptome – EST Sequencing II
Annotation of Melampsora spp. ESTs (Feau et al., Can. J. Bot.)

Mlp Transcriptome – EST Sequencing III: 454-pyrosequencing
454-pyrosequencing of infected poplar leaf tissues
- Melampsora is an obligate biotroph => specialized infection structures (haustoria) form after 16 hours post-inoculation (hpi), and uredinia form after 7 days post-inoculation (dpi), only in the plant host
- Strong Mlp invasion of plant tissues was observed at 4 dpi (Rinaldi et al., 2007)
- Pyrosequencing generates 100,000s of sequences from isolated transcripts
  => 200,000 ESTs from transcripts isolated from infected poplar leaves at 4 and 7 dpi, sequenced on a 454 GS-FLX (Roche) by Cogenix
- Expected content:
  - Transcripts expressed during plant infection
  - Transcripts involved in the development and maintenance of infection structures and in biotrophy
  - Transcripts involved in spore formation and maturation
  - Identification of plant infection-specific transcripts by comparison with Sanger ESTs

Mlp Transcriptome – 454-pyrosequencing (From Ellegren, Mol. Ecol. 2008)

Mlp Transcriptome – 454-pyrosequencing 454-sequencing at JGI

Mlp Transcriptome – 454-pyrosequencing
- µg of total RNA were isolated from infected poplar leaves ('Beaupré') at 4 dpi and 7 dpi with Mlp 98AG31
- 2. cDNA synthesis with the SMART cDNA synthesis kit from 60 ng of purified mRNA
- µg of cDNA recovered and sent to Cogenix for 454-pyrosequencing on a GS-FLX (Roche)
[Figure panels: 4 dpi: infection hyphae, haustoria; 7 dpi: infection hyphae, haustoria, uredinia, spore-forming cells. Pictures by S. Hacquard & S. Duplessis (2008), confocal microscopy with PI/Uvitex staining]

Mlp Transcriptome – 454-pyrosequencing
Cogenix report on 454-sequencing
- 454-pyrosequencing can generate > 400,000 sequences, or 2 x 200,000 sequences, in 1 run
- Infected poplar tissues => ~185,663 sequences
- 454 sequences are short (mean length 203 nt) and require assembly to reconstruct transcripts
- Assembly by Newbler => 148,688 reads assembled into 10,629 contigs, plus 36,975 unassembled reads (= singletons?)
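The Newbler figures above account for every read: assembled plus unassembled reads equal the ~185,663 input sequences, so roughly 80% of the reads ended up in contigs. A small Python sketch of that bookkeeping, with a rough yield estimate that assumes every read has the reported mean length (an approximation, not a figure from the Cogenix report):

```python
# Read accounting for the Newbler assembly of 454 reads from infected poplar leaves
# (values from the Cogenix report summarised on the slide).
TOTAL_READS = 185_663
ASSEMBLED_READS = 148_688      # reads placed into the 10,629 Newbler contigs
UNASSEMBLED_READS = 36_975     # reads left outside contigs (singletons?)
MEAN_READ_LENGTH_NT = 203


def assembled_fraction(assembled: int, total: int) -> float:
    """Fraction of input reads that ended up inside contigs."""
    return assembled / total


def approx_total_bases(n_reads: int, mean_length: int) -> int:
    """Rough sequence yield: number of reads times mean read length."""
    return n_reads * mean_length


if __name__ == "__main__":
    assert ASSEMBLED_READS + UNASSEMBLED_READS == TOTAL_READS
    print(f"{assembled_fraction(ASSEMBLED_READS, TOTAL_READS):.1%}")  # 80.1%
    print(approx_total_bases(TOTAL_READS, MEAN_READ_LENGTH_NT))       # 37689589
```

The estimated ~37.7 Mb of sequence comes from one sample set. The "80 Mb in 1 run" quoted on the Goals and Means slide refers to a full run's capacity.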

Mlp Transcriptome – 454-pyrosequencing
Newbler assembly vs. MIRA assembly
- Newbler is a de novo assembler designed for genomic sequences (not transcripts); it works in flowgram space, not nucleotide space
- Newbler tends to discard reads for no obvious reason (> 38,000 reads are lost)
- Cogenix recommended using another de novo assembler dedicated to transcript assembly; CAP3 is not recommended
- MIRA is an EST assembler recently updated for 454 data
  => MIRA generates more contigs than Newbler
  => contigs (including 2,600 singletons)
  MIRA reports the overall quality of sequences (tag 'too short' = low-quality sequences)
- GenomeThreader (Gth) maps transcript sequences to a genome sequence
  MIRA contigs were mapped to the Mlp and poplar genomes to separate fungal from plant transcripts

Mlp Transcriptome – 454-pyrosequencing
Newbler vs. MIRA
[Figure panels: Mlp sequences; Poplar sequences]
- Singleton reads from Newbler are mostly low-quality sequences

Mlp Transcriptome – 454-pyrosequencing
Final MIRA assembly vs. poplar and Mlp genomes
- Contigs with a Gth score < 0.9 were dissolved into singletons
- Contigs attributed to both genomes with Gth scores > 0.9 were resolved manually
- Contigs attributed to one genome but containing reads attributed to the other genome were inspected manually with Consed => new contigs/singletons
- Singletons with Gth scores < 0.9 were not retained
=> 5,956 contigs & 9,562 singletons attributed to Mlp
=> 6,414 contigs & 21,400 singletons attributed to poplar
PASA (Program to Assemble Spliced Alignments)
- PASA is a tool for curating gene catalogues with sets of ESTs and full-length cDNAs (FL-cDNAs). It aligns them stringently to the genome sequence with GMAP, assembles them into clusters based on their genomic position, and compares these assemblies with the current catalogue of gene models => curation
- PASA has been used in several published 454 analyses and by the Arabidopsis community for gene curation
- PASA => Mlp ESTs (Sanger & 454 contigs) vs. Mlp genome/gene models
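The four attribution rules above form a small decision procedure. The Python sketch below encodes them under stated assumptions: each sequence carries its best Gth score against each genome, a score of exactly 0.9 counts as passing (the slide gives only "< 0.9" and "> 0.9"), and `read_genomes` is a hypothetical field listing the genomes that a contig's member reads map to. The manual steps (Consed inspection, manual resolution) are only flagged, not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GTH_THRESHOLD = 0.9  # cut-off quoted on the slide


@dataclass
class MappedSequence:
    """A MIRA contig or singleton with its best Gth scores (hypothetical structure)."""
    name: str
    is_contig: bool
    mlp_score: float      # best Gth score against the Mlp genome (0.0 if unmapped)
    poplar_score: float   # best Gth score against the poplar genome (0.0 if unmapped)
    read_genomes: set[str] = field(default_factory=set)  # genomes that member reads map to


def triage(seq: MappedSequence, threshold: float = GTH_THRESHOLD) -> str:
    """Apply the slide's attribution rules; returns a genome name or an action."""
    hits = [genome for genome, score in (("Mlp", seq.mlp_score), ("Poplar", seq.poplar_score))
            if score >= threshold]
    if not hits:
        # Low-scoring contigs are broken up; low-scoring singletons are dropped.
        return "dissolve" if seq.is_contig else "discard"
    if len(hits) == 2:
        return "manual_resolve"      # attributed to both genomes
    genome = hits[0]
    if seq.is_contig and seq.read_genomes - {genome}:
        return "inspect_in_consed"   # contig mixes in reads from the other genome
    return genome
```

For example, `triage(MappedSequence("c1", True, 0.95, 0.10))` returns `"Mlp"`, while a contig that scores well only against poplar but holds reads mapped to Mlp is flagged `"inspect_in_consed"`.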

Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp 454 contigs
- PASA was run with all 454 reads against the Mlp genome, and a similar number of gene models was supported

Mlp Transcriptome – 454-pyrosequencing
PASA outputs for Mlp Sanger contigs
- Total of 6,294 Mlp gene models supported (38%)
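The 27% and 38% support rates are consistent with a denominator of 16,694 JGI gene models, the figure quoted on the NimbleGen slide. The slides do not state the denominator explicitly, so this is an assumption. A short Python sketch of the arithmetic, which also shows that the gain is about 11 percentage points:

```python
# Assumption: the denominator is the 16,694 JGI Mlp gene models quoted on the
# NimbleGen slide; the slides do not state the denominator explicitly.
JGI_GENE_MODELS = 16_694
SANGER_SUPPORTED = 4_507     # gene models supported by Sanger ESTs
COMBINED_SUPPORTED = 6_294   # gene models supported by Sanger and 454 contigs (PASA)


def percent_supported(supported: int, total: int = JGI_GENE_MODELS) -> float:
    """Share of gene models with transcript support, in percent."""
    return 100 * supported / total


if __name__ == "__main__":
    sanger = percent_supported(SANGER_SUPPORTED)
    combined = percent_supported(COMBINED_SUPPORTED)
    print(f"Sanger only: {sanger:.1f}%")                       # 27.0%
    print(f"Sanger + 454: {combined:.1f}%")                    # 37.7%
    print(f"Gain: {combined - sanger:.1f} percentage points")  # 10.7
```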

Mlp Transcriptome – 454-pyrosequencing
Examples of gene model curation proposed by PASA based on Mlp 454 contigs

Mlp Transcriptome – 454-pyrosequencing
Most abundant transcripts supporting Mlp gene models identified through 454-sequencing
- 4,010 gene models supported by 454 ESTs
- 935 with no hits in nr/SwissProt (specific to Pucciniales / specific to Mlp)
- 265 encode SSPs (small secreted proteins) => 166 with no hits in nr/SwissProt; 34 specific to Pucciniales; specific to Mlp

Mlp Transcriptome – NimbleGen Systems oligonucleotide arrays
NimbleGen Systems expression oligonucleotide arrays
- ~390, mer oligoprobes evenly distributed on a 2 cm² array
- 4plex arrays = 80,000 to 90,000 probes per array (+ controls)
- Laccaria bicolor design: set of 8 oligoprobes/gene, duplicated
- 16,694 JGI models + new EuGene models with 454 support [All 454-supported new CDS?]
  17,000 to 20,000 Mlp gene models => 4 probes/gene => no duplicated probes
  => Populus filtered
- 10 x 4plex NimbleGen arrays ordered; design ASAP
=> Mlp gene expression during a time-course infection
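The choice of 4 non-duplicated probes per gene follows from the probe budget: 17,000 to 20,000 gene models times 4 probes gives 68,000 to 80,000 probes, which fits an 80,000 to 90,000-probe array. A Laccaria-style design with 8 duplicated probes per gene would not fit. A minimal Python sketch of that arithmetic, assuming the capacity figure applies to one sub-array of the 4plex and ignoring the space taken by controls (both are simplifications):

```python
def max_probes_per_gene(capacity: int, n_genes: int, replicates: int = 1) -> int:
    """Largest whole number of distinct probes per gene that fits one array.

    capacity   -- probes available on one sub-array (controls not subtracted here)
    n_genes    -- gene models to represent
    replicates -- copies of each probe (2 = duplicated probes)
    """
    return capacity // (n_genes * replicates)


def design_fits(n_genes: int, probes_per_gene: int, capacity: int, replicates: int = 1) -> bool:
    """True if the full probe set fits within the array capacity."""
    return n_genes * probes_per_gene * replicates <= capacity


if __name__ == "__main__":
    # 17,000 to 20,000 Mlp gene models on an 80,000-probe 4plex sub-array
    for genes in (17_000, 20_000):
        print(genes, max_probes_per_gene(80_000, genes))   # 4 in both cases
    # A Laccaria-style design (8 probes/gene, duplicated) would not fit
    print(design_fits(16_694, 8, 90_000, replicates=2))    # False
```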

Mlp Transcriptome – Conclusions
Conclusions
- 52,269 Mlp 98AG31 ESTs support 27% of JGI Mlp gene models
- ESTs from other Melampsora spp. help in annotation (+ polymorphism study)
- 185, reads were assembled into 12,370 contigs & 30,962 singletons
  5,956 contigs & 9,562 singletons attributed to Mlp by Gth
  6,414 contigs & 21,400 singletons attributed to poplar by Gth
- PASA identified a total of 6,294 Mlp gene models supported by 454 and Sanger EST contigs combined = 38% of Mlp gene models (an increase of 11 percentage points)
- MIRA identified many gene models that may need annotation
- MIRA also identified more than 2,500 putative new genes (to be verified)
- Among the 4,010 gene models expressed in planta => 519 are specific to Mlp and 391 to Pucciniales => 265 encode SSPs, and 128 SSPs are specific to Mlp
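The per-genome Gth attributions listed above add up exactly to the 12,370 contigs and 30,962 singletons. This is consistent with each retained sequence being assigned to a single genome. A short Python sketch of the tally, which also shows that slightly under half of the contigs (about 48%) are fungal:

```python
from __future__ import annotations

# Gth attribution of the final MIRA assembly (values from the conclusions slide).
ATTRIBUTED = {
    "Mlp": {"contigs": 5_956, "singletons": 9_562},
    "Poplar": {"contigs": 6_414, "singletons": 21_400},
}


def totals(table: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum contigs and singletons across genomes."""
    out: dict[str, int] = {}
    for counts in table.values():
        for kind, n in counts.items():
            out[kind] = out.get(kind, 0) + n
    return out


if __name__ == "__main__":
    t = totals(ATTRIBUTED)
    print(t)  # {'contigs': 12370, 'singletons': 30962}
    print(f"Mlp share of contigs: {ATTRIBUTED['Mlp']['contigs'] / t['contigs']:.1%}")  # 48.1%
```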

Mlp Transcriptome – Conclusions
Ongoing…
- Curation of gene models supported by 454 contigs
- Prediction/curation of putative new genes with 454 contig support
- Design of NimbleGen Systems Oligoarray Mlp v1.0
To come…
- Alternative splicing
- Presence of SNPs (transcripts expressed from both nuclei?)
- Profiles of candidate genes during time-course infection of poplar leaves

Stéphane Hacquard (INRA Nancy): Mlp effectors
Emilie Tisserant & Benoît Hilselberger (INRA Nancy): Mlp bioinformatics
Yao-Cheng Lin (VIB, Ghent, BE): EuGene prediction, Mlp gene families
Mlp 98AG31: the 'bad guy'
Genomic team at INRA UMR 1136 IAM
Marie-Pierre Oudot-Le Secq (INRA Nancy): EST annotation
Duplessis Sébastien & Francis Martin