Next generation gene mining to decipher CBSV resistance in cassava

Presentation transcript:

Next generation gene mining to decipher CBSV resistance in cassava
Hale Ann Tufan, Natural Resources Institute, University of Greenwich
“NRI's mission is to provide distinctive, high quality and relevant research, consultancy, teaching and advice in support of sustainable development, economic growth and poverty reduction.”

Outline
- Introduction
- Material and methods
- General description of RNA-seq data
- RNA-seq data analysis
- Clustering and expression profiles
- Gene ontology
- Genes of interest
- Conclusions
www.iita.org

Threat of CBSD
- Genus Ipomovirus, family Potyviridae
- Losses of US$100 million annually
- Serious threat to cassava production in Eastern and Central Africa
- Spread mechanically and by whitefly vector
- Pressing need for new sources of resistance
Herrera Campo et al. (2011) Food Security, 3: 329-345

Next generation sequencing for resistance gene discovery
- For sequenced genomes, RNA-seq has the potential to serve as a transcriptomics tool as well as a marker development platform
- The lower cost of sequencing enables use of this technology for resistance gene discovery
Varshney et al. (2009) Trends in Biotechnology, 27: 522-530

Approach
1. Resistant and susceptible lines
2. Inoculate with virulent CBSV isolate
3. Collect RNA from control and CBSV-infected plants
4. Library construction and sequencing
5. Data analysis
6. Candidate genes
7. Validation
8. Test on cross progeny

Susceptible cv. Albert
- Where does Albert come from?
- Note: CBSD symptoms are usually absent on top leaves, even in susceptible varieties
- Leaves show severe symptoms and plants continue to show symptoms through development
- Roots show symptoms of rotting

Resistant cv. Kaleso (Namikonga)
- Is Kaleso also resistant in the field?
- Where did Kaleso come from? Landrace? SA introgression? Source?
- Leaves show infection early, but plants look and grow ‘normal’ thereafter
- Roots also show no sign of symptoms

Methods
- RNA isolated from 3 independent biological replicates for each of 4 treatments: Albert Control, Albert CBSV, Kaleso Control, Kaleso CBSV
- Replicates pooled after quality control
- RNA samples sent to GATC Biotech for sequencing
- Illumina HiSeq 2000 platform, single-end 50 bp reads
- Sequence reads mapped against the reference genome with the BWA aligner
- Expression table built with GATC in-house software
Nampula in Mozambique

General description of data
- ~50 million reads per sample; 50-60% of reads mapped per sample
- 34,151 genes total

                          Albert Control      Albert CBSV         Kaleso Control      Kaleso CBSV
                          Reads          %    Reads          %    Reads          %    Reads          %
All                       54,045,667     -    60,070,579          38,949,010          49,681,907
Mapping to whole genome   31,632,660    59    35,964,664    60    20,946,755    54    29,534,087
Non uniquely mapped        8,674,373    27    10,282,664    29     5,526,455    26     7,563,418
Uniquely mapped           23,261,749    74    26,036,303    72    15,618,148    75    22,243,065
Resulting Reads

Row definitions:
- All: number of reads used in this analysis.
- Mapping to whole genome: number of reads mapped to whole genome. *1)
- Non uniquely mapped: number of reads mapped to more than one site of the genome. *2) *4)
- Uniquely mapped: number of reads mapped to exactly one site of the genome. *2)
- Resulting Reads: number of reads as result of mapping/preprocessing. *2)

*1) Percentage is calculated based on all reads used in this analysis.
*2) Percentage is calculated based on the number of reads mapping to whole genome.
*3) Percentage is calculated based on the number of reads mapped uniquely.
*4) Reads have been excluded from analysis.
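The percentage rules in footnotes *1) and *2) can be checked against the read counts. The sketch below is only an illustration: the dictionary keys and the rounding to whole percent are assumptions, and the counts are copied from the table. Under those assumptions it reproduces the percentages reported for the first three samples and derives values for Kaleso CBSV, whose percentages are missing from the transcript.

```python
# Read counts copied from the table; key names are illustrative, not from the source.
counts = {
    "Albert Control": {"all": 54_045_667, "mapped": 31_632_660,
                       "multi": 8_674_373, "unique": 23_261_749},
    "Albert CBSV": {"all": 60_070_579, "mapped": 35_964_664,
                    "multi": 10_282_664, "unique": 26_036_303},
    "Kaleso Control": {"all": 38_949_010, "mapped": 20_946_755,
                       "multi": 5_526_455, "unique": 15_618_148},
    "Kaleso CBSV": {"all": 49_681_907, "mapped": 29_534_087,
                    "multi": 7_563_418, "unique": 22_243_065},
}


def mapping_percentages(c):
    """Apply footnote *1) to mapped reads and footnote *2) to the other rows.

    Rounding to whole percent is an assumption about how the slide was built.
    """
    return {
        "mapped_pct": round(100 * c["mapped"] / c["all"]),     # *1) of all reads
        "multi_pct": round(100 * c["multi"] / c["mapped"]),    # *2) of mapped reads
        "unique_pct": round(100 * c["unique"] / c["mapped"]),  # *2) of mapped reads
    }


summary = {sample: mapping_percentages(c) for sample, c in counts.items()}

for sample, pct in summary.items():
    print(f"{sample}: {pct}")
```

The Kaleso CBSV figures produced this way are derived values, not numbers taken from the original slide.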

General description of data
- 28,667 genes expressed in at least one of 4 treatments
- Majority of these expressed in all treatments
- High number of Kaleso-specific genes, compared to other treatments
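Counts such as "expressed in at least one treatment" or "treatment-specific" can be derived from an expression table along the following lines. This is a hedged sketch, not the GATC pipeline: the toy table, the gene names and the `threshold` of zero are all hypothetical, since the slides do not state what cutoff defined "expressed".

```python
TREATMENTS = ("Albert Control", "Albert CBSV", "Kaleso Control", "Kaleso CBSV")

# Hypothetical toy table: gene -> expression values in TREATMENTS order.
toy_expression = {
    "geneA": (12.0, 15.0, 11.0, 14.0),
    "geneB": (0.0, 0.0, 3.0, 9.0),
    "geneC": (0.0, 0.0, 0.0, 0.0),
    "geneD": (0.0, 0.0, 0.0, 7.5),
}


def expressed_in(values, threshold=0.0):
    """Return the set of treatments in which a gene exceeds the threshold."""
    return frozenset(t for t, v in zip(TREATMENTS, values) if v > threshold)


def summarise(table, threshold=0.0):
    """Split genes into expressed-anywhere, expressed-everywhere and single-treatment sets."""
    patterns = {gene: expressed_in(values, threshold) for gene, values in table.items()}
    expressed_any = [g for g, p in patterns.items() if p]
    expressed_all = [g for g, p in patterns.items() if len(p) == len(TREATMENTS)]
    specific = {t: [g for g, p in patterns.items() if p == {t}] for t in TREATMENTS}
    return expressed_any, expressed_all, specific


expressed_any, expressed_all, specific = summarise(toy_expression)
print(len(expressed_any), "genes expressed in at least one treatment")
print(len(expressed_all), "genes expressed in all treatments")
print({t: len(genes) for t, genes in specific.items()})
```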

Data analysis
- Samples are pooled: limited options for data analysis
- Genesis software used to analyze data: http://genome.tugraz.at/genesisclient/genesisclient_description.shtml
- Coefficient of variation (CoV) cutoff of 70% to identify genes with ‘significant’ induction between treatments
- K-means clustering to identify groups of genes with similar expression patterns (50 iterations, 5 clusters specified)

Min     Mean    Max     StDev   % CoV
3.1     3.32    3.52    0.17    5.11
2.7     3.34    4.37    0.73    21.96
0.24    0.39    0.65    0.19    47.69
0.009   0.02    0.04            77.92
0.16    1.63    3.19    1.65    101.29
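The example rows appear consistent with % CoV = 100 × StDev / Mean computed across the four treatments (for instance 100 × 1.65 / 1.63 ≈ 101). A minimal sketch of that filter follows; the gene names and values are hypothetical, and the use of the sample standard deviation is an assumption, because the slides do not say which estimator Genesis applies.

```python
from statistics import mean, stdev


def percent_cov(values):
    """Coefficient of variation in percent: 100 * standard deviation / mean."""
    m = mean(values)
    if m == 0:
        return 0.0  # no expression in any treatment, so no measurable variation
    return 100.0 * stdev(values) / m  # sample standard deviation (assumption)


def filter_by_cov(table, cutoff=70.0):
    """Keep genes whose CoV across treatments reaches the cutoff."""
    return {gene: vals for gene, vals in table.items() if percent_cov(vals) >= cutoff}


# Hypothetical expression values for the four pooled treatments.
toy = {
    "flat": (3.1, 3.3, 3.4, 3.5),     # nearly constant across treatments
    "induced": (0.2, 0.2, 0.3, 3.2),  # strongly induced in one treatment
}

selected = filter_by_cov(toy)
for gene, vals in toy.items():
    print(gene, round(percent_cov(vals), 2), gene in selected)
```

With pooled samples there is one value per treatment, so the CoV here measures spread between treatments rather than replicate noise, which is why the slide puts ‘significant’ in quotation marks.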

K-Means Clusters: Expression Profiles
- Cluster 1: 133 genes, Kaleso CBSV specific (highly expressed)
- Cluster 2: 86 genes, Kaleso specific
- Cluster 3: 150 genes, Albert specific
- Cluster 4: 4180 genes, largely unchanged / low expression (image truncated)
- Cluster 5: 670 genes, mixed; some tendency for higher expression in Kaleso CBSV
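Clusters of this kind come from k-means run on per-gene expression profiles. The sketch below is a plain Lloyd's-algorithm illustration, not the Genesis implementation: the evenly spaced initial centroids, the squared Euclidean distance and the toy profiles are assumptions. The defaults mirror the settings stated in the analysis (5 clusters, 50 iterations); the toy call uses 2 clusters only because it has six profiles.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k=5, iterations=50):
    """Lloyd's algorithm with deterministic, evenly spaced initial centroids."""
    n = len(profiles)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [tuple(profiles[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in profiles]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members)
                                           for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged before the iteration limit
        centroids = new_centroids
    return labels, centroids


# Hypothetical profiles: (Albert Control, Albert CBSV, Kaleso Control, Kaleso CBSV).
toy_profiles = [
    (1.0, 1.1, 1.0, 9.0), (0.9, 1.0, 1.2, 8.5), (1.1, 0.9, 1.0, 9.5),  # high in Kaleso CBSV
    (6.0, 6.2, 1.0, 1.1), (5.8, 6.1, 0.9, 1.0), (6.1, 5.9, 1.1, 0.9),  # high in Albert
]

labels, centroids = k_means(toy_profiles, k=2)
print(labels)
```

K-means results depend on initialisation and on how profiles are scaled, so cluster numbering and sizes from another run or tool would not necessarily match the slide.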

Gene Ontology: Cluster 1 and Cluster 2
[Figure: gene ontology panels for Cluster 1 and Cluster 2; sample labels: Albert Control, Albert CBSV, Kaleso Control, Kaleso CBSV]

Gene Ontology: Cluster 3 and Cluster 5
[Figure: gene ontology panels for Cluster 3 and Cluster 5; sample labels: Albert Control, Albert CBSV, Kaleso Control, Kaleso CBSV]

Genes highly upregulated in Kaleso CBSD (Cluster 1)
- Metabolism: sucrose synthase, fatty acid hydroxylase, hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyl transferase
- Transcription factors: MYB domain protein, zinc finger domain protein, NAC transcription factor, WRKY protein
- Signaling: MAPKK, MAPKKK, leucine-rich repeat transmembrane protein kinase
- Defence related: seven transmembrane MLO family protein, peroxidase, pleiotropic drug resistance 1, disease resistance-responsive (dirigent-like protein) family protein

Genes upregulated in Kaleso CBSD (Cluster 5)
- Metabolism: cinnamyl alcohol dehydrogenase 9
- Transcription factors: MYB domain protein, NAC domain protein, RWP-RK domain-containing protein, WRKY DNA-binding protein
- Signaling: protein kinase, receptor-like protein kinase 1, receptor serine/threonine kinase, leucine-rich repeat protein kinase family protein, BAK1-interacting receptor-like kinase 1, cysteine-rich RLK (receptor-like protein kinase)
- Defence related: disease resistance family protein, peroxidase, cellulose synthase, chitinase, beta glucosidase 11, pathogenesis-related thaumatin, jasmonate-zim-domain protein 1, ethylene responsive element binding factor 4, ACC synthase 1, ethylene-responsive element binding factor 13, PR-1
- Other: RNA-dependent RNA polymerase 1, phloem protein 2-B15

What’s next? Genes of interest

Columns: UniqueID; best Arabidopsis TAIR10 hit name; best Arabidopsis TAIR10 hit symbol; best Arabidopsis TAIR10 hit defline

cassava4.1_001246m|PACid:17989248    AT3G07040.1    RPM1,RPS3    NB-ARC domain-containing disease resistance protein
cassava4.1_025993m|PACid:17982084    AT3G46710.1
cassava4.1_000944m|PACid:17992385    AT4G12010.1                 Disease resistance protein (TIR-NBS-LRR class) family
cassava4.1_021672m|PACid:17989194    AT1G61190.1                 LRR and NB-ARC domains-containing disease resistance protein
cassava4.1_000627m|PACid:17974869    AT5G17680.1                 disease resistance protein (TIR-NBS-LRR class), putative
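A shortlist like this can be pulled from an annotation table by matching the TAIR10 defline against resistance-gene keywords. The sketch below is illustrative only: the rows are copied from the slide (empty strings stand for cells absent from the transcript), while the regular expression and function names are assumptions rather than the method actually used.

```python
import re

# Rows copied from the slide: (UniqueID, TAIR10 hit name, hit symbol, hit defline).
annotations = [
    ("cassava4.1_001246m|PACid:17989248", "AT3G07040.1", "RPM1,RPS3",
     "NB-ARC domain-containing disease resistance protein"),
    ("cassava4.1_025993m|PACid:17982084", "AT3G46710.1", "", ""),
    ("cassava4.1_000944m|PACid:17992385", "AT4G12010.1", "",
     "Disease resistance protein (TIR-NBS-LRR class) family"),
    ("cassava4.1_021672m|PACid:17989194", "AT1G61190.1", "",
     "LRR and NB-ARC domains-containing disease resistance protein"),
    ("cassava4.1_000627m|PACid:17974869", "AT5G17680.1", "",
     "disease resistance protein (TIR-NBS-LRR class), putative"),
]

# Illustrative keyword pattern for NBS-LRR type resistance genes (an assumption).
NBS_LRR_PATTERN = re.compile(r"NB-ARC|NBS-LRR", re.IGNORECASE)


def parse_unique_id(unique_id):
    """Split 'cassava4.1_001246m|PACid:17989248' into transcript name and PAC id."""
    transcript, pac = unique_id.split("|PACid:")
    return transcript, int(pac)


def nbs_lrr_candidates(rows):
    """Return parsed IDs of rows whose defline mentions an NBS-LRR keyword."""
    return [parse_unique_id(uid) for uid, _hit, _symbol, defline in rows
            if NBS_LRR_PATTERN.search(defline)]


candidates = nbs_lrr_candidates(annotations)
for transcript, pac in candidates:
    print(transcript, pac)
```

Keyword matching on deflines misses rows with an empty annotation, such as the second entry here, so any real shortlist would need those checked by hand.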

Model
Gomez et al. (2009) Eur. J. Plant Path. 125: 1-22
Modified from Maule et al. (2007) Mol. Plant Path. 8: 223–231

Conclusions
- Pooling samples yields good results for a snapshot study
- Large number of genes specific to the Kaleso CBSV treatment
- Data analysis resulted in clusters of interesting genes, a subset of which show large upregulation in the Kaleso CBSV treatment
- Orthologues of genes well characterized as involved in resistance responses are upregulated in the Kaleso CBSV treatment
- Limitations in experimental design: focus on dominant resistance genes (NBS-LRR) for validation and further analysis
- Knowledge can possibly be applied in the field: access to Albert x Kaleso cross progeny could yield very interesting results

Thank you
Please contact Dr. Maruthi Gowda at M.N.Maruthi@greenwich.ac.uk for further questions.