Cis-regulatory evolution of duplicate genes in yeasts

Cis-regulatory evolution of duplicate genes in yeasts Gaurav Moghe January-February 2009

Background
Species: S. cerevisiae, S. bayanus, S. castellii, C. glabrata, S. kluyveri, K. lactis, E. gossypii

Goal
Figure labels: Scer, Scas, Klac, Ago, pre-WGD species

Sequences used for the study
Genome sequences were downloaded from GenBank and SGD.
ORF sequences for post-WGD species were downloaded from SGD; upstream sequences were extracted using the location information of each ORF.
Upstream sequences for pre-WGD species were downloaded from RSAT.
PWMs for 124 TFs were obtained from MacIsaac et al., 2006.
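A minimal sketch of how an upstream sequence could be extracted from ORF location information. The slide does not state the upstream length or the coordinate convention, so the 500 bp default, the 1-based inclusive coordinates, and both function names are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def upstream_sequence(chrom_seq, orf_start, orf_end, strand, length=500):
    """Return up to `length` bases upstream of an ORF.

    Assumes 1-based inclusive coordinates with orf_start <= orf_end on
    both strands (hypothetical convention, not taken from the slides).
    The result is clipped at the chromosome ends.
    """
    if strand == "+":
        # Upstream of a forward-strand ORF lies before its start.
        begin = max(0, orf_start - 1 - length)
        return chrom_seq[begin:orf_start - 1]
    if strand == "-":
        # Upstream of a reverse-strand ORF lies after its end, read in reverse.
        end = min(len(chrom_seq), orf_end + length)
        return reverse_complement(chrom_seq[orf_end:end])
    raise ValueError("strand must be '+' or '-'")
```

For example, `upstream_sequence(chrom, 7, 9, "+", length=3)` returns the three bases immediately before position 7.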

Motif Searches
Search the genome using MAST: 106 million sites.
Are MacIsaac sites being predicted?
Each site has a confidence (p-value) associated with it.
Generate a p-value threshold based on these sites.
Filter the other predictions using these thresholds: 1.4 million sites.
Map the filtered predictions to intergenic regions.
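The thresholding and filtering steps above can be sketched as follows. The slides do not say how the threshold is derived from the known sites, so this sketch assumes one threshold per TF, set to the largest p-value among the predictions that coincide with known MacIsaac sites. The function names and tuple layouts are hypothetical.

```python
def pvalue_thresholds(known_hits):
    """Derive one p-value threshold per TF.

    known_hits: iterable of (tf, pvalue) for predictions that coincide
    with known sites. Assumption: the threshold is the largest such
    p-value, so every recovered known site would pass the filter.
    """
    thresholds = {}
    for tf, pvalue in known_hits:
        thresholds[tf] = max(pvalue, thresholds.get(tf, 0.0))
    return thresholds


def filter_predictions(predictions, thresholds):
    """Keep predictions whose p-value is at or below their TF's threshold.

    predictions: iterable of (tf, position, pvalue).
    TFs without a threshold are dropped.
    """
    return [
        (tf, pos, pvalue)
        for tf, pos, pvalue in predictions
        if tf in thresholds and pvalue <= thresholds[tf]
    ]
```

A stricter choice, such as a percentile of the known-site p-values, would trade recovered known sites for fewer retained predictions.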

Motif Searches
PWMs are used by MAST to scan genomes.
Are MacIsaac sites being predicted?

Nature of PWMs
Split PWMs into 6 groups, based on their length and alphabet.
Length  Class          Example
>6      Best large     gCATGTGAA
<6      Best small     GATAA
        Better large   tGCTGg..
        OK large       .tCGG.YsWATGGRr
        OK small       wGACkC
        Poor large     wwwwsyGGGG

Does size/nature have anything to do with false positives?
TF    Motif       Class
PHO2  AYTAAr      OK small
RCS1  tgCACCy     Better large
SWI6  rACGCG      Best small
MSN2  mAGGGG.     Best large
SUT1  .gCsGgg     OK large
SWI5  tGCTGg..
SKN7  kCyrgsCc    Poor large
YAP5  ARrCAT
CST6  tgCATTT.
SOK2  .cAGGmAm
No.
10 TFs account for ~1.1 million of the 1.4 million sites.
No relation between size/nature and false positives.
No good for many other TFs.
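The observation that a handful of TFs dominate the filtered predictions can be checked with a simple tally. This is an illustrative sketch only; the function name and the input layout (one TF name per predicted site) are assumptions.

```python
from collections import Counter


def top_tf_share(site_tfs, n=10):
    """Return the n TFs with the most sites and their share of all sites.

    site_tfs: iterable with one TF name per predicted site.
    """
    counts = Counter(site_tfs)
    total = sum(counts.values())
    top = counts.most_common(n)
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share
```

A share close to 1.0 for a small `n` indicates that most predictions come from very few TFs.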

Then…
We decided to use only the MacIsaac sites for searching across species.
Map the MacIsaac sites onto the intergenic regions.
Look at their loss patterns in orthologous promoters in other species.
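Mapping sites onto intergenic regions amounts to an interval lookup. The sketch below assumes all coordinates are on one chromosome, that intergenic regions do not overlap, and that a site counts only if it lies entirely inside a region. The function name and tuple layouts are hypothetical.

```python
from bisect import bisect_right


def map_sites_to_intergenic(sites, regions):
    """Assign each site to the intergenic region that fully contains it.

    sites: list of (site_id, start, end).
    regions: list of (region_id, start, end), non-overlapping.
    Returns a dict site_id -> region_id; sites outside every region
    (or straddling a boundary) are left out.
    """
    regions = sorted(regions, key=lambda r: r[1])
    starts = [r[1] for r in regions]
    mapping = {}
    for site_id, start, end in sites:
        # Index of the last region starting at or before the site start.
        i = bisect_right(starts, start) - 1
        if i >= 0 and end <= regions[i][2]:
            mapping[site_id] = regions[i][0]
    return mapping
```

Sorting once and using binary search keeps the lookup fast when many sites are mapped against many regions.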

Orthologous genes
Orthologous genes were obtained through the Yeast Gene Order Browser (YGOB).
The YGOB gene names do not correspond to the gene names provided by SGD for the sensu stricto species.
BLASTp was used to find out which YGOB annotation corresponds to which SGD annotation.
Some genes are being lost in this process.
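One way to link the two annotation sets is to take the best BLASTp hit per query. The sketch below assumes BLAST tabular output with the standard 12 tab-separated columns (query, subject, ..., e-value, bit score) and picks the hit with the highest bit score. The slides do not describe the exact matching rule, so this is an assumption, and the function name is hypothetical.

```python
def best_hits(blast_tab_lines):
    """Map each query ID to the subject ID of its highest-scoring hit.

    blast_tab_lines: iterable of lines in 12-column BLAST tabular format.
    Blank lines and comment lines starting with '#' are skipped.
    """
    best = {}
    for line in blast_tab_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```

Queries with no hit never appear in the result, which is one way genes can drop out of the mapping.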

Using MAST
Ideal case (also observed for some TFs)
Figure: Scer MacIsaac sites, Sbay MAST predictions
Scer: Saccharomyces cerevisiae; Sbay: Saccharomyces bayanus

Using MAST
Figure: Scer MacIsaac sites, Sbay MAST predictions
Scer: Saccharomyces cerevisiae; Sbay: Saccharomyces bayanus

Using phylogeny-based methods
Many programs are available: PhyloCon, Morph, PhyloGibbs, FootPrinter, Gibbs Sampler.
All are for motif discovery, not for motif search using phylogenetic principles.

Using phylogeny-based methods
Conserved Regulatory Elements anchored Alignment (CONREAL)
Monkey (Mike Eisen)
PhyloScan
Conditional Shadowing via Multi-resolution Evolutionary Trees (CSMET)

Plans for the next month
Test MONKEY/PhyloScan on the intergenic elements.
Estimate the false positive/false negative rate under the specified parameters, based on known TFBS.
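The error estimate could be sketched as a set comparison between predicted and known sites. This assumes sites are compared by exact identity (for example a promoter and position pair) and that every prediction absent from the known set counts as a false positive, which overestimates errors if the known set is incomplete. The function name is hypothetical, and the false-positive figure reported here is the fraction of predictions that are wrong.

```python
def site_error_summary(predicted, known):
    """Compare predicted sites against known sites.

    predicted, known: iterables of hashable site identifiers.
    Returns counts plus two rates:
      false_negative_rate  = FN / (TP + FN), known sites that were missed
      false_discovery_rate = FP / (TP + FP), predictions not in the known set
    """
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)
    fp = len(predicted - known)
    fn = len(known - predicted)
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "false_negative_rate": fn / (tp + fn) if known else 0.0,
        "false_discovery_rate": fp / (tp + fp) if predicted else 0.0,
    }
```

Running this once per parameter setting gives a simple way to compare settings against the same set of known TFBS.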

Novel RNA genes project January-February 2009

Flowchart, with sequence counts in the order they appear on the slide:
Download EST sequences corresponding to PUTs. 737301
Map them to the genome using GMAP (L > 50 bp, Cov > 70, Idt > 90%). 605624
1353 Yes? Map to AT RNA genes.
7357 Map to protein-coding regions.
No? Map to other AT features.
BLASTn against all known CDS sequences + GeneWise to confirm alignment on translated CDS sequences. 2500
3260 No match?
2431 BLASTn against Repetitive Sequence Database. No match?
Coding Index to double-verify absence of protein-like seq. 1893 No match?
BLASTx against all known proteins to verify absence of any protein in the sequences. 1867 No match?
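The alignment cutoffs quoted for the GMAP mapping step (L > 50 bp, Cov > 70, Idt > 90%) can be expressed as a small filter. This sketch assumes each alignment has already been parsed into a dictionary and that coverage is a percentage like identity; the keys `length`, `coverage`, and `identity` are hypothetical names, not GMAP output fields.

```python
def passes_alignment_filter(aln, min_length=50, min_coverage=70.0, min_identity=90.0):
    """Return True if an alignment exceeds all three cutoffs.

    aln: dict with 'length' (bp), 'coverage' (%), and 'identity' (%).
    Comparisons are strict, matching the '>' signs on the slide.
    """
    return (
        aln["length"] > min_length
        and aln["coverage"] > min_coverage
        and aln["identity"] > min_identity
    )


def filter_alignments(alignments, **cutoffs):
    """Keep only the alignments that pass the cutoffs."""
    return [aln for aln in alignments if passes_alignment_filter(aln, **cutoffs)]
```

Passing different keyword values, for example `min_coverage=60.0, min_identity=75.0`, reuses the same filter with looser cutoffs.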

BLASTn against all RNA family sequences in RFAM. 1867, 1837, 30
Manual filtering on NCBI by Andy, giving ~13% false positive rate. ~1600 novel ESTs
Conservation in lyrata using GMAP
RNA structure prediction
Expression conservation
Wet lab confirmation
Substitution rate
Tiling array
817 at 60% coverage and 75% Idt, nhits <= 3
Shan helping out with this