Modeling Promoter and Untranslated Regions in Yeast

Abstract

Transcriptional regulation is the primary form of gene regulation in eukaryotes. Approaches to identifying functional regions based on comparative genomics and microarray expression data have recently been applied to promoter and 3'-untranslated region (UTR) sequences in the yeast genome. Here we combine these approaches to construct a robust set of motifs active in the yeast genome. With this set we consider the combinatorial actions of these motifs and apply a linear model to explain observed expression. A deeper understanding of gene regulation in yeast is the first step toward understanding gene regulation and complex disease in higher organisms.

Data Set

7 yeast species:
- Saccharomyces cerevisiae
- Saccharomyces bayanus
- Saccharomyces castellii
- Saccharomyces kudriavzevii
- Saccharomyces mikatae
- Saccharomyces kluyveri
- Saccharomyces paradoxus

5769 promoters analyzed
1,730,700 DNA nucleotides analyzed per species
Expression data come from a heat-shock microarray experiment (Stanford Microarray Database)

Approaches combined: Comparative Genomics and Expression Analysis

Purpose

Our goal is to understand how combinations of Transcription Factor Binding Sites (TFBS) on a gene affect its expression in different experimental conditions.

Linear Model

The linear model predicts the contribution of each motif to a gene’s expression level.
- Each gene contains zero or more motifs.
- Each motif (assumed to be a TFBS) has an “expression factor” score (+/-) for each experiment.
- The expression of a gene is the sum of the scores of the motifs it contains.

Calculating the Expression Factor

Gene | Expression Level | Motifs
Y01  | (not preserved)  | M1 + M2 + M3
Y02  | (not preserved)  | M2 + M4 + M16
Y03  | (not preserved)  | M1 + M3 + M10
…

Each row states that a gene’s measured expression level equals the sum of the scores of its motifs. This gives a system of linear equations whose unknowns (M1, M2, …) can be estimated with any linear regression technique, such as least squares.
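The least-squares step can be illustrated with a minimal sketch. It assumes each gene is described only by which motifs it contains (a 0/1 indicator matrix with no intercept term, matching the sum-of-scores model above). The function name `fit_motif_scores`, the gene names G1 to G4, and all numeric values are hypothetical and are not taken from the poster.

```python
import numpy as np

def fit_motif_scores(gene_motifs, expression):
    """Estimate one expression-factor score per motif by least squares.

    gene_motifs: dict mapping gene name -> set of motif ids it contains
    expression:  dict mapping gene name -> measured expression level
    """
    genes = sorted(gene_motifs)
    motifs = sorted({m for ms in gene_motifs.values() for m in ms})
    column = {m: j for j, m in enumerate(motifs)}

    # Indicator matrix: X[i, j] = 1 if gene i contains motif j, else 0.
    X = np.zeros((len(genes), len(motifs)))
    for i, gene in enumerate(genes):
        for motif in gene_motifs[gene]:
            X[i, column[motif]] = 1.0
    y = np.array([expression[g] for g in genes], dtype=float)

    # Solve min ||X s - y||^2 for the score vector s.
    scores, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {m: float(s) for m, s in zip(motifs, scores)}

# Hypothetical data: expression levels generated from the scores
# M1 = 1.0, M2 = -0.5, M3 = 2.0 (illustrative, not poster values).
gene_motifs = {
    "G1": {"M1", "M2"},
    "G2": {"M2", "M3"},
    "G3": {"M1", "M3"},
    "G4": {"M1", "M2", "M3"},
}
expression = {"G1": 0.5, "G2": 1.5, "G3": 3.0, "G4": 2.5}
scores = fit_motif_scores(gene_motifs, expression)
```

With more genes than motifs the system is overdetermined, so the fitted scores are the ones that minimize the squared difference between predicted and observed expression rather than an exact solution.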
Positional Analysis

Transcription factor binding sites are not distributed uniformly in promoter regions. The motif CGATGAG most frequently occurs between 60 and 100 nucleotides away from the transcription start site (where the code for a protein begins).

Assumption

We assume that motifs act independently of one another. Every occurrence of a motif is bound by the same transcription factor and has the same effect on expression.

Results

331 motifs were found. Using linear regression on the heat-shock expression data, 22 of them were found to be significantly active. Some motifs and their scores (score values not preserved; reverse complements in parentheses):
M66 : CCCCTT (AAGGGG)
M218 : CAGGGG
M259 : CCCTTAA (TTAAGGG)
M264 : TAGGGG (CCCCTA)
…

Transcription Factor Binding Sites

The YKL182W gene promoter, with highlighted Transcription Factor Binding Sites (highlighting not preserved):

AAGTTATAGGGGAAAACTAAAAATATAAGAAAAAAAAAGGTATTGATTGATAAGGAAAAAGAACCAAGGGAAAAAT
ATAAAAAAGTACATTGGGCCTTTTCATACTTGTTATCACTTACATTACAAAGAAGAACAAACAACTTTTTTAAACG
AATTTTCTTTCTTCCTTTTTCAATTTATTAATTCTTTTTTTCCATACAATTCAAGGTCAAATATATTCTTATATGC
TCTTTGAATATTTCTGAAAAATATATAAAGAAAAGAAACTACAAGAACAT

Comparative Genomics

The comparative genomics method uses aligned sequences of several closely related species to find patterns that are conserved across multiple genomes. A high rate of conservation implies that the pattern is functional and important.
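The idea of finding conserved patterns in aligned sequences can be sketched as follows. This is an illustrative simplification, not the poster's method: it assumes the sequences are already aligned to equal length and reports only windows that are identical in every sequence. The function name `conserved_kmers` and the three toy sequences are hypothetical.

```python
def conserved_kmers(aligned, k):
    """Return (position, k-mer) pairs that are identical in every sequence.

    aligned: list of equal-length, already aligned sequences
    k:       window length to test for perfect conservation
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    hits = []
    for start in range(length - k + 1):
        window = aligned[0][start:start + k]
        # Skip windows with alignment gaps; require an exact match elsewhere.
        if "-" in window:
            continue
        if all(seq[start:start + k] == window for seq in aligned[1:]):
            hits.append((start, window))
    return hits

# Toy alignment (hypothetical): only the middle block is fully conserved.
toy_alignment = [
    "ACGTTACCACCGA",
    "ACCTTACCACCGT",
    "AGGTTACCACCTA",
]
hits_k6 = conserved_kmers(toy_alignment, 6)
hits_k8 = conserved_kmers(toy_alignment, 8)
```

A practical pipeline would likely tolerate a few mismatches or count how many of a pattern's occurrences are conserved, rather than demand identity across all species.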
Species1: TAATATCAAAATCAATCTCAAAATTACCACCGGTTAGAACTTGG
Species2: TAATGTCAAAATCAATCTCAAAGTTACCACCGGTTAGAACTTGG
Species3: TAATATCAAAATCAATCTCAAAATTACCACCAGTTAGAACCTGA
Species4: TAATATCGAAATCAATCTCAAAATTACCACCGGTTAGAACTTGG
Species5: TGATGTCAAAATCGATCTCGAAATTACCACCAGTCAGGACTTGG
Species6: TAATCTCAAAATCAATTTCAAAATTACCACCCGTCATAACTTGA
Species7: TAATTTCAAAGTCAATTTCAAAGTTACCACCGGTCAAGACTTGA

Authors

AARON STONESTROM, Division of Biology, University of California, San Diego
BORIS BABENKO, Computer Science and Engineering, University of California, San Diego
YUJING LIANG, Computer Science and Engineering, University of California, San Diego
ELEAZAR ESKIN, Computer Science and Engineering, University of California, San Diego
JAMAL BENHAMIDA, Computer Science and Engineering, University of California, San Diego

Limitations

The method finds only transcription factors that are activated or deactivated in an experimental condition relative to the control.

Grouping Motifs

Some of the discovered motifs are minor variants or exact reverse complements of each other. Thus, the motifs were grouped, and each group was assigned a unique ID:

M0 : CGGTGGCAA, GGTGGCAAG, CGTGGC
M1 : AGCTCATCGC, AGCTCATAGC
M2 : GCTCATCG, CGATGAGC
M3 : AGCTCATCG
…

Annotating the Genes

We can now annotate the genes with the Motif Groups that were discovered:

Gene Name | Motif Groups
YPR111W   | M248, M319, M74
YPR148C   | M12, M153, M25
YPR194C   | M127, M202, M41
YAL044W-A | M255, M27, M270, M49

Significant Motifs Found

Pattern CGGTGGCAA appeared 15 times and was conserved 15 times. MCS:
Pattern is conserved on: [YJL001W, YNL155W, YOR052C, YOR259C, YOR260W, YBL022C, YCL043C, YCL042W, YCR092C, YCR093W, YDL148C, YDL147W, YDL070W, YDR427W, YER012W]

Pattern AGCTCATCGC appeared 29 times and was conserved 27 times. MCS:
Pattern is conserved on: [YJL109C, YKL191W, YKR024C, YKR025W, YKR081C, YKR082W, YLR014C, YLR015W, YLR106C, YLR107W, YLR336C, YMR049C, YNL248C, YNL247W, YOL125W, YPL094C, YPL093W, YCR057C, YCR072C, YCR087C-A, YDR449C, YHR052W, YHR147C, YHR148W, YHR170W, YIL127C, YIL126W]
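The grouping of exact reverse complements described under Grouping Motifs could be sketched as below. This is a minimal illustration that handles only exact reverse complements. It does not merge minor variants, and the poster does not say how those were matched. The function names and the choice of the alphabetically smaller strand as the group key are assumptions.

```python
from collections import defaultdict

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(motif):
    """Return the reverse complement of a DNA motif (A, C, G, T only)."""
    return motif.translate(_COMPLEMENT)[::-1]

def group_by_strand(motifs):
    """Group motifs so that a motif and its reverse complement share a key.

    The key is the alphabetically smaller of the two strands, which is an
    arbitrary but deterministic choice.
    """
    groups = defaultdict(list)
    for motif in motifs:
        key = min(motif, reverse_complement(motif))
        groups[key].append(motif)
    return dict(groups)

# Motifs taken from the lists above.
groups = group_by_strand(["GCTCATCG", "CGATGAGC", "CCCCTT", "AAGGGG", "CAGGGG"])
```

Here GCTCATCG and CGATGAGC fall into one group, as do CCCCTT and AAGGGG, while CAGGGG stays on its own because its reverse complement is not in the input.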