Inference with Gene Expression and Sequence Data BMI/CS 776 Mark Craven April 2002.

Slides:



Advertisements
Similar presentations
Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome ECS289A.
Advertisements

Molecular Biomedical Informatics Machine Learning and Bioinformatics Machine Learning & Bioinformatics 1.
Periodic clusters. Non periodic clusters That was only the beginning…
. Context-Specific Bayesian Clustering for Gene Expression Data Yoseph Barash Nir Friedman School of Computer Science & Engineering Hebrew University.
Computational detection of cis-regulatory modules Stein Aerts, Peter Van Loo, Ger Thijs, Yves Moreau and Bart De Moor Katholieke Universiteit Leuven, Belgium.
Combined analysis of ChIP- chip data and sequence data Harbison et al. CS 466 Saurabh Sinha.
Bioinformatics Motif Detection Revised 27/10/06. Overview Introduction Multiple Alignments Multiple alignment based on HMM Motif Finding –Motif representation.
Gene regulatory network
Regulatory Motifs. Contents Biology of regulatory motifs Experimental discovery Computational discovery PSSM MEME.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
Gene Co-expression Network Analysis BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University.
Pattern Discovery in RNA Secondary Structure Using Affix Trees (when computer scientists meet real molecules) Giulio Pavesi& Giancarlo Mauri Dept. of Computer.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
MOPAC: Motif-finding by Preprocessing and Agglomerative Clustering from Microarrays Thomas R. Ioerger 1 Ganesh Rajagopalan 1 Debby Siegele 2 1 Department.
Sequence Motifs. Motifs Motifs represent a short common sequence –Regulatory motifs (TF binding sites) –Functional site in proteins (DNA binding motif)
Bio277 Lab 3: Finding Transcription Factor Binding Motifs Adapted from a Lab Written by Prof Terry Speed Jess Mar Department of Biostatistics Quackenbush.
BACKGROUND E. coli is a free living, gram negative bacterium which colonizes the lower gut of animals. Since it is a model organism, a lot of experimental.
Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription.
Transcriptional profiling I – microarrays and proteomics
Biological Sequence Pattern Analysis Liangjiang (LJ) Wang March 8, 2005 PLPTH 890 Introduction to Genomic Bioinformatics Lecture 16.
Fa 05CSE182 CSE182-L8 Mass Spectrometry. Fa 05CSE182 Bio. quiz What is a gene? What is a transcript? What is translation? What are microarrays? What is.
1 Predicting Gene Expression from Sequence Michael A. Beer and Saeed Tavazoie Cell 117, (16 April 2004)
Exploring Protein Sequences Tutorial 5. Exploring Protein Sequences Multiple alignment –ClustalW Motif discovery –MEME –Jaspar.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Finding Regulatory Motifs in DNA Sequences
WMES3103 : INFORMATION RETRIEVAL INDEXING AND SEARCHING.
Motif finding: Lecture 1 CS 498 CXZ. From DNA to Protein: In words 1.DNA = nucleotide sequence Alphabet size = 4 (A,C,G,T) 2.DNA  mRNA (single stranded)
A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes Angela K. Dean, Stephen E. Harris, Jianhua.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Detecting binding sites for transcription factors by correlating sequence data with expression. Erik Aurell Adam Ameur Jakub Orzechowski Westholm in collaboration.
Learning Regulatory Networks that Represent Regulator States and Roles Keith Noto and Mark Craven K. Noto and M. Craven, Learning Regulatory.
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
* only 17% of SNPs implicated in freshwater adaptation map to coding sequences Many, many mapping studies find prevalent noncoding QTLs.
Inferring transcriptional and microRNA-mediated regulatory programs in glioblastma Setty, M., et al.
Motif discovery Tutorial 5. Motif discovery MEME Creates motif PSSM de-novo (unknown motif) MAST Searches for a PSSM in a DB TOMTOM Searches for a PSSM.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Copyright OpenHelix. No use or reproduction without express written consent1.
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Protein and RNA Families
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Parsing A Bacterial Genome Mark Craven Department of Biostatistics & Medical Informatics University of Wisconsin U.S.A.
Motif discovery and Protein Databases Tutorial 5.
Bioinformatics MEDC601 Lecture by Brad Windle Ph# Office: Massey Cancer Center, Goodwin Labs Room 319 Web site for lecture:
Starting Monday M Oct 29 –Back to BLAST and Orthology (readings posted) will focus on the BLAST algorithm, different types and applications of BLAST; in.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
Biological Networks & Systems Anne R. Haake Rhys Price Jones.
Detecting binding sites for transcription factors by correlating sequence data with expression. Erik Aurell Adam Ameur Jakub Orzechowski Westholm in collaboration.
Introduction to biological molecular networks
Local Multiple Sequence Alignment Sequence Motifs
Learning Sequence Motifs Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 Mark Craven
Module Networks BMI/CS 576 Mark Craven December 2007.
Intro to Probabilistic Models PSSMs Computational Genomics, Lecture 6b Partially based on slides by Metsada Pasmanik-Chor.
Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – the Transcription.
Transcription factor binding motifs (part II) 10/22/07.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Heuristic Methods for Sequence Database Searching BMI/CS 776 Mark Craven February 2002.
BIOBASE Training TRANSFAC ® Containing data on eukaryotic transcription factors, their experimentally-proven binding sites, and regulated genes ExPlain™
Clustering Gene Expression Data BMI/CS 776 Mark Craven April 2002.
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
Regulation of Gene Expression
Yiming Kang, Hien-haw Liow, Ezekiel Maier, & Michael Brent
Learning Sequence Motif Models Using Expectation Maximization (EM)
Inferring Models of cis-Regulatory Modules using Information Theory
Recitation 7 2/4/09 PSSMs+Gene finding
Finding regulatory modules
Diverse patterns, similar mechanism
Nora Pierstorff Dept. of Genetics University of Cologne
BIOBASE Training TRANSFAC® ExPlain™
Presentation transcript:

Inference with Gene Expression and Sequence Data BMI/CS Mark Craven April 2002

Announcements HW #3 ready –HMMs for recognizing genes in E. coli –due April 29 exam –6pm Monday 5/6 or Tuesday 5/7 plan –text analysis (on Wednesday) –Thomas Anantharaman guest lecture (next Monday) –SCFGs for RNA modeling (2 lectures) –modeling cellular systems (1 lecture) –ontologies and data integration (1 lecture) reading for Wednesday –Genes, Themes and Microarrays, Shatkay et al., Proc. of ISMB 2000

Gene Expression + Sequence Data Brazma et al., Genome Research, 1998 task: predict transcription factor binding sites in DNA –can do this using only sequence data –but can get additional evidence using expression data

Describing TF Binding Sites various representations have been used to describe these motifs –n-mers (fixed length words) –position-specific probabilities (e.g. MEME) –regular expressions Brazma et al. use restricted class of regular expressions. matches any character [] matches any character inside of brackets example: A[TG].C matches all strings that start with A followed by either T or G followed by any character followed by C

Predicting TF Binding Sites given: –a cluster C of genes with similar expression profiles –300 base upstream sequences for all genes do: –find motifs that occur more frequently in sequences upstream from genes in C than upstream from other genes

Extracting Reg-Ex Patterns Brazma et al. build a suffix trie for all upstream sequences here is a suffix trie for ATACATA$ like a suffix tree but each edge represents the addition of just one character

Extracting Reg-Ex Patterns represent all sequences in suffix trie at all nodes maintain a list of matching subsequences during construction, prune trie at a node if –pattern has fewer than t occurrences –pattern matches fewer than p positive sequences –etc.

Scoring TF Binding Sites the scoring method used by Brazma et al. is essentially a likelihood ratio a given pattern a sequence from the positive class a sequence from the negative class

Experiment Brazma et al. did this for one clustering of the yeast diauxic shift expression time series patterns extracted in two restricted reg-ex representations –at most 3 wild cards, no group characters –at most on group character (two alternatives), plus wild cards validated patterns by comparing against TRANSFAC, a DB of known transcription factor binding sites

High-Scoring Patterns for One Cluster

Discussion genes in a cluster may not be regulated by same TPs –noise in expression data –subjectivity of clustering –coincidence in expression does not necessarily imply same regulation mechanisms