Finding Transcription Factor Motifs
Adapted from a lab created by Prof. Terry Speed.


Cell Cycle Data Set
Spellman et al. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Yeast cell populations were synchronized using three independent methods (alpha factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant).
RNA was extracted and used in microarray experiments to determine the expression of ~6000 genes over 18 time points.
See

Outline
Read the cell cycle data into R.
Cluster the cell cycle data using hierarchical clustering.
Visualize the cell cycle clusters.
Find motifs in these clusters and visualize them using sequence logos.

Experimental Data
783 genes involved in the yeast cell cycle.
Expression levels measured at 18 time points.
Read the data into R:
> dat <- read.table("ccdata.txt", header=TRUE, sep="\t")

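Hierarchical Clustering
> distMat <- dist(dat)          # Euclidean distance between gene expression profiles (the default)
> clustObj <- hclust(distMat)   # agglomerative clustering, complete linkage by default
> plot(clustObj)                # draw the dendrogram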

Create Gene Expression Clusters
Let's cut the dendrogram into 16 clusters:
> cutObj <- cutree(clustObj, k=16)
> print(table(cutObj))
Write out the gene names in each cluster to a separate text file:
for( i in 1:16 ){
  cluster.genes <- row.names(dat)[cutObj == i]
  fileName <- paste("cluster", i, ".txt", sep="")
  write(cluster.genes, fileName)
}
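For readers without ccdata.txt, the same dist / hclust / cutree steps can be tried on simulated data. This is only a minimal sketch: the toy matrix, the gene names, and the choice of k=3 are assumptions made for illustration and are not part of the original lab.

```r
# Simulated stand-in for the expression matrix (NOT the Spellman data):
# 30 hypothetical genes x 18 time points, built from three phase-shifted sine waves.
set.seed(1)
timePoints <- 1:18
phases <- rep(c(0, 2, 4), each = 10)
toy <- t(sapply(phases, function(p) {
  sin(2 * pi * timePoints / 9 + p) + rnorm(18, sd = 0.1)
}))
rownames(toy) <- paste0("gene", 1:30)

toyDist  <- dist(toy)               # Euclidean distance between profiles
toyClust <- hclust(toyDist)         # complete linkage by default
toyCut   <- cutree(toyClust, k = 3) # cluster label for every gene

print(table(toyCut))
# Gene names that fall in the first cluster
print(rownames(toy)[toyCut == 1])
```

Because cutree returns a vector named by the row names of the input, indexing the gene names with toyCut == i works the same way as in the loop above.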

What Do These Clusters Look Like?
Let's plot the first 8 clusters:
par(mfrow=c(2,4))
for( i in 1:8 ){
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cutObj == i,])
  plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l", xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}

What Do These Clusters Look Like? (continued)
The remaining 8 clusters:
par(mfrow=c(2,4))
for( i in 9:16 ){
  titleLab <- paste("Cluster ", i, sep="")
  expr.prof <- as.matrix(dat[cutObj == i,])
  plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l", xlab="Time", ylab="Expression", main=titleLab)
  apply(expr.prof, 1, lines)
}

Picking Clusters for TF Motifs
> barplot(table(cutObj), main="Cluster Sizes", xlab="Cluster", ylab="Number of Genes")
We want to select a cluster with a reasonably large number of genes in which to look for upstream TF binding site motifs.
Co-expression suggests co-regulation, so we look to the promoter regions to see whether the genes share common sequence patterns.
Statistically over-represented patterns are potential transcription factor binding sites.
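The idea of a statistically over-represented pattern can be illustrated with a small sketch: count how many promoter sequences in a cluster contain a given k-mer and compare that with its frequency in a background set. Everything here is an assumption made for illustration: the sequences are made up, countSeqsWithKmer is a hypothetical helper, and the one-sided binomial test is just one simple choice of test, not necessarily what any of the motif finding tools use.

```r
# Count sequences containing a given k-mer (fixed string match, one strand only).
countSeqsWithKmer <- function(seqs, kmer) {
  sum(grepl(kmer, seqs, fixed = TRUE))
}

# Made-up toy sequences, NOT real yeast promoters.
clusterSeqs <- c("TTGACGCGTAAC", "ACGCGTTTTGCA", "GGAACGCGTCAT", "CATTTGACCAGT")
backgroundSeqs <- c("TTGACCAGTAAC", "ACGCGTTTTGCA", "GGATTTGCACAT", "CATTTGACCAGT",
                    "AAGTCCATGGTA", "TGCATGCAAGTT", "CCATTGAGGTAC", "GTTAACCGGATA")

kmer <- "ACGCGT"  # pattern planted in the toy cluster sequences
nHit <- countSeqsWithKmer(clusterSeqs, kmer)
bgFreq <- countSeqsWithKmer(backgroundSeqs, kmer) / length(backgroundSeqs)

# One-sided binomial test: is the k-mer more frequent in the cluster
# than expected from the background frequency?
res <- binom.test(nHit, length(clusterSeqs), p = bgFreq, alternative = "greater")
print(c(hits = nHit, backgroundFreq = bgFreq, p.value = res$p.value))
```

A real analysis would scan many k-mers on both strands and would need to account for multiple testing; the sketch only shows the comparison for a single pattern.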

Extracting Promoter Sequences
Promoter sequence retrieval can be performed using RSA:

TF Motif Finding Tools
MEME, BioProspector, Improbizer, Verbumculus, OligoAnalysis, Mobydick

TF Motif Finding Tools (continued)
MDScan, Weeder, Gibbs Motif Sampler, AlignACE, CONSENSUS

Making Sequence Logos
WebLogo
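A sequence logo scales the letters at each motif position by the information content of that position, which for DNA is at most log2(4) = 2 bits. As a rough sketch of the quantity being drawn, the code below builds a position frequency matrix from aligned sites and computes the information content per position. The sites are made up for illustration, and the sketch omits any small-sample correction that logo software may apply.

```r
# Made-up aligned binding sites of equal length (NOT output from a real motif finder).
sites <- c("ACGCGT", "ACGCGT", "ACGCGA", "TCGCGT")
bases <- c("A", "C", "G", "T")

# Split each site into characters: rows = sites, columns = positions.
siteMat <- do.call(rbind, strsplit(sites, ""))

# Position frequency matrix: rows = bases, columns = motif positions.
freqMat <- apply(siteMat, 2, function(col) {
  as.numeric(table(factor(col, levels = bases))) / length(col)
})
rownames(freqMat) <- bases

# Information content per position in bits: 2 - Shannon entropy,
# treating 0 * log2(0) as 0.
entropy <- apply(freqMat, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))
infoContent <- 2 - entropy

print(round(freqMat, 2))
print(round(infoContent, 2))
```

Positions where every site has the same base reach the full 2 bits, while positions with mixed bases score lower and would be drawn shorter in the logo.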