Bio277 Lab 3: Finding Transcription Factor Binding Motifs
Adapted from a lab written by Prof. Terry Speed
Jess Mar, Department of Biostatistics, Quackenbush Lab, DFCI

Outline
- Analyze cell cycle gene expression data.
- Cluster cell cycle data using hierarchical clustering.
- Visualize cell cycle clusters.
- Find motifs in these clusters and visualize them using sequence logos.

The Cell Cycle

Cell Cycle Data Set
- Experiments assayed mRNA expression patterns over the duration of at least one cell cycle.
- Custom cDNA microarray platform.
- RNA samples from Saccharomyces cerevisiae cell culture.
- Three methods of synchronization: α-factor arrest, cdc15, elutriation.
- Today's data: α-factor arrest (blocks cell division in G1).
- ~6000 genes x 17 time points.
- Sampled at 7 min intervals over 120 min, starting at time zero.
See paper: Spellman et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 1998, p3273.

Experimental Data
From the ~6000 yeast genes, we have chosen to focus on those involved in key biological processes (such as cell cycle, oxidative phosphorylation and nucleotide metabolism).
Read the data into R:
    dat <- read.table("ccexpdata.txt", header=TRUE, sep="\t")
Objective: find transcription factor binding sites implicated in the cell cycle.
How do we search for these binding sites? Where do we begin to search?

Linking Gene Expression and Promoters
One canonical representation of gene regulation: genes that are regulated by the same transcriptional program share similar expression patterns. But co-expression does not always imply co-regulation.
We look to upstream promoter regions to see if we can find common regular expression patterns. Statistically over-represented patterns are potential transcription factor binding sites.
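To make "statistically over-represented" concrete, here is a minimal sketch that counts how many promoters in a cluster contain a pattern and compares that with a background set using a hypergeometric test. The sequences are made up for illustration, the helper `has.pattern` is hypothetical, and the choice of test is an assumption (the motif-finding tools used later in the lab have their own statistics).

```r
# Toy promoter sequences, made up for illustration (not real yeast promoters).
cluster.seqs <- c("TTACGCGTAA", "GGACGCGTCC", "AACGCGTTTT", "CCCCACGCGT")
background.seqs <- c("TTTTAAAACC", "GGGTTTCCAA", "ACGTACGTAC", "TTGACCATGA",
                     "CCATGGTTAA", "AAACGCGTGG", "GATTACAGAT", "TGCATGCATG")

# Hypothetical helper: TRUE for each sequence containing the pattern at least once.
has.pattern <- function(seqs, pattern) grepl(pattern, seqs, fixed = TRUE)

pattern <- "ACGCGT"  # example pattern only
hits.cluster <- sum(has.pattern(cluster.seqs, pattern))
hits.background <- sum(has.pattern(background.seqs, pattern))

# Hypergeometric test: probability of seeing at least hits.cluster
# pattern-containing sequences when drawing the cluster at random
# from all sequences (cluster + background).
total.hits <- hits.cluster + hits.background
total.misses <- length(cluster.seqs) + length(background.seqs) - total.hits
p.value <- phyper(hits.cluster - 1, total.hits, total.misses,
                  length(cluster.seqs), lower.tail = FALSE)
print(p.value)
```

A small p-value only flags a candidate. With many patterns tested, some correction for multiple testing would be needed before calling a pattern a potential binding site.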

Building Gene Expression Clusters
    distMat <- dist(dat, method="euclidean")
    clustObj <- hclust(distMat)
    plot(clustObj)
How many clusters should we use?
    cluster.labels <- cutree(clustObj, 15)
    print(table(cluster.labels))
The cluster distribution looks like:
    # one bar per cluster; bar height is the number of genes in that cluster
    barplot(table(cluster.labels), xlab="Cluster", ylab="Number of Genes")
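The question of how many clusters to use has no single answer. One informal approach, sketched below on simulated data (not the lab's data set), is to look at the heights of the last merges in the dendrogram and at how cluster sizes change with the number of clusters. The object names `toy`, `clustObj.toy` and `labels3` are made up for this sketch.

```r
set.seed(1)
# Simulated stand-in for the expression matrix: 30 "genes" x 17 time points,
# built from three distinct profiles plus a little noise.
time <- seq(0, by = 7, length.out = 17)
base <- rbind(sin(2 * pi * time / 60), cos(2 * pi * time / 60), rep(0, 17))
toy <- base[rep(1:3, each = 10), ] + matrix(rnorm(30 * 17, sd = 0.1), nrow = 30)
rownames(toy) <- paste0("gene", 1:30)

clustObj.toy <- hclust(dist(toy, method = "euclidean"))

# Heights of the merges, largest first; a large drop between consecutive
# heights suggests a natural place to cut the tree.
merge.heights <- rev(clustObj.toy$height)
print(round(merge.heights[1:5], 2))

# Cluster sizes for several candidate numbers of clusters.
for (k in 2:5) {
  cat("k =", k, ":", table(cutree(clustObj.toy, k)), "\n")
}

labels3 <- cutree(clustObj.toy, 3)
```

On real data the gaps are rarely this clean, so the choice (15 in this lab) usually also reflects how interpretable the resulting expression profiles are.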

Visualizing Clusters
Let's plot the first 8 clusters:
    par(mfrow=c(2,4))
    for( i in 1:8 ){
      titleLab <- paste("Cluster ", i, sep="")
      # expression profiles (rows) of the genes in cluster i
      expr.prof <- as.matrix(dat[cluster.labels == i,])
      plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l",
           xlab="Time", ylab="Expression", main=titleLab)
      # overlay every gene's profile; invisible() hides apply()'s return value
      invisible(apply(expr.prof, 1, lines))
    }

And the remaining clusters (9 to 15):
    par(mfrow=c(2,4))
    for( i in 9:15 ){
      titleLab <- paste("Cluster ", i, sep="")
      expr.prof <- as.matrix(dat[cluster.labels == i,])
      plot(expr.prof[1,], ylim=range(expr.prof, na.rm=TRUE), type="l",
           xlab="Time", ylab="Expression", main=titleLab)
      invisible(apply(expr.prof, 1, lines))
    }
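When a cluster holds many genes, the overlaid lines can be hard to read. As an optional complement, the sketch below computes the mean profile of each cluster and draws all means in one plot. It uses a small random matrix and made-up labels (`toy`, `toy.labels`) so that it runs on its own. With the lab's objects one would presumably substitute `dat` and `cluster.labels`.

```r
set.seed(2)
# Small random stand-in: 12 "genes" x 5 time points, 3 made-up clusters.
toy <- matrix(rnorm(12 * 5), nrow = 12,
              dimnames = list(paste0("gene", 1:12), paste0("t", 1:5)))
toy.labels <- rep(1:3, times = 4)

# One row per cluster, one column per time point; missing values are ignored.
centroids <- t(sapply(split(as.data.frame(toy), toy.labels),
                      function(d) colMeans(d, na.rm = TRUE)))

# Draw one line per cluster mean.
matplot(t(centroids), type = "l", lty = 1,
        xlab = "Time point", ylab = "Mean expression")
```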

Exporting Expression Clusters
Write out the gene names in each cluster into a text file:
    for( i in 1:15 ){
      cluster.genes <- row.names(dat)[cluster.labels == i]
      fileName <- paste("cluster", i, ".txt", sep="")
      write(cluster.genes, fileName)
    }
Are they there?
    dir()

Retrieving Promoter Sequences
Let's focus on Cluster 12.
We can retrieve the promoter sequences for these genes using a tool called RSA:
When working on yeast genomics, another great resource is:

TF Motif Finding Tools
- MEME
- AlignACE
- BioProspector

Making Sequence Logos
- WebLogo
- SEQLOGO
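To show what a sequence logo encodes, here is a minimal base R sketch that turns a set of aligned sites into a position frequency matrix and computes the information content (in bits) at each position. The sites are hypothetical, and the small-sample correction that logo tools typically apply is left out, so this is an illustration rather than a reimplementation of WebLogo or SEQLOGO.

```r
# Hypothetical aligned binding sites (all the same length), made up for illustration.
sites <- c("ACGCGT", "ACGCGA", "ACGCGT", "TCGCGT")

bases <- c("A", "C", "G", "T")
# One row per site, one column per motif position.
site.mat <- do.call(rbind, strsplit(sites, ""))

# Position frequency matrix: rows are bases, columns are motif positions.
freq <- apply(site.mat, 2,
              function(col) table(factor(col, levels = bases)) / length(col))
rownames(freq) <- bases

# Information content per position: 2 bits minus the Shannon entropy
# (terms with zero frequency contribute 0).
entropy <- apply(freq, 2, function(p) -sum(ifelse(p > 0, p * log2(p), 0)))
info <- 2 - entropy

# In a logo, each letter's height is its frequency times the information content.
heights <- sweep(freq, 2, info, `*`)

# Crude stand-in for a logo: stacked bars instead of stacked letters.
barplot(heights, names.arg = seq_len(ncol(heights)),
        xlab = "Motif position", ylab = "Bits", legend.text = rownames(heights))
```

Fully conserved positions reach the maximum of 2 bits, while positions where the sites disagree are shorter.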

TRANSFAC Database
Database on eukaryotic cis-acting regulatory DNA elements and trans-acting factors.
- SITE: gives information on (regulatory) transcription factor binding sites within eukaryotic genes.
- GENE: describes the gene to which a site (or group of sites) belongs.
- FACTOR: describes the proteins binding to these sites.
- CELL: gives brief information about the cellular source of proteins that have been shown to interact with the sites.
- CLASS: contains some background information about the transcription factor classes.
- MATRIX: gives nucleotide distribution matrices for the binding sites of transcription factors.
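A nucleotide distribution matrix of the kind stored in the MATRIX table can be used to scan a sequence for candidate sites. The sketch below is a generic log-odds scan in base R, not TRANSFAC's own scoring scheme. The count matrix, the pseudocount of 1, the uniform background and the helper `score.windows` are all assumptions made for illustration.

```r
bases <- c("A", "C", "G", "T")
# Hypothetical count matrix for a 4 bp motif (rows: bases, columns: positions).
counts <- matrix(c(8, 0,  0, 1,
                   1, 9,  0, 0,
                   0, 1, 10, 0,
                   1, 0,  0, 9),
                 nrow = 4, byrow = TRUE, dimnames = list(bases, NULL))

# Add a pseudocount, convert to probabilities, then to log-odds scores
# against an assumed uniform background.
pseudo <- 1
probs <- sweep(counts + pseudo, 2, colSums(counts + pseudo), "/")
background <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
pwm <- log2(probs / background)

# Hypothetical helper: score every window of motif width along one sequence.
score.windows <- function(dna, pwm) {
  w <- ncol(pwm)
  chars <- strsplit(dna, "")[[1]]
  stopifnot(length(chars) >= w)
  starts <- seq_len(length(chars) - w + 1)
  sapply(starts, function(s) {
    window <- chars[s:(s + w - 1)]
    # pick the score of the observed base at each motif position and add them up
    sum(pwm[cbind(match(window, rownames(pwm)), seq_len(w))])
  })
}

scores <- score.windows("TTACGTGA", pwm)
best.start <- which.max(scores)
print(round(scores, 2))
print(best.start)
```

In practice a score threshold has to be chosen, and both strands of the promoter would normally be scanned.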

Public Data Repositories for Gene Expression Studies
experiments available. Expression profiles derived from 180 experiments, genes available expression platforms samples.