Cis/TF discovery for Arabidopsis. Aristotelis Tsirigos, NYU Computer Science.

2 Outline
– Input data
– The proposed model
– Results on yeast
– Results on Arabidopsis
– Unsupervised pattern discovery

3 Input data

4 ~23,000 genes, 25 points each; 1,500 bp upstream sequence (gctaagc...)

5 Normalization: ~23,000 genes, 25 points each; 1,500 bp upstream sequence (gctaagc...); normalize columns (mean=0)

6 Filtering: ~23,000 genes, 25 points each, with columns normalized (mean=0, stdev=1); filter out low-variance genes, leaving ~5,000 genes, 25 points each; 1,500 bp upstream sequence (gctaagc...) converted to a motif bitmap
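The normalization and filtering steps on slides 5 and 6 can be sketched as follows. This is a minimal illustration, not the presenter's code: the function names, the use of population standard deviation, and the "keep the top `keep` genes by variance" rule are assumptions, since the slides give neither a variance threshold nor an implementation.

```python
import statistics

def normalize_columns(matrix):
    """Scale each column (one expression point) to mean 0 and stdev 1."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]  # assumption: population stdev
    return [
        [(row[j] - means[j]) / stdevs[j] if stdevs[j] > 0 else 0.0
         for j in range(n_cols)]
        for row in matrix
    ]

def filter_low_variance(matrix, gene_ids, keep):
    """Keep the `keep` genes (rows) with the highest variance across points.

    `keep` is a hypothetical parameter; the slides only state that
    ~23,000 genes are reduced to ~5,000.
    """
    ranked = sorted(range(len(matrix)),
                    key=lambda i: statistics.pvariance(matrix[i]),
                    reverse=True)
    kept = sorted(ranked[:keep])  # preserve the original gene order
    return [matrix[i] for i in kept], [gene_ids[i] for i in kept]
```

On the real data this would be called with `keep` near 5,000 on the ~23,000 x 25 matrix.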

7 The proposed model

8 Assumption 1: A single TF binds to a single cis element (motif). Source: U.S. Department of Energy Genomics

9 Assumption 2: TFs regulate genes sharing a motif only under a subset of conditions

10 Assumption 2 (cont’d): TFs regulate genes sharing a motif only under a subset of conditions

11 Assumption 3: The TF expression correlates with the sum of the partially correlating expression patterns
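Assumption 3 can be illustrated by summing the expression patterns of a group of genes and correlating the result with the TF's own profile. The slides do not name a correlation measure, so Pearson correlation is an assumption here, and both function names are hypothetical.

```python
import math

def summed_pattern(profiles):
    """Point-by-point sum of several expression profiles of equal length."""
    return [sum(values) for values in zip(*profiles)]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)
```

A TF profile would then be scored against a gene group with `pearson(tf_profile, summed_pattern(group_profiles))`.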

12 Objective
For each cis element (motif):
– discover groups of co-regulated genes
– compute aggregate motif expression
For each TF:
– find best correlating motifs

13 The algorithm – step 1: ~5,000 genes, 25 points; step 1: clustering

14 The algorithm – step 2: ~5,000 genes, 25 points; step 1: clustering; step 2: for any motif, compute its gene set
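Step 2 can be sketched as a scan of each gene's upstream sequence for the motif. Matching on both strands (the motif or its reverse complement) is an assumption, as is the exact-match rule with no degeneracy; `motif_gene_set` and the `upstream` dictionary (gene id to lowercase upstream sequence) are hypothetical names.

```python
# Translation table for complementing lowercase DNA.
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(motif):
    """Reverse complement of a lowercase DNA string."""
    return motif.translate(COMPLEMENT)[::-1]

def motif_gene_set(motif, upstream):
    """Gene ids whose upstream sequence contains the motif on either strand."""
    rc = reverse_complement(motif)
    return {gene for gene, seq in upstream.items()
            if motif in seq or rc in seq}
```

For example, `reverse_complement("gagtca")` gives `"tgactc"`, so a gene carrying either string upstream is counted for that motif.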

15 The algorithm – step 3: ~5,000 genes, 25 points; step 1: clustering; step 2: for any motif, compute its gene set; step 3: compute the distribution of its genes into the clusters

16 The algorithm – step 4: ~5,000 genes, 25 points; step 1: clustering; step 2: for any motif, compute its gene set; step 3: compute the distribution of its genes into the clusters; step 4: determine overrepresented clusters using t-test
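Steps 3 and 4 can be sketched as a count of the motif's genes per cluster followed by a test for clusters holding more of them than expected. The slide names a t-test but does not say how it is applied, so this sketch swaps in a binomial z-score (normal approximation) as a stand-in. The cutoff of 2.0 and all names are assumptions.

```python
import math
from collections import Counter

def cluster_distribution(gene_set, cluster_of):
    """Step 3: count how many genes of the motif's gene set fall in each cluster."""
    return Counter(cluster_of[g] for g in gene_set if g in cluster_of)

def overrepresented_clusters(gene_set, cluster_of, z_cutoff=2.0):
    """Step 4 (stand-in): clusters whose motif-gene count exceeds expectation.

    Under the null, each motif gene lands in a cluster with probability
    proportional to the cluster's size. Returns {cluster: z-score}.
    """
    counts = cluster_distribution(gene_set, cluster_of)
    sizes = Counter(cluster_of.values())
    total_genes = len(cluster_of)
    n = sum(counts.values())
    hits = {}
    for cluster, size in sizes.items():
        p = size / total_genes
        sd = math.sqrt(n * p * (1 - p))
        if sd == 0:
            continue  # no motif genes, or a single cluster: nothing to test
        z = (counts.get(cluster, 0) - n * p) / sd
        if z >= z_cutoff:
            hits[cluster] = z
    return hits
```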

17 The algorithm – final step: ~5,000 genes, 25 points; final step: compute motif aggregate expression (25 points)
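The final step can be sketched as one aggregate profile per motif, built only from the motif's genes that sit in its overrepresented clusters. Summing follows Assumption 3 (a mean would be an equally plausible reading of "aggregate"); the function name and argument layout are assumptions.

```python
def motif_aggregate_expression(gene_set, cluster_of, selected_clusters, expression):
    """Sum the expression profiles of motif genes lying in the selected clusters.

    gene_set          -- genes carrying the motif
    cluster_of        -- gene id -> cluster label
    selected_clusters -- clusters judged overrepresented for this motif
    expression        -- gene id -> list of expression values (e.g. 25 points)
    Returns one profile, or None if no gene qualifies.
    """
    members = [g for g in sorted(gene_set)
               if g in expression and cluster_of.get(g) in selected_clusters]
    if not members:
        return None
    n_points = len(expression[members[0]])
    return [sum(expression[g][t] for g in members) for t in range(n_points)]
```

The resulting profile has the same length as a TF's expression profile, so the two can be correlated directly.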

18 Yeast

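19 Example TF: BAS1, using cis/TF version 1
Table columns: RANK, MOTIF, OCCUR, corr, score (numeric cell values missing from this transcript)
Entries, in listed order: gactcg, cgagtc, gactaa, ttagtc, tcggct, gctagt, agtcac, p-value= [value missing], gagtca, p-value=0.004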

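20 Example TF: BAS1, using cis/TF version 2
Table columns: RANK, MOTIF, OCCUR, signf, corr, score (numeric cell values missing from this transcript)
Entries, in listed order: ctgact, agtcag, ggttta, taaacc, gagtca, p-value= [value missing], tgactc, atttga, tcaaat, agtggc, gccact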

26 Conclusions
Advantages of version 2:
• makes it possible to focus on the gene cluster that correlates best with a given TF
• thus increases overall correlation and motif rank
• offers a measure of motif significance
• can be extended to pairs of TFs/motifs

27 Arabidopsis

28 Procedure
– Permute gene cluster assignment
– Compile list of putative motifs
– Compute significance score of known motifs
– Repeat 1000 times
– Compute p-value of the score
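The permutation procedure can be sketched as an empirical p-value: shuffle the cluster labels across genes, recompute the score, and count how often the shuffled score reaches the observed one. `score_fn` is a hypothetical callable standing in for the significance score of a known motif, which the slides do not define; the add-one correction and the fixed seed are also assumptions.

```python
import random

def permutation_p_value(score_fn, cluster_of, n_permutations=1000, seed=0):
    """Empirical p-value of score_fn(cluster_of) under shuffled cluster labels.

    score_fn   -- hypothetical: maps a {gene: cluster} dict to a number
    cluster_of -- observed gene id -> cluster label assignment
    """
    rng = random.Random(seed)
    genes = list(cluster_of)
    labels = [cluster_of[g] for g in genes]
    observed = score_fn(cluster_of)
    at_least_as_high = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # permute gene cluster assignment
        if score_fn(dict(zip(genes, shuffled))) >= observed:
            at_least_as_high += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_high + 1) / (n_permutations + 1)
```

Shuffling labels rather than redrawing them keeps every cluster at its original size, so only the assignment of genes to clusters changes.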

30 TF discovery?
Need data for training! (TFs and their associated binding sites)
Parameters to be estimated:
• number of clusters
• motif size & degeneracy

31 Pattern discovery

32 TF-driven pattern discovery
Unsupervised pattern discovery:
– find groups of genes partially correlating with the TF
– apply a statistical filter
– look for over-represented motifs in the genes’ upstream regions
Data for validation?

34 Pattern discovery example

35 “Predicting Gene Expression from Sequence”, Beer & Tavazoie, Cell 2004
– group genes into 49 clusters
– predict a gene’s cluster using motifs discovered in its upstream region

37 Conclusions

38 Conclusions
Two options:
Supervised training:
– uses background knowledge to construct the model
– needs more training data
Unsupervised pattern discovery:
– minimal model bias (no prior knowledge)
– needs more ‘expert’ help to filter results