Discriminative Motifs. Saurabh Sinha, RECOMB ’02, April 18-21

Presentation transcript:

Introduction

The term “motif” means the common pattern shared by the different binding sites of a transcription factor. Our work addresses this generic motif-finding problem. Its salient features are:
–It takes a new view of motif discovery, treating it as a feature selection problem.
–It describes a general algorithmic framework that can be specialized to work with a large class of motif models (including consensus models with degenerate symbols or mismatches, and composite motifs).

–It uses information about how motif instances are distributed among the given promoters when assessing the motif’s over-representation, rather than looking only at the total count of occurrences.

A motif is a feature of the positive sequences that leads to a good classification of positive and negative promoters. We call such motifs “discriminative motifs”, or “d-motifs” for brevity.
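As a rough illustration of the sequence-level view above, the sketch below scores a candidate motif by how many positive versus negative promoters contain it, using a hypergeometric tail probability. The statistic, the function names (`count_hits`, `hypergeom_tail`, `d_score`) and the exact-match rule are assumptions made for illustration; the slides do not state which score DMotifs actually uses.

```python
from math import comb

def count_hits(motif, seqs):
    """Number of sequences that contain at least one exact occurrence of motif."""
    return sum(1 for s in seqs if motif in s)

def hypergeom_tail(k, n_pos, n_hit, n_total):
    """P(X >= k): chance that at least k of the n_pos positive sequences
    are among the n_hit motif-containing sequences out of n_total."""
    upper = min(n_pos, n_hit)
    numerator = sum(
        comb(n_hit, i) * comb(n_total - n_hit, n_pos - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_pos)

def d_score(motif, positives, negatives):
    """Hypothetical discriminative score: a small value means the motif is
    concentrated in the positive set (assumed scoring, not from the slides)."""
    pos_hits = count_hits(motif, positives)
    neg_hits = count_hits(motif, negatives)
    return hypergeom_tail(
        pos_hits,
        len(positives),
        pos_hits + neg_hits,
        len(positives) + len(negatives),
    )
```

With this kind of score, a motif present in every positive promoter and no negative promoter gets a smaller value than one that is also found in the negatives, regardless of how many times it occurs within a single sequence.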

ALGORITHM

Simple motifs (degenerate)

Results on known regulons
–Regulons: sets of genes known to be co-regulated, where the binding site has been biologically characterized.
–In 18 out of the 22 regulons -> top 10; in 15 out of these 18 -> top 2.
–ROX1: an example of a false positive; its 58 occurrences are distributed as 10, 11 and 37 across 3 genes.
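To make the difference between a total occurrence count and a per-gene distribution concrete, here is a minimal sketch that matches a degenerate consensus motif and reports both views. The IUPAC table is the standard nucleotide code; the helper names (`motif_regex`, `occurrences_per_sequence`, `summarize`) are hypothetical and not part of DMotifs.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def motif_regex(motif):
    """Compile a degenerate motif; the lookahead lets overlapping hits count."""
    return re.compile("(?=" + "".join(IUPAC[c] for c in motif) + ")")

def occurrences_per_sequence(motif, seqs):
    """Occurrence count of the motif in each sequence, in input order."""
    pattern = motif_regex(motif)
    return [len(pattern.findall(s)) for s in seqs]

def summarize(counts):
    """Return (total occurrences, number of sequences with at least one hit)."""
    return sum(counts), sum(1 for c in counts if c > 0)

# The ROX1 numbers from the slide: a large total, but only 3 genes involved.
rox1_total, rox1_genes = summarize([10, 11, 37])
```

For the ROX1 counts, the total is 58 while the number of genes with a hit is 3, which is the kind of gap between the two views that the slide points to.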

Composite motifs

“Higher-order” or “composite” motif: m1 – d – m2
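A composite motif of the form m1 – d – m2 can be searched for with a few lines of code. The sketch below assumes that d is the number of bases between the end of m1 and the start of m2, and that both parts are matched exactly; the slide does not define d, so treat this reading and the name `find_composite` as assumptions.

```python
def find_composite(seq, m1, m2, d):
    """Start positions where m1 occurs and m2 begins exactly d bases
    after the end of m1 (assumed meaning of the spacer d)."""
    hits = []
    start = seq.find(m1)
    while start != -1:
        m2_start = start + len(m1) + d
        if seq.startswith(m2, m2_start):
            hits.append(start)
        # Continue from the next position so overlapping m1 hits are seen.
        start = seq.find(m1, start + 1)
    return hits

# Toy sequence: ACG at position 2, a 3-base spacer, then TGCA at position 8.
example_hits = find_composite("GGACGTTTTGCACC", "ACG", "TGCA", 3)
```

A model that allows a range of spacer lengths could call this function once per value of d and merge the results.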

FUTURE WORK AND CONCLUSION

DMotifs invokes an enumerative search of the motif space, leaving open the question of how to traverse that space efficiently. Another issue worth investigating is how to decide on the significance of the p-value scores. The general algorithm is adapted to two specific motif models and shown to work well on real as well as synthetic data.
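To show what an enumerative search of the motif space looks like, and why its cost becomes an issue, the following sketch lists every k-mer over the DNA alphabet and ranks the candidates with a deliberately simple score. The score (fraction of positive sequences containing the motif minus the fraction of negative sequences) and all names are assumptions chosen for brevity; they stand in for whatever statistic DMotifs applies.

```python
from itertools import product

def enumerate_kmers(k, alphabet="ACGT"):
    """Yield every string of length k over the alphabet (4**k for DNA)."""
    for letters in product(alphabet, repeat=k):
        yield "".join(letters)

def fraction_containing(motif, seqs):
    """Fraction of sequences with at least one exact occurrence of motif."""
    return sum(motif in s for s in seqs) / len(seqs)

def rank_motifs(positives, negatives, k, top=5):
    """Score every k-mer and return the best `top` as (motif, score) pairs."""
    scored = [
        (m, fraction_containing(m, positives) - fraction_containing(m, negatives))
        for m in enumerate_kmers(k)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

The candidate list has 4**k entries, so each extra position multiplies the work by four, and allowing degenerate symbols enlarges the space further.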

Raising the sensitivity without losing the specificity

Data preparation
–Promoter: core promoter, proximal promoter, DPE (downstream promoter element), taken from a database (EPD) or obtained by aligning the first exon.
EPD release 73 (1/20/03): starting to exploit 5’ ESTs from full-length cDNA clones as a new resource for defining promoters. This new technique of TSS mapping is called “in silico primer extension”. More than half of the EPD entries (1634) are now based on 5’ EST sequences.

–Non-promoter

Thought: start from promoter candidates obtained with high sensitivity, then remove the false positives (FP) to increase the specificity.

Method:
–Predict promoters using a promoter prediction tool with a low threshold, to raise the sensitivity (try to classify the promoters before promoter training).

–Find the features among the promoters, then do the feature discrimination.
–Keep the promoter candidates strongly related to the real discriminative features (top rank).
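The two-stage idea in the method above can be sketched with a toy example: a low score threshold admits more candidates (higher sensitivity), and a second filter on a discriminative feature removes false positives (recovering specificity). The candidate records, the thresholds and the `has_feature` flag are invented for illustration and do not come from any real promoter predictor.

```python
def sens_spec(predicted, truth):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for boolean lists."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    return tp / (tp + fn), tn / (tn + fp)

def stage1(candidates, threshold):
    """Stage 1: accept every candidate whose predictor score passes the threshold."""
    return [c["score"] >= threshold for c in candidates]

def stage2(candidates, threshold):
    """Stage 2: keep stage-1 candidates that also carry the discriminative feature."""
    return [c["score"] >= threshold and c["has_feature"] for c in candidates]

# Hypothetical candidates: predictor score, feature flag, true label.
candidates = [
    {"score": 0.90, "has_feature": True, "is_promoter": True},
    {"score": 0.60, "has_feature": True, "is_promoter": True},
    {"score": 0.40, "has_feature": True, "is_promoter": True},
    {"score": 0.35, "has_feature": False, "is_promoter": False},
    {"score": 0.30, "has_feature": False, "is_promoter": False},
    {"score": 0.10, "has_feature": False, "is_promoter": False},
]
truth = [c["is_promoter"] for c in candidates]

strict = sens_spec(stage1(candidates, 0.5), truth)     # high threshold only
loose = sens_spec(stage1(candidates, 0.25), truth)     # low threshold only
filtered = sens_spec(stage2(candidates, 0.25), truth)  # low threshold + feature
```

In this contrived data the feature separates the classes perfectly, so the filtered result keeps the sensitivity of the low threshold while restoring the specificity; real data would show a trade-off that depends on how discriminative the feature is.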