Evaluating alignments using motif detection
Let’s evaluate alignments by searching for motifs.
If alignment X reveals more functional motifs than alignment Y using technique Z, then X is better than Y with respect to Z.
Motifs could be functional sites in proteins or functional regions in non-coding DNA.
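
The comparison rule above can be sketched in Python as follows. This is a minimal illustration, not the presenters' code: the motif detector (technique Z) and the test for whether a motif is functional are passed in as callables, and the toy detector `conserved_columns`, the test `is_known_site`, and the example alignments are hypothetical stand-ins.

```python
from typing import Callable, Sequence

# Hypothetical representations: an alignment is a list of equal-length gapped
# strings, and a motif is a half-open (start, end) interval of alignment columns.
Alignment = Sequence[str]
Motif = tuple[int, int]


def count_functional(aln: Alignment,
                     detect: Callable[[Alignment], list[Motif]],
                     is_functional: Callable[[Motif], bool]) -> int:
    """Number of motifs reported by `detect` (technique Z) that are functional."""
    return sum(1 for motif in detect(aln) if is_functional(motif))


def better_alignment(x: Alignment, y: Alignment,
                     detect: Callable[[Alignment], list[Motif]],
                     is_functional: Callable[[Motif], bool]) -> str:
    """Return 'X', 'Y' or 'tie' under the functional-motif criterion."""
    nx = count_functional(x, detect, is_functional)
    ny = count_functional(y, detect, is_functional)
    if nx == ny:
        return "tie"
    return "X" if nx > ny else "Y"


def conserved_columns(aln: Alignment) -> list[Motif]:
    """Toy stand-in for technique Z: each fully conserved, gap-free column."""
    first = aln[0]
    return [(i, i + 1) for i in range(len(first))
            if first[i] != "-" and all(row[i] == first[i] for row in aln)]


# Hypothetical example data (not from the slides).
X_ALN = ["ACGT", "ACGA"]
Y_ALN = ["AC-GT", "ACG-A"]
FUNCTIONAL_COLUMNS = {0, 2}


def is_known_site(motif: Motif) -> bool:
    """Toy functional test: the motif starts at a known functional column."""
    return motif[0] in FUNCTIONAL_COLUMNS


print(better_alignment(X_ALN, Y_ALN, conserved_columns, is_known_site))  # X
```

Keeping the detector and the functional test as parameters mirrors the "with respect to Z" wording: the same pair of alignments can rank differently under a different technique. The toy functional test compares column indices only for brevity; in practice functional status would be judged on residues of the underlying sequences, since column indices differ between alignments.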

Protein Functional Site Prediction
The identification of protein regions responsible for stability and function is an especially important post-genomic problem.
With the explosion of genomic data from recent sequencing efforts, predicting protein functional sites from sequence alone is an increasingly important bioinformatics endeavor.

What is a “Functional Site”?
Defining what constitutes a “functional site” is not trivial.
Residues that make up, or cluster around, known functional positions are clear candidates for functional sites.
We define a functional site as the catalytic residues, the binding sites, and the regions that cluster around them.
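
One way to make this definition operational is sketched below. It assumes residue coordinates are available (for example one representative atom per residue) and uses a hypothetical 6 Å distance cutoff to decide which residues "cluster around" the catalytic residues and binding sites. The slides do not specify a cutoff or a clustering rule; both, along with the function name `functional_site` and the toy coordinates, are assumptions.

```python
import math

Coord = tuple[float, float, float]


def functional_site(coords: dict[int, Coord],
                    core: set[int],
                    cutoff: float = 6.0) -> set[int]:
    """Core residues (catalytic residues and binding sites) plus every residue
    whose coordinate lies within `cutoff` angstroms of some core residue.

    The 6.0 angstrom default is a hypothetical choice, not taken from the slides.
    """
    anchors = [coords[r] for r in core if r in coords]
    site = set(core)
    for residue, xyz in coords.items():
        if any(math.dist(xyz, a) <= cutoff for a in anchors):
            site.add(residue)
    return site


# Toy coordinates keyed by residue number (hypothetical).
COORDS = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0),
          3: (10.0, 0.0, 0.0), 4: (0.0, 5.0, 0.0)}
print(sorted(functional_site(COORDS, core={1})))  # [1, 2, 4]
```

A single distance cutoff is the simplest reading of "regions that cluster around them"; a larger cutoff trades specificity for coverage of the surrounding region.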

[Figure slides: Protein; Protein + Ligand; Functional Sites (FS); Regions that Cluster Around FS]

Phylogenetic motifs
Phylogenetic motifs (PMs) are short sequence fragments that conserve the overall familial phylogeny.
Are they functional? How do we detect them?
First, we design a simple heuristic to find them.
Then, we check whether the detected sites are functional.

Scan for Similar Trees
[Animated figure: a Windowed Tree is compared with the Whole Tree at each window position]
Partition metric scores for successive windows: 6, 8, 4, 6, 8, 6, 6, 0, 6, 6, 8, 0, 6, 6, 6
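
The partition metric used in this scan counts the bipartitions (splits) of the taxa that appear in one tree but not in the other, so identical topologies score 0. The sketch below assumes trees are given as nested Python tuples of taxon names and that the windowed trees have already been built by some tree-construction method; the tree-building step is not shown, and the names `splits`, `partition_metric`, and `scan` are illustrative, not the presenters' implementation.

```python
from functools import reduce

# A tree is a taxon name (str) or a tuple of subtrees. The rooting implied by
# the nesting is ignored, so trees are compared as unrooted topologies.


def _taxa(tree) -> frozenset:
    if isinstance(tree, str):
        return frozenset([tree])
    return reduce(frozenset.union, (_taxa(child) for child in tree), frozenset())


def splits(tree) -> set:
    """Non-trivial bipartitions of the unrooted tree. Each split is stored as
    the side that excludes a fixed reference taxon, so equal splits compare equal."""
    everything = _taxa(tree)
    ref = min(everything)
    found = set()

    def walk(node) -> frozenset:
        if isinstance(node, str):
            return frozenset([node])
        below = reduce(frozenset.union, (walk(c) for c in node), frozenset())
        side = everything - below if ref in below else below
        if 1 < len(side) < len(everything) - 1:  # skip trivial splits
            found.add(side)
        return below

    walk(tree)
    return found


def partition_metric(t1, t2) -> int:
    """Number of splits present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


def scan(whole_tree, windowed_trees) -> list[int]:
    """Partition metric score of each windowed tree against the whole tree."""
    return [partition_metric(whole_tree, w) for w in windowed_trees]


# Hypothetical five-taxon example.
WHOLE = ((("A", "B"), "C"), ("D", "E"))
W1 = ((("A", "C"), "B"), ("D", "E"))
W2 = ((("A", "D"), "B"), ("C", "E"))
print(scan(WHOLE, [WHOLE, W1, W2]))  # [0, 2, 4]
```

When both trees are fully resolved over the same taxa, each has the same number of splits, so the score is always even, which is consistent with the example scores on the slides.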

Phylogenetic Motif Identification
Compare every windowed tree with the whole tree and record the partition metric scores.
Normalize all partition metric scores by calculating z-scores.
Call these normalized scores Phylogenetic Similarity Z-scores (PSZ).
Set a PSZ threshold for identifying windows that represent phylogenetic motifs.
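
The PSZ steps can be sketched as follows, using the example partition metric scores from the scan as input. Two details are assumptions rather than statements from the slides: the z-scores use the population standard deviation, and because a lower partition metric means closer agreement with the whole tree, windows are flagged when their PSZ is at or below a threshold. The default threshold of -2.0 is hypothetical.

```python
import statistics


def psz_scores(scores: list[float]) -> list[float]:
    """Phylogenetic Similarity Z-scores: partition metric scores as z-scores."""
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:  # every window scored the same; nothing stands out
        return [0.0] * len(scores)
    return [(s - mu) / sd for s in scores]


def phylogenetic_motif_windows(scores: list[float],
                               threshold: float = -2.0) -> list[int]:
    """Indices of windows whose PSZ is at or below `threshold`.

    Assumed sign convention: a small partition metric marks a tree-like
    window, so candidate PMs sit in the negative tail.
    """
    return [i for i, z in enumerate(psz_scores(scores)) if z <= threshold]


# Example partition metric scores from the "Scan for Similar Trees" slides.
SCORES = [6, 8, 4, 6, 8, 6, 6, 0, 6, 6, 8, 0, 6, 6, 6]
print(phylogenetic_motif_windows(SCORES))  # [7, 11]
```

The two flagged windows are the ones whose windowed tree matched the whole tree exactly (score 0). Loosening the threshold admits more windows, so the threshold directly controls how many PMs are reported.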

[Figure slides: Set PSZ Threshold; Regions of PMs; Map PMs to the Structure; PMs in Various Structures; PMs and Traditional Motifs]

[Figure slides: Phylogenetic Similarity and False Positive Expectation plots for TIM, Cytochrome P450, Enolase, Glycerol Kinase, and Myoglobin]

Evaluating alignments
For a given alignment, compute the PMs.
Determine the number of functional PMs.
Alignments that identify more functional PMs are classified as better.
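
Counting functional PMs needs a rule for when a motif counts as functional. The sketch below assumes each PM is a half-open window of alignment columns, maps those columns to residue numbers of one reference sequence, and calls a PM functional if it covers at least one residue of the annotated functional site. The mapping through a reference sequence, the overlap rule, and all names and data are hypothetical.

```python
def column_to_residue(row: str, gap: str = "-") -> dict[int, int]:
    """Map alignment column -> 1-based residue number for one gapped row."""
    mapping, residue = {}, 0
    for col, ch in enumerate(row):
        if ch != gap:
            residue += 1
            mapping[col] = residue
    return mapping


def count_functional_pms(pm_windows: list[tuple[int, int]],
                         reference_row: str,
                         functional_residues: set[int]) -> int:
    """Count PM windows [start, end) that cover at least one functional residue
    of the reference sequence (gap columns cover no residue)."""
    col2res = column_to_residue(reference_row)
    count = 0
    for start, end in pm_windows:
        covered = {col2res[c] for c in range(start, end) if c in col2res}
        if covered & functional_residues:
            count += 1
    return count


# Hypothetical data: the same reference sequence in two different alignments.
FUNCTIONAL = {4, 8}
REF_A, PMS_A = "MKT-AYIAK", [(0, 2), (3, 6), (7, 9)]
REF_B, PMS_B = "MKTAYIAK", [(0, 3), (4, 7)]
print(count_functional_pms(PMS_A, REF_A, FUNCTIONAL),
      count_functional_pms(PMS_B, REF_B, FUNCTIONAL))  # 2 0
```

Mapping columns back to residue numbers lets PMs from different alignments be judged against the same annotation. The alignment with the higher count (for instance, a PAl alignment versus a MUSCLE alignment of the same family) would be ranked better under this criterion.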

[Slides: Protein datasets; Running time]

Functional PMs
[Figure: functional PMs found with PAl (blue), MUSCLE (red), or both (green). Panels: (a) enolase, (b) ammonia channel, (c) triosephosphate isomerase, (d) permease, (e) cytochrome]