Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases. Venkatarajan S. Mathura et al.

Identifying property based sequence motifs in protein families and superfamilies: application to DNase-1 related endonucleases
Venkatarajan S. Mathura et al.
Presented by Mr. Hat

Motivation
“Statistically derived matrices based on allowed substitution of amino acids are not designed to detect conservation of physical–chemical properties”
HMMER, PSI-/PHI-BLAST and RPS-BLAST, to name a few
MASIA could complement these existing gene mining tools

Methods
Created quantitative descriptors E1 – E5 that describe amino acid properties and their physical interpretation
Created from a comprehensive list of 237 physical–chemical properties (PCPs)
They measured conservation by the standard deviation and relative entropy of the values E1 – E5
Venkatarajan and Braun 2001
Defined a minimum length cutoff and a maximum gap threshold
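To make the conservation measure on the Methods slide concrete, here is a minimal Python sketch of how the standard deviation and relative entropy of one descriptor could be computed for a single alignment column. The descriptor values, the histogram binning, the pseudocount and the uniform background are all assumptions made for illustration; they are not the published E1 – E5 values or the exact procedure used in MASIA.

```python
import math
from statistics import mean, pstdev

# HYPOTHETICAL descriptor values (E1..E5) for a few residues.
# Placeholders for illustration only, not the published values.
DESCRIPTORS = {
    "A": (0.1, -0.2, 0.3, 0.0, 0.1),
    "V": (0.2, -0.1, 0.2, 0.1, 0.0),
    "L": (0.3, -0.1, 0.1, 0.1, 0.0),
    "I": (0.3, -0.2, 0.2, 0.1, 0.1),
    "D": (-0.4, 0.3, -0.1, 0.2, -0.1),
    "E": (-0.4, 0.2, -0.2, 0.1, -0.1),
    "K": (-0.3, 0.4, 0.2, -0.2, 0.0),
    "G": (0.0, -0.3, -0.3, 0.0, 0.2),
}


def histogram(values, lo=-0.5, hi=0.5, bins=5, pseudo=1.0):
    """Bin values into a probability distribution; the pseudocount avoids log(0)."""
    counts = [pseudo] * bins
    width = (hi - lo) / bins
    for v in values:
        # Values outside [lo, hi] are clamped into the first or last bin.
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def column_conservation(column, component):
    """Return (mean, standard deviation, relative entropy) of one descriptor in a column.

    Assumption: the background is every residue in the toy table, equally weighted.
    """
    values = [DESCRIPTORS[aa][component] for aa in column]
    background = [DESCRIPTORS[aa][component] for aa in DESCRIPTORS]
    p = histogram(values)
    q = histogram(background)
    rel_entropy = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return mean(values), pstdev(values), rel_entropy


if __name__ == "__main__":
    for col in ("LLLL", "LIVL", "AVLIDEKG"):
        print(col, column_conservation(col, component=0))
```

In this sketch a conserved column shows up as a low standard deviation together with a relative entropy well above zero, while a column that simply mirrors the background gives a relative entropy of zero.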

Experiment
Used APE family sequences from 42 organisms
Both prokaryotes and eukaryotes
Used taxonomic classification to remove a bunch of the redundant data
Each motif is represented as a “profile”
Consisting of average values, standard deviations and relative entropies for each vector E1 – E5
MASIA MOTIF MAKER

Experiment (cont.)
Used these profiles to search the ASTRAL40 database
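As a rough illustration of what searching with a property profile could look like, the Python sketch below builds a per-position profile (mean and standard deviation of each descriptor) from a few aligned motif instances and slides it along a sequence. The descriptor table, the example motifs, the scoring function and the `SMOOTHING` constant are all hypothetical; the slides do not give MASIA's actual scoring scheme, so this should be read as one plausible approach rather than the authors' method.

```python
from statistics import mean, pstdev

# HYPOTHETICAL descriptor values (E1..E5); placeholders, not the published ones.
DESCRIPTORS = {
    "A": (0.1, -0.2, 0.3, 0.0, 0.1),
    "V": (0.2, -0.1, 0.2, 0.1, 0.0),
    "L": (0.3, -0.1, 0.1, 0.1, 0.0),
    "I": (0.3, -0.2, 0.2, 0.1, 0.1),
    "D": (-0.4, 0.3, -0.1, 0.2, -0.1),
    "E": (-0.4, 0.2, -0.2, 0.1, -0.1),
    "K": (-0.3, 0.4, 0.2, -0.2, 0.0),
    "G": (0.0, -0.3, -0.3, 0.0, 0.2),
}
N_COMPONENTS = 5
SMOOTHING = 0.1  # assumed constant so fully conserved positions do not divide by zero


def build_profile(aligned_motifs):
    """One list per motif position holding (mean, std) for each descriptor."""
    profile = []
    for column in zip(*aligned_motifs):
        position = []
        for k in range(N_COMPONENTS):
            values = [DESCRIPTORS[aa][k] for aa in column]
            position.append((mean(values), pstdev(values)))
        profile.append(position)
    return profile


def window_score(window, profile):
    """Negative sum of squared, std-scaled deviations; 0 means a perfect match."""
    score = 0.0
    for aa, position in zip(window, profile):
        for value, (avg, sd) in zip(DESCRIPTORS[aa], position):
            score -= ((value - avg) / (sd + SMOOTHING)) ** 2
    return score


def best_hit(sequence, profile):
    """Return (score, start index) of the best-scoring window in the sequence."""
    width = len(profile)
    hits = [
        (window_score(sequence[i:i + width], profile), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(hits)


if __name__ == "__main__":
    profile = build_profile(["LDK", "IDK", "VEK"])  # made-up motif instances
    print(best_hit("GGALDKGA", profile))
```

Dividing each deviation by the positional standard deviation means that tightly conserved properties penalize mismatches more heavily than variable ones, which is the intuition behind storing standard deviations in the profile.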

Example score matrix for motif 2 and its corresponding E1 – E5 values
* means low relative entropy
+ means significant component
- means not a significant component

Results
MASIA tool found all DNase-like superfamily members in ASTRAL40
But this doesn’t show specificity??
PSI-BLAST, default parameters
Used all 42 sequences to seed PSI-BLAST
Performed local and NCBI PSI-BLAST
Searched “non-redundant sequence database” (NR/NT???)
Found no DNase-I or IPP sequences after several iterations

Results (cont.)
PSI-BLAST (cont.)
E-value threshold was increased to 0.1
DNase-I was found after four iterations, but it also brought in 500 other junk sequences
Failed to find DNase-I in the ASTRAL40 database

How ’bout them PCP motifs and MASIA! This could possibly improve my gene hunting capabilities! Now if I just had fingers to type! By the way, where are those fat BBS mice? I’m getting hungry!