Protein domain/family databases
- Secondary databases are the product of analyses of the sequences found in the primary sequence databases.
- They are either manually curated (e.g. PROSITE, Pfam) or automatically generated (e.g. ProDom, DOMO).
- Each uses a different method to detect whether a protein belongs to a particular domain or family (patterns, profiles, HMMs).

Protein domain/family
- Most proteins have "modular" structures. Estimate: ~3 domains per protein.
- Domains (conserved sequences or structures) are identified by multiple sequence alignments.
- Domains can be defined by different methods:
  - Patterns (regular expressions): used for very conserved domains.
  - Profiles (weighted matrices): two-dimensional tables of position-specific match, gap, and insertion scores, derived from aligned sequence families; used for less conserved domains.
  - Hidden Markov Models (HMMs): probabilistic models; another method to generate profiles.
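To make the "weighted matrix" idea concrete, here is a minimal Python sketch of a profile built from an ungapped alignment. It is a simplification, not the method of any particular database: it keeps only match scores (no gap or insertion scores), assumes a uniform background frequency, and uses a made-up toy alignment. The names `build_profile`, `score_window` and `best_hit` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds match scores per alignment column.

    Assumptions: the alignment is ungapped, all rows have equal length,
    and the background frequency of every residue is uniform (1/20).
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum the position-specific scores of a window as wide as the profile."""
    return sum(column[aa] for column, aa in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy alignment (invented for illustration, not taken from a real family).
toy_alignment = ["GKSGT", "GKSGS", "GKTGT", "GRSGT"]
toy_profile = build_profile(toy_alignment)
score, start = best_hit(toy_profile, "AAAGKSGTLLL")
```

Because each column scores residues independently, a sequence that differs from the consensus at one or two positions still scores well, which is why profiles suit less conserved domains better than a strict pattern does.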

Some statistics
15 most common domains for H. sapiens (incomplete):
- Immunoglobulin and major histocompatibility complex domain
- Zinc finger, C2H2 type
- Eukaryotic protein kinase
- Rhodopsin-like GPCR superfamily
- Pleckstrin homology (PH) domain
- Zinc finger, RING type
- Src homology 3 (SH3) domain
- RNA-binding region RNP-1 (RNA recognition motif)
- EF-hand family
- Homeobox domain
- KRAB box
- PDZ domain (also known as DHR or GLGF)
- Fibronectin type III domain
- EGF-like domain
- Cadherin domain
- …

Protein domain/family databases
  PROSITE    Patterns / Profiles
  ProDom     Aligned motifs
  PRINTS     Aligned motifs
  Pfam       HMM (Hidden Markov Models)
  SMART      HMM
  BLOCKS     Aligned motifs
  InterPro

Prosite
- Created in 1988 (SIB).
- Contains fully annotated functional domains, based on two methods: patterns and profiles.
- Entries are deposited in PROSITE in two distinct files:
  - Patterns/profiles, with the list of all matches in SWISS-PROT
  - Documentation
- Aug 2001: contains 1089 documentation entries that describe 1474 different patterns, rules and profiles/matrices.
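A PROSITE pattern is close to a regular expression, so a short Python sketch can show how one might be searched against a sequence. The converter below covers only the common syntax elements (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, and `<` / `>` used outside brackets); the function names are hypothetical, and the test sequence is invented. The two patterns used are the well-known N-glycosylation site, `N-{P}-[ST]-{P}`, and the P-loop motif, `[AG]-x(4)-G-K-[ST]`.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplification: anchors written inside square brackets are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means "any residue except A, B or C".
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) is a repeat count; done after the step above so the
        # braces produced here are not rewritten again.
        element = element.replace("(", "{").replace(")", "}")
        # x means "any residue"; residues are upper case, so this is safe.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_matches(pattern, sequence):
    """Return (start, matched text) for every match, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


glyco_regex = prosite_to_regex("N-{P}-[ST]-{P}.")
ploop_regex = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
hits = find_matches("N-{P}-[ST]-{P}", "MKNGSAANPTL")
```

In the invented sequence, the second asparagine is followed by a proline, so only the first site is reported. Short patterns like this one match many unrelated sequences by chance, which is one reason each PROSITE entry comes with documentation and a list of matches.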

Diagnostic performance List of matches

Prosite (profile): example

PFAM (HMMs): an entry

PFAM (HMMs): query output

HMMs

- Most protein families are characterized by several conserved motifs.
- Fingerprint: a set of motifs (simple or composite, such as multidomains) that serves as the signature of family membership.
- True family members exhibit all elements of the fingerprint, while subfamily members may possess only part of it.
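The all-versus-part logic of a fingerprint can be sketched in a few lines of Python. This is only an illustration of the decision rule: the motifs are written as invented regular expressions, whereas a real fingerprint database scores aligned motifs, and the function name `classify_by_fingerprint` is hypothetical.

```python
import re


def classify_by_fingerprint(sequence, motifs):
    """Report how much of a fingerprint a sequence contains.

    `motifs` maps a motif name to a regular expression (an assumption made
    for this sketch). Returns a verdict and the names of the motifs found.
    """
    found = [name for name, regex in motifs.items() if re.search(regex, sequence)]
    if len(found) == len(motifs):
        verdict = "full fingerprint"
    elif found:
        verdict = "partial fingerprint"
    else:
        verdict = "no match"
    return verdict, found


# Invented three-motif fingerprint and toy sequences.
toy_fingerprint = {
    "motif1": "GKS[GT]",
    "motif2": "DE[AV]H",
    "motif3": "W..C",
}
full = classify_by_fingerprint("AAGKSGAADEAHAAWLLC", toy_fingerprint)
partial = classify_by_fingerprint("AAGKSGAAAA", toy_fingerprint)
none = classify_by_fingerprint("AAAA", toy_fingerprint)
```

A sequence with the full fingerprint would be treated as a likely family member, and one with a partial fingerprint as a candidate subfamily member or more distant relative.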

ProDom
- Consists of an automated compilation of homologous domain alignments.
- August 2001: 390 ProDom families, generated automatically using PSI-BLAST.
- Built from non-fragmentary sequences from SWISS-PROT 39 + TrEMBL (May 29th, 2000).

ProDom: query output example Your query

Protein domain/family: composite databases
Example: InterPro
- Unifies PROSITE, PRINTS, Pfam, ProDom and SMART into an integrated resource of protein families, domains and functional sites.
- Provides a single set of documents linked to the various methods.
- Will be used to improve the functional annotation of SWISS-PROT (classification of unknown proteins…).
- This release (3.2, July 2001) contains 3939 entries, representing 1009 domains, 2850 families, 65 repeats and 15 post-translational modification sites.