Protein Domain Database

Slides:



Advertisements
Similar presentations
Integration of Protein Family, Function, Structure Rich Links to >90 Databases Value-Added Reports for UniProtKB Proteins iProClass Protein Knowledgebase.
Advertisements

1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Pfam(Protein families )
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
EBI is an Outstation of the European Molecular Biology Laboratory. Alex Mitchell InterPro team Using InterPro for functional analysis.
Profiles for Sequences
©CMBI 2005 Exploring Protein Sequences - Part 2 Part 1: Patterns and Motifs Profiles Hydropathy Plots Transmembrane helices Antigenic Prediction Signal.
© Wiley Publishing All Rights Reserved. Analyzing Protein Sequences.
InterPro/prosite UCSC Genome Browser Exercise 3. Turning information into knowledge  The outcome of a sequencing project is masses of raw data  The.
Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT. 2 HMM Architecture Markov Chains What is a Hidden Markov Model(HMM)? Components of HMM Problems of HMMs.
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT
Corrections. N-linked glycosylation (GlcNac): Look at the Swiss-Prot annotation (in a random ‘glycosylated’ entry)
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Tutorial 5 Motif discovery.
Protein Modules An Introduction to Bioinformatics.
Pattern databases in protein analysis Arthur Gruber Instituto de Ciências Biomédicas Universidade de São Paulo AG-ICB-USP.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Today’s menu: -SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
Sequence/Structure Alignment Resources from NCBI Steve Bryant Protein Data Bank Rutgers University November 19, 2005.
Protein and Function Databases
Introduction to Bioinformatics - Tutorial no. 8 Protein Prediction: - PROSITE - Pfam - SCOP - TOPITS - genThreader.
Today’s menu: -UniProt - SwissProt/TrEMBL -PROSITE -Pfam -Gene Onltology Protein and Function Databases Tutorial 7.
ExPASy - Expert Protein Analysis System The bioinformatics resource portal and other resources An Overview.
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Pattern databasesPattern databasesPattern databasesPattern databases Gopalan Vivek.
Automatic methods for functional annotation of sequences Petri Törönen.
Wellcome Trust Workshop Working with Pathogen Genomes Module 3 Sequence and Protein Analysis (Using web-based tools)
PROTEIN PATTERN DATABASES. PROTEIN SEQUENCES SUPERFAMILY FAMILY DOMAIN MOTIF SITE RESIDUE.
Good solutions are advantageous Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
Biology 224 Instructor: Tom Peavy Feb 21 & 26, Protein Structure & Analysis.
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Multiple Alignments Motifs/Profiles What is multiple alignment? HOW does one do this? WHY does one do this? What do we mean by a motif or profile? BIO520.
Module 3 Sequence and Protein Analysis (Using web-based tools) Working with Pathogen Genomes - Uruguay 2008.
Functional Annotation of Proteins via the CAFA Challenge Lee Tien Duncan Renfrow-Symon Shilpa Nadimpalli Mengfei Cao COMP150PBT | Fall 2010.
1 LSM2241 AY0910 Semester 2 MiniProject Briefing Round 5.
BLOCKS Multiply aligned ungapped segments corresponding to most highly conserved regions of proteins- represented in profile.
Protein Database David Shiuan Department of Life Science Institute of Biotechnology Interdisciplinary Program of Bioinformatics National Dong Hwa University.
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
PIRSF Classification System PIRSF: Evolutionary relationships of proteins from super- to sub-families Homeomorphic Family: Homologous proteins sharing.
Protein and RNA Families
Mining Biological Data. Protein Enzymatic ProteinsTransport ProteinsRegulatory Proteins Storage ProteinsHormonal ProteinsReceptor Proteins.
Motif discovery and Protein Databases Tutorial 5.
Copyright OpenHelix. No use or reproduction without express written consent1.
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
PROTEIN PATTERN DATABASES. PROTEIN SEQUENCES SUPERFAMILY FAMILY DOMAIN MOTIF SITE RESIDUE.
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
March 28, 2002 NIH Proteomics Workshop Bethesda, MD Lai-Su Yeh, Ph.D. Protein Scientist, National Biomedical Research Foundation Demo: Protein Information.
Exercises Pairwise alignment Homology search (BLAST) Multiple alignment (CLUSTAL W) Iterative Profile Search: Profile Search –Pfam –Prosite –PSI-BLAST.
Protein domain/family db Secondary databases are the fruit of analyses of the sequences found in the primary sequence db Either manually curated (i.e.
InterPro Sandra Orchard.
Welcome to the Protein Database Tutorial. This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
 What is MSA (Multiple Sequence Alignment)? What is it good for? How do I use it?  Software and algorithms The programs How they work? Which to use?
EBI is an Outstation of the European Molecular Biology Laboratory. A web based integrated search service to understand ligand binding and secondary structure.
Protein families, domains and motifs in functional prediction May 31, 2016.
A New Interface to GeneKeyDB Methods for analyzing relationships among proteins based on shared motifs Chris Symons & Xinxia Peng.
Introduction to Profile HMMs
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
Demo: Protein Information Resource
Pfam: multiple sequence alignments and HMM-profiles of protein domains
Genome Annotation Continued
Genome Center of Wisconsin, UW-Madison
Predicting Active Site Residue Annotations in the Pfam Database
There are four levels of structure in proteins
Sequence Based Analysis Tutorial
Sequence Based Analysis Tutorial
BLAST.
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Presentation transcript:

Protein Domain Database A domain is a portion of an overall protein chain that is self-stabilizing, which is identified by some characteristic patterns Proteins may have multiple domains, each domain has a distinct evolutionary origin and function Protein functionality is determined by the combination of the domains Protein domain databases provide catalog of sequence patterns, domains, structures Some such databases include: PROSITE PFAM CDD

PROSITE PROSITE is a database of protein families and domains consisting of: biologically significant sites Patterns that help identify protein families a new sequence http://www.expasy.org/prosite/ Sample prosite query: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S [] – indicates an OR for the next entry based on the letters between the brackets A letter – (such as R, above) indicates the next entry MUST be the letter x(2,3) - 2 or 3 random entries x – any random entry Query above may be read: A OR V, then R, then N OR F OR Y, then R, then any 2 or 3 entries, then S OR T, then any 1 entry, then S, then any 1 entry, finished by an S Some matching sequences: ARFRCGHTYSDS VRYRRRTRSRS

Recognizing ProSite patterns Given is the result for a random sequence search based on the pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S Most significant profile domain found by the search: “P1 Protamine,” seen below SOURCE: http://www.expasy.org/prosite/

PFAM PFAM is a collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families The PFAM database may be search for similarity to a query protein sequence PFAM may also be used to analyze proteomes and domain architectures http://pfam.sanger.ac.uk/

PFAM Entries An entry in PFAM will consist mainly of text, literature references, and links to similar patterns in PFAM Entries in PFAM are classified in four ways: Family: A collection of related proteins Domain: A structural unit which can be found in multiple protein contexts Repeat: A short unit which is unstable in isolation but forms a stable structure when multiple copies are present Motifs: A short unit found outside globular domains Source: http://pfam.sanger.ac.uk/family?acc=PF00023

Tools Using PFAM Conserved Domain Architecture Retrieval Tool (CDART) performs similarity searches of the NCBI Entrez Protein Database based on domain architecture, defined as the sequential order of conserved domains in proteins http://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi?cmd=rps iPFAM is a tool for describing physical interactions between PFAM domains http://ipfam.sanger.ac.uk/

Conserved Domain Database NCBI's Conserved Domain Database (CDD) is a collection of multiple sequence alignments for ancient domains To identify conserved domains present in a protein query, access the CD-Search service at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi A structure viewer tool, Cn3D, allows to view the domain of the Hemoglobin-like flavoprotein (Hmp) shown above. SOURCE: http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi?uid=COG1017