Pfam: a resource for remote homology and domain identification (… et al., NAR 2014).

Presentation transcript:

Building families (EMBO Workshop, Cape Town, 2014): identify target -> build SEED MSA of representative members -> build profile-HMM -> search UniProtKB -> set significance thresholds -> run QCs and fix -> annotate (or abandon).
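
The "significance thresholds" step can be pictured with a small sketch. It assumes the curator has already marked each search hit as a true member or a false positive, and that the cutoff is simply placed just above the best-scoring false positive. The function name `choose_gathering_threshold` and the `margin` parameter are hypothetical illustrations, not Pfam's actual tooling.

```python
from typing import Iterable, Optional, Tuple


def choose_gathering_threshold(
    hits: Iterable[Tuple[float, bool]], margin: float = 0.1
) -> Optional[float]:
    """Pick a bit-score cutoff that excludes every hit judged to be a false positive.

    hits: (bit_score, is_false_positive) pairs from reviewing the search results.
    margin: hypothetical gap placed above the best-scoring false positive.
    Returns None when no true hit would survive (revise the SEED or abandon).
    """
    hits = list(hits)
    false_scores = [score for score, is_fp in hits if is_fp]
    true_scores = [score for score, is_fp in hits if not is_fp]
    if not true_scores:
        return None  # nothing worth building a family from
    if not false_scores:
        return min(true_scores)  # every hit is trusted: keep them all
    # Place the cutoff just above the best-scoring false positive.
    cutoff = max(false_scores) + margin
    if not any(score >= cutoff for score in true_scores):
        return None  # a false positive outscores every true hit
    return cutoff


hits = [(50.0, False), (30.0, False), (25.0, True), (20.0, False)]
print(choose_gathering_threshold(hits))
```

In this toy list the true hit at 20.0 falls below the cutoff and would be missed: raising the threshold to exclude false positives costs sensitivity, which is why the workflow loops back through QC or ends in "abandon".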

QC: family overlaps (old family vs. new family).

QC: family overlaps. Example sequence matched by both the old and the new family: SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMI SN

QC: family overlaps. Case A: the old and new families are evolutionarily related. Evidence to check: the nature of the overlaps, profile-profile comparison, functional residues, functional annotation, structure.

QC: family overlaps. Families evolutionarily related, solution 1: merge the old and new families.

QC: family overlaps. Families evolutionarily related, solution 2: create a clan, or add the new family to the clan containing the old family.

QC: family overlaps. Case B: the old and new families are NOT evolutionarily related, so the overlaps might be false positives.

QC: family overlaps. Families NOT evolutionarily related, solution 1: separate them (expunge sequences from the SEED, trim the ends, raise the threshold).

QC: family overlaps. Families NOT evolutionarily related, solution 2: edit manually (no change to the family, but the overlapping sequence is removed).
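
The overlap check behind these slides can be sketched as a pairwise comparison of hit coordinates on each sequence. This is a minimal sketch, assuming each hit is a (family, sequence, start, end) record with 1-based inclusive coordinates, and assuming that overlaps between families in the same clan are tolerated (the slides list "create/add to clan" as a way to resolve related overlaps). `DomainHit`, `find_family_overlaps`, `same_clan` and the family names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DomainHit:
    family: str
    seq_id: str
    start: int  # 1-based, inclusive
    end: int  # 1-based, inclusive


def overlap_length(a: DomainHit, b: DomainHit) -> int:
    """Number of residues shared by two hits on the same sequence (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_family_overlaps(
    hits: List[DomainHit],
    same_clan: Callable[[str, str], bool] = lambda f, g: False,
) -> List[Tuple[DomainHit, DomainHit, int]]:
    """Report overlapping hits from different families that are not clan mates."""
    overlaps = []
    for a, b in combinations(hits, 2):
        if a.seq_id != b.seq_id or a.family == b.family:
            continue
        if same_clan(a.family, b.family):
            continue  # assumed tolerated: clan members compete for the region
        shared = overlap_length(a, b)
        if shared > 0:
            overlaps.append((a, b, shared))
    return overlaps


hits = [
    DomainHit("OldFamily", "SEQ1", 1, 60),
    DomainHit("NewFamily", "SEQ1", 50, 100),
    DomainHit("NewFamily", "SEQ2", 1, 40),
]
for a, b, shared in find_family_overlaps(hits):
    print(f"{a.seq_id}: {a.family} {a.start}-{a.end} overlaps "
          f"{b.family} {b.start}-{b.end} by {shared} residues")
```

Each reported overlap then goes through the decision shown on the slides: related families are merged or clanned; unrelated ones are separated or edited.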

False positive detection: overlaps; hit score vs. taxonomic distribution; known annotation (e.g. functional/structural residues); known structures; …
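
One of these signals, hit score vs. taxonomic distribution, can be turned into a rough filter. The sketch below rests on a heuristic of our own, not a documented Pfam rule: it flags hits in the low-score tail whose taxon is rare among the higher-scoring hits, so a curator can inspect them first. `flag_suspicious_hits` and its two cutoff parameters are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def flag_suspicious_hits(
    hits: List[Tuple[str, float, str]],
    low_fraction: float = 0.25,
    min_taxon_share: float = 0.05,
) -> List[str]:
    """Flag low-scoring hits whose taxon is rare among the higher-scoring hits.

    hits: (seq_id, bit_score, taxon) tuples.
    low_fraction: share of the ranked list treated as the low-score tail.
    min_taxon_share: a tail hit is flagged if its taxon makes up less than
    this fraction of the higher-scoring hits.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    if not ranked:
        return []
    n_low = max(1, int(len(ranked) * low_fraction))
    high, low = ranked[:-n_low], ranked[-n_low:]
    taxon_counts = Counter(taxon for _, _, taxon in high)
    flagged = []
    for seq_id, _score, taxon in low:
        share = taxon_counts[taxon] / len(high) if high else 0.0
        if share < min_taxon_share:
            flagged.append(seq_id)
    return flagged


hits = [
    ("b1", 100.0, "Bacteria"), ("b2", 95.0, "Bacteria"), ("b3", 90.0, "Bacteria"),
    ("b4", 85.0, "Bacteria"), ("b5", 80.0, "Bacteria"), ("b6", 75.0, "Bacteria"),
    ("h1", 22.0, "Metazoa"), ("b7", 21.0, "Bacteria"),
]
print(flag_suspicious_hits(hits))
```

A flagged hit is not proven wrong. It is a candidate to check against the other signals listed above (overlaps, functional residues, known structures).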

Are all Pfam families structural domains?

Pfam families with/without a PDB structure: PDB (43%), no PDB (57%).

Pfam types: family, domain, repeat, motif.

Domains and repeats: A - domain; B - metal-stabilised domain; C - 7 repeats form a domain; D - 9 repeats form a domain (the number of repeats could be unlimited).

Motifs. Example: lipoprotein attachment site, LPAM_1 (alignment coloured by residue type).

Pfam types: family, domain, repeat, disordered family?

PDBid: 2JGC

The Pfam website

Pfam families’ interactions: iPfam (Finn et al., NAR 2013), http://

Some caveats (TUM, January 2013): identifying repeats is challenging, especially with HMMER3, which aligns locally; functional diversity within families and clans; Domains of Unknown Function; family boundaries when no structure is available.

Comparison of the enolase clan/superfamily in Pfam and SFLD (SFLD: Akiva et al., NAR 2013). Picture courtesy of Patsy Babbitt (UCSF).

How far are we from covering the sequence space? H. sapiens (from the Pfam blog).

Building a Pfam family

Pick a target region: 2KX7. Open Chimera, then: 1. File -> Open “2KX7.pdb”; 2. …

Pick a target region: 1. Actions -> Ribbon -> hide (2KX7 model 1); 2. select “2KX7.pdb (#0.1) chain A”, then Actions -> Ribbon -> show.

2KX7 (Schmöe et al., Structure 2011): the Rcs signalling system, a bacterial two-component system (sensor kinase + response regulator).

Pick a target region: look up UniProtKB ID P39838 on the Pfam website (…).

Pick a target region: 2KX7 (Schmöe et al., Structure 2011); labelled regions: HK, S, S, ABL, HPt.

Look for homologs: on the HMMER website (Finn et al., NAR 2011), click Start.

Look for homologs: choose “Marco-Data/Other/2KX7.fasta”.

Select your dataset: select rp75 in Sequence Database.

Parse hits (click …).

Check conservation and coverage.

Check low scores (scroll down).
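
Checking the low scores can also be done outside the browser if the hits are saved as a table. This sketch assumes HMMER's per-sequence tabular output (the `--tblout` option of a local `phmmer` or `hmmsearch` run): 18 whitespace-separated columns followed by a free-text description, with comment lines starting with `#`. Whether the HMMER website offers this exact export is not stated in the slides. `TblHit`, `parse_tblout`, `lowest_scoring` and the example rows are hypothetical.

```python
from typing import List, NamedTuple


class TblHit(NamedTuple):
    target: str
    query: str
    evalue: float  # full-sequence E-value (column 5)
    score: float  # full-sequence bit score (column 6)
    description: str


def parse_tblout(text: str) -> List[TblHit]:
    """Parse per-sequence hits from HMMER's --tblout tabular output."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # 18 whitespace-separated columns, then the target description.
        fields = line.split(maxsplit=18)
        hits.append(
            TblHit(
                target=fields[0],
                query=fields[2],
                evalue=float(fields[4]),
                score=float(fields[5]),
                description=fields[18] if len(fields) > 18 else "",
            )
        )
    return hits


def lowest_scoring(hits: List[TblHit], n: int = 5) -> List[TblHit]:
    """Return the n weakest hits: the first ones to inspect for false positives."""
    return sorted(hits, key=lambda h: h.score)[:n]


example = """\
# target name  accession  query name  accession  E-value  score ...
sp|P1|EXAMPLE1  -  2KX7  -  1.2e-30  105.3  0.1  1.5e-30  105.0  0.1  1.0  1  1  0  1  1  1  1  Example protein one
tr|Q2|EXAMPLE2  -  2KX7  -  0.003  21.7  0.5  0.004  21.2  0.5  1.1  1  1  0  1  1  1  1  Example protein two
"""
for hit in lowest_scoring(parse_tblout(example), n=1):
    print(hit.target, hit.score, hit.evalue)
```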

Check taxonomic distribution (click Taxonomy).

Check domain architectures/overlaps (click Domain).

Download aligned hits: 1. Click Download, then Aligned FASTA. 2. Save as “RcsD-ABL-hmmer-ali.fasta”.

Open Jalview: 1. File -> Input Alignment -> From File “RcsD-ABL-hmmer-ali.fasta”; 2. Manipulate the alignment.
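
Two common manipulations (dropping gappy columns and dropping sequences that cover little of the alignment) can be sketched in code to show what the conservation and coverage checks measure. This is an assumption-laden illustration, not what Jalview does internally. It assumes `-` and `.` are the gap characters, treats upper- and lower-case residues alike, and uses made-up default cutoffs. `read_aligned_fasta`, `column_stats` and `trim_alignment` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

GAP_CHARS = set("-.")  # assumed gap symbols


def read_aligned_fasta(text: str) -> Dict[str, str]:
    """Read aligned FASTA (sequences may wrap over several lines)."""
    records: Dict[str, str] = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    if len({len(seq) for seq in records.values()}) > 1:
        raise ValueError("sequences differ in length, so this is not an alignment")
    return records


def column_stats(records: Dict[str, str]) -> List[Tuple[float, float]]:
    """Per column: (occupancy = non-gap fraction, conservation = top-residue fraction)."""
    stats = []
    for column in zip(*records.values()):
        residues = [c.upper() for c in column if c not in GAP_CHARS]
        occupancy = len(residues) / len(column)
        conservation = (
            Counter(residues).most_common(1)[0][1] / len(residues) if residues else 0.0
        )
        stats.append((occupancy, conservation))
    return stats


def trim_alignment(
    records: Dict[str, str], min_occupancy: float = 0.5, min_seq_coverage: float = 0.5
) -> Dict[str, str]:
    """Drop gappy columns, then drop sequences covering too little of what remains."""
    keep = [i for i, (occ, _) in enumerate(column_stats(records)) if occ >= min_occupancy]
    trimmed = {}
    for name, seq in records.items():
        new_seq = "".join(seq[i] for i in keep)
        residues = sum(c not in GAP_CHARS for c in new_seq)
        if new_seq and residues / len(new_seq) >= min_seq_coverage:
            trimmed[name] = new_seq
    return trimmed


aln = read_aligned_fasta(">s1\nMK-LV\n>s2\nMKALV\n>s3\nMR--V\n>s4\n--A--\n")
print(trim_alignment(aln, min_occupancy=0.6))
```

Trimming columns first and then judging sequence coverage on the trimmed alignment is a design choice here: a fragment that only fills columns that are later removed ends up with low coverage and is dropped too.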