A brief on: Domain Families & Classification


Classification to Families
We can classify proteins into families by:
A. Sequence
B. Structure
C. Function (annotation)
D. Evolution

Used Terms
Motif = Domain = Signature = Profile = Seed
Family = Cluster
These terms are used interchangeably; they are very (too) flexible.

Motif = Domain = Function ???
A motif is a sequence signature.
Structural definition of a domain: an independently folding structural unit.
A protein family is not well-defined.
Protein function is not well-defined (some proteins can have several functions).
Conclusion: these terms are used interchangeably, but they are very flexible.
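One way to make "a motif is a sequence signature" concrete is a PROSITE-style pattern scanned against a sequence. The sketch below is illustrative only and is not part of the original slides: the helper names `prosite_to_regex` and `find_motif` and the toy sequence are hypothetical, and only a subset of the PROSITE pattern syntax is handled. The pattern used is the P-loop (Walker A) ATP/GTP-binding motif written in PROSITE notation.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a subset of PROSITE pattern syntax into a Python regex.

    Handled: '-' separators, 'x' wildcards, [..] allowed residues,
    {..} forbidden residues, (n) / (n,m) repeats, '<' and '>' anchors.
    Not handled: anchors placed inside brackets, e.g. [G>].
    """
    regex = pattern.rstrip(".")                        # optional trailing period
    regex = regex.replace("-", "")                     # element separators
    regex = regex.replace("x", ".")                    # any residue
    regex = regex.replace("{", "[^").replace("}", "]") # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")  # repeat counts
    regex = regex.replace("<", "^").replace(">", "$")  # N-/C-terminal anchors
    return regex

def find_motif(pattern: str, sequence: str):
    """Return (0-based start, matched segment) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]

# P-loop motif in PROSITE notation; the sequence is a made-up toy example.
P_LOOP = "[AG]-x(4)-G-K-[ST]"
toy_sequence = "MGSPGSGKSTLA"
hits = find_motif(P_LOOP, toy_sequence)
```

A short pattern like this will also match by chance in unrelated proteins, which is one reason a motif hit alone does not establish a domain or a function.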

Protein folds
Structures shown: Toxin binding protein (TolB); Glucose dehydrogenase; Di-isopropyl-fluorophosphatase

Dominant domain fold types. Holm and Sander. PROTEINS: Structure, Function, and Genetics 33:88–96 (1998)

Why Research Protein Families?
Function prediction and annotation.
Evolutionary research: finding orthologs and paralogs.
Search for new protein folds.
Functional research by similarity in characteristics.

Domains are the building blocks of evolution: some facts
Each domain occurs in diverse sets of protein families.
The number of domains in a protein ranges from 1 up to tens.
Structure-based domains are typically ~150 aa long.
Length varies: some are very short (30-40 aa), others are long (> 500 aa).
The definition of a domain is somewhat blurred.
Determining domain boundaries is an unsolved problem.
Figure: Pyruvate kinase, PDB: 1pkn

How is a novel gene born?
Domains are the evolutionary units of sequence that comprise the gene coding regions.
Most genes are built from more than one domain.
Novel genes can be created by recombination of domains into new domain arrangements.

Correspondence between functional associations and genes linked by the fusion method
Example from glycolysis:
Separate genes in M. genitalium: PGK, TIM, GAPDH
Fused gene in Thermotoga maritima: PGK+TIM
Fused gene in Phytophthora infestans: TIM+GAPDH
Metabolites in the pathway diagram: Glycerone-P, Glyceraldehyde-3P, Glycerate-1,3P2, Glycerate-3P
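The idea behind the fusion method can be sketched in a few lines: two genes that are separate in one genome are predicted to be functionally linked if they appear fused into a single gene in another genome. The code below is a minimal illustration under simplifying assumptions, not the published method: each gene is reduced to a tuple of enzyme labels taken from the glycolysis example, and the function name `fusion_links` and the data layout are hypothetical.

```python
from itertools import combinations

# Toy domain architectures based on the glycolysis example.
# Each gene is represented as a tuple of the enzyme domains it encodes.
genomes = {
    "M. genitalium": [("PGK",), ("TIM",), ("GAPDH",)],
    "T. maritima": [("PGK", "TIM")],
    "P. infestans": [("TIM", "GAPDH")],
}

def fusion_links(query, genomes):
    """Link single-domain genes of `query` that occur fused elsewhere.

    Returns a dict mapping a sorted pair of domains to the list of
    organisms in which the two domains are found in one gene.
    """
    separate = {gene[0] for gene in genomes[query] if len(gene) == 1}
    links = {}
    for organism, genes in genomes.items():
        if organism == query:
            continue
        for gene in genes:
            # Every pair of distinct domains in a fused gene is a candidate link.
            for a, b in combinations(sorted(set(gene)), 2):
                if a in separate and b in separate:
                    links.setdefault((a, b), []).append(organism)
    return links

links = fusion_links("M. genitalium", genomes)
```

With this toy input, PGK and TIM are linked through the T. maritima fusion and TIM and GAPDH through the P. infestans fusion, which matches their adjacent roles in glycolysis. A real analysis would detect the fusions by sequence similarity rather than by shared labels.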

What is a Protein Family?
Protein family: a group of proteins that have a common protein ancestor.
Is it that simple?
Domains: non-linear evolution
Who is in this family?
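A small sketch can show why "who is in this family?" is hard once domains recombine. If proteins are grouped whenever they share at least one domain, and that grouping is applied transitively, a multi-domain protein can pull together proteins that have nothing in common. The protein names, domain labels, and the function `single_linkage_families` below are hypothetical and serve only as an illustration.

```python
# Toy domain content: P2 carries both domain A and domain B.
proteins = {
    "P1": {"A"},
    "P2": {"A", "B"},
    "P3": {"B"},
    "P4": {"C"},
}

def single_linkage_families(proteins):
    """Group proteins that share at least one domain, transitively."""
    families = []
    unassigned = set(proteins)
    while unassigned:
        seed = min(unassigned)  # deterministic starting point
        unassigned.remove(seed)
        family, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            # Pull in every unassigned protein sharing a domain with `current`.
            linked = {p for p in unassigned if proteins[p] & proteins[current]}
            unassigned -= linked
            family |= linked
            frontier.extend(linked)
        families.append(family)
    return families

families = single_linkage_families(proteins)
```

Here P1 and P3 end up in the same group although they share no domain; they are connected only through the two-domain protein P2. This is the sense in which domain evolution is non-linear and family membership is not simply a matter of one common ancestor.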

A protein can contain several copies of the same domain or several different domains.
Example: fibronectin (PDB: 1fnf)

The Power of Integration
Sequence signatures: Pfam, PROSITE, SMART, PRINTS, TIGRFAMs, ProDom (integrated in InterPro)
Structure: SCOP, CATH, FSSP
Function and pathways: GO, KEGG