PIRSF Classification System

PIRSF: evolutionary relationships of proteins from super- to sub-families.
Homeomorphic family: homologous proteins sharing full-length similarity and a common domain architecture.

Significance:
- Improve sensitivity of protein identification and functional inference
- Detect and correct genome annotation errors systematically
- Provide a basis for evolutionary and comparative genomics research
- Provide a basis for automated annotation of protein features: annotate generic biochemical and specific biological functions

Protein Classification and Functional Annotation: Discovery of New Knowledge by Using Information Embedded within Families of Homologous Sequences and Their Structures

A protein may be assigned to only one homeomorphic family, which may have zero or more child nodes and zero or more parent nodes. Each homeomorphic family may have as many domain superfamily parents as its members have domains.
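The two assignment rules above can be expressed as checks on a small in-memory model. The following is a minimal sketch, not PIRSF's own data model: the `Family` record, the level labels, and the helper names `link`, `assign` and `check_domain_parents` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Family:
    """One node in a hypothetical PIRSF-style DAG (not the actual PIRSF schema)."""
    family_id: str
    level: str  # hypothetical labels: "domain_superfamily", "homeomorphic", "subfamily"
    parents: set[str] = field(default_factory=set)
    children: set[str] = field(default_factory=set)


def link(families: dict[str, Family], parent_id: str, child_id: str) -> None:
    """Add a parent -> child edge, keeping both directions in sync."""
    families[parent_id].children.add(child_id)
    families[child_id].parents.add(parent_id)


def assign(assignments: dict[str, str], protein_id: str, family_id: str) -> None:
    """Record a protein's homeomorphic family; a protein may belong to only one."""
    current = assignments.get(protein_id)
    if current is not None and current != family_id:
        raise ValueError(f"{protein_id} is already assigned to {current}")
    assignments[protein_id] = family_id


def check_domain_parents(fam: Family, families: dict[str, Family], n_domains: int) -> bool:
    """A homeomorphic family may have at most one domain superfamily parent per domain."""
    domain_parents = [p for p in fam.parents if families[p].level == "domain_superfamily"]
    return len(domain_parents) <= n_domains
```

Storing both `parents` and `children` makes it cheap to walk the hierarchy in either direction, at the cost of keeping the two sets consistent, which is why edges are added only through `link`.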

Creation and Curation of PIRSFs
[Figure: workflow diagram. Recoverable labels, grouped by stage:]
- Automatic clustering: UniProtKB proteins → computer-generated (uncurated) clusters
- Preliminary curation (membership, signature domains): add/remove members; merge/split clusters; map domains on families → preliminary homeomorphic families, plus orphans
- Full curation (family name, description, bibliography): name, refs, abstract, domain architecture → curated homeomorphic families
- Create hierarchies (superfamilies/subfamilies); build and test HMMs → final homeomorphic families
- Automatic placement of new proteins and unassigned proteins
- Rules: PIRSF name rules; protein name rule/site rule
Legend: automatic procedure; computer-assisted manual curation.
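The automatic placement step can be sketched as a threshold test against each family's profile HMM. The slides do not state the actual placement criteria, so this sketch assumes each curated family carries a minimum HMM score and an allowed length range; `FamilyModel` and `place_protein` are hypothetical names, and the HMM scores themselves are taken as input rather than computed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FamilyModel:
    """Hypothetical per-family placement thresholds (assumed, not from PIRSF)."""
    family_id: str
    min_score: float  # assumed HMM score cutoff
    min_length: int   # assumed length range for full-length members
    max_length: int


def place_protein(length: int, scores: dict[str, float],
                  models: dict[str, FamilyModel]) -> Optional[str]:
    """Return the best-scoring family whose thresholds the protein meets, else None.

    `scores` maps family_id -> score of the protein against that family's HMM,
    e.g. from a profile-HMM search run beforehand.
    """
    passing = [
        (score, fam_id)
        for fam_id, score in scores.items()
        if fam_id in models
        and score >= models[fam_id].min_score
        and models[fam_id].min_length <= length <= models[fam_id].max_length
    ]
    if not passing:
        return None  # protein stays unassigned
    return max(passing)[1]  # highest score wins
```

The length check reflects the homeomorphic-family requirement of full-length similarity: a protein that matches a family's HMM only over part of its length would fail the range test and remain unassigned.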

PIRSF family classification system

PIRSF Text Search
Screenshot callouts:
- Ways to get to PIRSF text search
- Select field
- Add extra input boxes for advanced search
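In the advanced search, each extra input box contributes one field/term pair. The following sketch filters local records that way. It assumes all pairs are combined with AND and matched as case-insensitive substrings; the record layout and `advanced_search` are hypothetical and do not reflect how the PIR server evaluates queries.

```python
from typing import Iterable


def advanced_search(records: Iterable[dict[str, str]],
                    criteria: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Keep records where every (field, term) pair matches (assumed AND semantics)."""
    results = []
    for rec in records:
        # Case-insensitive substring match on each selected field.
        if all(term.lower() in rec.get(fld, "").lower() for fld, term in criteria):
            results.append(rec)
    return results
```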

PIRSF Text Search Result (I)
Things you can do from the result table:
1. Add search terms or start the search over
2. Customize the table columns
3. Save your results in table or FASTA format
4. Select entries using check boxes and perform analysis using tool bar options
5. Follow links to PIRSF records, the PIRSF hierarchy, and protein domains (Pfam)

PIRSF Text Search Result (II)
2. How to customize the table columns, e.g. to display the KEGG pathway ID column:
a. Select “KEGG Pathway ID” in the “Fields not in display” box.
b. Use the > button to move the item into the “Fields in display” box.
c. The KEGG ID should now be listed in “Fields in display”. Press the Apply button for the changes to take effect.

PIRSF Text Search Result (III)
3. Save your results in table or FASTA format:
a. Select entries using the check boxes in the PIRSF column. To select all, check the box in the column heading.
b. Click “Save Result As: Table” to store the information in the result table. This file can be opened in Excel.
c. Click “FASTA” to save the protein sequences.
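When results are processed offline, the same two export formats are easy to produce from selected entries. This sketch assumes a tab-delimited table and FASTA with sequence lines wrapped at 60 characters. The exact layout of the files the PIR site writes is not shown in these slides, and `to_fasta`/`to_table` are hypothetical helpers.

```python
import csv
import io


def to_fasta(entries: list[tuple[str, str]], width: int = 60) -> str:
    """Format (identifier, sequence) pairs as FASTA, wrapping sequences at `width`."""
    lines = []
    for ident, seq in entries:
        lines.append(f">{ident}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def to_table(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Write selected rows as tab-delimited text (opens in Excel); extra keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Passing an explicit `columns` list mirrors the “Fields in display” choice: only the chosen columns are written, in the chosen order.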

PIRSF Text Search Result (IV)
4. Select entries using check boxes and perform analysis using tool bar options:
a. Select families using the check boxes in the PIRSF ID column. To select all, check the box in the column heading.
b. Then select a tool, e.g., Taxonomy Distribution, to display the taxonomic distribution for the selected families.
In this case, the two selected families contain members of the AroQ class from prokaryotes and eukaryotes, respectively, which is also reflected in the family names.
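A taxonomy distribution is essentially a per-family tally of member proteins by taxon. This minimal sketch assumes a hypothetical protein-to-taxon lookup at the top level (e.g. Bacteria, Archaea, Eukaryota); `taxonomy_distribution` is an illustrative name, not a PIR tool.

```python
from collections import Counter


def taxonomy_distribution(members: dict[str, list[str]],
                          lineage: dict[str, str]) -> dict[str, Counter]:
    """Count members of each selected family per top-level taxon.

    `members` maps family_id -> protein IDs; `lineage` maps protein ID -> taxon
    label. Proteins missing from `lineage` are counted as "unknown".
    """
    return {
        fam: Counter(lineage.get(pid, "unknown") for pid in pids)
        for fam, pids in members.items()
    }
```

A family whose counter contains only prokaryotic or only eukaryotic labels shows the kind of split described above for the two AroQ families.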

PIRSF Text Search Result (V)
4. Note on selecting families for Multiple Alignment and Domain Display:
- If more than one family is selected, the chosen tool operates on representative members of the selected families. Example: multiple alignment of PIRSF001501, PIRSF500251 and two other PIRSFs.
- If only one family is selected, the chosen tool operates on that family's seed members. Example: multiple alignment of PIRSF001501.
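The selection rule in this note can be written as a small function. This is a sketch: the `seeds` and `representatives` lookups and the function name are hypothetical stand-ins for data the server holds.

```python
def members_for_tool(selected: list[str],
                     seeds: dict[str, list[str]],
                     representatives: dict[str, str]) -> list[str]:
    """Pick the proteins a multiple-alignment or domain-display tool operates on.

    One selected family      -> that family's seed members.
    Several selected families -> one representative member per family.
    """
    if not selected:
        raise ValueError("select at least one family")
    if len(selected) == 1:
        return list(seeds[selected[0]])
    return [representatives[fam] for fam in selected]
```

Using one representative per family keeps a multi-family alignment small and readable, while a single-family alignment can afford to show all of its seeds.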

PIRSF Text Search Result (VI)
5. The result table contains summary information on family size, domain architecture and level of curation. Additional data can be viewed using the Display Option.

PIRSF Name: The names assigned to PIRSFs predominantly reflect the membership. The main source of PIRSF names is the literature. Fully curated families have a name accompanied, in most cases, by an evidence tag:
- [Validated]: at least one member of the family has an experimentally determined function.
- [Predicted]: the family's function is inferred computationally from sequence similarity and/or functional associative analysis.
- [Tentative]: experimental evidence is not decisive.

Curation Status: the level of manual curation of the PIRSF.
- Uncurated: computer-generated protein clusters with no manual curation. The clusters are defined computationally using both pairwise parameters (% sequence identity, sequence length ratio and overlap length ratio) and cluster-based parameters (% matched members, distance to neighboring clusters and overall domain arrangement).
- Preliminary: computer-generated clusters manually curated for membership (do the proteins belong to the assigned cluster?) and domain architecture (Pfam domains listed from N- to C-terminus).
- Full / Full (with description): a name is assigned to the protein family, and accompanying references are listed when available. In many cases, brief descriptions are also provided.

Hfam/Superfam/Subfam: the hierarchical level of the PIRSF (homeomorphic family, superfamily or subfamily, respectively). Selecting the button shows the PIRSF hierarchy in a DAG view with Pfam as the top node.
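The three pairwise parameters for uncurated clusters can be computed from a simple alignment summary. The slide names the parameters but neither their exact definitions nor their cut-offs. This sketch therefore assumes that the length ratio is shorter/longer, that the overlap ratio is aligned length/longer length, and uses hypothetical thresholds; `PairwiseHit`, `pairwise_metrics` and `same_cluster` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairwiseHit:
    identical: int  # identical positions in the aligned region
    aligned: int    # length of the aligned region (overlap)
    len_a: int      # full length of sequence A
    len_b: int      # full length of sequence B


def pairwise_metrics(hit: PairwiseHit) -> tuple[float, float, float]:
    """Return (% identity, length ratio, overlap length ratio); definitions are assumed."""
    pct_identity = 100.0 * hit.identical / hit.aligned
    length_ratio = min(hit.len_a, hit.len_b) / max(hit.len_a, hit.len_b)
    overlap_ratio = hit.aligned / max(hit.len_a, hit.len_b)
    return pct_identity, length_ratio, overlap_ratio


def same_cluster(hit: PairwiseHit, min_identity: float = 30.0,
                 min_length_ratio: float = 0.8, min_overlap: float = 0.8) -> bool:
    """Hypothetical cut-offs: the pair must be similar over nearly its full length."""
    ident, lratio, oratio = pairwise_metrics(hit)
    return ident >= min_identity and lratio >= min_length_ratio and oratio >= min_overlap
```

The length and overlap ratios matter as much as identity here: two proteins sharing a single highly similar domain can score high identity locally yet fail the full-length requirement of a homeomorphic family.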

5. PIRSF hierarchy in DAG view (cont.)
[Figure: DAG levels, top to bottom: Pfam level, Hfam level, Subfam level]
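Because the hierarchy is a directed acyclic graph rather than a tree, a subfamily can reach more than one top-level node. This sketch enumerates every path from a node up to a root, assuming a hypothetical `parents` mapping from each node ID to its parent IDs.

```python
def lineages(node: str, parents: dict[str, list[str]]) -> list[list[str]]:
    """List every path from `node` up to a root (a node with no parents).

    Recursion terminates because the graph is assumed to be acyclic.
    """
    ups = parents.get(node, [])
    if not ups:
        return [[node]]
    paths = []
    for parent in ups:
        for path in lineages(parent, parents):
            paths.append([node] + path)
    return paths
```

For a subfamily under a homeomorphic family that itself has two Pfam-level parents, this returns two paths. In a tree there would only ever be one.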

PIRSF Family Report (I): Curated Protein Family Information
Screenshot callouts:
- Level of manual curation
- Hierarchy with the Pfam domain at the highest node
- Taxonomic distribution of the PIRSF, which can be used to infer the evolutionary history of its proteins
- Phylogenetic tree and alignment view, allowing further sequence analysis
- Graphical display of Pfam domains assigned with high confidence

PIRSF Family Report (II)
- Integrated value-added information from other databases
- Mapping to other protein classification databases

PIRSF: Batch Retrieval
Retrieve PIRSF families by selecting a specific identifier or a combination of identifiers.
Screenshot callouts:
- List IDs
- Define IDs
- Display the list of query/PIRSF matches
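Batch retrieval amounts to looking up a list of identifiers against an identifier-to-PIRSF mapping and reporting matches and misses. In this sketch, `id_to_pirsf` is a hypothetical local dictionary standing in for the mapping the PIR server performs.

```python
def batch_lookup(query_ids: list[str],
                 id_to_pirsf: dict[str, str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Match each query identifier to its PIRSF; return (matches, unmatched IDs).

    Whitespace is stripped, blank entries are skipped, and duplicate queries
    are reported once, in input order.
    """
    matches, unmatched, seen = [], [], set()
    for raw in query_ids:
        qid = raw.strip()
        if not qid or qid in seen:
            continue
        seen.add(qid)
        if qid in id_to_pirsf:
            matches.append((qid, id_to_pirsf[qid]))
        else:
            unmatched.append(qid)
    return matches, unmatched
```

Returning the unmatched IDs separately makes it easy to spot typos or identifiers that belong to no family.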

PIRSF SCAN (sequence search)

Example: UniProtKB sequence Q8Y5X7 is automatically classified into the chorismate mutase, AroH class PIRSF.
PIRSF Scan returns only matches to fully curated PIRSFs.
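The “fully curated only” behavior can be mimicked when post-processing scan hits. This sketch assumes each hit carries a score and the family's curation status, using the labels from the result table (“Full” or “Full (with description)”). `ScanHit` and `best_curated_match` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ScanHit(NamedTuple):
    pirsf_id: str
    score: float
    curation_status: str  # e.g. "Uncurated", "Preliminary", "Full", "Full (with description)"


def best_curated_match(hits: list[ScanHit]) -> Optional[ScanHit]:
    """Drop hits to families that are not fully curated, then return the top-scoring one."""
    curated = [h for h in hits if h.curation_status.startswith("Full")]
    return max(curated, key=lambda h: h.score, default=None)
```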