iProClass Protein Knowledgebase: Integration of Protein Family, Function, Structure; Rich Links to >90 Databases; Value-Added Reports for UniProtKB Proteins.

Presentation transcript:

iProClass Protein Knowledgebase
Integration of Protein Family, Function, Structure
Rich Links to >90 Databases
Value-Added Reports for UniProtKB Proteins

iProClass Text Search
Ways to get to iProClass text search. Select a field, enter the search term, and press Search! Add (+)/delete (-) input boxes as needed.
Search tips:
1- Use “not null” or “null” to search entries that “contain” or “do not contain” information in the selected search field, respectively. In the present example, we want to search for proteins that have enzymatic activity corresponding to EC and have 3D structure (PDB ID not null).
2- Use and/or/not logical operators.
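The two search tips can be tried out offline. The following is a minimal sketch, not the iProClass implementation: the record layout, the field names (`ec`, `pdb_id`), the entry names, and all values are hypothetical placeholders chosen only to show how "null"/"not null" terms and and/or/not operators combine.

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {"ac": "ENTRY_1", "ec": "1.1.1.1", "pdb_id": "XXXX"},
    {"ac": "ENTRY_2", "ec": "1.1.1.1", "pdb_id": None},
    {"ac": "ENTRY_3", "ec": None, "pdb_id": "YYYY"},
]


def matches(record, field, term):
    """Test one field against a term, honouring "null" and "not null"."""
    value = record.get(field)
    if term == "null":
        return not value
    if term == "not null":
        return bool(value)
    return value is not None and value.lower() == term.lower()


def search(records, clauses):
    """Apply (operator, field, term) clauses from left to right.

    The operator of the first clause is ignored; later clauses use
    "and", "or", or "not" (read as "and not").
    """
    hits = []
    for record in records:
        result = None
        for operator, field, term in clauses:
            matched = matches(record, field, term)
            if result is None:
                result = matched
            elif operator == "and":
                result = result and matched
            elif operator == "or":
                result = result or matched
            elif operator == "not":
                result = result and not matched
            else:
                raise ValueError(f"unknown operator: {operator}")
        if result:
            hits.append(record["ac"])
    return hits


# Entries with a given EC number that also have a 3D structure.
hits = search(records, [("and", "ec", "1.1.1.1"), ("and", "pdb_id", "not null")])
print(hits)
```

Evaluating clauses strictly left to right keeps the sketch close to a form with stacked input boxes; a real query engine may apply operator precedence differently.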

iProClass Text Search Result (I)
Things you can do from the result table:
1. Add search terms or start over
2. Customize the table columns
3. Save your results as table or FASTA format
4. Select entries using check boxes and perform analysis using tool bar options
5. Links to protein records, to protein names (BioThesaurus), to protein families (PIRSF)

iProClass Text Search Result (II)
2. How to customize the table columns. Example: display the PDB ID column.
a- Select PDB ID in the “Fields not in display” box.
b- Use the > button to add the item to the “Fields in display” box.
c- PDB ID should now be in the “Fields in display” box. Press the apply button for the changes to take effect.

iProClass Text Search Result (III)
3. Save your results as table or FASTA format
a- Select entries using the check boxes in the Protein AC/ID column. To select all, check the box in the column heading.
b- Click on “Save Result As: Table” to store the information in the result table. This file can be opened in Excel.
c- Click on FASTA to save the protein sequences.
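Once both files are saved, they can be cross-checked with a short script. This is a rough sketch under stated assumptions: the saved table is assumed to be tab-delimited with a header row, and the column names, entry names, and sequences below are made up for illustration rather than taken from an actual iProClass download.

```python
import csv
import io


def read_table(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def read_fasta(handle):
    """Parse FASTA text into {first word of header: sequence}."""
    sequences = {}
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                sequences[header] = "".join(chunks)
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        sequences[header] = "".join(chunks)
    return sequences


# Made-up file contents standing in for the two saved files.
table_text = (
    "Protein AC\tProtein Name\tLength\n"
    "ENTRY_1\texample protein one\t12\n"
    "ENTRY_2\texample protein two\t8\n"
)
fasta_text = (
    ">ENTRY_1 example protein one\nMKTAYI\nAKQRQI\n"
    ">ENTRY_2 example protein two\nMNDRADFV\n"
)

rows = read_table(io.StringIO(table_text))
seqs = read_fasta(io.StringIO(fasta_text))

# Check that every table row has a sequence of the stated length.
for row in rows:
    ac = row["Protein AC"]
    print(ac, row["Length"], len(seqs[ac]))
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")` for the table and `open(path)` for the FASTA file.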

iProClass Text Search Result (IV)
4. Select entries using check boxes and perform analysis using tool bar options
a- Select entries using the check boxes in the Protein AC/ID column. To select all, check the box in the column heading. Then select a tool, e.g., Domain Display.
Domain Display shows the Pfam domains present in the selected proteins.

iProClass Text Search Result (V)
5. Links to protein records, to protein names (BioThesaurus), to protein families (PIRSF)
Link to protein reports; link to protein names; link to taxonomy; link to PIRSF report; link to pre-computed BLAST.

iProClass Protein Report (I)
Pre-computed BLAST; rich links and extensive cross-references; shows ID correspondence to other databases; see protein synonyms.

iProClass Protein Report (II) Integrated added-value information from other databases

iProClass Protein Report (III)
Links to different protein family classification databases; interactive Domain and Sequence Display.

iProClass Text Search Result (VI)
See protein synonyms and the source attribution.

iProClass Text Search Result (VII)
Related Sequences (pre-computed BLAST) shows proteins similar to the query. It is significantly faster than running BLAST in real time, and a low number of related sequences may also indicate a tight protein cluster.

Batch Retrieval in iProClass
Due to the diversity of databases and the lack of consistency in protein/gene names and/or identifiers in the literature, it can be difficult to retrieve multiple entries when protein and gene identifiers come from different sources. The batch retrieval tool overcomes this problem and provides high flexibility, allowing the retrieval of multiple entries from the iProClass database by selecting a specific identifier or a combination of them.
If possible, specify the type of ID.

Batch Retrieval Result Page
Choose columns to be displayed; retrieve more sequences; links to iProClass and UniProtKB reports.

Search a Pattern in iProClass
A pattern is a formula (regular expression) that represents the conserved region of a group of related proteins. PROSITE is a database that contains patterns and profiles specific for more than a thousand protein families or domains.
Pattern search at PIR allows:
1- The search of a specific PROSITE or user-defined pattern against one of the following sequence databases:
(i) UniProtKB, the central hub for the collection of functional information on proteins, with accurate, consistent, and rich annotation. It consists of two sections: a section containing manually-annotated records (UniProtKB/Swiss-Prot), and a section with computationally analyzed records that await full manual annotation (UniProtKB/TrEMBL).
(ii) A subset of UniProtKB entries belonging to a certain organism or taxon group.
(iii) UniRef100, which provides clustered sets of sequences at 100% identity from UniProtKB (including splice variants and isoforms) and selected UniParc records.
Enter pattern, e.g. P-D-x(2)-H-[DE]-[LIVF]-[LIVMFY]-G-H-[LIVMC]-[PA], or enter PROSITE ID.
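Because a PROSITE pattern is a regular expression in a compact notation, its behaviour can be explored locally. The sketch below is an illustration, not the PIR search engine: the helper names `prosite_to_regex` and `find_pattern` are hypothetical, the test sequence is synthetic, and only the common syntax elements are handled (`x`, `[..]`, `{..}`, repeat counts, and the `<` / `>` anchors outside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        parsed = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if parsed is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, low, high = parsed.groups()
        if core == "x":                                  # any residue
            piece = "."
        elif core.startswith("[") and core.endswith("]"):  # allowed residues
            piece = core
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            piece = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha() and core.isupper():
            piece = core
        else:
            raise ValueError(f"unsupported element: {element!r}")
        if low and high:
            piece += "{" + low + "," + high + "}"
        elif low:
            piece += "{" + low + "}"
        pieces.append(prefix + piece + suffix)
    return "".join(pieces)


def find_pattern(pattern, sequence):
    """Return 1-based inclusive (start, end) ranges, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1) + 1, m.end(1)) for m in regex.finditer(sequence)]


pattern = "P-D-x(2)-H-[DE]-[LIVF]-[LIVMFY]-G-H-[LIVMC]-[PA]"
print(prosite_to_regex(pattern))
print(find_pattern(pattern, "MKPDAAHDLLGHLPQR"))  # synthetic sequence
```

The lookahead wrapper is a design choice so that overlapping occurrences are also reported; reporting positions as 1-based inclusive ranges follows the usual convention for protein sequences.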

Search a Pattern Result in iProClass
Display the query pattern; sequence range where pattern is found.

Search a Pattern in iProClass
Pattern search at PIR allows:
2- The search of PROSITE patterns (note that profiles are excluded) in a query sequence, entered either as a sequence in single-letter amino acid code or by its unique ID.
Enter sequence: MNDRADFVVPDITTRKNVGLSHDANDFTLPQPLDRYS AEDHATWATLYQRQCKLLPGRACDEFMEGLERLEVD
Enter ID
Link to PROSITE documentation

Protein ID Mapping Service
Maps between UniProtKB and more than 30 other data sources to support data interoperability among disparate data sources and to allow integration and querying of data from heterogeneous molecular biology databases.
Enter IDs, or load a file with an ID list.

Protein ID Mapping Service
Example: we want to obtain a list of Entrez Gene IDs for a group of UniProtKB proteins.
Enter IDs: P04176 P16331 P00439 P17276
IDs can be cut and pasted if needed or saved as a text file using the "save as" option provided by your web browser.
Select ID type for source database
Select ID type for target database
Mapping
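The mapping step amounts to a lookup from (source type, target type, identifier) to one or more target identifiers. The sketch below shows that idea with a small in-memory table; it does not query PIR. The four UniProtKB accessions come from the example above, while the target values (`GENE_ID_A` and so on), the type labels, and the `NOT_IN_TABLE` input are placeholders, not real Entrez Gene IDs.

```python
from collections import defaultdict

# (source type, source ID, target type, target ID); target IDs are placeholders.
MAPPING_ROWS = [
    ("UniProtKB", "P04176", "EntrezGene", "GENE_ID_A"),
    ("UniProtKB", "P16331", "EntrezGene", "GENE_ID_B"),
    ("UniProtKB", "P00439", "EntrezGene", "GENE_ID_C"),
    ("UniProtKB", "P17276", "EntrezGene", "GENE_ID_D"),
]


def build_index(rows):
    """Index rows so that one source ID can map to several target IDs."""
    index = defaultdict(list)
    for source_type, source_id, target_type, target_id in rows:
        index[(source_type, target_type, source_id)].append(target_id)
    return index


def map_ids(ids, source_type, target_type, index):
    """Return ({source ID: [target IDs]}, [IDs without a mapping])."""
    mapped, unmapped = {}, []
    for identifier in ids:
        targets = index.get((source_type, target_type, identifier))
        if targets:
            mapped[identifier] = list(targets)
        else:
            unmapped.append(identifier)
    return mapped, unmapped


# IDs pasted as free text, separated by whitespace.
pasted = "P04176 P16331 P00439 P17276 NOT_IN_TABLE"
index = build_index(MAPPING_ROWS)
mapped, unmapped = map_ids(pasted.split(), "UniProtKB", "EntrezGene", index)

for source_id, targets in mapped.items():
    print(source_id, ", ".join(targets))
print("unmapped:", unmapped)
```

Returning the unmapped identifiers separately makes it easy to spot typos or identifiers of the wrong type in a pasted list.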

iProClass Protein Knowledgebase
Cite iProClass: The iProClass Integrated database for protein functional analysis. Wu CH, Huang H, Nikolskaya A, Hu Z, Yeh LS, Barker WC. Computational Biology and Chemistry, 28: 87-96,
iProClass Distribution: iProClass is freely available for academic institutions. Vendors and commercial entities who want to use and/or redistribute iProClass need to contact PIR to request a license.