Protein Sequence Analysis - Overview - NIH Proteomics Workshop 2007 Raja Mazumder Scientific Coordinator, PIR Research Assistant Professor, Department.

Presentation transcript:

Protein Sequence Analysis - Overview - NIH Proteomics Workshop 2007 Raja Mazumder Scientific Coordinator, PIR Research Assistant Professor, Department of Biochemistry and Molecular Biology Georgetown University Medical Center

Topics Proteomics and protein bioinformatics (protein sequence analysis) Why do protein sequence analysis? Searching sequence databases Post-processing search results Detecting remote homologs

Clinical proteomics From Petricoin et al., Nature Reviews Drug Discovery (2002) 1,

Single protein and shotgun analysis. Adapted from: McDonald et al. (2002). Disease Markers 18:
Starting material: mixture of proteins.
Single protein analysis: gel-based separation, spot excision and digestion, peptides from a single protein, MS analysis.
Shotgun analysis: digestion of protein mixture, LC or LC/LC separation, peptides from many proteins, MS/MS analysis.
Downstream of both: protein bioinformatics.
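The digestion step in both workflows can be mimicked in silico. The sketch below is only an illustration: it assumes trypsin as the protease (the slide does not name one) and uses the common simplified rule of cleaving after K or R unless the next residue is P. The function name `tryptic_digest` is hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumption (not from the slides): cleave after K or R, except when the
    next residue is P. Missed cleavages are not modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # remaining C-terminal peptide
    return peptides


example_peptides = tryptic_digest("AKRPGRM")
```

Joining the returned peptides in order gives back the input sequence, which is a quick sanity check on any digestion rule.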

Protein bioinformatics: protein sequence analysis
Helps characterize protein sequences in silico and allows prediction of protein structure and function.
Statistically significant BLAST hits usually signify sequence homology.
Homologous sequences may or may not have the same function but, with very few exceptions, have the same structural fold.
Protein sequence analysis allows protein classification.

Development of protein sequence databases
Atlas of Protein Sequence and Structure, Dayhoff (1966): first sequence database (pre-bioinformatics); currently known as the Protein Information Resource (PIR).
Protein Data Bank (PDB): structural database (1972); remains the most widely used database of structures.
UniProt, the Universal Protein Resource (2003): a central database of protein sequence and function created by joining the forces of the Swiss-Prot, TrEMBL and PIR protein database activities.

Comparative protein sequence analysis and evolution
Patterns of conservation in sequences allow us to determine which residues are under selective constraint (and thus likely important for protein function).
Comparative analysis of proteins is more sensitive than comparing DNA.
Homologous proteins have a common ancestor.
Different proteins evolve at different rates.
Protein classification systems based on evolution: PIRSF and COG.
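One simple way to look for conserved positions is to score each column of a multiple sequence alignment by how often its most common residue occurs. The sketch below assumes that scoring rule and a made-up toy alignment (lowercase letters stand in for residues, "-" marks a gap); real analyses typically use substitution-aware or entropy-based scores.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, per column, the fraction of sequences sharing the most common residue.

    Gap characters ("-") are not counted as residues, so a gapped column
    can never reach a score of 1.0.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        scores.append(top / len(column))
    return scores


# Toy alignment for illustration only.
toy_alignment = ["abacd", "ab-cd", "xbace"]
conservation = column_conservation(toy_alignment)
```

Columns with a score of 1.0 are fully conserved in the toy alignment and would be the first candidates for functionally constrained residues.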

PIRSF and large-scale annotation of proteins
PIRSF is a protein classification system based on the evolutionary relationships of whole proteins.
As part of the UniProt project, PIR has developed this classification strategy to assist in the propagation and standardization of protein annotation.

Comparing proteins
Amino acid sequence of a protein generated from a proteomics experiment, e.g. the protein fragment DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT
The amino acids of two sequences can be aligned, and the number of identical residues (or an index of similarity) can then be counted as a measure of relatedness.
Protein structures can be compared by superimposition.
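Counting identical residues between two aligned sequences can be sketched as follows. The function name `percent_identity` and the choice of denominator (positions where neither sequence has a gap) are assumptions; other conventions divide by the alignment length or by the shorter sequence length and give different numbers.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared (gap-free) aligned positions with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


identity_example = percent_identity("ABCD", "ABCE")
```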

Protein sequence alignment
Pairwise alignment
a b a c d
a b _ c d
Multiple sequence alignment provides more information
a b a c d
a b _ c d
x b a c e
MSA is difficult to do for distantly related proteins.
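The pairwise example above can be reproduced with a small global alignment routine. This is a minimal sketch of the Needleman-Wunsch algorithm under assumed scores (match +1, mismatch -1, linear gap penalty -1); the slides do not specify an algorithm or scoring scheme, and real protein alignments use substitution matrices and affine gap penalties. Gaps are written as "-" rather than "_".

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not taken from the slides.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best_score = global_align("abacd", "abcd")
```

With these scores the toy sequences align as abacd over ab-cd, matching the pairwise example on the slide.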

Protein sequence analysis overview
Protein databases: PIR (pir.georgetown.edu) and UniProt (
Searching databases: peptide search, BLAST search, text search.
Information retrieval and analysis: protein records at UniProt and PIR, multiple sequence alignment, secondary structure prediction, homology modeling.

Universal Protein Resource (diagram)
Source data: Swiss-Prot, PIR-PSD, TrEMBL, RefSeq, GenBank/EMBL/DDBJ, EnsEMBL, PDB, patent data, other data.
UniParc (UniProt Archive): automated merging of sequences.
UniProtKB (UniProt Knowledgebase): literature-based annotation and automated annotation.
UniRef100, UniRef90, UniRef50 (UniProt NREF): clustering at 100, 90, 50%.

Peptide Search
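At its simplest, a peptide search looks for exact occurrences of a short peptide within database sequences. The sketch below assumes exact matching over an in-memory dictionary; it does not describe how the PIR or UniProt peptide search services are implemented. The accessions and the second sequence are made up, and the first sequence is the protein fragment shown on the "Comparing proteins" slide.

```python
def peptide_search(peptide, database):
    """Return (accession, 0-based offset) for every exact occurrence of peptide."""
    hits = []
    for accession, sequence in database.items():
        offset = sequence.find(peptide)
        while offset != -1:
            hits.append((accession, offset))
            offset = sequence.find(peptide, offset + 1)  # allow overlapping hits
    return hits


# Hypothetical two-entry database for illustration only.
toy_db = {
    "EXAMPLE_1": "DTIKDLLPNVCAFPMEKGPCQTYMTRWFFNFETGECELFAYGGCGGNSNNFLRKEKCEKFCKFT",
    "EXAMPLE_2": "MKTAYIAKQRQISFVKSHFSRQ",
}
query_peptide = "GPCQTYMTR"
peptide_hits = peptide_search(query_peptide, toy_db)
```

Each hit gives the matching entry and the position of the peptide within it, which is the information needed to map an identified peptide back to its parent protein.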

ID mapping

Query Sequence
Unknown sequence is Q9I7I7.
BLAST Q9I7I7 against the UniProt Knowledgebase ( ).
Analyze results.

BLAST results

Text search: Any Field (not specific)

Text search results: display options Move Pubmed ID, Pfam ID and PDB ID into “Columns in Display” specific

Text search results: add input box

Text search result with null/not null

UniProt beta site

UniProtKB protein record

SIR2_HUMAN protein record

Are Q9I7I7 and SIR2_HUMAN homologs?
Check BLAST results.
Check pairwise alignment.

Protein structure prediction
Programs can predict secondary structure information with 70% accuracy.
Homology modeling: prediction of a ‘target’ structure from a closely related ‘template’ structure.
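A figure such as 70% is usually a per-residue score. The sketch below assumes the common three-state measure often called Q3, where each residue is labeled helix (H), strand (E) or coil (C) and accuracy is the percentage of residues labeled correctly; the slides do not say which measure is meant, and the example strings are made up.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up state strings for illustration only.
q3_example = q3_accuracy("HHHEECCCCC", "HHHEEECCCC")
```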

Secondary structure prediction

Secondary structure prediction results

Sir2 structure

Homology modeling

Homology model of Q9I7I7
First color key: blue, excellent; green, so-so; red, not good.
Second color key: yellow, beta sheet; red, alpha helix; grey, loop.

Sequence features: SIR2_HUMAN

Multiple sequence alignment

Q9I7I7, Q82QG9, SIR2_HUMAN

Sequence features: CRAA_RABIT

Identifying Remote Homologs

Structure guided sequence alignment