Download presentation
Presentation is loading. Please wait.
Published byCorey Hardy Modified over 9 years ago
1
Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org
2
Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by J Pevsner (copyright © 2009 by Wiley-Blackwell). These images and materials may not be used without permission from the publisher. Visit http://www.bioinfbook.org Copyright notice
3
Outline for tonight: Protein analysis and proteomics Individual proteins Perspective 1: Protein families (domains and motifs) Perspective 2: Physical properties (3D structure) Perspective 3: Localization Perspective 4: Function
4
protein Page 389 RNA DNA
5
protein [1] Protein families [4] Protein function [2] Physical properties Page 389 [3] Protein localization Gene ontology (GO): --cellular component --biological process --molecular function
6
The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) Goals: defining standards for proteomic data representation to facilitate the comparison, exchange, and verification of data
7
The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) Work groups # Gel Electrophoresis # Mass Spectrometry # Molecular Interactions # Protein Modifications # Proteomics Informatics # Sample Processing Themes # Controlled vocabularies # MIAPE: Minimum information about a proteomics experiment
8
The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) http://www.psidev.info/
9
Perspective 1: Protein domains and motifs Page 389
10
Definitions Signature: a protein category such as a domain or motif Page 390
11
Definitions Signature: a protein category such as a domain or motif Domain: a region of a protein that can adopt a 3D structure a fold a family is a group of proteins that share a domain examples: zinc finger domain immunoglobulin domain Motif (or fingerprint): a short, conserved region of a protein typically 10 to 20 contiguous amino acid residues Page 390
12
15 most common domains (human) Zn finger, C2H2 type1093 proteins Immunoglobulin1032 EGF-like471 Zn-finger, RING458 Homeobox417 Pleckstrin-like405 RNA-binding region RNP-1400 SH3394 Calcium-binding EF-hand392 Fibronectin, type III300 PDZ/DHR/GLGF280 Small GTP-binding protein 261 BTB/POZ236 bHLH226 Cadherin226 Page 391 Source: Integr8 at EBI website
13
15 most common domains (various species) The European Bioinformatics Institute (EBI) offers many key proteomics resources at the Integr8 site: http://www.ebi.ac.uk/proteome/ Page 391
14
1. Go to the Integr8 site: http://www.ebi.ac.uk/proteome/ 2. Browse species; choose Homo sapiens. 3. Click “Proteome analysis” 4. Obtain a variety of statistics, such as common repeats, domains, average protein length
15
Source: Integr8 at EBI website (updated 7/09) amino acid frequency Amino acid composition
16
Source: Integr8 at EBI website (updated 7/09) Relative frequency protein length Average protein length : 468+/- 522 amino acid residues Size range: 4 - 34350 amino acid residues
17
Definition of a domain According to InterPro at EBI ( http://www.ebi.ac.uk/interpro /): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related. According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities. Page 390
18
Varieties of protein domains Page 393 Extending along the length of a protein Occupying a subset of a protein sequence Occurring one or more times
19
Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2) MBD Page 393 TRD The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor. Mutations in the gene encoding MeCP2 cause Rett Syndrome, a neurological disorder affecting girls primarily.
20
Page 393 Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins domain
21
Page 393 Are proteins that share only a domain homologous?
22
Example of a multidomain protein: HIV-1 pol Pol (NP_789740), 995 amino acids long Gag-Pol (NP_057849), 1435 amino acids cleaved into three proteins with distinct activities: -- aspartyl protease -- reverse transcriptase -- integrase We will explore HIV-1 pol and other proteins at the Expert Protein Analysis System (ExPASy) server. Visit www.expasy.org/ Page 394
23
http://www.expasy.ch/
24
1. Go to ExPASy (http://www.expasy.ch/) 2. If you know the SwissProt accession of your protein, enter it at top. 3. Otherwise go into Swiss-Prot/TrEMBL, click SRS (Sequence Retrieval System), click Start, then click continue, then search for your protein of interest.
25
SwissProt: which HIV-1 pol is “correct”? Use P04587
26
The SwissProt entry for any protein provides highly useful information…
27
Page 395 SwissProt entry for HIV-1 pol links to many databases
28
Page 395 ProDom entry for HIV-1 pol shows many related proteins
30
www.uniprot.org Three protein databases recently merged to form UniProt: SwissProt TrEMBL (translated European Molecular Biology Lab) Protein Information Resource (PIR) You can search for information on your favorite protein there; a BLAST server is provided.
31
Proteins can have both domains and motifs (patterns) Domain (aspartyl protease) Domain (reverse transcriptase) Motif (several residues) Motif (several residues)
32
Page 396
33
Definition of a motif A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids. Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins. PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases. Page 394
34
Summary of Perspective 1: Protein domains and motifs A signature is a protein category such as a domain or motif. You can learn about domains at Integr8, and at databases such as InterPro and Pfam. A motif (or fingerprint) is a short, conserved sequence. You can study motifs at Prosite at ExPASy.
35
Perspective 2: Physical properties of proteins Page 397
36
Page 398
37
Physical properties of proteins Many websites are available for the analysis of individual proteins. ExPASy and ISREC are two excellent resources. The accuracy of these programs is variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms. Page 399
38
Access a variety of protein analysis programs from the top right of the ExPASy home page
39
Page 400
41
Protein secondary structure Protein secondary structure is determined by the amino acid side chains. Myoglobin is an example of a protein having many -helices. These are formed by amino acid stretches 4-40 residues in length. Thioredoxin from E. coli is an example of a protein with many sheets, formed from strands composed of 5-10 residues. They are arranged in parallel or antiparallel orientations. Page 425
42
Fig. 11.3 Page 427 Myoglobin (John Kendrew, 1958)
43
Thioredoxin Fig. ~11.3 Page 427
44
Secondary structure prediction Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in helices, -sheets, and turns. Proline: occurs at turns, but not in helices. GOR (Garnier, Osguthorpe, Robson): related algorithm Modern algorithms: use multiple sequence alignments and achieve higher success rate (about 70-75%) Page 427
45
Secondary structure prediction Web servers: GOR4 Jpred NNPREDICT PHD Predator PredictProtein PSIPRED SAM-T99sec Table 11-3 Page 429
48
Go to http://pbil.univ-lyon1.fr/, click “Secondary structure prediction” to access this prediction tool
49
Tertiary protein structure: protein folding Main approaches: [1] Experimental determination (X-ray crystallography, NMR) [2] Prediction ► Comparative modeling (based on homology) ► Threading ► Ab initio (de novo) prediction (Ingo Ruczinski at JHSPH) Page 430
50
Experimental approaches to protein structure [1] X-ray crystallography -- Used to determine 80% of structures -- Requires high protein concentration -- Requires crystals -- Able to trace amino acid side chains -- Earliest structure solved was myoglobin [2] NMR -- Magnetic field applied to proteins in solution -- Largest structures: 350 amino acids (40 kD) -- Does not require crystallization Page 430
51
Steps in obtaining a protein structure Target selection Obtain, characterize protein Determine, refine, model the structure Deposit in repository Fig 11.5 page 431
52
Priorities for target selection for protein structures Page 432 Historically, small, soluble, abundant proteins were studied (e.g. hemoglobin, cytochromes c, insulin). Modern criteria: Represent all branches of life Represent previously uncharacterized families Identify medically relevant targets Some are attempting to solve all structures within an individual organism (Methanococcus jannaschii, Mycobacterium tuberculosis)
53
The Protein Data Bank (PDB) Page 434 PDB is the principal repository for protein structures Established in 1971 Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org Currently contains 59,000 structure entities Updated 7/28/09
54
Fig. 11.7 Page 435
55
structures Fig. 9.6 Page 281 year Updated 12/08 PDB content growth (www.pdb.org) 19721980 2000 yearly total 10,000 2008 50,000 30,000
56
PDB holdings (12/08) 50,621 proteins, peptides 2,225 protein/nucl. complexes 1,946 nucleic acids 33 other; carbohydrates 54,825 total Table 11-4 Page 435
57
Fig. ~11.9 Page 436
58
Fig. ~11.10 Page 436
59
Viewing hemoglobin (accession 2H35) at PDB
60
Viewing structures at PDB: WebMol Fig. 11.11 Page 437
61
Protein Data Bank Swiss-Prot, NCBI, EMBL CATH, Dali, SCOP, FSSP gateways to access PDB files databases that interpret PDB files
62
The CATH Hierarchy Fig. 11.18 Page 444
63
Access to PDB through NCBI Page 437 You can access PDB data at the NCBI several ways. Go to the Structure site, from the NCBI homepage Use Entrez Perform a BLAST search, restricting the output to the PDB database
64
Fig. ~11.12 Page 438
65
Do a blastp search; set the database to pdb (Protein Data Bank)
67
Structure links Structure accession (e.g. 2JTZ)
68
Fig. 9.13 Page 288
69
Fig. 9.14 Page 289
72
Access to PDB structures through NCBI Page 291 Molecular Modeling DataBase (MMDB) Cn3D (“see in 3D” or three dimensions): structure visualization software Vector Alignment Search Tool (VAST): view multiple structures
73
Fig. 9.15 Page 290
74
Fig. 9.15 Page 290
75
Fig. 9.16 Page 291
76
Fig. 9.17 Page 292
77
Access to structure data at NCBI: VAST Page 294 Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including -- PDB identifiers -- root-mean-square deviation (RMSD) values to describe structural similarities -- NRES: the number of equivalent pairs of alpha carbon atoms superimposed -- percent identity
78
Fig. 9.18 Page 293
79
Predicting protein structure: CASP8 competition 8th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction http://predictioncenter.gc.ucdavis.edu/ http://predictioncenter.org/casp8/index.cgi
80
Percent of residues (C ) Distance cutoff (Å) most groups had the right prediction for this structure (but not those at arrow 2) most groups had the wrong prediction for this structure (but arrow 3 did better) half the groups got it wrong (arrow 4), half got it right (arrow 5); the key difference is the multiple sequence alignment 2 1 3 4 5
82
Protein structure and human disease In some cases, a single amino acid substitution can induce a dramatic change in protein structure. For example, the F508 mutation of CFTR alters the helical content of the protein, and disrupts intracellular trafficking. Other changes are subtle. The E6V mutation in the gene encoding hemoglobin beta causes sickle- cell anemia. The substitution introduces a hydrophobic patch on the protein surface, leading to clumping of hemoglobin molecules. Page 311
83
Protein structure and human disease DiseaseProtein Cystic fibrosisCFTR Sickle-cell anemiahemoglobin beta “mad cow” diseaseprion protein Alzheimer diseaseamyloid precursor protein Table 9.5 Page 312
84
Introduction to Perspectives 3 and 4: Gene Ontology (GO) Consortium Page 237
85
The Gene Ontology Consortium An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products. There are three organizing principles: Molecular function Biological process Cellular compartment You can visit GO at http://www.geneontology.org. There is no centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism. Page 237
86
Page 241 GO terms are assigned to Entrez Gene entries
87
Page 241
89
The Gene Ontology Consortium: Evidence Codes ICInferred by curator IDAInferred from direct assay IEAInferred from electronic annotation IEPInferred from expression pattern IGIInferred from genetic interaction IMPInferred from mutant phenotype IPIInferred from physical interaction ISSInferred from sequence or structural similarity NASNon-traceable author statement NDNo biological data TASTraceable author statement Page 240
90
Perspective 3: Protein localization Page 242
91
protein Protein localization Page 242
92
Protein localization Proteins may be localized to intracellular compartments, cytosol, the plasma membrane, or they may be secreted. Many proteins shuttle between multiple compartments. A variety of algorithms predict localization, but this is essentially a cell biological question. Page 240
95
Page 242
96
Page 244
98
Localization of 2,900 yeast proteins Michael Snyder and colleagues incorporated epitope tags into thousands of S. cerevisiae cDNAs, and systematically localized proteins (Kumar et al., 2002). See http://ygac.med.yale.edu for the TRIPLES database including ~3,000 fluorescence micrographs. Page 243
99
Perspective 4: Protein function Page 243
100
Protein function Function refers to the role of a protein in the cell. We can consider protein function from a variety of perspectives. Page 243
101
1. Biochemical function (molecular function) RBP binds retinol, could be a carrier Page 245
102
2. Functional assignment based on homology RBP could be a carrier too Other carrier proteins Page 245
103
3. Function based on structure RBP forms a calyx Page 245
104
4. Function based on ligand binding specificity RBP binds vitamin A Page 245
105
5. Function based on cellular process DNARNA RBP is abundant, soluble, secreted Page 245
106
6. Function based on biological process RBP is essential for vision Page 245
107
7. Function based on “proteomics” or high throughput “functional genomics” High throughput analyses show... RBP levels elevated in renal failure RBP levels decreased in liver disease Page 245
108
Functional assignment of enzymes: the EC (Enzyme Commission) system Oxidoreductases1,003 Transferases1,076 Hydrolases1,125 Lyases356 Isomerases156 Ligases126 Page 246
109
Functional assignment of proteins: Clusters of Orthologous Groups (COGs) Information storage and processing Cellular processes Metabolism Poorly characterized Page 247
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.