Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner

Presentation on theme: "Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner"— Presentation transcript:

1 Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org

2 Copyright notice: Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J Pevsner (copyright © 2009 by Wiley-Blackwell). These images and materials may not be used without permission from the publisher. Visit http://www.bioinfbook.org

3 Outline for tonight: Protein analysis and proteomics. Individual proteins. Perspective 1: Protein families (domains and motifs). Perspective 2: Physical properties (3D structure). Perspective 3: Localization. Perspective 4: Function.

4 protein Page 389 RNA DNA

5 Four perspectives on a protein: [1] Protein families; [2] Physical properties; [3] Protein localization; [4] Protein function. Gene Ontology (GO): cellular component, biological process, molecular function. Page 389

6 The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) Goals: defining standards for proteomic data representation to facilitate the comparison, exchange, and verification of data

7 The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI). Work groups: Gel Electrophoresis; Mass Spectrometry; Molecular Interactions; Protein Modifications; Proteomics Informatics; Sample Processing. Themes: Controlled vocabularies; MIAPE (Minimum information about a proteomics experiment).

8 The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) http://www.psidev.info/

9 Perspective 1: Protein domains and motifs Page 389

10 Definitions Signature: a protein category such as a domain or motif Page 390

11 Definitions. Signature: a protein category such as a domain or motif. Domain: a region of a protein that can adopt a 3D structure (a fold). A family is a group of proteins that share a domain. Examples: zinc finger domain, immunoglobulin domain. Motif (or fingerprint): a short, conserved region of a protein, typically 10 to 20 contiguous amino acid residues. Page 390

12 15 most common domains (human)
Zn finger, C2H2 type: 1093 proteins
Immunoglobulin: 1032
EGF-like: 471
Zn-finger, RING: 458
Homeobox: 417
Pleckstrin-like: 405
RNA-binding region RNP-1: 400
SH3: 394
Calcium-binding EF-hand: 392
Fibronectin, type III: 300
PDZ/DHR/GLGF: 280
Small GTP-binding protein: 261
BTB/POZ: 236
bHLH: 226
Cadherin: 226
Page 391. Source: Integr8 at EBI website

13 15 most common domains (various species) The European Bioinformatics Institute (EBI) offers many key proteomics resources at the Integr8 site: http://www.ebi.ac.uk/proteome/ Page 391

14 1. Go to the Integr8 site: http://www.ebi.ac.uk/proteome/ 2. Browse species; choose Homo sapiens. 3. Click “Proteome analysis” 4. Obtain a variety of statistics, such as common repeats, domains, average protein length

15 Amino acid composition (plot of the frequency of each amino acid). Source: Integr8 at EBI website (updated 7/09)

16 Protein length distribution (plot of relative frequency versus protein length). Average protein length: 468 +/- 522 amino acid residues. Size range: 4 to 34,350 amino acid residues. Source: Integr8 at EBI website (updated 7/09)
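
The composition and length statistics on the two slides above can be reproduced for any set of protein sequences. The following is a minimal sketch, not the Integr8 method; the function names and the three-sequence toy proteome are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def amino_acid_composition(sequences):
    """Return the fraction of each residue across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}


def length_summary(sequences):
    """Return mean, standard deviation, minimum and maximum length."""
    lengths = [len(seq) for seq in sequences]
    return {
        "mean": mean(lengths),
        "sd": pstdev(lengths),  # population standard deviation
        "min": min(lengths),
        "max": max(lengths),
    }


# Hypothetical toy proteome; a real analysis would read a FASTA file.
toy_proteome = ["MKV", "MAAAK", "MK"]
composition = amino_acid_composition(toy_proteome)
summary = length_summary(toy_proteome)
```

A standard deviation larger than the mean, as in the human figures quoted above (468 +/- 522), indicates a strongly skewed length distribution with a long tail of very large proteins.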

17 Definition of a domain. According to InterPro at EBI (http://www.ebi.ac.uk/interpro/): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related. According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities. Page 390

18 Varieties of protein domains: extending along the length of a protein; occupying a subset of a protein sequence; occurring one or more times. Page 393

19 Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2). The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor. Mutations in the gene encoding MeCP2 cause Rett Syndrome, a neurological disorder that primarily affects girls. Page 393

20 Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins. Page 393

21 Page 393 Are proteins that share only a domain homologous?

22 Example of a multidomain protein: HIV-1 pol. Pol (NP_789740) is 995 amino acids long. Gag-Pol (NP_057849) is 1435 amino acids long and is cleaved into three proteins with distinct activities: aspartyl protease, reverse transcriptase, and integrase. We will explore HIV-1 pol and other proteins at the Expert Protein Analysis System (ExPASy) server. Visit www.expasy.org/ Page 394

23 http://www.expasy.ch/

24 1. Go to ExPASy (http://www.expasy.ch/) 2. If you know the SwissProt accession of your protein, enter it at top. 3. Otherwise go into Swiss-Prot/TrEMBL, click SRS (Sequence Retrieval System), click Start, then click continue, then search for your protein of interest.

25 SwissProt: which HIV-1 pol is “correct”? Use P04587

26 The SwissProt entry for any protein provides highly useful information…

27 Page 395 SwissProt entry for HIV-1 pol links to many databases

28 Page 395 ProDom entry for HIV-1 pol shows many related proteins

29

30 www.uniprot.org Three protein databases recently merged to form UniProt: SwissProt, TrEMBL (translated European Molecular Biology Lab), and the Protein Information Resource (PIR). You can search for information on your favorite protein there; a BLAST server is provided.

31 Proteins can have both domains and motifs (patterns) Domain (aspartyl protease) Domain (reverse transcriptase) Motif (several residues) Motif (several residues)

32 Page 396

33 Definition of a motif A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids. Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins. PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases. Page 394
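
Because a PROSITE pattern is qualitative (a sequence either matches or it does not), it behaves much like a regular expression. The sketch below translates a subset of PROSITE pattern syntax into a Python regular expression and scans a sequence. The function name `prosite_to_regex` and the test peptide are hypothetical; the pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern (PROSITE PS00001). The translator is a simplification and does not handle every PROSITE construct (for example, anchors written inside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regex string."""
    pattern = pattern.rstrip(".")          # PROSITE patterns end with a period
    parts = []
    for element in pattern.split("-"):     # elements are separated by hyphens
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        element = element.strip("<>")
        repeat = ""
        if element.endswith(")"):          # repeat count such as x(2) or x(2,4)
            element, _, count = element[:-1].partition("(")
            repeat = "{" + count + "}"
        if element == "x":                 # x means any residue
            core = "."
        elif element.startswith("{"):      # {P} means any residue except P
            core = "[^" + element[1:-1] + "]"
        else:                              # single residue or [ST]-style set
            core = element
        parts.append(
            ("^" if anchor_start else "") + core + repeat + ("$" if anchor_end else "")
        )
    return "".join(parts)


n_glycosylation = prosite_to_regex("N-{P}-[ST]-{P}.")

# Hypothetical peptide; report 1-based start positions of matches.
peptide = "MKNGSAANPTQ"
sites = [m.start() + 1 for m in re.finditer(n_glycosylation, peptide)]
```

In this peptide the asparagine at position 3 matches, while the asparagine at position 8 does not because it is followed by proline. A profile, by contrast, would assign each candidate site a score rather than a yes/no answer.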

34 Summary of Perspective 1: Protein domains and motifs A signature is a protein category such as a domain or motif. You can learn about domains at Integr8, and at databases such as InterPro and Pfam. A motif (or fingerprint) is a short, conserved sequence. You can study motifs at Prosite at ExPASy.

35 Perspective 2: Physical properties of proteins Page 397

36 Page 398

37 Physical properties of proteins Many websites are available for the analysis of individual proteins. ExPASy and ISREC are two excellent resources. The accuracy of these programs is variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms. Page 399
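
Molecular weight is a good example of a property that follows directly from the primary sequence: sum the residue masses and add one water for the free termini. The sketch below is illustrative only and is not the ExPASy implementation; the table holds approximate average residue masses in daltons, and the function name is hypothetical. It ignores posttranslational modifications, which is exactly where the slide warns that sequence-based prediction becomes less reliable.

```python
# Approximate average residue masses (Da), i.e. amino acid minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the N- and C-terminal groups


def molecular_weight(sequence):
    """Estimate the average molecular weight (Da) of an unmodified peptide."""
    seq = sequence.upper()
    # A KeyError here signals a non-standard residue code in the input.
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


glycine_mw = molecular_weight("G")
diglycine_mw = molecular_weight("GG")
```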

38 Access a variety of protein analysis programs from the top right of the ExPASy home page

39 Page 400

40

41 Protein secondary structure. Protein secondary structure is determined by the amino acid side chains. Myoglobin is an example of a protein having many α-helices. These are formed by amino acid stretches 4-40 residues in length. Thioredoxin from E. coli is an example of a protein with many β-sheets, formed from β-strands composed of 5-10 residues. They are arranged in parallel or antiparallel orientations. Page 425

42 Fig. 11.3 Page 427 Myoglobin (John Kendrew, 1958)

43 Thioredoxin Fig. ~11.3 Page 427

44 Secondary structure prediction. Chou and Fasman (1974) developed an algorithm based on the frequencies of amino acids found in α-helices, β-sheets, and turns. Proline occurs at turns, but not in α-helices. GOR (Garnier, Osguthorpe, Robson) is a related algorithm. Modern algorithms use multiple sequence alignments and achieve a higher success rate (about 70-75%). Page 427
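
The idea behind propensity-based prediction can be sketched with a sliding window. This is a deliberately simplified illustration, not the full Chou-Fasman procedure (which uses nucleation and extension rules and also scores sheets and turns). The helix propensities below are approximate values as commonly quoted from Chou and Fasman's tables and should be checked against the original before serious use; the window size, threshold, and function name are assumptions.

```python
# Approximate helix propensities, P(alpha); illustrative values only.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13,
    "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00,
    "R": 0.98, "T": 0.83, "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67,
    "P": 0.57, "G": 0.57,
}


def helix_windows(sequence, window=6, threshold=1.03):
    """Mark residues (H) that fall in any window with high mean helix propensity."""
    seq = sequence.upper()
    marks = ["-"] * len(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(HELIX_PROPENSITY[aa] for aa in segment) / window
        if score >= threshold:
            for i in range(start, start + window):
                marks[i] = "H"
    return "".join(marks)


helix_rich = helix_windows("EEEEEE")   # glutamate is a strong helix former
helix_poor = helix_windows("GPGPGP")   # glycine and proline disfavor helices
```

A single-sequence window like this can mark helix breakers at the edge of a favorable window as helical, which is one reason the modern methods mentioned on the slide rely on multiple sequence alignments instead.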

45 Secondary structure prediction. Web servers: GOR4, Jpred, NNPREDICT, PHD, Predator, PredictProtein, PSIPRED, SAM-T99sec. Table 11-3, Page 429

46

47

48 Go to http://pbil.univ-lyon1.fr/, click “Secondary structure prediction” to access this prediction tool

49 Tertiary protein structure: protein folding Main approaches: [1] Experimental determination (X-ray crystallography, NMR) [2] Prediction ► Comparative modeling (based on homology) ► Threading ► Ab initio (de novo) prediction (Ingo Ruczinski at JHSPH) Page 430

50 Experimental approaches to protein structure [1] X-ray crystallography -- Used to determine 80% of structures -- Requires high protein concentration -- Requires crystals -- Able to trace amino acid side chains -- Earliest structure solved was myoglobin [2] NMR -- Magnetic field applied to proteins in solution -- Largest structures: 350 amino acids (40 kD) -- Does not require crystallization Page 430

51 Steps in obtaining a protein structure: (1) target selection; (2) obtain and characterize the protein; (3) determine, refine, and model the structure; (4) deposit in a repository. Fig 11.5, Page 431

52 Priorities for target selection for protein structures. Historically, small, soluble, abundant proteins were studied (e.g. hemoglobin, cytochrome c, insulin). Modern criteria: represent all branches of life; represent previously uncharacterized families; identify medically relevant targets. Some are attempting to solve all structures within an individual organism (Methanococcus jannaschii, Mycobacterium tuberculosis). Page 432

53 The Protein Data Bank (PDB). PDB is the principal repository for protein structures. Established in 1971. Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org. Currently contains 59,000 structure entities (updated 7/28/09). Page 434

54 Fig. 11.7 Page 435

55 PDB content growth (www.pdb.org), updated 12/08. [Plot: number of structures, yearly and total, versus year, 1972 to 2008.] Fig. 9.6, Page 281

56 PDB holdings (12/08)
Proteins, peptides: 50,621
Protein/nucleic acid complexes: 2,225
Nucleic acids: 1,946
Other; carbohydrates: 33
Total: 54,825
Table 11-4, Page 435

57 Fig. ~11.9 Page 436

58 Fig. ~11.10 Page 436

59 Viewing hemoglobin (accession 2H35) at PDB

60 Viewing structures at PDB: WebMol Fig. 11.11 Page 437

61 Protein Data Bank. Gateways to access PDB files: Swiss-Prot, NCBI, EMBL. Databases that interpret PDB files: CATH, Dali, SCOP, FSSP.

62 The CATH Hierarchy Fig. 11.18 Page 444

63 Access to PDB through NCBI. You can access PDB data at the NCBI in several ways: go to the Structure site from the NCBI homepage; use Entrez; or perform a BLAST search, restricting the output to the PDB database. Page 437

64 Fig. ~11.12 Page 438

65 Do a blastp search; set the database to pdb (Protein Data Bank)

66

67 Structure links Structure accession (e.g. 2JTZ)

68 Fig. 9.13 Page 288

69 Fig. 9.14 Page 289

70

71

72 Access to PDB structures through NCBI: Molecular Modeling DataBase (MMDB); Cn3D (“see in 3D” or three dimensions), structure visualization software; Vector Alignment Search Tool (VAST), to view multiple structures. Page 291

73 Fig. 9.15 Page 290

74 Fig. 9.15 Page 290

75 Fig. 9.16 Page 291

76 Fig. 9.17 Page 292

77 Access to structure data at NCBI: VAST. The Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including: PDB identifiers; root-mean-square deviation (RMSD) values to describe structural similarities; NRES, the number of equivalent pairs of alpha carbon atoms superimposed; and percent identity. Page 294
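
The RMSD reported for a pair of structures is the square root of the mean squared distance between equivalent alpha carbon atoms. The sketch below computes it for two coordinate lists that are assumed to be already superimposed and paired; it does not perform the superposition itself and is not VAST's implementation. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. angstroms) between paired 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy alpha carbon coordinates: the second set is the first shifted by (3, 4, 0).
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_2 = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
shift_rmsd = rmsd(structure_1, structure_2)
```

Here every atom pair is 5 units apart, so the RMSD is 5. An RMSD should be read together with the number of superimposed pairs (NRES on the slide), since a low value over very few atoms says little about overall similarity.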

78 Fig. 9.18 Page 293

79 Predicting protein structure: CASP8 competition 8th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction http://predictioncenter.gc.ucdavis.edu/ http://predictioncenter.org/casp8/index.cgi

80 [Plot: percent of residues (Cα) versus distance cutoff (Å), with numbered arrows 1 to 5.] Annotations: most groups had the right prediction for this structure (but not those at arrow 2); most groups had the wrong prediction for this structure (but arrow 3 did better); half the groups got it wrong (arrow 4), half got it right (arrow 5), and the key difference is the multiple sequence alignment.

81

82 Protein structure and human disease. In some cases, a single amino acid change can induce a dramatic change in protein structure. For example, the ΔF508 mutation of CFTR (deletion of phenylalanine 508) alters the α-helical content of the protein and disrupts intracellular trafficking. Other changes are subtle. The E6V mutation in the gene encoding hemoglobin beta causes sickle-cell anemia. The substitution introduces a hydrophobic patch on the protein surface, leading to clumping of hemoglobin molecules. Page 311

83 Protein structure and human disease
Disease              Protein
Cystic fibrosis      CFTR
Sickle-cell anemia   hemoglobin beta
“mad cow” disease    prion protein
Alzheimer disease    amyloid precursor protein
Table 9.5, Page 312

84 Introduction to Perspectives 3 and 4: Gene Ontology (GO) Consortium Page 237

85 The Gene Ontology Consortium. An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products. There are three organizing principles: molecular function, biological process, and cellular component. You can visit GO at http://www.geneontology.org. There is no centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism. Page 237

86 Page 241 GO terms are assigned to Entrez Gene entries

87 Page 241

88

89 The Gene Ontology Consortium: Evidence Codes
IC    Inferred by curator
IDA   Inferred from direct assay
IEA   Inferred from electronic annotation
IEP   Inferred from expression pattern
IGI   Inferred from genetic interaction
IMP   Inferred from mutant phenotype
IPI   Inferred from physical interaction
ISS   Inferred from sequence or structural similarity
NAS   Non-traceable author statement
ND    No biological data
TAS   Traceable author statement
Page 240
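
Evidence codes matter in practice because they let you separate annotations backed by experiments from those assigned computationally. The sketch below stores the codes from the slide in a lookup table and keeps only annotations with experimentally derived codes. The annotation records, gene names, and function name are hypothetical placeholders, not real GO annotations.

```python
# Evidence codes as listed on the slide.
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data",
    "TAS": "Traceable author statement",
}

# Codes from the list above that rest on a direct experimental observation.
EXPERIMENTAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}


def keep_experimental(annotations):
    """Return only the annotations supported by an experimental evidence code."""
    return [a for a in annotations if a["evidence"] in EXPERIMENTAL_CODES]


# Hypothetical annotation records for illustration.
annotations = [
    {"gene": "geneA", "term": "example molecular function", "evidence": "IDA"},
    {"gene": "geneA", "term": "example cellular component", "evidence": "IEA"},
    {"gene": "geneB", "term": "example biological process", "evidence": "IMP"},
]
experimental = keep_experimental(annotations)
```

Electronically inferred (IEA) annotations are not reviewed individually by a curator, so filtering them out is a common first step when a conservative set of annotations is needed.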

90 Perspective 3: Protein localization Page 242

91 protein Protein localization Page 242

92 Protein localization Proteins may be localized to intracellular compartments, cytosol, the plasma membrane, or they may be secreted. Many proteins shuttle between multiple compartments. A variety of algorithms predict localization, but this is essentially a cell biological question. Page 240

93

94

95 Page 242

96 Page 244

97

98 Localization of 2,900 yeast proteins Michael Snyder and colleagues incorporated epitope tags into thousands of S. cerevisiae cDNAs, and systematically localized proteins (Kumar et al., 2002). See http://ygac.med.yale.edu for the TRIPLES database including ~3,000 fluorescence micrographs. Page 243

99 Perspective 4: Protein function Page 243

100 Protein function Function refers to the role of a protein in the cell. We can consider protein function from a variety of perspectives. Page 243

101 1. Biochemical function (molecular function) RBP binds retinol, could be a carrier Page 245

102 2. Functional assignment based on homology RBP could be a carrier too Other carrier proteins Page 245

103 3. Function based on structure RBP forms a calyx Page 245

104 4. Function based on ligand binding specificity RBP binds vitamin A Page 245

105 5. Function based on cellular process: RBP is abundant, soluble, secreted. [Figure labels: DNA, RNA.] Page 245

106 6. Function based on biological process RBP is essential for vision Page 245

107 7. Function based on “proteomics” or high throughput “functional genomics”. High throughput analyses show: RBP levels elevated in renal failure; RBP levels decreased in liver disease. Page 245

108 Functional assignment of enzymes: the EC (Enzyme Commission) system
Oxidoreductases: 1,003
Transferases: 1,076
Hydrolases: 1,125
Lyases: 356
Isomerases: 156
Ligases: 126
Page 246
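
An EC number is hierarchical, and its first field identifies one of the six top-level classes listed on the slide, in that order. The sketch below maps an EC number to its class; the function name is hypothetical. The two examples relate to the HIV-1 pol products discussed earlier: EC 3.4.23.16 is the HIV-1 protease (retropepsin) and EC 2.7.7.49 is RNA-directed DNA polymerase (reverse transcriptase).

```python
# Top-level EC classes, numbered in the order given on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the top-level class name for an EC number such as '3.4.23.16'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):   # accept an optional 'EC' prefix
        text = text[2:].strip()
    first_field = text.split(".")[0]
    return EC_CLASSES[int(first_field)]


protease_class = ec_class("3.4.23.16")
reverse_transcriptase_class = ec_class("EC 2.7.7.49")
```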

109 Functional assignment of proteins: Clusters of Orthologous Groups (COGs). Categories: information storage and processing; cellular processes; metabolism; poorly characterized. Page 247

