Bioinformatics for Proteomics. Shu-Hui Chen (陳淑慧), Department of Chemistry, National Cheng Kung University.

1 Bioinformatics for Proteomics. Shu-Hui Chen (陳淑慧), Department of Chemistry, National Cheng Kung University

2 DNA (5'→3') → [Transcription, Splicing] → mRNA → [Translation] → Polypeptide → [Folding] → Protein → Transport / Localization, Oligomerization, PTM (Post-Translational Modification) → Function. How do we find protein-coding regions, introns and exons in genomic DNA sequences? Bioinformatics I

3 What is Proteomics? Systematic analysis of: all protein sequences; all protein expression patterns; all protein interactions. This involves: protein isolation; protein separation; protein identification; functional characterization of all proteins.

4 The tools of Proteomics. Traditional protein chemistry assay methods struggle to establish identity. Identity requires: (1) specificity of measurement (precision): mass spectrometry and MS-based data acquisition algorithms; (2) a reference for comparison: protein sequence databases and search algorithms.

5 MS-based Proteomics and Bioinformatics. MS instruments are so far not sensitive enough to resolve the proteins in a biological system based solely on the measured signals. MS can, however, acquire enough data to map a protein to a database entry when new computer algorithms are used to analyze the data. This is the field of bioinformatics.

6 Instrumentation: sample inlet; ion source; mass analyzer; data acquisition; vacuum

9 “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.

10 MS-based Protein Identification: → Mass Mapping; Peptide Sequencing

11 Conventional Methodology - Expression Proteomics

12 Trypsin Digestion. Trypsin cleaves polypeptides C-terminal to the basic amino acids (lysine and arginine): -NH-CH(R1)-CO-NH-CH(R2)-CO- → (trypsin) → -NH-CH(R1)-COOH + H2N-CH(R2)-CO-. [Mass spectrum: ion intensity vs. m/z]
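The cleavage rule on this slide can be sketched as an in-silico digest. This is a minimal illustration, not the method of any particular search engine: the function names are hypothetical, missed cleavages and the common "no cleavage before proline" exception are ignored, and the masses are standard monoisotopic residue masses.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added per free peptide


def tryptic_digest(sequence):
    """Split a protein after every K or R (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without K/R
        peptides.append(sequence[start:])
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of a peptide in Da."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


if __name__ == "__main__":
    for pep in tryptic_digest("AAKGGRSS"):
        print(pep, round(monoisotopic_mass(pep), 4))
```

The list of peptide masses produced this way is what a mass-mapping search compares against the measured m/z peaks.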

13 Mass Spectrometry Protein identified by database mapping

14 Automated Database Search. Number 1 match: tumor necrosis factor type 1 receptor associated protein TRAP-1. Mr: 76030.27. Total coverage: 33.4%
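A "total coverage" figure such as the one on this slide is usually the percentage of the protein's residues that fall inside at least one matched peptide. The sketch below assumes that definition; the function name and the toy sequence are hypothetical and not taken from the slides.

```python
def sequence_coverage(protein, matched_peptides):
    """Percentage of residues covered by at least one matched peptide."""
    if not protein:
        raise ValueError("protein sequence must not be empty")
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


if __name__ == "__main__":
    # Toy example: 5 of 8 residues are covered.
    print(sequence_coverage("AAKGGRSS", ["AAK", "SS"]))
```

Overlapping peptides are counted once, because coverage is tracked per residue rather than per peptide.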

15 Minimal content of a « protein sequence » database: sequences (!!); accession number (AC); taxonomic data; references; ANNOTATION/CURATION; keywords; cross-references; documentation

16 SWISS-PROT/TrEMBL. Collaboration between the SIB (CH) and EMBL/EBI (UK). SWISS-PROT: a fully (manually) annotated, non-redundant, cross-referenced, documented protein sequence database. TrEMBL: automatically generated from annotated EMBL coding sequences (CDS) and annotated using software tools. http://www.expasy.org/sprot/

17 ExPASy Web Server ExPASy = Expert Protein Analysis System

18 History of MS Searching. 1993: MOWSE (Molecular Weight Search), by Pappin and Bleasby. 1994: SEQUEST, by Yates and Eng. 1996: MOWSE II. 1997: MOWSE III. 1998: MASCOT, by Matrix Science.

28 Scoring algorithm. Final score = -10 × log10(P), where P is the absolute probability that the observed match is a random event. E value (expected value): the number of hits one can expect to see by chance when searching a database of a particular size; a value of zero indicates that no matches would be expected by chance. Significant hits: at the 95% confidence level (p < 0.05) there is less than a 1 in 20 chance that the observed match is a random event. [Figure annotation: increase mass tolerance]
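The score formula on this slide is easy to check numerically. The sketch below assumes a base-10 logarithm and uses hypothetical function names; it only converts between probability and score and does not reproduce how any search engine derives P.

```python
import math


def score_from_probability(p):
    """Score = -10 * log10(P), with P the probability of a random match."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse of score_from_probability."""
    return 10.0 ** (-score / 10.0)


if __name__ == "__main__":
    # The p < 0.05 significance level expressed on the score scale.
    print(round(score_from_probability(0.05), 2))
    print(probability_from_score(20.0))
```

On this scale every 10 score units correspond to a tenfold drop in the probability that the match is random.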

31 MS-based Protein Identification: Mass Mapping; → Peptide Sequencing

33 Tandem Mass Spectrometry (MS/MS). MS/MS acquisition is controlled by software settings.

34 Protein Identification: Peptide Sequencing using MS/MS. Precursor ion: peptide ABCDEF → CID → fragment ladder A, AB, ABC, ABCD, ABCDE. [Spectrum axis: m/z]

35 Nomenclature used for CID peptide fragmentation: low energy (eV); Q, TOF, FT. “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.
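Low-energy CID mainly breaks the peptide backbone at the amide bonds, which in the standard nomenclature gives N-terminal b ions and C-terminal y ions. The sketch below computes singly charged b and y m/z values for a peptide; the function name is hypothetical, modifications and other ion types are ignored, and the masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is the N-terminal fragment of length i;
    y_ions[i-1] is the complementary C-terminal fragment.
    """
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        prefix = sum(RESIDUE_MASS[r] for r in peptide[:i])
        suffix = sum(RESIDUE_MASS[r] for r in peptide[i:])
        b_ions.append(prefix + PROTON)
        y_ions.append(suffix + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GASK")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{len(b) - i + 1} = {yi:.4f}")
```

Differences between consecutive b ions (or consecutive y ions) equal single residue masses, which is what lets a spectrum be read as a sequence.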

36 Protein Identification by Database Search

40 Trypsin Digestion. Trypsin cleaves polypeptides C-terminal to the basic amino acids (lysine and arginine): -NH-CH(R1)-CO-NH-CH(R2)-CO- → (trypsin) → -NH-CH(R1)-COOH + H2N-CH(R2)-CO-. [Mass spectrum: ion intensity vs. m/z]

44 Sequence Tag Approach for Peptide Sequencing “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.

45 The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

46 NCBI BLAST (Basic Local Alignment Search Tool): http://www.ncbi.nlm.nih.gov/blast/

47 Sequence alignments and comparison

1: MYTAILORISRICH
2: MONTAILLEURESTRICHE

Global Alignment
1: MY-TAIL--ORIS-RICH-
   ¦x ¦¦¦¦  x¦x¦ ¦¦¦¦
2: MONTAILLEURESTRICHE

Two Local Alignments
1: TAILO   RICH
   ¦¦¦¦x   ¦¦¦¦
2: TAILL   RICHE

¦ = Identity; x = Mismatch; - = Insertion / Deletion
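The difference between global and local alignment can be sketched with the two classic dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). The scoring values below (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not the scheme used on the slide or by BLAST, and only the scores are computed, not the aligned strings.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                           prev[j] + gap,     # gap in b
                           cur[j - 1] + gap))  # gap in a
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman score: best-scoring pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,                     # restart the alignment here
                       prev[j - 1] + s,
                       prev[j] + gap,
                       cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


if __name__ == "__main__":
    s1, s2 = "MYTAILORISRICH", "MONTAILLEURESTRICHE"
    print("global:", global_score(s1, s2))
    print("local: ", local_score(s1, s2))
```

The only structural difference is the floor at zero in the local version, which lets an alignment start and end anywhere instead of being penalised for unmatched flanking regions.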

48 Multiple Sequence Alignment (MSA)

HBA_CHICK   VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL  48
HBAD_CHICK  ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL  48
HBPI_CHICK  AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV  48
HBB_CHICK   VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL  48
HBE_CHICK   VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL  48
HBRH_CHICK  VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL  48
MYG_CHICK   GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL  49

HBA_CHICK   SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV  93
HBAD_CHICK  SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV  93
HBPI_CHICK  SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV  93
HBB_CHICK   SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV  98
HBE_CHICK   SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV  98
HBRH_CHICK  SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV  98
MYG_CHICK   KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI  99

HBA_CHICK   DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR--  141
HBAD_CHICK  DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR--  141
HBPI_CHICK  DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR--  141
HBB_CHICK   DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH--  146
HBE_CHICK   DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH--  146
HBRH_CHICK  DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH--  146
MYG_CHICK   PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF  149

HBA_CHICK   ----  141
HBAD_CHICK  ----  141
HBPI_CHICK  ----  141
HBB_CHICK   ----  146
HBE_CHICK   ----  146
HBRH_CHICK  ----  146
MYG_CHICK   GFQG  153

Consensus length: 154; Identity: 19 (12.3%); Similarity: 51 (33.1%)
In the original alignment, '*' marks a position that is perfectly conserved and '.' a position that is well conserved.
Programs: CLUSTALW, T_COFFEE, MULTALIGN
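The "Identity" figure of a multiple alignment counts the columns in which every sequence carries the same residue. A minimal sketch of that count follows, applied to the first 50-column block of the chicken globin alignment on this slide; the function name is hypothetical, and the "well conserved" ('.') classification is not reproduced because it depends on a similarity scheme the slide does not specify.

```python
def conserved_columns(rows):
    """Return 0-based indices of columns where all rows share one residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return [i for i, column in enumerate(zip(*rows))
            if len(set(column)) == 1 and column[0] != "-"]


# First block of the alignment shown on the slide (gaps as '-').
BLOCK_1 = [
    "VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL",  # HBA_CHICK
    "ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL",  # HBAD_CHICK
    "AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV",  # HBPI_CHICK
    "VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL",  # HBB_CHICK
    "VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL",  # HBE_CHICK
    "VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL",  # HBRH_CHICK
    "GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL",  # MYG_CHICK
]

if __name__ == "__main__":
    cols = conserved_columns(BLOCK_1)
    width = len(BLOCK_1[0])
    print(len(cols), "identical columns out of", width)
    print("residues:", "".join(BLOCK_1[0][i] for i in cols))
```

Running the same function over the full 154-column alignment would give the total identity count reported under the alignment.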

49 Searching databases with multiple alignments. PSI-BLAST: Position-Specific Iterative BLAST (Altschul et al., 1997)
1. Starting with a single sequence, PSI-BLAST searches a database using BLAST and builds a multiple sequence alignment and a profile.
2. The profile is then used to search the protein database again.
3. Running the program several times can further refine the profile and increase search sensitivity.

65 Error tolerance search

66 0.2 Da / 0.2 Da 32

67 0.05 Da / 0.05 Da 27

68 0.5 Da / 0.5 Da 33

73 MS/MS Scan Functions. Q1 → Collision Chamber (N2 gas) → Q3. Scan = mass scan mode; Fix = single mass transmission.

Scan function                       Q1    Q3
Product Ion Scan (PI)               Fix   Scan
Multiple Reaction Monitoring (MRM)  Fix   Fix
Precursor Ion Scan (PS)             Scan  Fix
Neutral Loss Scan (NL)              Scan  Scan

75 IP (immunoprecipitation) + MS identification for finding protein interaction complexes

79 Conclusions. Protein identification by MS is a key element of proteomics, and the identification process is an informatics-based methodology. MS combined with sequence databases represents a huge leap for protein biochemistry: a large-scale analysis approach. Biochemical manipulation combined with protein identification can provide functional information about proteins. Bioinformatics tools are needed to link proteomics data to protein interactions and biological pathways.

