1
Bioinformatics for Proteomics
Shu-Hui Chen (陳淑慧)
Department of Chemistry, National Cheng Kung University
2
From gene to function: DNA (5'→3') → transcription → mRNA → splicing → translation → polypeptide → folding → protein → transport/localization, oligomerization, PTM (post-translational modification) → function.
How do we find protein-coding regions, introns, and exons in genomic DNA sequences?
Bioinformatics I
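The question above is the gene-prediction problem. As a minimal illustration of the translation step only (a simplified sketch, not a gene finder), the Python below scans one DNA strand in its three reading frames for ATG...stop open reading frames (ORFs) using the standard genetic code. It does not model introns, splice sites, or the reverse strand, which real exon/intron prediction requires; the function names `translate` and `find_orfs` are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# codon index = 16 * first_base + 4 * second_base + third_base.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons; codons not in the table (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna: str, min_aa: int = 30) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG...stop ORFs on the given strand only.

    `end` is exclusive and includes the stop codon; introns are ignored.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
            if start is None and amino_acid == "M":
                start = pos  # first ATG in this frame opens a reading frame
            elif start is not None and amino_acid == "*":
                protein = translate(dna[start:pos])  # stop codon excluded
                if len(protein) >= min_aa:
                    orfs.append((start, pos + 3, protein))
                start = None
    return orfs


print(find_orfs("CCATGGCTTAAGG", min_aa=1))  # [(2, 11, 'MA')]
```

The `min_aa` cutoff reflects the usual practice of ignoring very short ORFs, which occur frequently by chance.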
3
What is proteomics? The systematic analysis of all protein sequences, all protein expression patterns, and all protein interactions. This involves protein isolation, protein separation, protein identification, and functional characterization of all proteins.
4
The tools of proteomics
Traditional protein chemistry assays struggle to establish identity. Establishing identity requires:
1. Specificity of measurement (precision): mass spectrometry, with MS-based data acquisition algorithms.
2. A reference for comparison: protein sequence databases and search algorithms.
5
MS-based Proteomics and Bioinformatics
MS instruments are not yet able to identify the proteins in a biological system solely from the signals they measure. MS can, however, acquire enough data to map each protein to an entry in a sequence database, provided that computer algorithms are used to analyze the data. This is the field of bioinformatics.
6
Instrumentation: sample inlet → ion source → mass analyzer → data acquisition, with the mass analyzer under vacuum.
9
“Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.
10
MS-based Protein Identification: Mass Mapping and Peptide Sequencing
11
Conventional Methodology - Expression Proteomics
12
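Trypsin digestion
Trypsin cleaves polypeptides on the C-terminal side of the basic amino acids lysine (K) and arginine (R), hydrolyzing the peptide bond:
$$\text{-NH-CH(R}_1\text{)-CO-NH-CH(R}_2\text{)-CO-} + \text{H}_2\text{O} \xrightarrow{\text{trypsin}} \text{-NH-CH(R}_1\text{)-COOH} + \text{H}_2\text{N-CH(R}_2\text{)-CO-}$$
The resulting peptide mixture is analyzed by MS, giving a spectrum of ion intensity versus m/z.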
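An in silico version of this digestion is what a database search compares the measured spectrum against. The sketch below assumes the common trypsin rule (cleave after K or R unless the next residue is P) and computes singly protonated monoisotopic peptide masses, [M+H]+, from standard monoisotopic residue masses. The function names and the `missed_cleavages` parameter are illustrative, not taken from any particular search engine.

```python
import re

# Monoisotopic residue masses (Da); cysteine is taken as unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O completing the free N- and C-termini
PROTON = 1.007276   # charge carrier of the [M+H]+ ion


def tryptic_digest(protein: str, missed_cleavages: int = 0) -> list[str]:
    """Cleave after K or R unless the next residue is P."""
    # Zero-width split after K/R not followed by P (requires Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` neighbours to model incomplete digestion.
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mh(peptide: str) -> float:
    """Singly protonated monoisotopic mass [M+H]+ of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


for pep in tryptic_digest("MAKPLRGEK"):
    print(pep, round(peptide_mh(pep), 4))
# prints: MAKPLR 715.4283, then GEK 333.1769
```

Note that the K in "KP" is not cleaved, so "MAKPLR" stays intact.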
13
Mass spectrometry: the protein is identified by mapping the measured peptide masses against a sequence database.
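The mapping step can be sketched as peptide mass fingerprinting: each database protein is represented by its theoretical peptide masses, and proteins are ranked by how many observed masses fall within a tolerance. This is a deliberately simple shared-count score, assumed here for illustration; MOWSE and MASCOT use frequency- and probability-based scoring instead. The two database entries and their masses are made up.

```python
def count_matches(observed: list[float], theoretical: list[float], tol_da: float) -> int:
    """Number of observed peptide masses within tol_da of any theoretical mass."""
    return sum(1 for m in observed if any(abs(m - t) <= tol_da for t in theoretical))


def rank_proteins(observed: list[float], database: dict[str, list[float]],
                  tol_da: float = 0.2) -> list[tuple[str, int]]:
    """Rank database entries by their number of matched peptide masses."""
    scores = {name: count_matches(observed, masses, tol_da)
              for name, masses in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical [M+H]+ lists for two made-up entries.
database = {"protein_A": [500.05, 600.0, 812.4], "protein_B": [700.3, 955.5]}
print(rank_proteins([500.0, 600.1, 700.0], database))
# [('protein_A', 2), ('protein_B', 0)]
```

With a wider tolerance (e.g. 0.5 Da), 700.0 would also match protein_B's 700.3, which is why tolerance settings change search results.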
14
Automated database search: the number 1 match is tumor necrosis factor type 1 receptor-associated protein (TRAP-1), Mr 76030.27, with a total sequence coverage of 33.4%.
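Sequence coverage, as reported for the top match, is the percentage of the protein's residues that lie inside at least one matched peptide. A minimal sketch, assuming matched peptides are given as plain sequence strings (the function name is hypothetical):

```python
def sequence_coverage(protein: str, matched_peptides: list[str]) -> float:
    """Percentage of residues covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)
        while start != -1:  # mark every occurrence of the peptide
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


print(sequence_coverage("MKTAYIAKQR", ["MKTAY", "AKQR"]))  # 90.0
```

Overlapping peptides are counted once per residue, so coverage can never exceed 100%.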
15
Minimal content of a protein sequence database: the sequences themselves (essential), accession number (AC), taxonomic data, references, annotation/curation, keywords, cross-references, and documentation.
16
SWISS-PROT/TrEMBL: a collaboration between the SIB (Switzerland) and EMBL-EBI (UK).
SWISS-PROT: a manually annotated, non-redundant, cross-referenced, and documented protein sequence database.
TrEMBL: a database generated automatically from the coding sequences (CDS) annotated in EMBL, and annotated using software tools.
http://www.expasy.org/sprot/
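SWISS-PROT and TrEMBL are now distributed together as UniProtKB, whose FASTA headers mark reviewed SWISS-PROT entries with `sp|` and automatically annotated TrEMBL entries with `tr|`, followed by the accession number and entry name, then a description and KEY=value tags such as OS (organism) and OX (taxonomy ID). A small parser, as a sketch assuming that header layout (the function name is hypothetical):

```python
import re


def parse_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header into its database, accession, and tag fields."""
    match = re.match(r">(sp|tr)\|([^|]+)\|(\S+)\s+(.*)", header)
    if match is None:
        raise ValueError(f"not a UniProtKB header: {header!r}")
    db, accession, entry_name, rest = match.groups()
    # The description ends where the first two-letter KEY= tag begins.
    parts = re.split(r"\s(?=[A-Z]{2}=)", rest)
    tags = dict(part.split("=", 1) for part in parts[1:])
    return {
        "reviewed": db == "sp",  # sp = SWISS-PROT, tr = TrEMBL
        "accession": accession,
        "entry_name": entry_name,
        "description": parts[0],
        **tags,
    }


entry = parse_uniprot_header(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
)
print(entry["accession"], entry["reviewed"], entry["OS"])  # P69905 True Homo sapiens
```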
17
ExPASy web server (ExPASy = Expert Protein Analysis System)
18
History of MS database searching:
1993: MOWSE (Molecular Weight Search), by Pappin and Bleasby
1994: SEQUEST, by Yates and Eng
1996: MOWSE II
1997: MOWSE III
1998: MASCOT, by Matrix Science
28
Scoring algorithm
Final score = -10 × log10(P), where P is the absolute probability that the observed match is a random event.
E value (expectation value): the number of hits one can expect to see by chance when searching a database of a particular size. The closer the E value is to zero, the fewer matches of that quality would be expected by chance.
Hits are significant at the 95% confidence level (p < 0.05): there is less than a 1 in 20 chance that the observed match is a random event.
(Figure: effect of increasing the mass tolerance on the search result.)
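The score formula and the significance threshold can be worked through numerically. The sketch assumes that the expected number of chance hits is E = P × N, where N is the number of candidate sequences tested, so a match is significant when P ≤ 0.05 / N. This relation and the function names are assumptions for illustration, not a quotation of any search engine's code.

```python
import math


def probability_score(p: float) -> float:
    """Score = -10 * log10(P), P = probability that the match is random."""
    return -10.0 * math.log10(p)


def expect_value(p: float, n_candidates: int) -> float:
    """Assumed relation: expected chance hits = P x number of candidates tested."""
    return p * n_candidates


def significance_threshold(n_candidates: int, alpha: float = 0.05) -> float:
    """Score needed for E <= alpha, i.e. P <= alpha / n_candidates (assumption)."""
    return probability_score(alpha / n_candidates)


print(round(probability_score(0.05), 1))        # 13.0
print(round(significance_threshold(10_000), 1))  # 53.0
```

The larger the database (N), the higher the score a match needs to reach significance, which is why a threshold is always quoted for a particular database size.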
31
MS-based Protein Identification: Mass Mapping and Peptide Sequencing
33
Tandem mass spectrometry (MS/MS): MS/MS acquisition is controlled by software settings.
34
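Protein identification: peptide sequencing using MS/MS. The precursor ion of a peptide (ABCDEF) is selected and fragmented by collision-induced dissociation (CID). The fragments form a ladder on the m/z axis (A, AB, ABC, ABCD, ABCDE), and the mass difference between neighboring fragments identifies each residue.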
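Reading a ladder amounts to matching each mass difference to a residue mass. The sketch assumes an idealized input: a complete, sorted ladder of singly charged ions from one series, starting at a bare proton. Leucine and isoleucine have identical masses, so "L" stands for either; the function name and tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da); "L" stands for L or I (same mass).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(ladder_mz: list[float], tol: float = 0.02) -> str:
    """Translate successive m/z differences into residues ('?' if none fits)."""
    sequence = []
    for lower, upper in zip(ladder_mz, ladder_mz[1:]):
        delta = upper - lower
        closest = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - delta))
        sequence.append(closest if abs(RESIDUE_MASS[closest] - delta) <= tol else "?")
    return "".join(sequence)


ladder = [1.007276]  # assumed starting point: a bare proton
for aa in "PEPTLDE":
    ladder.append(ladder[-1] + RESIDUE_MASS[aa])
print(read_ladder(ladder))  # PEPTLDE
```

The tolerance must be tighter than about 0.036 Da to tell K from Q; real spectra also have missing and extra peaks, which this sketch ignores.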
35
Nomenclature used for CID peptide fragmentation at low collision energy (eV range), as on quadrupole (Q), TOF, and FT instruments. “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.
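In this nomenclature, fragments that keep the N-terminus after peptide-bond cleavage are b ions and those that keep the C-terminus are y ions. Their singly charged m/z values follow directly from residue masses: b_i is the sum of the first i residues plus a proton, and y_i is the sum of the last i residues plus water and a proton. A sketch under those standard definitions (the function name is hypothetical):

```python
# Monoisotopic residue masses (Da); cysteine is taken as unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions, y_ions


b, y = fragment_ions("GEK")
print([round(m, 4) for m in b])  # [58.0287, 187.0713]
print([round(m, 4) for m in y])  # [147.1128, 276.1554]
```

A useful check is that complementary ions satisfy b_i + y_(n-i) = [M+H]+ + one proton.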
36
Protein Identification by Database Search
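A simplified sketch of MS/MS database searching, assuming the candidate peptides are already given as a list (for example from an in silico tryptic digest): candidates are first filtered by precursor [M+H]+ within a tolerance, then ranked by how many observed fragment peaks lie within a tolerance of their theoretical singly charged b and y ions. This shared-peak count is a stand-in chosen for brevity; SEQUEST and MASCOT use their own, more elaborate scoring functions. All names and tolerances are hypothetical.

```python
# Monoisotopic residue masses (Da); cysteine is taken as unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mh(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fragment_ions(peptide: str) -> list[float]:
    """Theoretical singly charged b and y ions, concatenated."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b + y


def search(precursor: float, peaks: list[float], candidates: list[str],
           prec_tol: float = 1.0, frag_tol: float = 0.5) -> list[tuple[int, str]]:
    """Return (shared peak count, peptide), best first, for mass-compatible candidates."""
    results = []
    for peptide in candidates:
        if abs(precursor_mh(peptide) - precursor) > prec_tol:
            continue  # precursor mass does not fit
        theoretical = fragment_ions(peptide)
        shared = sum(1 for mz in peaks
                     if any(abs(mz - t) <= frag_tol for t in theoretical))
        results.append((shared, peptide))
    return sorted(results, reverse=True)


# Idealized "spectrum": every b and y ion of PEPTLDEK, no noise.
observed = fragment_ions("PEPTLDEK")
print(search(precursor_mh("PEPTLDEK"), observed, ["PEPTLDEK", "EPPTLDEK", "GAS"]))
```

EPPTLDEK has the same precursor mass but differs at b1 and y7, so it scores lower, while GAS is removed by the precursor filter.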
44
Sequence Tag Approach for Peptide Sequencing “Bioanalytical Chemistry” Mikkelsen, S.R., published by John Wiley & Sons, Inc.
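In the sequence tag approach, a short stretch of sequence read from the spectrum is combined with the masses on either side of it to filter candidate peptides. A minimal sketch, assuming the flanking values are summed neutral residue masses before and after the tag (a simplification; published implementations work with ion masses and more careful error handling). Names, tolerance, and candidates are hypothetical, and L/I ambiguity is not handled.

```python
# Monoisotopic residue masses (Da); cysteine is taken as unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sum(sequence: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in sequence)


def tag_matches(peptide: str, prefix_mass: float, tag: str,
                suffix_mass: float, tol: float = 0.5) -> bool:
    """True if `tag` occurs with flanking residue masses matching within tol."""
    start = peptide.find(tag)
    while start != -1:
        before = residue_sum(peptide[:start])
        after = residue_sum(peptide[start + len(tag):])
        if abs(before - prefix_mass) <= tol and abs(after - suffix_mass) <= tol:
            return True
        start = peptide.find(tag, start + 1)  # try the next occurrence
    return False


def tag_search(candidates: list[str], prefix_mass: float, tag: str,
               suffix_mass: float, tol: float = 0.5) -> list[str]:
    return [p for p in candidates if tag_matches(p, prefix_mass, tag, suffix_mass, tol)]


print(tag_search(["GASPEPTLDEK", "PEPTLDEKGAS", "GAPEPSK"], 215.09, "PEP", 586.30))
# ['GASPEPTLDEK']
```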
45
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
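BLAST's speed comes from a seed-and-extend strategy: it first finds short word matches (seeds) between query and database sequence, then extends them. The sketch below is a simplified version under stated assumptions: exact 3-letter word matches, ungapped extension with +1/-1 scoring, and an X-drop cutoff. Protein BLAST itself uses neighborhood words scored with a substitution matrix (e.g. BLOSUM62), gapped extension, and E-value statistics. Function names and parameters are hypothetical.

```python
def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs sharing an exact k-letter word."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - k + 1):
        index.setdefault(query[i:i + k], []).append(i)
    return [(i, j)
            for j in range(len(subject) - k + 1)
            for i in index.get(subject[j:j + k], [])]


def extend_seed(query, subject, i, j, k=3, match=1, mismatch=-1, x_drop=2):
    """Ungapped two-way extension; returns (score, query_start, query_end)."""
    best = k * match
    # Extend right; stop once the score falls more than x_drop below the best.
    score, right, step = best, 0, 0
    while i + k + step < len(query) and j + k + step < len(subject):
        score += match if query[i + k + step] == subject[j + k + step] else mismatch
        step += 1
        if score > best:
            best, right = score, step
        elif best - score > x_drop:
            break
    # Extend left from the best right-hand result.
    score, left, step = best, 0, 0
    while i - step - 1 >= 0 and j - step - 1 >= 0:
        score += match if query[i - step - 1] == subject[j - step - 1] else mismatch
        step += 1
        if score > best:
            best, left = score, step
        elif best - score > x_drop:
            break
    return best, i - left, i + k + right


query, subject = "MYTAILORISRICH", "MONTAILLEURESTRICHE"
for qi, sj in find_seeds(query, subject):
    score, start, end = extend_seed(query, subject, qi, sj)
    print(qi, sj, score, query[start:end])
# 2 3 4 TAIL / 3 4 4 TAIL / 10 14 4 RICH / 11 15 4 RICH
```

Overlapping seeds converge on the same segment, which is why BLAST merges hits on the same diagonal.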
46
NCBI BLAST (BLAST: Basic Local Alignment Search Tool): http://www.ncbi.nlm.nih.gov/blast/
47
Sequence alignments and comparison
1: MYTAILORISRICH
2: MONTAILLEURESTRICHE

Global alignment:
1: MY-TAIL--ORIS-RICH-
   ¦x ¦¦¦¦  x¦x¦ ¦¦¦¦
2: MONTAILLEURESTRICHE

Two local alignments:
1: TAILO   RICH
   ¦¦¦¦x   ¦¦¦¦
2: TAILL   RICHE

¦ = identity, x = mismatch, - = insertion/deletion
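The difference between global and local alignment lies in the dynamic-programming recurrence: the global (Needleman-Wunsch) score must span both sequences end to end, while the local (Smith-Waterman) score floors every cell at zero so that a high-scoring segment can start and stop anywhere. The sketch computes scores only, with an assumed scoring of +1 match, -1 mismatch, -1 per gap position; real protein tools use substitution matrices such as BLOSUM62 and affine gap penalties. Function names are hypothetical.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: leading gaps in a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score: cells are floored at 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(global_score("TAIL", "TAL"))            # 2: three identities, one gap
print(local_score("XXTAILYY", "ZZTAILWW"))    # 4: only the shared TAIL counts
print(global_score("MYTAILORISRICH", "MONTAILLEURESTRICHE"),
      local_score("MYTAILORISRICH", "MONTAILLEURESTRICHE"))
```

Only the first two printed values are annotated; the last line depends on the assumed scoring and is left for the reader to run.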
48
HBA_CHICK   VL-SAADKNNVKGIFTKIAGHAEEYGAETLERMFTTYPPTKTYFPHF-DL  48
HBAD_CHICK  ML-TAEDKKLIQQAWEKAASHQEEFGAEALTRMFTTYPQTKTYFPHF-DL  48
HBPI_CHICK  AL-TQAEKAAVTTIWAKVATQIESIGLESLERLFASYPQTKTYFPHF-DV  48
HBB_CHICK   VHWTAEEKQLITGLWGKV--NVAECGAEALARLLIVYPWTQRFFASFGNL  48
HBE_CHICK   VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFASFGNL  48
HBRH_CHICK  VHWSAEEKQLITSVWSKV--NVEECGAEALARLLIVYPWTQRFFDNFGNL  48
MYG_CHICK   GL-SDQEWQQVLTIWGKVEADIAGHGHEVLMRLFHDHPETLDRFDKFKGL  49

HBA_CHICK   SH-----GSAQIKGHGKKVVAALIEAANHIDDIAGTLSKLSDLHAHKLRV  93
HBAD_CHICK  SP-----GSDQVRGHGKKVLGALGNAVKNVDNLSQAMAELSNLHAYNLRV  93
HBPI_CHICK  SQ-----GSVQLRGHGSKVLNAIGEAVKNIDDIRGALAKLSELHAYILRV  93
HBB_CHICK   SSPTAILGNPMVRAHGKKVLTSFGDAVKNLDNIKNTFSQLSELHCDKLHV  98
HBE_CHICK   SSPTAIMGNPRVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCDKLHV  98
HBRH_CHICK  SSPTAIIGNPKVRAHGKKVLSSFGEAVKNLDNIKNTYAKLSELHCEKLHV  98
MYG_CHICK   KTPDQMKGSEDLKKHGATVLTQLGKILKQKGNHESELKPLAQTHATKHKI  99

HBA_CHICK   DPVNFKLLGQCFLVVVAIHHPAALTPEVHASLDKFLCAVGTVLTAKYR--  141
HBAD_CHICK  DPVNFKLLSQCIQVVLAVHMGKDYTPEVHAAFDKFLSAVSAVLAEKYR--  141
HBPI_CHICK  DPVNFKLLSHCILCSVAARYPSDFTPEVHAEWDKFLSSISSVLTEKYR--  141
HBB_CHICK   DPENFRLLGDILIIVLAAHFSKDFTPECQAAWQKLVRVVAHALARKYH--  146
HBE_CHICK   DPENFRLLGDILIIVLASHFARDFTPACQFAWQKLVNVVAHALARKYH--  146
HBRH_CHICK  DPENFRLLGNILIIVLAAHFTKDFTPTCQAVWQKLVSVVAHALAYKYH--  146
MYG_CHICK   PVKYLEFISEVIIKVIAEKHAADFGADSQAAMKKALELFRNDMASKYKEF  149

HBA_CHICK   ----  141
HBAD_CHICK  ----  141
HBPI_CHICK  ----  141
HBB_CHICK   ----  146
HBE_CHICK   ----  146
HBRH_CHICK  ----  146
MYG_CHICK   GFQG  153

Consensus length: 154; Identity: 19 (12.3%); Similarity: 51 (33.1%)
Conservation symbols: '*' marks a perfectly conserved position in the alignment; '.' marks a well-conserved position.
Multiple Sequence Alignment (MSA) programs: CLUSTALW, T-COFFEE, MULTALIGN
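The identity figure in an MSA summary counts the perfectly conserved columns, the ones a '*' marks. A sketch that computes that marker line and the identity percentage, assuming equal-length aligned rows and treating a column with any gap as not conserved. It does not reproduce the '.' marks, which depend on each program's residue-similarity groups; the function names and the toy alignment are hypothetical.

```python
def conservation_line(rows: list[str]) -> str:
    """'*' where every row has the same residue (no gaps), else a space."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    marks = []
    for column in zip(*rows):
        residues = set(column)
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


def identity_stats(rows: list[str]) -> tuple[int, float]:
    """Number of identical columns and their percentage of the alignment length."""
    line = conservation_line(rows)
    identical = line.count("*")
    return identical, 100.0 * identical / len(line)


toy_alignment = ["MKT-LV", "MKSALV", "MRT-LV"]
print(repr(conservation_line(toy_alignment)))  # '*   **'
print(identity_stats(toy_alignment))           # (3, 50.0)
```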
49
Searching databases with multiple alignments
PSI-BLAST: Position-Specific Iterated BLAST (Altschul et al., 1997)
1. Starting with a single sequence, PSI-BLAST searches a database using BLAST and builds a multiple sequence alignment and a profile from the hits.
2. The profile is then used to search the protein database again.
3. Running the program for several iterations further refines the profile and increases search sensitivity.
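The iteration in steps 1 to 3 can be sketched with a position-specific scoring matrix (profile) built from alignment column frequencies. This is a toy version under loudly stated assumptions: ungapped alignments of fixed width, simple add-one pseudocounts, a uniform background, and a best-window search in place of BLAST. Real PSI-BLAST uses sequence weighting, substitution-matrix-based pseudocounts, and E-value inclusion thresholds. All names and the threshold are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds profile per column against a uniform background (assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for aa in column:
            if aa in counts:  # gaps and unknown letters are skipped
                counts[aa] += 1
        total = sum(counts.values())
        profile.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return profile


def best_window(profile: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Highest-scoring ungapped window of the profile's width: (score, start)."""
    width = len(profile)
    best_score, best_start = float("-inf"), -1
    for start in range(len(sequence) - width + 1):
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def iterate(seed: list[str], database: list[str], threshold: float = 1.0,
            rounds: int = 3) -> list[str]:
    """Search, add hits to the alignment, rebuild the profile; stop when stable."""
    alignment = list(seed)
    for _ in range(rounds):
        profile = build_pssm(alignment)
        hits = []
        for sequence in database:
            score, start = best_window(profile, sequence)
            if start >= 0 and score >= threshold:
                hits.append(sequence[start:start + len(profile)])
        new_alignment = list(seed) + hits
        if new_alignment == alignment:
            break  # profile has converged
        alignment = new_alignment
    return alignment


print(iterate(["GKT"], ["AAGKSAA", "WWWWW"]))  # ['GKT', 'GKS']
```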
65
Error-tolerant search
66
0.2 Da/0.2 Da tolerance: 32
67
0.05 Da/0.05 Da tolerance: 27
68
0.5 Da/0.5 Da tolerance: 33
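An error-tolerant search allows a measured mass to differ from the theoretical one by an unexpected shift, such as an unanticipated modification, and the chosen mass tolerance decides which explanations survive. A sketch under assumptions: only a short, hypothetical list of common modifications is tried (with standard monoisotopic mass shifts), and the function name is illustrative.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
MOD_DELTAS = {
    "unmodified": 0.0,
    "Oxidation (M)": 15.994915,
    "Acetyl (N-term)": 42.010565,
    "Carbamidomethyl (C)": 57.021464,
    "Phospho (STY)": 79.966331,
}


def explain_mass(observed: float, theoretical: float, tol_da: float) -> list[str]:
    """Modifications that bring the theoretical mass within tol_da of the observed one."""
    return [name for name, delta in MOD_DELTAS.items()
            if abs(observed - (theoretical + delta)) <= tol_da]


print(explain_mass(1079.97, 1000.0, 0.05))  # ['Phospho (STY)']
print(explain_mass(1000.3, 1000.0, 0.5))    # ['unmodified']
print(explain_mass(1000.3, 1000.0, 0.2))    # []
```

The last two calls show the trade-off: a wider tolerance accepts more matches, including ones that a tighter setting would reject.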
73
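MS/MS scan functions (triple quadrupole: Q1, a collision chamber filled with N2 gas, Q3). Each quadrupole is either scanned over a mass range or fixed to transmit a single mass:
Product ion scan (PI): Q1 fixed, Q3 scanned.
Multiple reaction monitoring (MRM): Q1 fixed, Q3 fixed.
Precursor ion scan (PS): Q1 scanned, Q3 fixed.
Neutral loss scan (NL): Q1 scanned, Q3 scanned (with a constant mass offset between them).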
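What each scan mode reports can be sketched by treating the instrument's possible fragmentations as a list of (precursor m/z, product m/z) pairs and filtering them the way Q1 and Q3 would. The transitions below are hypothetical, singly charged, and chosen to echo common uses: a product at 216 as in the phosphotyrosine immonium-ion precursor scan, and a 98 loss as in H3PO4 neutral-loss scans for phosphopeptides. Function names and the tolerance are assumptions.

```python
Transition = tuple[float, float]  # (precursor m/z, product m/z)


def product_ion_scan(transitions: list[Transition], precursor: float, tol: float = 0.5) -> list[float]:
    """Q1 fixed on one precursor, Q3 scanned: all products of that precursor."""
    return sorted(prod for prec, prod in transitions if abs(prec - precursor) <= tol)


def precursor_ion_scan(transitions: list[Transition], product: float, tol: float = 0.5) -> list[float]:
    """Q1 scanned, Q3 fixed: all precursors yielding the chosen product."""
    return sorted({prec for prec, prod in transitions if abs(prod - product) <= tol})


def neutral_loss_scan(transitions: list[Transition], loss: float, tol: float = 0.5) -> list[float]:
    """Q1 and Q3 scanned with a constant offset: precursors losing `loss`."""
    return sorted({prec for prec, prod in transitions if abs((prec - prod) - loss) <= tol})


def mrm(transitions: list[Transition], precursor: float, product: float, tol: float = 0.5) -> bool:
    """Q1 and Q3 both fixed: is this one transition present?"""
    return any(abs(prec - precursor) <= tol and abs(prod - product) <= tol
               for prec, prod in transitions)


# Hypothetical transitions for illustration only.
transitions = [(500.0, 402.0), (600.0, 216.0), (700.0, 216.1), (650.0, 300.0)]
print(precursor_ion_scan(transitions, 216.0))  # [600.0, 700.0]
print(neutral_loss_scan(transitions, 98.0))    # [500.0]
print(mrm(transitions, 650.0, 300.0))          # True
```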
75
IP (immunoprecipitation) + MS identification for finding protein interaction complexes
79
Conclusions
Protein identification by MS is a key element of proteomics, and the identification process is an informatics-based methodology.
MS combined with sequence databases represents a huge leap for protein biochemistry: a large-scale analysis approach.
Biochemical manipulation combined with protein identification can provide functional information about proteins.
Bioinformatics tools are needed to link proteomics data to protein interactions and biological pathways.