1
Prediction of Subcellular Localization of Proteins ~ Past, Present, and Future ~ Kenta Nakai, Human Genome Center, Inst. Med. Sci., University of Tokyo. Swiss-Prot 20 Years.
2
20 Years Ago... I became a graduate student in Prof. Minoru Kanehisa’s lab. I wanted to write a program that interprets the information encoded in DNA sequences. But biology is full of exceptions.
3
Diagnosis System of Bacterial Infections (MYCIN 1974)
Enter Information about the patient. (Name, Age, Sex, and Race)
Are there any positive cultures obtained from SALLY? …
Has SALLY recently had symptoms of persistent headache or other abnormal neurologic symptoms (dizziness, lethargy, etc.)? …
INFECTION-1 is MENINGITIS + MYCOBACTERIUM-TB [from clinical evidence only] + …
[REC-1] My preferred therapy recommendation is as follows: 1) ETHAMBUTAL Dose: 1.28g (13.0 100mg-tablets) q24h PO for 60 days [calculated on basis of 25 mg/kg] then 770 mg (7.5 100mg-tablets) q24h PO..
4
Knowledge Base for Automatic Reasoning
Knowledge is represented as a collection of “if-then” rules, which are chained so that the system can solve a realistic problem.
Rule 123
If: the gram stain of the organism is negative
and: the aerobicity of the organism is anaerobic
and: the morphology of the organism is rod
then: the genus of the organism is bacteroides with a certainty factor of 0.6
Working Memory
Name: Sally
Age: 42 years
Sex: Female
Race: …
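The chaining of if-then rules with certainty factors can be sketched as follows. This is a minimal illustration, not MYCIN's actual implementation: the names `Rule` and `forward_chain` and the attribute keys are hypothetical, and combining certainties as "minimum over the premises, times the rule's factor" is an assumed simplification of how such systems handle conjunctions.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict   # attribute -> required value (all must hold)
    conclusion: tuple  # (attribute, value) asserted when the rule fires
    cf: float          # certainty factor attached to the conclusion


def forward_chain(facts, rules):
    """Fire rules repeatedly until no new conclusion can be added.

    Returns a dict mapping attribute -> (value, certainty).
    Facts entered by the user are assumed fully certain (1.0).
    """
    known = {attr: (value, 1.0) for attr, value in facts.items()}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            attr, value = rule.conclusion
            if attr in known:
                continue  # already concluded; this sketch keeps the first conclusion
            if all(a in known and known[a][0] == v
                   for a, v in rule.conditions.items()):
                # Assumed combination: weakest premise times the rule's factor.
                premise_cf = min(known[a][1] for a in rule.conditions)
                known[attr] = (value, premise_cf * rule.cf)
                changed = True
    return known


# Rule 123 from the slide, encoded with hypothetical attribute names.
RULE_123 = Rule(
    conditions={"gram_stain": "negative",
                "aerobicity": "anaerobic",
                "morphology": "rod"},
    conclusion=("genus", "bacteroides"),
    cf=0.6,
)

working_memory = {"gram_stain": "negative",
                  "aerobicity": "anaerobic",
                  "morphology": "rod"}
result = forward_chain(working_memory, [RULE_123])
```

Because a conclusion is written back into the working memory, a later rule whose condition mentions `genus` could fire on the next pass, which is what "chained" means here.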
5
Expert Systems: Knowledge Base, Inference Engine
6
Sample Problem
7
Prediction of Subcellular Localization
8
Typical Sorting Signals
Signal Function: Example
Import into nucleus: -P-P-K-K-K-R-K-V-
Export from nucleus: -L-A-L-K-L-A-G-L-D-I-
Import into mitochondria: <-MLSLRQSIRFFKPATRTLCSSRYLL-
Import into plastid: <-MVAMAMASLQSSMSSLSLSSNSFLGQPLSPITLSPFLQG-
Import into peroxisomes: -S-K-L->
Import into ER: <-MMSFVSLLLVGILFWATEAEQLTKCEVFN-
Return to ER: -K-D-E-L->
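As a rough illustration of how the two C-terminal examples in this list (KDEL and SKL) could be checked in a sequence, here is a minimal sketch. The function name and the dictionary are hypothetical, and exact string matching is an assumption made for brevity: real sorting signals are degenerate, so a literal match is neither necessary nor sufficient.

```python
# C-terminal example signals taken from the list above (one-letter codes).
C_TERMINAL_SIGNALS = {
    "KDEL": "Return to ER",
    "SKL": "Import into peroxisomes",
}


def c_terminal_signals(sequence):
    """Return the functions of the example signals found at the C-terminus."""
    seq = sequence.upper()
    return [function
            for motif, function in C_TERMINAL_SIGNALS.items()
            if seq.endswith(motif)]
```

A motif that occurs in the middle of the sequence is deliberately ignored, since both examples are marked as ending the chain.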
9
Amino Acid Composition: Another good clue for prediction. Suited for machine learning. Outer membrane proteins and periplasmic proteins of Gram-negative bacteria.
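Amino acid composition suits machine learning because it turns a sequence of any length into a fixed-length numeric vector. A minimal sketch, with a hypothetical function name and an assumed convention of skipping characters outside the 20 standard residues:

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(sequence):
    """Return the fraction of each standard amino acid, in AMINO_ACIDS order."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    return [counts[aa] / len(residues) for aa in AMINO_ACIDS]
```

The resulting 20-dimensional vector can be fed to any classifier that works on numeric features.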
10
PSORT (I): Nakai & Kanehisa, 1991, 1992. Expert system using about 100 “if-then” rules. [Figure: decision tree that assigns localization sites by testing for a signal peptide, signal cleavage site, TMS in the mature part, topology, apolar signal, MTS, NLS, SKL, KDEL, KK, GY motif, and GPI (specific signals)]
11
Papers and the web server: Nakai & Kanehisa, Proteins 1991 – cited 295 times. Nakai & Kanehisa, Genomics 1992 – cited 961 times – 34 in 2006. Web server since 1993.
12
Limitations of PSORT: Relatively low accuracy, possibly because of the complexity of the sorting mechanisms. It is difficult to optimize the certainty parameters assigned to each rule. It is tedious to update the knowledge base as the training data grow.
13
PSORT II: Nakai & Horton, 1997, 1999 (cited 638 times). Machine learning with the kNN (k-nearest neighbor) method. [Figure: a point Q and its nearest neighbors, k = 3]
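The kNN idea with k = 3 can be sketched as below. This is a generic illustration, not PSORT II's code: the function name, the use of Euclidean distance, the toy feature vectors, and the labels are all assumptions, since the slide does not state which features or distance measure were used.

```python
import math
from collections import Counter


def knn_predict(query, training, k=3):
    """Predict a label for `query` by majority vote among its k nearest examples.

    `training` is a list of (feature_vector, label) pairs.
    Ties in the vote go to the label whose example is nearest to the query.
    """
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy two-dimensional data with hypothetical localization labels.
toy_training = [
    ((0.0, 0.0), "cytoplasm"), ((0.0, 1.0), "cytoplasm"), ((1.0, 0.0), "cytoplasm"),
    ((5.0, 5.0), "nucleus"), ((5.0, 6.0), "nucleus"), ((6.0, 5.0), "nucleus"),
]
```

No model is fitted in advance: adding newly characterized proteins to `toy_training` is the whole update step, which contrasts with the hand-tuned certainty parameters of a rule base.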
14
iPSORT: Bannai et al. 2002
Rule 1: A protein has an SP if the sum of hydropathy index values within [6,25] exceeds 18.3.
Rule 2: A protein has either an mTP or a cTP if it contains fewer than 3 D/Es within [1,30] and if it contains a motif similar to 11212111, where 2=(I,R), 3=(D,E,H,K,N), 1=otherwise.
Rule 3: A protein has an mTP if it satisfies Rule 2, if the sum of isoelectric point values within [1,15] exceeds 93, and if it contains a motif similar to 12211221, where 2=(K,R), 3=(I,P), 1=otherwise.
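Rule 1 is simple enough to sketch directly. The version below is an illustration under stated assumptions, not iPSORT's implementation: the slide does not name the hydropathy scale, so the Kyte-Doolittle scale is assumed; positions are taken as 1-based and inclusive; and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slide does not specify one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def has_signal_peptide(sequence, start=6, end=25, threshold=18.3):
    """Apply Rule 1: sum hydropathy over positions [start, end] and compare.

    Positions are 1-based and inclusive (an assumption). Residues missing
    from the scale contribute 0, and a sequence shorter than `end` is
    summed over the positions it has.
    """
    window = sequence.upper()[start - 1:end]
    total = sum(KYTE_DOOLITTLE.get(aa, 0.0) for aa in window)
    return total > threshold
```

Unlike a kNN vote, a rule of this form can be read and checked by a biologist, since it names a region of the sequence and a physical property.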
15
PSORTb and PSORT.ORG: Gardy et al. 2003, 2004 – Contribution from a Canadian group (Brinkman lab). Update for bacterial proteins.
16
WoLF-PSORT: Horton et al. 2006. Latest PSORT update for eukaryotic proteins. WoLF: Women only Love Fools!?
17
Current Dilemma: More data are necessary to improve the training process. The practical value of prediction methods diminishes as experimental data grow. Moreover, the more we investigate, the more exceptions we find.
18
It’s a General Problem: Gene finding, prediction of protein structure, … We come to know the answer to a problem before we learn how to solve it. Similarity search against the data of typical model organisms will be sufficient in many cases.
19
New Generation Predictors: Should be useful to engineer proteins for their targeting sites. Should complement errors of proteome analyses (i.e., isoforms with differential localization). Comprehensively example-based rather than statistical feature-based (such as amino acid composition).
20
Biology is like Linguistics: Both arose naturally and are full of exceptions. There may not exist “general principles”.
21
Future of Sequence Analysis: It will become “DNA linguistics”. Large dictionaries (databases) will contain both general cases and exceptions. Such databases may be a sort of knowledge base that can be used to simulate subcellular processes.
22
Past, Present, and Future: Past – Expert system-based predictions. Present – Machine learning-based predictions. Future – Combination of both? – Revival of knowledge bases to simulate cellular processes?
23
Acknowledgments: Minoru Kanehisa; Paul Horton; Hideo Bannai, Satoru Miyano; Jennifer Gardy, Fiona Brinkman. And all the other people who contributed to the PSORT project!