Copyright  2003 limsoon wong Recognition of Protein Features Limsoon Wong Institute for Infocomm Research BI6103 guest lecture on ?? March 2004.


1 Copyright © 2003 Limsoon Wong. Recognition of Protein Features. Limsoon Wong, Institute for Infocomm Research. BI6103 guest lecture on ?? March 2004

2 Lecture Plan
– Membrane proteins
– Subcellular localization

3 Recognition of Transmembrane Helices

4 Eukaryotic Cells
Eukaryotic cells have membrane-bound compartments with specialized functions

5 Lipids & Membrane
– A membrane is a double layer of lipids and associated proteins that defines subcellular compartments or encloses the cell
– Lipids consist of a “polar head group” and long-chain fatty acids
– This dual nature promotes formation of lipid bilayers
– The “hydrophobic tails” are shielded from the aqueous environment
– Water-soluble (i.e., charged or polar) molecules can't pass through this impermeable barrier
– Permeability across the bilayer is regulated by membrane proteins that span the bilayer and function like channels or pores

6 Membrane Proteins
– Two types of membrane proteins: integral vs peripheral
– Two types of integral membrane proteins: all-α vs β-barrel
[Figure labels: all-α, β-barrel]

7 Topography & Topology
– Topography: predict the location of transmembrane segments
– Topology: predict the location of the N- and C-termini wrt the lipid bilayer
– We focus on topography prediction for all-α membrane proteins
[Figure label: lipid molecules]

8 Datasets
Jayasinghe et al., Protein Sci., 10:455-458, 2001
– 59 high-resolution membrane proteins
– www.biocomp.unibo.it/gigi/ENSEMBLE
Moller et al., Bioinformatics, 16:1159--1160, 2000
– 151 low-resolution membrane proteins
Jones et al., Biochem., 33(10):3038--3049, 1994
– 38 multi-spanning and 45 single-spanning membrane proteins
– topologies experimentally determined
Sonnhammer et al., ISMB, 6:175-182, 1998
– 108 multi-spanning and 52 single-spanning membrane proteins
– most topologies experimentally determined, but less reliably than in Jones et al.

9 Monne et al., JMB, 288:141--145, 1999: Turn Propensity Scale for TM Helices
– The E. coli Lep protein contains two TM domains (H1, H2) and a C-terminal domain P2
– Translocation of P2 to the lumenal side is easy to test by glycosylation
– Replace H2 by a 40-residue poly-L segment: LIK_4 L_21 X L_7 V L_10 Q_3 P (subscripts are repeat counts)
– The poly-L segment can form either one long TM helix or 2 closely spaced TM helices, depending on what is substituted for X
[Figure label: ER]

10 Monne et al., JMB, 288:141--145, 1999: Turn Propensity Scale for TM Helices
– Using the poly-L segment, measure the “turn” propensity of the 20 amino acids by substituting each of them for the X in the poly-L segment
– Hydrophobic residues (I, V, L, F, C, M, A) do not induce a turn
– Charged and polar residues (except S & T) induce a turn
Exercise:
– What are the charged/polar residues?
– What could be the reason that S & T do not induce a turn?
[Figure labels: glycosylated, non-glycosylated]

11 Monne et al., JMB, 288:141--145, 1999: Turn Propensity Scale for TM Helices
In all-α membrane proteins,
– hydrophobic residues prefer the membrane environment and have low turn propensity
– charged & polar residues induce turn formation to avoid the membrane interior
⇒ prediction of TM helices
⇒ distinction of 1 long TM helix vs 2 closely spaced TM helices

12 Wiess et al., ISMB, 1:420--421, 1993: Hydrophobicity Approach
– The inside of a cellular membrane is hydrophobic
– A segment of protein that spans the membrane is expected to contain many hydrophobic amino acids
⇒ Locate segments that have a high average “hydrophobicity” score

13 Wiess et al., ISMB, 1:420--421, 1993: Hydrophobicity Approach
– Find a segment of 10 to 70 aa with hp > 0.71
– Expand to a longer segment with hp > 0.35
– Mark this segment as TM
– Repeat the above, starting from the position after the previous segment
Caveats:
– may be unable to distinguish the hydrophobic core of non-membrane proteins from transmembrane regions
– what are the right thresholds?
[Figure label: adjustable thresholds]
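The seed-and-expand procedure on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: the slide does not name the hydrophobicity scale, so the Kyte-Doolittle values are used here, and the cutoffs `seed_cut` and `grow_cut` are hypothetical stand-ins for the slide's 0.71 and 0.35 (which refer to a different, unspecified scale). For simplicity the segment is grown to the right only.

```python
# Kyte-Doolittle hydropathy values. Using this scale is an assumption:
# the slide does not say which hydrophobicity scale "hp" refers to.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hp(seq, start, end):
    """Average hydropathy of seq[start:end]."""
    return sum(KD[a] for a in seq[start:end]) / (end - start)

def find_tm_segments(seq, seed_len=10, seed_cut=1.6, grow_cut=1.0, max_len=70):
    """Return half-open (start, end) index pairs of candidate TM segments.

    seed_cut and grow_cut are hypothetical cutoffs for the assumed scale.
    """
    seq = seq.upper()
    segments = []
    pos = 0
    while pos + seed_len <= len(seq):
        # Step 1: look for a seed window whose average exceeds the strict cutoff.
        if mean_hp(seq, pos, pos + seed_len) <= seed_cut:
            pos += 1
            continue
        start, end = pos, pos + seed_len
        # Step 2: grow to the right while the whole segment stays above
        # the looser cutoff and below the maximum length.
        while (end < len(seq) and end - start < max_len
               and mean_hp(seq, start, end + 1) > grow_cut):
            end += 1
        # Step 3: mark the segment. Step 4: resume after it.
        segments.append((start, end))
        pos = end
    return segments

toy = "KKKKK" + "L" * 20 + "DDDDD"
print(find_tm_segments(toy))
```

On the toy sequence the grown segment absorbs the flanking charged residues, because a long hydrophobic core keeps the segment average high. That is one concrete face of the "what are the right thresholds?" caveat.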

14 An Example: Bacteriorhodopsin
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=protein&list_uids=461610&dopt=GenPept&term=bacteriorhodopsin&qty=1
  1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag
 61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp
121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii
181 gtegagvvgl gvetlafmvl dvta
7 transmembrane helices

15 An Example: Bacteriorhodopsin
[Same sequence as on slide 14]
After applying hydrophobicity scale...

16 An Example: Bacteriorhodopsin
Compute hydrophobicity score, hp > 7
[Same sequence as on slide 14]
TM identified: 6/7, TM FP: 0
TM residue identified: 62/117, TM residue FP: 4

17 An Example: Bacteriorhodopsin
Expand segment, maintain hp > 5, avoid low hydrophobicity
[Same sequence as on slide 14]
TM identified: 6/7, TM FP: 0
TM residue identified: 100/117, TM residue FP: 15

18 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, A HMM Approach
There are 3 main locations of a residue:
– TM helix core (viz., in the hydrophobic tail region of the membrane)
– TM helix cap (viz., in the head region of the membrane), on the cytoplasmic vs non-cytoplasmic side of the helix core
– loops: cytoplasmic vs non-cytoplasmic (short) vs non-cytoplasmic (long)
⇒ So we need an HMM with 7 states
Exercise: What is the 7th state for?
[Figure labels: cyto, non-cyto]

19 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Architecture
Each state has an associated probability distribution over the 20 amino acids, characterizing the variability of amino acids in the region it models
[Figure labels: cyto, non-cyto]

20 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Architecture
The first 3 and last 2 core states have to be traversed, but all other core states can be bypassed. This models core regions of 5--25 residues

21 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Architecture
The states of the globular, loop, & cap regions.
– The caps are 5 residues each. Since the core is 5--25 residues, this allows for helices 15--35 residues long
[Figure annotations: “to model bias in amino acid usage near cap”; “to model neutral amino acid distribution”]
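The helix length range quoted here follows from simple arithmetic over the core and cap modules. A small check, assuming (as the 5--25 range implies, though the slides do not state it outright) that the core module has 25 states of which 5 are mandatory; the function and parameter names are illustrative:

```python
def helix_length_range(core_states=25, mandatory_core_states=5, cap_len=5):
    """Return (shortest, longest) helix length for cap + core + cap."""
    shortest_core = mandatory_core_states  # every optional core state bypassed
    longest_core = core_states             # no core state bypassed
    return shortest_core + 2 * cap_len, longest_core + 2 * cap_len

print(helix_length_range())  # (15, 35), matching the slide
```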

22 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Training the HMM
– Stage 1: Baum-Welch is used for maximum likelihood estimation from “diluted” labeled training data. Because the precise end of a TM helix is only approximately known, we “dilute” the labels by unlabeling 3 residues on each side of a helix boundary
– Stage 2: Baum-Welch is used for maximum likelihood estimation from “relabeled” training data. The original training data are diluted by unlabeling 5 residues on each side of a helix boundary. The model from Stage 1 is then used to relabel this unlabeled part under the constraints of the remaining labels, producing the “relabeled” training data
– Stage 3: The model from Stage 2 is further tuned by a method for “discriminative” training, to maximize the probability of correct prediction (Krogh, ISMB, 5:179--186, 1997)
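The "dilution" step can be pictured with a short sketch. The per-residue label alphabet (`i` inside, `M` membrane, `o` outside) and the `?` marker for unlabeled positions are assumptions made for illustration; the slide does not specify the encoding.

```python
def dilute_labels(labels, k=3, unknown="?"):
    """Unlabel k residues on each side of every label boundary."""
    out = list(labels)
    n = len(labels)
    for i in range(1, n):
        if labels[i] != labels[i - 1]:  # boundary lies between i-1 and i
            for j in range(max(0, i - k), min(n, i + k)):
                out[j] = unknown
    return "".join(out)

labels = "iiiii" + "M" * 10 + "ooooo"
print(dilute_labels(labels, k=3))  # ii??????MMMM??????oo
```

Stage 1 would use `k=3` and Stage 2 `k=5`; the unlabeled positions are then left free during Baum-Welch estimation or relabeling.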

23 Krogh, ISMB, 5:179--186, 1997: Discriminative HMM Training

24 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Example
[Figure labels: non-cytoplasmic, cytoplasmic, TM segment]
Datasets:
– Jones et al., Biochem., 33(10):3038--3049, 1994
– Sonnhammer et al., ISMB, 6:175-182, 1998

25 Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Accuracy (10-CV)
[Table; values not preserved in the transcript. Labels: all TM segments & their orientation correctly predicted; all TM segments correctly predicted, ignoring orientation; precision; Jones et al.; Sonnhammer et al.]

26 ENSEMBLE
Martelli et al., Bioinformatics, 19:i205--i211, 2003
[Figure labels: NN, HMM1, HMM2, ENSEMBLE]

27 ENSEMBLE: The Neural Network Part
The NN part is a cascade (shown in the slide figure), a la Rost et al., Protein Science, 1995
[Figure labels: feed-forward back-propagation neural network; 17 * 20 input units; 15 hidden units; input layer with 17 * 2 inputs]

28 ENSEMBLE: The HMM1 Part
HMM1 models the hydrophobic nature of most TM helices, a la Krogh et al., JMB 2001 & Sonnhammer et al., ISMB 1998

29 ENSEMBLE: The HMM2 Part
HMM2 models TM helices that are a mix of hydrophobic and hydrophilic residues, a la Martelli et al., Bioinformatics 2002

30 ENSEMBLE: Predicting if a residue is in TM
ΔNN(p,i) = NN(H,p,i) − NN(L,p,i)
ΔHMM1(p,i) = AP1(H,p,i) − AP1(I,p,i) − AP1(O,p,i)
ΔHMM2(p,i) = AP2(H,p,i) − AP2(I,p,i) − AP2(O,p,i)
E(p,i) = (ΔNN(p,i) + ΔHMM1(p,i) + ΔHMM2(p,i)) / 3
where p is the protein, i is the position, H = helix, and L = loop (inner I, outer O)
E(p,i) > 0 means residue i of protein p is in a TM helix
[Figure labels: NN, HMM1, HMM2, ENSEMBLE]
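A minimal sketch of the averaging step, assuming each per-predictor term is the helix signal minus the loop signal(s), which is the reading under which "E(p,i) > 0" means helix. The dictionary layout and the example numbers are made up for illustration.

```python
def ensemble_score(nn, ap1, ap2):
    """Average the helix-minus-loop margins of NN, HMM1 and HMM2.

    nn has keys 'H' and 'L'; ap1 and ap2 have keys 'H', 'I', 'O'.
    (Hypothetical data layout, one call per residue position.)
    """
    d_nn = nn["H"] - nn["L"]
    d_hmm1 = ap1["H"] - ap1["I"] - ap1["O"]
    d_hmm2 = ap2["H"] - ap2["I"] - ap2["O"]
    return (d_nn + d_hmm1 + d_hmm2) / 3

def in_tm_helix(nn, ap1, ap2):
    """Residue is called TM when the ensemble score is positive."""
    return ensemble_score(nn, ap1, ap2) > 0

score = ensemble_score(
    {"H": 0.8, "L": 0.2},
    {"H": 0.7, "I": 0.2, "O": 0.1},
    {"H": 0.6, "I": 0.3, "O": 0.1},
)
print(round(score, 3))
```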

31 Ensemble: Topography Prediction
Fariselli et al., Bioinformatics, 2003
[Figure labels: NN, HMM1, HMM2, ENSEMBLE, MaxSubSeq]
[Figure annotations: a TM helix found by MaxSubSeq that would be missed without it; taking this path means positions m to j form a helix]

32 Ensemble: Topography Prediction Results
A prediction is considered correct if (a) the number of TM segments is correct and (b) the overlap between a predicted and a real TM segment is > 8 aa
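The correctness criterion can be written down directly. This sketch assumes segments are half-open `(start, end)` index pairs and that predicted and real segments are paired in sequence order; the slide does not say how they are matched.

```python
def overlap(a, b):
    """Number of positions shared by half-open segments a and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def prediction_correct(predicted, observed, min_overlap=8):
    """True if segment counts match and every paired overlap exceeds min_overlap."""
    if len(predicted) != len(observed):
        return False
    pairs = zip(sorted(predicted), sorted(observed))
    return all(overlap(p, o) > min_overlap for p, o in pairs)

print(prediction_correct([(10, 30), (50, 70)], [(12, 32), (55, 75)]))  # True
```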

33 Topology Prediction: Positive-Inside Rule
Gavel et al., FEBS, 282:41--46, 1991
Positively charged residues (Lys and Arg) are enriched more than 2-fold in stromal vs luminal loops
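One simple way to turn the positive-inside rule into a topology call is to count Lys and Arg in the loops on each side of the membrane. The sketch below is an illustration, not the method of any cited paper: it assumes the loops are given in order from the N-terminus and alternate sides, and it labels the positively charged side "inside".

```python
def count_positive(loop):
    """Count Lys (K) and Arg (R) residues in a loop sequence."""
    loop = loop.upper()
    return loop.count("K") + loop.count("R")

def predict_n_terminus(loops):
    """Guess which side the N-terminal loop is on from K+R counts.

    loops: loop sequences ordered from N- to C-terminus; consecutive
    loops are assumed to lie on opposite sides of the membrane.
    """
    same_side = sum(count_positive(l) for l in loops[0::2])
    other_side = sum(count_positive(l) for l in loops[1::2])
    if same_side == other_side:
        return "undetermined"
    return "inside" if same_side > other_side else "outside"

print(predict_n_terminus(["MKRK", "DSGE", "RRK"]))  # inside
```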

34 Topology Prediction: Ensemble “positive-inside” rule

35 Ensemble: Topology Prediction Results

36 Short Break

37 Subcellular Localization

38 Compartments and Sorting
– Eukaryotic cells require that proteins be targeted to their subcellular destinations
– Protein sorting is determined by specific amino acid sequences, or “signals”, within the protein
– The secretory pathway targets proteins to the plasma membrane and to some membrane-bound organelles such as lysosomes, or exports proteins from the cell

39 Secretory Pathway
– The secretory pathway consists of the endoplasmic reticulum (ER), Golgi apparatus and transport vesicles
– The transport vesicles carry proteins from one compartment to the other
– Exocytosis is mediated by fusion of secretory vesicles with the plasma membrane
– Endocytosis is the opposite of exocytosis and involves the uptake of extracellular material by pinching off vesicles from the plasma membrane
– The contents of the endocytic vesicles are delivered to the lysosomes by membrane fusion
– Lysosomes contain hydrolytic enzymes that break down macromolecules into smaller subunits, which can be utilized by the cell for its own biosynthesis

40 Datasets
Reinhardt & Hubbard, NAR, 26:2230--2236, 1998
– 2427 eukaryotic proteins for 4 locations (cytoplasmic, extracellular, nuclear, & mitochondrial)
– 997 prokaryotic proteins for 3 locations (cytoplasmic, extracellular, & periplasmic)
Park & Kanehisa, Bioinformatics, 19:1656--1663, 2003
– 7589 eukaryotic proteins from 709 organisms for 12 locations (chloroplast, cytoplasmic, cytoskeleton, ER, extracellular, Golgi, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, vacuolar)
Chou & Cai, JBC, 277:45765--45769, 2002
– 2191 proteins for 12 locations
Emanuelsson et al., JMB, 300:1005--1016, 2000
Gardy et al., NAR, 31:3613--3617, 2003

41 Common Eukaryotic Protein Sorting Signals
For a comprehensive list of cellular localization sites, see http://mendel.imp.univie.ac.at/CELL_LOC/index.html

42 Schematic View of Sorting Signals
[Figure labels: cleavage site, ~25 aa]

43 Sequence Logos of SP, mTP, & cTP
– SP: signal peptide
– mTP: mitochondrial transfer peptide
– cTP: chloroplast transit peptide

44 Neural Network Approach: TargetP
Emanuelsson et al., JMB, 300:1005--1016, 2000
cTP, mTP, and SP networks
– 4 hidden units
– feedforward NNs
– input windows: 55 aa (cTP), 35 aa (mTP), 27 aa (SP), sparsely encoded
Integrating network
– 0 hidden units
– feedforward NN
– input is taken from the outputs of the cTP, mTP, and SP networks over the 100 aa at the N-terminal
(cTP: chloroplast transit peptide, mTP: mitochondrial transfer peptide, SP: signal peptide)

45 TargetP: Performance
Dataset: Emanuelsson et al., JMB, 2000

46 Expert System Approach: PSORT
Horton & Nakai, ISMB, 1997
[Figure: a simplified version of the decision tree that PSORT uses to check and reason over various sorting signals]

47 A Refinement: PSORT-B
Gardy et al., NAR, 31:3613--3617, 2003
[Figure: the modules SCL-BLAST, Motifs, HMMTOP, Outer Membrane Protein, SubLocC, and Signal Peptides feed a Bayesian Network, which outputs localization sites or “unknown”]
Sites considered
– cytoplasm
– inner membrane
– periplasm
– outer membrane
– extracellular space

48 PSORT-B: SCL-BLAST
Homology to a protein of known localization is a good indicator of a protein’s actual localization site
⇒ BLAST the target protein against a database of proteins whose localization sites are known
⇒ Return the localization sites of hits at an E-value of 10e-10 over 80% of length
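The hit-filtering rule might look like the following sketch. It works on already-parsed BLAST hits; the dictionary keys (`site`, `evalue`, `align_len`) are hypothetical, and reading "over 80% of length" as alignment length divided by query length is an assumption.

```python
def scl_blast_sites(hits, query_len, max_evalue=10e-10, min_coverage=0.8):
    """Return localization sites of hits passing the E-value and coverage cutoffs.

    hits: list of dicts with hypothetical keys 'site', 'evalue', 'align_len'.
    """
    sites = []
    for hit in hits:
        coverage = hit["align_len"] / query_len
        if hit["evalue"] <= max_evalue and coverage >= min_coverage:
            if hit["site"] not in sites:  # keep first-seen order, no duplicates
                sites.append(hit["site"])
    return sites

hits = [
    {"site": "periplasm", "evalue": 1e-30, "align_len": 290},
    {"site": "cytoplasm", "evalue": 1e-3, "align_len": 295},
    {"site": "outer membrane", "evalue": 1e-40, "align_len": 100},
]
print(scl_blast_sites(hits, query_len=300))  # ['periplasm']
```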

49 PSORT-B: Motifs
Some motifs in PROSITE may be able to identify subcellular localization with 100% precision
⇒ Scan the target protein against a database of such motifs (28 such 100%-precision motifs are known)
⇒ Return the localization sites corresponding to the motif hits

50 PSORT-B: HMMTOP
An α-helical transmembrane region is a reliable indicator of localization to the inner membrane
⇒ Scan the target protein for transmembrane α-helices using HMMTOP
⇒ Return the localization site as “inner membrane” if >2 α-helices are found

51 PSORT-B: Outer Membrane Proteins
Outer-membrane proteins have a characteristic β-barrel structure
⇒ Identify frequent sequences occurring only in β-barrel proteins (279 such frequent sequences are known)
⇒ Scan the target protein for these frequent sequences
⇒ Return the localization site as “outer membrane” if >2 such frequent sequences are found

52 PSORT-B: SubLocC
Overall amino acid composition is useful for recognizing cytoplasmic proteins
⇒ Train an SVM on overall amino acid composition to predict cytoplasmic vs non-cytoplasmic, as in SubLoc
⇒ Analyze the target protein’s amino acid composition using this SVM

53 PSORT-B: Signal Peptides
Presence of a signal peptide at the N-terminal means the protein is not cytoplasmic
⇒ Train an HMM and an SVM to recognize signal peptides and their cleavage sites
⇒ If a high-confidence cleavage site is found by the HMM in the first 70 aa of the target protein, then “non-cytoplasmic”
⇒ If a low-confidence cleavage site is found, pass the candidate signal peptide to the SVM to confirm
⇒ If confirmed, then “non-cytoplasmic”
⇒ Otherwise, “unknown”
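The decision flow of this module is small enough to spell out. In the sketch below the HMM and SVM themselves are not modelled: `hmm_call` ("high", "low", or None) and `svm_confirms` are hypothetical stand-ins for their outputs.

```python
def classify_by_signal_peptide(hmm_call, svm_confirms=False):
    """Combine the HMM cleavage-site call with the SVM confirmation.

    hmm_call: 'high' or 'low' confidence cleavage site found in the
    first 70 aa, or None if no site was found (hypothetical encoding).
    """
    if hmm_call == "high":
        return "non-cytoplasmic"
    if hmm_call == "low" and svm_confirms:
        return "non-cytoplasmic"
    return "unknown"

print(classify_by_signal_peptide("low", svm_confirms=True))  # non-cytoplasmic
```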

54 PSORT-B: Bayesian Network
– The Bayesian Network integrates results from the 6 modules
– It produces a score for each of the 5 possible localization sites
– If a site scores >7.5, it is predicted as a localization site of the target protein
– If no site scores >7.5, no prediction is made
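The final thresholding step could be sketched as below. Only the cutoff of 7.5 comes from the slide; the score dictionary is made-up example data, and the Bayesian network that produces the scores is not modelled.

```python
def final_prediction(scores, cutoff=7.5):
    """Return the sites scoring above the cutoff; an empty list means no prediction."""
    return sorted(site for site, score in scores.items() if score > cutoff)

scores = {
    "cytoplasm": 0.3, "inner membrane": 0.5, "periplasm": 8.9,
    "outer membrane": 0.2, "extracellular space": 0.1,
}
print(final_prediction(scores))  # ['periplasm']
```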

55 PSORT-B: Performance of Individual Modules
Dataset: Gardy et al., NAR, 2003

56 PSORT-B: Performance wrt Localization Sites
PSORT-B is a considerable improvement over the original PSORT
Dataset: Gardy et al., NAR, 2003

57 PSORT vs PSORT-B: Some Remarks
– PSORT considers various signals/features in a top-down way, driven by its reasoning tree
– PSORT-B generates all signals/features in a bottom-up way, then integrates them for decision making using a Bayesian Network
– Machine learning “beats” human expert? Probably the number of features/rules needed is too large, and the rules too complicated, for a human expert

58 Amino acid compositions of proteins residing in different sites are different

59 Amino Acid Composition Differences
– Each cellular location has its own characteristic physico-chemical environment
– Proteins in each location have adapted through evolution to that environment
– This adaptation is thus reflected in the protein structure and amino acid composition
If the above is true, the amino acid composition differences wrt cellular location sites should be more pronounced on protein surfaces than in the protein interior
Exercise: Why?

60 Adaptation of Protein Surfaces
Andrade et al., JMB, 1998
[Equation label: proportion of the j-th amino acid type in the i-th protein]
To test the theory of adaptation of protein surfaces to subcellular localization, we plot 3 types of composition vectors along their first two principal components

61 Adaptation of Protein Surfaces
Andrade et al., JMB, 1998
[Figure panels: total amino acid composition vector; surface amino acid composition vector; interior amino acid composition vector]
Clearly, total & surface composition vectors show better separation than interior composition vectors

62 Amino Acid Composition
This means we can use amino acid composition vectors, especially those from protein surfaces, to predict subcellular localization! Let’s see how this turns out...

63 Neural Networks: NNPSL
Reinhardt & Hubbard, NAR, 26:2230--2236, 1998
[Figure labels: inputs 1 to 20, each the fraction of one amino acid in the input protein; outputs cytoplasmic, extracellular, mitochondrial, nuclear]
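The 20 inputs are just the amino acid composition of the protein. A minimal sketch of computing that vector follows; the alphabetical ordering of the 20 residues is an arbitrary choice made here, not something the slide specifies.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # ordering is an arbitrary choice

def composition(seq):
    """Return the fraction of each of the 20 amino acids in seq."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]  # skip X, gaps, etc.
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]

vec = composition("AACD")
print(vec[:3])  # [0.5, 0.25, 0.25]
```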

64 NNPSL: Performance
Outputs of NNPSL have values from 0 to 1. The difference (Δ) between the highest and the next highest output nodes can be used as a reliability index
[Figure bins: 0 < Δ < 0.2, 0.2 < Δ < 0.4, 0.4 < Δ < 0.6, 0.6 < Δ < 0.8, 0.8 < Δ < 1]
Dataset: Reinhardt & Hubbard, NAR, 1998
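The reliability index is the margin between the two largest outputs. A short sketch, with made-up output values for the four eukaryotic classes:

```python
def predict_with_reliability(outputs):
    """Return (predicted class, margin between the two highest outputs)."""
    best = max(outputs, key=outputs.get)
    top, second = sorted(outputs.values(), reverse=True)[:2]
    return best, top - second

outputs = {"cytoplasmic": 0.9, "extracellular": 0.3,
           "mitochondrial": 0.1, "nuclear": 0.2}
label, margin = predict_with_reliability(outputs)
print(label, round(margin, 2))  # cytoplasmic 0.6
```

A larger margin means the network separates its top choice more clearly from the runner-up, which is why the margin can be binned as on the slide.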

65 Performance
Emanuelsson, BIB, 3:361--376, 2002
[Table labels: (940 proteins), (2738 proteins)]
Dataset: Emanuelsson et al., JMB, 2000

66 Markov Chain
Yuan, FEBS Letters, 451:23--26, 1999
Why?

67 Markov Chain: Performance
[Table labels: NNPSL, 4th Order Markov (Eukaryotic)]
Dataset: Reinhardt & Hubbard, NAR, 1998

68 Support Vector Machines: SubLoc
Hua & Sun, Bioinformatics, 17:721--728, 2001
[Figure: a 20-dimensional vector giving the amino acid composition of the input protein is fed to four X-vs-rest SVMs (extracellular vs rest, nuclear vs rest, cytoplasmic vs rest, mitochondrial vs rest); the prediction is the argmax over X]
The SVMs use
– a polynomial kernel with d = 9 (prokaryotic): K(X_i, X_j) = (X_i · X_j + 1)^d
– an RBF kernel with γ = 16 (eukaryotic): K(X_i, X_j) = exp(−γ |X_i − X_j|^2)
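The two kernels and the argmax step are easy to write out in plain Python. This is a sketch of the formulas only, not of SVM training; the decision values passed to `one_vs_rest_predict` are made-up numbers standing in for the outputs of the trained X-vs-rest SVMs.

```python
import math

def poly_kernel(x, y, d=9):
    """Polynomial kernel (x . y + 1)^d."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d

def rbf_kernel(x, y, gamma=16.0):
    """RBF kernel exp(-gamma * |x - y|^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def one_vs_rest_predict(decision_values):
    """Pick the class whose X-vs-rest SVM gives the largest decision value."""
    return max(decision_values, key=decision_values.get)

print(rbf_kernel([0.5, 0.5], [0.5, 0.5]))  # 1.0
print(one_vs_rest_predict({"nuclear": 0.3, "cytoplasmic": -0.2,
                           "extracellular": -1.1, "mitochondrial": -0.7}))
```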

69 SubLoc: Performance
[Table labels: NNPSL, SubLoc (Eukaryotic)]
Dataset: Reinhardt & Hubbard, NAR, 1998

70 SubLoc: Robustness of Amino Acid Composition Approach
– Amazingly, the accuracy of SubLoc is virtually unaffected when the first 10, 20, 30, & 40 amino acids in a protein are deleted
– Amino acid composition is a robust indicator of subcellular localization, and is insensitive to errors in N-terminal sequences

71 Amino Acid Composition: Taking it Further
– How about pairs of consecutive amino acids (a.k.a. 2-grams)?
– How about 3-grams, ..., k-grams?
– How about pseudo amino acid composition?
– How about the presence of entire functional domains? (I.e., think of the presence/absence of a functional domain as a summary of amino acid sequence info...)
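Extending single-residue composition to k-grams is mostly bookkeeping. A sketch for overlapping k-grams over the 20 standard residues follows; k-grams containing any other character are skipped, which is a choice made here for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kgram_composition(seq, k=2):
    """Return the relative frequency of every overlapping k-gram in seq."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        gram = seq[i:i + k]
        if gram in counts:  # skip k-grams with non-standard residues
            counts[gram] += 1
            total += 1
    return {g: (c / total if total else 0.0) for g, c in counts.items()}

comp = kgram_composition("AAC", k=2)
print(len(comp), comp["AA"], comp["AC"])  # 400 0.5 0.5
```

The feature vector grows as 20^k (400 for 2-grams, 8000 for 3-grams), so larger k quickly needs more training data.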

72 Functional Domain Composition
Chou & Cai, JBC, 277:45765--45769, 2002
– BLAST training seqs of various localization sites against a db of known functional domains (SBASE-A)
– Build a vector for each seq, where x_i = 1 means the i-th domain is present, and combine it with the amino acid composition
– Train an SVM using these vectors

73 Functional Domain Composition: Performance
Not so good. Why?
⇒ The number of known domains in SBASE-A is too small
⇒ Need to handle the situation where a protein has no hit among known domains
Dataset: Reinhardt & Hubbard, NAR, 1998

74 Functional Domain Composition
Cai & Chou, BBRC, 305:407--411, 2003
– BLAST training seqs of various localization sites against a db of known functional domains (Interpro)
– NN-5875D: train k-NN (k=1) using the resulting functional domain vectors
– NN-40D: if no hit is found, train k-NN (k=1) using vectors of amino acid composition and pseudo amino acid composition
– If a protein gets a hit in Interpro, use NN-5875D; else use NN-40D

75 Functional Domain Composition: Performance
Dataset: Reinhardt & Hubbard, NAR, 1998

76 Notes

77 References (Transmembrane)
– Wiess et al., “Transmembrane segment prediction from protein sequence data”, ISMB, 1:420--421, 1993
– Gavel et al., “The positive-inside rule applies to thylakoid membrane proteins”, FEBS, 282:41--46, 1991
– Monne et al., “A turn propensity scale for transmembrane helices”, JMB, 288:141--145, 1999
– Sonnhammer et al., “A hidden Markov model for predicting transmembrane helices in protein sequences”, ISMB, 6:175--182, 1998
– Martelli et al., “An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins”, Bioinformatics, 19(suppl):i205--i211, 2003

78 References (Transmembrane)
– Von Heijne, “Membrane protein structure prediction”, JMB, 225:487--494, 1992
– Jacoboni et al., “Prediction of the transmembrane regions of beta-barrel membrane proteins with a neural network-based predictor”, Protein Sci., 10:779--787, 2001
– Martelli et al., “A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins”, Bioinformatics, 18:S46--S53, 2002
– Moller et al., “Evaluation of methods for the prediction of membrane spanning regions”, Bioinformatics, 17:646--653, 2001
– Fariselli et al., “MaxSubSeq: an algorithm for segment-length optimization. The case study of the transmembrane spanning segments”, Bioinformatics, 19:500--505, 2003

79 References (Transmembrane)
– Rost et al., “Transmembrane helices predicted at 95% accuracy”, Protein Sci., 4:521--533, 1995
– Krogh et al., “Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes”, JMB, 305:567--580, 2001
– Andersson et al., “Different positively charged amino acids have similar effects on the topology of a polytopic transmembrane protein in E. coli”, JBC, 267:1491--1495, 1992

80 References (Subcellular Localization)
– Horton & Nakai, “Better prediction of protein cellular localization sites with the k-nearest neighbours classifier”, ISMB, 5:147--152, 1997
– Gardy et al., “PSORT-B: Improving protein subcellular localization for Gram-negative bacteria”, NAR, 31:3613--3617, 2003
– Emanuelsson, “Predicting protein subcellular localization from amino acid sequence information”, BIB, 3:361--376, 2002
– Andrade et al., “Adaptation of protein surfaces to subcellular location”, JMB, 276:517--525, 1998
– Yuan, “Prediction of protein subcellular locations using Markov chain models”, FEBS Letters, 451:23--26, 1999

81 References (Subcellular Localization)
– Emanuelsson et al., “ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites”, Protein Sci., 8:978--984, 1999
– Emanuelsson et al., “Predicting subcellular localization of proteins based on their N-terminal amino acid sequence”, JMB, 300:1005--1016, 2000
– Hua & Sun, “Support vector machine approach for protein subcellular localization prediction”, Bioinformatics, 17:721--728, 2001
– Reinhardt & Hubbard, “Using neural networks for prediction of the subcellular location of proteins”, NAR, 26:2230--2236, 1998

82 References (Subcellular Localization)
– Cai & Chou, “Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition”, BBRC, 305:407--411, 2003
– Chou & Cai, “Using functional domain composition and support vector machines for prediction of protein subcellular location”, JBC, 277:45765--45769, 2002
– Park & Kanehisa, “Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs”, Bioinformatics, 19:1656--1663, 2003


