Published by Gilbert Fletcher. Modified over 8 years ago.
1
1/62 An Iterative Relaxation Technique for the NMR Backbone Assignment Problem
Wen-Lian Hsu
Institute of Information Science, Academia Sinica
2
2/62 Characteristics of Our Method
Model this as a constraint satisfaction problem
Solve it using natural language parsing techniques
  Both top-down and bottom-up
An iterative approach
  Create spin systems based on noisy data.
  Link spin systems by using maximum independent set finding techniques.
3
3/62 Outline Introduction Method Experiment Results Conclusion
4
4/62 Blind Man’s Elephant
We cannot directly “see” the positions of these atoms (the structure)
But we can measure a set of parameters (with constraints) on these atoms
  Which can help us infer their coordinates
Each experiment can only determine a subset of the parameters (with noise)
To combine the parameters of different experiments we need to stitch them together
5
5/62 The Flow of NMR Experiments
[Flow diagram: Get protein samples -> Collect NMR spectra -> Resonance assignment -> Structure constraints -> Calculation and simulation]
Calculation and simulation
  Energy minimization
  Fitness of structure constraints
6
6/62 Find out Chemical Shift for Each Atom
Backbone atoms: Ca, Cb, C’, N, NH
  Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
Side chain: all others (especially CHs)
  TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY
[Figure: chemical shift assignment for one amino acid]
7
7/62 Some Relevant Parameters
[Figure: backbone chain -N-C-C-N-C-C- with side-chain H-C-H and CH3 groups, annotated with chemical shift ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60, 30-35]
8
8/62 Three important experiments
Backbone: Ca, Cb, C’, N, NH
  HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
Sequential assignment: chemical shifts of Ca, Cb, NH
  HSQC
9
Our NMR spectra
HSQC
CBCA(CO)NH (2 peaks)
CBCANH, also labeled HNCACB in this deck (4 peaks)
10
10/62 HSQC Spectra
HSQC peaks (1 chemical shift pair for an amino acid)
H      N       Intensity
8.109  118.60  65920032
11
11/62 CBCA(CO)NH Spectra
CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  65920032
12
12/62 CBCANH Spectra
CBCANH peaks (4 chemical shifts for one amino acid)
  Ca (+), Cb (-)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  -65920032
8.117  118.90  61.58  -51223894
8.119  117.25  57.42  109928374
13
13/62 A Dataset Example
[Figure: peaks plotted on N and H axes for HSQC, HNCACB (4 peaks) and CBCA(CO)NH (2 peaks)]
14
14/62 Backbone Assignment
Goal
  Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
General approaches
  Generate spin systems
    A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
  Link spin systems
15
15/62 Ambiguities
All 4-point experiments are mixed together
All 2-point experiments are mixed together
Each spin system can be mapped to several amino acids in the protein sequence
False positives, false negatives
16
16/62 Previous Approaches
Constrained bipartite matching problem
  The spin system might be ambiguous
  Can’t deal with ambiguous links
[Figure: a legal matching and an illegal matching under constraints]
17
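17/62 Natural Language Processing: Signal or Noise?
Speech recognition: homophone selection
Target sentence: "A child went missing in Taipei City" (台北市一位小孩走失了)
Candidate words for the same sounds: Taipei City (台北市), child (小孩), Taipei (台北), suitable (適宜), to go missing (走失), matters (事宜), one [person] (一位), blindly (一味), to shift position (移位)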
18
18/62 An Error-Tolerant Algorithm
19
19/62 Phrase, Sentence Combination
20
20/62 Hierarchical Analysis
Sentence-meaning templates
Sentence-pattern templates
Phrase templates
Word templates
21
Perfect Group
Each spin system group contains 6 points:
  4 points are from the first experiment
  2 points are from the second experiment
[Figure: two adjacent residues with their backbone and side-chain atoms]
23
23/62 A Perfect Spin System Group
CBCA(CO)NH (Ca and Cb of residue i-1)
N        H      C       Intensity
113.293  7.897  56.294  1.64325e+008
113.293  7.897  27.853  1.08099e+008
CBCANH (Ca and Cb of residues i-1 and i)
N        H      C       Intensity
113.293  7.92   62.544  8.52851e+007
113.293  7.92   56.294  4.71331e+007
113.293  7.92   68.483  -8.54121e+007
113.293  7.92   28.165  -3.49346e+007
Resulting spin system group
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
24
24/62 False Positives and False Negatives
False positives
  Noise with high intensity
  Produce fake spin systems
False negatives
  Peaks with low intensity
  Missing peaks
In real wet-lab data, nearly 50% of the peaks are noise (false positives).
25
25/62 Spin System Group
[Figure: perfect, false negative and false positive spin system groups plotted on N and H axes]
26
26/62 Outline Introduction Method Experiment Results Conclusion
27
27/62 Main Idea
Deal with false negatives in the spin system generation procedure.
Eliminate false positives in the spin system linking procedure.
Perform the spin system generation and linking procedures iteratively.
28
28/62 Spin System Group Generation
Three types of spin system group are generated based on the quality of the CBCANH data:
  Perfect
  Weak false negative
  Severe false negative
29
29/62 Perfect Spin Systems
A spin system is determined without any added pseudo peak.
CBCA(CO)NH (Ca and Cb of residue i-1)
N        H      C       Intensity
113.293  7.897  56.294  1.64325e+008
113.293  7.897  27.853  1.08099e+008
CBCANH (Ca and Cb of residues i-1 and i)
N        H      C       Intensity
113.293  7.92   62.544  8.52851e+007
113.293  7.92   56.294  4.71331e+007
113.293  7.92   68.483  -8.54121e+007
113.293  7.92   28.165  -3.49346e+007
Resulting spin system group
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
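A minimal sketch of how a perfect group could be read off the two peak lists above. It assumes the peaks have already been grouped by a shared N/H root, uses the sign convention from the CBCANH slide (Ca positive, Cb negative), and matches carbons across the two spectra with a hypothetical tolerance `tol_c`; the deck does not state the actual thresholds.

```python
from typing import NamedTuple

class Peak(NamedTuple):
    n: float
    h: float
    c: float
    intensity: float

def perfect_group(cbcaconh, cbcanh, tol_c=0.5):
    """Return (Ca(i-1), Cb(i-1), Ca(i), Cb(i)), or None if the group is not perfect.

    tol_c is an assumed carbon matching tolerance in ppm, not a value from the slides.
    """
    if len(cbcaconh) != 2 or len(cbcanh) != 4:
        return None
    ca = [p for p in cbcanh if p.intensity > 0]  # Ca peaks are positive
    cb = [p for p in cbcanh if p.intensity < 0]  # Cb peaks are negative
    prev_carbons = [p.c for p in cbcaconh]       # CBCA(CO)NH only sees residue i-1

    def split(peaks):
        prev = [p for p in peaks
                if any(abs(p.c - c) <= tol_c for c in prev_carbons)]
        own = [p for p in peaks if p not in prev]
        return prev, own

    ca_prev, ca_own = split(ca)
    cb_prev, cb_own = split(cb)
    if not (len(ca_prev) == len(ca_own) == len(cb_prev) == len(cb_own) == 1):
        return None
    return (ca_prev[0].c, cb_prev[0].c, ca_own[0].c, cb_own[0].c)

# Peak values copied from the slide.
cbcaconh = [Peak(113.293, 7.897, 56.294, 1.64325e8),
            Peak(113.293, 7.897, 27.853, 1.08099e8)]
cbcanh = [Peak(113.293, 7.92, 62.544, 8.52851e7),
          Peak(113.293, 7.92, 56.294, 4.71331e7),
          Peak(113.293, 7.92, 68.483, -8.54121e7),
          Peak(113.293, 7.92, 28.165, -3.49346e7)]
group = perfect_group(cbcaconh, cbcanh)
```

With the slide's data this reproduces the tabulated group; any list with a missing CBCANH peak returns `None` and would be handled by the false negative cases.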
30
30/62 Weak False Negative Spin System Group
A spin system is determined with an added pseudo peak.
CBCA(CO)NH (Ca and Cb of residue i-1)
N        H      C       Intensity
115.481  9.604  60.044  1.30407e+008
115.481  9.604  30.66   6.93923e+007
CBCANH
N        H      C       Intensity
115.481  9.616  59.419  2.25295e+008
115.481  9.616  31.291  -4.82097e+007
115.481  9.616  27.853  -1.33326e+008
Pseudo peak added for Ca(i-1)
115.481  9.604  60.044  1.30407e+008
Resulting spin system group
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
60.044   31.291   59.419  27.583
31
31/62 Severe False Negative Spin System Group
A spin system is determined with two added pseudo peaks.
CBCA(CO)NH (Ca and Cb of residue i-1)
N        H      C       Intensity
119.857  8.435  28.166  3.36293e+007
119.857  8.435  59.419  1.56434e+008
CBCANH
N        H      C       Intensity
119.856  8.477  58.481  3.7353e+008
119.856  8.477  28.79   -2.55735e+008
Pseudo peaks added for residue i-1
119.857  8.435  28.166  3.36293e+007
119.857  8.435  59.419  1.56434e+008
Resulting spin system group
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
59.419   28.166   58.481  28.79
Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419
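The weak and severe cases can be sketched as one repair step: count the Ca (positive) and Cb (negative) peaks in CBCANH and copy the missing residue i-1 peak(s) over from CBCA(CO)NH as pseudo peaks. This is an illustration only. The function name is hypothetical, and it assumes that the larger CBCA(CO)NH carbon shift is Ca(i-1), which the note on the slide warns is not always true.

```python
def add_pseudo_peaks(cbcaconh, cbcanh):
    """Peaks are (carbon_shift, intensity) tuples sharing one N/H root.

    Returns (completed CBCANH peaks, group type), or None if the group
    cannot be repaired with at most two pseudo peaks.
    """
    if len(cbcaconh) != 2:
        return None
    # Assumption: the larger carbon shift in CBCA(CO)NH is Ca(i-1).
    ca_prev, cb_prev = sorted(cbcaconh, reverse=True)
    n_ca = sum(1 for _, inten in cbcanh if inten > 0)  # Ca peaks are positive
    n_cb = sum(1 for _, inten in cbcanh if inten < 0)  # Cb peaks are negative
    if n_ca not in (1, 2) or n_cb not in (1, 2):
        return None
    pseudo = []
    if n_ca == 1:
        pseudo.append(ca_prev)  # missing Ca(i-1): copy it from CBCA(CO)NH
    if n_cb == 1:
        pseudo.append(cb_prev)  # missing Cb(i-1): copy it from CBCA(CO)NH
    label = ["perfect", "weak false negative", "severe false negative"][len(pseudo)]
    return cbcanh + pseudo, label

# Carbon shifts and intensities copied from the two false negative slides.
weak = add_pseudo_peaks(
    [(60.044, 1.30407e8), (30.66, 6.93923e7)],
    [(59.419, 2.25295e8), (31.291, -4.82097e7), (27.853, -1.33326e8)])
severe = add_pseudo_peaks(
    [(28.166, 3.36293e7), (59.419, 1.56434e8)],
    [(58.481, 3.7353e8), (28.79, -2.55735e8)])
```

The pseudo peaks keep the intensity of the CBCA(CO)NH peak they were copied from, as the slides do.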
32
32/62 A note on spin system generation
To generate *ALL* possible spin systems, a peak can be included in more than one spin system.
  False positives are eliminated in the spin system linking procedure.
  False negatives are treated by adding pseudo peaks.
A rule-based mechanism is used to filter out incompatible spin systems (false positives).
  Adopt a maximum weight independent set algorithm
33
33/62 Spin System Linking
Goal
  Link spin systems into segments that are as long as possible.
Constraints
  Each spin system is uniquely assigned to a position of the target protein sequence.
  Two spin systems are linked only if the chemical shift differences between the intra-residue shifts of one and the inter-residue shifts of the other are less than the predefined thresholds.
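The second constraint can be sketched as a simple predicate. The thresholds `tol_ca` and `tol_cb` are assumptions (the deck does not give the predefined values), and the example spin systems are taken from the Spin System Positioning slide later in the deck, with the value printed there as 55356 read as 55.356.

```python
def can_link(left, right, tol_ca=0.5, tol_cb=0.5):
    """Spin systems are (Ca(i-1), Cb(i-1), Ca(i), Cb(i)) tuples.

    True if `right` may directly follow `left`, i.e. the intra-residue shifts
    of `left` agree with the inter-residue (i-1) shifts of `right`.
    The tolerances are assumed values in ppm.
    """
    return (abs(left[2] - right[0]) < tol_ca and
            abs(left[3] - right[1]) < tol_cb)

ss1 = (55.266, 38.675, 44.555, 0.0)
ss2 = (44.417, 0.0, 55.043, 30.04)
ss3 = (44.417, 0.0, 30.665, 28.72)
ss4 = (55.356, 29.782, 60.044, 37.541)  # 55356 on the slide, read as 55.356
```

Here `ss1 -> ss2 -> ss4` forms a chain, while `ss3` can follow `ss1` but nothing can follow `ss3`, so it behaves like a false positive that ends a segment.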
34
A Peculiar Parking Lot (valet parking) Information you have: The make of your car, the car parked in front of you (approximately). Together with others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).
35
Backbone Assignment DGRIGEIKGRKTLATPAVRRLAMENNIKLS
36
36/62 Spin System Positioning
We assign spin system groups to a protein sequence according to their codes.
Residue codes: D = 50, G = 10, R = 40, I = 50|51
Spin system groups and their codes
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)    Codes
55.266   38.675   44.555  0        50 10
44.417   0        55.043  30.04    10 40
44.417   0        30.665  28.72    10 40
55.356   29.782   60.044  37.541   40 50
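A small sketch of code-based positioning, assuming each spin system group carries a pair of codes (one for residue i-1, one for residue i) and each residue type allows a set of codes. Only the four residue codes shown on the slide are included; how codes are derived from chemical shifts is not described in the deck, so that step is left out.

```python
# Codes for the four residues shown on the slide only; I allows 50 or 51.
CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}

def candidate_positions(seq, prev_code, own_code):
    """Return 0-based positions i such that residue i-1 accepts prev_code
    and residue i accepts own_code."""
    return [i for i in range(1, len(seq))
            if prev_code in CODES[seq[i - 1]] and own_code in CODES[seq[i]]]

seq = "DGRI"
positions = [candidate_positions(seq, a, b)
             for a, b in [(50, 10), (10, 40), (10, 40), (40, 50)]]
```

Two of the four groups compete for the same position (the R at index 2), which is the kind of ambiguity the linking stage has to resolve.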
37
37/62 Link Spin System Groups
[Figure: the four spin system groups (55.266 38.675 44.555 0; 44.417 0 55.043 30.04; 44.417 0 30.665 28.72; 55356 29.782 60.044 37.541) are linked along the sequence DGRI into Segment 1, Segment 2 and Segment 3]
38
38/62 Iterative Concatenation
[Figure: spin systems (1, 2, ..., 56) are concatenated step by step (Step 1, Step 2, ..., Step n) into segments (Segment 1, Segment 2, ..., Segment 99) placed along the sequence DGRI....FKJJREKL....]
39
39/62 Conflict Segments
[Figure: Segments 71, 78, 79, 97, 98 and 99 placed along the sequence DGRIGEIKGRKTLATPAVRRLAMENNIKLS]
Two kinds of conflict segments
  Overlap (e.g. segment 71, segment 99)
  Use the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
40
40/62 A Graph Model for Spin System Linking
G(V, E)
  V: a set of nodes (segments).
  E: the pairs (u, v) with u, v in V such that u and v are in conflict.
Goal
  Assign as many non-conflicting segments as possible => find the maximum independent set of G.
41
41/62 An Example of G
Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
Segment1: SP12->SP13->SP14
Segment2: SP9->SP13->SP20->SP4
Segment3: SP8->SP15->SP21
Segment4: SP7->SP1->SP15->SP3
[Figure: the four segments placed on the sequence and the resulting conflict graph on Seg1 to Seg4, with edges labeled SP13, SP15 and Overlap]
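The conflict graph for this example can be sketched as follows. The spin system lists come from the slide, but the start positions are invented for illustration, since the figure's exact placement is not recoverable from the transcript.

```python
from itertools import combinations

def in_conflict(a, b):
    """Segments are (start_position, [spin system ids]).

    Two segments conflict if they cover a common sequence position
    or use the same spin system.
    """
    (start_a, sps_a), (start_b, sps_b) = a, b
    overlap = start_a < start_b + len(sps_b) and start_b < start_a + len(sps_a)
    shared = bool(set(sps_a) & set(sps_b))
    return overlap or shared

def conflict_graph(segments):
    """Return the edge set E of G as a set of frozensets of segment names."""
    return {frozenset((u, v))
            for u, v in combinations(segments, 2)
            if in_conflict(segments[u], segments[v])}

# Start positions (first tuple element) are hypothetical.
segments = {
    "Seg1": (0, [12, 13, 14]),
    "Seg2": (14, [9, 13, 20, 4]),
    "Seg3": (5, [8, 15, 21]),
    "Seg4": (7, [7, 1, 15, 3]),
}
edges = conflict_graph(segments)
```

With these placements Seg1 and Seg2 conflict through SP13, and Seg3 and Seg4 conflict through both SP15 and an overlap.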
42
42/62 Segment weight
The longer a segment is, the higher its weight.
The lower the frequency of a segment is, the higher its weight.
43
43/62 Find Maximum Weight Independent Set of G
Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
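For illustration only, the selection step can be sketched with a plain greedy heuristic. This is not the Boppana and Halldórsson algorithm cited above, and the weight formula (length divided by frequency) is a hypothetical choice that merely respects the two rules on the segment weight slide.

```python
def segment_weight(length, frequency):
    """Hypothetical weight: grows with length, shrinks with frequency."""
    return length / frequency

def greedy_mwis(weights, edges):
    """Greedy maximum weight independent set heuristic.

    weights: {node: weight}; edges: iterable of 2-element frozensets.
    Picks nodes by decreasing weight (ties broken by name) and blocks
    every neighbour of a picked node.
    """
    chosen, blocked = [], set()
    for node in sorted(weights, key=lambda n: (-weights[n], n)):
        if node not in blocked:
            chosen.append(node)
            blocked.update(v for e in edges if node in e for v in e)
    return chosen

# Lengths of the four example segments; frequency 1 is assumed for all.
weights = {"Seg1": segment_weight(3, 1), "Seg2": segment_weight(4, 1),
           "Seg3": segment_weight(3, 1), "Seg4": segment_weight(4, 1)}
edges = [frozenset({"Seg1", "Seg2"}), frozenset({"Seg3", "Seg4"})]
selected = greedy_mwis(weights, edges)
```

A greedy pass gives no optimality guarantee in general; it is used here only because it is short enough to read at a glance.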
44
44/62 An Iterative Approach We perform spin system generation and linking iteratively. Three stages.
45
45/62 First Stage
Generate perfect spin systems;
  Perform spin system concatenation on spin systems (newly generated perfect) to generate segments;
  Retain segments that contain at least 3 spin systems;
  Perform MaxIndSet on the segments;
  Drop spin systems (and related peaks) that are used in the resulting segments.
46
46/62 Second Stage
Generate weak false negative spin systems.
  Perform segment extension on the resulting segments of the first iteration (using unused perfect and newly generated weak false negative);
  Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;
  Retain segments that contain at least 3 spin systems;
  Perform MaxIndSet on the segments;
  Drop spin systems (and related peaks) that are used in the resulting segments.
47
47/62 Third Stage
Generate severe false negative spin systems.
  Perform segment extension on the resulting segments of the second iteration (using unused perfect and weak false negative, as well as newly generated severe false negative);
  Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;
  Retain segments that contain at least 3 spin systems;
  Perform MaxIndSet on the segments.
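The control flow of the three stages can be sketched as a single loop. All four callables (`generate`, `extend`, `concatenate`, `max_ind_set`) are hypothetical hooks standing in for procedures the slides describe only in words, and the way earlier segments are carried into each MaxIndSet call is an assumption.

```python
def iterative_assignment(peaks, generate, extend, concatenate, max_ind_set,
                         min_len=3):
    """Run the perfect, weak and severe stages in order and return the segments.

    generate(peaks, kind) -> list of spin systems of that kind
    extend(segments, pool) -> segments extended with unused spin systems
    concatenate(spin_systems) -> list of candidate segments (lists)
    max_ind_set(segments) -> non-conflicting subset of the segments
    """
    kinds = ["perfect", "weak false negative", "severe false negative"]
    pool, segments = [], []
    for stage, kind in enumerate(kinds):
        pool = pool + generate(peaks, kind)
        if stage > 0:                      # extension starts in the second stage
            segments = extend(segments, pool)
        used = {s for seg in segments for s in seg}
        free = [s for s in pool if s not in used]
        new = [seg for seg in concatenate(free) if len(seg) >= min_len]
        segments = max_ind_set(segments + new)
        used = {s for seg in segments for s in seg}
        pool = [s for s in pool if s not in used]   # drop used spin systems
    return segments

# Toy hooks, just to exercise the loop.
toy_peaks = {"perfect": ["a", "b", "c"],
             "weak false negative": ["d"],
             "severe false negative": ["e", "f", "g"]}
result = iterative_assignment(
    toy_peaks,
    generate=lambda peaks, kind: peaks[kind],
    extend=lambda segments, pool: segments,
    concatenate=lambda free: [free],
    max_ind_set=lambda segments: segments)
```

In the toy run the lone weak spin system `d` is too short to form a segment in the second stage and is only used once the severe ones join it in the third.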
48
48/62 Segment Extension
[Figure: a segment placed on the sequence ....FKJJREKL.... is extended with newly generated spin systems]
49
49/62 Segment Extension
[Figure: extended segments (e.g. 99' and 97', derived from 99 and 97) are placed along the sequence DGRGEKGRKTLATPAVRRLAMENNIKLS with the other candidate segments, and MaxIndSet selects among them]
50
50/62 Outline Introduction Method Experimental Results Conclusion
51
51/62 Experimental Results
Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:
  Average precision: 87.5%
  Average recall: 73.1%
Perfect data from BMRB: 99.1%
52
52/62 Real Wet-Lab Datasets
The two datasets were obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica, Taiwan.
Dataset                                                   sbd     lbd
# of amino acids                                          53      85
# of amino acids assigned manually by biologists          42      80
# of HSQC peaks                                           58      78
# of CBCA(CO)NH peaks                                     258     271
# of HNCACB peaks                                         224     620
# of expected CBCA(CO)NH peaks                            84      160
# of expected HNCACB peaks                                168     320
False positive rate of CBCA(CO)NH                         67.4%   41.0%
False positive rate of HNCACB                             25.0%   48.4%
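The false positive rates in the table are consistent with (observed peaks - expected peaks) / observed peaks. A short check, assuming that is how they were computed (the deck does not state the formula):

```python
def false_positive_rate(observed, expected):
    """Percentage of observed peaks in excess of the expected count."""
    return round(100 * (observed - expected) / observed, 1)

rates = {
    ("sbd", "CBCA(CO)NH"): false_positive_rate(258, 84),
    ("lbd", "CBCA(CO)NH"): false_positive_rate(271, 160),
    ("sbd", "HNCACB"): false_positive_rate(224, 168),
    ("lbd", "HNCACB"): false_positive_rate(620, 320),
}
```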
53
53/62 Experimental Results on Real Data
Dataset                       sbd     lbd
# of amino acids              53      85
# of assigned amino acids     42      81
# of HSQC peaks               58      78
# of CBCANH peaks             224     620
# of CBCA(CO)NH peaks         258     271

                 # correctly assigned   # assigned   Accuracy   Recall
Method on sbd    32                     35           91.4%      76.2%
Method on lbd    56                     67           83.6%      70.0%
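The reported figures can be reproduced as accuracy = correct / assigned and recall = correct / manually assigned. In this sketch the recall denominators are the manually assigned counts from the Real Wet-Lab Datasets slide (42 and 80); the reported 70.0% recall for lbd matches 56/80.

```python
def percent(numerator, denominator):
    return round(100 * numerator / denominator, 1)

results = {
    "sbd": {"correct": 32, "assigned": 35, "manual": 42},
    "lbd": {"correct": 56, "assigned": 67, "manual": 80},
}
accuracy = {k: percent(v["correct"], v["assigned"]) for k, v in results.items()}
recall = {k: percent(v["correct"], v["manual"]) for k, v in results.items()}

# Averages over the two datasets, as quoted on the Experimental Results slide.
avg_accuracy = round(sum(accuracy.values()) / 2, 1)
avg_recall = round(sum(recall.values()) / 2, 1)
```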
54
54/62 Outline Introduction Method Experiment Results Conclusion
55
55/62 Conclusion
We model the backbone assignment problem as a constraint satisfaction problem
This problem is solved using a natural language parsing technique (both bottom-up and top-down)
The same approach seems to work for a large class of noise reduction problems that are discrete in nature
56
56/62 A genetic algorithm for NMR backbone resonance assignment (I)
Randomly generate a population of chromosomes
  Each chromosome represents a possible backbone resonance assignment
Fitness function
  Evaluate the fitness of each chromosome according to the connectivity between adjacent amino acids
57
57/62 A genetic algorithm for NMR backbone resonance assignment (II)
Crossover operation
  An offspring inherits different connected blocks from its parents
Mutation operation
  Make a new connected block from any position to increase the population diversity
58
58/62 Generation of a random chromosome
Step 1. Randomly select a position x.
Step 2. Randomly select an SSGroup i from CL(x).
Step 3. Extend connected fragments from i to both sides by using adjacency lists until no more extension can be found.
Step 4. Repeat Steps 1 to 3 until all positions are assigned.
[Figure: an example chromosome, shown as a row of SSGroup numbers]
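The four steps can be sketched as below. The data layout is an assumption: `candidates[x]` plays the role of CL(x), and `succ` / `pred` are the adjacency lists of groups that may follow or precede a group. The sketch does not prevent the same SSGroup from being placed twice, which a full implementation would presumably need to handle.

```python
import random

def random_chromosome(n_positions, candidates, succ, pred, rng):
    """Build one chromosome: a list with one SSGroup id per sequence position.

    candidates[x] must be non-empty for every position x.
    """
    chrom = [None] * n_positions
    while None in chrom:                                   # Step 4
        free = [p for p, g in enumerate(chrom) if g is None]
        x = rng.choice(free)                               # Step 1
        chrom[x] = rng.choice(candidates[x])               # Step 2
        for step, adj in ((1, succ), (-1, pred)):          # Step 3
            p = x + step
            while 0 <= p < n_positions and chrom[p] is None:
                neighbour = chrom[p - step]
                options = [g for g in adj.get(neighbour, [])
                           if g in candidates[p]]
                if not options:
                    break
                chrom[p] = rng.choice(options)
                p += step
    return chrom

# Toy input in which only one assignment is possible.
toy = random_chromosome(
    3,
    candidates=[[1], [2], [3]],
    succ={1: [2], 2: [3]},
    pred={2: [1], 3: [2]},
    rng=random.Random(0))
```

Passing the random generator in explicitly keeps runs reproducible when a seed is fixed.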
59
59/62 Fitness Evaluation
Fitness(ch) = the number of connected pairs, weighted by their chemical shift differences.
Two principles:
  1. The more connected pairs a chromosome has, the higher its score.
  2. The smaller the chemical shift differences are, the higher its score.
Building blocks: connected fragments
[Figure: an example chromosome with its connected fragments marked]
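One possible scoring function that follows the two principles is sketched below. The exact formula is not given on the slide, so both the per-pair score and the tolerance `tol` are assumptions. The example spin systems are the ones from the Spin System Positioning slide, with 55356 read as 55.356.

```python
def fitness(chromosome, tol=0.5):
    """chromosome: list of (Ca(i-1), Cb(i-1), Ca(i), Cb(i)) spin systems in
    sequence order. Each connected adjacent pair scores between 1 and 2,
    with smaller chemical shift differences scoring higher."""
    score = 0.0
    for left, right in zip(chromosome, chromosome[1:]):
        d_ca = abs(left[2] - right[0])
        d_cb = abs(left[3] - right[1])
        if d_ca < tol and d_cb < tol:          # the pair is connected
            score += 2.0 - (d_ca + d_cb) / (2 * tol)
    return score

ss1 = (55.266, 38.675, 44.555, 0.0)
ss2 = (44.417, 0.0, 55.043, 30.04)
ss3 = (44.417, 0.0, 30.665, 28.72)
ss4 = (55.356, 29.782, 60.044, 37.541)
good = fitness([ss1, ss2, ss4])   # two connected pairs
bad = fitness([ss1, ss3, ss4])    # only the first pair is connected
```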
60
60/62 Crossover Operation
[Figure: two parent chromosomes are cut at a cutting site and recombined into an offspring]
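A one-point crossover at a given cutting site could look like the sketch below. It is a simplification: the slides say the offspring inherits connected blocks, which suggests cutting between blocks, and the offspring here may contain the same SSGroup twice, a case the deck does not address.

```python
def crossover(parent_a, parent_b, cut):
    """Offspring takes positions before the cutting site from parent_a
    and the remaining positions from parent_b."""
    return parent_a[:cut] + parent_b[cut:]

# Arbitrary SSGroup ids, for illustration only.
child = crossover([2, 7, 11, 6], [22, 18, 3, 21], cut=2)
```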
61
61/62 Mutation operation
Once a position is selected to mutate, the following positions also mutate so that they form a connected fragment.
[Figure: a chromosome with the mutation point marked]
62
62/62 Experiment Results
The accuracy on two real datasets
  SBD: 95.1% (FP: 67%)
  LBD: 100% (FP: 48%)
The average accuracy on perfect BMRB datasets (902 proteins)