1/60 An Iterative Relaxation Technique for the NMR Backbone Assignment Problem Wen-Lian Hsu Institute of Information Science Academia Sinica.

Presentation transcript:

1/60 An Iterative Relaxation Technique for the NMR Backbone Assignment Problem Wen-Lian Hsu Institute of Information Science Academia Sinica

2 Characteristics of Our Method
- Model this as a constraint satisfaction problem
- Solve it using natural language parsing techniques
  - Both top-down and bottom-up
- An iterative approach
  - Create spin systems based on noisy data.
  - Link spin systems by using maximum independent set finding techniques.

3 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

4 Blind Man’s Elephant
- We cannot directly “see” the positions of these atoms (the structure).
- But we can measure a set of parameters (with constraints) on these atoms,
  - which can help us infer their coordinates.
- Each experiment can only determine a subset of the parameters (with noise).
- To combine the parameters of different experiments, we need to stitch them together.

5 The Flow of NMR Experiments
[Flow diagram: Get protein samples -> Collect NMR spectra -> Resonance assignment -> Structure constraints -> Calculation and simulation (energy minimization; fitness of structure constraints)]

6 Find out Chemical Shift for Each Atom
- Backbone atoms: Ca, Cb, C’, N, NH
  - Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
- Side chain: all others (especially CHs)
  - TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY
[Figure: chemical shift assignment for one amino acid]

7 Some Relevant Parameters
[Figure: peptide backbone (-N-C-C- repeated) with side chains attached; chemical shifts of the CH groups in ppm]

8 Three Important Experiments
- Backbone: Ca, Cb, C’, N, NH
- Experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA
  - sequential assignment
  - chemical shifts of Ca, Cb, NH

9 Our NMR Spectra
- HSQC
- CBCA(CO)NH (2 peaks)
- HNCACB (4 peaks)
[Figure panels: CBCANH, CBCA(CO)NH]

10 HSQC Spectra
- HSQC peaks (1 peak per amino acid)
[Table columns: H, N, Intensity]

11 CBCA(CO)NH Spectra
- CBCA(CO)NH peaks (2 peaks for one amino acid)
[Table columns: H, N, C, Intensity]

12 CBCANH Spectra
- CBCANH peaks (4 peaks for one amino acid)
  - Ca (+), Cb (-)
[Table columns: H, N, C, Intensity]
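
The sign marks on this slide can be illustrated with a few lines of Python. This is only a sketch, assuming that "Ca (+), Cb (-)" refers to the sign of the peak intensity and that each CBCANH peak is stored as an (N, H, C, intensity) tuple. The function name `split_by_sign` and the sample values are hypothetical and do not come from the deck.

```python
from typing import List, Tuple

# Hypothetical peak layout: (N shift, H shift, C shift, intensity).
Peak = Tuple[float, float, float, float]


def split_by_sign(peaks: List[Peak]) -> Tuple[List[Peak], List[Peak]]:
    """Split CBCANH peaks into Ca candidates (+) and Cb candidates (-)."""
    ca_candidates = [p for p in peaks if p[3] > 0]
    cb_candidates = [p for p in peaks if p[3] < 0]
    return ca_candidates, cb_candidates


# Made-up values, for illustration only.
peaks = [
    (120.1, 8.20, 56.3, 1.2e8),   # positive intensity -> Ca candidate
    (120.1, 8.20, 30.4, -9.0e7),  # negative intensity -> Cb candidate
]
ca, cb = split_by_sign(peaks)
```

Noise peaks can carry either sign, so a split like this only proposes candidates; it does not validate them.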

13 A Dataset Example
- HSQC
- HNCACB (4 peaks)
- CBCA(CO)NH (2 peaks)
[Figure: example peaks plotted on N and H axes]

14 Backbone Assignment
- Goal
  - Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
- General approaches
  - Generate spin systems
    - A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
  - Link spin systems

15 Ambiguities
- All 4-point experiments are mixed together
- All 2-point experiments are mixed together
- Each spin system can be mapped to several amino acids in the protein sequence
- False positives, false negatives

16 Previous Approaches
- Constrained bipartite matching problem
  - The spin system might be ambiguous
  - Can’t deal with ambiguous links
[Figure: a legal matching and an illegal matching under constraints]

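17 Natural Language Processing: Signal or Noise?
- Speech recognition: homophone selection
  - Utterance: 台北市一位小孩走失了 ("A child has gone missing in Taipei City")
  - Competing candidates for the same sounds: 台北市 (Taipei City), 台北 (Taipei), 適宜 (suitable), 一位 (one person), 一味 (blindly), 移位 (to shift position), 小孩 (child), 走失 (to go missing), 事宜 (matters)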

18 An Error-Tolerant Algorithm

19 Phrase, Sentence Combination

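20 Hierarchical Analysis
- Sentence-meaning templates (句意模版)
- Sentence-pattern templates (句型模版)
- Phrase templates (片語模版)
- Word templates (字詞模版)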

21 Perfect Group
- Each spin group contains 6 points, of which
  - 4 points are from the first experiment
  - 2 points are from the second experiment
[Figure: two adjacent residues with their N, H, C and O atoms]


23 A Perfect Spin System Group
[Figure: CBCA(CO)NH peak table (columns N, H, C, Intensity) with 2 peaks, Ca(i-1) and Cb(i-1); CBCANH peak table (columns N, H, C, Intensity) with 4 peaks, Ca(i-1), Cb(i-1), Ca(i) and Cb(i)]

24 False Positives and False Negatives
- False positives
  - Noise with high intensity
  - Produce fake spin systems
- False negatives
  - Peaks with low intensity
  - Missing peaks
- In real wet-lab data, nearly 50% of the peaks are noise (false positives).

25 Spin System Group
[Figure: perfect, false negative and false positive spin system groups on N and H axes]

26 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

27 Main Idea
- Deal with false negatives in the spin system generation procedure.
- Eliminate false positives in the spin system linking procedure.
- Perform the spin system generation and linking procedures in an iterative fashion.

28 Spin System Group Generation
- Three types of spin system group are generated based on the quality of the CBCANH data:
  - Perfect
  - Weak false negative
  - Severe false negative

29 Perfect Spin Systems
- A spin system is determined without any added pseudo peak.
[Figure: CBCA(CO)NH peak table (columns N, H, C, Intensity) with Ca(i-1) and Cb(i-1); CBCANH peak table (columns N, H, C, Intensity) with Ca(i-1), Cb(i-1), Ca(i) and Cb(i)]

30 Weak False Negative Spin System Group
- A spin system is determined with an added pseudo peak.
[Figure: CBCA(CO)NH peak table (columns N, H, C, Intensity) with Ca(i-1) and Cb(i-1); CBCANH peak table (columns N, H, C, Intensity) in which one Ca peak is an added pseudo peak]

31 Severe False Negative Spin System Group
- A spin system is determined with two added pseudo peaks.
- Note: it is also possible that Ca(i-1) = [value missing] and Cb(i-1) = [value missing]
[Figure: CBCA(CO)NH peak table (columns N, H, C, Intensity) with Ca(i-1) and Cb(i-1); CBCANH peak table (columns N, H, C, Intensity) with two added pseudo peaks]
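
The three group types on the preceding slides differ only in how many pseudo peaks had to be added. That rule can be sketched as a small classifier; the function name `classify_group` is hypothetical, and raising an error for more than two pseudo peaks is an assumption, since the deck does not say how such groups are treated.

```python
def classify_group(n_pseudo_peaks: int) -> str:
    """Name a spin system group by the number of pseudo peaks added to it."""
    names = {
        0: "perfect",
        1: "weak false negative",
        2: "severe false negative",
    }
    if n_pseudo_peaks not in names:
        # Assumption: groups needing more than two pseudo peaks are not generated.
        raise ValueError(f"unsupported number of pseudo peaks: {n_pseudo_peaks}")
    return names[n_pseudo_peaks]


labels = [classify_group(n) for n in (0, 1, 2)]
```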

32 A Note on Spin System Generation
- To generate *ALL* possible spin systems, a peak can be included in more than one spin system.
  - False positives are eliminated in the spin system linking procedure.
  - False negatives are treated by adding pseudo peaks.
- A rule-based mechanism is used to filter out incompatible spin systems (false positives).
  - Adopt a maximum weight independent set algorithm

33 Spin System Linking
- Goal
  - Link spin systems into segments that are as long as possible.
- Constraints
  - Each spin system is uniquely assigned to a position of the target protein sequence.
  - Two spin systems are linked only if the chemical shift differences between their intra- and inter-residue peaks are less than the predefined thresholds.
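
The threshold constraint can be sketched as follows. This is a minimal illustration, assuming a spin system stores the Ca/Cb shifts of its own residue (intra-residue) and of the preceding residue (inter-residue). The class `SpinSystem`, the function `can_link`, the tolerance values and the sample shifts are all hypothetical; the deck does not state the thresholds it uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    ca: float       # Ca shift of residue i (intra-residue), in ppm
    cb: float       # Cb shift of residue i (intra-residue), in ppm
    ca_prev: float  # Ca shift of residue i-1 (inter-residue), in ppm
    cb_prev: float  # Cb shift of residue i-1 (inter-residue), in ppm


def can_link(first: SpinSystem, second: SpinSystem,
             ca_tol: float = 0.2, cb_tol: float = 0.4) -> bool:
    """True if `second` may directly follow `first` in a segment.

    The intra-residue shifts of `first` must match the inter-residue shifts
    of `second` within the tolerances (placeholder values).
    """
    return (abs(first.ca - second.ca_prev) < ca_tol
            and abs(first.cb - second.cb_prev) < cb_tol)


# Made-up shifts, for illustration only.
a = SpinSystem(ca=56.3, cb=30.4, ca_prev=58.0, cb_prev=33.1)
b = SpinSystem(ca=45.2, cb=41.0, ca_prev=56.35, cb_prev=30.5)
forward = can_link(a, b)
backward = can_link(b, a)
```

The test is directional: `a` can precede `b` here, but `b` cannot precede `a`.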

34 A Peculiar Parking Lot (valet parking)
- Information you have: the make of your car and, approximately, the car parked in front of you.
- Together with others, try to identify as many cars in the right order as possible (maximizing the overall satisfaction).

35 Backbone Assignment
[Figure: target sequence DGRIGEIKGRKTLATPAVRRLAMENNIKLS]

36 Spin System Positioning
- We assign spin system groups to a protein sequence according to their codes.
[Figure: spin systems positioned against the residues D, G, R, I; labels: D 50, G 10, R 40, I 50]

37 Link Spin System Groups
[Figure: Segment 1, Segment 2 and Segment 3 aligned to the sequence DGRI]

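38 Iterative Concatenation
[Figure: along the sequence DGRI….FKJJREKL, spin systems (1 … 56) at Step 1 are concatenated into segments at Step 2 (Segment 1, Segment 2, …, Segment 31, …), and into longer segments at Step n-1 (Segment 78, Segment 79, …) and Step n]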

39 Conflict Segments
[Figure: Segments 71, 78, 79, 97, 98 and 99 placed on the sequence DGRIGEIKGRKTLATPAVRRLAMENNIKLS]
- Two kinds of conflict segments
  - Overlap (e.g. segment 71 and segment 99)
  - Use of the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)

40 A Graph Model for Spin System Linking
- G(V, E)
  - V: a set of nodes (segments).
  - E: (u, v) with u, v ∈ V, where u and v are in conflict.
- Goal
  - Assign as many non-conflicting segments as possible => find the maximum independent set of G.

41 An Example of G
- Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
- Segment1: SP12->SP13->SP14
- Segment2: SP9->SP13->SP20->SP4
- Segment3: SP8->SP15->SP21
- Segment4: SP7->SP1->SP15->SP3
[Figure: conflict graph on Seg1, Seg2, Seg3 and Seg4, with edges labeled SP13, SP15 and Overlap]
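
The conflict edges caused by shared spin systems can be derived mechanically from the four segments listed on this slide. The sketch below covers only that kind of conflict; overlap conflicts depend on where each segment sits on the sequence, which the transcript does not record, so they are left out. The function name `shared_spin_system_edges` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Segments as listed on the slide.
segments: Dict[str, List[str]] = {
    "Seg1": ["SP12", "SP13", "SP14"],
    "Seg2": ["SP9", "SP13", "SP20", "SP4"],
    "Seg3": ["SP8", "SP15", "SP21"],
    "Seg4": ["SP7", "SP1", "SP15", "SP3"],
}


def shared_spin_system_edges(
    segs: Dict[str, List[str]]
) -> Set[Tuple[str, str]]:
    """Edges (u, v) between segments that use at least one common spin system."""
    edges = set()
    for (u, su), (v, sv) in combinations(segs.items(), 2):
        if set(su) & set(sv):
            edges.add((u, v))
    return edges


edges = shared_spin_system_edges(segments)
```

For these segments the result has two edges: Seg1 and Seg2 share SP13, and Seg3 and Seg4 share SP15.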

42 Segment Weight
- The longer a segment is, the higher its weight.
- The lower the frequency of a segment is, the higher its weight.
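
The slide gives only the direction of the two effects, not a formula. One simple function with both properties is length divided by frequency. The sketch below is an assumption made for illustration, not the authors' weighting; the name `segment_weight` and the reading of "frequency" as a positive integer count are hypothetical.

```python
def segment_weight(length: int, frequency: int) -> float:
    """Illustrative weight: grows with length, shrinks with frequency.

    Assumption: `frequency` is a positive count attached to the segment.
    The slide does not define it precisely.
    """
    if length <= 0 or frequency <= 0:
        raise ValueError("length and frequency must be positive")
    return length / frequency


longer = segment_weight(5, 1) > segment_weight(3, 1)  # longer -> heavier
rarer = segment_weight(4, 1) > segment_weight(4, 2)   # rarer -> heavier
```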

43 Find Maximum Weight Independent Set of G
- Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, (2).
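
To make the selection step concrete, here is a plain greedy heuristic for a weighted independent set. It is not the Boppana and Halldórsson algorithm cited on the slide, only a much simpler stand-in showing what "pick non-conflicting segments with high total weight" means. The function name `greedy_mwis` and the choice of segment length as weight are assumptions.

```python
from typing import Dict, Set, Tuple


def greedy_mwis(weights: Dict[str, float],
                edges: Set[Tuple[str, str]]) -> Set[str]:
    """Greedy heuristic: take the heaviest remaining node, drop its neighbors."""
    neighbors: Dict[str, Set[str]] = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    chosen: Set[str] = set()
    blocked: Set[str] = set()
    # Heaviest first; ties broken by name so the result is deterministic.
    for v in sorted(weights, key=lambda name: (-weights[name], name)):
        if v not in blocked:
            chosen.add(v)
            blocked |= neighbors[v]
    return chosen


# Weights set to segment lengths (an assumption); conflicts via shared spin systems.
weights = {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4}
edges = {("Seg1", "Seg2"), ("Seg3", "Seg4")}
selected = greedy_mwis(weights, edges)
```

A greedy pass like this gives no optimality guarantee in general, which is one reason a dedicated approximation algorithm is used instead.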

44 An Iterative Approach
- We perform spin system generation and linking iteratively.
- Three stages.

45 First Stage
- Generate perfect spin systems;
  - Perform spin system concatenation on spin systems (newly generated perfect) to generate segments;
  - Retain segments that contain at least 3 spin systems;
  - Perform MaxIndSet on the segments;
  - Drop spin systems (and related peaks) that are used in the resulting segments.

46 Second Stage
- Generate weak false negative spin systems.
  - Perform segment extension on the resulting segments of the first iteration (using unused perfect and newly generated weak false negative);
  - Perform spin system concatenation on the unused spin systems (perfect + weak false negative) to generate longer segments;
  - Retain segments that contain at least 3 spin systems;
  - Perform MaxIndSet on the segments;
  - Drop spin systems (and related peaks) that are used in the resulting segments.

47 Third Stage
- Generate severe false negative spin systems.
  - Perform segment extension on the resulting segments of the second iteration (using unused perfect and weak false negative, as well as newly generated severe false negative);
  - Perform spin system concatenation on the unused spin systems (perfect + weak false negative + severe false negative) to generate longer segments;
  - Retain segments that contain at least 3 spin systems;
  - Perform MaxIndSet on the segments.
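
The steps that close each stage (retain segments with at least 3 spin systems, run MaxIndSet, drop the spin systems that were used) can be sketched as one helper. This is an illustration under stated assumptions: segments are tuples of spin system names, `pick_disjoint` is a simple stand-in for MaxIndSet that only checks shared spin systems, and all names are hypothetical. In the deck, the third stage ends after MaxIndSet, so the drop step applies to the first two stages.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]


def pick_disjoint(segments: Sequence[Segment]) -> List[Segment]:
    """Stand-in for MaxIndSet: longest first, skip segments reusing a spin system."""
    chosen: List[Segment] = []
    used = set()
    for seg in sorted(segments, key=len, reverse=True):
        if used.isdisjoint(seg):
            chosen.append(seg)
            used.update(seg)
    return chosen


def finish_stage(
    segments: Sequence[Segment],
    spin_systems: Sequence[str],
    select: Callable[[Sequence[Segment]], List[Segment]] = pick_disjoint,
    min_len: int = 3,
) -> Tuple[List[Segment], List[str]]:
    """Filter short segments, select a conflict-free subset, drop used spin systems."""
    kept = [s for s in segments if len(s) >= min_len]
    selected = select(kept)
    used = {sp for seg in selected for sp in seg}
    remaining = [sp for sp in spin_systems if sp not in used]
    return selected, remaining


# Made-up input, for illustration only.
segments = [("SP1", "SP2", "SP3", "SP4"), ("SP3", "SP5", "SP6"), ("SP7", "SP8")]
spin_systems = [f"SP{i}" for i in range(1, 9)]
selected, remaining = finish_stage(segments, spin_systems)
```

In this example the two-element segment is filtered out, the second segment loses to the first because both use SP3, and SP5 to SP8 stay available for the next stage.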

48 Segment Extension
[Figure: segments along the sequence ….FKJJREKL…. are extended with new spin systems; labels: New, 109]

49 Segment Extension
[Figure: segments 77, 97 and 99 and their extended versions 97‘ and 99‘ placed on the sequence DGRGEKGRKTLATPAVRRLAMENNIKLS, before and after MaxIndSet]

50 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

51 Experimental Results
- Two datasets obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica:
  - Average precision: 87.5%
  - Average recall: 73.1%
- Perfect data from BMRB: 99.1%

52 Real Wet-Lab Datasets
- The two datasets were obtained from our collaborator Dr. Tai-Huang Huang at IBMS, Academia Sinica, Taiwan.

Dataset                                             sbd        lbd
# of amino acids                                    53         85
# of amino acids assigned manually by biologists    42         80
# of HSQC peaks                                     58         78
# of CBCA(CO)NH peaks                               (missing)  (missing)
# of HNCACB peaks                                   (missing)  (missing)
# of expected CBCA(CO)NH                            84         160
# of expected HNCACB                                (missing)  (missing)
False positives of CBCA(CO)NH                       67.4%      41.0%
False positives of HNCACB                           25.0%      48.4%

53 Experimental Results on Real Data

Dataset                      sbd        lbd
# of amino acids             53         85
# of assigned amino acids    42         81
# of HSQC peaks              58         78
# of CBCANH peaks            (missing)  (missing)
# of CBCA(CO)NH peaks        (missing)  (missing)

Method           # correctly assigned   # assigned   Accuracy    Recall
Method on sbd    (missing)              (missing)    (missing)   76.2%
Method on lbd    (missing)              (missing)    (missing)   70.0%
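
The average recall quoted on slide 51 can be cross-checked against the two per-dataset recall values that survive on this slide. The short calculation below uses only those two numbers; the variable names are arbitrary.

```python
# Per-dataset recall values as given on this slide (percent).
recall = {"sbd": 76.2, "lbd": 70.0}

# Unweighted mean over the two datasets.
average_recall = sum(recall.values()) / len(recall)
print(f"Average recall: {average_recall:.1f}%")  # prints "Average recall: 73.1%"
```

The unweighted mean reproduces the 73.1% reported earlier, which suggests the average was taken per dataset rather than per residue.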

54 Outline
- Introduction
- Method
- Experimental Results
- Conclusion

55 Conclusion
- We model the backbone assignment problem as a constraint satisfaction problem.
- This problem is solved using a natural language parsing technique (both bottom-up and top-down approaches).
- The same approach seems to work for a large class of noise reduction problems that are discrete in nature.