1
Protein Encoding Optimization
Student: Logan Everett
Mentor: Endre Boros
Funded by DIMACS REU 2004
2
Project Overview
Model Biological Scoring Matrices
Weighted Binary Hamming Space
Optimize Using Linear Programming
Accurate Random Generation
3
Scoring Matrices
Sequence 1:  A   Q   M   K   R   H   …
Sequence 2:  A   R   M   I   F   L   …
Pair score:  4   1   5   -3  -3  -3  …
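A scoring matrix assigns a score to every pair of aligned residues, and a gap-free alignment is scored by summing its columns. A minimal Python sketch of that lookup, using only the six pair scores shown on the slide (they agree with the corresponding BLOSUM62 entries, although the slide does not name its matrix); `pair_score` and `alignment_score` are hypothetical helper names.

```python
# Pair scores copied from the slide's example alignment (only these six pairs).
PAIR_SCORES = {
    ("A", "A"): 4,
    ("Q", "R"): 1,
    ("M", "M"): 5,
    ("K", "I"): -3,
    ("R", "F"): -3,
    ("H", "L"): -3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric, so try both orders."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def alignment_score(top: str, bottom: str) -> int:
    """Sum the per-column substitution scores of a gap-free alignment."""
    if len(top) != len(bottom):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(top, bottom))


print(alignment_score("AQMKRH", "ARMIFL"))  # prints 1
```

Higher scores mean more similar residues, which is why the project has to turn these similarities into distances before encoding.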
4
Encode To Binary Strings
Hamming Distances:
Easy to Approximate on Binary Strings
Statistically Proven Methods
More Efficient
How Do Similarity and Distance Relate?
Inverse Relationship: Higher Similarity Means Smaller Distance
First Create “Real” Distance Vector: D
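The slide fixes only that distance should fall as similarity rises; it does not say how D is computed. One possible conversion, shown as a Python sketch, is D_ij = S_ii + S_jj - 2·S_ij (an assumed transform, not necessarily the one used in the project): identical items get distance 0, and D has one entry per unique pair, n(n-1)/2 in all. The 3 × 3 matrix and the function name `similarity_to_distance` are hypothetical.

```python
from itertools import combinations


def similarity_to_distance(S: list[list[float]]) -> list[float]:
    """Convert a symmetric similarity matrix into a distance vector D.

    Assumed transform (not taken from the slides): D_ij = S_ii + S_jj - 2*S_ij.
    Returns one entry per unique pair i < j, in combinations() order, so
    len(D) == n*(n-1)/2.
    """
    n = len(S)
    return [S[i][i] + S[j][j] - 2 * S[i][j] for i, j in combinations(range(n), 2)]


# Toy 3 x 3 similarity matrix (illustrative values only).
S = [
    [4, 1, -2],
    [1, 5, 0],
    [-2, 0, 6],
]
D = similarity_to_distance(S)
print(D)  # [7, 14, 11]
```

The most similar pair (similarity 1) gets the smallest distance (7), and the least similar pair (similarity -2) gets the largest (14), matching the inverse relationship on the slide.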
5
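Precise Problem: Distortion
$D_{ij}(1-\varepsilon) \le \lambda\, h[i,j] \le D_{ij}(1+\varepsilon)$ for all unique pairs $i,j$ ($\binom{n}{2}$ pairs)
s.t. $0 \le \varepsilon \le 1$ and $0 < \lambda$
where $h[i,j]$ is the Hamming distance between the encodings of $i$ and $j$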
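The distortion condition asks that a scaled Hamming distance λ·h[i,j] stay within a factor (1 ± ε) of the target distance D_ij for every pair (the symbol names ε and λ are assumed here). For a fixed encoding, the smallest ε that works for a given λ is the largest relative error |λ·h[i,j]/D_ij - 1| over all pairs. A Python sketch, assuming every D_ij > 0 and at least one h[i,j] > 0; the helper names and the example numbers are hypothetical.

```python
def distortion(D: list[float], h: list[float], lam: float) -> float:
    """Smallest eps with D_ij(1 - eps) <= lam*h_ij <= D_ij(1 + eps) for every pair."""
    return max(abs(lam * hij / dij - 1) for dij, hij in zip(D, h))


def best_scale(D: list[float], h: list[float]) -> tuple[float, float]:
    """Return (lam, eps) with lam chosen to minimize the distortion of a fixed encoding.

    With ratios r_ij = h_ij / D_ij, the worst pair is at the smallest or the
    largest ratio, and those two errors balance at lam = 2 / (r_min + r_max).
    """
    ratios = [hij / dij for dij, hij in zip(D, h)]
    r_min, r_max = min(ratios), max(ratios)
    lam = 2 / (r_min + r_max)
    return lam, distortion(D, h, lam)


# Hypothetical targets D and Hamming distances h for three pairs.
lam, eps = best_scale([7, 14, 11], [2, 4, 3])
print(lam, eps)  # about 3.5814 and 0.02326 (exactly 154/43 and 1/43)
```

The balancing value of λ comes from equating the two worst errors, 1 - λ·r_min and λ·r_max - 1, which gives ε = (r_max - r_min)/(r_max + r_min). This only tunes λ for an encoding that is already fixed; choosing the encoding itself is the optimization problem.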
6
Encoding Scheme as Vector
C = 0110101101010101010
S = 1010110101011010101
T = 0110101010011011010
P = 1010101011010100110
A = 1011010101011011010
G = 1010101100111010101
Column weights: $y_2, y_1, y_2, y_1$
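Reading the encoding as a matrix (one row per amino acid, one column per bit), it is determined by which column patterns occur and how often, which is what the y labels on the slide appear to mark. A Python sketch under that reading: it groups the slide's six strings into distinct column patterns with counts y, identifying each column with its bitwise complement (which separates exactly the same pairs), and checks that every pairwise Hamming distance equals the total weight of the patterns that separate the pair. `hamming` and `column_weights` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# The six encodings shown on the slide (19 bits each).
ENCODINGS = {
    "C": "0110101101010101010",
    "S": "1010110101011010101",
    "T": "0110101010011011010",
    "P": "1010101011010100110",
    "A": "1011010101011011010",
    "G": "1010101100111010101",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def column_weights(codes: dict[str, str]) -> Counter:
    """Group the encoding's columns into distinct patterns with counts y.

    A column and its bitwise complement separate exactly the same pairs, so
    each column is normalized to start with '0' before it is counted.
    """
    names = list(codes)
    y = Counter()
    for pos in range(len(codes[names[0]])):
        col = "".join(codes[name][pos] for name in names)
        if col[0] == "1":
            col = "".join("1" if bit == "0" else "0" for bit in col)
        y[col] += 1
    return y


names = list(ENCODINGS)
y = column_weights(ENCODINGS)
# Each Hamming distance equals the total weight of the patterns separating the pair.
for i, j in combinations(range(len(names)), 2):
    direct = hamming(ENCODINGS[names[i]], ENCODINGS[names[j]])
    via_weights = sum(w for col, w in y.items() if col[i] != col[j])
    assert direct == via_weights
print(len(y), "distinct patterns, total weight", sum(y.values()))
```

Because only the counts matter, the encoding can be stored as a weight vector over column patterns instead of as explicit strings.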
7
Modified Inequality
$D(1-\varepsilon) \le Ax \le D(1+\varepsilon)$
s.t. $0 \le \varepsilon \le 1$ and $0 < \lambda$
Let $x = \lambda y$
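Why the pairwise condition collapses into a single matrix inequality: a short derivation, assuming A has one row per unique pair (i, j) and one column per column pattern k, with y the column weights and λ the scale factor (names assumed).

```latex
\begin{aligned}
A_{(i,j),k} &= \begin{cases} 1 & \text{if pattern } k \text{ gives } i \text{ and } j \text{ different bits},\\ 0 & \text{otherwise,} \end{cases}\\
h[i,j] &= \sum_k A_{(i,j),k}\, y_k,\\
(Ax)_{(i,j)} &= \sum_k A_{(i,j),k}\, \lambda y_k = \lambda\, h[i,j] \quad \text{for } x = \lambda y.
\end{aligned}
```

Row by row, $D(1-\varepsilon) \le Ax \le D(1+\varepsilon)$ is therefore the condition $D_{ij}(1-\varepsilon) \le \lambda\, h[i,j] \le D_{ij}(1+\varepsilon)$, and it is linear in the unknowns $x$ and $\varepsilon$ because $\lambda$ has been absorbed into $x$.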
8
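Linear Programming Problem
Need All Constraints as Linear Inequalities (≤ Form)
$D(1-\varepsilon) \le Ax$ and $Ax \le D(1+\varepsilon)$
$-Ax - \varepsilon D \le -D$ and $Ax - \varepsilon D \le D$
All $x_i, \varepsilon \ge 0$
Goal: Minimize $\varepsilon$
Solve with CPLEX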
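A small-scale sketch of this LP in Python, using SciPy's `linprog` with the HiGHS solver in place of CPLEX (a substitution, not the tool used in the project). The helper names `cut_matrix` and `min_distortion` are hypothetical; A has one row per unique pair and one column per column pattern, and the two blocks of `A_ub` are exactly the two ≤ constraints on the slide.

```python
from itertools import combinations

import numpy as np
from scipy.optimize import linprog


def cut_matrix(n: int) -> np.ndarray:
    """A[(i, j), k] = 1 if column pattern k gives items i and j different bits.

    Patterns are the 2**(n-1) bit vectors with item 0 fixed to 0; a pattern and
    its complement give identical columns of A, so only one of each is kept.
    """
    pairs = list(combinations(range(n), 2))
    A = np.zeros((len(pairs), 2 ** (n - 1)))
    for k in range(2 ** (n - 1)):
        bits = [0] + [(k >> (t - 1)) & 1 for t in range(1, n)]
        for row, (i, j) in enumerate(pairs):
            A[row, k] = float(bits[i] != bits[j])
    return A


def min_distortion(D: np.ndarray, n: int) -> tuple[float, np.ndarray]:
    """Minimize eps s.t. -Ax - eps*D <= -D, Ax - eps*D <= D, x >= 0, eps >= 0.

    D holds one target distance per unique pair, in combinations() order.
    """
    A = cut_matrix(n)
    m = A.shape[1]
    d = D.reshape(-1, 1)
    # Decision vector z = (x_1, ..., x_m, eps); the objective is eps alone.
    c = np.zeros(m + 1)
    c[-1] = 1.0
    A_ub = np.vstack([np.hstack([-A, -d]), np.hstack([A, -d])])
    b_ub = np.concatenate([-D, D])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (m + 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[-1], res.x[:-1]


# Toy instance: n = 4 items with every target distance equal to 1 (illustrative only).
eps, x = min_distortion(np.ones(6), 4)
print(eps)
```

For the toy instance, putting weight 1/2 on each of the four single-item patterns reproduces every target distance exactly, so the optimal ε is 0 up to solver tolerance. Building A densely is only practical for small n; at n = 20 it would have roughly 2 × 10^8 entries.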
9
Problem Size (n = 20)
Number of Constraints (Rows): $2\binom{n}{2} = 380$
Number of Variables (Columns): $2^{n-1} = 524{,}288$
Total Size (Rows × Columns): Approx. $2 \times 10^8$
CPLEX: Approx. 1 Minute
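The counts on this slide are consistent with n = 20 items, one per standard amino acid (an inference from 2^(n-1) = 524,288). A quick arithmetic check in Python; the variable names are only illustrative.

```python
from math import comb

n = 20  # assumed: the 20 standard amino acids

rows = 2 * comb(n, 2)  # one lower-bound and one upper-bound constraint per unique pair
cols = 2 ** (n - 1)    # column patterns, counting each pattern and its complement once
entries = rows * cols  # size of the constraint matrix for the x variables

print(rows, cols, entries)  # 380 524288 199229440
```

The distortion ε adds one more column, so the full LP has 524,289 variables.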
10
Linear Programming Solution
Solution Contains:
Min Value of $\varepsilon$
Scaled Weight Vector $x$
Non-Integral Values in $x$
Convert to $p$ Vector: $X = \sum_i x_i$, $p_i = x_i / X$
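Converting the LP weights into a probability vector is a single normalization, X = Σ x_i and p_i = x_i / X. A Python sketch; the name `to_probabilities` is hypothetical, and the clipping of tiny negative values from solver round-off is an assumed clean-up step, not one described on the slide.

```python
def to_probabilities(x: list[float]) -> list[float]:
    """Normalize LP weights: X = sum_i x_i and p_i = x_i / X.

    Small negative values from solver round-off are clipped to zero first
    (an assumed clean-up step).
    """
    cleaned = [max(xi, 0.0) for xi in x]
    total = sum(cleaned)
    if total <= 0:
        raise ValueError("weight vector has no positive entries")
    return [xi / total for xi in cleaned]


print(to_probabilities([0.5, 0.0, 1.5, 2.0]))  # [0.125, 0.0, 0.375, 0.5]
```

Because p depends only on the ratios of the x_i, the positive scale factor relating x to the unscaled weights y cancels: p is the same whichever of the two it is computed from.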
11
Random Encodings
Randomly Select Cross Sections (Columns) Based on Percent Weights $p_i$
Can Scale to Any Encoding Length N
Longer Encodings Should Approach the Minimum Distortion
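A sketch of the sampling step in Python, assuming each of the N columns is drawn independently with the probabilities p_i from the LP. The expected Hamming distance between items i and j is then N·(Ax)_ij / X, so scaling sampled distances by X/N recovers Ax on average, and the relative fluctuation shrinks as N grows, which is one way to read the slide's last point. `random_encoding` and `pairwise_fraction` are hypothetical helpers; `random.Random.choices` is from the Python standard library, and the four patterns are a toy example.

```python
import random
from itertools import combinations


def random_encoding(patterns, p, length, rng):
    """Draw `length` columns i.i.d. from `patterns` with probabilities `p`.

    patterns[k][i] is the bit (0 or 1) that pattern k assigns to item i.
    Returns one bit string of the given length per item.
    """
    chosen = rng.choices(patterns, weights=p, k=length)
    n_items = len(patterns[0])
    return ["".join(str(col[i]) for col in chosen) for i in range(n_items)]


def pairwise_fraction(codes):
    """Hamming distance of each unique pair divided by the encoding length."""
    length = len(codes[0])
    return [
        sum(a != b for a, b in zip(codes[i], codes[j])) / length
        for i, j in combinations(range(len(codes)), 2)
    ]


# Toy example (n = 4): the four single-item patterns with equal probability.
patterns = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
p = [0.25, 0.25, 0.25, 0.25]
rng = random.Random(2004)  # fixed seed so the run is reproducible
codes = random_encoding(patterns, p, 2000, rng)
# Every pair is separated by 2 of the 4 patterns, so each fraction has expected value 0.5.
print(pairwise_fraction(codes))
```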
12
Results
13
Courtesy of DIMACS
Mentor: Endre Boros – RUTCOR
Logan Everett – DIMACS REU 2004