1
Bioinformatics Tool Development Dong Xu Computer Science Department 109 Engineering Building West E-mail: xudong@missouri.edu 573-882-7064 http://digbio.missouri.edu
2
Components of development: identify a problem, mathematical model, algorithm, software engineering, application
3
Identify the problem What exactly is the problem? New ideas? Is the problem biologically important? Significance of the work? New problem or improvement? Improve accuracy or speed? Is the problem computationally solvable? For example, simulating a human quantum mechanically?
4
Mathematical Model What is the underlying math problem? Baseline information study Formulation Definition
5
Algorithm (1) Pick the right method Implementation Testing
6
Algorithm (2) Implementation Data structure/representation Language: C, C++, Perl, Java, Matlab? Unix/Linux or Windows? Modular programming (object-oriented) Style: should be user-oriented!!!
7
Algorithm (3) Debugging Tools: gdb, dbx, Visual C++ Logic? Toy cases Print intermediates
8
Algorithm (4) Testing and code refinement. Benchmark: select a good test set, jackknife… Internal test: application to real cases. Beta test: send to friendly users for initial tests.
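The slide only names the jackknife, so the following is a minimal sketch of one common use of it, assuming leave-one-out resampling applied to the variance of a sample mean. The function name `jackknife_var_mean` is hypothetical; a real benchmark would apply the same leave-one-out idea to the tool's own accuracy measure.

```c
#include <assert.h>
#include <stddef.h>

/* Jackknife (leave-one-out) estimate of the variance of the sample mean.
 * x: data values, n: number of values (needs at least 2).
 * Illustrative only: the statistic being resampled here is the mean. */
double jackknife_var_mean(const double *x, size_t n)
{
    assert(x != NULL && n >= 2);

    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += x[i];

    /* Average of the n leave-one-out means. */
    double loo_avg = 0.0;
    for (size_t i = 0; i < n; i++)
        loo_avg += (total - x[i]) / (double)(n - 1);
    loo_avg /= (double)n;

    /* Jackknife variance: (n-1)/n times the sum of squared deviations
     * of each leave-one-out mean from their average. */
    double ss = 0.0;
    for (size_t i = 0; i < n; i++) {
        double loo_mean = (total - x[i]) / (double)(n - 1);
        double d = loo_mean - loo_avg;
        ss += d * d;
    }
    return (double)(n - 1) / (double)n * ss;
}
```

For the mean, this estimate coincides with the usual sample variance divided by n, which makes it a convenient toy case for checking the code.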
9
Software Engineering (1) Suggestions: Easy to read (structured with comments) Avoid “spaghetti” code (goto) Easy to modify Portable to other machines Always think about computational complexity and clock cycles Use dynamic memory allocation
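As an illustration of the "use dynamic memory allocation" suggestion, here is a small sketch that sizes a matrix at run time instead of hard-coding its dimensions. The helper name `alloc_matrix` and the flat row-by-row layout are assumptions made for this example, not something the slides prescribe.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zero-filled rows x cols matrix as one contiguous block.
 * Element (i, j) lives at m[i * cols + j].
 * Returns NULL on failure; the caller releases it with free(). */
double *alloc_matrix(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);

    /* Refuse sizes whose element count would overflow size_t. */
    if (cols > SIZE_MAX / rows)
        return NULL;

    return calloc(rows * cols, sizeof(double));
}
```

A single contiguous block keeps the data cache-friendly and needs only one `free()` call, which is one reason to prefer it over an array of separately allocated rows.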
10
Software Engineering (2)
Polynomial evaluation
  y = a+b*x+c*x**2.0+d*x**3.0+e*x**4.0+f*x**5.0      (42.3 s)
  y = a+b*x+c*x**2+d*x**3+e*x**4+f*x**5              (5.63 s)
  y = a+b*x+c*x*x+d*x*x*x+e*x*x*x*x+f*x*x*x*x*x      (3.15 s)
  x2 = x*x                                           (2.83 s)
  x4 = x2*x2
  y = a+b*x+c*x2+d*x*x2+e*x4+f*x*x4
  y = a+x*(b+x*(c+x*(d+x*(e+f*x))))                  (1.83 s)
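The fastest form on this slide is the nested (Horner) scheme. A minimal C sketch of it for a polynomial of any degree follows; the function name `horner` and the coefficient ordering (lowest power first) are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate coeff[0] + coeff[1]*x + ... + coeff[n-1]*x^(n-1)
 * with the nested form a + x*(b + x*(c + ...)).
 * n is the number of coefficients (degree + 1). */
double horner(const double *coeff, size_t n, double x)
{
    assert(coeff != NULL && n >= 1);

    double y = coeff[n - 1];          /* start with the highest power */
    for (size_t k = n - 1; k-- > 0; ) /* k runs from n-2 down to 0 */
        y = y * x + coeff[k];
    return y;
}
```

For the fifth-degree polynomial on the slide, `coeff` would hold `{a, b, c, d, e, f}` and the loop performs five multiplications and five additions, with no calls to a power function.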
11
Software Engineering (3) Precision: big numbers, tiny numbers, iteration effects, machine dependence.
$$\text{score} = 1 - (1-P_1)(1-P_2)(1-P_3)(1-P_4) = 1 - \exp\left[\log(1-P_1) + \log(1-P_2) + \log(1-P_3) + \log(1-P_4)\right]$$
12
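Software Engineering (4)
Precision: $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \dots + \frac{1}{M-1} + \frac{1}{M} \approx \log(M) + \gamma$ as $M \to \infty$, where $\gamma \approx 0.5772$ is Euler's constant.

  M       Forward sum   Backward sum   log(M)
  10^6    14.3574       14.3927        13.8155
  10^8    15.4037       18.8079        18.4207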
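The forward and backward sums can be reproduced with a short experiment. The sketch below assumes the sums were accumulated in single precision (`float`), which the slide does not state; the function names are hypothetical.

```c
#include <assert.h>

/* 1/1 + 1/2 + ... + 1/m in single precision, largest terms first.
 * Late terms are tiny compared with the running sum and lose digits. */
float harmonic_forward(int m)
{
    assert(m >= 1);
    float sum = 0.0f;
    for (int i = 1; i <= m; i++)
        sum += 1.0f / (float)i;
    return sum;
}

/* Same sum, smallest terms first, so small terms accumulate
 * before they meet a large running sum. */
float harmonic_backward(int m)
{
    assert(m >= 1);
    float sum = 0.0f;
    for (int i = m; i >= 1; i--)
        sum += 1.0f / (float)i;
    return sum;
}

/* Double-precision reference, also summed smallest first. */
double harmonic_reference(int m)
{
    assert(m >= 1);
    double sum = 0.0;
    for (int i = m; i >= 1; i--)
        sum += 1.0 / (double)i;
    return sum;
}
```

Comparing both single-precision results against `harmonic_reference` for increasing `m` shows how the order of summation alone changes the answer; exact digits depend on the compiler and machine.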
13
Software Engineering (5)
Loop optimization (1): C program. C stores two-dimensional arrays row by row, so the inner loop should run over the last index.

  for (i=0; i<1000; i++)            /* 78 msec */
    for (j=0; j<1000; j++)
      c[i][j] = c[i][j] + a[i][j] + b[i][j];

  for (j=0; j<1000; j++)            /* 1860 msec */
    for (i=0; i<1000; i++)
      c[i][j] = c[i][j] + a[i][j] + b[i][j];
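A self-contained version of the two loop orders, written as a sketch with dynamically sized flat arrays rather than the fixed 1000 x 1000 arrays on the slide. All function names are hypothetical; `loop_orders_agree` only confirms that both orders compute the same result, and timing is left to the reader because it depends on the machine.

```c
#include <assert.h>
#include <stdlib.h>

/* c += a + b, row index in the outer loop: memory is walked contiguously. */
void add_row_order(size_t n, double *c, const double *a, const double *b)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            c[i * n + j] += a[i * n + j] + b[i * n + j];
}

/* Same update, column index in the outer loop: each step jumps n elements. */
void add_column_order(size_t n, double *c, const double *a, const double *b)
{
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            c[i * n + j] += a[i * n + j] + b[i * n + j];
}

/* Returns 1 if both loop orders give identical results for n x n inputs,
 * 0 if they differ or if memory could not be allocated. */
int loop_orders_agree(size_t n)
{
    assert(n > 0);
    size_t count = n * n;
    double *a = malloc(count * sizeof *a);
    double *b = malloc(count * sizeof *b);
    double *c1 = calloc(count, sizeof *c1);
    double *c2 = calloc(count, sizeof *c2);
    int same = (a && b && c1 && c2);

    if (same) {
        for (size_t k = 0; k < count; k++) {
            a[k] = (double)k;
            b[k] = 2.0 * (double)k;
        }
        add_row_order(n, c1, a, b);
        add_column_order(n, c2, a, b);
        for (size_t k = 0; k < count; k++)
            if (c1[k] != c2[k])
                same = 0;
    }
    free(a); free(b); free(c1); free(c2);
    return same;
}
```

Only the traversal order differs between the two functions, so any speed difference measured between them comes from memory access patterns, not from the arithmetic.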
14
Software Engineering (6)
Loop optimization (2): merging two loops over the same data into one.

  for (i=0; i<100000; i++)          /* 30 msec */
    x = x*a[i] + b[i];
  for (i=0; i<100000; i++)
    y = y*a[i] + b[i];

  for (i=0; i<100000; i++)          /* 16 msec */
  {
    x = x*a[i] + b[i];
    y = y*a[i] + b[i];
  }
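The same comparison as compilable functions, offered as a sketch: the names `recur_separate` and `recur_fused` and the pointer-based interface are assumptions of this example. Both functions update `*x` and `*y` in place and should produce identical values.

```c
#include <assert.h>
#include <stddef.h>

/* Two passes: a[] and b[] are read from memory twice. */
void recur_separate(const double *a, const double *b, size_t n,
                    double *x, double *y)
{
    assert(a != NULL && b != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++)
        *x = *x * a[i] + b[i];
    for (size_t i = 0; i < n; i++)
        *y = *y * a[i] + b[i];
}

/* One fused pass: each a[i] and b[i] is loaded once and used for both
 * updates; local copies keep x and y in registers inside the loop. */
void recur_fused(const double *a, const double *b, size_t n,
                 double *x, double *y)
{
    assert(a != NULL && b != NULL && x != NULL && y != NULL);
    double xv = *x, yv = *y;
    for (size_t i = 0; i < n; i++) {
        xv = xv * a[i] + b[i];
        yv = yv * a[i] + b[i];
    }
    *x = xv;
    *y = yv;
}
```

Fusion is safe here because the two recurrences do not depend on each other; loops whose second pass needs the finished result of the first cannot be merged this way.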
15
Software Engineering (7) Compiler optimization switches: -O (often improves speed by 50%, but depends on the machine); -O2 (same as -O on some machines): simple inline optimization; -O3 (-O4 on some machines): more complex optimizations designed to pipeline code, but may alter semantics.
16
Software Engineering (8) Friendly user interface Graphics, Web, options, automation Pipeline interface with other tools parallel computing multiple machine (server/client) network query
17
Applications Get feedback for adding new features Find good experimental collaborators From tools to papers Continuous bug reports
18
Summary Identify a problem: solvable, biologically important Mathematical model: formulation and definition Algorithm: rigorous method, fast implementation, and systematic testing Software Engineering: friendly user interface and integration of different tools Application: work with experimentalists