Bioinformatics PhD. Course

Course Summary (approximate)
1. Biological introduction
2. Comparison of short sequences (<10,000 bp)
3. Comparison of large sequences (up to 250,000,000)
4. Sequence assembly
5. Efficient data search structures and algorithms
6. Proteins...

2. Comparison of short sequences (<10,000 bp)
Summary (more or less)
2.1 Dot matrix
2.2 Pairwise alignment
2.3 Hash algorithms
2.4 Multiple alignment

2.1 Dot matrix
Given two sequences S1 and S2, how can we analyse their degree of identity? By searching for the parts that match: place S1 along one axis (index x) and S2 along the other (index y), and fill cell (x, y) with 1 if both characters coincide and 0 otherwise.
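The character-by-character matrix described above can be sketched in a few lines of Python. This is only an illustration: the function names `dot_matrix` and `render` and the toy sequences are assumptions, not part of the course material.

```python
def dot_matrix(s1, s2):
    """Return m with m[y][x] = 1 if s1[x] == s2[y], else 0.

    S1 runs along the columns (x) and S2 along the rows (y).
    """
    return [[1 if a == b else 0 for a in s1] for b in s2]


def render(matrix):
    """Draw the matrix as text: 'x' for a match, '.' for a mismatch."""
    return "\n".join("".join("x" if v else "." for v in row) for row in matrix)


if __name__ == "__main__":
    # Toy sequences chosen only for illustration.
    print(render(dot_matrix("acca", "cac")))
```

Every cell needs one comparison, so this plain version performs len(S1) * len(S2) character comparisons.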

2.1 Dot matrix
What is the cost of the algorithm? When are the matchings relevant?
Example: accaccacaccacaacgagcata … acctgagcgatat, with window acc..t
L = window length
m(i,j) = 1 iff S1(i..i+L-1) = S2(j..j+L-1): exact matching
m(i,j) = 1 iff k out of the L characters coincide: approximate matching
m(i,j) = k iff k out of the L characters coincide: approximate matching
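The three window-based definitions of m(i,j) can be sketched as one Python function. The names `window_matches` and `windowed_dot_matrix` are hypothetical, and the thresholded variant assumes "k out of L coincide" means "at least k", which the slide does not state explicitly.

```python
def window_matches(s1, s2, i, j, L):
    """Count the positions that coincide between s1[i:i+L] and s2[j:j+L]."""
    return sum(1 for a, b in zip(s1[i:i + L], s2[j:j + L]) if a == b)


def windowed_dot_matrix(s1, s2, L, k=None, scores=False):
    """Build m[i][j] over all full windows of length L.

    scores=True : m[i][j] = number of coinciding characters (third definition).
    k=None      : m[i][j] = 1 only if all L characters coincide (exact matching).
    k given     : m[i][j] = 1 if at least k characters coincide
                  (assumption: "k out of L" is read as a lower bound).
    """
    threshold = L if k is None else k
    rows = len(s1) - L + 1
    cols = len(s2) - L + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            count = window_matches(s1, s2, i, j, L)
            row.append(count if scores else int(count >= threshold))
        matrix.append(row)
    return matrix
```

Each cell costs L comparisons here, which is where the extra factor L in the cost of the direct algorithm comes from.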

2.1. Dot matrix: algorithm cost
Example: accaccacaccacaacgagcata … acctgagcgatat, with window acc..t
Cost: length(S1) * length(S2) * L, in other words $O(n^2 L)$.
Can length(S1) * length(S2) be achieved? Can we also say that $O(n^2)$ is independent of L?
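One possible way to approach the question above, offered as a sketch rather than as the course's own answer: consecutive windows on the same diagonal share L-1 character pairs, so the match count can be updated in constant time by dropping the pair that leaves the window and adding the pair that enters it. The function name `window_scores_fast` is an assumption.

```python
def window_scores_fast(s1, s2, L):
    """Return m[i][j] = number of coinciding characters in the windows
    s1[i:i+L] and s2[j:j+L], using a sliding update along each diagonal."""
    rows = len(s1) - L + 1
    cols = len(s2) - L + 1
    if rows <= 0 or cols <= 0:
        return []
    m = [[0] * cols for _ in range(rows)]
    # Every diagonal starts in the first row or in the first column.
    starts = [(0, j) for j in range(cols)] + [(i, 0) for i in range(1, rows)]
    for i, j in starts:
        # First window of the diagonal: computed directly with L comparisons.
        count = sum(s1[i + t] == s2[j + t] for t in range(L))
        m[i][j] = count
        # Remaining windows: remove the leaving pair, add the entering pair.
        while i + 1 < rows and j + 1 < cols:
            count -= s1[i] == s2[j]
            count += s1[i + L] == s2[j + L]
            i += 1
            j += 1
            m[i][j] = count
    return m
```

Under this scheme the first window of each diagonal costs L comparisons and every other cell costs two, so the total is on the order of length(S1) * length(S2) plus (length(S1) + length(S2)) * L, which is bounded by a constant times n^2 because L cannot exceed the sequence length.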

2.1. Dot matrix: signals
A: transposons
B: S1 = S2
C: random
When are signals statistically significant?

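2.1. Dot matrix: statistical significance
We need to define a random model against which to compare the signals. Given a window of length L starting at x in S1 and at y in S2, we define the random variable X as the number of characters that coincide. Then
$\mathrm{Prob}(X = k) = \binom{L}{k} p^k (1-p)^{L-k}$
where p is the probability that the two characters at a given position coincide. What is its expected value?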
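For a binomial variable the expected value is L * p, and the sketch below checks this numerically from the formula above. The function names are hypothetical, and the value p = 1/4 in the usage lines is an assumption (four equally likely, independent nucleotides), not something the slide specifies.

```python
from math import comb


def match_pmf(k, L, p):
    """Prob(X = k): exactly k of the L window positions coincide."""
    return comb(L, k) * p**k * (1 - p) ** (L - k)


def expected_matches(L, p):
    """Expected value of X, summed directly from the distribution."""
    return sum(k * match_pmf(k, L, p) for k in range(L + 1))


def tail_probability(k, L, p):
    """Prob(X >= k): chance of seeing at least k matches under the random model."""
    return sum(match_pmf(x, L, p) for x in range(k, L + 1))


if __name__ == "__main__":
    L, p = 10, 0.25  # assumed: uniform, independent nucleotides
    print(expected_matches(L, p))      # should agree with L * p
    print(tail_probability(8, L, p))   # how surprising 8 or more matches would be
```

A small tail probability for the observed number of matches suggests that a signal in the dot matrix is unlikely to be explained by the random model alone.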
