Genome Revolution: COMPSCI 004G 5.1 Searching


1 Genome Revolution: COMPSCI 004G 5.1 Searching
- Why does SSAHA compute locations of all n-mers?
  - agttc occurs at (1,3) (1,18) (2,6) (3,13), …
  - What is the cost of precomputation?
  - Why are these ordered pairs sometimes stored on disk?
  - Time-space tradeoff
- How do we search a string once for an n-mer?
  - Java string method indexOf
  - There are two versions!
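The precomputation the slide asks about can be sketched as a hash table from each n-mer to the list of (sequence, offset) pairs where it occurs. This is a minimal illustration, not SSAHA's actual implementation: the class name `NmerIndex`, the method `build`, and the choice of 0-based sequence numbers and offsets are all assumptions made for the example (the slide does not say how its pairs are counted).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of SSAHA or the course code.
class NmerIndex {
    // Maps every n-mer to the {sequenceNumber, offset} pairs where it starts.
    // Both numbers are 0-based here, which is an assumption of this sketch.
    static Map<String, List<int[]>> build(List<String> sequences, int n) {
        Map<String, List<int[]>> index = new HashMap<>();
        for (int seq = 0; seq < sequences.size(); seq++) {
            String s = sequences.get(seq);
            // Visit every start position that leaves room for a full n-mer.
            for (int k = 0; k + n <= s.length(); k++) {
                String nmer = s.substring(k, k + n);
                index.computeIfAbsent(nmer, key -> new ArrayList<>())
                     .add(new int[] {seq, k});
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> seqs = List.of("ccagttcaa", "agttcagttc");
        Map<String, List<int[]>> index = build(seqs, 5);
        for (int[] hit : index.getOrDefault("agttc", List.of())) {
            System.out.println("(" + hit[0] + "," + hit[1] + ")");
        }
    }
}
```

Building the table takes one pass over every sequence and stores one entry per start position, so the table grows with the total length of the data. That is the time-space tradeoff: lookups become a single hash access, but the table may be too large to keep in memory.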

2 Genome Revolution: COMPSCI 004G 5.2 What's good, what's bad? EcoRI

    // strands: the collection of DNA strings to search (defined elsewhere)
    ArrayList<String> list = new ArrayList<String>();
    String eco = "GAATTC";
    for (String s : strands) {
        for (int k = 0; k <= s.length() - eco.length(); k++) {
            // start at location k in s and location j in eco
            boolean match = true;
            for (int j = 0; j < eco.length(); j++) {
                if (eco.charAt(j) != s.charAt(k + j)) {
                    match = false;
                }
            }
            if (match) {
                list.add(s);
            }
        }
    }

3 Genome Revolution: COMPSCI 004G 5.3 Concepts from EcoRI
- What is the boolean match variable used for?
  - Flag or state variable, value tells us something
  - Once match is false, should we keep searching?
  - Correctness and efficiency
- Why does the code fail?
  - Test case causes failure, how can we fix this?
  - Can we use another state variable? First-time?
  - What else can the language provide for us?
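One possible answer to these questions is sketched below. It assumes the failing test case is a strand that contains GAATTC more than once, so that the strand gets added to the list several times; the slides do not state the test case, so treat that as a guess. The class name `EcoRISearch` and the method `strandsWithSite` are hypothetical names for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper for illustration; the slide shows only a code fragment.
class EcoRISearch {
    // Returns each strand that contains the site, listed at most once.
    static List<String> strandsWithSite(List<String> strands, String site) {
        List<String> list = new ArrayList<>();
        for (String s : strands) {
            for (int k = 0; k <= s.length() - site.length(); k++) {
                boolean match = true;
                for (int j = 0; j < site.length(); j++) {
                    if (site.charAt(j) != s.charAt(k + j)) {
                        match = false;
                        break; // efficiency: one mismatch settles this start position
                    }
                }
                if (match) {
                    list.add(s);
                    break; // correctness (assumed bug): stop so s is added only once
                }
            }
        }
        return list;
    }
}
```

The two `break` statements play different roles. The inner one only saves work, while the outer one changes the result, which is the correctness versus efficiency distinction the slide raises.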

4 Genome Revolution: COMPSCI 004G 5.4 Language constructs/idioms
- You need to stop a loop early
  - Found what we're looking for, calculation done
  - Return early from method, break from loop
- You need to search in one string for another
  - Why are two loops needed?
  - Stopping point for one loop? The other?
  - Common idiom, language provides solution!
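To make the two-loop idiom concrete, here is a hand-written search that returns early from a method. It is a sketch for study only, with hypothetical names (`FirstMatch`, `firstIndexOf`); in practice the built-in `String.indexOf` does the same job.

```java
// Hypothetical helper for illustration; String.indexOf is the real tool.
class FirstMatch {
    // Returns the first index at which pattern occurs in s, or -1 if absent.
    static int firstIndexOf(String s, String pattern) {
        // Outer loop: candidate start positions, stopping where the
        // pattern would no longer fit inside s.
        for (int k = 0; k <= s.length() - pattern.length(); k++) {
            int j = 0;
            // Inner loop: compare characters, stopping at the first
            // mismatch or when the whole pattern has matched.
            while (j < pattern.length() && s.charAt(k + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                return k; // found it: return early, no flag variable needed
            }
        }
        return -1;
    }
}
```

The outer loop stops at `s.length() - pattern.length()`, and the inner loop stops at the first mismatch. Returning from inside the loop removes the need for a separate state variable.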

5 Genome Revolution: COMPSCI 004G 5.5 Two versions of indexOf
- One parameter: s.indexOf("GAATTC")
  - Returns first location at which GAATTC is found
  - Returns -1 if not found, why is this ok?
- Two parameters: s.indexOf("GAATTC", 16)
  - First location on/after index 16
  - Returns -1 if not found, why is this ok?
  - Loop to find all occurrences, what's the first value of the position/index second parameter?
  - When do we stop the loop?
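The loop the slide asks about might look like the sketch below, which uses the two-parameter `String.indexOf`. The class and method names (`AllOccurrences`, `findAll`) are hypothetical, and advancing by one position so that overlapping occurrences are also reported is a design choice of this example rather than something the slide specifies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration.
class AllOccurrences {
    // Returns every index at which pattern occurs in s, in increasing order.
    static List<Integer> findAll(String s, String pattern) {
        List<Integer> positions = new ArrayList<>();
        if (pattern.isEmpty()) {
            return positions; // guard: an empty pattern would make the loop run forever
        }
        int pos = s.indexOf(pattern, 0); // first search starts at index 0
        while (pos != -1) {              // -1 is never a valid index, so it signals "not found"
            positions.add(pos);
            pos = s.indexOf(pattern, pos + 1); // resume just past the last hit
        }
        return positions;
    }
}
```

For example, `AllOccurrences.findAll("GAATTCaaGAATTC", "GAATTC")` yields the positions 0 and 8. The second parameter starts at 0, and the loop stops as soon as `indexOf` returns -1.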

6 Genome Revolution: COMPSCI 004G 5.6 Eric Lander
- Leader of HGP
- Westinghouse winner at 17
- MacArthur Fellow
- NAS member
- City of Medicine Award!
- Math major at Princeton
  - PhD Math as Rhodes Scholar
  - Managerial Economics Prof at HBS 1981-1990
- Erdős number 2, Bacon number?

