Programming Challenge Problem Chris Haack Joe Como
Programming Challenge Problem Logistics You can work in groups of 2-3! Code and a brief write-up are due Tuesday, May 21st at 2:30pm. Your code must be written in Python 3! No extensions will be granted. You will then give a presentation on your algorithm and ideas! We will run an in-class competition.
Language Requirement Your code must be written in Python 3. Why Python? We want you to focus on your algorithms and not the language optimizations. Easy to understand what is going on with your code!
Challenge Introduction Implement a function that returns the minimum number of steps needed to generate a string using only tandem duplications, starting from one of the following seed strings: [0, 1, 01, 10, 101, 010], as listed in problem 1a on homework one.
Refresher Problem 1a 00010101
Tandem Duplication A tandem duplication of length k is the duplication of a substring of length k directly next to its original position. For example, 0011101 with a duplication of length 3 at position 2 becomes 0011011101. **Note: Python is 0-indexed, so the character at position 2 in a string is at index 1 in Python!
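A minimal sketch of a single tandem duplication, to make the indexing concrete. The function name tandem_duplicate and its parameters are illustrative assumptions, not part of the provided template code; start is a 0-indexed Python index.

```python
def tandem_duplicate(s: str, start: int, k: int) -> str:
    """Duplicate the length-k substring of s that begins at 0-indexed `start`,
    placing the copy immediately after the original."""
    if k <= 0 or start < 0 or start + k > len(s):
        raise ValueError("substring is out of range")
    end = start + k
    # Everything up to the end of the substring, then the copy, then the rest.
    return s[:end] + s[start:end] + s[end:]


# Slide example: length 3 at string position 2, which is Python index 1.
print(tandem_duplicate("0011101", 1, 3))  # 0011011101
```

The copy is inserted right after the original substring, so the result is always exactly k characters longer than the input.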
Duplication Distance The duplication distance is the minimum number of steps required to generate a given string from a seed by tandem duplications. 001001 can be generated from 01 in at least two ways: 01 -> 0101 -> 00101 -> 001001 (3 steps) or 01 -> 001 -> 001001 (2 steps). The duplication distance of 001001 is therefore 2, since this is the minimal number of tandem duplications required to generate 001001 from the seed 01.
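One way to check paths like the two above is to verify each arrow and count the steps. This is only a sketch for checking work by hand; the names is_one_duplication and count_steps are hypothetical helpers, not functions from helpers.py.

```python
def is_one_duplication(before: str, after: str) -> bool:
    """True if `after` can be produced from `before` by exactly one tandem duplication."""
    k = len(after) - len(before)  # length of the duplicated substring
    if k <= 0 or k > len(before):
        return False
    for start in range(len(before) - k + 1):
        end = start + k
        if before[:end] + before[start:end] + before[end:] == after:
            return True
    return False


def count_steps(path):
    """Return the number of steps in `path`, raising if any step is invalid."""
    for before, after in zip(path, path[1:]):
        if not is_one_duplication(before, after):
            raise ValueError(f"{before} -> {after} is not a tandem duplication")
    return len(path) - 1


print(count_steps(["01", "0101", "00101", "001001"]))  # 3
print(count_steps(["01", "001", "001001"]))            # 2
```

Note that this only validates a given path; it does not prove that a path is the shortest one.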
Duplication Distance Example What is the duplication distance of the string 10111011? 101 -> 101101, 101101 -> 1011101, 1011101 -> 10111011 (3 steps)
Duplication Distance Example Part 2 How can we get to 10111011 with just tandem duplications in fewer steps? 101 -> 1011, 1011 -> 10111011 (2 steps)
Reverse? 10111011 -> 1011 -> 101
Solution 0101001101 -> 010100101 -> 01010101 -> 0101 -> 01!
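Working in reverse means undoing a duplication: wherever a substring is immediately followed by an identical copy, one of the two copies can be removed. A minimal sketch of that single reverse step follows; the name deduplications is an assumption made for illustration.

```python
def deduplications(s: str) -> set:
    """Return every string obtained by undoing one tandem duplication in s."""
    results = set()
    n = len(s)
    for k in range(1, n // 2 + 1):            # length of the repeated block
        for start in range(n - 2 * k + 1):    # where the first copy begins
            if s[start:start + k] == s[start + k:start + 2 * k]:
                # Keep the first copy, drop the second.
                results.add(s[:start + k] + s[start + 2 * k:])
    return results


print("1011" in deduplications("10111011"))  # True
print("101" in deduplications("1011"))       # True
```

A set is used because different positions can produce the same shorter string, for example inside a run of identical characters.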
Problems? Once you have completed this, you will realize that for long strings your program takes much longer to run, because the search space grows rapidly with the length of the string.
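To get a feel for this growth, the sketch below counts how many new strings appear at each generation when duplicating forward from a seed. The helper names duplications and frontier_sizes are assumptions for illustration only.

```python
def duplications(s: str) -> set:
    """Return every string reachable from s by one tandem duplication."""
    n = len(s)
    return {s[:i + k] + s[i:i + k] + s[i + k:]
            for k in range(1, n + 1)
            for i in range(n - k + 1)}


def frontier_sizes(seed: str, depth: int) -> list:
    """Count the strings first reached at each generation 1..depth."""
    seen = {seed}
    frontier = {seed}
    sizes = []
    for _ in range(depth):
        next_frontier = set()
        for s in frontier:
            next_frontier |= duplications(s)
        frontier = next_frontier - seen  # keep only strings not seen before
        seen |= frontier
        sizes.append(len(frontier))
    return sizes


print(frontier_sizes("10", 4))
```

Even with duplicates removed, each generation is considerably larger than the previous one, which is why a naive forward search slows down quickly.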
Last Year's Winners!
Three Points!!!
Transition to simple solution and explain code and files here! This will be done interactively for students at the lecture!
100100 SEED: 10 Generate possible duplications: [110, 100, 1010] Repeat? Bio. Info. Channel
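The forward idea on this slide can be sketched as a breadth-first search: generate all duplications of the seed, then all duplications of those, and stop when the target appears. This is a sketch under assumptions, not the course's reference solution; duplications and forward_distance are hypothetical names, and pruning strings longer than the target is one simple choice among many.

```python
from collections import deque


def duplications(s: str) -> set:
    """Return every string reachable from s by one tandem duplication."""
    n = len(s)
    return {s[:i + k] + s[i:i + k] + s[i + k:]
            for k in range(1, n + 1)
            for i in range(n - k + 1)}


def forward_distance(seed: str, target: str):
    """Minimum number of duplications from seed to target, or None if unreachable."""
    if seed == target:
        return 0
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        s, steps = queue.popleft()
        for t in duplications(s):
            # Duplications only lengthen a string, so longer strings are dead ends.
            if len(t) > len(target) or t in seen:
                continue
            if t == target:
                return steps + 1
            seen.add(t)
            queue.append((t, steps + 1))
    return None


print(sorted(duplications("10")))         # ['100', '1010', '110']
print(forward_distance("10", "100100"))   # 2
```

Because the search proceeds one generation at a time, the first time the target is reached gives the minimum number of steps from that seed.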
100100 Now start from the ending string. Possible deduplications: [10010, 10100, 100] Repeat until you reach a seed. What do you have to keep count of? Second generation: [1010, 100, 1010, 10]
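The backward search can be sketched the same way: start from the target, undo one duplication per generation, and keep count of the generation number until a seed appears. This is only a baseline sketch; the name duplication_distance and its signature are assumptions and may not match the solve function expected in solver.py.

```python
from collections import deque

# Seed strings from the challenge introduction.
SEEDS = {"0", "1", "01", "10", "101", "010"}


def deduplications(s: str) -> set:
    """Return every string obtained by undoing one tandem duplication in s."""
    results = set()
    n = len(s)
    for k in range(1, n // 2 + 1):
        for start in range(n - 2 * k + 1):
            if s[start:start + k] == s[start + k:start + 2 * k]:
                results.add(s[:start + k] + s[start + 2 * k:])
    return results


def duplication_distance(target: str):
    """Fewest deduplications needed to reduce target to a seed, or None."""
    if target in SEEDS:
        return 0
    seen = {target}
    queue = deque([(target, 0)])  # (string, generation number)
    while queue:
        s, steps = queue.popleft()
        for t in deduplications(s):
            if t in SEEDS:
                return steps + 1
            if t not in seen:     # skip repeats such as the second 1010
                seen.add(t)
                queue.append((t, steps + 1))
    return None


print(duplication_distance("100100"))    # 2
print(duplication_distance("10111011"))  # 2
```

The seen set avoids expanding the same string twice, which already matters in the second generation above, where 1010 appears two times.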
Optimization The challenge is to think of ways to optimize this generation process and to argue why your solution is good. In your solution, please describe your algorithm and approach. Discuss the pros and cons, and what you like and dislike about your algorithm.
Tips! Work With Friends! Start Early! Ask Your TAs for help and advice!
Reminders Write your code in solver.py, with your main solution in the solve function! Feel free to use any other helper functions! There is useful template code in helpers.py and a walkthrough of an initial algorithm in the intro_solver IPython notebook. If you have questions, please reach out to Joe (jcomo@caltech.edu) or Sidd (sidjain@caltech.edu).