STAT115 STAT225 BIST512 BIO298 - Intro to Computational Biology
Python Tutorial II: Monty Python, Game of Life and Sequence Alignment
Feb 1, 2011
Daniel Fernandez and Alejandro Quiroz
1st ACT (1 hour): Random Module, Monty Hall, Game of Life, Sequence Alignment
INTERMISSION: Chillout session (10 min)
2nd ACT (1 hour 50 min): Homework help for Q5, Q6, Q7 and Q8
Important module: random
Method              Result
randint(x, y)       Random integer N with x <= N <= y (both endpoints included)
random()            Random float in [0.0, 1.0)
distname(a, b)      Draw from a named distribution: uniform, triangular, Gaussian, lognormal, negative exponential, gamma, beta, Pareto, Weibull (e.g. uniform(a, b), gauss(mu, sigma), expovariate(lambd))
choice(list)        Choose one element from a list at random
sample(list, k)     Choose k elements from a list at random, without replacement!
shuffle(list)       Shuffle the elements of the list in place
seed(a)             Initialize the random number generator; the same seed reproduces the same sequence of numbers
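A short sketch exercising the functions in the table above. The seed value 2011 and the variable names are arbitrary choices for illustration; re-running with the same seed gives the same draws.

```python
import random

random.seed(2011)  # fixing the seed makes the draws below reproducible

die = random.randint(1, 6)           # integer N with 1 <= N <= 6
u = random.random()                  # float in [0.0, 1.0)
z = random.gauss(0.0, 1.0)           # normal draw, mean 0, standard deviation 1
base = random.choice('ACGT')         # one element picked at random
picks = random.sample(range(10), 3)  # 3 distinct elements, without replacement
deck = list(range(5))
random.shuffle(deck)                 # shuffles the list in place, returns None
print(die, u, z, base, picks, deck)
```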
Example. Simulate the flip of a coin.

import random

coin = ['heads', 'tails']
num_heads = 0
num_tails = 0
for _ in range(1000):
    flip = random.choice(coin)  # pick 'heads' or 'tails' with equal probability
    if flip == 'heads':
        num_heads += 1
    else:
        num_tails += 1
print('number of heads:', num_heads)
print('number of tails:', num_tails)
Monty Hall Problem
Suppose you’re on a game show, and you’re given the choice of three doors: behind one door is a car; behind the others, goats. You pick a door, say number 3, and the host, who knows what’s behind the doors, opens another door, say number 2, which has a goat. He then says to you, ‘Do you want to pick door number 1?’ Is it to your advantage to switch your choice of doors?
Monty Hall Problem
Run montyhall.py to see the results. Then read montyhall.py and try to understand what the program does.
Visual Simulation. Python source.
Solution: montyhall.py
Usage: python montyhall.py
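A minimal sketch of a Monty Hall simulation, not the course's montyhall.py. The function names play_round and win_rate are hypothetical; doors are numbered 0 to 2, and the host always opens a goat door that the player did not pick.

```python
import random


def play_round(switch, rng):
    """Play one Monty Hall round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)   # door hiding the car
    pick = rng.choice(doors)  # player's first choice
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is still closed and not already picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=10000, seed=None):
    """Fraction of `trials` rounds won with the given strategy."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(trials))
    return wins / trials


if __name__ == '__main__':
    print('win rate if you stay:  ', win_rate(switch=False, seed=1))
    print('win rate if you switch:', win_rate(switch=True, seed=1))
```

With enough trials the two rates should come out close to 1/3 (stay) and 2/3 (switch): switching wins exactly when the first pick was a goat.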
Exercise 1. Read a FASTA file.
Write a Python function for reading FASTA files and add it to your utils.py module (if you are feeling lazy, read the Q7 code).
Solution: ex1_fasta.py
Usage: from ex1_fasta import *
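One possible shape for such a reader, sketched under the assumption that header lines start with '>' and that sequence lines may be wrapped. The names parse_fasta and read_fasta are hypothetical here and are not the contents of ex1_fasta.py; lines appearing before the first header are ignored.

```python
def parse_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, ''.join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, ''.join(chunks)))
    return records


def read_fasta(filename):
    """Read a FASTA file and return its (header, sequence) records."""
    with open(filename) as handle:
        return parse_fasta(handle)
```

Splitting parsing from file opening lets you test parse_fasta on a plain list of strings.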
Exercise 2. Complementary DNA sequence and palindromic sequence
Write a program that takes as input a DNA sequence written 5’ to 3’ and returns its reverse complement, i.e. the complementary strand, also written 5’ to 3’. Also make the program report whether or not the sequence is palindromic (equal to its own reverse complement). HINT: _biology)
Solution: ex2_complimentarydna.py
Usage: python ex2_complimentarydna.py
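A minimal sketch, not ex2_complimentarydna.py itself; the helper names are hypothetical. It assumes the input contains only A, C, G and T (either case); any other character, such as N, raises a KeyError.

```python
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}


def reverse_complement(seq):
    """Return the reverse complement of a 5'->3' DNA sequence, also written 5'->3'."""
    return ''.join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """A DNA sequence is palindromic if it equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


if __name__ == '__main__':
    for s in ['ATGC', 'GAATTC']:
        print(s, '->', reverse_complement(s), 'palindromic:', is_palindromic(s))
```

GAATTC (the EcoRI recognition site) is a classic example: its reverse complement is GAATTC again.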
Game of Life
Life is a "game" or cellular automaton (an evolving computational state system) developed by the Cambridge mathematician John Conway. The idea is simple: start with a board of dimensions (x, y) and populate it with an initial pattern of occupied and empty cells. Each cell's neighbors are the eight cells surrounding it. In every turn, the rules are: (i) if an empty cell has exactly three neighbors, it is filled next turn; (ii) if an occupied cell has zero or one neighbor, it dies of loneliness; and (iii) if an occupied cell has four or more neighbors, it dies of overcrowding. An occupied cell with two or three neighbors survives to the next turn. Very simple initial patterns can produce strange, unpredictable behavior, and many mathematicians have spent a lot of time thinking about how this works.
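The rules above fit in a few lines of Python. This is a sketch, not LifeGame.py: it assumes occupied cells are stored as a set of (x, y) tuples and that cells outside the width x height board are always empty (no wraparound). The names neighbors and step are hypothetical.

```python
def neighbors(cell, width, height):
    """Yield the on-board cells among the eight surrounding `cell`."""
    x, y = cell
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) == (0, 0):
                continue
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield (nx, ny)


def step(live, width, height):
    """Return the set of occupied cells after one turn of Conway's rules."""
    counts = {}
    for cell in live:
        for n in neighbors(cell, width, height):
            counts[n] = counts.get(n, 0) + 1
    # Birth: an empty cell with exactly 3 neighbors.
    # Survival: an occupied cell with 2 or 3 neighbors; every other occupied cell dies.
    return {cell for cell, c in counts.items()
            if c == 3 or (c == 2 and cell in live)}


if __name__ == '__main__':
    blinker = {(1, 2), (2, 2), (3, 2)}  # a horizontal line of three cells
    print(sorted(step(blinker, 5, 5)))  # becomes a vertical line
```

Only cells next to an occupied cell can change state, so counting neighbors from the live cells avoids scanning the whole board.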
Game of Life
Run the Game of Life (in the terminal):
– First, install the Jython standard package into
– Then add the following to your .bash_profile:
    # For Jython
    export JYTHON_HOME=/Users/dfernan/bin/jython2.5.2/
    export PATH=$JYTHON_HOME:$PATH
    export CLASSPATH=$JYTHON_HOME/jython.jar:$CLASSPATH
– Finally, run: jython LifeGame.py
Solution: LifeGame.py (GridMutator.py)
Usage: jython LifeGame.py
HH Question 5. Melting Temp
Usage: python q5.py q5_input.txt q5.output 20 55
HH Question 6. Longest Sequence
Any ideas for retrieving the longest exact matching sequence (the longest common substring) between two sequences?
How do you read a FASTA file? Write a function that takes a file name as input and outputs a list containing each sequence in the FASTA file. If you are lazy, just look at homework Q7.
Solution: fasta.py, Q8_input.fasta
Usage: use fasta.py as a Python module; it contains the fasta class and the read_fasta function.
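One answer to "any ideas?" is a dynamic-programming table of common-suffix lengths. This is a sketch with a hypothetical function name, not the homework solution: curr[j] holds the length of the longest common suffix of a[:i] and b[:j], and the largest value seen marks the longest exact match. Only two rows are kept at a time.

```python
def longest_common_substring(a, b):
    """Return the longest string that appears contiguously in both a and b."""
    best_len, best_end = 0, 0  # length of best match and where it ends in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

If several matches share the maximum length, this returns the first one found while scanning a.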
Sequence Alignment
How many operations? _____
Sequence Alignment
HOMOLOGOUS: Paralogs, Orthologs
Homologous sequences share a common ancestor; orthologs are separated by a speciation event, paralogs by a gene duplication event.
Sequence Alignment
Sequence Alignment
Align the following sequences and explain the alignment. Below are the sequences and the match/mismatch (sub)BLOSUM matrix (HH1 and HH7).
Sequence Alignment
Dynamic Programming: “The art of dividing a problem into simpler (sub)problems and then applying the sub-solutions recursively in order to obtain the final solution”
For each cell (i, j): new best alignment = best previous alignment + align(i, j)
How many operations? _____
Memory cost? _______
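The "best previous alignment + align(i, j)" idea is the core of global alignment by dynamic programming (Needleman-Wunsch). The sketch below is not the course template: it assumes a simple scoring scheme (match = 1, mismatch = -1, linear gap = -2) in place of the BLOSUM sub-matrix, and the function name is hypothetical. Cell (i, j) takes the best of three previous cells: diagonal (align the two letters), up (gap in b) or left (gap in a).

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    m, n = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * gap
    for j in range(1, n + 1):
        score[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append('-')
            i -= 1
        else:
            top.append('-')
            bottom.append(b[j - 1])
            j -= 1
    return score[m][n], ''.join(reversed(top)), ''.join(reversed(bottom))


if __name__ == '__main__':
    total, x, y = needleman_wunsch('ACGT', 'AGT')
    print(total)
    print(x)
    print(y)
```

To use a substitution matrix instead, replace the match/mismatch choice with a lookup of the score for the pair (a[i-1], b[j-1]).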
Sequence Alignment
Strategy: align the two sequences. Read the template code and think about how to fill it in.