1
GE3M25: Computer Programming for Biologists Python, Class 5
TCD, 08/12/2015 Karsten Hokamp, PhD Genetics
2
Overview (http://bioinf.gen.tcd.ie/GE3M25/)
- Recap
- Modules
- Dictionaries
- Working from the command line
- Weekly task
3
Recap Collections: list(), tuple(), set()
Common methods (mostly list methods; tuples only support count() and index()): 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'
Useful built-in functions: all(), any(), len(), max(), min(), sorted(), sum(), zip()
Find out more with the help() function, e.g. help(list)
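A quick sketch of several of these functions and methods applied to a small list; the fragment lengths and names are made-up example values:

```python
# Made-up example data: lengths of some DNA fragments
lengths = [120, 45, 300, 45]

print(len(lengths))                   # number of elements: 4
print(min(lengths), max(lengths))     # smallest and largest: 45 300
print(sum(lengths))                   # total: 510
print(sorted(lengths))                # new sorted list: [45, 45, 120, 300]
print(any(n > 200 for n in lengths))  # is at least one value > 200? True
print(all(n > 10 for n in lengths))   # are all values > 10? True

# zip() pairs up the elements of several collections
names = ['frag1', 'frag2', 'frag3', 'frag4']
for name, length in zip(names, lengths):
    print(name, length)

lengths.append(80)         # list method: add an element to the end
print(lengths.count(45))   # how often 45 occurs: 2
```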
4
Exercise: Create a variable 'seq' containing a DNA string
- Create a list 'dna1' from the DNA string
- Create a tuple 'dna2' from the DNA string
- Create a set 'dna3' from the DNA string
- Compare the structure and content of the collections
- Try to access the first element of each collection
- Try to modify the first element of each collection
- Add an element to each of your collections
- Try to remove the last element from each of your collections
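One possible walk-through of this exercise, assuming the short example sequence 'GATTACA' (any DNA string works). It highlights the key differences: lists are ordered and mutable, tuples are ordered but immutable, and sets are unordered and keep each element only once:

```python
seq = 'GATTACA'

dna1 = list(seq)   # ordered, mutable: ['G', 'A', 'T', 'T', 'A', 'C', 'A']
dna2 = tuple(seq)  # ordered, immutable: ('G', 'A', 'T', 'T', 'A', 'C', 'A')
dna3 = set(seq)    # unordered, unique elements only, e.g. {'A', 'C', 'G', 'T'}

print(dna1[0], dna2[0])  # G G; a set has no positions, dna3[0] raises TypeError

dna1[0] = 'C'            # works: lists can be changed in place
try:
    dna2[0] = 'C'        # fails: tuples cannot be changed
except TypeError as err:
    print('tuple:', err)

dna1.append('G')         # add to the end of a list
dna2 = dna2 + ('G',)     # a tuple can only be replaced by a new, longer tuple
dna3.add('N')            # add to a set (there is no defined position)

dna1.pop()               # remove the last element of a list
dna2 = dna2[:-1]         # build a new tuple without the last element
# sets have no "last" element: set.pop() removes an arbitrary one
```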
5
Weekly task: Read in a DNA sequence in FASTA format from a file
- Prompt the user for a short motif
- Split the sequence at the sites that match the motif
- Print the fragment lengths in sorted order
- Do not report fragments of zero length
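A minimal sketch of the weekly task, assuming a single-record FASTA file and case-insensitive matching; the function names read_fasta() and fragment_lengths() are hypothetical. Note that str.split() removes the motif itself, so the reported lengths exclude the motif bases:

```python
def read_fasta(path):
    """Return the sequence of a single-record FASTA file as one upper-case string."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith('>'):  # skip header and blank lines
                parts.append(line)
    return ''.join(parts).upper()


def fragment_lengths(sequence, motif):
    """Split sequence at every occurrence of motif; return non-zero lengths, sorted."""
    fragments = sequence.upper().split(motif.upper())
    return sorted(len(frag) for frag in fragments if len(frag) > 0)
```

In a complete script you would pass the result of read_fasta() on your own file, together with the motif returned by input(), to fragment_lengths() and print each length. An empty motif makes str.split() raise a ValueError, so you may want to check the user's input first.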
6
Python modules: software packages that add functionality
- Part of the standard distribution: random, math, string, ...
- External packages: see wiki.python.org/moin/UsefulModules
7
Python modules
- Load a module: import module_name
- Use the module: module_name.variable or module_name.function()
- Documentation: help(module_name)
8
Python modules
Examples (the results are random and will differ on every call):
>>> import random
>>> random.random()
0.231185
>>> random.randint(1, 10)
3
>>> random.choice('ACGT')
'G'
9
Python modules Exercises:
- Create a random number
- Create a random integer between 50 and 100
- Get a random letter from the word 'mississippi'
- Check out the help for module 'string'
  - Print all lower-case letters, one per line
  - Sort the ascii_letters string: which letter comes first?
- Check out the help for module 'math'
  - Calculate the log2 value of 0.5
  - Print the value of pi
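Possible solutions to these exercises, using only standard-library calls; the random results will differ on every run:

```python
import math
import random
import string

print(random.random())               # random float in [0.0, 1.0)
print(random.randint(50, 100))       # random integer; 50 and 100 are both possible
print(random.choice('mississippi'))  # random letter from the word

# help(string) prints the module documentation (best run interactively)
for letter in string.ascii_lowercase:  # 'abcdefghijklmnopqrstuvwxyz'
    print(letter)

first = sorted(string.ascii_letters)[0]
print(first)  # A: upper-case letters come before lower-case ones in ASCII

# help(math) prints the module documentation (best run interactively)
print(math.log2(0.5))  # -1.0
print(math.pi)         # 3.141592653589793
```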
10
Python modules Exercise:
- Revisit the script 'gene_list.py' from the last lesson
- Change it to read the file name from the command line (instead of hard-coding it into the script)
- Tip: use the list 'argv' from module 'sys'
- Run your script from the command line: python3 gene_list.py ~/Downloads/gene_list.txt
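A sketch of the command-line version. The contents of the original gene_list.py are not shown here, so this assumes (hypothetically) that the script simply prints one gene id per line of the input file:

```python
import sys


def read_gene_list(path):
    """Return the non-empty, stripped lines of a text file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def main(argv):
    # argv[0] is the script name; the first real argument is argv[1]
    if len(argv) < 2:
        print('Usage: python3 gene_list.py <gene_list_file>')
        return 1
    for gene in read_gene_list(argv[1]):
        print(gene)
    return 0


if __name__ == '__main__':
    main(sys.argv)
```

When you run python3 gene_list.py ~/Downloads/gene_list.txt, the shell expands ~ to your home directory before Python sees the argument, so sys.argv[1] holds the full path.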
11
Exercise: Read in a file with probe ids, gene ids, fold-changes and p-values, separated by tabs
1. Print out only the gene ids and fold-changes
2. Print out the gene ids and the fold-changes as log2 values
3. Print all lines with an absolute fold-change > 2 and a p-value <= 0.05
4. Print the values to a file instead of the screen
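A sketch covering these steps, under stated assumptions: the columns are in the order probe id, gene id, fold-change, p-value; there is no header line; and fold-changes are positive ratios (math.log2() is undefined for values <= 0, so such rows get nan). The function names are hypothetical:

```python
import math


def parse_line(line):
    """Split one tab-separated line into (probe_id, gene_id, fold_change, p_value)."""
    probe_id, gene_id, fold_change, p_value = line.rstrip('\n').split('\t')
    return probe_id, gene_id, float(fold_change), float(p_value)


def significant_rows(lines, min_abs_fc=2.0, max_p=0.05):
    """Return (gene_id, fold_change, log2_fold_change) for rows passing both cut-offs."""
    results = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _probe_id, gene_id, fc, p = parse_line(line)
        if abs(fc) > min_abs_fc and p <= max_p:
            # math.log2 is only defined for positive values
            log2_fc = math.log2(fc) if fc > 0 else float('nan')
            results.append((gene_id, fc, log2_fc))
    return results


def write_results(rows, path):
    """Write the results, tab-separated, to a file instead of the screen."""
    with open(path, 'w') as out:
        for gene_id, fc, log2_fc in rows:
            out.write('%s\t%.2f\t%.2f\n' % (gene_id, fc, log2_fc))
```

For steps 1 and 2 you would print gene_id and fold_change (or math.log2(fold_change)) for every parsed line; for step 4 you would pass the open input file to significant_rows() and hand the result to write_results() with an output file name of your choice.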
12
DNA to protein translation
1. Process a DNA string three nucleotides at a time
2. Translate each codon
3. Print the amino acid
13
1. Process a DNA string three nucleotides at a time
dna = 'ATGCCAGGTTTACACGGT'
codon = dna[0:3]
print(codon)
(The codons start at positions 0, 3, 6, 9, 12 and 15.)
14
1. Process a DNA string three nucleotides at a time
dna = 'ATGCCAGGTTTACACGGT'
i = 0
codon = dna[i:i+3]
print(codon)
15
1. Process a DNA string three nucleotides at a time
dna = 'ATGCCAGGTTTACACGGT'
for i in range(0, 16, 3):
    codon = dna[i:i+3]
    print(codon)
16
1. Process a DNA string three nucleotides at a time
dna = 'ATGCCAGGTTTACACGGT'
for i in range(0, len(dna)-2, 3):
    codon = dna[i:i+3]
    print(codon)
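Wrapping the general loop in a small (hypothetical) helper function makes it easy to check why the stop value is len(dna)-2: range() excludes its stop value, so the last start position is at most len(dna)-3, the last position where a complete codon still fits. Any incomplete codon at the end is skipped:

```python
def codons(dna):
    """Return the complete codons of dna, reading from position 0."""
    # range() stops *before* len(dna) - 2, so the last start position is
    # at most len(dna) - 3, where a full three-letter codon still fits
    return [dna[i:i+3] for i in range(0, len(dna) - 2, 3)]


print(codons('ATGCCAGGTTTACACGGT'))  # ['ATG', 'CCA', 'GGT', 'TTA', 'CAC', 'GGT']
print(codons('ATGCCAGG'))            # ['ATG', 'CCA']: trailing 'GG' is ignored
```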
17
2. Translate the codon
dna = 'ATGCCAGGTTTACACGGT'
for i in range(0, len(dna)-2, 3):
    codon = dna[i:i+3]
    if codon == 'AAA':
        print('K')
    elif codon == 'AAC':
        print('N')
    …
18
2. Translate the codon
Writing one if/elif branch for every one of the 64 codons, as in the previous slide, quickly becomes unwieldy. We need a look-up table!
19
Dictionary: a collection of key-value pairs
Symbols: {} and []
Initialisation:
table = {}
table = dict()
Storing values:
table = {'AAA': 'K', 'AAG': 'K'}
table['AAC'] = 'N'     (key 'AAC', value 'N')
20
Dictionary: accessing keys and values
aa = table['AAC']
aa = table[codon]
codons = table.keys()
amino_acids = set(table.values())
for codon in table.keys():
    print("translate %s into %s" % (codon, table[codon]))
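A short sketch of these operations, plus two standard dict features not on the slide: the in operator (which checks the keys) and get(), which returns a default value instead of raising a KeyError for a missing key. items() yields the key-value pairs directly:

```python
table = {'AAA': 'K', 'AAG': 'K'}
table['AAC'] = 'N'

print(table['AAC'])           # N
print('AAT' in table)         # False: 'in' checks the keys
print(table.get('AAT', 'X'))  # X: default instead of a KeyError
# table['AAT'] would raise a KeyError

for codon, aa in table.items():
    print('translate %s into %s' % (codon, aa))

print(sorted(set(table.values())))  # ['K', 'N']
```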
21
Dictionary Exercise: Generate one million random integers from 1 to 10
- Use a dictionary (occ) to count how often each integer occurs
- Calculate and print the frequency of each integer
Tips:
- check if a key exists: if key in occ.keys()   (or simply: if key in occ)
- increase the value of an existing key: occ[key] += 1
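A minimal sketch of the exercise, assuming that frequency means the count divided by the total number of draws; count_integers() is a hypothetical helper name:

```python
import random


def count_integers(n, low=1, high=10):
    """Draw n random integers from low to high (inclusive) and count each value."""
    occ = {}
    for _ in range(n):
        number = random.randint(low, high)
        if number in occ:
            occ[number] += 1   # key exists: increase its count
        else:
            occ[number] = 1    # first occurrence of this key
    return occ


if __name__ == '__main__':
    n = 1000000
    occ = count_integers(n)
    for number in sorted(occ):
        # frequency = count / total; with 10 values each is expected near 0.1
        print('%2d: %.4f' % (number, occ[number] / n))
```

The standard library's collections.Counter performs the same counting in one call, but writing the dictionary version by hand is the point of the exercise.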
22
Dictionary Look-up table for codons:
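The slide's table itself did not survive extraction. As a stand-in, here is the standard genetic code written out as a dictionary literal, using '*' as the (conventional, but here assumed) symbol for stop codons, together with a hypothetical translate() function that expects upper-case A/C/G/T input:

```python
# Standard genetic code; '*' marks the three stop codons
CODON_TABLE = {
    'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L',
    'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S',
    'TAT': 'Y', 'TAC': 'Y', 'TAA': '*', 'TAG': '*',
    'TGT': 'C', 'TGC': 'C', 'TGA': '*', 'TGG': 'W',
    'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L',
    'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'ATT': 'I', 'ATC': 'I', 'ATA': 'I', 'ATG': 'M',
    'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',
    'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',
    'AGT': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',
    'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V',
    'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
}


def translate(dna):
    """Translate dna codon by codon using the look-up table (upper-case ACGT only)."""
    protein = ''
    for i in range(0, len(dna) - 2, 3):
        protein += CODON_TABLE[dna[i:i+3]]
    return protein


print(translate('ATGCCAGGTTTACACGGT'))  # MPGLHG
```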
23
Dictionary Generate table on the fly:
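The slide's code was not preserved. One common way to generate the table on the fly (an assumption, not necessarily the method shown in class) relies on the fact that, when codons are listed in the base order T, C, A, G, the standard genetic code can be written as a single 64-letter string, with '*' for stop codons:

```python
bases = 'TCAG'
# amino acids for TTT, TTC, TTA, TTG, TCT, ..., GGG, in that order
amino_acids = ('FFLLSSSSYY**CC*W'
               'LLLLPPPPHHQQRRRR'
               'IIIMTTTTNNKKSSRR'
               'VVVVAAAADDEEGGGG')

table = {}
i = 0
for first in bases:
    for second in bases:
        for third in bases:
            # the innermost loop changes fastest, matching the string order
            table[first + second + third] = amino_acids[i]
            i += 1
```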
24
Dictionary Exercise: Read a DNA sequence from a file and translate it into a protein sequence Make it work for upper and lower case
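A sketch of the exercise, assuming the file may contain FASTA header lines (starting with '>') and that codons containing other characters, such as N, should become 'X'; all function names are hypothetical. Converting the sequence with upper() makes the look-up work for both upper- and lower-case input:

```python
def build_codon_table():
    """Standard genetic code built from the compact T, C, A, G ordering; '*' = stop."""
    bases = 'TCAG'
    amino_acids = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
                   'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
    codons = [a + b + c for a in bases for b in bases for c in bases]
    return dict(zip(codons, amino_acids))


def read_dna(path):
    """Read a sequence file, skipping FASTA header lines."""
    with open(path) as handle:
        return ''.join(line.strip() for line in handle
                       if not line.startswith('>'))


def translate(dna, table):
    """Translate dna; case-insensitive, unknown codons become 'X'."""
    dna = dna.upper()  # accept lower-case and mixed-case input
    return ''.join(table.get(dna[i:i+3], 'X')
                   for i in range(0, len(dna) - 2, 3))
```

In a complete script you would call translate(read_dna(path), build_codon_table()) with the path of your own sequence file and print the result.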
25
Weekly task 5 Option a: 100 HOXA protein sequences have been assembled from UniProt. First, align the sequences with the tool of your choice from the EBI website, then load the resulting tree file into TreeDraw. Use the controls to generate a tree that best shows the clustering of the sequences and the relationships between genes from different species. Submit an image of your tree together with a short description of how you generated the alignment and the tree, and a discussion of the relationships shown. Possible points of discussion: Can you think of a suitable sequence for rooting the tree? Can you detect any inconsistencies or surprises in the tree with respect to the known or expected evolutionary relationships between the species?
26
Weekly task 5 Option b: Write a Python script that does the following:
1. Read in a DNA sequence from a file in FASTA format
2. Translate the DNA into a protein sequence and print it to the screen
3. Repeatedly mutate one nucleotide at a time, and stop if
   a) the start codon is changed, or
   b) a stop codon is introduced before the end of the sequence
4. For each mutation, report where it occurs and which substitution was made
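A sketch of option b under several assumptions the task does not state: mutations accumulate, each step picks a random position and a random different base, positions are reported 1-based, and '*' marks stop codons. The function names are hypothetical, and max_steps is only a safety limit:

```python
import random


def build_codon_table():
    """Standard genetic code; '*' marks stop codons."""
    bases = 'TCAG'
    amino_acids = ('FFLLSSSSYY**CC*W' 'LLLLPPPPHHQQRRRR'
                   'IIIMTTTTNNKKSSRR' 'VVVVAAAADDEEGGGG')
    codons = [a + b + c for a in bases for b in bases for c in bases]
    return dict(zip(codons, amino_acids))


def translate(dna, table):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return ''.join(table.get(dna[i:i+3], 'X') for i in range(0, len(dna) - 2, 3))


def mutate_until_broken(dna, table, max_steps=100000):
    """Mutate one random nucleotide at a time (changes accumulate) until the
    start codon changes or a stop codon appears before the last codon.
    Returns the list of (position, old_base, new_base) and the final sequence."""
    seq = list(dna.upper())
    start_codon = ''.join(seq[:3])
    mutations = []
    for _ in range(max_steps):
        pos = random.randrange(len(seq))
        old = seq[pos]
        new = random.choice([b for b in 'ACGT' if b != old])
        seq[pos] = new
        mutations.append((pos + 1, old, new))  # report 1-based positions
        protein = translate(''.join(seq), table)
        print('Position %d: %s -> %s   %s' % (pos + 1, old, new, protein))
        if ''.join(seq[:3]) != start_codon:
            print('Start codon changed, stopping.')
            break
        if '*' in protein[:-1]:
            print('Premature stop codon introduced, stopping.')
            break
    return mutations, ''.join(seq)
```

In the full script you would read the sequence from your FASTA file, print translate(dna, table) once, and then call mutate_until_broken(dna, table).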
27
Weekly task 5 To be submitted by e-mail to kahokamp@tcd.ie
before Thursday, 17th December, 5 pm