
Trinity College Dublin, The University of Dublin
GE3M25: Computer Programming for Biologists
Python, Class 2
Karsten Hokamp, PhD
Genetics TCD, 17/11/2015


Slide 2: Overview
- Recap
- Python2 vs Python3
- Working with strings
- Reading a file
- Weekly task
Course page: http://bioinf.gen.tcd.ie/GE3M25/

Slide 3: Python2 vs Python3 (Python2 behaviour)
- Division of two integers gives an integer (7 / 2 is 3)
- No parentheses required for print (print 'hello')
- input() evaluates what is typed as a Python expression
- raw_input() returns the typed text as a plain string

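Slide 4: Python2 vs Python3 (Python3 behaviour)
- Division always gives a float (7 / 2 is 3.5; use // for integer division)
- Parentheses required for print, which is now a function: print('hello')
- input() returns the typed text as a plain string (raw_input() no longer exists)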
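A minimal Python3 sketch of the division and print differences listed above. The `//` operator (floor division) reproduces the integer result that Python2 gave for `7 / 2`; `input()` is only shown as a comment because it waits for keyboard input.

```python
# Python3: '/' always returns a float
ratio = 7 / 2        # 3.5
# '//' performs floor division, matching Python2's integer result
whole = 7 // 2       # 3
# print is a function, so parentheses are required
print(ratio, whole)
# In Python3, input() returns the typed text as a plain string, e.g.
# name = input('Enter a sequence name: ')
```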

Slide 5: Strings
    # assign a sequence string to a variable:
    seq = 'CTTTCACGGCTCTCTCTACAGGAATCGTATTCGGTCTCTTTTAAN'
Rules for variable names:
- Can contain both letters and numbers
- Must begin with a letter or an underscore, never a number
- Can contain the underscore character
- Must not clash with reserved keywords (e.g. for, if, while)
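The naming rules can be checked from Python itself. This sketch assumes Python3, where `str.isidentifier()` tests the letter/number/underscore rules (a leading underscore is also allowed) and `keyword.iskeyword()` from the standard library flags reserved words. The candidate names are made up for illustration.

```python
import keyword

# Hypothetical candidate names to test against the rules
candidates = ['seq', 'seq_2', '_tmp', '2seq', 'for', 'my-seq']

results = {}
for name in candidates:
    # valid if it is a well-formed identifier and not a reserved keyword
    valid = name.isidentifier() and not keyword.iskeyword(name)
    results[name] = valid
    print(name, valid)
```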

Slide 6: Strings
- Strings are indexed, starting from 0
- Indices are specified in square brackets, e.g. seq[0], seq[1], seq[-1] (negative indices count from the end)
- Slices are specified via ':' and start/end are optional:
  seq[0:9] is the same as seq[:9]
  seq[9:1000] is the same as seq[9:] (for strings of up to 1000 characters)
- A third parameter specifies a step: seq[::2] extracts every second letter
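Using the sequence from slide 5, the indexing and slicing rules can be tried directly. Note that slice ends beyond the string length are silently clipped, which is why seq[9:1000] and seq[9:] agree here.

```python
seq = 'CTTTCACGGCTCTCTCTACAGGAATCGTATTCGGTCTCTTTTAAN'

print(seq[0])                    # first character: 'C'
print(seq[-1])                   # last character: 'N'
print(seq[:9])                   # indices 0 to 8: 'CTTTCACGG'
print(seq[9:] == seq[9:1000])    # True: the slice end is clipped to the string length
print(seq[::2])                  # every second character, starting at index 0
```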

Slide 7: Exercises
- Define variable 'seq' as a 20 bp sequence string
- Try to extract substrings using indices and slices:
  1. The first character
  2. The last character
  3. The 4th to 9th characters
  4. The last 10 characters
  5. Every 3rd character
  6. What does [::-1] do?
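One possible set of answers, using a made-up 20 bp sequence. Remember that the 4th character sits at index 3 and that the slice end is exclusive, so the 4th to 9th characters are seq[3:9].

```python
# Hypothetical 20 bp sequence for the exercises
seq = 'ACGTTGCAAGCTTAGCCATG'

first = seq[0]          # 1. the first character
last = seq[-1]          # 2. the last character
middle = seq[3:9]       # 3. the 4th to 9th characters (indices 3 to 8)
tail = seq[-10:]        # 4. the last 10 characters
every_third = seq[::3]  # 5. every 3rd character
reverse = seq[::-1]     # 6. a step of -1 reverses the string

print(first, last, middle, tail, every_third, reverse)
```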

Slide 8: Functions that work on strings
- len()
- list()
- max()
- min()
- print()
- sorted()
Give them a try!
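A short sketch of these built-in functions on a toy sequence. For strings, max() and min() compare characters alphabetically, and list() and sorted() both return a list with one element per character.

```python
seq = 'GATTACA'

print(len(seq))      # number of characters: 7
print(list(seq))     # one list element per character
print(max(seq))      # alphabetically largest character: 'T'
print(min(seq))      # alphabetically smallest character: 'A'
print(sorted(seq))   # characters sorted into a new list
```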

Slide 9: String Methods
- Many more methods are associated with strings
- Try dir(seq) for an overview
- These are applied as follows: variable.method(arguments), e.g. seq.find('A')
- Documentation is available via help(), e.g. help(seq.isalpha)
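A minimal example of calling methods on a string variable. find() returns the index of the first match, or -1 if there is none; isalpha() reports whether all characters are letters.

```python
seq = 'CTTTCACGG'

print(seq.find('A'))     # index of the first 'A': 5
print(seq.find('N'))     # -1 means "not found"
print(seq.isalpha())     # True: the string contains only letters
# dir(seq) lists all available methods; help(seq.isalpha) shows the documentation
```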

Slide 10: String Methods
- Applying a method does not change the string (strings are immutable)!
- Assign the output to a new or the same variable to make changes permanent:
    rna = dna.replace('t', 'u')
    dna = dna.lower()
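The difference between calling a method and keeping its result can be seen in a few lines. The sequence is made up; lower() is applied first so that replace('t', 'u') finds lower-case 't' characters.

```python
dna = 'ATGCTT'

dna.lower()                  # returns a new string, but the result is discarded
print(dna)                   # still 'ATGCTT'

dna = dna.lower()            # reassign to keep the change
rna = dna.replace('t', 'u')  # store the transcribed copy in a new variable
print(dna, rna)              # atgctt augcuu
```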

Slide 11: Exercises
1. Find out what the following methods do: startswith, rindex, join, replace, upper, rjust, strip
2. Change the sequence to lower case
3. Change all 't' to 'u' and save as 'rna'
4. Add an 'n' to the end of your sequence
5. Find the first occurrence of 'n'
6. Count how many 'g' are in the sequence
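A sketch covering the exercise steps on a hypothetical 10 bp sequence. Note that join() is called on the separator string and takes a list of strings, and rjust() pads on the left with the given fill character.

```python
seq = 'ATGCGTTAGC'   # hypothetical example sequence

# 1. A few of the methods in action
print(seq.startswith('ATG'))      # True
print(seq.rindex('G'))            # index of the last 'G': 8
print('-'.join(['AT', 'GC']))     # 'AT-GC'
print(seq.upper())                # upper-case copy
print(seq.rjust(12, '.'))         # right-justified to width 12: '..ATGCGTTAGC'
print('  ACGT\n'.strip())         # whitespace removed from both ends: 'ACGT'

seq = seq.lower()                 # 2. lower case
rna = seq.replace('t', 'u')       # 3. 't' -> 'u', saved as 'rna'
seq = seq + 'n'                   # 4. append an 'n'
first_n = seq.find('n')           # 5. first occurrence of 'n'
g_count = seq.count('g')          # 6. number of 'g'
print(rna, first_n, g_count)
```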

Slide 12: The for loop
Repeating an action a defined number of times.
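A minimal sketch of a for loop; this is a hypothetical example, not the code from the original slide. Looping over a string visits each character once, and range(n) repeats a block exactly n times.

```python
seq = 'GATC'

# The indented block runs once for every character in the string
visited = []
for base in seq:
    print(base)
    visited.append(base)

# range(3) produces 0, 1 and 2, so this block runs exactly three times
repeats = 0
for i in range(3):
    print('Repeat number', i)
    repeats += 1
```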

Slide 13: The for loop
Loops can be nested.
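A hypothetical nested-loop example: the inner loop runs to completion for every pass of the outer loop, which here generates all 16 dinucleotides.

```python
# The inner loop runs completely for every pass of the outer loop
dinucleotides = []
for first in 'ACGT':
    for second in 'ACGT':
        dinucleotides.append(first + second)

print(len(dinucleotides))   # 16 combinations
print(dinucleotides[:4])    # ['AA', 'AC', 'AG', 'AT']
```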

Slide 14: Exercises
- Loop through 'AGTC' to find occurrences of these letters in your (upper-case) sequence variable
- Print out each letter and its number of occurrences
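One way to solve this, using the sequence from slide 5. The counts are also kept in a dictionary (an addition not required by the exercise) so they can be reused later, e.g. for the GC content.

```python
seq = 'CTTTCACGGCTCTCTCTACAGGAATCGTATTCGGTCTCTTTTAAN'.upper()

counts = {}
for letter in 'AGTC':
    # str.count() returns the number of non-overlapping occurrences
    n = seq.count(letter)
    counts[letter] = n
    print(letter, n)
```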

Slide 15: Formatting Output
The % operator

Slide 16: Formatting Output
% operator: placeholders that are filled in from formatting variables
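A hypothetical comparison of building output by hand versus using the % operator. The values in the tuple on the right of % fill the placeholders on the left in order.

```python
name = 'seq1'
length = 45

# Without formatting: convert numbers and concatenate by hand
plain = 'Sequence ' + name + ' has ' + str(length) + ' bp'
# With the % operator: placeholders are filled from the tuple on the right
formatted = 'Sequence %s has %i bp' % (name, length)

print(plain)
print(formatted)
```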

Slide 17: Formatting Output
Placeholders:
- %i     integer
- %f, %e, %E     floating point number in normal or exponential ('e'/'E') notation
- %s     string
- %%     literal '%'
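Each placeholder type applied to a sample value. By default %f and %e show six digits after the decimal point; %% is needed whenever a literal percent sign appears in a format string.

```python
count = 42
ratio = 0.000123
label = 'GC'

as_int = '%i' % count       # '42'
as_float = '%f' % ratio     # '0.000123' (six decimals by default)
as_exp = '%e' % ratio       # '1.230000e-04'
as_EXP = '%E' % ratio       # '1.230000E-04'
as_str = '%s' % label       # 'GC'
percent = '%i%%' % 50       # '50%'

print(as_int, as_float, as_exp, as_EXP, as_str, percent)
```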

Slide 18: Formatting Output
Modifications:
- %x.yf     overall width x and y digits after the decimal point
  %5.2f     left-padded with spaces to a width of 5, precision 2
- %0xi     zero-padding to width x
  %05i     numbers < 10000 are left-padded with 0 to five digits
- %-xs     left-align within width x
  %-5s     strings are right-padded with spaces to length 5
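The modifiers in action. Square brackets are added around each result only to make the padding visible.

```python
width_precision = '[%5.2f]' % 3.14159   # width 5, 2 decimals: '[ 3.14]'
zero_padded = '[%05i]' % 42             # zero-padded to width 5: '[00042]'
left_aligned = '[%-5s]' % 'AT'          # left-aligned, padded to width 5: '[AT   ]'
right_aligned = '[%5s]' % 'AT'          # right-aligned by default: '[   AT]'

print(width_precision, zero_padded, left_aligned, right_aligned)
```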

Slide 19: Exercise
Calculate the GC content of a DNA string stored in a variable:
1. Get the count of 'G'
2. Get the count of 'C'
3. Get the length of the string
4. Calculate (G + C) / length * 100
5. Print the result to the screen (rounded to two digits after the decimal point)
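The five steps can be sketched as a small function (the name gc_content is hypothetical). This assumes Python3, where / gives a float; in Python2 the division would need a float conversion. Ambiguous bases such as 'N' still count towards the length, and an empty string would raise a ZeroDivisionError.

```python
def gc_content(dna):
    """Return the GC content of a DNA string as a percentage."""
    dna = dna.upper()                # count both 'g' and 'G'
    g = dna.count('G')               # 1. count of G
    c = dna.count('C')               # 2. count of C
    length = len(dna)                # 3. length of the string
    return (g + c) / length * 100    # 4. percentage (float division in Python3)

seq = 'CTTTCACGGCTCTCTCTACAGGAATCGTATTCGGTCTCTTTTAAN'
print('GC content: %.2f%%' % gc_content(seq))   # 5. two digits after the decimal point
```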

Slide 20: Reading from files
Built-in function 'open':
    open(name)
    open(name, mode)
- name = path to file
- mode = 'r', 'w', 'a' (read, write, append); the default is 'r'
e.g.: open('/Users/kahokamp/Downloads/test.fa', 'r')
Error if the file does not exist (when opening for reading)!
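The error for a missing file can be caught instead of stopping the script. This sketch assumes Python3, where the exception is FileNotFoundError (Python2 raises IOError); a fresh temporary directory is used so the file is guaranteed not to exist.

```python
import os
import tempfile

# A fresh, empty temporary directory guarantees the file is missing
folder = tempfile.mkdtemp()
missing = os.path.join(folder, 'test.fa')

opened = True
try:
    f = open(missing, 'r')
    f.close()
except FileNotFoundError as err:
    opened = False
    print('Could not open file:', err)
```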

Slide 21: Reading from files
open returns a file handle; capture it in a variable:
    f = open('/Users/kahokamp/Downloads/test.fa', 'r')
Methods for f:
- f.read()     read in the whole content
- f.readline()     read in one line
- f.close()     close the file handle
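A self-contained demonstration of the three methods, using a small hypothetical FASTA file written to a temporary directory. After readline() has consumed the header, read() returns only what is left. The `with` statement used for writing closes its file automatically.

```python
import os
import tempfile

# Create a small hypothetical FASTA file to read from
path = os.path.join(tempfile.mkdtemp(), 'test.fa')
with open(path, 'w') as out:
    out.write('>seq1 example\nACGTACGT\nGGCC\n')

f = open(path, 'r')
header = f.readline()   # first line only, including its '\n'
rest = f.read()         # everything after the current position
f.close()               # release the file handle

print(repr(header))
print(repr(rest))
```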

Slide 22: Reading from files
    filename = '/Users/kahokamp/Downloads/test.fa'
    f = open(filename, 'r')
    # the first line of a FASTA file is the header
    header = f.readline()
    print("Header: %s" % header)
    # iterating over the handle yields the remaining lines one by one
    for line in f:
        print(line)
    f.close()
Each line contains a newline character at the end!
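Because each line keeps its trailing newline, print() would output an extra blank line. A common remedy is rstrip('\n'). This sketch writes a small hypothetical FASTA file first and uses `with`, which closes the file automatically at the end of the block.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.fa')
with open(path, 'w') as out:
    out.write('>seq1\nACGT\nTTGA\n')

lines = []
with open(path, 'r') as f:        # the file is closed automatically after the block
    header = f.readline().rstrip('\n')
    for line in f:
        line = line.rstrip('\n')  # drop the trailing newline before using the line
        lines.append(line)
        print(line)

print('Header: %s' % header)
```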

Slide 23: Reading from files
Using the while loop:
    filename = '/Users/kahokamp/Downloads/test.fa'
    f = open(filename, 'r')
    line = f.readline()
    # readline() returns an empty string at the end of the file, which ends the loop
    while line:
        print(line)
        line = f.readline()
    f.close()

Slide 24: Exercise
Write a script that reads a FASTA sequence and reports the GC content.
Start with pseudocode and work in small steps!
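A possible sketch of the exercise, assuming a FASTA file with a single record whose sequence may be wrapped over several lines. The function name fasta_gc and the test file are hypothetical; the pseudocode is kept as comments to show the small-steps approach.

```python
import os
import tempfile

def fasta_gc(path):
    """Return (header, GC percentage) for a single-record FASTA file."""
    # Pseudocode:
    #   open file; read header line; join remaining lines into one sequence;
    #   count G and C; divide by the length; multiply by 100
    with open(path, 'r') as f:
        header = f.readline().rstrip('\n')
        seq = ''
        for line in f:
            seq += line.strip().upper()
    gc = seq.count('G') + seq.count('C')
    return header, gc / len(seq) * 100

# Hypothetical test file
path = os.path.join(tempfile.mkdtemp(), 'test.fa')
with open(path, 'w') as out:
    out.write('>demo\nACGT\nGGCC\n')

header, gc = fasta_gc(path)
print('%s: GC content %.2f%%' % (header, gc))
```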

Slide 25: Learnt so far
- String functions and methods
- String formatting
- The for loop
- Opening a file for reading
- Reading line by line from a file handle

Slide 26: Weekly task
- Read a FASTQ file
- Shorten the DNA and quality strings to a width of 30 characters
- Print the output to the screen or to a new file
- Submit your Python script by e-mail to kahokamp@tcd.ie
- Deadline: next Tuesday (24th Nov) at 10 am
- Include loads of comments
- Even pseudocode will give you points!
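A minimal sketch of the shortening step, assuming plain four-line FASTQ records (header, sequence, '+', quality) with no line wrapping. The function name shorten_fastq, the WIDTH constant and the file names are hypothetical. Lines 2 and 4 of every record (positions 1 and 3 when counting from 0) are cut to the target width.

```python
import os
import tempfile

WIDTH = 30   # target length from the weekly task

def shorten_fastq(in_path, out_path, width=WIDTH):
    """Write a copy of a FASTQ file with sequence and quality lines cut to 'width'."""
    with open(in_path, 'r') as fin, open(out_path, 'w') as fout:
        for number, line in enumerate(fin):
            line = line.rstrip('\n')
            # Assumes plain 4-line records: header, sequence, '+', quality
            if number % 4 in (1, 3):
                line = line[:width]
            fout.write(line + '\n')

# Hypothetical one-record input with a 40 bp read
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, 'reads.fq')
out_path = os.path.join(folder, 'short.fq')
with open(in_path, 'w') as f:
    f.write('@read1\n' + 'A' * 40 + '\n+\n' + 'I' * 40 + '\n')

shorten_fastq(in_path, out_path)
with open(out_path) as f:
    print(f.read())
```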

Slide 27: Don't forget to log out!

