Python – a HowTo Peter Wad Sackett and Henrike Zschach.

2 Introduction

While evaluating the exercises, we have noticed some examples of sub-optimal programming. Most of these constructs are not technically wrong and will not lead to errors, but they are not good programming! Therefore we will show some examples of where people tend to go wrong and how it can be made right.

Motto: Making code better (thank you, Wikipedia)

3 The use of flow control

break and continue are flow control tools that can be very useful, but precisely because they give you control over the flow of the program, they also make it easy to be a bit lazy. Excessive use of flow control can be a sign that your loops are poorly designed and your logic is flawed. When programming, you should strive for the code that makes the most sense: How long does this loop have to run? What is the step size? What are the entry and exit conditions? Letting a loop keep running when there is no more need for it is not stronger logic than terminating it!

```python
looking_for = input("What are you looking for?\n")
found = False
for item in myList:  # myList is an existing list of strings
    if item == looking_for:
        found = True
        break  # stop as soon as the item is found
```

A for loop can only terminate early by using break, which makes this a legitimate use of it.
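For a pure membership test, a small sketch (the function names are hypothetical) shows that Python's built-in `in` operator performs the same early-stopping search without an explicit loop. The loop with break is still the right tool when you need to do more with the matching item, and the `for ... else` clause can handle the not-found case without a separate flag.

```python
def contains_with_loop(items, looking_for):
    """Linear search with an early exit, as on the slide."""
    found = False
    for item in items:
        if item == looking_for:
            found = True
            break  # no need to look at the remaining items
    return found


def contains_with_in(items, looking_for):
    """Same result using the built-in membership operator."""
    return looking_for in items


def first_match_position(items, looking_for):
    """for ... else: the else branch runs only if the loop did not break."""
    for position, item in enumerate(items):
        if item == looking_for:
            break
    else:
        position = -1  # hypothetical sentinel meaning "not found"
    return position


if __name__ == "__main__":
    my_list = ["gene", "exon", "intron"]
    print(contains_with_loop(my_list, "exon"))      # True
    print(contains_with_in(my_list, "codon"))       # False
    print(first_match_position(my_list, "intron"))  # 2
```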

4 Flow control and while

Using break in a while loop is almost always weak logic, because a while loop is already controlled by a condition.

```python
looking_for = input("What are you looking for?\n")
found = False
i = 0
while i < len(myList):
    if myList[i] == looking_for:
        found = True
        break
    i += 1
```

Write instead:

```python
looking_for = input("What are you looking for?\n")
i = 0
while i < len(myList) and myList[i] != looking_for:
    i += 1
found = i < len(myList)  # True only if the loop stopped before the end
```

The logic is to stop looking (incrementing i) when you have either seen the entire list or found the item.
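A runnable sketch of the condition-driven while loop, wrapped in a hypothetical `find_index` function so it can be tested. The order of the two conditions matters: because `and` short-circuits, `items[i]` is only evaluated while `i < len(items)`, so the loop never indexes past the end of the list.

```python
def find_index(items, looking_for):
    """Return the index of the first occurrence of looking_for, or -1."""
    i = 0
    # Stop when the whole list has been seen OR the item has been found.
    while i < len(items) and items[i] != looking_for:
        i += 1
    # After the loop, i < len(items) exactly when the item was found.
    return i if i < len(items) else -1


if __name__ == "__main__":
    print(find_index(["ATG", "TAA", "TGA"], "TAA"))  # 1
    print(find_index(["ATG", "TAA", "TGA"], "TAG"))  # -1
```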

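5 Flow of programming

You should separate actions that belong inside a loop (e.g. data extraction) from actions that belong outside the loop because they only need to run once (e.g. data processing, setting up constant variables).

Do not write:

```python
for line in infile:
    if not line.startswith('>'):
        seq += line.rstrip()
        seq_len += len(seq)
        transTable = str.maketrans('ATCG', 'TAGC')
        complementdna = seq.translate(transTable)
```

Besides doing the same work over and over, this version is also wrong: seq_len += len(seq) adds up the lengths of every partial sequence instead of giving the final length.

Write instead:

```python
seq = ''
for line in infile:
    if not line.startswith('>'):
        seq += line.rstrip()
seq_len = len(seq)
transTable = str.maketrans('ATCG', 'TAGC')
complementdna = seq.translate(transTable)
```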
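A self-contained sketch of the corrected structure, assuming a single-entry FASTA input; `io.StringIO` stands in for an opened file so the example runs as-is, and the function name `read_and_complement` is hypothetical. Only the accumulation of sequence lines happens inside the loop; the length, the translation table and the complement are computed once afterwards.

```python
import io


def read_and_complement(infile):
    """Collect the sequence lines of a single-entry FASTA file, then process once."""
    seq = ""
    for line in infile:
        if not line.startswith(">"):  # skip the header line
            seq += line.rstrip()
    # Everything below needs the complete sequence, so it runs once, after the loop.
    seq_len = len(seq)
    trans_table = str.maketrans("ATCG", "TAGC")
    complement_dna = seq.translate(trans_table)
    return seq, seq_len, complement_dna


if __name__ == "__main__":
    fasta = io.StringIO(">example\nATGC\nGGTA\n")
    print(read_and_complement(fasta))  # ('ATGCGGTA', 8, 'TACGCCAT')
```

For very large files, collecting the lines in a list and calling `"".join()` once after the loop is a common alternative to repeated `+=`.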

6 Conditional statements

This one is a bit hard to exemplify, but basically: think about whether the conditions you pose are sensible.

```python
# You would not be in the loop if the word was STOP,
# so there is no need to check an extra time
while word != "STOP":
    if word != "STOP":
        ...

# If line is this, then it is not that
if line == that:
    ...
if line == this:
    ...
```

For mutually exclusive conditions like these, use elif so that the second test is skipped once the first one has matched.

Sometimes the same condition signifies several actions to be taken:

```python
for line in infile:
    if line.startswith("SQ"):
        # Code here to extract the length
        # Oops, I also need to extract the whole sequence
        seq_flag = True
```

Join the actions together under one if instead of testing the same condition twice.
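A small sketch of both points, using a hypothetical `parse_entry` function on a few Swiss-Prot-style lines (the exact line layout is an assumption made for the example): mutually exclusive line types are tested with one `if`/`elif` chain, and every action triggered by the `SQ` line sits under a single test.

```python
def parse_entry(lines):
    """Extract entry name, stated length and sequence from Swiss-Prot-style lines."""
    entry_id = None
    seq_length = None
    sequence = ""
    seq_flag = False
    for line in lines:
        if line.startswith("ID"):
            entry_id = line.split()[1]
        elif line.startswith("SQ"):
            # Both actions belong to the same condition, so they share one if.
            seq_length = int(line.split()[2])
            seq_flag = True
        elif line.startswith("//"):
            seq_flag = False  # end of the entry
        elif seq_flag:
            sequence += line.replace(" ", "").strip()
    return entry_id, seq_length, sequence


if __name__ == "__main__":
    record = [
        "ID   TEST_HUMAN   Reviewed;   12 AA.",
        "SQ   SEQUENCE   12 AA;  1400 MW;  0000000000000000 CRC64;",
        "     MKTAYI AKQRQI",
        "//",
    ]
    print(parse_entry(record))  # ('TEST_HUMAN', 12, 'MKTAYIAKQRQI')
```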

7 List building

Task: I have some information in this line that I want to save in a list. Many built-in functions and methods already return lists, e.g. split().

```python
# Don't
data_list = list()
for item in line.split():
    data_list.append(item)

# Instead
data_list = line.split()
```

Also, our favorite sys.argv is already a list. Take advantage of slicing.

```python
import sys

# Don't
input_files = list()
for index in range(1, len(sys.argv)):
    input_files.append(sys.argv[index])

# Instead
input_files = sys.argv[1:]
```
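A sketch of both shortcuts as small functions. Because `sys.argv` is just a list of strings, the hypothetical `get_input_files` function takes it as a parameter, which lets it be tested without real command-line arguments; slicing a list that holds only the program name simply gives an empty list.

```python
import sys


def get_fields(line):
    """str.split() already returns a new list; no copying loop is needed."""
    return line.split()


def get_input_files(argv):
    """Everything after the program name; slicing returns a new list."""
    return argv[1:]


if __name__ == "__main__":
    print(get_fields("ID   TEST_HUMAN   Reviewed;"))  # ['ID', 'TEST_HUMAN', 'Reviewed;']
    print(get_input_files(sys.argv))
```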

8 Regular expressions – warning

"Some people, when confronted with a problem, think: 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)

While regular expressions are a very useful and powerful concept, many problems have simpler solutions. Regular expressions in Python are usually slower than simple string methods because running the regex engine carries considerable overhead. They are also often harder to follow, and hard to get right without missing rare cases.
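One way to check the speed claim on your own machine is the standard-library `timeit` module. This sketch compares `str.startswith` with an equivalent precompiled `re.match` on a made-up line. Absolute timings depend on the machine and Python version, so no numbers are given here; the sketch only insists that both approaches agree before timing them.

```python
import re
import timeit

line = "ID   TEST_HUMAN   Reviewed;   12 AA."
pattern = re.compile(r"ID")


def with_string_method():
    return line.startswith("ID")


def with_regex():
    # re.match only matches at the start of the string, like startswith.
    return pattern.match(line) is not None


if __name__ == "__main__":
    # Both approaches must give the same answer before speed is worth comparing.
    assert with_string_method() == with_regex()
    for func in (with_string_method, with_regex):
        seconds = timeit.timeit(func, number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```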

9 Regular expressions – alternatives

Examples of tasks for which built-in string methods are preferable:
- Splitting strings on elementary separators (space/tab/semicolon/comma)
- Extracting information from such a line

```python
split_line = line.split()
split_line = line.split(';')
ID = line.split()[1]
```

- Checking for identifiers

```python
if line.startswith('ID'):
    ...
if line.endswith(')'):
    ...
```
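The string methods listed above, applied to a few made-up lines (the line formats are assumptions for the example):

```python
tab_line = "gene1\tchr1\t1200"
semicolon_line = "P12345;kinase;human"
id_line = "ID   TEST_HUMAN   Reviewed;   12 AA."

# split() with no argument splits on any run of whitespace (spaces, tabs).
fields = tab_line.split()
# split(';') splits on exactly that separator.
parts = semicolon_line.split(";")
# Index straight into the result to extract one field.
entry_id = id_line.split()[1]
# Identifier checks need no regex either.
is_id_line = id_line.startswith("ID")
ends_with_paren = id_line.endswith(")")

print(fields)                       # ['gene1', 'chr1', '1200']
print(parts)                        # ['P12345', 'kinase', 'human']
print(entry_id)                     # TEST_HUMAN
print(is_id_line, ends_with_paren)  # True False
```

Note the difference between the two split forms: `split()` without arguments collapses runs of whitespace, whereas `split(';')` keeps an empty string between two consecutive separators.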

10 Regular expressions – proper use

Because the result of a regex search is typically small, it is more sensible to store it for later use than to run the same search again:

```python
import re

# Don't
if re.search(r'^ID\s+(\S+)', line) is not None:
    sp_id = re.search(r'^ID\s+(\S+)', line).group(1)

# Instead
REresult = re.search(r'^ID\s+(\S+)', line)
if REresult is not None:
    sp_id = REresult.group(1)
```
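A sketch of the store-and-reuse pattern as testable functions (the names `get_sp_id` and `ID_PATTERN` are hypothetical). The pattern is also compiled once at module level. Since Python 3.8, an assignment expression (`:=`) can store the match and test it in one line; the two-line form works in every Python 3 version.

```python
import re

ID_PATTERN = re.compile(r"^ID\s+(\S+)")


def get_sp_id(line):
    """Return the entry name from an ID line, or None if the line does not match."""
    match = ID_PATTERN.search(line)  # run the regex once and keep the result
    if match is not None:
        return match.group(1)
    return None


def get_sp_id_walrus(line):
    """Same logic with an assignment expression (Python 3.8+)."""
    if (match := ID_PATTERN.search(line)) is not None:
        return match.group(1)
    return None


if __name__ == "__main__":
    print(get_sp_id("ID   TEST_HUMAN   Reviewed;   12 AA."))  # TEST_HUMAN
    print(get_sp_id("AC   P12345;"))                           # None
```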

11 Variable names

This should be obvious. You (and we) will have a much easier time if you name your variables so that the name
- describes what the variable does or stores,
- is distinct from other variable names,
- is not a reserved keyword.

Please don't:

```python
import re

def_regex = re.compile(r"^ID\s+(\w+)")
dee_regex = re.compile(r"^AC\s+(\w+)")
des_regex = re.compile(r"^SQ\s+(\w+)")
```

Or:

```python
for char in range(len(line)):
    base = line[char]  # char is in no way a character!!!
```
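One possible renaming of these examples (the names are suggestions, not a fixed convention): each regex is named after the Swiss-Prot line type it matches, and each loop variable says what it holds. When both the position and the base are needed, `enumerate` provides them without a misleading index name.

```python
import re

# Named after the line type each pattern matches.
id_regex = re.compile(r"^ID\s+(\w+)")
accession_regex = re.compile(r"^AC\s+(\w+)")
sequence_header_regex = re.compile(r"^SQ\s+(\w+)")


def count_gc(line):
    """Count G and C bases; the loop variable really is a base."""
    gc_count = 0
    for base in line:
        if base in "GC":
            gc_count += 1
    return gc_count


def positions_of(line, wanted_base):
    """When the index is needed, call it what it is and use enumerate."""
    return [position for position, base in enumerate(line) if base == wanted_base]


if __name__ == "__main__":
    print(count_gc("ATGCGC"))           # 4
    print(positions_of("ATGATG", "A"))  # [0, 3]
```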

12 Indexing

Pay close attention when indexing strings. Example:

```
sequence:  A  T  G  T  T  G  A  G  A  T  A  G
human:     1  2  3  4  5  6  7  8  9 10 11 12
machine:   0  1  2  3  4  5  6  7  8  9 10 11
exon:      CDS join(1..12)
```

Consider:
- Where is the first codon, position-wise?
- What will happen with this line: print(seq[0:2])?
- What is seq[1:12]?
- Which indices do you actually need to extract the full exon?
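A sketch that works through these questions for the 12-base sequence above. Python slices are 0-based and exclude the end index, so the 1-based, end-inclusive GenBank range `1..12` corresponds to `seq[0:12]`. The helper `genbank_slice` is hypothetical and handles a single range only, not a multi-part `join(...)`.

```python
seq = "ATGTTGAGATAG"

first_codon = seq[0:3]  # human positions 1..3
two_bases = seq[0:2]    # only 2 bases: the end index is excluded
off_by_one = seq[1:12]  # starts at human position 2, so the first A is lost
full_exon = seq[0:12]   # human 1..12 -> start 1 - 1 = 0, end stays 12


def genbank_slice(sequence, start, end):
    """Convert a 1-based, end-inclusive range such as 1..12 to a Python slice."""
    return sequence[start - 1:end]


print(first_codon)  # ATG
print(two_bases)    # AT
print(off_by_one)   # TGTTGAGATAG
print(full_exon)    # ATGTTGAGATAG
```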

13 Output verification

Unfortunately, we see programs that almost work far too often. The reason is that the output has not been checked thoroughly. When extracting information, check at least one example by hand, one-to-one against the source. "Looks OK" is not sufficient: the output still looks all right if you are missing the first or the last base of every sequence because your indexing is wrong!

What does your program do if you give it different legal input (e.g. a different GenBank file)? What does it do if you give it illegal input (e.g. your grade sheet from last semester)?

You should also get used to making sanity checks: is my result sensible, and is it what I expected? Integrate what you know or assume to be true, e.g. a coding sequence (CDS) starts with ATG and ends with a stop codon.
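A minimal sanity check along these lines, assuming the extracted sequence is a complete coding sequence on the forward strand read with the standard genetic code (stop codons TAA, TAG, TGA); the function name `looks_like_cds` is hypothetical. A failing check does not prove that the extraction is wrong, but it tells you which example to inspect by hand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def looks_like_cds(seq):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    seq = seq.upper()
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if seq[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    # An in-frame stop codon before the last codon usually means a wrong frame.
    for i in range(0, len(seq) - 3, 3):
        if seq[i:i + 3] in STOP_CODONS:
            problems.append(f"in-frame stop codon at position {i + 1}")
            break
    return problems


if __name__ == "__main__":
    print(looks_like_cds("ATGTTGAGATAG"))  # []
    print(looks_like_cds("ATGTAGAAATAA"))  # ['in-frame stop codon at position 4']
```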

14 Happy Programming 

