How to Python
Image source: Web_Dev fan on Pinterest
Intro

While evaluating the exercises, we noticed some examples of sub-optimal programming. Most of these constructs are not technically wrong and will not lead to errors, but they are not good programming! Therefore, today we'll show some examples of how to write better and easier-to-read code.
The use of flow control

break and continue are so-called flow control tools. They can be very useful, but because they give direct control over the flow of the program, they also make it easy to be a bit lazy. Excessive use of flow control can be a sign that your loops are poorly designed. When programming, you should strive for code that makes the most sense. How long does this loop have to run? What is the step size? What are the exit conditions?

Letting a loop run on even though there is no more need is not stronger logic than terminating! A break in a while loop, however, is almost always weak logic, because while loops are already condition-dependent! The for loop here stops as soon as the item is found:

    looking_for = input("What are you looking for?\n")
    found = False
    for item in myList:
        if item == looking_for:
            found = True
            break
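As a rough sketch of the point about while loops, the two functions here search a list with and without break. The names find_with_break and find_with_condition are made up for illustration. The second version states its exit conditions in the loop header instead of hiding them in the loop body.

```python
def find_with_break(items, target):
    """Weaker design: the real exit conditions are hidden inside the body."""
    index = 0
    found = False
    while True:
        if index >= len(items):
            break
        if items[index] == target:
            found = True
            break
        index += 1
    return found


def find_with_condition(items, target):
    """Clearer design: the loop header says exactly when the loop ends."""
    index = 0
    found = False
    while index < len(items) and not found:
        if items[index] == target:
            found = True
        else:
            index += 1
    return found
```

Both functions return the same answer; the difference is only in how easy it is to see when the loop stops.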
Flow of programming

You should separate actions that belong inside a loop (e.g. data extraction) from actions that belong outside the loop because they only need to run once (e.g. data processing, setting up constant variables).

    for line in infile:
        if not line.startswith('>'):
            seq += line.rstrip()
            # don't: these do not belong inside the loop
            seq_len += len(seq)
            transTable = str.maketrans('ATCG', 'TAGC')

    # instead: run them once, after the loop
    seq_len = len(seq)
    transTable = str.maketrans('ATCG', 'TAGC')
    complementdna = seq.translate(transTable)
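A minimal runnable sketch of the same idea, assuming a tiny made-up FASTA record held in memory (io.StringIO stands in for the opened file, and the header text is hypothetical). Only the extraction happens inside the loop; the length and the complement are computed once afterwards.

```python
import io

# Made-up FASTA content for illustration; a real program would open a file.
infile = io.StringIO(">seq1 hypothetical header\nATGC\nGGTA\n")

# Inside the loop: only data extraction.
seq = ""
for line in infile:
    if not line.startswith(">"):
        seq += line.rstrip()

# Outside the loop: processing that only needs to run once.
seq_len = len(seq)
complement_table = str.maketrans("ATCG", "TAGC")
complement_dna = seq.translate(complement_table)

print(seq, seq_len, complement_dna)
```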
List building

Task: I have some info in this line that I want to save in a list.
-> Many built-in functions and methods already return lists, e.g. split().

    # don't:
    data_list = list()
    for item in line.split():
        data_list.append(item)

    # instead:
    data_list = line.split()

Also, our favorite sys.argv is already a list. Take advantage of slicing.

    # don't:
    input_files = list()
    for index in range(1, len(sys.argv)):
        input_files.append(sys.argv[index])

    # instead:
    input_files = sys.argv[1:]
List building

But I want to append to an existing list.
-> OK, so use extend.

    data_list.extend(line.split())

What is the difference between extend and append? (Pro tip: I also call this 'How to mess up your data structure in one line'.)
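To see the difference between the two methods, here is a small sketch with a made-up input line. append adds its argument as one single element, so appending a list gives you a list nested inside a list; extend adds each element of its argument separately.

```python
line = "ID TEST_ENTRY Reviewed"  # made-up example line

# append: the whole list returned by split() becomes ONE nested element.
appended = ["header"]
appended.append(line.split())

# extend: each item returned by split() becomes its own element.
extended = ["header"]
extended.extend(line.split())

print(appended)
print(extended)
```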
Conditional statements

This one is a bit hard to exemplify, but basically think about whether the conditions you pose are sensible.

    while word != "STOP":
        if word != "STOP":    # redundant: the while condition already checked this

    if line == that:
        if line == this:      # only possible if that == this

Sometimes the same condition signifies several actions to be taken.

    for line in infile:
        if line.startswith("SQ"):
            # extract length
            # oops, I also need to extract the whole sequence
            seq_flag = True
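One possible way to let a single condition trigger several actions is sketched here. The record is made up and only loosely follows a SwissProt-style layout (an SQ line announcing the length, sequence lines, then //); the variable names are assumptions for illustration. The SQ test is written once and does both jobs: it extracts the length and switches on the flag for collecting the sequence.

```python
import io

# Made-up record for illustration; io.StringIO stands in for the opened file.
infile = io.StringIO(
    "ID   TEST_ENTRY\n"
    "SQ   SEQUENCE   12 AA;\n"
    "     MKTAYI AKQRQI\n"
    "//\n"
)

seq_length = None
sequence = ""
seq_flag = False
for line in infile:
    if line.startswith("SQ"):
        # One condition, two actions: extract the length AND start collecting.
        seq_length = int(line.split()[2])
        seq_flag = True
    elif line.startswith("//"):
        seq_flag = False
    elif seq_flag:
        # Sequence lines contain spaces between blocks; remove them.
        sequence += "".join(line.split())

print(seq_length, sequence)
```

Comparing len(sequence) with seq_length afterwards is a cheap check that the extraction worked.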
About regex

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski

While regular expressions are a very useful and powerful concept, for many problems there exist simpler solutions. Regex in Python are slow compared to 'simple' string functions because they carry considerable overhead. They are also often harder to follow and hard to get right without missing rare cases.
About regex 2

Examples of tasks for which functions from the base library are preferable:

- Splitting strings on elementary separators (space/tab/semicolon/comma)
- Extracting information from such a line

    split_line = line.split()
    split_line = line.split(';')
    ID = line.split()[1]

- Checking for identifiers

    if line.startswith('ID'):
    if line.endswith(')'):
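The string methods listed above can be tried out on a couple of made-up lines. The line contents and variable names here are illustrative assumptions, not taken from a real database entry.

```python
# Made-up lines for illustration.
line = "ID   TEST_ENTRY   Reviewed;   105 AA."
csv_line = "X00000;TEST_ENTRY;105"

# Checking for identifiers without a regex.
is_id_line = line.startswith("ID")
ends_with_period = line.endswith(".")

# split() without arguments splits on any run of whitespace.
entry_name = line.split()[1]

# split(';') splits on one specific separator.
accession, name, length_text = csv_line.split(";")
length = int(length_text)

print(is_id_line, ends_with_period, entry_name, accession, name, length)
```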
About regex 3

Because the results of regex commands are typically small, it is more sensible to store the result for later use than to re-execute the command:

    # don't:
    if re.search(r'^ID\s+(\S+)', line) is not None:
        sp_id = re.search(r'^ID\s+(\S+)', line).group(1)

    # instead:
    REresult = re.search(r'^ID\s+(\S+)', line)
    if REresult is not None:
        sp_id = REresult.group(1)
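A self-contained sketch of storing the match result, run over a few made-up lines. Compiling the pattern once with re.compile is an extra step that the slide does not mention; the name ID_PATTERN and the input lines are assumptions for illustration.

```python
import re

# Compile once, outside the loop, since the pattern never changes.
ID_PATTERN = re.compile(r"^ID\s+(\S+)")

# Made-up input lines for illustration.
lines = [
    "ID   TEST_ENTRY   Reviewed;",
    "AC   X00000;",
    "ID   OTHER_ENTRY  Unreviewed;",
]

found_ids = []
for line in lines:
    match = ID_PATTERN.search(line)  # search once, keep the result
    if match is not None:
        found_ids.append(match.group(1))

print(found_ids)
```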
Variable Names

This should be obvious. You (and we) will have a much easier time if you name your variables in a way that the name

- is descriptive of what the variable does or stores
- is distinct from other variable names
- is not a reserved keyword

Please don't:

    def_regex = re.compile(r"^ID\s+(\w+)")
    dee_regex = re.compile(r"^AC\s+(\w+)")
    des_regex = re.compile(r"^SQ\s+(\w+)")

Or:

    for char in range(len(line)):
        base = line[char]    # char is in no way a character!!!
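One possible renaming of the examples above is sketched here. The new names are only suggestions; the point is that each name says what the variable holds. The input line is made up for illustration.

```python
import re

# Each name states which kind of line the pattern matches.
id_regex = re.compile(r"^ID\s+(\w+)")
accession_regex = re.compile(r"^AC\s+(\w+)")
sequence_header_regex = re.compile(r"^SQ\s+(\w+)")

line = "ATGC"  # made-up input

# The loop variable is a position, so call it that.
bases_by_index = []
for position in range(len(line)):
    base = line[position]
    bases_by_index.append((position, base))

# enumerate gives the position and the character together.
bases_by_enumerate = list(enumerate(line))

print(bases_by_index)
```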
Indexing

Pay close attention when indexing strings! Example:

    sequence:  ATG TTG AGA T  A  G
    human:     123 456 789 10 11 12
    machine:   012 345 678 9  10 11
    exon:      CDS join(1..12)

Consider:

- Where is the first codon position-wise?
- What will happen with this line: print(seq[0:2])?
- What is seq[1:12]?
- Which indices do you actually need to extract the full exon?
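The questions above can be checked directly in Python. The helper extract_region is a hypothetical function written for this sketch; it converts 1-based, inclusive 'human' coordinates into a Python slice, whose start is 0-based and whose end is exclusive.

```python
seq = "ATGTTGAGATAG"  # the 12-base example sequence


def extract_region(sequence, start, end):
    """Return bases start..end, both 1-based and inclusive (hypothetical helper)."""
    return sequence[start - 1:end]


first_two = seq[0:2]        # the end index is exclusive: only 2 bases
missing_first = seq[1:12]   # starts at the SECOND base: 11 bases
first_codon = extract_region(seq, 1, 3)
full_exon = extract_region(seq, 1, 12)

print(first_two, missing_first, first_codon, full_exon)
```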
Output verification

Something we unfortunately also see too often is programs that almost work, because the output has not been checked thoroughly. When extracting information, do a one-to-one check for at least one example. 'Looks OK' is not good enough if you are not printing the last or the first base of every sequence because your indexing is wrong!

What does your program do if you give it different legal input (e.g. a different GenBank file)? What does it do if you give it illegal input (e.g. your grade sheet from last semester)?

You should also get used to making 'sanity checks', i.e. is my result sensible, is it what I expected? Integrate what you know or assume to be true, e.g. exons start with ATG and end with a stop codon.
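A sanity check like the one described could look roughly like this. The function name sanity_check_cds is hypothetical, the stop codons are those of the standard genetic code, and the check that the length is a multiple of 3 is an extra assumption not stated on the slide. As the slide says, these are things we assume to be true for the extracted sequence, so a failed check means 'look closer', not necessarily 'the program is wrong'.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def sanity_check_cds(seq):
    """Return a list of problems found in an extracted coding sequence."""
    problems = []
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    if seq[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if len(seq) % 3 != 0:  # assumption added for this sketch
        problems.append("length is not a multiple of 3")
    return problems


print(sanity_check_cds("ATGTTGAGATAG"))  # correctly extracted
print(sanity_check_cds("TGTTGAGATAG"))   # first base lost to an indexing error
```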
Happy Programming!
Image source: snakeypython.com