Python Syntax Tips
Henrike Zschach
DTU Systems Biology, Technical University of Denmark

Why are we talking about syntax?

'Good' coding

Good syntax should be concise while remaining as straightforward and readable as possible. While evaluating the exercises I have found that many students use syntax that is unnecessarily complicated (in fact so often that I learned how to spell 'unnecessarily' without looking it up), so I'm going to show some examples of how to write simpler, easier-to-read code.
Initialization

Don't initialize a variable to an empty instance and then immediately initialize it again to the desired value. Note: I'm not talking about initializing to empty and then filling it with a loop or function.

    # don't
    my_dict = dict()
    my_dict = {'ATT':'I', 'ATC':'I',..., 'TGA':'STOP'}

    # instead
    my_dict = {'ATT':'I', 'ATC':'I',..., 'TGA':'STOP'}

So what if I want an empty dict?

    empty_dict = {}

The same applies to lists, strings, etc.
List building

Task: I have some info in this line that I want to save in a list.
-> Many built-in functions and methods already return lists, e.g. split().

    # don't
    data_list = list()
    for item in line.split():
        data_list.append(item)

    # instead
    data_list = line.split()

Also, our favorite sys.argv is already a list.

    # don't
    input_files = list()
    for index in range(1, len(sys.argv)):
        input_files.append(sys.argv[index])

    # instead
    input_files = sys.argv[1:]

Take advantage of slicing.
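As a small illustration of slicing, here is a sketch that uses a made-up list in place of sys.argv (the file names are hypothetical) so that it runs without command-line arguments:

```python
# Hypothetical stand-in for sys.argv so the example runs anywhere.
argv = ['script.py', 'a.fasta', 'b.fasta', 'c.fasta']

input_files = argv[1:]        # everything after the script name
first_two = argv[1:3]         # items at index 1 and 2 (the end index is excluded)
last_file = argv[-1]          # negative indices count from the end
no_files = ['script.py'][1:]  # slicing past the end gives an empty list, not an error

print(input_files)
print(first_two, last_file, no_files)
```

Note that a slice always returns a new list, so input_files can be changed without touching the original.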
List building

But I want to append to an existing list.
-> OK, so use extend.

    data_list.extend(line.split())

What is the difference between extend and append? (Pro tip: I also call this 'How to mess up your data structure in one line'.)
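To make the difference between extend and append concrete, here is a minimal sketch with made-up data: append adds its argument as a single item, while extend adds each item of the argument separately.

```python
line = "1.5 2.5 3.0"   # hypothetical input line

appended = ['id']
appended.append(line.split())   # adds the whole list as ONE nested item

extended = ['id']
extended.extend(line.split())   # adds each item of the list separately

print(appended)  # ['id', ['1.5', '2.5', '3.0']]
print(extended)  # ['id', '1.5', '2.5', '3.0']
```

Using append where extend was meant gives a list inside a list, which is the one-line way to mess up the data structure.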
Abuse of iterators

There is no need to create an index if you only use it to look up the item.

    total = 0.0  # not 'sum', which would shadow the built-in sum()

    # don't
    split_line = line.split()
    for i in range(0, len(split_line)):
        total += float(split_line[i])

    # instead
    for number in line.split():
        total += float(number)

Also, you should name your loop variable something sensible. This will enhance the readability of your code.
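A runnable version of the pattern, with a made-up input line. The second loop is an addition, not something from the slide: if the position is genuinely needed, the built-in enumerate is one way to get it without indexing by hand.

```python
line = "1.5 2.5 3.0"   # hypothetical input line

# Direct iteration: no index is needed to add up the numbers.
total = 0.0
for number in line.split():
    total += float(number)

# When the position matters too, enumerate yields (index, item) pairs.
columns = []
for column_index, number in enumerate(line.split()):
    columns.append((column_index, float(number)))

print(total)
print(columns)
```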
'Dumping' complex data structures into print

I've seen this many times:

    print(my_list)
    print(my_dict)

Please don't. It's not just a question of aesthetics (though it is hard to read): the brackets, quotes and commas in the output make it awkward to use in any follow-up analysis or task. There are built-in solutions for printing lists (note that join requires the items to be strings):

    print('\n'.join(my_list))
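The same idea carries over to lists of numbers and to dicts. The following is a sketch with made-up data; the tab-separated layout is just one reasonable choice for output that other tools can read.

```python
codon_table = {'ATT': 'I', 'ATC': 'I', 'TGA': 'STOP'}   # hypothetical example data
counts = [3, 14, 15]                                     # hypothetical example data

# join only accepts strings, so convert the numbers first.
count_line = '\t'.join(str(count) for count in counts)
print(count_line)

# One key/value pair per line, separated by a tab.
dict_lines = [f'{codon}\t{amino_acid}' for codon, amino_acid in codon_table.items()]
print('\n'.join(dict_lines))
```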
Overuse of regex

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski

Regular expressions are a very useful and powerful concept, but for many problems there are simpler solutions. They are usually slower than simple string methods for tasks like these, because compiling and matching a pattern adds overhead. They are also often harder to follow and hard to get right without missing rare cases.

Task: I want to split this string on space/tab/semicolon.
-> Regex is overkill for splitting on elementary separators.

    # don't
    pattern = re.compile(r"\s")
    split_line = pattern.split(line)

    # instead
    split_line = line.split()      # any run of whitespace (spaces, tabs, newlines)
    split_line = line.split(';')   # a specific separator such as a semicolon
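The two approaches are not only different in length, they can also give different results. A small sketch with a made-up line shows that splitting on the pattern \s treats every whitespace character as its own separator, while split() without arguments treats a run of whitespace as one separator and drops leading and trailing whitespace:

```python
import re

line = "gene1\t\t42  forward\n"   # hypothetical line with doubled separators

regex_fields = re.split(r"\s", line)   # one split per whitespace character
plain_fields = line.split()            # runs of whitespace count as one separator

semicolon_fields = "a;b;c".split(';')  # an explicit separator for non-whitespace cases

print(regex_fields)   # ['gene1', '', '42', '', 'forward', '']
print(plain_fields)   # ['gene1', '42', 'forward']
```

The empty strings in the regex result are the kind of rare case that is easy to miss.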
Overuse of regex

Task: I want to extract information from a line with clear separators (e.g. tabs, semicolons, etc.). Such files are, for example, output files from other programs (BLAST) or comma-separated files.
-> use str.split()

    Hit_ID = line.split()[1]

Task: I want to check if the line I'm reading contains the information I'm looking for (and possibly where it is).
-> use 'in', str.startswith(), str.endswith() or str.find()

    if ' CDS ' in line:
        position = line.find("join(")
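Here is a sketch of these string methods on a made-up, GenBank-style feature line (the line and its coordinates are invented for illustration). One detail worth knowing is that find returns -1 when the substring is absent, so the result should be checked before it is used as an index.

```python
line = "     CDS             join(12..45,90..130)"   # hypothetical feature line

has_cds = ' CDS ' in line              # simple containment check
is_comment = line.startswith('#')      # check the beginning of the line
position = line.find("join(")          # index of the first match, or -1 if absent
missing = line.find("complement(")     # not present in this line

segments = []
if has_cds and position != -1:
    # Take the text after "join(", drop the closing bracket, split on commas.
    coordinates = line[position + len("join("):].rstrip(')')
    segments = coordinates.split(',')

print(has_cds, is_comment, missing)
print(segments)
```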
The True-and-false-ness of variables

Task: I want to check if my flag has been set. Which comparison should I use?

    if flag:            # preferred syntax
    if flag == True:    # discouraged (redundant)
    if flag is True:    # in special cases

In Python, every non-zero, non-empty value is treated as true when a condition is checked.

"In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: False, None, numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets). All other values are interpreted as true." (Python documentation, Boolean operations)

The 'is' operator is useful when you need to distinguish the actual value True from every other value that would be treated as true. The same applies to 'if cond is False': this expression is true only if cond is the actual value False, not an empty list, empty tuple, empty set, zero, etc.
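The three comparisons can be tried out directly. This is a minimal sketch with made-up values; the variable names are only for illustration.

```python
falsy_values = [False, None, 0, 0.0, '', [], (), {}, set(), frozenset()]
truthy_values = [True, 1, -1, 'False', [0], (None,), {'a': 1}]

all_falsy = not any(bool(value) for value in falsy_values)
all_truthy = all(bool(value) for value in truthy_values)

flag = [1, 2, 3]                        # a non-empty list: truthy, but not True itself
passes_plain_check = bool(flag)         # what 'if flag:' tests
passes_equality_check = flag == True    # a list is never equal to True
passes_identity_check = flag is True    # only the object True passes this

one = 1
one_equals_true = one == True           # True, because bool is a subclass of int
one_is_true = one is True               # False, 1 is not the object True

print(all_falsy, all_truthy)
print(passes_plain_check, passes_equality_check, passes_identity_check)
print(one_equals_true, one_is_true)
```

Note that the string 'False' is truthy, because it is a non-empty string.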