More for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.


1 More for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble

2 For loop review

    for <element> in <object>:
        <statement>
        . . .
        <last statement>

Don't forget the colon! The indented statements form the block of code.
<element> is a newly created variable name. You can access the variable after the loop completes.
<object> is a container of one or more <element>s. It must already exist.
range() will produce a sequence of integers "on the fly":

    for index in range(0,100):
        <statement>
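A minimal sketch of the points above. The sequence string and variable names are made up for illustration and are not from the slides.

```python
# Loop over the characters of a string (a container of one-character elements).
sequence = "GATC"
for base in sequence:
    print(base)

# The loop variable still exists after the loop and holds the last element.
print("last base:", base)

# range(0, 5) produces the integers 0, 1, 2, 3, 4 (the end value is excluded).
indices = []
for index in range(0, 5):
    indices.append(index)
print(indices)
```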

3 Choosing good variable names
Pick names that are descriptive.
Change a name if you decide there's a better choice. (Use search and replace to be sure you don't miss any.)
Very locally used names can be short and arbitrary.
The Python style guide (PEP 8) recommends using internal underscores rather than camelCase.

    list_of_lines = my_file.readlines()
    sequence = "GATCTCTATCT"
    my_DP_matrix = [[0,0,0],[0,0,0],[0,0,0]]
    sum = 0
    for i in range(len(list_of_ints)):
        sum = sum + list_of_ints[i]
    (more code)

4 Comment liberally
Any place a # sign appears, the rest of the line is a comment (ignored by program).
Blank lines are also ignored; use them to visually group code.

    import sys

    query = sys.argv[1]
    my_file = open(sys.argv[2], "r")
    lines = my_file.readlines()  # put all the lines from a file into a list

    # now process each file line to remove the \n character, then
    # search the line for query and record each result in a list of ints
    int_list = []
    for line in lines:
        position = line.find(query)
        int_list.append(position)
    etc.
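A small commented sketch of the search step described above. The list of lines is a hypothetical stand-in for what readlines() would return, so no input file is needed.

```python
# Hypothetical stand-in for lines read from a file with readlines().
lines = ["GATTACA\n", "CCCGGG\n", "TTGATT\n"]
query = "GAT"

# Record where the query first occurs in each line (-1 means "not found").
positions = []
for line in lines:
    line = line.rstrip("\n")            # drop the trailing newline character
    positions.append(line.find(query))  # find() returns an index, or -1

print(positions)
```

Note that find() reports a missing query as -1 rather than raising an error, so later code has to check for that value.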

5 Examples of for loops

    for base in <sequence>:
        <do something with each base>

    for line in <filehandle>:
        <do something with each sequence>

    for index in range(5,200):
        <do something with each index>
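As one possible way to fill in the first template, here is a sketch that counts G and C bases. The sequence is the one used on the variable-names slide; the counting task itself is an assumption chosen for illustration.

```python
sequence = "GATCTCTATCT"

# "Do something with each base": count how many bases are G or C.
gc_count = 0
for base in sequence:
    if base == "G" or base == "C":
        gc_count += 1

print(gc_count, "of", len(sequence), "bases are G or C")
```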

6 Looping on command line arguments

    import sys

    for argument in sys.argv[1:]:
        print(argument)
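The slice [1:] skips sys.argv[0], which holds the script name, and every argument arrives as a string. The sketch below illustrates both points with a simulated argument list; the function name sum_arguments and the script name add.py are hypothetical.

```python
def sum_arguments(argv):
    """Add up the arguments that follow the script name, treating each as an integer."""
    total = 0
    for argument in argv[1:]:      # argv[0] is the script name, so skip it
        total += int(argument)     # arguments arrive as strings
    return total

# Simulated command line, equivalent to: python3 add.py 3 4 5
example_argv = ["add.py", "3", "4", "5"]
print(sum_arguments(example_argv))
```

In a real script you would pass sys.argv instead of example_argv.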

7 Looping on lines in a file

    my_file = open("foo.txt", "r")
    for line in my_file:
        print(len(line))
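Each line read from a file keeps its trailing newline, so len(line) counts it. The sketch below shows the difference, using io.StringIO as a stand-in for an open file so that it runs without foo.txt; the file contents are made up.

```python
import io

# io.StringIO behaves like an open text file for the purposes of a for loop.
my_file = io.StringIO("GATTACA\nCCGG\n")

raw_lengths = []
stripped_lengths = []
for line in my_file:
    raw_lengths.append(len(line))               # includes the "\n"
    stripped_lengths.append(len(line.strip()))  # newline removed

print(raw_lengths)
print(stripped_lengths)
```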

8 Sample problem #1
Write a program sum-series.py that takes two non-negative integers A and B as input and returns the sum of all values from A to B inclusive.

    > python3 sum-series.py 9 10
    19
    > python3 sum-series.py 1 100
    5050
    > python3 sum-series.py -1 5
    Error: Both arguments must be non-negative.
    > python3 sum-series.py 5 4
    Error: Second value must be greater than or equal to first value.

9 Solution #1

    import sys

    if (len(sys.argv) != 3):
        print("Expected 2 arguments.")
        sys.exit()

    small_value = int(sys.argv[1])
    large_value = int(sys.argv[2])

    # Check for valid command line options.
    if (small_value > large_value):
        print("Error: Second value must be greater than or equal to first value.")
    elif ( (small_value < 0) or (large_value < 0) ):
        print("Error: Both arguments must be non-negative.")
    else:
        # Compute the requested sum.
        sum = 0
        for i in range(small_value, large_value + 1):
            sum += i
        print(sum)
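The same loop can be wrapped in a function so it can be checked without the command line. This is a sketch, not the course's solution: the function name sum_series and the use of ValueError are assumptions, and the closed-form expression is included only as a cross-check.

```python
def sum_series(small_value, large_value):
    """Return the sum of the integers from small_value to large_value inclusive."""
    if small_value < 0 or large_value < 0:
        raise ValueError("Both arguments must be non-negative.")
    if small_value > large_value:
        raise ValueError("Second value must be greater than or equal to first value.")
    total = 0
    for i in range(small_value, large_value + 1):  # + 1 makes the range inclusive
        total += i
    return total

# Cross-check against the arithmetic-series formula (first + last) * count / 2.
a, b = 1, 100
closed_form = (a + b) * (b - a + 1) // 2
print(sum_series(a, b), closed_form)
```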

10 Sample problem #2
Write a program average.py that reads a text file containing numeric values and computes the average score on each line. Download the example "matrix.txt" from the course web page.

    > python average.py matrix.txt
    2.17
    4.05
    5.25
    5.59
    etc.

My solution has 10 lines.

11 This is the BLOSUM62 matrix, with amino acid names removed for simplicity. Each line has 20 text fields separated by 19 tabs.

12 Solution #2

    import sys

    my_file = open(sys.argv[1], "r")
    for line in my_file:
        sum = 0.0
        num_values = 0
        for value in line.split():
            sum += float(value)
            num_values += 1
        print(sum / num_values)
    my_file.close()
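The inner loop can be sketched as a function that works on a single line of text. The name line_average and the choice to return None for a blank line (which would otherwise divide by zero) are assumptions, and the sample line is made up rather than taken from matrix.txt.

```python
def line_average(line):
    """Return the mean of the whitespace-separated numbers on one line, or None if empty."""
    total = 0.0
    num_values = 0
    for value in line.split():   # split() with no argument splits on tabs and spaces
        total += float(value)
        num_values += 1
    if num_values == 0:          # blank line: avoid dividing by zero
        return None
    return total / num_values

print(line_average("4\t-1\t-2\t-2\n"))
```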

13 Sample problem #3
Write a program compute-variance.py that reads a text file containing numeric values and computes the variance of the scores on each line.

    > python compute-variance.py matrix.txt
    etc.

$$\text{variance} = \frac{\sum (x - m)^2}{N - 1}$$

where x is each value, m is the mean of values, and N is the number of values.

Previous solution:

    import sys

    my_file = open(sys.argv[1], "r")
    for line in my_file:
        sum = 0.0
        num_values = 0
        for value in line.split():
            sum += float(value)
            num_values += 1
        print(sum / num_values)
    my_file.close()

14 Solution #3

    import sys

    my_file = open(sys.argv[1], "r")
    for line in my_file:
        fields = line.strip().split()   # strip removes new line etc.
        scoreList = []                  # list of scores for this line
        scoreSum = 0
        for field in fields:
            value = float(field)        # convert to floating point
            scoreList.append(value)
            scoreSum += value           # keep track of the sum
        mean = float(scoreSum) / len(scoreList)   # compute mean using float math
        squareSum = 0
        for score in scoreList:         # compute the numerator of variance
            squareSum += (score - mean) * (score - mean)
        variance = float(squareSum) / (len(scoreList) - 1)   # compute variance
        print("{:.2f}".format(variance))
    my_file.close()
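Because the solution divides by N - 1, it computes the sample variance, which is also what statistics.variance in the Python standard library returns. The sketch below cross-checks the two; the function name sample_variance and the example scores are made up for illustration.

```python
import math
import statistics

def sample_variance(scores):
    """Two-pass sample variance: sum of squared deviations divided by N - 1."""
    mean = sum(scores) / len(scores)
    square_sum = 0.0
    for score in scores:
        square_sum += (score - mean) * (score - mean)
    return square_sum / (len(scores) - 1)

scores = [1.0, 2.0, 3.0, 4.0]
print("{:.2f}".format(sample_variance(scores)))
print("{:.2f}".format(statistics.variance(scores)))
```

Both approaches need at least two values per line; with a single value the denominator N - 1 is zero.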

15 Sample problem #4
Write a program get-seq-len.py that reads a file of fasta format sequences and prints the name and length of each sequence and their total length.

    > python3 get-seq-len.py sample.fa
    sp|P38787|PANE_YEAST 379
    sp|P29468|PAP_YEAST 568
    sp|P0CE91|PAU18_YEAST 120
    Total length 1067

Here's what fasta sequences look like:

    >foo
    gatactgactacagttt
    ggatatcg
    >bar
    agctcacggtatcttag
    agctcacaataccatcc
    ggatac
    >etc…

('>' followed by name, newline, sequence on any number of lines until next '>')

16 Solution #4

    import sys

    filename = sys.argv[1]
    my_file = open(filename, "r")
    cur_name = None    # initialize required variables
    cur_len = 0
    total_len = 0
    for line in my_file:
        if (line.startswith(">")):          # we reached a new fasta sequence
            if (cur_name is not None):
                print(cur_name, cur_len)    # write values for previous sequence
                total_len += cur_len        # increment total_len
            cur_name = line.strip()[1:]     # record the name of the new sequence
            cur_len = 0                     # reset current length
        else:   # still in the current sequence, increment length
            cur_len = cur_len + len(line.strip())
    my_file.close()
    print(cur_name, cur_len)    # print the values for the last sequence
    total_len += cur_len        # increment total length
    print("Total length", total_len)
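The same bookkeeping can be sketched as a function that accepts any iterable of lines, which makes it easy to try on the small fasta example from the problem statement. The function name sequence_lengths and the list-of-tuples return value are assumptions, not part of the course solution.

```python
def sequence_lengths(lines):
    """Return a list of (name, length) pairs for fasta-formatted lines."""
    results = []
    cur_name = None
    cur_len = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):            # start of a new sequence
            if cur_name is not None:        # save the previous one, if any
                results.append((cur_name, cur_len))
            cur_name = line[1:]
            cur_len = 0
        else:                               # still inside the current sequence
            cur_len += len(line)
    if cur_name is not None:                # don't forget the last sequence
        results.append((cur_name, cur_len))
    return results

fasta_lines = [
    ">foo\n", "gatactgactacagttt\n", "ggatatcg\n",
    ">bar\n", "agctcacggtatcttag\n", "agctcacaataccatcc\n", "ggatac\n",
]
total_len = 0
for name, length in sequence_lengths(fasta_lines):
    print(name, length)
    total_len += length
print("Total length", total_len)
```

An open file can be passed in place of fasta_lines, since a file handle is also an iterable of lines.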

17 Reading Chapter 7 of Think Python (1st edition) by Allen B. Downey.
More sample problems at on-programming/for-loop

