While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Slides:



Advertisements
Similar presentations
Chapter 4 - Control Statements
Advertisements

Dictionaries (aka hash tables or hash maps) Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
ThinkPython Ch. 10 CS104 Students o CS104 n Prof. Norman.
CATHERINE AND ANNIE Python: Part 3. Intro to Loops Do you remember in Alice when you could use a loop to make a character perform an action multiple times?
Chapter 7 - Iteration. Chapter Goals Program repitiation statements – or loops – with the for, while, and do-while statements Program repitiation statements.
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
While loops.
IT 325 OPERATING SYSTEM C programming language. Why use C instead of Java Intermediate-level language:  Low-level features like bit operations  High-level.
Loops continued and coding efficiently Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Functions Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
08/2012Tanya Mishra1 EASYC for VEX Cortex Llano Estacado RoboRaiders FRC Team 1817.
CS 100: Roadmap to Computing Fall 2014 Lecture 0.
Introduction to Computing Science and Programming I
Loops (Part 1) Computer Science Erwin High School Fall 2014.
Loops For While Do While. Loops Used to repeat something Loop statement creates and controls the loop.
File input and output if-then-else Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Program Design and Development
1 Lecture 7:Control Structures I (Selection) Introduction to Computer Science Spring 2006.
Group practice in problem design and problem solving
The if statement and files. The if statement Do a code block only when something is True if test: print "The expression is true"
General Programming Introduction to Computing Science and Programming I.
MCB 5472 Assignment #6: HMMER and using perl to perform repetitive tasks February 26, 2014.
Copyright © 2009 Pearson Education, Inc. Publishing as Pearson Addison-Wesley STARTING OUT WITH Python Python First Edition by Tony Gaddis Chapter 4 Decision.
What does C store? >>A = [1 2 3] >>B = [1 1] >>[C,D]=meshgrid(A,B) c) a) d) b)
Functions. Built-in functions You’ve used several functions already >>> len("ATGGTCA")‏ 7 >>> abs(-6)‏ 6 >>> float("3.1415")‏ >>>
Python uses boolean variables to evaluate conditions. The boolean values True and False are returned when an expression is compared or evaluated.
Lecture 26: Reusable Methods: Enviable Sloth. Creating Function M-files User defined functions are stored as M- files To use them, they must be in the.
Conditional Loops CSIS 1595: Fundamentals of Programming and Problem Solving 1.
Susie’s lecture notes are in the presenter’s notes, below the slides Disclaimer: Susie may have made errors in transcription or understanding. If there.
Counting Loops.
Python Mini-Course University of Oklahoma Department of Psychology Day 3 – Lesson 11 Using strings and sequences 5/02/09 Python Mini-Course: Day 3 – Lesson.
More Sequences. Review: String Sequences  Strings are sequences of characters so we can: Use an index to refer to an individual character: Use slices.
File input and output and conditionals Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Flow Control in Imperative Languages. Activity 1 What does the word: ‘Imperative’ mean? 5mins …having CONTROL and ORDER!
1 Agenda  Unit 7: Introduction to Programming Using JavaScript T. Jumana Abu Shmais – AOU - Riyadh.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Chapter 2 Variables and Constants. Objectives Explain the different integer variable types used in C++. Declare, name, and initialize variables. Use character.
String and Lists Dr. José M. Reyes Álamo.
CS170 – Week 1 Lecture 3: Foundation Ismail abumuhfouz.
Algorithmic complexity: Speed of algorithms
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
(optional - but then again, all of these are optional)
CompSci 101 Introduction to Computer Science
Topics The if Statement The if-else Statement Comparing Strings
Chapter 5 Structures.
Conditionals (if-then-else)
While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Topics The if Statement The if-else Statement Comparing Strings
Topics Introduction to File Input and Output
T. Jumana Abu Shmais – AOU - Riyadh
Coding Concepts (Basics)
Arrays .
String and Lists Dr. José M. Reyes Álamo.
LabVIEW.
Algorithmic complexity: Speed of algorithms
Fundamentals of visual basic
More for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Chapter 2: Introduction to C++.
CS1110 Today: collections.
Functions continued.
While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Algorithmic complexity: Speed of algorithms
Introduction to Computer Science
Comparing Python and Java
Data Types Every variable has a given data type. The most common data types are: String - Text made up of numbers, letters and characters. Integer - Whole.
Topics Introduction to File Input and Output
Variables and Constants
How to allow the program to know when to stop a loop.
Presentation transcript:

while loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Hints on variable names Pick names that are descriptive Change a name if you decide there’s a better choice Give names to intermediate values for clarity Use the name to describe the type of object Very locally used names can be short and arbitrary listOfLines = myFile.readlines() seqString = "GATCTCTATCT" myDPMatrix = [[0,0,0],[0,0,0],[0,0,0]] intSum = 0 for i in range(5000): intSum = intSum + listOfInts[i] (more code)

Comment your code! Any place a # sign appears, the rest of the line is a comment (ignored by program). Blank lines are also ignored – use them to visually group code. import sys query = sys.argv[1] myFile = open(sys.argv[2], "r") lineList = myFile.readlines() # put all the lines from a file into a list # now I want to process each line to remove the \n character, # then search the line for query and record all the results # in a list of ints intList = [] for line in lineList: position = line.find(query) intList.append(position) etc.

for loop review block of code for in :... can be a newly created variable. You can access the variable only INSIDE the loop. is a container of 1 or more s and it must already exist. range() will make a list of ints “on the fly” for index in range(0,100):

while loop while (conditional test):... While something is True keep running the loop, exit as soon as the test is False. The conditional test syntax is the same as for if and elif statements. Similar to a for loop

What does this program do? sum = 0 count = 1 while (count < 10): sum = sum + count count = count + 1 print count # should be 10 print sum # should be 45

for vs. while you will probably use for loops more for is natural to loop through a list, characters in a string, etc. (anything of determinate size). while is natural to loop an indeterminate number of times until some condition is met.

for base in sequence: for sequence in database: for index in range(5,200): Examples of for loops

Examples of while loops while (error > 0.05): while (score > 0):

Reminder - comparison operators Boolean: and, or, not Numeric:, ==, !=, >=, <= String: in, not in <is less than > is greater than ==is equal to !=is NOT equal to =is greater than or equal to Comparisons evaluate to True or False

Terminating a loop continue : jumps to the top of the enclosing loop break : breaks completely out of the enclosing loop while loops use continue and break in the same way as for loops:

x += 1 is the same as x = x + 1 A common idiom in Python (and other languages). It's never necessary, but people use it frequently. Also works with other math operators: x += y # adds y to the value of x x *= y # multiplies x by the value y x -= y # subtracts y from x x /= y # divides x by y the increment operator shorthand

program exit In addition to accessing command-line arguments, the sys module has many other useful functions (look them up in the Python docs). import sys # Make sure we got one argument on the command line. if len(sys.argv) != 2: print "USAGE: argument expected" sys.exit() sys.exit() # exit program immediately In use:

Sample problem #1 Write a program add-arguments.py that reads any number of integers from the command line and prints the cumulative total for each successive argument using a while loop. > python add-arguments.py > python add-arguments.py

Solution #1 import sys total = 0 i = 1 while i < len(sys.argv): total += int(sys.argv[i]) print total i += 1

Sample problem #2 Write a program count-fasta.py that counts the number of fasta sequences in a file specified on the command line. Use either a while loop or a for loop. >identifier1 [optional comments] AAOSIUBOASIUETOAISOBUAOSIDUGOAIBUOABOIUAS AOSIUDTOAISUETOIGLKBJLZXCOITLJLBIULEIJLIJ >identifier2 [optional comments] TXDIGSIDJOIJEOITJOSIJOIGJSOIEJTSOE >identifier3 Etc. Fasta format: sequence on any number of lines until next line that starts with “>” Two files are linked in News on the course web page – run your program on both: small.fasta and large.fasta

Solution #2 import sys # Make sure we got an argument on the command line. if (len(sys.argv) != 2): print "USAGE: count-fasta.py one file argument required" sys.exit() # Open the file for reading. fasta_file = open(sys.argv[1], "r") lineList = fastaFile.readlines() num_seqs = 0 for line in lineList: # Increment if this is the start of a sequence. if (line[0] == ">"): num_seqs += 1 print num_seqs fasta_file.close() Not required, but a good habit to get into

Challenge problem Write a program seq-len.py that reads a file of fasta sequences and prints the name and length of each sequence and their total length. >seq-len.py seqs.fasta seq1 432 seq2 237 seq3 231 Total length 900

import sys filename = sys.argv[1] myFile = open(filename, "r") myLines = myFile.readlines() myFile.close() # we read the file, now close it cur_name = "" # initialize required variables cur_len = 0 total_len = 0 first_seq = True # special variable to handle the first sequence for line in myLines: if (line.startswith(">")): # we reached a new fasta sequence if (first_seq): # if first sequence, record name and continue cur_name = line.strip() first_seq = False # mark that we are done with the first sequence continue else: # we are past the first sequence print cur_name, cur_len # write values for previous sequence total_len += cur_len # increment total_len cur_name = line.strip() # record the name of the new sequence cur_len = 0 # reset cur_len else: # still in the current sequence, increment length cur_len += len(line.strip()) print cur_name, cur_len # we need to write the last values print “Total length", total_len Challenge problem solution 1

import sys filename = sys.argv[1] myFile = open(filename, "r") myLines = myFile.readlines() myFile.close() # we read the file, now close it cur_name = myLines[0] # initialize required variables cur_len = 0 total_len = 0 for index in range(1, len(myLines)): if (myLines[index].startswith(">")): # we reached a new fasta sequence print cur_name, cur_len # write values for previous sequence total_len += cur_len # increment total_len cur_name = line.strip() # record the name of the new sequence cur_len = 0 # reset cur_len else: # still in the current sequence, increment length cur_len += len(myLines[index].strip()) print cur_name, cur_len # we need to write the last values print "Total length", total_len Another solution (more compact but has the disadvantage that it assumes the first line has a fasta name)

A student last year (Lea Starich) came up with a simpler solution, though it won't work if there are internal '>' characters. Here is my version using Lea’s method: import sys filename = sys.argv[1] myFile = open(filename, "r") whole_string = myFile.read() myFile.close() seqList = whole_string.split(">") total_len = 0 for seq in seqList: lineList = seq.split("\n") length = len("".join(lineList[1:])) total_len += length print lineList[0], length print "Total length", total_len What this does is split the text of the entire file on “>”, which gives a list of strings (each containing the sequence with its name). Each of these strings is split at “\n” characters, which gives a list of lines. The 0 th line in this list is the name, and the rest of the lines are sequence. The funky looking join statement just merges all the sequence lines into one long string and gets its length.

One of the arts of programming is seeing how to write elegant loops that do complex things. It takes time and practice.

import sys from Bio import Seq from Bio import SeqIO filename = sys.argv[1] myFile = open(filename, "r") seqRecords = SeqIO.parse(myFile, "fasta") total_len = 0 for record in seqRecords: print record.name, len(record.seq) total_len += len(record.seq) print "Total length", total_len myFile.close() By the way, here is the challenge problem solution done using BioPython (which you will learn about later) shorter and much easier to write and understand