File input and output and conditionals Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Slides:



Advertisements
Similar presentations
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Introduction to Python
Python Programming Language
CS0007: Introduction to Computer Programming
Introduction to C Programming
Μαθαίνοντας Python [Κ4] ‘Guess the Number’
Computer Science 111 Fundamentals of Programming I Files.
File input and output if-then-else Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
1 Loops. 2 Often we want to execute a block of code multiple times. Something is always different each time through the block. Typically a variable is.
 2007 Pearson Education, Inc. All rights reserved Introduction to C Programming.
Introduction to C Programming
The University of Texas – Pan American
Introduction to Python Lecture 1. CS 484 – Artificial Intelligence2 Big Picture Language Features Python is interpreted Not compiled Object-oriented language.
“Everything Else”. Find all substrings We’ve learned how to find the first location of a string in another string with find. What about finding all matches?
Line Continuation, Output Formatting, and Decision Structures CS303E: Elements of Computers and Programming.
The if statement and files. The if statement Do a code block only when something is True if test: print "The expression is true"
EGR 2261 Unit 4 Control Structures I: Selection  Read Malik, Chapter 4.  Homework #4 and Lab #4 due next week.  Quiz next week.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 4 Decision Structures and Boolean Logic.
Introduction to Python
General Programming Introduction to Computing Science and Programming I.
Decision Structures and Boolean Logic
CATHERINE AND ANNIE Python: Part 4. Strings  Strings are interesting creatures. Although words are strings, anything contained within a set of quotes.
PYTHON CONDITIONALS AND RECURSION : CHAPTER 5 FROM THINK PYTHON HOW TO THINK LIKE A COMPUTER SCIENTIST.
Strings CSE 1310 – Introduction to Computers and Programming Vassilis Athitsos University of Texas at Arlington 1.
Lists and the ‘ for ’ loop. Lists Lists are an ordered collection of objects >>> data = [] >>> print data [] >>> data.append("Hello!") >>> print data.
Built-in Data Structures in Python An Introduction.
Topics: IF If statements Else clauses. IF Statement For the conditional expression, evaluating to True or False, the simple IF statement is if : x = 7.
CSC 110 Using Python [Reading: chapter 1] CSC 110 B 1.
9/14/2015BCHB Edwards Introduction to Python BCHB Lecture 4.
Python Conditionals chapter 5
Fall 2002CS 150: Intro. to Computing1 Streams and File I/O (That is, Input/Output) OR How you read data from files and write data to files.
Guide to Programming with Python Chapter Seven Files and Exceptions: The Trivia Challenge Game.
Quiz 3 is due Friday September 18 th Lab 6 is going to be lab practical hursSept_10/exampleLabFinal/
16. Python Files I/O Printing to the Screen: The simplest way to produce output is using the print statement where you can pass zero or more expressions,
Decision Structures, String Comparison, Nested Structures
Introduction to Python Dr. José M. Reyes Álamo. 2 Three Rules of Programming Rule 1: Think before you program Rule 2: A program is a human-readable set.
A FIRST BOOK OF C++ CHAPTER 14 THE STRING CLASS AND EXCEPTION HANDLING.
Perl for Bioinformatics Part 2 Stuart Brown NYU School of Medicine.
Python Basics  Values, Types, Variables, Expressions  Assignments  I/O  Control Structures.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Introduction to Python
Basic concepts of C++ Presented by Prof. Satyajit De
Introduction to Python
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Line Continuation, Output Formatting, and Decision Structures
Chapter 2, Part I Introduction to C Programming
Topics The if Statement The if-else Statement Comparing Strings
File input and output Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble Notes from 2009: Sample problem.
Conditionals (if-then-else)
While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Imperative Programming
Topics The if Statement The if-else Statement Comparing Strings
Line Continuation, Output Formatting, and Decision Structures
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble Notes for 2010: I skipped slide 10. This is.
Topics Introduction to File Input and Output
Fundamentals of Programming I Files
Introduction to Python
More for loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Topics Introduction to File Input and Output
While loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Introduction to Python
“Everything Else”.
Data Types Every variable has a given data type. The most common data types are: String - Text made up of numbers, letters and characters. Integer - Whole.
Topics Introduction to File Input and Output
Imperative Programming
Introduction to Computer Science
Presentation transcript:

File input and output and conditionals Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Opening files The built-in open() function returns a file object : = open(, ) Python will read, write or append to a file according to the access type requested: –'r' = read –'w' = write (will replace the file if it exists) –'a' = append (appends to an existing file) For example, open for reading a file called "hello.txt": >>> myFile = open('hello.txt', 'r')

Reading the whole file You can read the entire content of the file into a single string. If the file content is the text “Hello, world!\n”: >>> myString = myFile.read() >>> print myString Hello, world! >>> why is there a blank line here?

Reading the whole file Now add a second line to the file (“How ya doin’?\n”) and try again. >>> myFile = open("hello.txt", "r") >>> myString = myFile.read() >>> print myString Hello, world! How ya doin'? >>>

Reading the whole file Alternatively, you can read the file into a list of strings, one string for each line: >>> myFile = open("hello.txt", "r") >>> myStringList = myFile.readlines() >>> print myStringList ['Hello, world!\n', 'How ya doin'?\n'] >>> print myStringList[1] How ya doin'? this file method returns a list of strings, one for each line in the file notice that each line has the newline character at the end

Reading one line at a time The readlines() method puts all the lines into a list of strings. The readline() method returns only the next line: >>> myFile = open("hello.txt", "r") >>> myString = myFile.readline() >>> print myString Hello, world! >>> myString = myFile.readline() >>> print myString How ya doin'? >>> print myString.strip() # strip the newline off How ya doin'? >>> notice that readline() automatically keeps track of where you are in the file - it reads the next line after the one previously read

Writing to a file Open a file for writing (or appending): >>> myFile = open("new.txt", "w") # (or "a") Use the.write() method: >>> myFile.write("This is a new file\n") >>> myFile.close() >>> Ctl-D (exit the python interpreter) > cat new.txt This is a new file always close a file after you are finished reading from or writing to it. open("new.txt", "w") will overwrite an existing file (or create a new one) open("new.txt", "a") will append to an existing file

.write() is a little different from print.write() does not automatically append a new-line character..write() requires a string as input. >>> newFile.write("foo") >>> newFile.write(1) Traceback (most recent call last): File " ", line 1, in ? TypeError: argument 1 must be string or read-only character buffer, not int >>> newFile.write(str(1)) # str converts to string (also of course print goes to the screen and.write() goes to a file)

if-elif-else Conditional code execution and code blocks

The if statement >>> if (seq.startswith("C")):... print "Starts with C"... Starts with C >>> A block is a group of lines of code that belong together. if ( ): In the Python interpreter, the ellipsis indicates that you are inside a block (on my Win machine it is just a blank indentation). Python uses indentation to keep track of code blocks. You can use any number of spaces to indicate a block, but you must be consistent. Using one is simplest. An unindented or blank line indicates the end of a block.

The if statement Try doing an if statement without indentation: >>> if (seq.startswith("C")):... print "Starts with C" File " ", line 2 print "Starts with C" ^ IndentationError: expected an indented block the interpreter expects you to be inside a code block (…) but it is not indented properly

Multiline blocks Try doing an if statement with multiple lines in the block. >>> if (seq.startswith("C")):... print "Starts with C"... print "All right by me!"... Starts with C All right by me! When the if statement is true, all of the lines in the block are executed (in this case two lines in the block).

Multiline blocks What happens if you don’t use the same number of spaces to indent the block? >>> if (seq.startswith("C")):... print "Starts with C"... print "All right by me!" File " ", line 4 print "All right by me!" ^ SyntaxError: invalid syntax This is why I prefer to use a single character – it is always exactly correct.

Comparison operators Boolean: and, or, not Numeric:, ==, !=, >=, <= String: in, not in <is less than > is greater than ==is equal to !=is NOT equal to =is greater than or equal to

Examples seq = 'CAGGT' >>> if ('C' == seq[0]):... print 'C is first in', seq... C is first in CAGGT >>> if ('CA' in seq):... print 'CA is found in', seq... CA is found in CAGGT >>> if (('CA' in seq) and ('CG' in seq)):... print "Both there!"... >>> comparison operators

Beware! = versus == Single equal assigns a value. Double equal tests for equality.

Combining tests x = 1 y = 2 z = 3 if ((x < y) and (y != z)): do something if ((x > y) or (y == z)): do something else Evaluation starts with the innermost parentheses and works out. When there are multiple parentheses at the same level, evaluation starts at the left and moves right. The statements can be arbitrarily complex. if (((x <= y) and (x < z)) or ((x == y) and not (x == z)))

if-else statements if : else: The else block executes only if is false. >>> if (seq.startswith('T')):... print 'T start'... else:... print 'starts with', seq[0]... starts with C >>> evaluates to FALSE

if-elif-else if : elif : else: elif block executes if is false and then performs a second Only one of the blocks is executed. Can be read this way: if test1 is true then run block1, else if test2 is true run block2, else run block3

Example >>> base = 'C' >>> if (base == 'A'):... print "adenine"... elif (base == 'C'):... print "cytosine"... elif (base == 'G'):... print "guanine"... elif (base == 'T'):... print "thymine"... else:... print "Invalid base!"... cytosine

= open(, 'r'|'w'|'a') =.read() =.readline() =.readlines().write( ).close() if : elif : else: Boolean: and, or, not Numeric:, ==, !=, >=, <= String: in, not in

Sample problem #1 Write a program read-first-line.py that takes a file name from the command line, opens the file, reads the first line, and prints the line to the screen. > python read-first-line.py hello.txt Hello, world! >

Solution #1 import sys filename = sys.argv[1] myFile = open(filename, "r") firstLine = myFile.readline() myFile.close() print firstLine

Sample problem #2 Modify your program to print the first line without an extra new line. > python read-first-line.py hello.txt Hello, world! >

Solution #2 import sys filename = sys.argv[1] myFile = open(filename, "r") firstLine = myFile.readline() firstLine = firstLine[:-1] myFile.close() print firstLine (or use firstLine.strip(), which removes all the whitespace from both ends) remove last character

Sample problem #3 Write a program math-two-numbers.py that reads one integer from the first line of one file and a second integer from the first line of a second file. If the first number is smaller, then print their sum, otherwise print their multiplication. Indicate the entire operation in your output. > add-two-numbers.py four.txt nine.txt = 13 >

Solution #3 import sys fileOne = open(sys.argv[1], "r") valOne = int(fileOne.readline()[:-1]) fileOne.close() fileTwo = open(sys.argv[2], "r") valTwo = int(fileTwo.readline()[:-1]) fileTwo.close() if valOne < valTwo: print valOne, "+", valTwo, "=", valOne + valTwo else: print valOne, "*", valTwo, "=", valOne * valTwo

import sys fileOne = open(sys.argv[1], "r") valOne = int(fileOne.readline().strip()) fileOne.close() fileTwo = open(sys.argv[2], "r") valTwo = int(fileTwo.readline().strip()) fileTwo.close() if valOne < valTwo: print valOne, "+", valTwo, "=", valOne + valTwo else: print valOne, "*", valTwo, "=", valOne * valTwo Here's a version that is more robust because it doesn't matter whether the file lines have white space or a newline:

Sample problem #4 (review) Write a program find-base.py that takes as input a DNA sequence and a nucleotide. The program should print the number of times the nucleotide occurs in the sequence, or a message saying it’s not there. > python find-base.py A GTAGCTA A occurs twice > python find-base.py A GTGCT A does not occur at all Hint: S.find('G') returns -1 if it can't find the requested string.

Solution #4 import sys base = sys.argv[1] sequence = sys.argv[2] position = sequence.find(base) if (position == -1): print base, "does not occur at all" else: n = sequence.count(base) print base, "occurs " + n + "times"

Challenge problems Write a program that reads a sequence file (seq1) and a sequence (seq2) based on command line arguments and makes output to the screen that either: 1)says seq2 is entirely missing from seq1, or 2)counts the number of times seq2 appears in seq1, or 3)warns you that seq2 is longer than seq1 > python challenge.py seqfile.txt GATC > GATC is absent (or > GATC is present 7 times) (or > GATC is longer than the sequence in seqfile.txt) TIP – file.read() includes the newline characters from a multiline file Make sure you can handle multiline sequence files. Do the same thing but output a list of all the positions where seq2 appears in seq1 (tricky with your current knowledge).

Write a program that is approximately equivalent to the find and replace function of word processors. Take as arguments: 1) a string to find, 2) a string to replace with, 3) input file name, 4) output file name. You don't really need this, but try to incorporate a conditional test. > f_and_r.py Watson Crick infile.txt outfile.txt (should replace all appearances of "Watson" in the input file with "Crick".) Challenge problems

Reading First parts of chapters 5 and 14 from Think Python by Downey