COSC 1306—COMPUTER SCIENCE AND PROGRAMMING PYTHON FUNCTIONS Jehan-François Pâris


Module Overview We will learn how to read, create, and modify files –Pay special attention to pickled files: they are very easy to use!

The file system Provides long-term storage of information. Stores data in stable storage (disk). Cannot be RAM because: –Dynamic RAM loses its contents when powered off –Static RAM is too expensive –System crashes can corrupt the contents of main memory

Overall organization Data managed by the file system are grouped in user-defined data sets called files The file system must provide a mechanism for naming these data –Each file system has its own set of conventions –All modern operating systems use a hierarchical directory structure

Windows solution Each device and each disk partition is identified by a letter –A: and B: were used by the floppy drives –C: is the first partition of the hard drive –If the hard drive has no other partition, D: denotes the DVD drive. Each device and each disk partition has its own hierarchy of folders

Windows solution [Diagram: separate folder trees for the hard drive C: (containing Windows and Users), a second disk D: (containing Program Files), and a flash drive F:]

UNIX/LINUX organization Each device and disk partition has its own directory tree –Disk partitions are glued together through the mount operation to form a single tree. A typical user does not know where her files are stored

UNIX/LINUX organization [Diagram: the root partition / contains bin and usr; through "the magic mount", a second partition is attached at usr and can be accessed as /usr]

Mac OS organization Similar to Windows –Disk partitions are not merged –Represented by separate icons on the desktop

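Accessing a file (I) Your Python programs are stored in a folder, AKA directory –On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python. All files in that directory can be accessed directly through their names, e.g. "myfile.txt" (strictly speaking, plain names are looked up in the current working directory, which is often, but not always, the program's folder)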

Accessing a file (II) Files in subdirectories can be accessed by specifying the subdirectory first –Windows style: "test\\sample.txt" (note the double backslash) –Linux/Unix/Mac OS X style: "test/sample.txt" (generally works on Windows too)

Why the double backslash? The backslash is an escape character in Python –It combines with the character that follows it to represent non-printable characters: '\n' represents a newline, '\t' represents a tab –Must use '\\' to represent a plain backslash

Accessing a file (III) For other files, must use the full pathname –Windows style: "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
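To see the escape rules in action, here is a small sketch. It also uses raw strings (r"..."), a standard Python feature not covered in these slides, in which backslashes are kept literally; the file names are only illustrations.

```python
# A backslash combines with the next character: each of these is ONE character
newline = '\n'
tab = '\t'
backslash = '\\'
print(len(newline), len(tab), len(backslash))   # 1 1 1

# Two ways to write the same Windows-style relative path
escaped = "test\\sample.txt"   # doubled backslash
raw = r"test\sample.txt"       # raw string: the backslash is kept as-is
print(escaped == raw)          # True

# Danger: a single backslash can silently form an escape sequence
oops = "test\table.txt"        # '\t' turns into a TAB character here
print(len(oops), len(r"test\table.txt"))   # 13 14
```

The last two lines show why a forgotten doubling is hard to spot: "test\table.txt" is still a valid string, just not the path you meant.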

Accessing file contents Two-step process: –First we open the file –Then we access its contents (read or write). When we are done, we close the file.

What happens at open() time? The system verifies –That you are an authorized user –That you have the requested permission: read permission or write permission (execute permission exists but doesn't apply here). It then returns a file handle (AKA file descriptor)

The file handle Gives the user –Direct access to the file No directory lookups –Authority to execute the file operations whose permissions have been requested

Python open() open(file, mode = 'r', buffering = -1) (plus further optional parameters such as encoding) where –file is the name (or path) of the file –mode is the permission requested; the default is 'r' for read-only –buffering specifies the buffer size; -1 means use the system default value

The modes Can request –'r' for read-only –'w' for write-only: creates the file if needed and always overwrites it (old contents are erased) –'a' for append: writes at the end –'r+' or 'a+' for updating (read + write/append)
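A quick way to convince yourself of the difference between 'w' and 'a' is the sketch below. It works in a temporary folder (via the standard tempfile module, an assumption of this example) so that no real file is touched; modes_demo.txt is a made-up name.

```python
import os
import tempfile

# Work in a temporary folder so the example does not touch real files
folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes_demo.txt")

fh = open(path, "w")        # 'w' creates the file (or erases an existing one)
fh.write("first line\n")
fh.close()

fh = open(path, "w")        # opening with 'w' again: "first line" is gone
fh.write("second line\n")
fh.close()

fh = open(path, "a")        # 'a' keeps the contents and writes at the end
fh.write("third line\n")
fh.close()

fh = open(path, "r")        # 'r' (the default) for reading
contents = fh.read()
fh.close()
print(contents, end="")     # prints "second line" then "third line"
```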

Examples
    f1 = open("myfile.txt")              # same as open("myfile.txt", "r")
    f2 = open("test\\sample.txt", "r")
    f3 = open("test/sample.txt", "r")
    f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")

Reading a file Three ways: –Global reads –Line by line –Pickled files

Global reads fh.read() –Returns the whole contents of the file specified by file handle fh –The file contents are stored in a single string that might be very large

Example
    f2 = open("test\\sample.txt", "r")
    bigstring = f2.read()
    print(bigstring)
    f2.close() # not strictly required, but good practice

Output of example
    To be or not to be that is the question
    Now is the winter of our discontent
–Exact contents of file 'test\sample.txt'
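Python's with statement (not used in these slides) closes the file automatically when the block ends, even if an error occurs. A minimal sketch of the global read written that way; to keep it runnable anywhere, it first creates a stand-in sample.txt with the two lines above in a temporary folder.

```python
import os
import tempfile

# Setup (assumption): recreate a small sample file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w") as fh:
    fh.write("To be or not to be that is the question\n")
    fh.write("Now is the winter of our discontent\n")

# Global read: the whole file becomes a single string
with open(path, "r") as f2:      # the file is closed when this block ends
    bigstring = f2.read()

print(bigstring, end="")
print(f2.closed)                 # True: no explicit close() needed
```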

Line-by-line reads
    for line in fh : # do not forget the colon
        ...  # anything you want
    fh.close() # not required

Example
    f3 = open("test/sample.txt", "r")
    for line in f3 : # do not forget the colon
        print(line)
    f3.close() # not required

Output
    To be or not to be that is the question

    Now is the winter of our discontent
–With one or more extra blank lines

Why? Each line read from the file still ends with its end-of-line marker ('\n') –print(…) adds an extra end-of-line

Trying to remove blank lines
    print(' ')
    f5 = open("test/sample.txt", "r")
    for line in f5 : # do not forget the colon
        print(line[:-1]) # remove last char
    f5.close() # not required
    print(' ')

The output
    To be or not to be that is the question
    Now is the winter of our disconten
The last line did not end with an EOL, so its last real character was removed!

A smarter solution (I) Only remove the last character if it is an EOL
    if line[-1] == '\n' :
        print(line[:-1])
    else :
        print(line)

A smarter solution (II)
    print(' ')
    fh = open("test/sample.txt", "r")
    for line in fh : # do not forget the colon
        if line[-1] == '\n' :
            print(line[:-1]) # remove last char
        else :
            print(line)
    print(' ')
    fh.close() # not required

It works!
    To be or not to be that is the question
    Now is the winter of our discontent
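The same effect can be had with built-in string and print() features not shown on the slides: line.rstrip('\n') removes a trailing newline only if one is present, and print(line, end='') keeps the file's own newline instead of adding another. A sketch, using a stand-in file whose last line deliberately has no end-of-line marker:

```python
import os
import tempfile

# Setup (assumption): a sample file whose LAST line has no '\n'
folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w") as fh:
    fh.write("To be or not to be that is the question\n")
    fh.write("Now is the winter of our discontent")

# Option 1: strip the newline only when it is there
cleaned = []
with open(path) as fh:
    for line in fh:
        cleaned.append(line.rstrip('\n'))
for line in cleaned:
    print(line)

# Option 2: keep the newline and tell print() not to add its own
with open(path) as fh:
    for line in fh:
        print(line, end='')
print()   # finish the last line, which had no newline of its own
```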

Making sense of file contents Most files contain more than one data item per line –COSC UHPD Must split lines –mystring.split(sepchar), where sepchar is a separation character, returns a list of items –With no argument, split() splits on any run of whitespace

Splitting strings
    >>> text = "Four score and seven years ago"
    >>> text.split()
    ['Four', 'score', 'and', 'seven', 'years', 'ago']
    >>> record = "1,'Baker, Andy', 83, 89, 85"
    >>> record.split(',')
    ['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
Not what we wanted! The comma inside 'Baker, Andy' was treated as a separator.
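One way to get what we wanted is Python's standard csv module, which is not the approach these slides take. A sketch assuming the record uses single quotes around fields that contain commas, as above: quotechar="'" tells the reader about those quotes, and skipinitialspace=True drops the blank after each comma.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# csv.reader accepts any iterable of lines; here a one-element list
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)
print(fields)   # ['1', 'Baker, Andy', '83', '89', '85']
```

Note that all fields are still strings; converting '83' to a number remains the program's job.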

Example
    # how2split.py
    print(' ')
    f5 = open("test/sample.txt", "r")
    for line in f5 :
        words = line.split()
        for xxx in words :
            print(xxx)
    f5.close() # not required
    print(' ')

Output (one word per line)
    To
    be
    …
    of
    our
    discontent

Other separators (I) Commas –CSV (Excel) format: values are separated by commas –Strings are stored without quotes unless they contain a comma (or a double quote or a line break): "Doe, Jane", freshman, 90, 90 –Quotes within strings are doubled
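Python's standard csv module (not used in these slides) applies exactly these rules when writing. A sketch with made-up rows, writing to an in-memory io.StringIO instead of a real file; for a real file you would open it with open(name, 'w', newline='').

```python
import csv
import io

rows = [
    ["Alice", "90", "90"],
    ["Doe, Jane", "freshman", "90", "90"],
    ['Fulano, Eduardo "Lalo"', "90"],
]

buffer = io.StringIO()          # in-memory stand-in for an output file
writer = csv.writer(buffer)     # default 'excel' dialect; rows end with '\r\n'
writer.writerows(rows)

text = buffer.getvalue()
print(text)
# Alice,90,90
# "Doe, Jane",freshman,90,90
# "Fulano, Eduardo ""Lalo""",90
```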

Other separators (II) Tabs ('\t') –Advantages: your fields will appear nicely aligned; spaces, commas, … are not an issue –Disadvantage: you do not see them; they look like spaces

Why it is important When you must pick your file format, you should decide how the data inside the file will be used: –People will read them –Other programs will use them –Will be used by people and machines

An exercise Converting our output to CSV format –Replacing tabs by commas Easy –Will use string replace function

First attempt
    fh_in = open('grades.txt', 'r') # the 'r' is optional
    buffer = fh_in.read()
    newbuffer = buffer.replace('\t', ',')
    fh_out = open('grades0.csv', 'w')
    fh_out.write(newbuffer)
    fh_in.close()
    fh_out.close()
    print('Done!')

The output The tab-separated lines for Alice, Bob, and Carol become
    Alice,90,90,90,90,90
    Bob,85,85,85,85,85
    Carol,75,75,75,75,75

Dealing with commas (I) Work line by line. For each line –Split the input into fields using TAB as separator –Store the fields into a list: the line for Alice becomes ['Alice', '90', '90', '90', '90', '90']

Dealing with commas (II) –Put within double quotes any entry containing one or more commas –Output the list entries separated by commas: ['"Baker, Alice"', '90', '90', '90', '90', '90'] becomes "Baker, Alice",90,90,90,90,90

Dealing with commas (III) Our troubles are not over: –Must store somewhere all lines until we are done –Store them in a list

Dealing with double quotes Before wrapping items that contain commas in double quotes, replace –All double quotes by pairs of double quotes: 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' and then '"Aguirre, ""Lalo"" Eduardo"'
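The two steps above can be checked on the slide's own example. A sketch; quote_field is a hypothetical helper name, and it applies exactly the slide's rule (double the quotes, then wrap only if the item contains a comma):

```python
def quote_field(item):
    """Hypothetical helper: apply the slide's two quoting rules to one field."""
    item = item.replace('"', '""')    # step 1: double every double quote
    if ',' in item:                   # step 2: wrap fields that contain a comma
        item = '"' + item + '"'
    return item

step1 = 'Aguirre, "Lalo" Eduardo'.replace('"', '""')
print(step1)                                   # Aguirre, ""Lalo"" Eduardo
print(quote_field('Aguirre, "Lalo" Eduardo'))  # "Aguirre, ""Lalo"" Eduardo"
print(quote_field('Alice'))                    # Alice
```

With this rule, a field that contains double quotes but no comma is not wrapped; the standard csv module would wrap such a field too, since doubled quotes are expected only inside a quoted field.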

General organization (I)
    linelist = [ ]
    for line in file
        itemlist = line.split(…)
        linestring = '' # empty string
        for each item in itemlist
            remove any trailing newline
            double all double quotes
            if item contains comma, wrap it in double quotes
            add to linestring


The program (I)
    # betterconvert2csv.py
    """ Convert tab-separated file to csv """
    fh = open('grades.txt','r') #input file
    linelist = [ ] # global data structure
    for line in fh : # outer loop
        itemlist = line.split('\t')
        # print(str(itemlist)) # just for debugging
        linestring = '' # start afresh

The program (II)
        for item in itemlist : #inner loop
            item = item.replace('"','""') # for quotes
            if item.endswith('\n') : # remove it (also safe for an empty item)
                item = item[:-1]
            if ',' in item : # wrap item
                linestring += '"' + item + '"' + ','
            else : # just append
                linestring += item + ','
        # end of inside for loop

The program (III)
        # must replace last comma by newline
        linestring = linestring[:-1] + '\n'
        linelist.append(linestring)
    # end of outside for loop
    fh.close()
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()

Notes Most print statements used for debugging were removed (space considerations) –Observe that the inner loop adds a comma after each item –We want to remove the last one –We must also add a newline at the end of each line

The input file Alice Bob Carol Doe, Jane Fulano, Eduardo "Lalo"

The output file
    Alice,90,90,90,90,90
    Bob,85,85,85,85,85
    Carol,75,75,75,75,75
    "Doe, Jane",90,90,90,80,75
    "Fulano, Eduardo ""Lalo""",90,90,90,90

Mistakes being made (I) Mixing lists and strings: –An earlier draft of the program declared linestring = [ ] and did linestring.append(item) –The outcome was ['Alice,', '90,', … ] instead of 'Alice,90,…'

Mistakes being made (II) Forgetting to add a newline –Output was a single line Doing the append inside the inner loop: –Output was Alice,90 Alice,90,90 Alice,90,90,90 …

Mistakes being made (III) Forgetting that strings are immutable: –Trying to do linestring[-1] = '\n' instead of linestring = linestring[:-1] + '\n' –Bigger issue: do we have to remove the last comma at all?
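One way to sidestep the last-comma question entirely (not the approach these slides take) is str.join(), which places the separator only between items. A sketch of a variant do_line built that way; it reuses the slides' function name, but its body is an alternative, not the author's code.

```python
def do_line(line):
    """Variant: convert one tab-separated line to CSV using str.join."""
    items = []
    for item in line.rstrip('\n').split('\t'):   # drop the newline once, up front
        item = item.replace('"', '""')            # double the double quotes
        if ',' in item:
            item = '"' + item + '"'               # wrap items containing commas
        items.append(item)
    # join() puts a comma BETWEEN items, so there is no trailing comma to remove
    return ','.join(items) + '\n'

print(do_line('Doe, Jane\t90\t90\t90\t80\t75\n'), end='')
# "Doe, Jane",90,90,90,80,75
```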

Could we have done better? (I) Make the program more readable by decomposing it into functions –A function to process each line of input do_line(line) –Input is a string ending with newline –Output is a string in CSV format –Should call a function processing individual items

Could we have done better? (II) –A function to process individual items do_item(item) –Input is a string –Returns a string With double quotes "doubled" Without a newline Within quotes if it contains a comma

The new program (I)
    def do_item(item) :
        item = item.replace('"','""')
        if item.endswith('\n') : # also safe for an empty item
            item = item[:-1]
        if ',' in item :
            item = '"' + item + '"'
        return item

The new program (II)
    def do_line(line) :
        itemlist = line.split('\t')
        linestring = '' # start afresh
        for item in itemlist :
            linestring += do_item(item) + ','
        # replace the last comma by a newline
        linestring = linestring[:-1] + '\n'
        return linestring

The new program (III)
    fh = open('grades.txt','r')
    linelist = [ ]
    for line in fh :
        linelist.append(do_line(line))
    fh.close()

The new program (IV)
    fhh = open('great.csv', 'w')
    for line in linelist :
        fhh.write(line)
    fhh.close()

Why it is better Program is decomposed into small modules that are much easier to understand –Each fits on a PowerPoint slide

The break statement Makes the program exit the loop it is in. In the next example, we are looking for the first instance of a string in a file –We can exit as soon as it is found

Example (I)
    searchstring = input('Enter search string:')
    found = False
    fh = open('grades.txt')
    for line in fh :
        if searchstring in line :
            print(line)
            found = True
            break
    fh.close()

Example (II)
    if found == True :
        print("String %s was found" % searchstring)
    else :
        print("String %s NOT found " % searchstring)

Flags A variable like found –That can either be True or False –That is used in a condition for an if or a while is often referred to as a flag
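Python also offers a way to do without the flag in this particular search: a for loop may have an else clause that runs only when the loop was not ended by break. A sketch of the search rewritten that way; the function name search_file and the sample grades.txt contents are hypothetical.

```python
import os
import tempfile

def search_file(filename, searchstring):
    """Print the first line containing searchstring; return True if found."""
    with open(filename) as fh:
        for line in fh:
            if searchstring in line:
                print(line, end='')
                print("String %s was found" % searchstring)
                break                  # exit the loop at the first match
        else:
            # runs only if the loop ended WITHOUT hitting break
            print("String %s NOT found" % searchstring)
            return False
    return True

# Hypothetical test data
folder = tempfile.mkdtemp()
path = os.path.join(folder, "grades.txt")
with open(path, "w") as fh:
    fh.write("Alice\t90\nBob\t85\nCarol\t75\n")

search_file(path, "Bob")   # prints the Bob line, then "String Bob was found"
search_file(path, "Zoe")   # prints "String Zoe NOT found"
```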

A dumb mistake Unlike C and its family of languages, Python does not let you write if found = True when you mean if found == True (it is a syntax error). There are still cases where we can make mistakes!

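Example
    >>> b = 5
    >>> c = 8
    >>> a = b = c
    >>> a
    8
    >>> a = b == c
    >>> a
    True
a = b = c assigns the value of c to both a and b, while a = b == c assigns to a the result of comparing b with c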

HANDLING EXCEPTIONS

When a wrong value is entered When the user is prompted for –number = int(input("Enter a number: ")) and enters –a non-numerical string, a ValueError exception is raised and the program terminates. Python programs can catch such errors

The try… except pair (I)
    try:
        ...  # statements that might raise an exception
    except Exception as ex:
        ...  # statements handling the exception
Observe –the colons –the indentation

The try… except pair (II) If an exception occurs while the program executes the statements between the try: and the except Exception as ex:, control is immediately transferred to the statements after the except
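Catching Exception catches every kind of error, including ones caused by bugs. When you know which error to expect, you can name it instead; int() raises ValueError for text that is not an integer. A sketch with a hypothetical helper read_number that returns None instead of crashing:

```python
def read_number(text):
    """Hypothetical helper: convert text to an int, or return None if invalid."""
    try:
        return int(text)
    except ValueError:          # the error int() raises for bad text
        return None

print(read_number("42"))     # 42
print(read_number("-7"))     # -7
print(read_number("abc"))    # None
print(read_number("3.5"))    # None: int() does not accept decimals
```

In the input loops that follow, except ValueError: could replace except Exception as ex: in the same way.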

A better example
    done = False
    while not done :
        filename = input("Enter a file name: ")
        try :
            fh = open(filename)
            done = True
        except Exception as ex:
            print('File %s does not exist' % filename)
    print(fh.read())

An Example (I)
    done = False
    while not done :
        try :
            number = int(input('Enter a number:'))
            done = True
        except Exception as ex:
            print('You did not enter a number')
    print("You entered %.2f." % number)
    input("Hit enter when done with program.")

A simpler solution
    done = False
    while not done :
        myinput = input('Enter a number:')
        if myinput.isdigit() :
            number = int(myinput)
            done = True
        else :
            print('You did not enter a number')
    print("You entered %.2f." % number)
    input("Hit enter when done with program.")
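One caveat with the isdigit() shortcut: it accepts only unsigned digit strings, so it rejects inputs such as -7 that int() would convert without complaint. A small comparison (the sample strings are arbitrary):

```python
# Compare the isdigit() test with what int() actually accepts
samples = ["42", "-7", " 42 ", "3.5", "abc"]
results = {}
for text in samples:
    try:
        value = int(text)          # int() allows a sign and surrounding spaces
    except ValueError:
        value = None               # not an integer at all
    results[text] = (text.isdigit(), value)
    print(repr(text), "isdigit:", text.isdigit(), "int():", value)
```

So the two versions are not strictly equivalent: the try/except version accepts negative numbers, the isdigit() version does not.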

PICKLED FILES

Pickled files import pickle –Provides a way to save complex data structures in a file –Sometimes said to provide a serialized representation of Python objects

Basic primitives (I) pickle.dump(object, fh) –appends a serialized representation of object to the file with file handle fh –object is virtually any Python object –fh is the handle of a file that must have been opened in 'wb' mode; the b is a special option allowing the program to write or read binary data

Basic primitives (II) target = pickle.load(filehandle) –assigns to target the next pickled object stored in file filehandle –target is virtually any Python object –filehandle is the handle of a file that was opened in 'rb' mode

Example (I)
    >>> mylist = [ 2, 'Apples', 5, 'Oranges']
    >>> mylist
    [2, 'Apples', 5, 'Oranges']
    >>> fh = open('testfile', 'wb') # b is for BINARY
    >>> import pickle
    >>> pickle.dump(mylist, fh)
    >>> fh.close()

Example (II)
    >>> fhh = open('testfile', 'rb') # b is for BINARY
    >>> theirlist = pickle.load(fhh)
    >>> theirlist
    [2, 'Apples', 5, 'Oranges']
    >>> theirlist == mylist
    True

What was stored in testfile? Some binary data containing the strings 'Apples' and 'Oranges'

Using ASCII format Can request a pickled representation of objects that contains only printable characters –Must specify protocol = 0 Advantage: –Easier to debug Disadvantage: –Takes more space

Example
    import pickle
    mydict = {'Alice': 22, 'Bob' : 27}
    fh = open('asciifile.txt', 'wb') # MUST be 'wb'
    pickle.dump(mydict, fh, protocol = 0)
    fh.close()
    fhh = open('asciifile.txt', 'rb')
    theirdict = pickle.load(fhh)
    fhh.close()
    print(mydict)
    print(theirdict)

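The output
    {'Bob': 27, 'Alice': 22}
    {'Bob': 27, 'Alice': 22}
(Python 3.7 and later keep dictionary keys in insertion order, so they print {'Alice': 22, 'Bob': 27} instead.)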

What is inside asciifile.txt? (dp0VBobp1L27Ls V Alicep2L22Ls.
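To look at the pickled bytes without creating a file, pickle.dumps() and pickle.loads() (the string-returning counterparts of dump() and load()) can be used. A sketch comparing protocol 0 with the default binary protocol; bytes.isascii() requires Python 3.7 or later.

```python
import pickle

mydict = {'Alice': 22, 'Bob': 27}

text_pickle = pickle.dumps(mydict, protocol=0)   # printable characters only
binary_pickle = pickle.dumps(mydict)             # default (binary) protocol

print(text_pickle)
print(binary_pickle)
print(text_pickle.isascii())                     # True
print(binary_pickle.isascii())                   # False
print(pickle.loads(text_pickle) == mydict)       # True
```

The exact bytes printed depend on the Python version, which is why the contents of asciifile.txt may not match the listing above character for character.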

Dumping multiple objects (I)
    import pickle
    fh = open('asciifile.txt', 'wb')
    for k in range(3, 6) :
        mylist = [i for i in range(1,k)]
        print(mylist)
        pickle.dump(mylist, fh, protocol = 0)
    fh.close()

Dumping multiple objects (II)
    fhh = open('asciifile.txt', 'rb')
    lists = [ ] # initializing list of lists
    while 1 : # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break
    fhh.close()
    print(lists)

Dumping multiple objects (III) Note the way we test for end-of-file (EOF): pickle.load() raises an EOFError exception once no objects are left
    while 1 : # means forever
        try:
            lists.append(pickle.load(fhh))
        except EOFError :
            break

The output
    [1, 2]
    [1, 2, 3]
    [1, 2, 3, 4]
    [[1, 2], [1, 2, 3], [1, 2, 3, 4]]

What is inside asciifile.txt? (lp0L1LaL2La.(lp0L1LaL2LaL3La.(lp0L1LaL2L aL3LaL4La.
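The EOF loop can be packaged as a reusable function. A sketch with a hypothetical name, load_all, that uses io.BytesIO (an in-memory object that behaves like a file opened in binary mode) so it runs without creating asciifile.txt:

```python
import io
import pickle

def load_all(fh):
    """Hypothetical helper: return every object pickled in binary file fh."""
    objects = []
    while True:                      # same idea as 'while 1'
        try:
            objects.append(pickle.load(fh))
        except EOFError:             # raised once no objects are left
            break
    return objects

buffer = io.BytesIO()                # in-memory stand-in for a 'wb' file
for k in range(3, 6):
    pickle.dump(list(range(1, k)), buffer, protocol=0)
buffer.seek(0)                       # rewind before reading back, like reopening with 'rb'

print(load_all(buffer))   # [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
```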

Practical considerations You rarely pick the format of your input files –You may have to do format conversion. You often have to use specific formats for your output files –Often dictated by the program that will use them. Otherwise, stick with pickled files!