1
COSC 1306 COMPUTER SCIENCE AND PROGRAMMING
Jehan-François Pâris Fall 2016
2
THE ONLINE BOOK CHAPTER XI FILES
3
Chapter Overview We will learn how to read, create and modify files
Essential if we want to store our program inputs and results. Pay special attention to pickled files They are very easy to use!
4
Accessing file contents
Two-step process: first we open the file, then we access its contents (read or write). When we are done, we close the file.
5
What happens at open() time?
The system verifies that you are an authorized user and that you have the right permission (read permission or write permission; execute permission exists but doesn't apply), and returns a file handle (file descriptor)
6
The file handle
Gives the user fast direct access to the file (no folder lookups) and the authority to execute the file operations whose permissions have been requested
7
Python open()
open(name, mode = 'r', buffering = -1)
where
name is the name of the file
mode is the permission requested (default is 'r' for read only)
buffering specifies the buffer size (the default code -1 selects the system default value)
8
The modes
Can request:
'r' for read-only
'w' for write-only (always overwrites the file)
'a' for append (writes at the end)
'r+' or 'a+' for updating (read + write/append)
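The behavior of the modes can be checked with a short experiment. This is a minimal sketch, assuming a scratch file in a temporary directory; the file name modes_demo.txt is hypothetical and is not part of the course material.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

fh = open(path, "w")      # 'w' creates the file, or overwrites an existing one
fh.write("first\n")
fh.close()

fh = open(path, "w")      # opening with 'w' again discards the old contents
fh.write("second\n")
fh.close()

fh = open(path, "a")      # 'a' keeps the contents and writes at the end
fh.write("third\n")
fh.close()

fh = open(path, "r")      # 'r' is read-only
contents = fh.read()
fh.close()
print(contents)
```

Only "second" and "third" survive: the second open in 'w' mode erased "first".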
9
Examples
f1 = open("myfile.txt")    # same as f1 = open("myfile.txt", "r")
f2 = open("test\\sample.txt", "r")
f3 = open("test/sample.txt", "r")
f4 = open("C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt")
10
The file system Provides long term storage of information.
Will store data in stable storage (disk) Cannot be RAM because: Dynamic RAM loses its contents when powered off Static RAM is too expensive System crashes can corrupt contents of the main memory
11
Overall organization Data managed by the file system are grouped in user-defined data sets called files The file system must provide a mechanism for naming these data Each file system has its own set of conventions All modern operating systems use a hierarchical directory structure
12
Windows solution Each device and each disk partition is identified by a letter A: and B: were used by the floppy drives C: is the first disk partition of the hard drive If hard drive has no other disk partition, D: denotes the DVD drive Each device and each disk partition has its own hierarchy of folders
13
Windows solution
(Diagram labels: flash drive F:, C:, second disk D:, and the folders Windows, Users and Program Files.)
14
Linux organization Inherited from Unix
Each device and disk partition has its own directory tree. Disk partitions are glued together through the mount operation to form a single tree. Typical user does not know where her files are stored. Uses "/" as a separator
15
UNIX/LINUX organization
(Diagram labels: root partition /, other partition, usr, bin.) The magic mount: the second partition can be accessed as /usr
16
Mac OS organization Similar to Windows Disk partitions are not merged
Represented by separate icons on the desktop
17
Accessing a file (I) Your Python programs are stored in a folder (AKA directory). On my home PC it is C:\Users\Jehan-Francois Paris\Documents\Courses\1306\Python. All files in that folder can be directly accessed through their names, such as "myfile.txt"
18
The root
(Diagram labels: the root, Users, J.-F. Paris, Documents, Courses, and the path Courses\1306\Python\x.txt.)
19
Accessing a file (II) Files in folders inside that folder (subfolders) can be accessed by specifying first the subfolder.
Windows style: "test\\sample.txt" (note the double backslash)
Linux/Unix/Mac OS X style: "test/sample.txt" (generally also works on Windows)
20
Why the double backslash?
The backslash is an escape character in Python. It combines with its successor to represent non-printable characters:
'\n' represents a newline
'\t' represents a tab
Must use '\\' to represent a plain backslash
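The effect of the escape character can be seen by measuring a few strings. This is a small sketch; the raw-string form r"..." is an addition that does not appear on the slides, shown only as another way to write a plain backslash.

```python
path1 = "test\\sample.txt"   # escaped backslash
path2 = r"test\sample.txt"   # raw string: backslashes are taken literally

print(path1)                 # the string holds a single backslash
print(len("\\"))             # one character
print(len("\n"))             # also one character: a newline
print(path1 == path2)        # both spellings denote the same string
```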
21
Accessing a file (III) For other files, must use full pathname
Windows style: "C:\\Users\\Jehan-Francois Paris\\Documents\\Courses\\1306\\Python\\myfile.txt"
Linux and Mac: "/Users/Jehan-Francois Paris/Documents/Courses/1306/Python/myfile.txt"
22
Reading a file
Four ways:
Line by line
Global reads
Within a while loop (also works with other languages)
Pickled files
23
Line-by-line reads
for line in fh :    # special for loop
    # anything you want
fh.close()    # optional
24
Example
f3 = open("test/sample.txt", "r")
for line in f3 :
    print(line)
f3.close()    # optional
25
Output To be or not to be that is the question Now is the winter of our discontent With one or more extra blank lines
26
Why? Each line ends with newline print(…) adds an extra newline
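The doubled newline is easy to reproduce without a file. This is a small sketch, assuming one line as it would be read from the sample file; the end="" argument of print() is an alternative that the slides do not use.

```python
line = "To be or not to be\n"   # a line as read from a file, newline included

print(line)            # print() adds its own newline: a blank line follows
print(line, end="")    # end="" suppresses the newline that print() adds
print(repr(line))      # repr() makes the trailing '\n' visible
```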
27
Trying to remove blank lines
print('-----')
f5 = open("test/sample.txt", "r")
for line in f5 :
    print(line[:-1])    # remove last char
f5.close()    # optional
print('------')
28
The output ------ To be or not to be that is the question Now is the winter of our disconten The last line did not end with a newline!
29
A smarter solution (I) Only remove the last character if it is a newline
if line[-1] == '\n' :
    print(line[:-1])
else :
    print(line)
30
A smarter solution (II)
print('------')
fh = open("test/sample.txt", "r")
for line in fh :
    if line[-1] == '\n' :
        print(line[:-1])    # remove last char
    else :
        print(line)
print('------')
fh.close()    # optional
31
It works! ------ To be or not to be that is the question Now is the winter of our discontent
32
We can do better Use the rstrip() Python method
astring.rstrip() returns a copy of astring with all trailing whitespace (spaces, tabs, newlines) removed
astring.rstrip('\n') returns a copy of astring with all trailing newlines removed
33
Examples
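A few calls illustrate the two forms of rstrip(). The sample strings below are made up for the illustration.

```python
padded = "discontent  \t\n"

print(repr(padded.rstrip()))           # no argument: strips all trailing whitespace
print(repr(padded.rstrip("\n")))       # strips trailing newlines only
print(repr("line\n\n".rstrip("\n")))   # every trailing newline is removed
print(repr(padded))                    # the original string is unchanged
```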
34
The simplest solution
print('------')
fh = open("test/sample.txt", "r")
for line in fh :
    print(line.rstrip('\n'))
print('------')
fh.close()    # optional
This will remove all trailing newlines, even the ones we should keep
35
Global reads fh.read() Returns whole contents of file specified by file handle fh File contents are stored in a single string that might be very large
36
Example
f2 = open("test\\sample.txt", "r")
bigstring = f2.read()
print(bigstring)
f2.close()    # optional
37
Output of example To be or not to be that is the question Now is the winter of our discontent Exact contents of file ‘test\sample.txt’ followed by an extra return
38
fh.read() and fh.read(n)
fh.read() reads in the whole fh file and returns its contents as a single string
fh.read(n) reads at most the next n characters of file fh (n bytes if the file was opened in binary mode)
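Successive calls to read(n) continue from where the previous call stopped. In this sketch an io.StringIO object stands in for a file opened in text mode; that substitution is an assumption made so the example runs without a file on disk.

```python
import io

# io.StringIO behaves like a text file handle for reading purposes
fh = io.StringIO("To be or not to be\nthat is the question\n")

first = fh.read(5)     # at most the next 5 characters
second = fh.read(5)    # reading resumes where the previous call stopped
rest = fh.read()       # everything that is left
at_end = fh.read()     # empty string once the end has been reached

print(repr(first), repr(second), repr(at_end))
```

An empty string from read() is the usual signal that the end of the file has been reached.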
39
Reading within a loop Standard method for C/C++
infile = open("test/sample.txt", "r")
line = infile.readline()    # priming read
while line :    # false if empty
    print(line.rstrip("\n"))
    line = infile.readline()
infile.close()
40
Making sense of file contents
Most files contain more than one data item per line COSC UHPD Must split lines mystring.split(sepchar) where sepchar is a separation character returns a list of items
41
Splitting strings
>>> txt = "Four score and seven years ago"
>>> txt.split()
['Four', 'score', 'and', 'seven', 'years', 'ago']
>>> record = "1,'Baker, Andy', 83, 89, 85"
>>> record.split(',')
['1', "'Baker", " Andy'", ' 83', ' 89', ' 85']
Not what we wanted!
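One possible way around the problem, not used on the slides, is the csv module of the Python standard library, which understands quoted fields. This is a sketch under the assumption that the name is quoted with single quotes, as in the record above.

```python
import csv

record = "1,'Baker, Andy', 83, 89, 85"

# quotechar="'" because this record quotes the name with single quotes;
# skipinitialspace=True drops the blank that follows each comma
reader = csv.reader([record], quotechar="'", skipinitialspace=True)
fields = next(reader)
print(fields)
```

The comma inside the quoted name is no longer treated as a separator, so the name comes back as one field.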
42
Example
# how2split.py
print('-----')
fh = open("test/sample.txt", "r")
for line in fh :
    words = line.split()
    for xxx in words :
        print(xxx)
fh.close()    # optional
43
Output ----- To be … of our discontent ----- Spurious newlines
are gone
44
Standard way to access a file
# preprocessing
# set up counters, strings and lists
fh = open("input.txt", "r")
for line in fh :
    words = line.split(sepchar)    # often space
    for xxx in words :
        # do something
fh.close()    # optional
# postprocessing
# print results
45
Example List of expenditures with dates:
Rent 11/2/16 $850
Latte 11/2/16 $4.50
Food 11/2/16 $35.47
Latte 11/3/16 $4.50
Latte 11/3/16 $4.50
Outing 11/4/16 $27.00
Want to know how much money was spent on latte
46
First attempt Read line by line Will split all lines such as
"Food 11/2/16 $35.47" into ["Food", "11/2/16", "$35.47"] Will use first and last entries of each linelist
47
First attempt
total = 0    # set up accumulator
fh = open("expenses.txt", "r")
for line in fh :
    words = line.split(" ")
    if words[0] == 'Latte' :
        total += words[2]    # increment
fh.close()    # optional
print("you spent %.2f on latte" % total)
It does not work!
48
Second attempt
Must first remove the offending '$'
Must also convert string to float
def price2float(s) :
    """ remove leading dollar sign"""
    if s[0] == "$" :
        return float(s[1:])
    else :
        return float(s)
49
Second attempt
total = 0    # set up accumulator
fh = open("expenses.txt", "r")
for line in fh :
    words = line.split(" ")
    if words[0] == 'Latte' :
        total += price2float(words[2])
fh.close()    # optional
print("You spent $%.2f on latte" % total)
Output: You spent $13.50 on latte
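The whole computation can be tried end to end. This is a self-contained sketch that first writes the sample data to a scratch file in a temporary directory (a hypothetical location). It uses split() without an argument and guards against blank lines, which are small departures from the slide version.

```python
import os
import tempfile

def price2float(s):
    """Convert a price such as '$4.50' to a float."""
    s = s.strip()                      # drop the trailing newline, if any
    return float(s[1:]) if s.startswith("$") else float(s)

# Write the sample data to a scratch file (hypothetical location)
path = os.path.join(tempfile.mkdtemp(), "expenses.txt")
fh = open(path, "w")
fh.write("Rent 11/2/16 $850\nLatte 11/2/16 $4.50\nFood 11/2/16 $35.47\n"
         "Latte 11/3/16 $4.50\nLatte 11/3/16 $4.50\nOuting 11/4/16 $27.00\n")
fh.close()

total = 0.0
fh = open(path, "r")
for line in fh:
    words = line.split()               # split on any run of whitespace
    if words and words[0] == "Latte":  # skip blank lines
        total += price2float(words[-1])
fh.close()
print("You spent $%.2f on latte" % total)
```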
50
Picking the right separator (I)
Commas
CSV Excel format
Values are separated by commas
Strings are stored without quotes, unless they contain a comma: "Doe, Jane", freshman, 90, 90
Quotes within strings are doubled
51
Picking the right separator (II)
Tabs ('\t')
Advantages: your fields will appear nicely aligned; spaces, commas, … are not an issue
Disadvantage: you do not see them (they look like spaces)
52
Why it is important When you must pick your file format, you should decide how the data inside the file will be used: People will read them Other programs will use them Will be used by people and machines
53
An exercise Converting tab-separated data to CSV format
Replacing tabs by commas Easy Will use string replace function
54
Possible input lines Alice Doe,Jane Doe,John Kingsman,Edward "Ted"
55
First attempt
fh_in = open('grades.txt', 'r')
buffer = fh_in.read()
newbuffer = buffer.replace('\t', ',')
fh_out = open('grades0.csv', 'w')
fh_out.write(newbuffer)
fh_in.close()
fh_out.close()
print('Done!')
56
The output Alice Bob Carol becomes Alice,90,90,90,90,90 Bob,85,85,85,85,85 Carol,75,75,75,75,75
57
Dealing with commas (I)
Work line by line
For each line:
split input into fields using TAB as separator
store fields into a list
Alice becomes ['Alice', '90', '90', '90', '90', '90']
58
Dealing with commas (II)
Put within double quotes any entry containing one or more commas Output list entries separated by commas ['"Baker, Ann"', 90, 90, 90, 90, 90] which will become later "Baker, Ann",90,90,90,90,90
59
Dealing with commas (III)
Our troubles are not over: Must store somewhere all lines until we are done Store them in a list
60
Dealing with double quotes
Before wrapping items with commas with double quotes replace All double quotes by pairs of double quotes 'Aguirre, "Lalo" Eduardo' becomes 'Aguirre, ""Lalo"" Eduardo' then '"Aguirre, ""Lalo"" Eduardo"'
61
Order matters (I) We must double the inside double quotes before wrapping the string into double quotes: from 'Aguirre, "Lalo" Eduardo' go to 'Aguirre, ""Lalo"" Eduardo' then to '"Aguirre, ""Lalo"" Eduardo"'
62
Order matters (II) Otherwise:
We go from 'Aguirre, "Lalo" Eduardo' to '"Aguirre, "Lalo" Eduardo"' then to '""Aguirre, ""Lalo"" Eduardo""' with all double quotes doubled
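The two orders can be compared directly. This is a minimal sketch; the function names quote_field and quote_field_wrong are hypothetical and exist only for this illustration.

```python
def quote_field(item):
    """Return item as a CSV field: double the quotes first, then wrap."""
    item = item.replace('"', '""')     # step 1: double inner double quotes
    if ',' in item:
        item = '"' + item + '"'        # step 2: wrap fields containing a comma
    return item

def quote_field_wrong(item):
    """Same steps in the wrong order, to show the failure."""
    if ',' in item:
        item = '"' + item + '"'
    return item.replace('"', '""')     # also doubles the wrapping quotes

name = 'Aguirre, "Lalo" Eduardo'
print(quote_field(name))
print(quote_field_wrong(name))
```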
63
General organization (I)
linelist = []    # new file contents
for line in file :
    itemlist = line.split(…)
    linestring = ''    # start with empty line
    for item in itemlist :
        remove any trailing newline
        double all double quotes
        if item contains comma, wrap
        add to linestring
    append linestring to linelist
64
General organization (II)
for line in file
    …
    remove last comma of linestring
    add newline at end of linestring
    append linestring to linelist
for linestring in linelist
    write linestring into output file
65
The program (I)
# betterconvert2csv.py
""" Convert tab-separated file to csv """
fh = open('grades.txt', 'r')    # input file
linelist = [ ]    # global data structure
for line in fh :    # process an input line
    itemlist = line.split('\t')
    # print(str(itemlist))    # for debugging
    linestring = ''    # start afresh
66
The program (II)
    for item in itemlist :    # process item
        # double all double quotes
        item = item.replace('"', '""')
        if item[-1] == '\n' :    # remove it
            item = item[:-1]
        if ',' in item :    # wrap item
            linestring += '"' + item + '"' + ','
        else :    # just append
            linestring += item + ','
    # end of item loop
67
The program (III)
    # replace last comma by newline
    linestring = linestring[:-1] + '\n'
    linelist.append(linestring)
# end of line loop
fh.close()
fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()
68
Notes Most print statements used for debugging were removed
Space considerations Observe that the inner loop adds a comma after each item Wanted to remove the last one Must also add a newline at end of each line
69
The input file Alice Bob Carol Doe, Jane Fulano, Eduardo "Lalo"
70
The output file Alice,90,90,90,90,90 Bob,85,85,85,85,85 Carol ,75,75,75,75,75 "Doe, Jane",90,90,90 ,80 ,75 "Fulano, Eduardo ""Lalo""",90,90,90,90
71
Mistakes being made (I)
Mixing lists and strings: Earlier draft of program declared linestring = [ ] and did linestring.append(item). Outcome was ['Alice,', '90,', … ] instead of 'Alice,90, …'
72
Mistakes being made (II)
Forgetting to add a newline Output was a single line Doing the append inside the inner loop: Output was Alice,90 Alice,90,90 Alice,90,90,90 …
73
Mistakes being made (III) Forgetting that strings are immutable:
Trying to do linestring[-1] = '\n' instead of linestring = linestring[:-1] + '\n' Bigger issue: Do we have to remove the last comma?
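Both points can be demonstrated in a few lines. This is a sketch with made-up sample values; the join() approach is one possible answer to the "bigger issue" and does not come from the slides.

```python
linestring = "Alice,90,90,"

try:
    linestring[-1] = "\n"              # strings cannot be modified in place
except TypeError as err:
    print("TypeError:", err)

linestring = linestring[:-1] + "\n"    # build a new string instead

# Alternative: join() puts commas only between items, so there is no
# trailing comma to remove
items = ["Alice", "90", "90"]
joined = ",".join(items) + "\n"
print(joined == linestring)
```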
74
Could we have done better? (I)
Make the program more readable by decomposing it into functions A function to process each line of input do_line(line) Input is a string ending with newline Output is a string in CSV format Should call a function processing individual items
75
Could we have done better? (II)
A function to process individual items do_item(item) Input is a string Returns a string With double quotes "doubled" Without a newline Within quotes if it contains a comma
76
The new program (I)
def do_item(item) :
    item = item.replace('"', '""')
    if item[-1] == '\n' :
        item = item[:-1]
    if ',' in item :
        item = '"' + item + '"'
    return item
77
The new program (II)
def do_line(line) :
    itemlist = line.split('\t')
    linestring = ''    # start afresh
    for item in itemlist :
        linestring += do_item(item) + ','
    if linestring != '' and linestring[-1] == ',' :
        linestring = linestring[:-1]
    linestring += '\n'
    return linestring
78
The new program (III)
fh = open('grades.txt', 'r')
linelist = [ ]
for line in fh :
    linelist.append(do_line(line))
fh.close()
fhh = open('great.csv', 'w')
for line in linelist :
    fhh.write(line)
fhh.close()
79
Why it is better Program is decomposed into small modules that are much easier to understand Each fits on a PowerPoint slide
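For comparison, the Python standard library has a csv module whose writer applies the same quoting rules automatically. This is a sketch with made-up rows, not the approach taken on the slides; io.StringIO stands in for the output file so that nothing is written to disk.

```python
import csv
import io

rows = [["Alice", "90", "90"],
        ["Doe, Jane", "90", "80"],
        ['Fulano, Eduardo "Lalo"', "90", "90"]]

out = io.StringIO()                    # stands in for an output file
writer = csv.writer(out, lineterminator="\n")
writer.writerows(rows)                 # quoting and quote doubling are automatic
print(out.getvalue())
```

When writing to a real file, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.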
80
The break statement Makes the program exit the loop it is in
In the next example, we are looking for the first instance of a string in a file. Can exit as soon as it is found
81
Example (I)
searchStr = input('Enter search string:')
found = False
fh = open('grades.txt')
for line in fh :
    if searchStr in line :
        print(line)
        found = True
        break
82
Example (II)
if found :
    print("String %s was found" % searchStr)
else :
    print("String %s NOT found " % searchStr)
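The effect of break can be observed by counting how many lines are examined. In this sketch a list of strings stands in for the file and the counter named checked is hypothetical; both are assumptions made so the example runs without input or a file on disk.

```python
lines = ["Alice\t90\n", "Bob\t85\n", "Carol\t75\n", "Bob\t60\n"]  # stands in for a file
searchStr = "Bob"

found = False
checked = 0                  # number of lines examined so far
for line in lines:
    checked += 1
    if searchStr in line:
        found = True
        break                # leave the loop at the first match
print(found, checked)
```

Only two of the four lines are examined: the loop stops at the first line containing the search string.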
83
Flags A variable like found That can either be True or False
That is used in a condition for an if or a while is often referred to as a flag
84
PICKLED FILES (NOT ON THE QUIZ)
85
Pickled files import pickle
Provides a way to save complex data structures in a file Sometimes said to provide a serialized representation of Python objects
86
Basic primitives (I)
dump(object, fh)
appends a serialized representation of object to the file with file handle fh
object is virtually any Python object
fh is the handle of a file that must have been opened in 'wb' mode
b is a special option that allows writing or reading binary data
87
Basic primitives (II) target = load( filehandle)
assigns to target next pickled object stored in file filehandle target is virtually any Python object filehandle is the filehandle of a file that was opened in rb mode
88
Example (I)
>>> mylist = [ 2, 'Apples', 5, 'Oranges']
>>> fh = open('afile', 'wb')    # b = BINARY
>>> import pickle
>>> pickle.dump(mylist, fh)
>>> fh.close()
89
Example (II)
>>> fhh = open('afile', 'rb')    # b = BINARY
>>> theirlist = pickle.load(fhh)
>>> theirlist
[2, 'Apples', 5, 'Oranges']
>>> theirlist == mylist
True
90
What was stored in afile?
Some binary data containing the strings 'Apples' and 'Oranges'
91
Using ASCII format
Can request a pickled representation of objects that only contains printable characters
Must specify protocol = 0
Advantage: easier to debug
Disadvantage: takes more space
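The printable nature of protocol 0 can be checked without creating a file. This sketch uses pickle.dumps and pickle.loads, the in-memory counterparts of dump and load, which are not covered on the slides. The exact text produced varies between Python versions, so it is only printed, not compared.

```python
import pickle

mydict = {'Alice': 22, 'Bob': 27}

ascii_bytes = pickle.dumps(mydict, protocol=0)   # pickled form as a bytes object

print(ascii_bytes.decode('ascii'))               # decodes cleanly: plain ASCII
print(pickle.loads(ascii_bytes) == mydict)       # round trip restores the dict
```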
92
Example
import pickle
mydict = {'Alice': 22, 'Bob' : 27}
fh = open('asciifile.txt', 'wb')
pickle.dump(mydict, fh, protocol = 0)
fh.close()
fhh = open('asciifile.txt', 'rb')
theirdict = pickle.load(fhh)
print(mydict)
print(theirdict)
93
The output {'Bob': 27, 'Alice': 22} {'Bob': 27, 'Alice': 22}
94
What is inside asciifile.txt?
(dp0VBobp1L27LsVAlicep2L22Ls.
95
Dumping multiple objects (I)
import pickle
fh = open('asciifile.txt', 'wb')
for k in range(3, 6) :
    mylist = [i for i in range(1, k)]
    print(mylist)
    pickle.dump(mylist, fh, protocol = 0)
fh.close()
96
Dumping multiple objects (II)
fhh = open('asciifile.txt', 'rb')
lists = [ ]    # initializing list of lists
while 1 :    # means forever
    try:
        lists.append(pickle.load(fhh))
    except EOFError :
        break
fhh.close()
print(lists)
97
Dumping multiple objects (III)
Note the way we test for end-of-file (EOF)
while 1 :    # means forever
    try:
        lists.append(pickle.load(fhh))
    except EOFError :
        break
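The same end-of-file test can be tried without touching the disk. In this sketch an io.BytesIO object stands in for the binary file; that substitution is an assumption made to keep the example self-contained.

```python
import io
import pickle

buffer = io.BytesIO()                  # stands in for a file opened in 'wb' mode
for k in range(3, 6):
    pickle.dump(list(range(1, k)), buffer)

buffer.seek(0)                         # rewind, as if reopening in 'rb' mode
lists = []
while True:
    try:
        lists.append(pickle.load(buffer))
    except EOFError:                   # raised when no pickled object is left
        break
print(lists)
```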
98
The output [1, 2] [1, 2, 3] [1, 2, 3, 4] [[1, 2], [1, 2, 3], [1, 2, 3, 4]]
99
What is inside asciifile.txt?
(lp0L1LaL2La.(lp0L1LaL2LaL3La.(lp0L1LaL2LaL3LaL4La.
100
Practical considerations
You rarely pick the format of your input files. May have to do format conversion. You often have to use specific formats for your output files, often dictated by the program that will use them. Otherwise stick with pickled files!