Using files Taken from notes by Dr. Neil Moore


CS 115 Lecture: Using files. Taken from notes by Dr. Neil Moore

Sequential and random access
- Sequential access: reading or writing the file in order, starting from the beginning and going to the end.
  - Similar to how a for loop works on a list or string.
  - Read the first line, then the second line, etc.
  - No skipping, no backing up!
- Random access: reading or writing without regard to order.
  - “Go to byte 7563 and put a 7 there.”
  - Like lists: we can use mylist[5] without having to go through indexes 0 through 4 first.
  - Also called direct access.

Sequential and random access (continued)
- Random access does not work that well with text files.
  - Bytes don’t always match up with characters.
  - And they definitely don’t match up with lines.
  - At what byte number does line 10 start? You’d have to go through the lines sequentially and count.
- Text files usually use sequential access.
- Binary files can use either sequential or random access.
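The difference between the two access styles can be sketched on a small binary file. This is only an illustration: the scratch file name `data.bin` is hypothetical, and the `seek` method used for the random-access step is standard Python but is not covered in these slides.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.bin")

# Write ten known bytes: the values 0 through 9.
outfile = open(path, "wb")
outfile.write(bytes(range(10)))
outfile.close()

infile = open(path, "rb")
# Sequential access: read bytes in order, starting from the beginning.
first_two = infile.read(2)   # the bytes at offsets 0 and 1
# Random access: jump straight to offset 7 without reading offsets 2 to 6.
infile.seek(7)
seventh = infile.read(1)     # the byte at offset 7
infile.close()

print(first_two, seventh)
```

Because every value here occupies exactly one byte, "offset 7" is well defined. In a text file the position of "line 10" is not, which is why text files are normally read sequentially.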

Reading a line at a time
- Here’s one way to read a line at a time:
      for line in file:
- This technique is very useful but a little inflexible:
  - It only lets us use one line per iteration.
- But what if our file looked like this?
      Student 1
      Score 1
      …
- The readline method lets you control reading more precisely:
      line = infile.readline()

Readline
      line = infile.readline()
- This reads a single line from the file.
- Returns a string, including the newline at the end.
- The next time you read from the file, you get the next line.
  - Even if you use a different input technique next time.
- Example file: readfile-readline.py
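As a sketch of the two-lines-per-record case above, the loop below calls readline twice per iteration: once for the name and once for the score. The file name `scores.txt` and its contents are made up for the example, and this is not the course's readfile-readline.py. It relies on readline returning an empty string at the end of the file.

```python
import os
import tempfile

# Hypothetical input file: a name line followed by a score line, repeated.
path = os.path.join(tempfile.mkdtemp(), "scores.txt")
outfile = open(path, "w")
outfile.write("Alice\n90\nBob\n85\n")
outfile.close()

records = []
infile = open(path, "r")
name = infile.readline()            # "" (empty string) signals end of file
while name != "":
    score = infile.readline()       # the line right after the name
    # strip() removes the trailing newline; int() ignores surrounding whitespace
    records.append((name.strip(), int(score)))
    name = infile.readline()        # next record's name, or "" at the end
infile.close()

print(records)
```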

Buffers and the file pointer
- When you open a file in a Python program, Python and the OS set up a buffer for that file.
  - It’s a piece of memory that holds some of the file’s data before you need it.
- Why?
  - Disks are much, much slower than RAM.
  - And disks are faster if you read from them in large chunks, not byte by byte.
- Where have you seen buffers before?
  - YouTube! “Video is buffering…” The reason is the same: memory is much faster than the network.
  - A keyboard buffer (“type-ahead”).
- When you call readline, etc., the data comes from the buffer.
- If the program asks for data that isn’t in the buffer yet, the OS refills the buffer with new data from the file.

The buffer
- The buffer holds data our program has already read…
- … and data it has not read yet.
- How does it tell the difference?
  - A file pointer indicates the beginning of the unread part.
  - It starts out at the beginning of the file (except in append mode).
- When you do a sequential read, like readline:
  - It starts reading at the location of the file pointer.
  - When it sees the newline character, it advances the pointer to the next byte.
  - So every read gets a different line.
- Sequential access means the file pointer does not skip anything and never moves backward.

Reading a whole file at once
- Python also gives us two methods that read in the whole file at once.
  - That’s much easier if we need to process the contents out of order.
  - Or if we need to process each line several times.
  - But it doesn’t work if the file is larger than RAM.
- readlines gives us the whole file as a list of strings (note the s on the end of the name):
      line_list = infile.readlines()
      infile.close()   # close: the file is exhausted
      for line in line_list:
          ...
  - The lines still contain one newline each, at the end.
  - After the readlines, there is nothing left to read, so you might as well close.
- Technically, it gives you the rest of the file, from the file pointer to the end.
  - Maybe you read in the first line with readline, then called readlines.
- Example file: readfile-readlines.py
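The "rest of the file" behaviour can be checked with a short sketch: read one line with readline, then call readlines and see what is left. The file name `report.txt` and its contents are hypothetical.

```python
import os
import tempfile

# Hypothetical file with a header line followed by two data lines.
path = os.path.join(tempfile.mkdtemp(), "report.txt")
outfile = open(path, "w")
outfile.write("header\nline 1\nline 2\n")
outfile.close()

infile = open(path, "r")
header = infile.readline()     # moves the file pointer past the first line
rest = infile.readlines()      # everything from the file pointer to the end
leftover = infile.readlines()  # the file is exhausted, so this is empty
infile.close()

print(repr(header), rest, leftover)
```

Each string still carries its trailing newline, which is why `repr` is used when printing the header.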

Reading a whole file another way
- read gives us the whole (rest of the) file as a single string, newlines and all:
      content = infile.read()
      infile.close()
  - As with readlines, you might as well close it immediately.
- What to do with the string? You can use split:
      line_list = content.split("\n")
  - Unlike the other methods, this gives you a list of strings without newlines,
  - because split removes the delimiter.

Splitting the string you get from read
- Be careful! If the last line in the file ends in a newline character, there will be an extra empty string in the list of lines:
      content = 'Hello\nWorld\n'
      line_list = content.split('\n')
  gives
      line_list = ['Hello', 'World', '']
- Not just in Python: some OSes and programs treat that as a blank line.
- Example file: readfile-read.py
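The extra empty string is easy to reproduce, and there is more than one way to avoid it. The sketch below shows two options; `splitlines` is a standard string method that these slides do not cover, so treat it as one possible alternative rather than the course's approach.

```python
content = "Hello\nWorld\n"

# split leaves an empty string after the final newline.
with_extra = content.split("\n")

# Option 1: drop the trailing empty string by hand.
trimmed = content.split("\n")
if trimmed[-1] == "":
    trimmed.pop()

# Option 2: splitlines does not produce the trailing empty string.
without_extra = content.splitlines()

print(with_extra, trimmed, without_extra)
```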

Output files
- It’s possible to write to files too.
- First, open the file for writing:
      outfile = open("out.txt", "w")   # w for write mode
  - If the file does not already exist, this creates it.
  - If the file already exists, this truncates it to zero bytes!!
    - It cuts everything out of the file to start over.
    - The old data in the file is lost forever!
  - Opening for writing is both creative and destructive.
- You can also open a file for appending:
      logfile = open("audit.log", "a")   # a for append
  - Adds to the end of an existing file: no truncation, no data lost.
  - If the file doesn’t exist, this still creates it.
- Which to use? Do you want to add to the file or overwrite it?

Open modes summary

Mode     Letter   File exists            File doesn’t exist
Read     r        OK                     IOError (FileNotFoundError in Python 3)
Write    w        Truncates the file     Creates the file
Append   a        OK (adds to the end)   Creates the file
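The three modes in the table can be exercised in a few lines. This is a sketch using a hypothetical scratch file `demo.txt`. It assumes Python 3, where opening a missing file for reading raises FileNotFoundError.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")   # hypothetical scratch file

# "w" creates the file if it is missing...
outfile = open(path, "w")
outfile.write("first\n")
outfile.close()

# ...and truncates it if it already exists: "first" is lost.
outfile = open(path, "w")
outfile.write("second\n")
outfile.close()

# "a" keeps the existing contents and adds to the end.
logfile = open(path, "a")
logfile.write("third\n")
logfile.close()

infile = open(path, "r")
content = infile.read()
infile.close()

# "r" on a file that does not exist raises an error.
missing_raised = False
try:
    open(os.path.join(folder, "no-such-file.txt"), "r")
except FileNotFoundError:
    missing_raised = True

print(repr(content), missing_raised)
```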

Writing to an output file
- There are two ways to write to an output file.
- You can use the same print function that we’ve been using.
  - Just give an extra argument at the end, file = fileobj:
        print("Hello,", name, file = outfile)
  - You can use all the functionality of print:
    - Printing strings, numbers, lists, etc.
    - sep = " ", end = " "
- You can also write a string with the write method.
  - Takes one string argument (nothing else!):
        outfile.write("Hello, world!\n")
  - Does not automatically add a newline character!
- Example file: outfile-write.py
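Both ways of writing can be compared side by side. The sketch below is not the course's outfile-write.py. The file name `greeting.txt`, the variable `name`, and the particular `sep` and `end` values are chosen only for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "greeting.txt")  # hypothetical file
name = "Ada"                                              # hypothetical value

outfile = open(path, "w")
# print converts its arguments to strings, separates them with a space,
# and adds a newline at the end unless told otherwise.
print("Hello,", name, file=outfile)
print(1, 2, 3, sep="-", end="!\n", file=outfile)
# write takes exactly one string and adds nothing, so we supply the newline
# and convert numbers with str() ourselves.
outfile.write("Hello, world!\n")
outfile.write(str(42) + "\n")
outfile.close()

infile = open(path, "r")
content = infile.read()
infile.close()
print(content, end="")
```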

Closing files
- You should close a file when you are finished with it:
      outfile.close()
  - Frees resources (like the file buffer!).
- Closing files is even more important for output files.
  - The buffer isn’t written to the disk until either it’s full or the file is closed.
  - Until you close the file, the most recent buffer-full is not “committed” to disk.
  - And what if your program crashes without closing?
- Closing allows the file to be marked as “closed” (“not busy”) by the OS.
  - Ever had a file that was created by an application that then crashed?
  - Often you cannot do anything with the file because Windows says it is “busy”.
- Example file: outfile.py
- When to close a file?
  - As soon as you know you don’t have to read or write it again.
  - Immediately after your loop.
  - With read or readlines, immediately after that statement.
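A minimal sketch of the close-then-reopen pattern follows. It uses the file object's `closed` attribute, which is standard Python but not mentioned in the slides, to show the state before and after the call. The file name `saved.txt` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "saved.txt")  # hypothetical file

outfile = open(path, "w")
outfile.write("saved\n")             # may still be sitting in the buffer
closed_before = outfile.closed       # False: the file is still open
outfile.close()                      # writes out the buffer, frees resources
closed_after = outfile.closed        # True

# After closing, the data is on disk and can be read back.
infile = open(path, "r")
content = infile.read()
infile.close()                       # read is done, so close immediately

print(closed_before, closed_after, repr(content))
```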

Files versus lists
- Files are:
  - Permanent (“persistent”)
  - Bigger (although they must fit on your secondary storage device)
  - Slower to access
  - Sequential-access (as we do it in this class)
- Lists are:
  - Temporary (life of the program)
  - Smaller (must fit in memory)
  - Faster to access
  - Random or sequential access

Example: letter count
- Let’s see a program to count letters in a file.
- We need 26 counters. 26 different variables?
  - We don’t want to make 26 different variables:
        if char == "A":
            a_count += 1
        elif char == "B":
            b_count += 1
        ….
  - Ugh!
- Instead we’ll make a list of counters.
  - Initialize: counts = [0] * 26
  - That gives you a list of 26 elements which are all zero.
- Read through each character in the file.
  - Find and increment the corresponding counter.

Converting characters to numeric codes
- Last time we learned about ASCII and Unicode.
  - They assign a numeric code to each possible character.
- Python has functions to convert between characters and their codes.
- ord takes a character and returns its numeric code:
      code = ord("A")
  - The argument is a single character as a string; it returns an integer.
- chr takes a numeric code and returns the corresponding character:
      char = chr(65)
  - The argument is an integer; it returns a one-character string.
- Codes below 32 are control characters (newline, tab, …).
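A few concrete values make the two functions easier to remember. The variable names below are chosen only for this sketch.

```python
code = ord("A")            # character to numeric code
char = chr(65)             # numeric code back to a character

# Control characters have codes below 32.
newline_code = ord("\n")
tab_code = ord("\t")

print(code, char, newline_code, tab_code)
```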

Converting characters to numeric codes (continued)
- We can use these to convert letters into a useful list subscript.
  - We want to put the counter for “A” at position 0, the counter for “B” at position 1, etc.
  - So the subscript can be ord(char) - ord("A"), which gives a number from 0 to 25.
- To convert back: chr(i + ord("A")), if i is an integer from 0 to 25.
- Example file: lettercount.py
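Putting the pieces together, a letter counter might look like the sketch below. It is not the course's lettercount.py. The input file `sample.txt` and its contents are made up, and the choices to convert everything to upper case and to skip anything outside A to Z are assumptions of this example.

```python
import os
import tempfile

# Hypothetical input file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
outfile = open(path, "w")
outfile.write("Hello, World!\nABBA\n")
outfile.close()

counts = [0] * 26                  # one counter per letter, all zero

infile = open(path, "r")
for line in infile:                # sequential access, one line at a time
    for char in line.upper():      # treat "a" and "A" as the same letter
        if "A" <= char <= "Z":     # skip punctuation, spaces, newlines
            counts[ord(char) - ord("A")] += 1
infile.close()

# Convert each subscript back to its letter for the report.
for i in range(26):
    if counts[i] > 0:
        print(chr(i + ord("A")), counts[i])
```

A single list with a computed subscript replaces the 26 separate variables and the long if/elif chain.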