CSc 110, Spring 2017 Lecture 18: Line-Based File Input

Slides:



Advertisements
Similar presentations
CHAPTER 11 FILE INPUT & OUTPUT Introduction to Computer Science Using Ruby (c) 2012 Ophir Frieder et al.
Advertisements

Unit 6 File processing Special thanks to Roy McElmurry, John Kurkowski, Scott Shawcroft, Ryan Tucker, Paul Beck for their work. Except where otherwise.
Copyright 2006 by Pearson Education 1 Building Java Programs Chapter 6: File Processing.
INTRODUCTION TO PYTHON PART 4 – TEXT AND FILE PROCESSING CSC482 Introduction to Text Analytics Thomas Tiahrt, MA, PhD.
Topic 20 more file processing Copyright Pearson Education, 2010 Based on slides bu Marty Stepp and Stuart Reges from
Copyright © 2010 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Starting Out with Programming Logic & Design Second Edition by Tony Gaddis.
General Programming Introduction to Computing Science and Programming I.
Lecture 06 – Reading and Writing Text Files.  At the end of this lecture, students should be able to:  Read text files  Write text files  Example.
Week 7 Review and file processing. >>> Methods Hello.java public class Hello { public static void main(String[] args){ hello(); } public static void hello(){
Unit 6 File processing Special thanks to Roy McElmurry, John Kurkowski, Scott Shawcroft, Ryan Tucker, Paul Beck for their work. Except where otherwise.
Copyright 2008 by Pearson Education Building Java Programs Chapter 6: File Processing Lecture 6-2: Advanced file input reading: self-check: #7-11.
1 Reminder: String Scanners A Scanner can be constructed to tokenize a particular String, such as one line of an input file. Scanner = new Scanner( );
File Processing Copyright Pearson Education, 2010 Based on slides bu Marty Stepp and Stuart Reges from "We can only.
Copyright 2008 by Pearson Education Building Java Programs Chapter 6 Lecture 6-3: Searching Files reading: 6.3, 6.5.
11/13/07. >>> Overview * lists/arrays * reading files * writing files.
CSc 110, Autumn 2016 Lecture 26: Sets and Dictionaries
Thanks to Atif Memon from UMD for disaster examples
Introduction to Computing Science and Programming I
CSc 110, Autumn 2017 Lecture 16: Fencepost Loops and Review
CSc 110, Autumn 2017 Lecture 17: while Loops and Sentinel Loops
CSc 110, Autumn 2017 Lecture 19: While loops and File Input
Thanks to Atif Memon from UMD for disaster examples
CSc 110, Autumn 2017 Lecture 20: Line-Based File Input
CSc 110, Spring 2017 Lecture 29: Sets and Dictionaries
CSc 110, Autumn 2017 Lecture 21: Line-Based File Input
Adapted from slides by Marty Stepp and Stuart Reges
CSc 120 Introduction to Computer Programing II
Adapted from slides by Marty Stepp and Stuart Reges
Hours question Given a file hours.txt with the following contents:
CSc 110, Autumn 2017 Lecture 24: Lists for Tallying; Text Processing
CSc 110, Autumn 2017 Lecture 18: While loops and File Input
Vi Editor.
Line-based file processing
CSc 110, Spring 2017 Lecture 28: Sets and Dictionaries
CSc 110, Spring 2018 Lecture 32: Sets and Dictionaries
CSc 110, Autumn 2017 Lecture 31: Dictionaries
CSc 110, Autumn 2017 Lecture 30: Sets and Dictionaries
Topic 20 more file processing
Class 9 Reading and writing to files chr, ord and Unicode
CSc 110, Autumn 2016 Lecture 12: while Loops, Fencepost Loops, and Sentinel Loops Adapted from slides by Marty Stepp and Stuart Reges.
CSc 110, Spring 2017 Lecture 11: while Loops, Fencepost Loops, and Sentinel Loops Adapted from slides by Marty Stepp and Stuart Reges.
while loops; file I/O; introduction to lists
CSc 110, Autumn 2016 Lecture 16: File Input.
Building Java Programs
CSc 110, Spring 2018 Lecture 33: Dictionaries
Using files Taken from notes by Dr. Neil Moore
CSc 110, Spring 2018 Lecture 16: Fencepost Loops and while loops
Building Java Programs
Adapted from slides by Marty Stepp and Stuart Reges
Building Java Programs
CSc 110, Autumn 2016 Lecture 5: Loop Figures and Constants
Week 7 Lists Special thanks to Roy McElmurry, John Kurkowski, Scott Shawcroft, Ryan Tucker, Paul Beck for their work. Except where otherwise noted, this.
CSc 110, Spring 2017 Lecture 17: Line-Based File Input
CSc 110, Spring 2017 Lecture 4: Nested Loops and Loop Figures
Adapted from slides by Marty Stepp and Stuart Reges
Adapted from slides by Marty Stepp and Stuart Reges
Lecture 23: Tuples Adapted from slides by Marty Stepp and Stuart Reges
CSc 110, Spring 2018 Lecture 7: input and Constants
CSc 110, Autumn 2016 Lecture 17: Line-Based File Input
CSc 110, Spring 2018 Lecture 22: Line-Based File Input
CSc 110, Spring 2017 Lecture 20: Lists for Tallying; Text Processing
Reading and Writing Files
CSc 110, Spring 2018 Lecture 5: Loop Figures and Constants
CSc 110, Spring 2018 Lecture 27: Lists that change size and File Output Adapted from slides by Marty Stepp and Stuart Reges.
Chapters 6 and 7 Line-Based File Input, Arrays reading: , 7.1
CSc 110, Spring 2018 Lecture 21: Line-Based File Input
CSc 110, Autumn 2016 Programming feel like that?
Building Java Programs
Adapted from slides by Marty Stepp and Stuart Reges
Building Java Programs
Presentation transcript:

CSc 110, Spring 2017 Lecture 18: Line-Based File Input Adapted from slides by Marty Stepp and Stuart Reges Programming feel like that?

IMDb movies problem Consider the following Internet Movie Database (IMDb) data: 1 9.1 196376 The Shawshank Redemption (1994) 2 9.0 139085 The Godfather: Part II (1974) 3 8.8 81507 Casablanca (1942) Write a program that displays any movies containing a phrase: Search word? part Rank Votes Rating Title 2 139085 9.0 The Godfather: Part II (1974) 40 129172 8.5 The Departed (2006) 95 20401 8.2 The Apartment (1960) 192 30587 8.0 Spartacus (1960) 4 matches Note: the file is imdb.txt

Pseudocode ask the user for a search word open the IMDb data file create a list of the files contents if search word is in the list of files contents print the header for the output set matches counter for each line in the list if the search word is in the line increment the matches counter print the line in the proper format print the number of matches 1 9.1 196376 The Shawshank Redemption (1994 2 9.0 139085 The Godfather: Part II (1974) 3 8.8 81507 Casablanca (1942) Rank Votes Rating Title 2 139085 9.0 The Godfather: …

Functions for imdb Start with these: get_phrase() - use input() as usual - returns a string search_list(line_list, search_word) - searches line_list to find search_word - if finds search_word, returns the line - if no match found, returns an empty string We want to put in debugging prints statements We want to know the structure of imdb.txt (Let's take a look at it before writing the code.)

Remaining functions for imdb search_line(line, search_word) - searches string line to find search_word - if finds search_word, returns the line - if no match found, returns an empty string

Remaining functions for imdb display(line) - line is a string A string in line looks like this: '2 9.0 139085 The Godfather: Part II (1974)' The structure of the output is: Rank Votes Rating Title 2 139085 9.0 The Godfather: Part II (1974) What does the method split() do? -creates a list from a string -by default uses whitespace (thiink spaces) to delimit the elements

Helpful strategies Use print statements for debugging Make your input file small to start with Know the structure of your input file Review the methods and functions for lists and strings

IMDb main # Displays IMDB's Top 250 movies that match a search string. def main(): search_word = get_phrase() file = open("imdb.txt") line_list = file.readlines() line = search_list(line_list, search_word) if (len(line) > 0): print("Rank\tVotes\tRating\tTitle") matches = 0 for a_line in line_list: ans = search_line(a_line, search_word) if (len(ans) > 0): matches = matches + 1 display(a_line) print(str(matches) + " matches.") # Asks the user for their search word and returns it. def get_phrase(): search_word = input("Search word: ") search_word = search_word.lower() print() return search_word ...

IMDb functions ... # Searches a file for a line that match the search word. def search_list(line_list, search_word): for line in line_list: line_lower = line.lower() # case-insensitive match if (search_word in line_lower): return line return "" # not found # Looks for the search word in a single line. def search_line(line, search_word): line_lower = line.lower() # case-insensitive match # displays the line in the proper format on the screen. def display(line): parts = line.split() rank = parts[0] rating = parts[1] votes = parts[2] title = "" for i in range(3, len(parts)): title = title + parts[i] + " " # the rest of the line print(rank + "\t" + votes + "\t" + rating + "\t" + title)

"Chaining" main should be a concise summary of your program. It is bad if each function calls the next in a nested structure (we call this chaining): A better structure has main make most of the calls. Functions must return values to main to be passed on later. main functionA functionB functionC functionD main functionA functionB functionD

File output 11

Output to files Open a file in write or append mode 'w' - write mode – replaces everything in the file 'a' – append mode – adds to the bottom of the file preserving what is already in it name = open("filename", "w") # write name = open("filename", "a") # append

Output to files name.write(str) - writes the given string to the file name.close() - closes file once writing is done Example: out = open("output.txt", "w") out.write("Hello, world!\n") out.write("How are you?") out.close() text = open("output.txt").read() # Hello, world!\nHow are you? common PrintStream bug: - declaring it in a method that gets called many times. This causes the file to be re-opened and wipes the past contents. So only the last line shows up in the file.

Removing short words Write a program that reads a file, writes the file contents , and then creates a new file containing only the words greater than 3 characters.

Removing short words def main(): file = open("poem.txt") text = file.read() print(text) outfile = open("nosmallwords.txt", "w") text = text.split() for word in text: if len(word) > 3: print(word) outfile.write(word) outfile.close() main()