CSc 110, Spring 2017 Lecture 20: Lists for Tallying; Text Processing

Presentation transcript:

CSc 110, Spring 2017 Lecture 20: Lists for Tallying; Text Processing Adapted from slides by Marty Stepp and Stuart Reges

Value/Reference Semantics - Review
Variables of type int, float, and bool store values directly. Values are copied from one variable to another:
    cats = age
    before: age = 20, cats = 3
    after:  age = 20, cats = 20
Variables of other types (like lists) store references to memory. References are copied from one variable to another:
    scores = grades
    index  0   1   2
    value  89  78  93
    (grades and scores both refer to this one list)
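
The review above can be tried out with a short sketch. The values 21 and 100 and the name backup are illustrative assumptions, not part of the slides.

```python
# Following the slide's model: cats receives the value 20 from age.
age = 20
cats = 3
cats = age        # cats is now 20
age = 21          # reassigning age afterwards does not affect cats

# Reference semantics: both names refer to the same list.
grades = [89, 78, 93]
scores = grades   # copies the reference, not the list
scores[0] = 100   # the change is visible through grades as well

# An independent copy has to be requested explicitly.
backup = list(grades)   # hypothetical name, for illustration only
backup[1] = 0           # grades is not changed by this

print(cats, age)
print(grades)
print(scores is grades)
```

Because scores and grades name the same list, a function that receives a list as a parameter can modify the caller's list in the same way.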

Lists for Tallying

Extracting digits
Given a number, how do we extract the digits one at a time? Ex: 590823
Hint: use % and //
    >>> n = 590823
    >>> n % 10
    3
    >>> n = n // 10
    >>> n
    59082
    >>> n % 10
    2
    >>> n = n // 10
    >>> n
    5908
    >>> n % 10
    8
    >>>
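
The same % and // steps can be wrapped in a loop. This is a minimal sketch; the function name extract_digits is an assumption for illustration and does not appear in the slides.

```python
def extract_digits(n):
    """Return the digits of a non-negative integer, last digit first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % 10)   # the last digit
        n = n // 10             # drop the last digit
    return digits

print(extract_digits(590823))   # [3, 2, 8, 0, 9, 5]
```

The digits come out in reverse order, which does not matter for tallying because only the count of each digit is needed.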

A tallying problem
Problem: Write a function most_frequent_digit(n) that returns the digit of a number n that occurs most frequently.
Example: the number 669260267 contains one 0, two 2s, four 6s, one 7, and one 9.
    most_frequent_digit(669260267) returns 6.
If there is a tie, return the digit with the lower value.
    most_frequent_digit(57135203) returns 3.

A tallying problem
This is well-suited for a list. Note that there are 10 digits, so consider a list of 10 elements.
The value at index i holds the number of occurrences of digit i.
Example for 669260267:
    index  0  1  2  3  4  5  6  7  8  9
    value  1  0  2  0  0  0  4  1  0  1

Creating a list of tallies
    # assume n = 669260267
    counts = [0] * 10
    while (n > 0):
        # pluck off a digit and add to its counter
        digit = n % 10
        counts[digit] = counts[digit] + 1
        n = n // 10
counts after the loop:
    index  0  1  2  3  4  5  6  7  8  9
    value  1  0  2  0  0  0  4  1  0  1

Tally solution
    # Returns the digit value that occurs most frequently in n.
    # Breaks ties by choosing the smaller value.
    def most_frequent_digit(n):
        counts = [0] * 10
        while (n > 0):
            digit = n % 10    # pluck off a digit and tally it
            counts[digit] = counts[digit] + 1
            n = n // 10

        # find the most frequently occurring digit
        best_index = 0
        for i in range(1, len(counts)):
            if (counts[i] > counts[best_index]):
                best_index = i
        return best_index

Data transformations
In many problems we transform data between forms.
    Example: digits -> count of each digit -> most frequent digit
A transformation is often computed and stored as a list.
Sometimes we map between data and list indexes.
    tally (if the digit is i, store its count at index i)
The problem structure affects the mapping.
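
As a further illustration of mapping data to list indexes, the tally idea also works for text. The sketch below is not from the slides: the function name count_letters and the choice of ord(ch) - ord('a') as the mapping are assumptions made for this example.

```python
def count_letters(text):
    """Tally how often each letter a-z occurs in text, ignoring case."""
    counts = [0] * 26
    for ch in text.lower():
        if 'a' <= ch <= 'z':
            index = ord(ch) - ord('a')   # 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
            counts[index] = counts[index] + 1
    return counts

counts = count_letters("Hello, world")
print(counts[ord('l') - ord('a')])   # 3
print(counts[ord('o') - ord('a')])   # 2
```

Here the mapping is "letter to its position in the alphabet" rather than "digit to itself", but the list plays the same role as in the digit tally.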

Section attendance question
Read a file of section attendance (see next slide for structure) and produce the following output:
    Section 1
    Student points: [20, 16, 17, 14, 11]
    Student grades: [100.0, 80.0, 85.0, 70.0, 55.0]
    Section 2
    Student points: [16, 19, 14, 14, 8]
    Student grades: [80.0, 95.0, 70.0, 70.0, 40.0]
    Section 3
    Student points: [16, 15, 16, 18, 14]
    Student grades: [80.0, 75.0, 80.0, 90.0, 70.0]
Students earn up to 3 points for each section attended, to a maximum of 20 points.

Section input file
    student   123451234512345123451234512345123451234512345
    week      1    2    3    4    5    6    7    8    9
    section 1 yynyyynayayynyyyayanyyyaynayyayyanayyyanyayna
    section 2 ayyanyyyyayanaayyanayyyananayayaynyayayynynya
    section 3 yyayaynyyayyanynnyyayyanayaynannnyyayyayayny
Each line represents 9 weeks of attendance data for a section.
Each week has 5 characters because there are 5 students in all sections.
Within each week, each character represents one student's attendance:
    a means the student was absent (+0 points)
    n means they attended but didn't do the problems (+2 points)
    y means they attended and did the problems (+3 points)

Section input file (fragment)
Look at 2 weeks of one section (one line of the file). For the character at index i, the student's list index is i % 5.
    index i   0  1  2  3  4  5  6  7  8  9
    i % 5     0  1  2  3  4  0  1  2  3  4
    student   1  2  3  4  5  1  2  3  4  5
    value     each character is y, n, or a
We need a list of length 5 to accumulate the points for each student.
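
The i % 5 mapping can be checked on a short fragment. In this sketch the function name marks_by_student and the num_students parameter are assumptions for illustration; the fragment is the first ten characters of the section 1 line.

```python
def marks_by_student(line, num_students=5):
    """Group the attendance characters of one section line by student."""
    marks = [""] * num_students
    for i in range(len(line)):
        student = i % num_students   # list index 0..4 within each week
        marks[student] = marks[student] + line[i]
    return marks

two_weeks = "yynyyynaya"   # weeks 1 and 2 of section 1
print(marks_by_student(two_weeks))   # ['yy', 'yn', 'na', 'yy', 'ya']
```

Each string in the result holds one student's marks across the weeks, which is the same grouping the points list relies on.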

Get the points for each student of a section
    # Computes points earned for each student in a particular section.
    def count_points(line):
        points = [0] * 5
        for i in range(0, len(line)):
            student = i % 5
            earned = 0
            if (line[i] == 'y'):    # values are 'y', 'n' or 'a'
                earned = 3
            elif (line[i] == 'n'):
                earned = 2
            points[student] = points[student] + earned
        return points
Note: this version does not yet cap the points earned at 20; fix the code to apply the cap.

Section Attendance - Answer
    # This program reads a file representing which students attended
    # which discussion sections and produces output of the students'
    # section attendance and scores.
    def main():
        file = open("sections.txt")
        lines = file.readlines()
        section = 1
        for line in lines:
            # process one section
            points = count_points(line)
            grades = compute_grades(points)
            results(section, points, grades)
            section += 1

    # Produces all output about a particular section.
    def results(section, points, grades):
        print("Section " + str(section))
        print("Student points: " + str(points))
        print("Student grades: " + str(grades))
        print()
    ...

Section Attendance - answer
    ...
    # Computes the points earned for each student for a particular section.
    def count_points(line):
        points = [0] * 5
        for i in range(0, len(line)):
            student = i % 5
            earned = 0
            if (line[i] == 'y'):    # values are 'y', 'n' or 'a'
                earned = 3
            elif (line[i] == 'n'):
                earned = 2
            points[student] = min(20, points[student] + earned)
        return points

    # Computes the percentage for each student for a particular section.
    def compute_grades(points):
        grades = [0] * 5
        for i in range(0, len(points)):
            grades[i] = 100.0 * points[i] / 20
        return grades
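
To experiment with the cap without a sections.txt file, the two helper functions can be run on a made-up line. This is a sketch under stated assumptions: the constant MAX_POINTS, the num_students parameter, and the sample data are introduced here for illustration and are not part of the slides.

```python
MAX_POINTS = 20   # assumed name for the cap used in the slides

def count_points(line, num_students=5):
    """Return the capped points for each student in one section line."""
    points = [0] * num_students
    for i in range(len(line)):
        student = i % num_students
        earned = 0
        if line[i] == 'y':
            earned = 3
        elif line[i] == 'n':
            earned = 2
        points[student] = min(MAX_POINTS, points[student] + earned)
    return points

def compute_grades(points):
    """Convert each student's points to a percentage of MAX_POINTS."""
    return [100.0 * p / MAX_POINTS for p in points]

# Made-up data: nine identical weeks, with a trailing newline as
# readlines() would leave it; strip() removes it before counting.
sample = ("yynaa" * 9 + "\n").strip()
points = count_points(sample)
print(points)                   # [20, 20, 18, 0, 0]
print(compute_grades(points))   # [100.0, 100.0, 90.0, 0.0, 0.0]
```

The first two students would reach 27 points without the min call, so the sample shows the cap taking effect.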