Design Exercise UW CSE 160 Winter 2017.

Exercise

Given a problem description, design a module to solve the problem.

1) Specify a set of functions. For each function, provide:
   - the name of the function
   - a doc string for the function
2) Sketch an implementation of each function.
   - In English, describe what the implementation needs to do.
   - This will typically be no more than about 4-5 lines per function.

Example of high-level “pseudocode”

def read_scores(filename):
    """Read scores from filename and return a dictionary mapping
    words to scores"""
    open the file
    For each line in the file,
        insert the word and its score into a dictionary called scores
    return the scores dictionary

def compute_total_sentiment(searchterm):
    """Return the total sentiment for all words in all tweets in the
    first page of results returned for the search term"""
    Construct the twitter search url for searchterm
    Fetch the twitter search results using the url
    For each tweet in the response,
        extract the text
        add up the scores for each word in the text
        add the score to the total
    return the total
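
As a rough illustration, the first pseudocode function might turn into Python along the following lines. The file format (one entry per line, a word or phrase and an integer score separated by a tab) is an assumption; the slides do not specify it. The helper text_sentiment is hypothetical and stands in for the step "add up the scores for each word in the text"; fetching tweets is left out entirely.

```python
def read_scores(filename):
    """Read scores from filename and return a dictionary mapping
    words to scores."""
    scores = {}
    with open(filename) as scorefile:
        for line in scorefile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # ASSUMPTION: each line looks like "<word><TAB><integer score>".
            word, score = line.rsplit("\t", 1)
            scores[word] = int(score)
    return scores


def text_sentiment(text, scores):
    """Hypothetical helper: return the sum of the scores of the words in
    text. Words that have no score count as 0."""
    total = 0
    for word in text.lower().split():
        total += scores.get(word, 0)
    return total
```

Each pseudocode line maps to roughly one line of Python, which is the level of detail the exercise asks for.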

Exercise 1: Text analysis

Design a module for basic text analysis with the following capabilities:
- Compute the total number of words in a file.
- Find the 10 most frequent words in a file.
- Find the number of times a given word appears in the file.

Also show how to use the interface by computing the top 10 most frequent words in the file testfile.txt.

Text Analysis, Version 1

def word_count(filename, word):
    """Given a filename and a word, return the count of the given word
    in the given file."""

def top10(filename):
    """Given a filename, return a list of the top 10 most frequent words
    in the given file, from most frequent to least frequent."""

def total_words(filename):
    """Given a filename, return the total number of words in the file."""

# program to compute top 10:
result = top10("somedocument.txt")
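
One possible way to fill in Version 1 is sketched below. The private helper _read_word_list is hypothetical (it is not part of the slide's interface), words are taken to be whitespace-separated tokens, and ties in frequency are broken alphabetically; all three are assumptions made for the sake of the example.

```python
def _read_word_list(filename):
    """Hypothetical private helper: return the whitespace-separated
    words in the file as a list."""
    with open(filename) as wordfile:
        return wordfile.read().split()


def word_count(filename, word):
    """Given a filename and a word, return the count of the given word
    in the given file."""
    return _read_word_list(filename).count(word)


def top10(filename):
    """Given a filename, return a list of the top 10 most frequent words
    in the given file, from most frequent to least frequent."""
    counts = {}
    for w in _read_word_list(filename):
        counts[w] = counts.get(w, 0) + 1
    # Highest count first; ties broken alphabetically (an assumption).
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return ranked[:10]


def total_words(filename):
    """Given a filename, return the total number of words in the file."""
    return len(_read_word_list(filename))
```

In this sketch every public function opens and reads the file again, which is simple for the caller but repeats work when several questions are asked about the same file.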

Pros: Cons:

Text Analysis, Version 2

def read_words(filename):
    """Given a filename, return a list of words in the file."""

def word_count(wordlist, word):
    """Given a list of words and a word, returns a pair
    (count, allcounts_dict). count is the number of occurrences of the
    given word in the list, allcounts_dict is a dictionary mapping words
    to counts."""

def top10(wordcounts_dict):
    """Given a dictionary mapping words to counts, return a list of the
    top 10 most frequent words in the dictionary, from most to least
    frequent."""

def total_words(wordlist):
    """Return total number of words in the given list."""

# program to compute top 10:
word_list = read_words("somedocument.txt")
(count, word_dict) = word_count(word_list, "anyword")
result = top10(word_dict)
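
A minimal sketch of how Version 2 could be implemented follows, again assuming whitespace-separated words and alphabetical tie-breaking (neither is stated on the slide).

```python
def read_words(filename):
    """Given a filename, return a list of words in the file."""
    with open(filename) as wordfile:
        return wordfile.read().split()


def word_count(wordlist, word):
    """Given a list of words and a word, return a pair
    (count, allcounts_dict)."""
    allcounts_dict = {}
    for w in wordlist:
        allcounts_dict[w] = allcounts_dict.get(w, 0) + 1
    # A word that never appears has a count of 0.
    return (allcounts_dict.get(word, 0), allcounts_dict)


def top10(wordcounts_dict):
    """Given a dictionary mapping words to counts, return a list of the
    top 10 most frequent words, from most to least frequent."""
    ranked = sorted(wordcounts_dict,
                    key=lambda w: (-wordcounts_dict[w], w))
    return ranked[:10]


def total_words(wordlist):
    """Return total number of words in the given list."""
    return len(wordlist)
```

Here the file is read only once, but a caller who only wants the top 10 still has to call word_count with some arbitrary word just to obtain the dictionary.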

Pros: Cons:

Text Analysis, Version 3

def read_words(filename):
    """Given a filename, return a dictionary mapping each word in
    filename to its frequency in the file"""

def word_count(word_counts_dict, word):
    """Given a dictionary mapping word to counts, return the count of
    the given word in the dictionary."""

def top10(word_counts_dict):
    """Given a dictionary mapping word to counts, return a list of the
    top 10 most frequent words in the dictionary, from most to least
    frequent."""

def total_words(word_counts_dict):
    """Given a dictionary mapping word to counts, return the total
    number of words used to create the dictionary"""

# program to compute top 10:
word_dict = read_words("somedocument.txt")
result = top10(word_dict)
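
Version 3 might be filled in as below. As before, whitespace-separated words and alphabetical tie-breaking are assumptions of this sketch rather than part of the slide.

```python
def read_words(filename):
    """Given a filename, return a dictionary mapping each word in
    filename to its frequency in the file."""
    word_counts = {}
    with open(filename) as wordfile:
        for word in wordfile.read().split():
            word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts


def word_count(word_counts_dict, word):
    """Return the count of the given word in the dictionary
    (0 if the word is absent)."""
    return word_counts_dict.get(word, 0)


def top10(word_counts_dict):
    """Return a list of the top 10 most frequent words in the
    dictionary, from most to least frequent."""
    ranked = sorted(word_counts_dict,
                    key=lambda w: (-word_counts_dict[w], w))
    return ranked[:10]


def total_words(word_counts_dict):
    """Return the total number of words used to create the dictionary."""
    # Every occurrence was counted once, so the counts sum to the total.
    return sum(word_counts_dict.values())
```

Every function after read_words works on a plain dictionary, so each can be exercised with a hand-built dictionary and no file at all.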

Pros: Cons:

Analysis

Consider the 3 designs.
- For each design, state positives and negatives.
- Which one do you think is best, and why?

Changes to text analysis problem

- Ignore stopwords (common words such as “the”).
  A list of stopwords is provided in a file, one per line.
- Show the top k words rather than the top 10.
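
To see how a dictionary-based design absorbs these two changes, here is a small sketch. The function names read_stopwords, remove_stopwords, and top_k are hypothetical, and matching stopwords exactly (case-sensitively) is an assumption.

```python
def read_stopwords(filename):
    """Hypothetical: return the set of stopwords listed in filename,
    one per line. Blank lines are ignored."""
    with open(filename) as stopfile:
        return set(line.strip() for line in stopfile if line.strip())


def remove_stopwords(word_counts_dict, stopwords):
    """Hypothetical: return a new dictionary of word counts that leaves
    out every word in stopwords. The input dictionary is not modified."""
    return {word: count
            for word, count in word_counts_dict.items()
            if word not in stopwords}


def top_k(word_counts_dict, k):
    """Hypothetical generalization of top10: return the k most frequent
    words, from most to least frequent (ties broken alphabetically)."""
    ranked = sorted(word_counts_dict,
                    key=lambda w: (-word_counts_dict[w], w))
    return ranked[:k]
```

With a read_words that returns a dictionary of counts, the updated program becomes a short pipeline: read the counts, call remove_stopwords on them, then call top_k on the result.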

Design criteria

- Ease of use vs. ease of implementation
  A module may be written once but re-used many times.
- Generality: Can it be used in a new situation?
- Decomposability: Can parts of it be reused?
- Testability: Can parts of it be tested?
- Documentability: Can you write a coherent description?
- Extensibility: Can it be easily changed?

From Exercise 1:

def read_words(filename):
    """Given a filename, return a dictionary mapping each word in
    filename to its frequency in the file"""
    wordfile = open(filename)
    worddata = wordfile.read()
    word_list = worddata.split()
    wordfile.close()
    wordcounts = {}
    for word in word_list:
        if word in wordcounts:
            wordcounts[word] = wordcounts[word] + 1
        else:
            wordcounts[word] = 1
    return wordcounts

This “default” pattern is so common, there is a special method for it.

setdefault

def read_words(filename):
    """Given a filename, return a dictionary mapping each word in
    filename to its frequency in the file"""
    wordfile = open(filename)
    worddata = wordfile.read()
    word_list = worddata.split()
    wordfile.close()
    wordcounts = {}
    for word in word_list:
        count = wordcounts.setdefault(word, 0)
        wordcounts[word] = count + 1
    return wordcounts

setdefault

for word in word_list:
    if word in wordcounts:
        wordcounts[word] = wordcounts[word] + 1
    else:
        wordcounts[word] = 1

VS:

for word in word_list:
    count = wordcounts.setdefault(word, 0)
    wordcounts[word] = count + 1

setdefault(key[, default])
    If key is in the dictionary, return its value. If key is NOT present, insert key with a value of default, and return default. If default is not specified, the value None is used.

get(key[, default])
    Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
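
The difference between the two methods is easiest to see side by side. The sketch below uses hypothetical function names; group_by_first_letter is an extra illustration (not from the slides) of a case where setdefault's insert-on-miss behavior is useful, and it assumes every word is non-empty.

```python
def count_with_get(word_list):
    """Count words using get: look up with a default, then assign."""
    wordcounts = {}
    for word in word_list:
        wordcounts[word] = wordcounts.get(word, 0) + 1
    return wordcounts


def count_with_setdefault(word_list):
    """Count words using setdefault, as in read_words."""
    wordcounts = {}
    for word in word_list:
        count = wordcounts.setdefault(word, 0)
        wordcounts[word] = count + 1
    return wordcounts


def group_by_first_letter(word_list):
    """Hypothetical example: map each first letter to the list of words
    that start with it. setdefault inserts the empty list on first use
    and returns the stored list, so append updates the dictionary."""
    groups = {}
    for word in word_list:
        groups.setdefault(word[0], []).append(word)
    return groups


# get never changes the dictionary; setdefault inserts the missing key.
d = {}
looked_up = d.get("x", 0)        # d is still empty afterwards
inserted = d.setdefault("x", 0)  # d now contains the key "x"
```

For simple counting the two loops behave identically; setdefault matters most when the default is a mutable value such as a list that should be stored in the dictionary.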