Design Exercise UW CSE 160 Spring 2015 1. Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For.

Slides:



Advertisements
Similar presentations
1 After completing this lesson, you will be able to: Check spelling in a document. Check for grammatical errors. Find specific text. Replace specific text.
Advertisements

Programming for Linguists
MapReduce.
© Paradigm Publishing, Inc Word 2010 Level 2 Unit 1Formatting and Customizing Documents Chapter 2Proofing Documents.
WWW 2014 Seoul, April 8 th SNOW 2014 Data Challenge Two-level message clustering for topic detection in Twitter Georgios Petkos, Symeon Papadopoulos, Yiannis.
Data Abstraction UW CSE 190p Summer Recap of the Design Exercise You were asked to design a module – a set of related functions. Some of these functions.
Exercising these ideas  You have a description of each item in a small collection. (30 web sites)  Assume we are looking for information about boxers,
Google’s Map Reduce. Commodity Clusters Web data sets can be very large – Tens to hundreds of terabytes Cannot mine on a single server Standard architecture.
Google’s Map Reduce. Commodity Clusters Web data sets can be very large – Tens to hundreds of terabytes Cannot mine on a single server Standard architecture.
Chapter 2: Algorithm Discovery and Design
Chapter 17 Methodology – Physical Database Design for Relational Databases Transparencies © Pearson Education Limited 1995, 2005.
Dictionaries Chapter Chapter Contents Specifications for the ADT Dictionary A Java Interface Iterators Using the ADT Dictionary A Directory of Telephone.
Dictionaries Chapter 17 Slides by Steve Armstrong LeTourneau University Longview, TX  2007,  Prentice Hall.
Team Dosen UMN Physical DB Design Connolly Book Chapter 18.
CS324e - Elements of Graphics and Visualization Java Intro / Review.
Lecture 9 Methodology – Physical Database Design for Relational Databases.
Basic Web Applications 2. Search Engine Why we need search ensigns? Why we need search ensigns? –because there are hundreds of millions of pages available.
Chapter 6 Functions -- QuickStart. "The Practice of Computing Using Python", Punch & Enbody, Copyright © 2013 Pearson Education, Inc. What is a function?
More Data Abstraction UW CSE 190p Summer Recap of the Design Exercise You were asked to design a module – a set of related functions. Some of these.
Data Abstraction UW CSE 140 Winter What is a program? – A sequence of instructions to achieve some particular purpose What is a library? – A collection.
CIS 451: Regular Expressions Dr. Ralph D. Westfall January, 2009.
Design Exercise UW CSE 140 Winter Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For.
CSE 143 Lecture 11 Maps Grammars slides created by Alyssa Harding
111 Protocols CS 4311 Wirfs Brock et al., Designing Object-Oriented Software, Prentice Hall, (Chapter 8) Meyer, B., Applying design by contract,
CSS446 Spring 2014 Nan Wang.  Java Collection Framework ◦ Set ◦ Map 2.
Building Java Programs Chapter 11 Lecture 11-1: Sets and Maps reading:
More On Classes UW CSE 160 Spring Classes define objects What are objects we've seen? 2.
CSE 143 Lecture 14 (B) Maps and Grammars reading: 11.3 slides created by Marty Stepp
Search Engine Optimisation. On page methodologies –Anything you can affect with the construction of a single page Off page methodologies –Refers to all.
Design Exercise UW CSE 190p Summer 2012 download examples from the calendar.
Computing Science 1P Large Group Tutorial 14 Simon Gay Department of Computing Science University of Glasgow 2006/07.
CS307P-SYSTEM PRACTICUM CPYNOT. B13107 – Amit Kumar B13141 – Vinod Kumar B13218 – Paawan Mukker.
The Data Liberation Initiative Training University of Regina December 2, 1999 _______________________ Statistics Canada / Statistique Canada.
12. MODULES Rocky K. C. Chang November 6, 2015 (Based on from Charles Dierbach. Introduction to Computer Science Using Python and William F. Punch and.
Compsci 06/101, Spring This week in Compsci 6/101 l From sets to dictionaries  Incredibly powerful in solving problems  Power in Perl, Python,
1 PDMLink Application - User Features & Functions Module 6: Search Capabilities.
Data Abstraction UW CSE 160 Spring What is a program? – A sequence of instructions to achieve some particular purpose What is a library? – A collection.
Today… Modularity, or Writing Functions. Winter 2016CISC101 - Prof. McLeod1.
HTML And the Internet. HTML and the Internet ► HTML: HyperText Markup Language  Language in which all pages on the web are written  Not Really a Programming.
Today… Functions, Cont.: –Designing functions. –Functional Decomposition –Importing our own module –A demo: Functional solution to assignment 2. Winter.
Why indexing? For efficient searching of a document
Computing and Data Analysis
CSc 110, Spring 2017 Lecture 29: Sets and Dictionaries
The Data Liberation Initiative Training University of Regina December 2, 1999 _______________________ Statistics Canada / Statistique Canada Title.
Sets and Dictionaries Examples Copyright © Software Carpentry 2010
Map Reduce.
Fundamentals of Programming I Design with Functions
Design Exercise UW CSE 140 Winter 2014.
Design Exercise UW CSE 160 Winter 2017.
CSc 110, Autumn 2017 Lecture 31: Dictionaries
Collections Michael Ernst UW CSE 140.
Data Abstraction UW CSE 160 Winter 2017.
Ruth Anderson UW CSE 160 Spring 2015
Data Abstraction UW CSE 160 Winter 2016.
Haskell Strings and Tuples
CSc 110, Spring 2018 Lecture 33: Dictionaries
Design Exercise UW CSE 160 Spring 2018.
Algorithm Efficiency, Big O Notation, and Javadoc

Data Abstraction UW CSE 160 Spring 2018.
CSE 303 Concepts and Tools for Software Development
Michael Ernst UW CSE 140 Winter 2013
Using Voyant to Explore Text Data
Rocky K. C. Chang 15 November 2018 (Based on Dierbach)
Protocols CS 4311 Wirfs Brock et al., Designing Object-Oriented Software, Prentice Hall, (Chapter 8) Meyer, B., Applying design by contract, Computer,
Design Exercise UW CSE 160 Winter 2016.
Algorithmic complexity: Speed of algorithms
Hash Maps Implementation and Applications
Data Abstraction UW CSE 160.
Presentation transcript:

Design Exercise UW CSE 160 Spring

Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For each function, provide the name of the function a doc string for the function 2) Sketch an implementation of each function – In English, describe what the implementation needs to do – This will typically be no more than about 4-5 lines per function 2

Example of high-level “pseudocode” def read_scores(filename) ""“Read scores from filename and return a dictionary mapping words to scores""" open the file For each line in the file, insert the word and its score into a dictionary called scores return the scores dictionary def compute_total_sentiment(searchterm): """Return the total sentiment for all words in all tweets in the first page of results returned for the search term""" Construct the twitter search url for searchterm Fetch the twitter search results using the url For each tweet in the response, extract the text add up the scores for each word in the text add the score to the total return the total 3

Exercise 1: Text analysis Design a module for basic text analysis with the following capabilities: Compute the total number of words in a file Find the 10 most frequent words in a file. Find the number of times a given word appears in the file. Also show how to use the interface by computing the top 10 most frequent words in the file testfile.txt 4

Text Analysis, Version 1 def countwords(filename, word): """Given a filename and a word, return the count of the given word in the given file.""" def top10(filename): """Given a filename, return a list of the top 10 most frequent words in the given file, from most frequent to least frequent.""" def totalwords(filename): """Given a filename, return the total number of words in the file.""" # program to compute top 10: result = top10("somedocument.txt") 5

Pros: Cons: 6

Text Analysis, Version 2 def read_words(filename): """Given a filename, return a list of words in the file.""" def countwords(wordlist, word): """Given a list of words and a word, returns a pair (count, allcounts_dict). count is the number of occurrences of the given word in the list, allcounts_dict is a dictionary mapping words to counts.""" def top10(wordcounts_dict): """Given a dictionary mapping words to counts, return a list of the top 10 most frequent words in the dictionary, from most to least frequent.""" def totalwords(wordlist): """Return total number of words in the given list.""" # program to compute top 10: word_list = read_words("somedocument.txt") (count, word_dict) = countwords(word_list, "anyword") result = top10(word_dict) 7

Pros: Cons: 8

Text Analysis, Version 3 def read_words(filename): """Given a filename, return a dictionary mapping each word in filename to its frequency in the file""" def countwords(word_counts_dict, word): """Given a dictionary mapping word to counts, return the count of the given word in the dictionary.""" def top10(word_counts_dict): """Given a dictionary mapping word to counts, return a list of the top 10 most frequent words in the dictionary, from most to least frequent.""" def totalwords(word_counts_dict): """Given a dictionary mapping word to counts, return the total number of words used to create the dictionary""" # program to compute top 10: word_dict = read_words("somedocument.txt") result = top10(word_dict) 9

Pros: Cons: 10

Analysis Consider the 3 designs For each design, state positives and negatives Which one do you think is best, and why? 11

Changes to text analysis problem Ignore stopwords (common words such as “the”) – A list of stopwords is provided in a file, one per line. Show the top k words rather than the top

Design criteria Ease of use vs. ease of implementation – Module may be written once but re-used many times Generality – Can it be used in a new situation? – Decomposability: Can parts of it be reused? – Testability: Can parts of it be tested? Documentability – Can you write a coherent description? Extensibility: Can it be easily changed? 13

Exercise 2: Quantitative Analysis Design a module for basic statistical analysis of files in UWFORMAT with the following capabilities: Create an S-T plot: the salinity plotted against the temperature. Compute the minimum o2 in a file. UWFORMAT: line 0: site temp salt o2 line N: 14

Quantitative Analysis, Version 1 import matplotlib.pyplot as plt def read_measurements(filename): """Return a list of 4-tuples, each one of the form (site, temp, salt, oxygen)""" def STplot(measurements): """Given a list of 4-tuples, generate a scatter plot comparing salinity and temperature""" def minimumO2(measurements): """Given a list of 4-tuples, return the minimum value of the oxygen measurement""" 15

Changes UWFORMAT has changed: UWFORMAT2: line 0: site, date, chl, salt, temp, o2 line N:,,,,, Find the average temperature for site “X” 16

From Exercise 1: def read_words(filename): """Given a filename, return a dictionary mapping each word in filename to its frequency in the file""" wordfile = open(filename) worddata = wordfile.read() word_list = worddata.split() wordfile.close() wordcounts = {} for word in word_list: if wordcounts.has_key(word): wordcounts[word] = wordcounts[word] + 1 else: wordcounts[word] = 1 return wordcounts 17 This “default” pattern is so common, there is a special method for it.

setdefault def read_words(filename): """Given a filename, return a dictionary mapping each word in filename to its frequency in the file""" wordfile = open(filename) worddata = wordfile.read() word_list = worddata.split() wordfile.close() wordcounts = {} for word in word_list: count = wordcounts.setdefault(word, 0) wordcounts[word] = count + 1 return wordcounts 18 This “default” pattern is so common, there is a special method for it.

setdefault for word in word_list: if wordcounts.has_key(word): wordcounts[word] = wordcounts[word] + 1 else: wordcounts[word] = 1 VS: for word in word_list: count = wordcounts.setdefault(word, 0) wordcounts[word] = count + 1 setdefault( key[, default ] ) If key is in the dictionary, return its value. If key is NOT present, insert key with a value of default, and return default. If default is not specified, the value None is used. 19