Design Exercise UW CSE 140 Winter 2014.

Slides:



Advertisements
Similar presentations
Programming for Linguists
Advertisements

MapReduce.
© Paradigm Publishing, Inc Word 2010 Level 2 Unit 1Formatting and Customizing Documents Chapter 2Proofing Documents.
Introduction to Computing Using Python CSC Winter 2013 Week 8: WWW and Search  World Wide Web  Python Modules for WWW  Web Crawling  Thursday:
WWW 2014 Seoul, April 8 th SNOW 2014 Data Challenge Two-level message clustering for topic detection in Twitter Georgios Petkos, Symeon Papadopoulos, Yiannis.
Data Abstraction UW CSE 190p Summer Recap of the Design Exercise You were asked to design a module – a set of related functions. Some of these functions.
Exercising these ideas  You have a description of each item in a small collection. (30 web sites)  Assume we are looking for information about boxers,
Dictionaries Chapter 17 Slides by Steve Armstrong LeTourneau University Longview, TX  2007,  Prentice Hall.
CS324e - Elements of Graphics and Visualization Java Intro / Review.
More Data Abstraction UW CSE 190p Summer Recap of the Design Exercise You were asked to design a module – a set of related functions. Some of these.
Data Abstraction UW CSE 140 Winter What is a program? – A sequence of instructions to achieve some particular purpose What is a library? – A collection.
Design Exercise UW CSE 140 Winter Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For.
CSE 143 Lecture 11 Maps Grammars slides created by Alyssa Harding
111 Protocols CS 4311 Wirfs Brock et al., Designing Object-Oriented Software, Prentice Hall, (Chapter 8) Meyer, B., Applying design by contract,
More On Classes UW CSE 160 Spring Classes define objects What are objects we've seen? 2.
Introduction to Computing Using Python Data Storage and Processing  How many of you have taken IT 240?  Databases and Structured Query Language  Python.
Design Exercise UW CSE 190p Summer 2012 download examples from the calendar.
Computing Science 1P Large Group Tutorial 14 Simon Gay Department of Computing Science University of Glasgow 2006/07.
Design Exercise UW CSE 160 Spring Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For.
12. MODULES Rocky K. C. Chang November 6, 2015 (Based on from Charles Dierbach. Introduction to Computer Science Using Python and William F. Punch and.
1 PDMLink Application - User Features & Functions Module 6: Search Capabilities.
Data Abstraction UW CSE 160 Spring What is a program? – A sequence of instructions to achieve some particular purpose What is a library? – A collection.
CMSC 104, Section 301, Fall Lecture 18, 11/11/02 Functions, Part 1 of 3 Topics Using Predefined Functions Programmer-Defined Functions Using Input.
Today… Functions, Cont.: –Designing functions. –Functional Decomposition –Importing our own module –A demo: Functional solution to assignment 2. Winter.
Why indexing? For efficient searching of a document
SQL – Python and Databases
CS 330 Class 7 Comments on Exam Programming plan for today:
Computing and Data Analysis
CSc 110, Spring 2017 Lecture 29: Sets and Dictionaries
Algorithmic complexity: Speed of algorithms
The Data Liberation Initiative Training University of Regina December 2, 1999 _______________________ Statistics Canada / Statistique Canada Title.
CSC 458– Predictive Analytics I, Fall 2017, Intro. To Python
Sets and Dictionaries Examples Copyright © Software Carpentry 2010
Map Reduce.
C++ Standard Library.
Methodology – Physical Database Design for Relational Databases
Product Retrieval Statistics Canada / Statistique Canada Title page
Design Exercise UW CSE 160 Winter 2017.
CSc 110, Autumn 2017 Lecture 31: Dictionaries
Collections Michael Ernst UW CSE 140.
Data Abstraction UW CSE 160 Winter 2017.
Ruth Anderson UW CSE 160 Spring 2015
國立臺北科技大學 課程:資料庫系統 fall Chapter 18
Data Abstraction UW CSE 160 Winter 2016.
Haskell Strings and Tuples
CSc 110, Spring 2018 Lecture 33: Dictionaries
Winter 2018 CISC101 12/1/2018 CISC101 Reminders
Design Exercise UW CSE 160 Spring 2018.
CISC101 Reminders Assn 3 due Friday, this week. Quiz 3 next week.

Maps Grammars Based on slides by Marty Stepp & Alyssa Harding
Data Abstraction UW CSE 160 Spring 2018.
CSC 458– Predictive Analytics I, Fall 2018, Intro. To Python
Text Analysis and Search Analytics
Feature Extraction on Twitter Streaming data using Spark RDD
CSE 303 Concepts and Tools for Software Development
Algorithmic complexity: Speed of algorithms
Michael Ernst UW CSE 140 Winter 2013
Using Voyant to Explore Text Data
Rocky K. C. Chang 15 November 2018 (Based on Dierbach)
Protocols CS 4311 Wirfs Brock et al., Designing Object-Oriented Software, Prentice Hall, (Chapter 8) Meyer, B., Applying design by contract, Computer,
6. Dictionaries and sets Rocky K. C. Chang 18 October 2018
slides created by Alyssa Harding
Design Exercise UW CSE 160 Winter 2016.
Algorithmic complexity: Speed of algorithms
Text Analysis and Search Analytics
A Level Computer Science Topic 6: Introducing OOP
Hash Maps Implementation and Applications
Data Abstraction UW CSE 160.
Presentation transcript:

Design Exercise UW CSE 140 Winter 2014

Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions For each function, provide the name of the function a doc string for the function 2) Sketch an implementation of each function In English, describe what the implementation needs to do This will typically be no more than about 4-5 lines per function

Example of high-level “pseudocode” def read_scores(filename) ""“Read scores from filename and return a dictionary mapping words to scores""" open the file For each line in the file, insert the word and its score into a dictionary called scores return the scores dictionary def compute_total_sentiment(searchterm): """Return the total sentiment for all words in all tweets in the first page of results returned for the search term""" Construct the twitter search url for searchterm Fetch the twitter search results using the url For each tweet in the response, extract the text add up the scores for each word in the text add the score to the total return the total

Exercise 1: Text analysis Design a module for basic text analysis with the following capabilities: Compute the total number of words in a file Find the 10 most frequent words in a file. Find the number of times a given word appears in the file. Also show how to use the interface by computing the top 10 most frequent words in the file testfile.txt

Text Analysis, Version 1 def wordcount(filename, word): """Return the count of the given word in the given file""" def top10(filename): """Return a list of the top 10 most frequent words in the given file""" def totalwords(filename): """Return the total number of words in the file""" # program to compute top 10: result = top10("somedocument.txt") print result

Pros: Cons:

Text Analysis, Version 2 def read_words(filename): """Return a list of words in the file""" def wordcount(wordlist, word): """Given a list of words, returns a pair (count, allcounts). count is the number of occurrences of the given word in the list, allcounts is a dictionary mapping words to counts.""" def top10(wordcounts): """Given a dictionary mapping words to counts, return a list of the top 10 most frequent words in the dictionary, from most to least frequent.""" def totalwords(wordlist): """Return total number of words in the given list""" # program to compute top 10: words = read_words(filename) (cnt, allcounts) = wordcount(words, "anyword") result = top10(allcounts)

Pros: Cons:

Text Analysis, Version 3 def read_words(filename): """Return a dictionary mapping each word in filename to its frequency in the file""" def wordcount(wordcounts, word): """Given a dictionary mapping word to counts, return the count of the given word in the dictionary.""" def top10(wordcounts): """Given a dictionary mapping word to counts, return a list of the top 10 most frequent words in the dictionary, from most to least frequent.""" def totalwords(wordcounts): """Given a dictionary mapping word to counts, return the total number of words used to create the dictionary""" # program to compute top 10: wordcounts = read_words(filename) result = top10(wordcounts)

Pros: Cons:

Analysis Consider the 3 designs For each design, state positives and negatives Which one do you think is best, and why?

Changes to text analysis problem Ignore stopwords (common words such as “the”) A list of stopwords is provided in a file, one per line. Show the top k words rather than the top 10.

Design criteria Ease of implementation Generality Documentability More important for client than for library Generality Can it be used in a new situation? Decomposability: Can parts of it be reused? Testability: Can parts of it be tested? Documentability Can you write a coherent description? Extensibility: Can it be easily changed?

Exercise 2: Quantitative Analysis Design a module for basic statistical analysis of files in UWFORMAT with the following capabilities: Create an S-T plot: the salinity plotted against the temperature. Compute the minimum o2 in a file. UWFORMAT: line 0: site temp salt o2 line N: <string> <float> <float> <float>

Quantitative Analysis, Version 1 import matplotlib.pyplot as plt def read_measurements(filename): """Return a list of 4-tuples, each one of the form (site, temp, salt, oxygen)""" def STplot(measurements): """Given a list of 4-tuples, generate a scatter plot comparing salinity and temperature""" def minimumO2(measurements): """Given a list of 4-tuples, return the minimum value of the oxygen measurement"""

Changes UWFORMAT has changed: line 0: site, date, chl, salt, temp, o2 line N: <string>, <string>, <float>, <float>, <float>, <float> Find the average temperature for site “X”

This “default” pattern is so common, there is a special method for it. From Exercise 1: def read_words(filename): """Return a dictionary mapping each word in filename to its frequency in the file """ wordfile = open(filename) worddata = wordfile.read() words = worddata.split() wordfile.close() wordcounts = {} for w in words: if wordcounts.has_key(w): wordcounts[w] = wordcounts[w] + 1 else: wordcounts[w] = 1 return wordcounts This “default” pattern is so common, there is a special method for it.

This “default” pattern is so common, there is a special method for it. setdefault def read_words(filename): """Return a dictionary mapping each word in filename to its frequency in the file""“ wordfile = open(filename) worddata = wordfile.read() words = worddata.split() wordfile.close() wordcounts = {} for w in words: cnt = wordcounts.setdefault(w, 0) wordcounts[w] = cnt + 1 return wordcounts This “default” pattern is so common, there is a special method for it.

setdefault setdefault(key[, default]) for w in words: if wordcounts.has_key(w): wordcounts[w] = wordcounts[w] + 1 else: wordcounts[w] = 1 VS: cnt = wordcounts.setdefault(w, 0) wordcounts[w] = cnt + 1 setdefault(key[, default]) If key is in the dictionary, return its value. If key is NOT present, insert key with a value of default, and return default. If default is not specified, the value None is used.