Data Abstraction UW CSE 160 Winter 2016.

Slides:



Advertisements
Similar presentations
Data Abstraction UW CSE 190p Summer Recap of the Design Exercise You were asked to design a module – a set of related functions. Some of these functions.
Advertisements

Abstract Data Types and Encapsulation Concepts
Graph Implementations Chapter 31 Slides by Steve Armstrong LeTourneau University Longview, TX  2007,  Prentice Hall.
Singleton Christopher Chiaverini Software Design & Documentation September 18, 2003.
Chapter 11 Introduction to Classes Intro to Computer Science CS1510, Section 2 Dr. Sarah Diesburg.
CSC1018F: Object Orientation, Exceptions and File Handling Diving into Python Ch. 5&6 Think Like a Comp. Scientist Ch
1 CSC 221: Introduction to Programming Fall 2012 Functions & Modules  standard modules: math, random  Python documentation, help  user-defined functions,
More Data Abstraction UW CSE 190p Summer Recap of the Design Exercise You were asked to design a module – a set of related functions. Some of these.
Data Abstraction UW CSE 140 Winter What is a program? – A sequence of instructions to achieve some particular purpose What is a library? – A collection.
Design Exercise UW CSE 140 Winter Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For.
More On Classes UW CSE 160 Spring Classes define objects What are objects we've seen? 2.
Visualization UW CSE 140 Winter matplotlib Strives to emulate MATLAB – Pro: familiar to MATLAB users – Pro: powerful – Con: not the best design.
Chapter 10 Defining Classes. The Internal Structure of Classes and Objects Object – collection of data and operations, in which the data can be accessed.
Chapter 9 Dictionaries and Sets.
Python Primer 1: Types and Operators © 2013 Goodrich, Tamassia, Goldwasser1Python Primer.
Design Exercise UW CSE 190p Summer 2012 download examples from the calendar.
Chapter 3 Collections. Objectives  Define the concepts and terminology related to collections  Explore the basic structures of the Java Collections.
Design Exercise UW CSE 160 Spring Exercise Given a problem description, design a module to solve the problem 1) Specify a set of functions – For.
AP Computer Science A – Healdsburg High School 1 Unit 9 - Why Use Classes - Anatomy of a Class.
Data Abstraction UW CSE 160 Spring What is a program? – A sequence of instructions to achieve some particular purpose What is a library? – A collection.
David Evans CS201J: Engineering Software University of Virginia Computer Science Lecture 7: A Tale of Two Graphs (and.
CSE 373, Copyright S. Tanimoto, 2002 Abstract Data Types - 1 Abstract Data Types Motivation Abstract Data Types Example Using math. functions to describe.
COMPSCI 107 Computer Science Fundamentals
SQL – Python and Databases
Classes and Objects: Encapsulation
Sharing, mutability, and immutability
COMPSCI 107 Computer Science Fundamentals
Abstract Data Types and Encapsulation Concepts
Software Development Java Classes and Methods
Department of Computer Science,
Today’s Objectives Review the important points of classes
Component Implementations Using RESOLVE
Chapter 3: Using Methods, Classes, and Objects
11.1 The Concept of Abstraction
Computer Science 111 Fundamentals of Programming I
Design Exercise UW CSE 140 Winter 2014.
Design Exercise UW CSE 160 Winter 2017.
Sharing, mutability, and immutability
Creating and Using Classes
Lecture 9 Concepts of Programming Languages
Classes and Objects: Encapsulation
Data Abstraction UW CSE 160 Winter 2017.
Two Way Frequency Tables
SQL – Python and Databases (Continued)
Abstract Data Types and Encapsulation Concepts
Design Exercise UW CSE 160 Spring 2018.
Classes & Objects: Examples
Sharing, mutability, and immutability
Ruth Anderson UW CSE 160 Winter 2017
Data Abstraction UW CSE 160 Spring 2018.
Dynamic Data Structures and Generics
Sharing, mutability, and immutability
Ruth Anderson UW CSE 160 Spring 2018
Lecture 7: A Tale of Two Graphs CS201j: Engineering Software
Computer Science 111 Fundamentals of Programming I
CS100J Lecture 7 Previous Lecture This Lecture Java Constructs
Ruth Anderson UW CSE 160 Winter 2016
Python Primer 1: Types and Operators
Design Exercise UW CSE 160 Winter 2016.
Ruth Anderson CSE 160 University of Washington
Defining Classes and Methods
A Level Computer Science Topic 6: Introducing OOP
Lecture 13 Teamwork Bryan Burlingame 1 May 2019.
Hash Maps Implementation and Applications
CMPE212 – Reminders Assignment 2 due next Friday.
CMSC 202 Encapsulation Version 9/10.
Data Abstraction UW CSE 160.
11.1 The Concept of Abstraction
Lecture 9 Concepts of Programming Languages
Presentation transcript:

Data Abstraction UW CSE 160 Winter 2016

Recall the design exercise We created a module or library: a set of related functions The functions operated on the same data structure a dictionary associating words with a frequency count a list of tuples of measurements Each module contained: A function to create the data structure Functions to query the data structure We could have added functions to modify the data structure

Two types of abstraction Abstraction: Ignoring/hiding some aspects of a thing In programming, ignore everything except the specification or interface The program designer decides which details to hide and to expose Procedural abstraction: Define a procedure/function specification Hide implementation details Data abstraction: Define what the datatype represents Define how to create, query, and modify Hide implementation details of representation and of operations Also called “encapsulation” or “information hiding”

Data abstraction Describing word counts: Which do you prefer? Why? “dictionary mapping each word in filename to its frequency (raw count) in the file represented as an integer” “WordCounts” Which do you prefer? Why? (This must appear in the documentation string of every function related to field measurements!) encapsulation: commits us to a particular implementation, which makes it harder to change down the road

Review: Using the Graph class in networkx import networkx as nx g = nx.Graph() module name alias from networkx import Graph, DiGraph g = Graph() g.add_node(1) g.add_node(2) g.add_node(3) g.add_edge(1, 2) g.add_edge(2, 3) print g.nodes() print g.edges() print g.neighbors(2) Graph and DiGraph are now available in the global namespace

Representing a graph A graph consists of: Representations: nodes/vertices edges among the nodes Representations: Set of edge pairs (a, a), (a, b), (a, c), (b, c), (c, b) For each node, a list of neighbors { a: [a, b, c], b: [c], c: [b] } Matrix with boolean for each entry a b c In terms of information content, all of them are the same. Some might be more efficient in some contexts, and a b c ✓

Text analysis module (group of related functions) representation = dictionary # program to compute top 5: wordcounts = read_words(filename) result = topk(wordcounts, 5) def read_words(filename): """Return dictionary mapping each word in filename to its frequency.""" wordfile = open(filename) word_list = wordfile.read().split() wordfile.close() wordcounts_dict = {} for word in word_list: count = wordcounts_dict.setdefault(word, 0) wordcounts_dict[word] = count + 1 return wordcounts_dict def word_count(wordcounts_dict, word): """Return count of the word in the dictionary. """ return wordcounts_dict.get(word, 0) def topk(wordcounts_dict, k=10): """Return list of (count, word) tuples of the top k most frequent words.""" counts_with_words = [(c, w) for (w, c) in wordcounts_dict.items()] counts_with_words.sort(reverse=True) return counts_with_words[0:k] def total_words(wordcounts_dict): """Return the total number of words.""" return sum(wordcounts_dict.values())

Problems with the implementation # program to compute top 5: wordcounts = read_words(filename) result = topk(wordcounts, 5) The wordcounts dictionary is exposed to the client: the user might corrupt or misuse it. If we change our implementation (say, to use a list), it may break the client program. We prefer to Hide the implementation details from the client Collect the data and functions together into one unit

Datatypes and classes A class creates a namespace for: Variables to hold the data Functions to create, query, and modify Each function defined in the class is called a method Takes “self” (a value of the class type) as the first argument A class defines a datatype An object is a value of that type Comparison to other types: Type is int, value is 22 Type is the class, value is an object also known as an instantiation or instance of that type

Text analysis module (group of related functions) representation = dictionary # program to compute top 5: wordcounts = read_words(filename) result = topk(wordcounts, 5) def read_words(filename): """Return dictionary mapping each word in filename to its frequency.""" wordfile = open(filename) word_list = wordfile.read().split() wordfile.close() wordcounts_dict = {} for word in word_list: count = wordcounts_dict.setdefault(word, 0) wordcounts_dict[word] = count + 1 return wordcounts_dict def word_count(wordcounts_dict, word): """Return count of the word in the dictionary. """ return wordcounts_dict.get(word, 0) def topk(wordcounts_dict, k=10): """Return list of (count, word) tuples of the top k most frequent words.""" counts_with_words = [(c, w) for (w, c) in wordcounts_dict.items()] counts_with_words.sort(reverse=True) return counts_with_words[0:k] def total_words(wordcounts_dict): """Return the total number of words.""" return sum(wordcounts_dict.values())

Text analysis, as a class The type of wc is WordCounts # program to compute top 5: wc = WordCounts() wc.read_words(filename) result = wc.topk(5) class WordCounts: """Represents the words in a file.""" # Internal representation: # variable wordcounts is a dictionary mapping words to their frequency def read_words(self, filename): """Populate a WordCounts object from the given file""" word_list = open(filename).read().split() self.wordcounts = {} for w in word_list: self.wordcounts.setdefault(w, 0) self.wordcounts[w] += 1 def word_count(self, word): """Return the count of the given word""" return self.wordcounts.get(word, 0) def topk(self, k=10): """Return a list of the top k most frequent words in order""" scores_with_words = [(c,w) for (w,c) in self.wordcounts.items()] scores_with_words.sort(reverse=True) return scores_with_words[0:k] def total_words(self): """Return the total number of words in the file""" return sum(self.wordcounts.values()) Defines a class (a datatype) named WordCounts topk takes 2 arguments The type of self is WordCounts Modifies a WordCounts object read_words does not return a value; it mutates self Queries a WordCounts object The namespace of a WordCounts object: dict wordcounts read_words word_count topk total_words fn fn Each function in a class is called a method. Its first argument is of the type of the class. fn fn

# program to compute top 5: wc = WordCounts() wc.read_words(filename) result = wc.topk(5) result = WordCounts.topk(wc, 5) Weird constructor: it does not do any work You have to call a mutator immediately afterward A value of type WordCounts Two equivalent calls A namespace, like a module A function that takes two arguments

Class with constructor # program to compute top 5: wc = WordCounts(filename) result = wc.topk(5) class WordCounts: """Represents the words in a file.""" # Internal representation: # variable wordcounts is a dictionary mapping words to their frequency def __init__(self, filename): """Create a WordCounts object from the given file""" words = open(filename).read().split() self.wordcounts = {} for w in words: self.wordcounts.setdefault(w, 0) self.wordcounts[w] += 1 def word_count(self, word): """Return the count of the given word""" return self.wordcounts.get(word, 0) def topk(self, k=10): """Return a list of the top k most frequent words in order""" scores_with_words = [(c,w) for (w,c) in self.wordcounts.items()] scores_with_words.sort(reverse=True) return scores_with_words[0:k] def total_words(self): """Return the total number of words in the file""" return sum([c for (w,c) in self.wordcounts])

Alternate implementation # program to compute top 5: wc = WordCounts(filename) result = wc.topk(5) class WordCounts: """Represents the words in a file.""" # Internal representation: # variable words is a list of the words in the file def __init__(self, filename): """Create a WordCounts object from the given file""" self.words = open(filename).read().split() def word_count(self, word): """Return the count of the given word""" return self.words.count(word) def topk(self, k=10): """Return a list of the top k most frequent words in order""" scores_with_words = [(self.word_count(w),w) for w in set(self.words)] scores_with_words.sort(reverse=True) return scores_with_words[0:k] def total_words(self): """Return the total number of words in the file""" return len(self.words) Exact same program! The namespace of a WordCounts object: list words __init__ word_count topk total_words fn fn fn fn

Quantitative analysis # Program to plot mydict = read_measurements(filename) STplot(mydict) Quantitative analysis def read_measurements(filename): """Return a dictionary mapping column names to data. Assumes the first line of the file is column names.""" datafile = open(filename) rawcolumns = zip(*[row.split() for row in datafile]) columns = dict([(col[0], col[1:]) for col in rawcolumn]) return columns def tofloat(measurements, columnname): """Convert each value in the given iterable to a float""" return [float(x) for x in measurements[columnname]] def STplot(measurements): """Generate a scatter plot comparing salinity and temperature""" xs = tofloat(measurements, "salt") ys = tofloat(measurements, "temp") plt.plot(xs, ys) plt.show() def minimumO2(measurements): """Return the minimum value of the oxygen measurement""" return min(tofloat(measurements, "o2"))

Quantitative analysis, as a class # Program to plot mm = Measurements() mm.read_measurements(filename) mm.STplot() Quantitative analysis, as a class class Measurements: """Represents a set of measurements in UWFORMAT.""“ def read_measurements(self, filename): """Populate a Measurements object from the given file. Assumes the first line of the file is column names.""" datafile = open(filename) rawcolumns = zip(*[row.split() for row in datafile]) self.columns = dict([(col[0], col[1:]) for col in rawcolumn]) return columns def tofloat(self, columnname): """Convert each value in the given iterable to a float""" return [float(x) for x in self.columns[columnname]] def STplot(self): """Generate a scatter plot comparing salinity and temperature""" xs = tofloat(self.columns, "salt") ys = tofloat(self.columns, "temp") plt.plot(xs, ys) plt.show() def minimumO2(self): """Return the minimum value of the oxygen measurement""" return min(tofloat(self.columns, "o2"))

Quantitative analysis, with a constructor # Program to plot mm = Measurements(filename) mm.STplot() Quantitative analysis, with a constructor class Measurements: """Represents a set of measurements in UWFORMAT.""“ def __init__(self, filename): """Create a Measurements object from the given file. Assumes the first line of the file is column names.""" datafile = open(filename) rawcolumns = zip(*[row.split() for row in datafile]) self.columns = dict([(col[0], col[1:]) for col in rawcolumn]) def tofloat(self, columnname): """Convert each value in the given iterable to a float""" return [float(x) for x in self.columns[columnname]] def STplot(self): """Generate a scatter plot comparing salinity and temperature""" xs = tofloat(self.columns, "salt") ys = tofloat(self.columns, "temp") plt.plot(xs, ys) plt.show() def minimumO2(self): """Return the minimum value of the oxygen measurement""" return min(tofloat(self.columns, "o2"))