05 Array-Based Sequences

Slides:

Advertisements

Similar presentations

Stacks. 2 Outline and Reading The Stack ADT (§4.2.1) Applications of Stacks (§4.2.3) Array-based implementation (§4.2.2) Growable array-based stack.

Advertisements

Lists and Iterators CSC311: Data Structures 1 Chapter 6 Lists and Iterators Objectives Array Lists and Vectors: ADT and Implementation Node Lists: ADT.

Stacks. 2 Outline and Reading The Stack ADT (§2.1.1) Applications of Stacks (§2.1.1) Array-based implementation (§2.1.1) Growable array-based stack (§1.5)

© 2004 Goodrich, Tamassia Vectors1 Lecture 03 Vectors, Lists and Sequences Topics Vectors Lists Sequences.

1 ES 314 Advanced Programming Lec 2 Sept 3 Goals: Complete the discussion of problem Review of C++ Object-oriented design Arrays and pointers.

C o n f i d e n t i a l Developed By Nitendra NextHome Subject Name: Data Structure Using C Title: Overview of Data Structure.

Stacks. week 2a2 Outline and Reading The Stack ADT (§4.1) Applications of Stacks Array-based implementation (§4.1.2) Growable array-based stack Think.

Unit III : Introduction To Data Structures and Analysis Of Algorithm 10/8/ Objective : 1.To understand primitive storage structures and types 2.To.

TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.

Dynamic Data Structures and Generics Chapter 10. Outline Vectors Linked Data Structures Introduction to Generics.

© 2004 Goodrich, Tamassia Vectors1 Vectors and Array Lists.

Array Lists1 © 2010 Goodrich, Tamassia. Array Lists2 The Array List ADT  The Array List ADT extends the notion of array by storing a sequence of arbitrary.

Object-Oriented Programming © 2013 Goodrich, Tamassia, Goldwasser1Object-Oriented Programming.

Lists1 © 2010 Goodrich, Tamassia. Position ADT  The Position ADT models the notion of place within a data structure where a single object is stored 

Sets and Maps Chapter 9.

CSE373: Data Structures & Algorithms Priority Queues

Lecture 6 of Computer Science II

Lists and Iterators 5/3/2018 Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia,

Lecture 5 of Computer Science II

Chapter 2 Memory and process management

Course Developer/Writer: A. J. Ikuomola

CHP - 9 File Structures.

Ch7. List and Iterator ADTs

COMP 53 – Week Eleven Hashtables.

Data Structure and Algorithms

CSCI 210 Data Structures and Algorithms

Big-O notation.

Sequences and Iterators

Linked Lists Linked Lists 1 Sequences Sequences 07/25/16 10:31

Fundamentals of Programming II Working with Arrays

Introduction to Algorithms

Lists in Lisp and Scheme

13 Text Processing Hongfei Yan June 1, 2016.

© 2013 Goodrich, Tamassia, Goldwasser

Hash functions Open addressing

Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.

Data Structures (CS212D) Overview & Review.

CS313D: Advanced Programming Language

Array Lists, Node Lists & Sequences

Ch7. List and Iterator ADTs

Lists and Iterators 3/9/15 Presentation for use with the textbook Data Structures and Algorithms in Java, 6th edition, by M. T. Goodrich, R. Tamassia,

Vectors 11/23/2018 1:03 PM Growing Arrays Vectors.

Algorithm An algorithm is a finite set of steps required to solve a problem. An algorithm must have following properties: Input: An algorithm must have.

Objective of This Course

" A list is only as strong as its weakest link. " - Donald Knuth

Main Memory Background Swapping Contiguous Allocation Paging

Array-Based Sequences

CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.

Lecture 3: Main Memory.

Cs212: Data Structures Computer Science Department Lecture 7: Queues.

CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.

Recall What is a Data Structure Very Fundamental Data Structures

Data Structures (CS212D) Overview & Review.

CS202 - Fundamental Structures of Computer Science II

Cs212: Data Structures Computer Science Department Lecture 6: Stacks.

Fundamentals of Python: First Programs

Python Primer 1: Types and Operators

Advanced Implementation of Tables

Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures

Introduction to Data Structure

Sets and Maps Chapter 9.

How to use hash tables to solve olympiad problems

Data Structures & Algorithms

Data Structures – Week #7

CS210- Lecture 16 July 11, 2005 Agenda Maps and Dictionaries Map ADT

Vectors and Array Lists

Stacks and Linked Lists

Presentation transcript:

05 Array-Based Sequences Hongfei Yan Mar. 23, 2016

Contents 01 Python Primer (P2-51) 02 Object-Oriented Programming (P57-103) 03 Algorithm Analysis (P111-141) 04 Recursion (P150-180) 05 Array-Based Sequences (P184-224) 06 Stacks, Queues, and Deques (P229-250) 07 Linked Lists (P256-294) 08 Trees (P300-352) 09 Priority Queues (P363-395) 10 Maps, Hash Tables, and Skip Lists (P402-452) 11 Search Trees (P460-528) 12 Sorting and Selection (P537-574) 13 Text Processing (P582-613) 14 Graph Algorithms (P620-686) 15 Memory Management and B-Trees (P698-717) ISLR_Print6

05 Array-Based Sequences 5.1 Python’s Sequence Types 5.2 Low-Level Arrays 5.3 Dynamic Arrays and Amortization 5.4 Efficiency of Python’s Sequence Types 5.5 Using Array-Based Sequences 5.6 Multidimensional Data Sets

5.1 Python Sequence Classes Python has built-in types, list, tuple, and str. Each of these sequence types supports indexing to access an individual element of a sequence, using a syntax such as A[i] Each of these types uses an array to represent the sequence. An array is a set of memory locations that can be addressed using consecutive indices, which, in Python, start with index 0. A 1 2 i n

Public Behaviors A proper understanding of the outward semantics for a class is a necessity for a good programmer. While the basic usage of lists, strings, and tuples may seem straightforward, there are several important subtleties regarding the behaviors associated with these classes (such as what it means to make a copy of a sequence, or to take a slice of a sequence). Having a misunderstanding of a behavior can easily lead to inadvertent bugs in a program. Therefore, we establish an accurate mental model for each of these classes. These images will help when exploring more advanced usage, such as representing a multidimensional data set as a list of lists. mental model 构思模型

Implementation Details A focus on the internal implementations of these classes seems to go against our stated principles of object-oriented programming. we emphasized the principle of encapsulation, noting that the user of a class need not know about the internal details of the implementation. While it is true that one only needs to understand the syntax and semantics of a class’s public interface in order to be able to write legal and correct code that uses instances of the class, the efficiency of a program depends greatly on the efficiency of the components upon which it relies.

Asymptotic and Experimental Analysis In describing the efficiency of various operations for Python’s sequence classes, we will rely on the formal asymptotic analysis notations. We will also perform experimental analyses of the primary operations to provide empirical evidence that is consistent with the more theoretical asymptotic analyses.

5.2 Low-Level Arrays To accurately describe the way in which Python represents the sequence types, we must first discuss aspects of the low-level computer architecture. The primary memory of a computer is composed of bits of information, and those bits are typically grouped into larger units that depend upon the precise system architecture. Such a typical unit is a byte, which is equivalent to 8 bits. A computer system will have a huge number of bytes of memory, and to keep track of what information is stored in what byte, the computer uses an abstraction known as a memory address.

Using the notation for asymptotic analysis, we say that any individual byte of memory can be stored or retrieved in O(1) time. In general, a programming language keeps track of the association between an identifier and the memory address in which the associated value is stored. For example, identifier x might be associated with one value stored in memory, while y is associated with another value stored in memory.

A group of related variables can be stored one after another in a contiguous portion of the computer’s memory. We will denote such a representation as an array. We describe this as an array of six characters, even though it requires 12 bytes of memory. We will refer to each location within an array as a cell, and will use an integer index to describe its location within the array, with cells numbered starting with 0, 1, 2, and so on. Each cell of an array must use the same number of bytes. This requirement is what allows an arbitrary cell of the array to be accessed in constant time based on its index.

a programmer can envision a more typical high-level abstraction of an array of characters as diagrammed in Figure 5.3.

5.2.1 Referential Arrays Python represents a list or tuple instance using an internal storage mechanism of an array of object references. At the lowest level, what is stored is a consecutive sequence of memory addresses at which the elements of the sequence reside. A high-level diagram of such a list is shown. Although the relative size of the individual elements may vary, the number of bits used to store the memory address of each element is fixed (e.g., 64-bits per address). In this way, Python can support constant-time access to a list or tuple element based on its index.

The fact that lists and tuples are referential structures is significant to the semantics of these classes. A single list instance may include multiple references to the same object as elements of the list, and it is possible for a single object to be an element of two or more lists, as those lists simply store references back to that object.

Vectors 11/12/2018 6:00 AM 5.2.2 Compact Arrays Primary support for compact arrays is in a module named array. That module defines a class, also named array, providing compact storage for arrays of primitive data types. Compact arrays have several advantages over referential structures in terms of computing performance. Most significantly, the overall memory usage will be much lower for a compact structure because there is no overhead devoted to the explicit storage of the sequence of memory references (in addition to the primary data). Another important advantage to a compact structure for high-performance computing is that the primary data are stored consecutively in memory. import array Primes = array.array(‘I’, [2,3,5,7,11,13,17,19])

Creating compact arrays of various types Vectors 11/12/2018 6:00 AM Creating compact arrays of various types The constructor for the array class requires a type code as a first parameter, which is a character that designates the type of data that will be stored in the array. import array Primes = array.array(‘I’, [2,3,5,7,11,13,17,19])

Type Codes in the array Class Python’s array class has the following type codes:

5.2 Dynamic Arrays and Amortization

5.3.1 Implementing a Dynamic Array Python list class provides a highly optimized implementation of dynamic arrays, upon which we rely for the remainder of this book, it is instructive to see how such a class might be implemented. The key is to provide means to grow the array A that stores the elements of a list. Of course, we cannot actually grow that array, as its capacity is fixed.

If an element is appended to a list at a time when the underlying array is full, we perform the following steps:

Code Fragment 5.3: An implementation of a DynamicArray class, using a raw array from the ctypes module as storage.

5.4 Efficiency of Python’s Sequence Types

Insertion In an operation add(i, o), we need to make room for the new element by shifting forward the n - i elements A[i], …, A[n - 1] In the worst case (i = 0), this takes O(n) time A 1 2 i n A 1 2 i n A o 1 2 i n

Element Removal In an operation remove(i), we need to fill the hole left by the removed element by shifting backward the n - i - 1 elements A[i + 1], …, A[n - 1] In the worst case (i = 0), this takes O(n) time A 1 2 n o i A 1 2 n i A 1 2 n i

Performance In an array based implementation of a dynamic list: The space used by the data structure is O(n) Indexing the element at I takes O(1) time add and remove run in O(n) time in worst case In an add operation, when the array is full, instead of throwing an exception, we can replace the array with a larger one…

Growable Array-based Array List In an add(o) operation (without an index), we could always add at the end When the array is full, we replace the array with a larger one How large should the new array be? Incremental strategy: increase the size by a constant c Doubling strategy: double the size Algorithm add(o) if t = S.length  1 then A  new array of size … for i  0 to n-1 do A[i]  S[i] S  A n  n + 1 S[n-1]  o

Comparison of the Strategies compare the incremental strategy and the doubling strategy by analyzing the total time T(n) needed to perform a series of n add(o) operations assume that we start with an empty stack represented by an array of size 1 call amortized time of an add operation the average time taken by an add over the series of operations, i.e., T(n)/n

Incremental Strategy Analysis We replace the array k = n/c times The total time T(n) of a series of n add operations is proportional to n + c + 2c + 3c + 4c + … + kc = n + c(1 + 2 + 3 + … + k) = n + ck(k + 1)/2 Since c is a constant, T(n) is O(n + k2), i.e., O(n2) The amortized time of an add operation is O(n)

Doubling Strategy Analysis We replace the array k = log2 n times The total time T(n) of a series of n add operations is proportional to n + 1 + 2 + 4 + 8 + …+ 2k = n + 2k + 1 - 1 = 3n - 1 T(n) is O(n) The amortized time of an add operation is O(1) geometric series 1 2 4 8

5.3.3 Python’s List Class

5.5 Using Array-Based Sequences 5.5.1 Storing High Scores for a Game 5.5.2 Sorting a Sequence 5.5.3 Simple Cryptography

5.5.1 Storing High Scores for a Game

5.5.1 Storing High Scores for a Game Vectors 11/12/2018 6:00 AM 5.5.1 Storing High Scores for a Game Python: How to print a class or objects of class using print()? http://stackoverflow.com/questions/1535327/python-how-to-print-a-class-or-objects-of-class-using-print >>> class Test: ... def __repr__(self): ... return "Test()" ... def __str__(self): ... return "member of Test" ... >>> t = Test() >>> t Test() >>> print t member of TestThe __str__ method is what happens when you print it, and the __repr__ method is what happens when you use the repr() function (or when you look at it with the interactive prompt). If this isn't the most Pythonic method, I apologize, because I'm still learning too - but it works. If no __str__ method is given, Python will print the result of __repr__ instead. If you define __str__ but not __repr__, Python will use what you see above as the __repr__, but still use __str__ for printing.

5.5.2 Sorting a Sequence

5.5.2 Sorting a Sequence

5.5.3 Simple Cryptography An interesting application of strings and lists is cryptography, the science of secret messages and their applications. This field studies ways of performing encryption, which takes a message, called the plaintext, and converts it into a scrambled message, called the ciphertext. Likewise, cryptography also studies corresponding ways of performing decryption, which takes a ciphertext and turns it back into its original plaintext. Arguably the earliest encryption scheme is the Caesar cipher, which is named after Julius Caesar, who used this scheme to protect important military messages. The Caesar cipher involves replacing each letter in a message with the letter that is a certain number of letters after it in the alphabet.

5.6 Multidimensional Data Sets For example, the two-dimensional data might be stored in Python as follows. data = [ [22, 18, 709, 5, 33], [45, 32, 830, 120, 750], [4, 880, 45, 66, 61] ]

Constructing a Multidimensional List initialize a one-dimensional list, rely on a syntax such as data = [0] * n to create a list of n zeros. list comprehension syntax for a 2D list of integers data = [ [0] c for j in range(r) ]

Tic-Tac-Toe

Summary