Download presentation
Presentation is loading. Please wait.
Published byKelly Hopkins Modified over 8 years ago
1
CS 315 Data Structures Fall 2011 Instructor: B. (Ravi) Ravikumar Office: 116 I Darwin Hall Phone: 664 3335 E-mail: cs315spring11@gmail.com Course Web site: http://ravi.cs.sonoma.edu/cs315fa11
2
Textbook: Data Structures and Algorithm Analysis in C++. 3 rd edition by Mark Allen Weiss Course Schedule Lecture: M W 10:45 – 12, Salazar 2024 Lab: F 9 to 11:50 AM, Darwin 28
3
Data Structures – what is the main focus? (more) programming larger problems (than what was studied in CS 215) new data types (images, audio, text) (systematic) algorithm design performance issues comparison of algorithms time, storage requirements applications image processing compression web search board games
4
Data Structure – what is the main focus? How to organize data in memory so that the we can solve the problem most efficiently? Hard disk RAM CPU
5
Data Structure – what is the main focus? Topics of concern: software design, problem solving and applications online vs. offline phase of a solution preprocessing vs. updating trade-offs between operations efficiency
6
Course Goals Learn to use fundamental data structures: arrays, linked lists, stacks and queues hash table priority queue binary search tree etc. Improve programming skill recursion, classes, algorithm design, implementation build projects using different data structures Analytical and experimental analysis quantitative reasoning about the performance of algorithms (time, storage, etc.) comparing different data structures
7
Course Goals Applications: image storage, manipulation compression (text, image, video etc.) audio processing web search algorithms for networking and communication routing protocols computer interconnections
8
Data Structures – key to software design Data structures play a key role in every type of software. Data structure deals with how to store the data internally while solving a problem in order to Optimize the overall running time of a program the response time (for queries) the memory requirements other resources (e.g. band-width of a network) Simplify software design make solution extendible, more robust
9
Abstract vs. concrete data structures w Abstract data structure (sometimes called ADT -> Abstract Data Type) is a collection of data with a set of operations supported to manipulate the structure w Examples: stack, queue insert, delete priority queue insert, deleteMin Dictionary insert, search, delete w Concrete data structures are the implementations of abstract data structures: Arrays, linked lists, trees, heaps, hash table w A recurring theme: Find the best mapping between abstract and concrete data structures.
10
Abstract Data Structure (ADT) container supporting operations Dictionary search insert primary operations Delete deleteMin Range search Successor secondary operations Merge Priority queue Insert deleteMin Merge, split etc. Secondary operations primary operations
11
Linear data structures key properties of the (1-dim.) array: a sequence of items are stored in consecutive physical memory locations. main advantage: array provides a constant time access to k-th element for any k. (access the element by: Element[k].) other operations are expensive: Search Insert delete
12
2-dim. arrays w Used to store images, tables etc. w Given row number r, and column number s, the element in A[r, s] can be accessed in one clock cycle. w (usually row major or column major order is used.) w Other operations are expensive. w Sparse array representation Used to compress images Trade-offs between storage and time
13
Linked lists w Linked lists: Storing a sequence of items in non- consecutive locations of the memory. Not easy to search for a key (even if sorted). Inserting next to a given item is easy. Array vs. linked list: Don’t need to know the number of items in advance. (dynamic memory allocation) disadvantages order is important
14
stacks and queues stacks: insert and delete at the same end. equivalently, last element inserted will be the first one to be deleted. very useful to solve many problems Processing arithmetic expressions queues: insert at one end, deletion at the other end. equivalently, first element inserted is the first one to be deleted.
15
Non-linear data structures w Various versions of trees Binary search trees Height-balanced trees etc. Lptr key Rptr 15 Main purpose of a binary search tree support dictionary operations efficiently
16
Priority queue w Max priority key is the one that gets deleted next. Equivalently, support for the following operations: insert deleteMin w Useful in solving many problems fast sorting (heap-sorting) shortest-path, minimum spanning tree, scheduling etc.
17
Hashing Supports dictionary operations very efficiently (most of the time). Main advantages: Simple to design, implement on average very fast not good in the worst-case.
18
Applications arithmetic expression evaluation data compression (Huffman coding, LZW algorithm) image segmentation, image compression backtrack searching finding the best path to route in a network geometric problems (e.g. rectangle area)
19
What data structure to use? Example 1: There are more than 1 billion web pages. When you type on google search page something like: You get instantaneous response. What kind of data structure is used here? The details are quite complicated, but the main data structure used is quite simple.
20
Data structure used - inverted index Array of lists – each array entry contains a word and a pointer to all the web pages that contain that word: Data structure 3897 145 297 This list is kept sorted 876 Question: How do we access the array index from key word? Hashing is used.
21
Example 2: The entire landscape of the world is being digitized (there is a whole new branch that combines information technology and geography called GIS – Geographic Information System). What kind of data structure should be used to store all this information? Snapshot from Google earth
22
Some general issues related to GIS How much memory do we need? Can this be stored in one computer? Building the database is done in the background (off-line processing) How fast can the queries be answered? Response to query is called the on-line processing Suppose each square mile is represented by a 1024 by 1024 pixel image, how much storage do we need to store the map of the United States?
23
Calculate the memory needed Very rough estimate of the memory needed: Area of USA is 4 x 10 6 sq miles (roughly) Each square mile needs 10 6 pixels (roughly) Each pixel requires 32 bits usually. Thus the total memory needed = 4 x 10 6 x 32 x 10 6 = 168 x 10 12 = 168000 Giga bits (A standard desk top has ~ 200 Giga bits of memory.) Need about 800 such computers to store the data
24
What data structure to store the images? each 1024 x 1024 image can be stored in a two- dimensional array. (standard way to store all kinds of images – bmp, jpg, png etc.) The actual images are stored in a secondary memory (hard disks on several servers either in a central location or distributed). The number of images would be roughly 4 x 10 6. A set of pointers to these images can be stored in a 1 (or 2) dimensional array. When you click on a point on the map, its index in the array is calculated. From that index, the image is accessed and sent by a network to the requesting client.
25
Some projects from past semesters Generate all the poker hands More generally, given a set of N items and a number k<= N, generate all possible combinations or permutations of k items. (concept: recursion, arrays, lists) Image manipulation: (concept: arrays, library, algorithm analysis) 1)
26
image manipulation: (concept: arrays, library, analysis of algorithm)
27
Bounding box construction: OCR is one of the early success stories in software applications. Scan a printed page and recognize the characters in it. First step: bounding box construction.
28
Input: Output: “In 1830 there were but twenty-three miles of railroad in operation in the United States, and in that year Kentucky took … “ Final step:
29
Spelling checker: Given a text file T, identify all the misspelled words in T. Idea: build a hash table H of all the words in a dictionary, and search for each word of the text T in the table H. For each misspelled word, suggest the correct spelling. (hashing, strings, vectors)
30
Peg solitaire (backtracking, recursion, hash table) Find a sequence of moves that leaves exactly one peg on the board. (starting position can be specified. In some cases, there may be no solution.)
31
Geometric computation problem – given a set of rectangles, determine the total area covered by them. Trace the contour, report all intersections etc. Data structure: binary search tree.
32
Given two photographs of the same scene taken from two different positions, combine them into a single image.
33
Image compression original (compressed x10) (compressed x 50) (Quadtree data structure)
34
Index generation for a document Index contains the list of all the words appearing in a document, with the line numbers in which they appear. Typical index for a book looks: Data structure binary search tree, hash table
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.