Session 11: Data Structures and Collections
FEN 2011-02-05, UCN T&B: IT Technology
- Lists (array-based, linked)
- Sorting and Searching
- Hashing
- Trees
- System.Collections.Generic



Lists
A data structure in which elements are organised by position (index). Lists are sometimes called sequences.
- ArrayList (List): one fixed-size segment in memory.
- LinkedList: each element has a reference to the next element, so elements may be allocated at different memory locations.

ArrayList
Array-based:
- Fixed size (statically allocated).
- Always occupies its maximum memory.
- May grow or shrink dynamically, but that requires allocating a new array and copying the elements into it.
- Direct access to elements by position (index); otherwise searching is required.
- Inserting and deleting in the middle of the list requires moving (many) elements.

Linked Lists (LinkedList)
A linked list consists of nodes representing elements. Each node contains a value (or a reference to a value) and a reference (pointer) to the next node.

Linked Lists (LinkedList)
- The list itself is represented by a reference to the first element, often called head.
- The next-reference of the last element is usually null.
- The linked list is dynamic in size: it grows and shrinks as needed.
- Access by position is slow (may require traversing the whole list).

Figure 4.1: a) A linked list of integers; b) insertion; c) deletion

Implementation: class Node
    private class Node
    {
        private object val;   // the stored value
        private Node next;    // reference to the next node; null at the end of the list

        public Node(object v, Node n) { val = v; next = n; }

        public object Val
        {
            get { return val; }
            set { val = value; }
        }

        public Node Next
        {
            get { return next; }
            set { next = value; }
        }
    }

Linked Implementation of ADT List
    class LinkedList
    {
        // private class Node ... (the nested node class)

        private Node head, tail;  // first and last node; both null when the list is empty
        private int n;            // number of elements

        public LinkedList()
        {
            head = null;
            tail = null;
            n = 0;
        }

        public int Count
        {
            get { return n; }
        }

        public void AddFront(object o)
        {
            Node tmp = new Node(o, null);
            if (Count == 0)       // list is empty: the new node is also the last node
                tail = tmp;
            else
                tmp.Next = head;  // link the new node in front of the old head
            head = tmp;
            n++;
        }
    }

Traversing a Linked List
    public void Print()  // for debugging
    {
        Node p = head;                   // start of list
        while (p != null)                // while not at the end of the list
        {
            Console.WriteLine(p.Val);    // print current value
            p = p.Next;                  // move p to the next element of the list
        }
    }

Finding a Position in a Linked List
    public int FindPos(object o)
    {
        // Returns the position of o in the list (counting from 0).
        // If o is not contained in the list, -1 is returned.
        bool found = false;
        int i = 0;
        Node p = head;
        while (!found && p != null)
        {
            if (p.Val.Equals(o))
                found = true;
            else
            {
                p = p.Next;
                i++;
            }
        }
        if (found) return i;
        else return -1;
    }

Dynamic vs. Static Data Structures
Array-based lists:
- Fixed (static) size (waste of memory).
- May be able to grow and shrink (ArrayList), but this is very expensive in running time (O(n)).
- Provide direct access to elements by index (O(1)).
- May be sorted; binary search then gives fast access (O(log n)).
Linked list implementations:
- Use only the necessary space (grow and shrink as needed).
- Overhead for references and memory allocation.
- Only sequential access: access by index requires searching (expensive: O(n)).

Linked List Variants
- Using a tail reference
- Using a dummy head node
- Circular
- Doubly linked list (operations become more complicated)
- The Full Monty (LinkedList)

Search Trees: Dynamic Data Structures with Fast Search
- Binary Trees
- Binary Search Trees
- General Trees (Composite Pattern)
- Balanced Search Trees (2-3 Trees etc.)
- B-Trees (external, database index)

Terminology
General trees:
- leaf / external node / terminal node
- root
- internal node
- siblings, children, parents, ancestors, descendants
- subtrees
- the depth or height of a node = the number of its ancestors
- the depth or height of a tree = the maximum depth/height of any leaf

Binary Trees
A binary tree can be defined recursively:
- Either the tree is empty,
- or the tree is composed of a root with left and right subtrees, which are binary trees themselves.
Note: in contrast to general trees, binary trees
- have ordered subtrees (left and right)
- may be empty

Reference-Based Implementation

Figure 10.9: Traversals of a binary tree: a) preorder; b) inorder; c) postorder

Binary Search Trees
Value-based container.
The search tree property:
- For any internal node: the value in the node is greater than every value in its left subtree.
- For any internal node: the value in the node is less than every value in its right subtree.
- Note the recursive nature of this definition: it implies that all subtrees are themselves search trees.
- Every operation must ensure that the search tree property is maintained (invariant).
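
The search tree property can be illustrated with a minimal sketch in Java, the language of the Big-O examples later in these slides. The class name IntSearchTree and its method names are assumptions made for illustration, not part of the course material. The sketch is unbalanced and ignores duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (hypothetical class): an unbalanced binary search tree of ints.
class IntSearchTree {
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root; // null when the tree is empty

    // Inserts value while preserving the search tree property; duplicates are ignored.
    void insert(int value) {
        root = insert(root, value);
    }

    private static Node insert(Node node, int value) {
        if (node == null) return new Node(value);
        if (value < node.value) node.left = insert(node.left, value);
        else if (value > node.value) node.right = insert(node.right, value);
        return node;
    }

    // Follows a single path from the root, so the cost is proportional to the height.
    boolean contains(int value) {
        Node p = root;
        while (p != null) {
            if (value == p.value) return true;
            p = (value < p.value) ? p.left : p.right;
        }
        return false;
    }

    // Inorder traversal: left subtree, then the node, then the right subtree.
    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Integer> out) {
        if (node == null) return;
        collect(node.left, out);
        out.add(node.value);
        collect(node.right, out);
    }

    public static void main(String[] args) {
        IntSearchTree t = new IntSearchTree();
        for (int v : new int[] {50, 30, 70, 20, 40, 60, 80}) t.insert(v);
        System.out.println(t.inOrder());
        System.out.println(t.contains(60));
    }
}
```

Because every insertion keeps smaller values to the left and larger values to the right, the inorder traversal returns the values in ascending order.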

Example: A Binary Search Tree Holding Names

Balance Problems (skewed tree): values are inserted in sorted order

InOrder Traversal Visits Nodes in Sorted Order

Efficiency
- insert, retrieve, delete: all depend on the depth of the tree.
- If insertions and deletions are uniformly distributed, then the tree will eventually grow skewed.
- O(log n) when the tree is balanced / O(n) when it is skewed.
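
How strongly the depth depends on the insertion order can be checked with a small sketch. It assumes plain, unbalanced insertion and counts height in edges (the number of ancestors of the deepest leaf). The class TreeHeightDemo and its method names are hypothetical.

```java
// Illustrative sketch (hypothetical class): compares tree height for two insertion orders.
class TreeHeightDemo {
    private static final class Node {
        final int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Plain (unbalanced) BST insertion, written iteratively.
    static Node insert(Node root, int value) {
        Node fresh = new Node(value);
        if (root == null) return fresh;
        Node p = root;
        while (true) {
            if (value < p.value) {
                if (p.left == null) { p.left = fresh; return root; }
                p = p.left;
            } else {
                if (p.right == null) { p.right = fresh; return root; }
                p = p.right;
            }
        }
    }

    // Height counted in edges; the empty tree has height -1.
    static int height(Node node) {
        if (node == null) return -1;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Inserts 1..n in ascending order: every node gets only a right child.
    static int heightSortedOrder(int n) {
        Node root = null;
        for (int v = 1; v <= n; v++) root = insert(root, v);
        return height(root);
    }

    // Inserts the middle value first, then recursively the middles of both halves.
    static int heightMiddleFirst(int n) {
        Node[] root = new Node[1];
        insertMiddleFirst(root, 1, n);
        return height(root[0]);
    }

    private static void insertMiddleFirst(Node[] root, int lo, int hi) {
        if (lo > hi) return;
        int mid = lo + (hi - lo) / 2;
        root[0] = insert(root[0], mid);
        insertMiddleFirst(root, lo, mid - 1);
        insertMiddleFirst(root, mid + 1, hi);
    }

    public static void main(String[] args) {
        System.out.println(heightSortedOrder(127));
        System.out.println(heightMiddleFirst(127));
    }
}
```

With 127 values, sorted insertion gives a chain of height 126, while the middle-first order gives a perfectly balanced tree of height 6. That difference is the gap between O(n) and O(log n).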

Solution: Balanced Search Trees
Trading space for time:
- In the worst case, additional space in O(n) is required; but
- retrieve, insert and delete run in O(log n), also in the worst case.
Principle:
- A node may hold several keys (n) and has several children (n+1).
- A node must be at least half filled (n/2 keys).
- Insert and delete can be performed in O(log n) in such a way that the tree is kept balanced.
2-3-tree: n = 2

2-3-Trees (n=2)

Retrieve
Search using the same principle as in binary search trees:
- Search the root.
- If not found, then search recursively in the appropriate subtree.
- Performance is proportional to the height of the tree.
- Since the tree is balanced: O(log n)

Insertion
The insert algorithm must ensure that the 2-3-tree properties are preserved. It goes like this:
- Search down through the tree to the appropriate leaf node and insert.
- If there is room in the leaf, then we are done.
- Otherwise, split the leaf node into two new leaves and move the middle value up into the parent node.
- If there is no room in the parent, then continue recursively until a node with room is reached, or
- eventually the root is reached. If there is no room in the root, then a new root is created, and the height of the tree is increased.
- Performance depends on the height of the tree (searching down through the tree, plus in the worst case a trip from the leaf to the root, rebalancing on the way up).
- That is: O(log n)

Insertion example (figures)
- Inserting 39: there is room.
- Inserting 38: there is no room in the leaf. Insert anyway, split the leaf and move the middle value up.
- Inserting 37: there is room.
- Inserting 36: there is no room. Split and move up.
- Inserting 35, 34 and 33: there is room.

Deletion
Like insertion, just the other way around :-)
- Find the node with the value to be deleted.
- If this is not a leaf, then swap the value with its inorder successor (which is always in a leaf - why?), and remove the value.
- If there are now too few values (< n/2) in the leaf, then merge the node with a sibling and pull down a value from the parent node.
- If there are now too few values in the parent, then continue recursively until there are enough values or the root is reached.
- If the root becomes empty, then remove it, and the height of the tree is decreased.
- Performance: once again, down and up through the tree: O(log n)

Balanced Search Trees
Variants:
- 2-3-trees
- 2-3-4-trees
- Red-black trees
- AVL-trees
- Splay trees, ...
Among other things, they are used to realise the map/dictionary/table ADT.
In the Java Collections Framework (java.util): TreeMap and TreeSet

An Alternative to Sorting and Searching: Hashing
- Keys are converted to indices in an array.
- A hash function, h, maps a key to an integer, the hash code.
- The hash code is divided by the array size, and the remainder is used as the index.
- If two or more keys give the same index, we have a collision.

Collision Handling
Avoiding collisions:
- Use a prime as the size of the array: storing keys with hash codes 200, 205, 210, 215, 220, ..., 595 in an array of size 100 makes every key collide with three others, but an array of size 101 results in no collisions.
- Choose a good hash function: this is a (mathematical) discipline of its own.
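
The numbers in the prime-size example can be verified with a short sketch. It assumes the index is simply the hash code modulo the table size. The class PrimeSizeDemo and its methods are hypothetical names.

```java
// Illustrative sketch (hypothetical class): hash codes 200, 205, ..., 595 in tables of size 100 and 101.
class PrimeSizeDemo {
    // Fills a histogram: how many of the hash codes map to each index.
    private static int[] histogram(int tableSize) {
        int[] keysPerSlot = new int[tableSize];
        for (int hashCode = 200; hashCode <= 595; hashCode += 5) {
            keysPerSlot[hashCode % tableSize]++;
        }
        return keysPerSlot;
    }

    // Counts the keys that land in a slot already taken by an earlier key.
    static int countCollisions(int tableSize) {
        int collisions = 0;
        for (int count : histogram(tableSize)) {
            if (count > 1) collisions += count - 1;
        }
        return collisions;
    }

    // Largest number of keys sharing one slot.
    static int maxKeysPerSlot(int tableSize) {
        int max = 0;
        for (int count : histogram(tableSize)) max = Math.max(max, count);
        return max;
    }

    public static void main(String[] args) {
        System.out.println("size 100: " + countCollisions(100) + " collisions, up to "
                + maxKeysPerSlot(100) + " keys per slot");
        System.out.println("size 101: " + countCollisions(101) + " collisions, up to "
                + maxKeysPerSlot(101) + " keys per slot");
    }
}
```

With size 100, the 80 keys share only 20 slots, four keys per slot. With size 101, two keys could only collide if they differed by a multiple of 505, which is larger than the range of the keys.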

Collision Handling
Probing is searching for a nearby free slot in the array. Probing may be:
- Linear (h(x)+1, +2, +3, +4, ...)
- Quadratic (h(x)+1, +4, +9, +16, ...)
- Double hashing
- ...
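
Linear probing can be sketched as follows. This is a minimal illustration under stated assumptions: keys are non-negative ints, the hash function is the key itself, and deletion is not supported. The class LinearProbingSet is a hypothetical name.

```java
import java.util.Arrays;

// Illustrative sketch (hypothetical class): open addressing with linear probing.
class LinearProbingSet {
    private static final int EMPTY = -1; // marks a free slot, so keys must be >= 0
    private final int[] slots;
    private int size;

    LinearProbingSet(int capacity) {
        slots = new int[capacity];
        Arrays.fill(slots, EMPTY);
    }

    // Stores key and returns how many steps past its home slot were needed,
    // or -1 if the table is full.
    int add(int key) {
        if (key < 0) throw new IllegalArgumentException("key must be non-negative");
        int home = key % slots.length;
        for (int step = 0; step < slots.length; step++) {
            int index = (home + step) % slots.length; // h(x), h(x)+1, h(x)+2, ...
            if (slots[index] == key) return step;     // already present
            if (slots[index] == EMPTY) {
                slots[index] = key;
                size++;
                return step;
            }
        }
        return -1;
    }

    // Probes the same sequence of slots as add; a free slot ends the search.
    boolean contains(int key) {
        if (key < 0) return false;
        int home = key % slots.length;
        for (int step = 0; step < slots.length; step++) {
            int index = (home + step) % slots.length;
            if (slots[index] == EMPTY) return false;
            if (slots[index] == key) return true;
        }
        return false;
    }

    int size() { return size; }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(7);
        for (int key : new int[] {10, 17, 24, 4}) {
            System.out.println(key + " stored after " + set.add(key) + " extra probe(s)");
        }
    }
}
```

In a table of size 7, the keys 10, 17 and 24 all have home slot 3, so each later key has to step one slot further. Key 4 then finds its own home slot occupied by a key that does not even share its hash value, which is the clustering effect typical of linear probing.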

Chaining
The array doesn't hold the elements themselves, but references to collections (linked lists, for instance) of all colliding elements. On search, that list must be traversed.
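
A minimal sketch of chaining, assuming string keys and a fixed number of buckets (no resizing). The class ChainedStringSet and its method names are hypothetical. Math.floorMod is used so that negative hash codes still give a valid index.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch (hypothetical class): a hash set of strings with separate chaining.
class ChainedStringSet {
    private final List<List<String>> buckets = new ArrayList<>();
    private int size;

    ChainedStringSet(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }

    // The chain holding all keys whose hash code maps to the same index.
    private List<String> chainFor(String key) {
        int index = Math.floorMod(key.hashCode(), buckets.size());
        return buckets.get(index);
    }

    // Returns false if the key was already present.
    boolean add(String key) {
        List<String> chain = chainFor(key);
        if (chain.contains(key)) return false; // traverses the chain
        chain.add(key);
        size++;
        return true;
    }

    boolean contains(String key) {
        return chainFor(key).contains(key);
    }

    boolean remove(String key) {
        boolean removed = chainFor(key).remove(key);
        if (removed) size--;
        return removed;
    }

    int size() { return size; }

    public static void main(String[] args) {
        ChainedStringSet names = new ChainedStringSet(1); // one bucket: every key collides
        names.add("ann");
        names.add("bob");
        System.out.println(names.contains("bob"));
    }
}
```

Creating the set with a single bucket forces every key into the same chain, which reproduces the worst case: each operation degenerates to a linear search.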

Efficiency of Hashing
- Worst case (maximum collisions): retrieve, insert and delete are all O(n).
- The average number of collisions depends on the load factor, λ, not on the table size and not on n:
  λ = (number of used entries) / (table size)
- Typically (linear probing): average number of collisions = 1/(1 - λ)
- Example: 75% of the table entries are in use, so λ = 0.75 and 1/(1 - 0.75) = 4 collisions on average (independent of the table size).

When Hashing Is Inefficient
- Traversing in key order.
- Finding the smallest/largest key.
- Range search (find all keys between low and high).
- Searching on something other than the designated primary key.

.NET 2: System.Collections.Generic
Interfaces:
- ICollection
- IList: indexable
- IDictionary: (key, value) pairs
Classes:
- List: array-based
- LinkedList
- Dictionary: hash table
- SortedDictionary: balanced search tree

Learning Goals
An application class uses an interface (the specification, e.g. IDictionary), which is implemented by classes such as Dictionary and SortedDictionary:
    class Appl {
        ----
        IDictionary d;
        d = new XXXDictionary();
        ----
    }
- ADT: read and write (use) specifications; select and use an ADT, e.g. Dictionary.
- Data Structures and Algorithms (knowledge of): select and use a data structure, e.g. SortedDictionary.

Exercises
- Consider some of our programs (Banking, Forest, AndersenAndAsp, for instance). Would it be better to use some other collection instead of List?
- Try to change the implementation in one or more of your programs so that, for instance, a hash table is used.
- Implement InsertAt(int index, object element) and RemoveAt(int index) on the linked list.

Time Complexity: Big-O
Investigation of the use of time and/or space of an algorithm. Normally one
- looks at the worst case (easier to determine),
- considers only growth rates, not exact measures,
- counts the number of some "basic operations" (a computation, a comparison of two elements, etc.).

Big-O notation
The complexity of an algorithm is written in Big-O notation:
- O(f(n)): n is the size of the problem (the number of input elements, for instance); f is a function that indicates the efficiency of the algorithm, for instance n (the running time is linear in the problem size).
- Big-O is asymptotic (only holds for large values of n).
- Big-O only regards the most significant term.
- Big-O ignores constants.

Examples
    public int sum(int a, int b) {
        int sum;
        sum = a + b;
        return sum;
    }
What is the basic operation? Complexity: O(1)

    public int sum(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++)
            sum = sum + a[i];
        return sum;
    }
What is the basic operation? Complexity: O(n)

Searching
- Linear search in a sequence with n elements: O(n) (why?)
- Binary search in a sorted sequence with n elements: O(log n) (why?)
- What about sweep algorithms? Complexity O(n)
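
The two "why?" questions can be explored with a sketch of both searches on an int array. The class SearchDemo and its method names are hypothetical. Linear search may have to look at every element, while binary search halves the remaining range in each iteration.

```java
// Illustrative sketch (hypothetical class): linear and binary search on an int array.
class SearchDemo {
    // Examines the elements one by one: up to n comparisons.
    static int linearSearch(int[] values, int target) {
        for (int i = 0; i < values.length; i++) {
            if (values[i] == target) return i;
        }
        return -1;
    }

    // Requires a sorted array. Each iteration halves the range [lo, hi],
    // so at most about log2(n) + 1 iterations are needed.
    static int binarySearch(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of lo + hi
            if (sorted[mid] == target) return mid;
            if (sorted[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] primes = {2, 3, 5, 7, 11, 13};
        System.out.println(linearSearch(primes, 7));
        System.out.println(binarySearch(primes, 7));
    }
}
```

Both methods return the index of the target, or -1 if it is not present.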

Constant and Linear Complexity
Consider an algorithm working on a sequence of length n:
- If the running time is independent of n, then the time complexity is constant, or O(1).
- If we (in the worst case) have to do something to every element, then the time complexity is linear, or O(n).
- There are other possibilities: quadratic O(n^2) (some sorting algorithms), O(n log n) (better sorting algorithms), logarithmic O(log n) (binary search), exponential O(2^n) ("difficult" problems like the Towers of Hanoi; more in the 3rd semester).

Does it matter...?
Table of running times (not reproduced here); in the table, "år" means "year" and "døgn" means "day".
NOTE: assuming one basic operation takes 1 ns (one billion operations per second, i.e. GHz).

A Rule of Thumb
For each nested loop, the complexity must be multiplied by a factor n:
    for (int i = 0; i < n; i++)          // O(n)
    {…}

    for (int i = 0; i < n; i++)          // O(n^2)
    {
        for (int j = 0; j < n; j++)
        {…}
    }
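
The rule of thumb can be checked by counting how often the loop body runs. In this sketch the body is replaced by a counter increment, which stands in for one basic operation. The class LoopCountDemo is a hypothetical name.

```java
// Illustrative sketch (hypothetical class): counts basic operations in single and nested loops.
class LoopCountDemo {
    // One loop over n elements: the body runs n times.
    static long singleLoop(int n) {
        long operations = 0;
        for (int i = 0; i < n; i++) {
            operations++;
        }
        return operations;
    }

    // Two nested loops over n elements: the body runs n * n times.
    static long nestedLoop(int n) {
        long operations = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                operations++;
            }
        }
        return operations;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + ": " + singleLoop(n) + " vs. " + nestedLoop(n));
        }
    }
}
```

Multiplying n by 10 multiplies the first count by 10 and the second by 100, which is what O(n) and O(n^2) predict.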

O(1)
    public void add(int n) {
        lastIndex++;
        data[lastIndex] = n;
    }
Both statements are basic, and their performance is independent of the size of the array.

O(n)
    public void insert(int i, int newInt) {
        // make room for newInt: shift elements one step to the right,
        // starting from the end of the array (the last slot is overwritten)
        for (int j = data.length - 1; j > i; j--)
            data[j] = data[j - 1];
        data[i] = newInt;  // insert newInt
    }
The for-loop indicates a time complexity of O(n).

O(n^2)
    public void sort() {
        for (int j = 0; j < numbers.size(); j++) {
            for (int i = 0; i < numbers.size() - 1; i++) {
                if (numbers.get(i) > numbers.get(i + 1))
                    swap(i, i + 1);  // swaps elements i and i+1
            }  // end inner for
        }  // end outer for
    }  // end sort
Nested for-loops suggest O(n^2).