Collection types/Anders Børjesson


What are collections?
- Collections are containers: objects that contain other objects.
- The API of a modern programming language contains a number of collections, such as arrays, lists and sets.
- The collections API also includes algorithms that work on the collections: sorting, searching, etc.

Generic vs. non-generic collections

Generic collection (new)                                       Non-generic collection (old)
List<T> and LinkedList<T>                                      ArrayList
Dictionary<TKey, TValue> and SortedDictionary<TKey, TValue>    Hashtable
Queue<T>                                                       Queue
Stack<T>                                                       Stack
SortedList<TKey, TValue>
HashSet<T> and SortedSet<T>
                                                               Array []

Collection interfaces

Array []
- Class System.Array
- Memory layout: the elements in an array are neighbors in memory.
- An array has a fixed size: it cannot grow or shrink.
- Arrays are not generic.
- Array implements a number of interfaces:
  - IEnumerable (non-generic)
  - ICollection (non-generic)
  - IList (non-generic)

Implementation overview: general purpose implementations

Interfaces                   Resizable array    Linked list      Hash table
IList<T>                     List<T>            LinkedList<T>
ISet<T>                                                          HashSet<T>
IDictionary<TKey, TValue>                                        Dictionary<TKey, TValue>

Note: LinkedList<T> implements ICollection<T> rather than IList<T>, so it has no indexer.

Lists
- A collection of objects that can be individually accessed by index.
- Interface: IList<T>
    IList<String> MyList;
    MyList[3] = "Anders";
    String str = MyList[2];
- Classes
  - List<T>
    - Elements are kept in an array: elements are neighbors in memory.
    - Get is faster than in LinkedList<T>.
    - The list grows as needed: create a new array + move the elements to the new array. Takes a lot of time!
    - Tuning parameter: new List<T>(int capacity)
  - LinkedList<T>
    - Elements are kept in a linked list: one element links to the next element.
    - Add + remove (at the beginning / middle) is generally faster than in List<T>.
  - SortedList
    - Elements are kept in sorted order.
    - Elements must implement the interface IComparable<T>.
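The difference between the two list classes can be sketched as follows. This is a minimal illustration, not the slide author's CollectionsTrying example; the class name ListDemo, its method names, and the sample names are made up for the sketch.

```csharp
using System.Collections.Generic;

public static class ListDemo
{
    // Index access through the IList<T> interface, backed by a resizable array.
    public static string ThirdName()
    {
        // The initial capacity (here 4) is the tuning parameter: the internal
        // array does not have to be re-allocated while the first 4 elements are added.
        IList<string> names = new List<string>(4) { "Anna", "Bo", "Carl" };
        names[2] = "Anders";      // set by index
        return names[2];          // get by index
    }

    // LinkedList<T> has no indexer, but inserting at the front moves no elements.
    public static string FirstAfterAddFirst()
    {
        var linked = new LinkedList<string>(new[] { "Bo", "Carl" });
        linked.AddFirst("Anna");
        return linked.First.Value;
    }
}
```

Calling ListDemo.ThirdName() returns the element written through the indexer, and ListDemo.FirstAfterAddFirst() returns the element that was inserted at the head of the linked list.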

Sets
- Sets do not allow duplicate elements.
  - The Equals(…) method is used to check whether an element is already in the set.
- Interface: ISet<T>
  - bool Add(T element): returns false if the element is already in the set.
  - Set operations like IntersectWith(…), UnionWith(…), ExceptWith(…)
- Classes
  - HashSet<T>
    - Uses a hash table to keep the elements.
    - The method element.GetHashCode() is used to find the position in the hash table.
  - SortedSet<T>
    - Elements are kept in sorted order.
    - Elements must implement the interface IComparable<T>.
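A short sketch of the two set behaviours mentioned above: Add reporting a duplicate, and a set operation. The class name SetDemo and the sample values are assumptions made for illustration.

```csharp
using System.Collections.Generic;

public static class SetDemo
{
    // Add returns false when the element is already present.
    public static bool AddTwice()
    {
        ISet<string> names = new HashSet<string>();
        names.Add("Anders");
        return names.Add("Anders");   // duplicate: the set is left unchanged
    }

    // IntersectWith keeps only the elements that are also in the other collection.
    public static int CommonCount()
    {
        ISet<int> numbers = new SortedSet<int> { 1, 2, 3, 4 };
        numbers.IntersectWith(new[] { 3, 4, 5 });
        return numbers.Count;         // only 3 and 4 remain
    }
}
```

Both methods work against the ISet<T> interface, so HashSet<T> and SortedSet<T> can be swapped without changing the calling code.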

Dictionary
- Keeps (key, value) pairs.
- Values are found by key. Keys must be unique.
- Interface: IDictionary<TKey, TValue>
  - Add(TKey key, TValue value)
    IDictionary<String, Student> st;
    st["0102"] = SomeStudent;
    AnotherStudent = st["0433"];
- Classes
  - Dictionary<TKey, TValue>
    - Stores data in a hash table.
    - The method key.GetHashCode() is used to find the position in the hash table.
  - SortedDictionary<TKey, TValue>
    - Sorted by key.
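The dictionary operations above could be exercised like this. It is a sketch only: the class name DictionaryDemo is hypothetical, and plain strings stand in for the Student objects used on the slide.

```csharp
using System.Collections.Generic;

public static class DictionaryDemo
{
    public static string Lookup()
    {
        IDictionary<string, string> students = new Dictionary<string, string>();
        students.Add("0102", "Anna");   // Add throws ArgumentException if the key already exists
        students["0433"] = "Bo";        // the indexer adds the key or overwrites its value

        // TryGetValue avoids the KeyNotFoundException that the indexer
        // throws when the key is missing.
        return students.TryGetValue("0433", out string name) ? name : "(unknown)";
    }
}
```

The choice between Add and the indexer matters when keys may repeat: Add treats a repeated key as an error, the indexer silently replaces the value.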

Foreach loop
- Iterating a collection is usually done with a foreach loop:
    List<String> names = …
    foreach (String name in names) {
        doSomething(name);
    }
- This is roughly equivalent to:
    IEnumerator<String> enumerator = names.GetEnumerator();
    while (enumerator.MoveNext()) {
        String name = enumerator.Current;
        doSomething(name);
    }
- Example: CollectionsTrying

Iterating a Dictionary object
- A dictionary has (key, value) pairs.
- Two ways to iterate:
  - The slow, but easy to write: get the set of keys and iterate this set.
      foreach (TKey key in dictionary.Keys) {
          doSomething(key);
      }
  - The faster, but harder to write: iterate the set of (key, value) pairs.
      foreach (KeyValuePair<TKey, TValue> pair in dictionary) {
          doSomething(pair);
      }
- KeyValuePair is a struct (not a class).
- Example: CollectionsTrying

Copy constructors
- A copy constructor is (1) a constructor that (2) copies elements from an existing object into the newly created object.
- Collection classes have copy constructors.
- The copy constructors generally have a parameter (the existing object) of type IEnumerable<T>.
  - List<T>(IEnumerable<T> existingCollection)
  - Queue<T>(IEnumerable<T> existingCollection)
  - Etc.
  - Dictionary<TKey, TValue>(IDictionary<TKey, TValue> existingDictionary)
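A small sketch of what a copy constructor gives you: a second collection that can change without affecting the first. The class name CopyDemo and the sample data are made up for the illustration.

```csharp
using System.Collections.Generic;

public static class CopyDemo
{
    // Returns the size of the original list after the copy has been modified.
    public static int OriginalCountAfterCopyChanges()
    {
        var original = new List<string> { "Anna", "Bo" };
        var copy = new List<string>(original);   // copy constructor, takes an IEnumerable<string>
        copy.Add("Carl");                        // changes the copy only
        return original.Count;
    }

    // Any IEnumerable<T> can be the source, so a list can seed a queue.
    public static string FirstInQueue()
    {
        var list = new List<string> { "Anna", "Bo" };
        var queue = new Queue<string>(list);     // elements are enqueued in list order
        return queue.Dequeue();
    }
}
```

The copy is shallow: the new collection holds the same element references as the old one, so only adding, removing and replacing elements is independent.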

Sorted collections
- SortedSet<T>: a set where the elements are kept sorted.
- SortedList<TKey, TValue>: a list of (key, value) pairs, sorted by key.
- SortedDictionary<TKey, TValue>: (key, value) pairs. Keys are unique. Sorted by key.
- Sorted collections are generally slower than un-sorted collections.
  - Sorting has a price: only use the sorted collections if you really need them.
- Elements must implement the interface IComparable<T>,
  - or the constructor must have an IComparer<T> object as a parameter.
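The two ways of defining the order (IComparable<T> on the elements, or an IComparer<T> passed to the constructor) might look like this. The names SortedDemo and ByLength are hypothetical, and ordering strings by length is just an example criterion.

```csharp
using System.Collections.Generic;

public static class SortedDemo
{
    // A comparer that orders strings by length, then by ordinal comparison.
    private sealed class ByLength : IComparer<string>
    {
        public int Compare(string x, string y)
        {
            int byLength = x.Length.CompareTo(y.Length);
            return byLength != 0 ? byLength : string.CompareOrdinal(x, y);
        }
    }

    // string implements IComparable<string>, so no comparer is needed.
    public static string NaturalOrder()
    {
        var names = new SortedSet<string> { "Carl", "Bo", "Anders" };
        return string.Join(",", names);
    }

    // The same elements, ordered by the comparer given to the constructor.
    public static string LengthOrder()
    {
        var names = new SortedSet<string>(new ByLength()) { "Carl", "Bo", "Anders" };
        return string.Join(",", names);
    }
}
```

The comparer also decides what counts as a duplicate in a SortedSet<T>: two elements that compare as 0 are treated as the same element, which is why ByLength falls back to a second criterion.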

Read-only collections
- New feature in .NET 4.5.
- Sometimes you want to return a read-only view of a collection from a method.
  - Example: GenericCatalog.GetAll()
- IReadOnlyCollection<T>: IEnumerable<T> + Count property
- IReadOnlyList<T>
- IReadOnlyDictionary<TKey, TValue>
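Returning a read-only view from a method could be sketched as below. The Catalog<T> class is a hypothetical stand-in, not the GenericCatalog from the slide author's example code.

```csharp
using System.Collections.Generic;

public class Catalog<T>
{
    private readonly List<T> items = new List<T>();

    public void Add(T item) => items.Add(item);

    // Callers see IEnumerable<T> plus Count; the interface declares no Add or Remove.
    // List<T> implements IReadOnlyCollection<T>, so no copy is made.
    public IReadOnlyCollection<T> GetAll() => items;
}
```

Because no copy is made, the caller's view reflects later changes to the catalog. It is a read-only view, not an immutable snapshot, and a caller could still cast the result back to List<T>.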

Mutable collections vs. read-only collections
- Figures from http://msdn.microsoft.com/en-us/magazine/jj133817.aspx

ReadOnlyCollection: Decorator design pattern
- ReadOnlyCollection<T> implements IList<T>.
  - Same interface as List<T>, but the mutating operations throw NotSupportedException.
- ReadOnlyCollection<T> aggregates ONE IList<T> object.
  - This IList<T> object is the one being decorated.
- Example: CollectionsTrying
- Easy to use, but bad design:
  - a lot of methods that throw NotSupportedException.
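The decorator behaviour can be demonstrated in a few lines. This is an illustrative sketch; the class name ReadOnlyDemo and its methods are assumptions, not part of the original examples.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public static class ReadOnlyDemo
{
    // The call compiles because IList<T> declares Add, but it fails at run time.
    public static bool AddIsRejected()
    {
        var inner = new List<string> { "Anna" };
        IList<string> wrapper = new ReadOnlyCollection<string>(inner);
        try
        {
            wrapper.Add("Bo");
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    // The wrapper holds a reference to the inner list, not a copy of it.
    public static int WrapperSeesInnerChanges()
    {
        var inner = new List<string> { "Anna" };
        var wrapper = new ReadOnlyCollection<string>(inner);
        inner.Add("Bo");
        return wrapper.Count;
    }
}
```

The first method shows the design problem: the mistake is only found when the program runs. The read-only interfaces move that check to compile time.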

Thread safe collections

Ordinary collections           Thread safe collections
List<T>, ordered collection    none
                               ConcurrentBag<T>, not an ordered collection
Stack<T>                       ConcurrentStack<T>
Queue<T>                       ConcurrentQueue<T>
Dictionary<TKey, TValue>       ConcurrentDictionary<TKey, TValue>
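As a rough sketch of why the thread safe variants exist, the following counts from several threads at once. The class name ConcurrentDemo and the key "total" are made up; Parallel.For is used only as a convenient way to get concurrent callers.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentDemo
{
    public static int CountInParallel()
    {
        var hits = new ConcurrentDictionary<string, int>();

        // 1000 increments spread over several threads. AddOrUpdate inserts 1
        // the first time and otherwise applies the update function atomically.
        Parallel.For(0, 1000, i =>
            hits.AddOrUpdate("total", 1, (key, old) => old + 1));

        return hits["total"];
    }
}
```

Doing the same read-modify-write on an ordinary Dictionary<TKey, TValue> from several threads is not safe without a lock around every access.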

Algorithm complexity: Big O
- Big O indicates an upper bound on the computational resources (normally time) required to execute an algorithm.
- O(1), constant time
  - The time required does not depend on the amount of data.
  - This is very nice!
- O(n), linear time
  - The time required grows in proportion to the amount of data.
  - Example: double data => double time
- O(n^2), quadratic time
  - The time required depends (very much) on the amount of data.
  - Example: double data => 4 times more time
  - This is very serious!!
- O(log n)
  - Better than O(n)
- O(n * log n)
- O(1) < O(log n) < O(n) < O(n * log n) < O(n^2)

Sorting in the C# API
- Sorted collections
  - SortedSet, SortedList, etc.
  - Keep the elements sorted as they are inserted.
- Sorting arrays
  - Array.Sort(someArray)
    - Uses the natural order (IComparable implemented on the element type).
  - Array.Sort(someArray, IComparer)
  - Uses QuickSort, which is O(n * log n) on average.
- Sorting lists
  - List.Sort() method
  - Sorts the list's underlying array using Array.Sort(…).
- Simple sorting algorithms
  - Use O(n^2) time.
- Example: CollectionsTrying
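Both sorting entry points can be tried with a few integers. A minimal sketch, with a hypothetical class name SortDemo; Comparer<T>.Create is used here as a short way to build an IComparer<T> from a lambda.

```csharp
using System;
using System.Collections.Generic;

public static class SortDemo
{
    // Natural order: int implements IComparable<int>.
    public static string NaturalOrder()
    {
        int[] numbers = { 5, 3, 8, 1 };
        Array.Sort(numbers);
        return string.Join(",", numbers);
    }

    // Custom order: the comparer reverses the natural comparison.
    public static string Descending()
    {
        var numbers = new List<int> { 5, 3, 8, 1 };
        numbers.Sort(Comparer<int>.Create((a, b) => b.CompareTo(a)));
        return string.Join(",", numbers);
    }
}
```

Both calls sort in place: the array and the list themselves are reordered, and nothing is returned by Sort.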

QuickSort
- Choose a random element (called the pivot), or just pick the middle element.
- Divide the elements into two smaller sub-problems:
  - Left: elements < pivot
  - Right: elements >= pivot
- Do it again …
- QuickSort is the sorting algorithm used in List<T>.Sort().
  - When the problem size is < 16 it uses insertion sort.
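The outline above can be turned into a short teaching implementation. This sketch is not the code used inside the .NET library: it picks the middle element as pivot, uses a two-pointer partition in which elements equal to the pivot may end up on either side, and has no insertion sort cut-off. The class name QuickSortSketch is made up.

```csharp
public static class QuickSortSketch
{
    public static void Sort(int[] data) => Sort(data, 0, data.Length - 1);

    private static void Sort(int[] data, int low, int high)
    {
        if (low >= high) return;                    // 0 or 1 element: nothing to do

        int pivot = data[low + (high - low) / 2];   // middle element as pivot
        int i = low, j = high;

        // Partition: move small elements to the left and large ones to the right.
        while (i <= j)
        {
            while (data[i] < pivot) i++;
            while (data[j] > pivot) j--;
            if (i <= j)
            {
                (data[i], data[j]) = (data[j], data[i]);
                i++;
                j--;
            }
        }

        Sort(data, low, j);                         // "do it again" on the left part
        Sort(data, i, high);                        // and on the right part
    }
}
```

Usage: QuickSortSketch.Sort(someArray) sorts the array in place. For real programs Array.Sort or List<T>.Sort is the better choice.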

Searching in the C# API
- Binary search
  - Searches a sorted list.
  - Algorithmic outline, searching for an element E:
    - Find the middle element.
    - If E < middle element, search the left half of the list.
    - Else search the right half of the list.
  - Using ONE if statement we get rid of half the data. That is efficient: O(log n).
  - Array.BinarySearch() + Array.BinarySearch(IComparer)
  - List.BinarySearch() + List.BinarySearch(IComparer)
  - Example: CollectionsTrying
- Linear search
  - Works on un-sorted lists.
  - Start from the beginning (simple for loop) and continue till you find E or reach the end of the list.
  - On average you find E in the middle of the list, or you continue to the end to conclude that E is not in the list: O(n).
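Both search strategies fit in a few lines each. The following is a hand-written sketch for illustration (class name SearchSketch is hypothetical); it returns -1 for a missing element, whereas the library's Array.BinarySearch returns a negative number that encodes the insertion point.

```csharp
public static class SearchSketch
{
    // Binary search: requires a sorted array, halves the range in every step.
    public static int BinarySearch(int[] sorted, int target)
    {
        int low = 0, high = sorted.Length - 1;
        while (low <= high)
        {
            int middle = low + (high - low) / 2;
            if (sorted[middle] == target) return middle;
            if (target < sorted[middle]) high = middle - 1;   // continue in the left half
            else low = middle + 1;                            // continue in the right half
        }
        return -1;                                            // not found
    }

    // Linear search: works on any array, looks at the elements one by one.
    public static int LinearSearch(int[] data, int target)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == target) return i;
        }
        return -1;                                            // not found
    }
}
```

For an array of one million elements, the binary search needs about 20 comparisons, while the linear search needs 500,000 on average when the element is present.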

Divide and conquer algorithms
- Recursively break down the problem into two (or more) sub-problems until the problem becomes simple enough to be solved directly.
- The solutions to the sub-problems are then combined to give the solution to the original (big) problem.
- Examples:
  - Binary search
    - "Decrease and conquer"
  - Quick sort
    - Picks a random pivot (an element).
    - Breaks the problem into two sub-problems:
      - Left: smaller than the pivot
      - Right: larger than the pivot
- Source: http://en.wikipedia.org/wiki/Divide_and_conquer_algorithms

Hashing
- Binary search is O(log n).
- We want something better: O(1).
- Idea:
  - Compute a number (called the "hash value") from the data you are searching for.
  - Use the hash value as an index in an array (called the "hash table").
  - Every element in the array holds a "bucket" of elements.
  - If every bucket holds few elements (preferably 1), then hashing is O(1).

Hash function
- A good hash function distributes elements evenly in the hash table.
- The worst hash function always returns 0 (or another constant).
- Example: hash table with 10 slots
    int Hash(int i) { return i % 10; }
  % is the remainder operator.
- Generally: hash table with N slots
    int Hash(T t) { return operation(t) % N; }
  The operation should be fast and distribute elements well.
- C#, class Object
    public virtual int GetHashCode()
  - Every object has this method.
  - Virtual: you can (and should) override the method in your classes.
- GetHashCode() and Equals()
  - If GetHashCode() sends you to a bucket with more than ONE element, Equals() is used to find the right element in the bucket.
  - a.Equals(b) is true ⇒ a.GetHashCode() == b.GetHashCode()
  - a.GetHashCode() == b.GetHashCode() ⇒ a.Equals(b) is not necessarily true
  - a.GetHashCode() != b.GetHashCode() ⇒ a.Equals(b) is false
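Overriding GetHashCode() and Equals() together might look like the following. The Student class here is a hypothetical one built for the sketch; the assumption is that two students are the same when their ids match.

```csharp
using System;

public sealed class Student : IEquatable<Student>
{
    public string Id { get; }
    public string Name { get; }

    public Student(string id, string name)
    {
        Id = id;
        Name = name;
    }

    // Two students are equal when their ids are equal; the name is ignored.
    public bool Equals(Student other) => other != null && Id == other.Id;

    public override bool Equals(object obj) => Equals(obj as Student);

    // Computed from the same field as Equals, so that
    // a.Equals(b) implies a.GetHashCode() == b.GetHashCode().
    public override int GetHashCode() => Id.GetHashCode();
}
```

If GetHashCode() were computed from Name while Equals() compares Id, two equal students could land in different buckets, and a HashSet<Student> or Dictionary<Student, …> would fail to find them.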

Hash table
- A hash table is basically an array.
- Two elements can compute the same hash value (same array index).
  - This is called a collision.
  - More elements end up in the same bucket.
  - Searching is no longer O(1).
- Problem
  - If a hash table is almost full we get a lot of collisions.
  - The load factor should be < 75%.
- Solution: re-hashing
  - Create a larger hash table (array) + update the hash function + move the elements to the new hash table.
  - That takes a lot of time!!
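To make buckets, collisions and re-hashing concrete, here is a deliberately simple hash set. It is a teaching sketch under several assumptions: the class name SimpleHashSet is made up, the table starts with 8 slots and doubles when full, elements must not be null, and it does not reflect how HashSet<T> is implemented in .NET.

```csharp
using System.Collections.Generic;

public class SimpleHashSet<T>
{
    private List<T>[] buckets = new List<T>[8];

    public int Count { get; private set; }
    public int Capacity => buckets.Length;

    // The hash function: hash code reduced to an index in a table with the given number of slots.
    private static int IndexFor(T item, int slots) =>
        (item.GetHashCode() & 0x7FFFFFFF) % slots;   // the mask clears the sign bit, as hash codes may be negative

    public bool Contains(T item)
    {
        List<T> bucket = buckets[IndexFor(item, buckets.Length)];
        return bucket != null && bucket.Contains(item);   // Equals() finds the element inside the bucket
    }

    public bool Add(T item)
    {
        if (Contains(item)) return false;                 // sets hold no duplicates
        if ((Count + 1) * 4 >= buckets.Length * 3) Rehash();   // keep the load factor below 75%
        Insert(buckets, item);
        Count++;
        return true;
    }

    private static void Insert(List<T>[] table, T item)
    {
        int index = IndexFor(item, table.Length);
        if (table[index] == null) table[index] = new List<T>();
        table[index].Add(item);                           // a collision simply makes the bucket longer
    }

    // Re-hashing: a table twice as large, and every element moved to its new index.
    private void Rehash()
    {
        var larger = new List<T>[buckets.Length * 2];
        foreach (List<T> bucket in buckets)
        {
            if (bucket == null) continue;
            foreach (T item in bucket) Insert(larger, item);
        }
        buckets = larger;
    }
}
```

Passing the table length into IndexFor is what "update the hash function" amounts to here: the same hash code maps to a different index once the table has grown.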

References and further reading
- MSDN: Collections (C# and Visual Basic), http://msdn.microsoft.com/en-us/library/ybcx56wz.aspx
- John Sharp: Microsoft Visual C# 2012 Step by Step, Chapter 8 Using Collections, pages 419-439
- Bart De Smet: C# 5.0 Unleashed, Sams 2013, Chapter 16 Collection Types, pages 755-787
- Landwerth: What's New in the .NET 4.5 Base Class Library, Read-Only Collection Interfaces, http://msdn.microsoft.com/en-us/magazine/jj133817.aspx