
ECE 250 Algorithms and Data Structures
Douglas Wilhelm Harder, M.Math. LEL
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada
ece.uwaterloo.ca

Outline
This topic discusses the insertion sort. We will discuss:
- the algorithm
- an example
- run-time analysis
  - worst case
  - average case
  - best case

Background (8.2)
Consider the following observations:
- A list with one element is sorted
- In general, if we have a sorted list of k items, we can insert a new item to create a sorted list of size k + 1

Background (8.2)
For example, consider this array, whose first eight entries are sorted:
    5 7 12 19 21 26 33 40 14 9 18 2
Suppose we want to insert 14 into the sorted entries, leaving the first nine entries sorted.

Background (8.2)
Starting at the back, if the number is greater than 14, copy it to the right.
Once an entry no greater than 14 is found, insert 14 into the resulting vacancy.
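The single insertion step described above can be sketched on its own, before it is wrapped in a full sort. This is only an illustration: the helper name `insert_into_sorted` and its parameters are assumptions, not part of the slides, and it assumes the array has room for one more entry at index `count`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the slides): the first `count` entries of
// `array` are sorted, and index `count` is available for the new entry.
// `value` is taken by copy so that passing array[count] itself is safe.
template <typename Type>
void insert_into_sorted( Type *const array, std::size_t const count, Type value ) {
    assert( array != nullptr );

    std::size_t j = count;

    // Starting at the back, copy every entry greater than `value`
    // one position to the right.
    while ( j > 0 && array[j - 1] > value ) {
        array[j] = array[j - 1];
        --j;
    }

    // Either we reached the front or array[j - 1] <= value:
    // index j is the vacancy.
    array[j] = value;
}
```

Called with the sorted entries 5 7 12 19 21 26 33 40 and the value 14, this copies 40, 33, 26, 21 and 19 to the right and places 14 after 12.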

The Algorithm (8.2.1)
For any unsorted list:
- Treat the first element as a sorted list of size 1
- Then, given a sorted list of size k – 1:
  - Insert the kth item in the unsorted list into the sorted list
  - The sorted list is now of size k

The Algorithm (8.2.1)
Recall the five sorting techniques:
- Insertion
- Exchange
- Selection
- Merging
- Distribution
Clearly insertion falls into the first category.

The Algorithm (8.2.2)
Code for this would be:

    for ( int j = k; j > 0; --j ) {
        if ( array[j - 1] > array[j] ) {
            std::swap( array[j - 1], array[j] );
        } else {
            // As soon as we don't need to swap, the (k + 1)st
            // entry is in the correct location
            break;
        }
    }

Implementation and Analysis (8.2.2)
This would be embedded in a function such as:

    #include <utility>   // std::swap

    template <typename Type>
    void insertion_sort( Type *const array, int const n ) {
        for ( int k = 1; k < n; ++k ) {
            for ( int j = k; j > 0; --j ) {
                if ( array[j - 1] > array[j] ) {
                    std::swap( array[j - 1], array[j] );
                } else {
                    // As soon as we don't need to swap, the (k + 1)st
                    // entry is in the correct location
                    break;
                }
            }
        }
    }

Implementation and Analysis (8.2.3)
Let's do a run-time analysis of this code:
- The Θ(1) initialization of the outer for-loop (int k = 1) is executed once
- The Θ(1) condition k < n will be tested n times, at which point it fails
- Thus, the inner for-loop will be executed a total of n – 1 times
- In the worst case, the body of the inner for-loop is executed a total of k times for a given k
- The body of the inner for-loop runs in Θ(1) whether or not it swaps

Thus, the worst-case run time is
    Θ(1 + 2 + ⋯ + (n – 1)) = Θ(n(n – 1)/2) = Θ(n²)

Implementation and Analysis (8.2.3)
Problem: we may break out of the inner loop early.
- Recall: each time we perform a swap, we remove an inversion
- As soon as a pair satisfies array[j - 1] <= array[j], we are finished
- Thus, the body is run only as often as there are inversions

If the number of inversions is d, the run time is Θ(n + d).
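The link between swaps and inversions can be checked empirically on small inputs. The sketch below is an illustration, not part of the slides: both function names are hypothetical, and it uses `std::vector<int>` rather than the slides' raw array. It counts inversions by brute force and compares that count with the number of swaps made by the swap-based insertion sort.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force inversion count: pairs (i, j) with i < j and v[i] > v[j].
// This is quadratic and only meant for checking small inputs.
std::size_t count_inversions( std::vector<int> const &v ) {
    std::size_t d = 0;

    for ( std::size_t i = 0; i < v.size(); ++i ) {
        for ( std::size_t j = i + 1; j < v.size(); ++j ) {
            if ( v[i] > v[j] ) {
                ++d;
            }
        }
    }

    return d;
}

// The swap-based insertion sort, instrumented to return the number of swaps.
std::size_t insertion_sort_count_swaps( std::vector<int> &v ) {
    std::size_t swaps = 0;

    for ( std::size_t k = 1; k < v.size(); ++k ) {
        for ( std::size_t j = k; j > 0; --j ) {
            if ( v[j - 1] > v[j] ) {
                std::swap( v[j - 1], v[j] );
                ++swaps;   // each adjacent swap removes exactly one inversion
            } else {
                break;
            }
        }
    }

    assert( count_inversions( v ) == 0 );   // a sorted list has no inversions
    return swaps;
}
```

Calling `count_inversions` on a list before sorting it and comparing with the value returned by `insertion_sort_count_swaps` should give the same number d; a reverse-sorted list of size n gives the maximum, n(n – 1)/2.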

Consequences of Our Analysis (8.2.3)
- A random list will have d = O(n²) inversions
- The average random list has d = Θ(n²) inversions
- Insertion sort, however, will run in Θ(n) time whenever d = O(n)

Other benefits:
- The algorithm is easy to implement
- Even in the worst case, the algorithm is fast for small problems:

    Size    Approximate Time (ns)
       8      175
      16      750
      32     2700
      64     8000

Considering these run times, it appears to be approximately 10 instructions per inversion.
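The Θ(n²) average can be made precise under one assumption that the slides do not state explicitly: the list is a uniformly random permutation of n distinct entries. In that case each of the pairs i < j is inverted with probability 1/2, so the expected number of inversions follows by linearity of expectation. The symbol a_i for the ith entry is introduced here only for illustration.

```latex
\mathbb{E}[d]
  = \sum_{0 \le i < j < n} \Pr\bigl[a_i > a_j\bigr]
  = \binom{n}{2} \cdot \frac{1}{2}
  = \frac{n(n-1)}{4}
  = \Theta(n^2)
```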

Consequences of Our Analysis (8.2.3)
Unfortunately, it is not very useful in general:
- Sorting a random list of size 2^23 ≈ 8 000 000 would require approximately one day
- Doubling the size of the list quadruples the required run time
- An optimized quick sort requires less than 4 s on a list of the above size
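The one-day figure can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below takes the estimate of about 10 instructions per inversion from the slides and combines it with two assumptions that are not in the slides: a random list has about n²/4 inversions (each pair is inverted with probability 1/2), and the machine executes about 2 × 10⁹ instructions per second.

```latex
\begin{align*}
n &= 2^{23}, \qquad d \approx \frac{n^2}{4} = 2^{44} \approx 1.76 \times 10^{13} \text{ inversions} \\
10\,d &\approx 1.76 \times 10^{14} \text{ instructions} \\
\frac{1.76 \times 10^{14}}{2 \times 10^{9}\ \text{instructions/s}}
  &\approx 8.8 \times 10^{4}\ \text{s} \approx 1\ \text{day}
\end{align*}
```

A different assumed instruction rate scales the result proportionally, so this is an order-of-magnitude check only.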

Consequences of Our Analysis (8.2.3)
The following table summarizes the run times of insertion sort:

    Case      Run Time    Comments
    Worst     Θ(n²)       Reverse sorted
    Average   O(d + n)    Slow if d = ω(n)
    Best      Θ(n)        Very few inversions: d = O(n)

The Algorithm (8.2.4)
Now, swapping is expensive, so we could instead temporarily assign the new entry to a local variable and copy the larger entries to the right. This speeds up the algorithm by a factor of two.

Implementation and Analysis (8.2.4)
A reasonably good* implementation:

    template <typename Type>
    void insertion( Type *const array, int const n ) {
        for ( int k = 1; k < n; ++k ) {
            Type tmp = array[k];

            for ( int j = k; j > 0; --j ) {
                if ( array[j - 1] > tmp ) {
                    array[j] = array[j - 1];
                } else {
                    array[j] = tmp;
                    goto finished;
                }
            }

            array[0] = tmp;   // only executed if tmp < array[0]

            finished: ;       // empty statement
        }
    }

* We could do better with pointers.

A Comment about Using goto (8.2.5)
Why use goto in the implementation above? Isn’t goto “evil”?!

A Comment about Using goto (8.2.5)
Even xkcd knows it is evil! http://xkcd.com/292/

A Comment about Using goto (8.2.5)
Unfortunately, any other fix appears to be less than elegant…

    template <typename Type>
    void insertion( Type *const array, int const n ) {
        for ( int k = 1; k < n; ++k ) {
            Type tmp = array[k];

            for ( int j = k; j > 0; --j ) {
                if ( array[j - 1] > tmp ) {
                    array[j] = array[j - 1];
                } else {
                    array[j] = tmp;
                    break;
                }
            }

            if ( array[0] > tmp ) {
                array[0] = tmp;   // only executed if tmp < array[0]
            }
        }
    }
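For comparison, one common alternative that does not appear in the slides folds the bounds check and the comparison into a single while-loop condition, so that neither the goto nor the trailing if is needed. The name `insertion_while` is hypothetical, and whether this reads better than the goto version is a matter of taste.

```cpp
#include <cassert>

// A sketch of a goto-free variant (an assumption, not the slides' code):
// the index j is declared outside the loop so that it survives the loop
// and marks the vacancy where tmp belongs.
template <typename Type>
void insertion_while( Type *const array, int const n ) {
    assert( n >= 0 );

    for ( int k = 1; k < n; ++k ) {
        Type tmp = array[k];
        int j = k;

        // Copy larger entries to the right until we reach the front
        // or find an entry that is not greater than tmp.
        while ( j > 0 && array[j - 1] > tmp ) {
            array[j] = array[j - 1];
            --j;
        }

        array[j] = tmp;   // covers both the "reached the front" and "found" cases
    }
}
```

The cost of this form is one unconditional assignment per pass, even when the entry is already in place.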

A Comment about Using goto (8.2.5)
- The excessive use of the goto statement resulted in spaghetti code
- Edsger Dijkstra’s 1968 commentary “Go To Statement Considered Harmful” started what (against Dijkstra’s better judgment) took on the form of a religious crusade against using goto
- Donald Knuth gives a more reasoned approach in his 1974 paper “Structured Programming with go to Statements”
- Unfortunately, in 2011, one student saw this and immediately used a goto that resulted in spaghetti code and claimed it was “reasonable”

In general, use a goto if and only if it is going forward to an obvious point in the routine.

Summary
Insertion sort:
- Insert new entries into growing sorted lists
- Run-time analysis
  - Worst and average case run time: O(n²)
  - Detailed analysis: Θ(n + d)
  - Best case (O(n) inversions): Θ(n)
- Memory requirements: Θ(1)

References
[1] Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Ed., Addison Wesley, 1998, §5.2.1, pp. 80-82.
[2] Cormen, Leiserson, and Rivest, Introduction to Algorithms, McGraw Hill, 1990, pp. 2-4, 6-9.
[3] Weiss, Data Structures and Algorithm Analysis in C++, 3rd Ed., Addison Wesley, §8.2, pp. 262-5.
[4] Edsger Dijkstra, Go To Statement Considered Harmful, Communications of the ACM 11 (3), pp. 147–148, 1968.
[5] Donald Knuth, Structured Programming with go to Statements, Computing Surveys 6 (4), pp. 261–301, 1974.

Wikipedia, http://en.wikipedia.org/wiki/Insertion_sort

These slides are provided for the ECE 250 Algorithms and Data Structures course. The material in it reflects Douglas W. Harder’s best judgment in light of the information available to him at the time of preparation. Any reliance on these course slides by any party for any other purpose is the responsibility of such parties. Douglas W. Harder accepts no responsibility for damages, if any, suffered by any party as a result of decisions made or actions based on these course slides for any other purpose than that for which it was intended.