CSC 172 DATA STRUCTURES.

THEORETICAL BOUND Many good sorting algorithms run in O(n log n) time.
Can we do better? Can we reason about algorithms not yet invented?

THEORETICAL BOUND We can think of any comparison based sorting algorithm as a decision tree

THEORETICAL BOUND A binary tree of depth d has at most 2^d leaves.
A binary tree with L leaves must have depth at least log(L). There are n! arrangements of n items. A tree with n! leaves must have depth at least log(n!). log(n!) is Ω(n log n).

THEORETICAL BOUND log(n!) = log(n(n-1)(n-2)...(2)(1))
= log(n) + log(n-1) + ... + log(2) + log(1)
>= log(n) + log(n-1) + ... + log(n/2)
>= (n/2) log(n/2)
= (n/2) log(n) - (n/2)   (taking logs base 2)
= Ω(n log n)
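To make the bound concrete, the sketch below computes the smallest depth d with 2^d >= n!, which is the minimum worst-case number of comparisons for any comparison-based sort of n items. The class and method names are illustrative, not from the slides.

```java
import java.math.BigInteger;

final class ComparisonLowerBound {
    // Returns ceil(log2(n!)): the least depth of a binary decision tree
    // that has at least n! leaves.
    static int minComparisons(int n) {
        BigInteger factorial = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            factorial = factorial.multiply(BigInteger.valueOf(i));
        }
        // For f >= 1, the bit length of (f - 1) equals ceil(log2(f)).
        return factorial.subtract(BigInteger.ONE).bitLength();
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5, 10}) {
            System.out.println(n + " items need at least "
                    + minComparisons(n) + " comparisons in the worst case");
        }
    }
}
```

For example, three items have 3! = 6 arrangements, so at least 3 comparisons are needed in the worst case, since 2 comparisons can distinguish only 4 outcomes.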

EXAMPLE A post office routes mail.
Letters are sorted into separate bags for different geographical areas, each of these bags is itself sorted into batches for smaller sub-regions, and so on until they are delivered. This is the idea behind radix sort.

RADIX SORT Radix sort considers the structure of the keys.
Assume keys are represented in base M. Sorting is done by comparing bits in the same position.

RADIX EXCHANGE SORT Examine bits from left to right. Sort the array with respect to the leftmost bit.
[Figure: a column of keys shown by leading bit only: 1XXX 0XXX 1XXX 0XXX 0XXX 1XXX 1XXX 1XXX 0XXX 1XXX]

Scanning Partition
REPEAT: scan top down to find a "1"; scan bottom up to find a "0"; exchange.
UNTIL: scans cross.
[Figure: the column of keys before, during, and after partitioning on the leftmost bit]

RADIX EXCHANGE SORT Examine bits from left to right: after partitioning on the leftmost bit, partition each part on the next bit, and so on.
TIME: O(bN) for N keys of b bits each.
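The scanning partition and the bit-by-bit recursion can be sketched as follows. This is a minimal illustration that assumes non-negative int keys, so b = 31 bits are examined; the class and method names are hypothetical, not from the slides.

```java
import java.util.Arrays;

final class RadixExchangeSort {
    // Sorts non-negative ints; bit 30 is the leftmost bit that can be set.
    static void sort(int[] a) {
        sort(a, 0, a.length - 1, 30);
    }

    private static void sort(int[] a, int low, int high, int bit) {
        if (low >= high || bit < 0) return;
        int i = low, j = high;
        while (i <= j) {
            // Scan top down to find a key whose current bit is 1.
            while (i <= j && ((a[i] >> bit) & 1) == 0) i++;
            // Scan bottom up to find a key whose current bit is 0.
            while (i <= j && ((a[j] >> bit) & 1) == 1) j--;
            if (i < j) { // exchange, then keep scanning until the scans cross
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
                j--;
            }
        }
        // Now a[low..j] have a 0 in this bit and a[i..high] have a 1.
        sort(a, low, j, bit - 1);
        sort(a, i, high, bit - 1);
    }

    public static void main(String[] args) {
        int[] keys = {9, 4, 13, 2, 7, 12, 15, 8, 1, 10};
        sort(keys);
        System.out.println(Arrays.toString(keys));
    }
}
```

Each level of the recursion touches every key at most once and there are at most b levels, which is where the O(bN) bound comes from.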

DIVIDE AND CONQUER Examples: Mergesort, Quicksort, Maximum Subsequence, Fast Fourier Transform

DIVIDE AND CONQUER
Divide: Solve a subproblem by recursively calling on a subset of the data.
Conquer: The solution to the larger problem is formed from the solutions to the subproblems.

Example: One-dimensional pattern recognition
Input: a vector x of n floating point numbers. Output: the maximum sum found in any contiguous subvector of the input.
For the input 31 -41 59 26 -53 58 97 -93 -23 84, the answer is the sum of x[2..6], or 187.

Obvious solution
// check all pairs (i, j) and sum x[i..j] from scratch: O(n^3)
int sum;
int maxsofar = 0;
for (int i = 0; i < x.length; i++)
    for (int j = i; j < x.length; j++) {
        sum = 0;
        for (int k = i; k <= j; k++)
            sum += x[k];
        maxsofar = Math.max(sum, maxsofar);
    }

A better solution
// check all pairs (i, j), reusing the running sum: O(n^2)
int sum;
int maxsofar = 0;
for (int i = 0; i < x.length; i++) {
    sum = 0;
    for (int j = i; j < x.length; j++) {
        sum += x[j]; // the sum of x[i..j]
        maxsofar = Math.max(sum, maxsofar);
    }
}

Divide & Conquer To solve a problem of size n, recursively solve two sub-problems of size n/2, and combine their solutions to yield a solution to the complete problem.

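D&C for LCS Split x into two halves, a and b. The answer is either ma (the maximum within a), mb (the maximum within b), or mc (the maximum of the subvectors that cross the boundary between a and b).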

Recursive D&C LCS
public int LCS(int[] x) {
    return LCS(x, 0, x.length - 1);
}
public int LCS(int[] x, int low, int high) {
    // the hard part
}

Recursive D&C LCS
public int LCS(int[] x, int low, int high) {
    if (low > high) return 0;
    if (low == high) return Math.max(0, x[low]);
    int mid = (low + high) / 2;
    return Math.max(LCS(x, low, mid), LCS(x, mid + 1, high));
} // still need to do "mc"

How to find mc? Note that mc consists of two parts:
The part starting at the boundary and reaching up (mcup).
The part ending at the boundary and reaching down (mclower).
The sum of these is mc.

public int LCS(int[] x, int low, int high) {
    if (low > high) return 0;
    if (low == high) return Math.max(0, x[low]);
    int mid = (low + high) / 2;
    int umax = findUmax(x, mid + 1, high); // best sum starting at mid + 1 and reaching up
    int lmax = findLmax(x, low, mid);      // best sum ending at mid and reaching down
    return Math.max(lmax + umax,
                    Math.max(LCS(x, low, mid), LCS(x, mid + 1, high)));
}

findLmax
int findLmax(int[] x, int low, int mid) {
    int lmax = 0, sum = 0;
    for (int j = mid; j >= low; j--) {
        sum += x[j];
        lmax = Math.max(lmax, sum);
    }
    return lmax;
} // Run time? In terms of mid - low?

findUmax
int findUmax(int[] x, int mid1, int high) {
    int umax = 0, sum = 0;
    for (int j = mid1; j <= high; j++) {
        sum += x[j];
        umax = Math.max(umax, sum);
    }
    return umax;
} // Run time? In terms of high - mid1?
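Putting the pieces together, a self-contained sketch of the divide-and-conquer solution might look like the following. It folds the two boundary scans into the recursive method and adds a quadratic version for cross-checking; the class name MaxSubvector and the method names are illustrative, not from the slides, and int values are used as in the code above.

```java
final class MaxSubvector {
    // Divide and conquer: T(N) = 2T(N/2) + O(N).
    static int maxSum(int[] x) {
        return maxSum(x, 0, x.length - 1);
    }

    private static int maxSum(int[] x, int low, int high) {
        if (low > high) return 0;                    // empty range
        if (low == high) return Math.max(0, x[low]); // single element
        int mid = (low + high) / 2;

        // Best sum ending at the boundary and reaching down.
        int lmax = 0, sum = 0;
        for (int j = mid; j >= low; j--) {
            sum += x[j];
            lmax = Math.max(lmax, sum);
        }
        // Best sum starting at the boundary and reaching up.
        int umax = 0;
        sum = 0;
        for (int j = mid + 1; j <= high; j++) {
            sum += x[j];
            umax = Math.max(umax, sum);
        }
        // ma, mb, or mc = lmax + umax.
        return Math.max(lmax + umax,
                Math.max(maxSum(x, low, mid), maxSum(x, mid + 1, high)));
    }

    // Quadratic version, used here only to cross-check the result.
    static int maxSumQuadratic(int[] x) {
        int maxsofar = 0;
        for (int i = 0; i < x.length; i++) {
            int sum = 0;
            for (int j = i; j < x.length; j++) {
                sum += x[j];
                maxsofar = Math.max(sum, maxsofar);
            }
        }
        return maxsofar;
    }

    public static void main(String[] args) {
        int[] x = {31, -41, 59, 26, -53, 58, 97, -93, -23, 84};
        System.out.println(maxSum(x));          // prints 187
        System.out.println(maxSumQuadratic(x)); // prints 187
    }
}
```

Because the empty subvector counts with sum 0, an all-negative input yields 0 in both versions.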

Runtime of Div&Conq Please read pp. 449-450 of Weiss.

RUNTIME T(N) = 2T(N/2) + O(N) is O(N log N)

In General
T(N) = a T(N/b) + O(N^k), with a >= 1, b > 1
T(N) = O(N^(log_b a))   if a > b^k
T(N) = O(N^k log N)     if a = b^k
T(N) = O(N^k)           if a < b^k

because, by telescoping with N = b^m,
T(N) = T(b^m) = a^m * sum_{i=0}^{m} (b^k / a)^i
The sum is a geometric series with ratio b^k / a, so the result depends on whether a > b^k, a = b^k, or a < b^k.
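As a quick way to apply the three cases, the sketch below compares a with b^k and reports the matching bound. It assumes integer a, b, and k; the class and method names are hypothetical and the output strings are just a plain-text rendering of the bounds.

```java
final class MasterTheorem {
    // Bound for T(N) = a T(N/b) + O(N^k), with a >= 1, b > 1, k >= 0.
    static String solve(int a, int b, int k) {
        long bToK = 1;
        for (int i = 0; i < k; i++) {
            bToK *= b; // exact integer power b^k
        }
        if (a > bToK) return "O(N^(log_" + b + " " + a + "))";
        if (a == bToK) return "O(N^" + k + " log N)";
        return "O(N^" + k + ")";
    }

    public static void main(String[] args) {
        // The divide-and-conquer recurrence T(N) = 2T(N/2) + O(N):
        System.out.println(solve(2, 2, 1)); // prints O(N^1 log N)
    }
}
```

With a = 2, b = 2, k = 1 we have a = b^k, so the middle case applies and gives O(N log N), matching the runtime stated for the divide-and-conquer solution.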
