Published by Ashlee Richard. Modified over 7 years ago.
1
PROC SORT: C O O P R R S T
Jayson Shurgold, Health Canada
Ottawa Area SAS Users Society
November 17th
2
Disclaimer
The views expressed in this presentation are the personal views of the presenting staff and do not necessarily represent the views of Health Canada or the Government of Canada. The presentation is provided for general information purposes, is intended only as an academic resource, and does not constitute professional advice. Information has been summarized and paraphrased for presentation purposes, and the examples are theoretical and provided for illustration only.
3
About me:
- B.Sc. in Health Sciences at Brock University (2007)
- Civil Service since (PHAC ; HC 2016)
- Vacationed in Vancouver between 2012 and 2015
- Father of two boys, aged two years and 5 months
- Interests: bass guitar, boxing, whiskey, sleep
4
Introduction to Sorting
Sorting, the systematic arrangement of things, is something common to us all.
- Natural sort orders: numbers, alphabet (simple, one class)
- Logic sort order: cards (more complex, multiple classes)
- Learned sort order: social hierarchies (most complex?)
A process often taken for granted.
5
Introduction to Sorting
Dewey Decimal Classification (DDC)
- A classification system published by Melvil Dewey in 1876
- The order in which books are sorted onto shelves
Exercise 1: Arrange the books in the correct DDC order.
Books: 1.1 1.2 1.3 1.5 1.4
6
Introduction to Sorting
Dewey Decimal Classification (DDC)
- A classification system published by Melvil Dewey in 1876
- The order in which books are sorted onto shelves
Exercise 1: Arrange the books in the correct DDC order.
Books: 1.1 1.2 1.3 1.5 1.4
Answer: 1.1 1.2 1.3 1.4 1.5
8
The Insertion Sort
- A simple procedure that constructs a sorted array one item at a time
- Intuitive for a small number of items, but inefficient for a large number
Starting Positions: 1.1 1.2 1.3 1.5 1.4
Ending Positions: 1.1 1.2 1.3 1.4 1.5
9
The Insertion Sort
- A simple procedure that constructs a sorted array one item at a time
- Intuitive for a small number of items, but inefficient for a large number
Starting Positions: 1.4 1.1 1.2 1.3 1.5
Core Logic
1. Compare reference against value
2. Swap reference with value if reference < value
3. Iterate n times
Ending Positions: 1.1 1.2 1.3 1.4 1.5
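The core logic above can be sketched in a few lines of Python. This is a minimal illustration rather than anything taken from the slides: the function name `insertion_sort` and the choice to return a new list are assumptions, and the input is the book order from Exercise 1.

```python
def insertion_sort(values):
    """Return a new ascending list, built one item at a time."""
    result = list(values)  # work on a copy so the input is left untouched
    for i in range(1, len(result)):
        reference = result[i]  # the item currently being placed
        j = i - 1
        # Compare the reference against each value to its left and shift
        # that value right while the reference is smaller than it.
        while j >= 0 and reference < result[j]:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = reference  # drop the reference into the gap
    return result


books = [1.1, 1.2, 1.3, 1.5, 1.4]  # starting positions from Exercise 1
print(insertion_sort(books))  # [1.1, 1.2, 1.3, 1.4, 1.5]
```

In the worst case each new item is compared against every item already placed, which is why the approach becomes slow as the list grows.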
10
Automated Machine Sorting
1880 US Census: handwritten reports
- Resident population of 50,189,209
- Data processing took 8 years by hand
1890 US Census: punch cards / tabulating machine
- Resident population of 62,947,714
- Data processing took 6 years
The company responsible for these machines grew to become IBM.
11
Automated Machine Sorting
The tabulating machine sorted by grouping keys on their least significant digit, then repeating the process for each digit up to the most significant. This method is called a Radix Sort.
Core Logic
1. Group keys based on least significant digit
2. Repeat for next least significant digit
Example keys: 45, 90, 170, 802 (zero-padded to 045, 090, 170, 802)
Figure: starting order, followed by iterations 1 to 3
12
Automated Machine Sorting
The tabulating machine sorted by grouping keys on their least significant digit, then repeating the process for each digit up to the most significant. This method is called a Radix Sort.
Core Logic
1. Group keys based on least significant digit
2. Repeat for next least significant digit
Example keys: 45, 90, 170, 802 (zero-padded to 045, 090, 170, 802)
Digit examined, in resulting key order:
Iteration 1 (ones digit): 0 | 0 | 2 | 5
Iteration 2 (tens digit): 0 | 4 | 7 | 9
Iteration 3 (hundreds digit): 0 | 0 | 1 | 8
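The two-step core logic above can be sketched in Python as a least-significant-digit radix sort. This is an illustrative sketch under stated assumptions: it handles non-negative integers only, the name `radix_sort` is hypothetical, and the starting order of the four keys is assumed, since the slide's starting order did not survive.

```python
def radix_sort(keys):
    """Least-significant-digit radix sort for non-negative integers."""
    result = list(keys)
    if not result:
        return result
    digits = len(str(max(result)))  # number of passes needed
    for position in range(digits):  # 0 = ones, 1 = tens, 2 = hundreds, ...
        buckets = [[] for _ in range(10)]  # one bucket per digit 0-9
        for key in result:
            digit = (key // 10 ** position) % 10
            buckets[digit].append(key)  # appending keeps each pass stable
        # Collect the buckets in digit order to form the next pass's input.
        result = [key for bucket in buckets for key in bucket]
    return result


print(radix_sort([170, 45, 90, 802]))  # [45, 90, 170, 802]
```

Each pass must preserve the order produced by the previous pass for keys that share a digit, which is what filling and emptying the buckets in order guarantees.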
13
Automated Computer Sorting
First Place: Electronic Numerical Integrator and Computer (ENIAC)
- Initialized in 1946
- Based on a decimal system
Second Place: Electronic Discrete Variable Automatic Computer (EDVAC)
- Initialized in 1949
- Based on binary
Pictured: EDVAC, ENIAC, John von Neumann
14
Automated Computer Sorting
John von Neumann: Merge-Sort Algorithm
Core Logic
1. Divide unsorted list into sub-lists
2. Sort sub-lists
3. Merge sorted sub-lists into a master list
Figure: starting order, followed by iterations 1 to 6
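The three steps above map naturally onto a short recursive sketch in Python. The function names `merge` and `merge_sort` and the sample input are illustrative assumptions, not taken from the slides; splitting at the midpoint is one common choice for step 1.

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal items in original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two
    merged.extend(right[j:])  # still has items left
    return merged


def merge_sort(values):
    """Divide into sub-lists, sort each, then merge the sorted halves."""
    if len(values) <= 1:  # zero or one item is already sorted
        return list(values)
    middle = len(values) // 2
    return merge(merge_sort(values[:middle]), merge_sort(values[middle:]))


print(merge_sort([5, 6, 3, 4, 7, 2, 1]))  # [1, 2, 3, 4, 5, 6, 7]
```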
15
Sorting Logic – What does it look like?
Example 1: Insertion Sort
16
Sorting Logic – What does it look like?
Example 2: Radix Sort
17
Sorting Logic – What does it look like?
Example 3: Merge Sort
19
Sorting Data in SAS
Exercise 2: Considering the following data, recommend a methodology compatible with SAS that could achieve an ascending sorted list.
Your Data (Value): 5 6 3 4 7 2 1
20
Sorting Data in SAS
Exercise 2: Considering the following data, how would you manipulate the data to achieve an ascending sorted list?
Your Data (Value): 5 6 3 4 7 2 1
Answer 1
21
Sorting Data in SAS
Exercise 2: Considering the following data, how would you manipulate the data to achieve an ascending sorted list?
Your Data (Value): 5 6 3 4 7 2 1
Answer 1
Answer 2
22
How to use Proc Sort – Simple
SAS Code Description
- Specify the Library and Dataset Name
- Specify the Variable to sort the Dataset by

Input Dataset: Library.Data
Variable  Result
3         P
.         NA
1         F
2

Result: Library.Data
Variable  Result
.         NA
1         F
2
3         P
23
How to use Proc Sort – Less Simple
SAS Code Description
- Output results to a new dataset
- Only write variables a, x, and z to output dataset, and rename x to y
- Output only one observation per by variable combination
- Sort ascending by variable z, then sub-sort descending by variable a
- Only consider observations where variable z is not missing
Input Dataset: Lib.Old
z b a x 3 T F 0.9 . M 1.1 1 5.9 4.6 2 3.9 2.2
Output Dataset: Lib.New
z a y 1 M 2.2 F 5.9 2 3.9 3 0.9
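To make the combined effect of these five descriptions concrete, here is a rough plain-Python analogue. It is not SAS code and not the code from the slide, which did not survive; the function name `proc_sort_sketch`, the use of `None` for a missing value, and every row of sample data are hypothetical, with values only loosely echoing the slide.

```python
def proc_sort_sketch(rows):
    """Plain-Python analogue of the five steps described above."""
    # Only consider observations where z is not missing (None here).
    kept = [row for row in rows if row["z"] is not None]
    # Sort ascending by z, then descending by a. Python's sort is stable,
    # so sort on the secondary key first and the primary key last.
    kept.sort(key=lambda row: row["a"], reverse=True)
    kept.sort(key=lambda row: row["z"])
    # Keep one observation per (z, a) combination: the first one seen.
    seen = set()
    output = []
    for row in kept:
        by_key = (row["z"], row["a"])
        if by_key in seen:
            continue
        seen.add(by_key)
        # Write only a, x, and z to the new dataset, renaming x to y.
        output.append({"z": row["z"], "a": row["a"], "y": row["x"]})
    return output


# Hypothetical input rows; the column names follow the slide.
old = [
    {"z": 3, "b": "T", "a": "F", "x": 0.9},
    {"z": None, "b": "T", "a": "M", "x": 1.1},
    {"z": 1, "b": "T", "a": "F", "x": 5.9},
    {"z": 1, "b": "T", "a": "F", "x": 4.6},
    {"z": 2, "b": "T", "a": "M", "x": 3.9},
    {"z": 1, "b": "T", "a": "M", "x": 2.2},
]
for row in proc_sort_sketch(old):
    print(row)
```

The input list is left unchanged and a new list is returned, mirroring the first description (output results to a new dataset).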
24
Proc Sort – What’s Actually Happening?
Proc Sort is a powerful and commonly used tool, but not many people know exactly what’s going on behind the scenes.
My Question: What is the specific algorithm used by SAS to sort data?
Looking back:
- Example 1: Insertion Sort
- Example 2: Radix Sort
- Example 3: Merge Sort
Very different approaches, with different strengths and weaknesses.
Internet theories:
- Proprietary software, unpublished
- Multiple sorting algorithms depending on complexity and resources
- Magic, especially since version 9
25
Proc Sort – What’s Actually Happening?
Proc Sort is a powerful and commonly used tool, but not many people know exactly what’s going on behind the scenes. My Question: What is the specific algorithm used by SAS to sort data?
26
What is Heap Sort?
Invented in 1964, Heap Sort is a two-step process:
1. Create the Heap
- Arrange data into a binary tree formation
- Compare daughters against parents
- Swap parent with daughter if daughter is higher (recursive)
2. Sort the Heap
- Swap the highest order of the heap with the lowest order of the heap
- Remove the highest order from the heap
- Update heap: swap parent with daughter if daughter is higher (recursive)
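The two steps above can be sketched in Python using the common array layout for a binary tree, where the daughters of the item at index p sit at indexes 2p + 1 and 2p + 2. This is a minimal sketch of the general technique, not a description of any particular product's implementation; the names `sift_down` and `heap_sort` and the sample input are assumptions.

```python
def sift_down(heap, parent, size):
    """Swap parent with its higher daughter until the heap rule holds."""
    while True:
        largest = parent
        left = 2 * parent + 1
        right = left + 1
        if left < size and heap[left] > heap[largest]:
            largest = left
        if right < size and heap[right] > heap[largest]:
            largest = right
        if largest == parent:  # both daughters are lower: done
            return
        heap[parent], heap[largest] = heap[largest], heap[parent]
        parent = largest  # keep following the swapped item downward


def heap_sort(values):
    heap = list(values)
    size = len(heap)
    # Step 1: create the heap, starting from the last parent and moving up.
    for parent in range(size // 2 - 1, -1, -1):
        sift_down(heap, parent, size)
    # Step 2: sort the heap. Swap the highest item (the root) with the last
    # item still in the heap, shrink the heap by one, then update the heap.
    for end in range(size - 1, 0, -1):
        heap[0], heap[end] = heap[end], heap[0]
        sift_down(heap, 0, end)
    return heap


print(heap_sort([5, 6, 3, 4, 7, 2, 1]))  # [1, 2, 3, 4, 5, 6, 7]
```

Items removed from the heap accumulate at the end of the same array, so the sort needs no second list beyond the initial copy.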
27
Heap Sort – Visualized
28
Questions?