Alex Chaplin October 30th, 2018

Presentation transcript:

PROC SORT revisited

- Demo data originates from SASHELP.SHOES and SASHELP.STOCKS.
- Sort Demo.sas has the examples used in the presentation.
- Change the libname to point to your personal SAS data library.
  If running under SAS OnDemand for Academics:
      libname demo "/home/<your SAS ODA user id>";
  Otherwise:
      libname demo '<your pathname>';
- Alternatively, remove all references to libname demo and create the outputs as work files.

What I will cover
- Filtering records
- Directing output
- Renaming fields
- Saving space using compress and reuse
- Eliminating duplicate keys and records
- Formatting output
- Useful system options

What I won't cover (but is covered in the references)
- Using proc sql and hash tables instead of sort
- Reducing memory usage using tagsort
- Collating sequences for international alphabets

Request 1
Pull records for certain regions or where sales > $10,000, ordered by descending Region and ascending number of stores.

Solution
Single proc sort step. No data steps required.

    proc sort data=shoes_demo1(where=(Region in ('Pacific','Middle East')
                                      or Region like '%Europe'
                                      or Sales > 10000))
              out=shoes_demo2;
        by descending Region Stores;
    run;

Request 2
Pull records for Pacific and Middle East, with the product name changed from "Slipper" to "Indoor Footwear", in the same sort order as before.

Solution
- Use a data step to create a second product column that has "Slipper" renamed to "Indoor Footwear".
- Use a sort step to pull records and drop / rename columns.

    proc sort data=shoes_demo2(where=(Region in ('Pacific','Middle East'))
                               rename=(product2=product) drop=product)
              out=shoes_demo3;
        by descending Region Stores;
    run;
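The data step mentioned in the solution is not shown on the slide. A minimal sketch of what it might look like follows, assuming the extra column is added while copying SASHELP.SHOES into the sort input shoes_demo1. The length of product2 and the choice of input and output data sets are assumptions, not taken from the presentation.

```sas
/* Sketch only: builds the input for the sort steps with a second product column. */
/* Assumption: product2 is created when SASHELP.SHOES is copied to shoes_demo1.   */
data shoes_demo1;
    set sashelp.shoes;
    /* Declare a length wide enough for 'Indoor Footwear' (15 characters). */
    length product2 $ 20;
    if product = 'Slipper' then product2 = 'Indoor Footwear';
    else product2 = product;
run;
```

Because drop= is applied before rename=, the sort step can drop the original product column and rename product2 to product within the same set of data set options.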

Request 3
Save storage space.

Solution
- Use the compress and reuse options.
- A data set level option overrides the system option.

    options compress=yes reuse=yes;

    /* compress=binary is best when fields are mostly numeric */
    proc sort data=demo.shoes out=shoes_demo5(compress=binary reuse=yes);
        by Region Product Subsidiary;
    run;

Request 3: Output
- In this example compression does not save space.
- On other data sets, compression can reduce size by over 90%.

    NOTE: There were 395 observations read from the data set DEMO.SHOES.
    NOTE: The data set WORK.SHOES_DEMO5 has 395 observations and 8 variables.
    NOTE: Compressing data set WORK.SHOES_DEMO5 decreased size by 0.00 percent.
          Compressed is 1 pages; un-compressed would require 1 pages.
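The demo data set is too small for compression to help. As a rough illustration of a case where it typically does, the sketch below builds a hypothetical data set (wide_text, which is not part of the presentation) with a long, mostly blank character variable and sorts it with compress=yes. The compression NOTE in the log reports the resulting size change.

```sas
/* Hypothetical data: 10,000 rows with a 2,000-byte character variable */
/* that holds only a short value, so most of each record is blanks.    */
data wide_text;
    length note $ 2000;
    do id = 1 to 10000;
        note = 'short value';
        output;
    end;
run;

/* compress=yes uses run-length encoding, which suits repeated blanks. */
proc sort data=wide_text out=wide_text_sorted(compress=yes);
    by id;
run;
```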

Request 4
Eliminate duplicate keys for Region and Subsidiary.

Solution
- Use the nodupkey option.
- Save records with duplicate keys using dupout=.

    proc sort data=demo.shoes(rename=(product2=product) drop=product)
              out=shoes_demo6
              dupout=shoes_demo6_dups
              nodupkey;
        by Region Subsidiary;
    run;

Request 4 (continued)
Eliminate duplicate keys for Region and Subsidiary.

Results
- 53 records with unique key values saved to the output file
- 342 records with duplicate key values saved to the duplicates file

Beware
- Nodupkey eliminates records with duplicate keys, not duplicate records.
- The exception is by _ALL_, which treats the entire record as the key.
- If you will reference values in non-key fields, be mindful of the effect of eliminating records with duplicate keys.
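The by _ALL_ exception can be sketched as follows. This example reads SASHELP.SHOES directly rather than the presentation's demo library, and the output data set names are hypothetical.

```sas
/* Sketch: with by _all_, every variable is part of the key, so nodupkey */
/* removes only records that are identical in every variable.            */
/* Output data set names are hypothetical.                               */
proc sort data=sashelp.shoes
          out=shoes_all_key
          dupout=shoes_all_key_dups
          nodupkey;
    by _all_;
run;
```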

Request 5
Eliminate duplicate records when sorting by Region and Subsidiary.

Solution
- Use the noduprecs option.
- Save duplicate records using dupout=.

    proc sort data=demo.shoes(rename=(product2=product) drop=product)
              out=shoes_demo7
              dupout=shoes_demo7_dups
              noduprecs;
        by Region Subsidiary;
    run;

Request 5 (continued)
Eliminate duplicate records when sorting by Region and Subsidiary.

Results
- All 395 input records saved to the output file
- No records saved to the duplicates file

The effect of nodupkey vs noduprec

sort_option  Region  Subsidiary   store_ct  sales_am     return_am
nodupkey     Africa  Addis Ababa        12   $29,761.00     $769.00
noduprec     Africa  Addis Ababa        65  $467,429.00  $13,370.00
nodupkey     Africa  Algiers            21   $21,297.00     $710.00
noduprec     Africa  Algiers           101  $395,600.00  $12,763.00
nodupkey     Africa  Cairo              20    $4,846.00     $229.00
noduprec     Africa  Cairo              88  $738,198.00  $22,477.00

The file de-duped using nodupkey has lower aggregated amounts because nodupkey eliminates more records as duplicates than noduprec does.
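The slide does not show how the comparison table was built. One way to produce a table of this shape is sketched below, assuming the outputs of Requests 4 and 5 (shoes_demo6 and shoes_demo7) exist and assuming the amounts are simple sums by Region and Subsidiary. The name dedupe_compare is hypothetical.

```sas
/* Sketch: aggregate each de-duped file and stack the results. */
/* Assumes the columns are plain sums per Region/Subsidiary.   */
proc sql;
    create table dedupe_compare as
    select 'nodupkey' as sort_option length=8,
           Region,
           Subsidiary,
           sum(Stores)  as store_ct,
           sum(Sales)   as sales_am  format=dollar15.2,
           sum(Returns) as return_am format=dollar15.2
    from shoes_demo6
    group by Region, Subsidiary
    union all
    select 'noduprec' as sort_option length=8,
           Region,
           Subsidiary,
           sum(Stores)  as store_ct,
           sum(Sales)   as sales_am  format=dollar15.2,
           sum(Returns) as return_am format=dollar15.2
    from shoes_demo7
    group by Region, Subsidiary
    order by Region, Subsidiary, sort_option;
quit;
```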

Request 6
- Show regions as continents.
- Show sales formatted as dollars and cents.
- Show best sellers in descending order of sales, along with the continent.

Request 6: Solution
- Use a custom format to show continents from regions.
- Apply a dollar format to sales to show dollars and cents.
- Use label and proc print.

    proc format;
        value $continent
            'Africa' = 'Africa'
            'Asia','Middle East','Pacific' = 'Asia'
            'Canada','Central America/Caribbean','South America','United States' = 'Americas'
            'Eastern Europe','Western Europe' = 'Europe';
    run;

    proc sort data=demo.shoes out=shoes_demo8;
        format region $continent. sales dollar15.2;
        label region = 'Continent';
        by descending sales;
    run;

    /* The label option makes proc print show 'Continent' as the column heading. */
    proc print data=shoes_demo8 label;
    run;

Request 6: Sample output from proc print
- Region labeled as Continent
- Continent format applied to Region
- Total Sales formatted as dollars and cents
- Total Sales displayed in descending order

Continent  Product         Subsidiary  Number of Stores  Total Sales     Total Inventory  Total Returns
Asia       Men's Casual    Tel Aviv                  11  $1,298,717.00        $2,881,005        $57,362
Americas                   Kingston                  28    $576,112.00        $1,159,556        $20,005
Europe     Women's Casual  Copenhagen                26    $502,636.00        $1,110,412        $17,448
Africa                     Cairo                     25    $360,209.00        $1,063,251         $9,424

Request 7
- Get the Microsoft stock price between September 2005 and December 2005.
- Show the date formatted as YYYY-MM-DD.
- Order by date.

Request 7: Solution
- Use SAS date literals in the where= data set option.
- Use between to select the date range.
- Apply a date format to the date field.

    proc sort data=sashelp.stocks(where=(stock='Microsoft'
                                         and date between '01sep05'd and '01dec05'd))
              out=demo.stocks;
        format date yymmdd10.;
        by date;
    run;

Request 7: Output
- Date formatted as YYYY-MM-DD
- Selected date range between September 2005 and December 2005
- Ordered by ascending date

Stock      Date        Open    High    Low     Close   Volume      AdjClose
Microsoft  2005-09-01  $27.38  $27.39  $25.12  $25.73  66,976,476  $25.47
Microsoft  2005-10-03  $25.71  $25.80  $24.25  $25.70  72,132,475  $25.44
Microsoft  2005-11-01  $25.61  $28.25          $27.68  71,469,194  $27.48
Microsoft  2005-12-01  $27.73  $28.10  $26.10  $26.15  62,892,384  $25.96

Useful system options

    proc options;
    run;

FULLSTIMER: Writes all available performance statistics to the SAS log.
COMPRESS: Specifies the type of compression to use for observations in output SAS data sets.
REUSE: Specifies whether SAS reuses space when observations are added to a compressed SAS data set.
NOSORTVALIDATE: SORT does not verify whether a data set is sorted according to the variables in the BY statement.
SORTDUP: Specifies whether PROC SORT removes duplicate variables based on the DROP and KEEP options or on all data set variables.
SORTEQUALS: PROC SORT maintains the relative position in the output data set for observations with identical BY-variable values.
SORTSEQ: Specifies a language-specific collating sequence for the SORT and SQL procedures.
SORTSIZE: Specifies the amount of memory that is available to the SORT procedure.
THREADS: Uses threaded processing for SAS applications that support it.
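As a small illustration of working with these options, the sketch below turns on FULLSTIMER and lists the current values of a few of the options above instead of printing every option. The selection of options shown is arbitrary.

```sas
/* Write full performance statistics to the log for subsequent steps. */
options fullstimer;

/* Show the current settings of selected system options in the log. */
proc options option=(compress reuse sortsize threads);
run;
```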

References
Hamilton, Jack. The Problem with NODUPLICATES, SUGI 25. http://www2.sas.com/proceedings/sugi25/25/po/25p221.pdf
Hughes, Troy Martin. Sorting a Bajillion Records: Conquering Scalability in a Big Data World, SESUG 2016. https://support.sas.com/resources/papers/proceedings16/11888-2016.pdf
Kelsey Bassett, Britta. The SORT Procedure – Beyond the Basics, SUGI 31. http://www2.sas.com/proceedings/sugi31/030-31.pdf
Morgan, Derek. PROC SORT (then and) NOW, MWSUG 2017. https://www.mwsug.org/proceedings/2017/SA/MWSUG-2017-SA04.pdf

References (continued)
SAS® Press. Base SAS® 9.4 Procedures Guide, Seventh Edition. https://documentation.sas.com/?docsetId=proc&docsetTarget=p1nd17xr6wof4sn19zkmid81p926.htm&docsetVersion=9.4&locale=en
SAS® OnDemand for Academics. http://support.sas.com/software/products/ondemand-academics/#s1=1
Thewussen, Henri. Do not waste too many resources to get your data in a specific sequence, SAS Global Forum 2011. http://support.sas.com/resources/papers/proceedings11/242-2011.pdf

Questions