Efficient Use of Disk Space in SAS® Application Programs


1 Efficient Use of Disk Space in SAS® Application Programs
Thomas E. Billings; MUFG Union Bank, N.A., San Francisco, California (working remotely from Florida)
Speaker bio: Thomas Billings has used SAS since the mid-1970s in multiple industries and applications. He currently works in banking, building and analyzing databases.
A list of the author’s SAS-related papers, including URLs for free access, is available at: Author contact

2 Efficient Use of Disk Space in SAS® Application Programs
Thomas E. Billings, MUFG Union Bank, N.A.
This work by Thomas E. Billings is licensed (2018) under a Creative Commons Attribution 4.0 International License.

3 Disclaimer
The contents of the paper herein are solely the author’s thoughts and opinions, which do not represent those of MUFG Union Bank N.A. The bank does not endorse, recommend, or promote any of the computing architectures, platforms, software, programming techniques or styles referenced in this paper.

4 Setting Expectations
This paper and presentation are a high-level overview:
  Multiple factors relevant to space management
  Multiple SAS tools for space management
  Techniques for efficient space management and other related topics
Many topics, so coverage of each topic is limited; this is a 20-minute talk and not a 2-hour hands-on workshop.

5 Basic Housekeeping: File Cleanup
Reasons to save files:
  Regulatory, legal, or audit requirements
  Plan or expectation to reuse the files/programs in the future
Files to save can be categorized:
  Files that are active now, were active in the recent past, or are expected to be active in the future
  Files that won’t be used in the near future, but of which a copy should be saved
Also: versioning, rolling off old files

6 Backup: same environment
Use the formal IT backup process for production files and for regulatory, legal, and audit files.
Less formal processes: internal servers; your own workstation.
File format:
  SAS® data sets: is the native .sas7bdat format safe long-term (i.e., the OS won’t change, or CEDA will cover changes)?
  Compiled objects: back up the source code that creates them.
  Formats: convert to data sets; keep the source code that creates them; catalog migration with PROC CPORT/CIMPORT.
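The "convert formats to data sets" item can be sketched with the CNTLOUT= and CNTLIN= options of PROC FORMAT. This is a minimal sketch, not the author's own code: the libref bkup, its path, and the data set name fmtdata are hypothetical, and it assumes a format catalog WORK.FORMATS already exists in the session.

```sas
/* Hypothetical backup location; the directory must already exist. */
libname bkup '/backup/formats';

/* Write the definitions of all formats in WORK.FORMATS to a data set. */
proc format library=work.formats cntlout=bkup.fmtdata;
run;

/* Later, or in another session: rebuild the formats from the data set. */
proc format library=work.formats cntlin=bkup.fmtdata;
run;
```

A control data set is an ordinary SAS data set, so it can be backed up, compressed, or transported like any other table.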

7 Backup: same/cross-environment
Programs: zip, tar, or uncompressed/as-is if the files are small.
Graphs: PROC CPORT; source code to regenerate.
Reproducible research: zip, tar.
Cross-environment:
  Transport format works across systems: PROC CPORT, CIMPORT; also the XPORT data engine.
  CEDA = cross-environment data access. This makes some cross-environment transfers easier.
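A transport-file round trip with PROC CPORT and PROC CIMPORT might look like the following. It is a sketch under stated assumptions: the librefs proj and restore, the fileref tranfile, and all paths are hypothetical placeholders.

```sas
/* Source system: hypothetical library and transport file. */
libname proj '/data/project';
filename tranfile '/backup/project.cpt';

/* Export the members of PROJ (data sets and catalogs) to one transport file. */
proc cport library=proj file=tranfile;
run;

/* Target system: after moving the file in binary mode, import it. */
libname restore '/data/restored';
proc cimport library=restore infile=tranfile;
run;
```

The transport file is a single ordinary file, so it can also be zipped before it is moved or archived.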

8 SAS Tools for File Management (1)
PROC DELETE: easiest way to delete SAS data sets:
  proc delete data=a.b1 a.b2; run;
PROC DATASETS: more versatile, powerful:
  proc datasets lib=####;
    delete filename1 filename2 … / mtype=data;
    delete viewname1 viewname2 … / mtype=view;
  quit;
The NOLIST and NOWARN options can be useful.
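To illustrate why NOLIST and NOWARN are useful, here is a small self-contained sketch. The data set names temp1, temp2, and not_there are made up for the example.

```sas
/* Create two throwaway data sets so the example runs on its own. */
data work.temp1 work.temp2;
   x = 1;
run;

/* NOLIST suppresses the directory listing in the log.             */
/* NOWARN keeps the step from failing when a named member does not */
/* exist (NOT_THERE is deliberately missing).                      */
proc datasets lib=work nolist nowarn;
   delete temp1 temp2 not_there / memtype=data;
quit;
```

This pattern is convenient in cleanup code at the end of a job, where some intermediate files may or may not have been created.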

9 SAS Tools for File Management (2)
SQL DROP statement: TABLE, VIEW, INDEX
SAS DATA step functions (more advanced):
  Functions can be used to get a list of files
  FOPTNAME (with FINFO): file attributes such as Unix/Linux permissions
  FDELETE: deletes a file, if you have permission
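The two approaches on this slide can be sketched as follows. All table, view, and file names are hypothetical, and the external file is created inside the WORK directory only so that the example is self-contained.

```sas
/* Set up a table, a view, and a small external file to delete. */
data work.temp1;
   x = 1;
run;

proc sql;
   create view work.v_temp as select * from work.temp1;
quit;

filename oldfile "%sysfunc(pathname(work))/old_extract.txt";
data _null_;
   file oldfile;
   put 'scratch line';
run;

/* SQL DROP removes the view and the table. */
proc sql;
   drop view work.v_temp;
   drop table work.temp1;
quit;

/* FDELETE removes the external file; a nonzero return code means failure. */
data _null_;
   rc = fdelete('oldfile');
   if rc ne 0 then do;
      msg = sysmsg();
      put 'FDELETE failed: ' msg;
   end;
run;

filename oldfile clear;
```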

10 File Compression
Important and effective way to reduce use of disk space.
Small CPU penalty but reduced I/O; often more efficient overall.
In SAS: system option and data set option COMPRESS=YES|CHAR|BINARY|NO
  COMPRESS=YES|CHAR for character-dominant data
  COMPRESS=BINARY for numeric-dominant data with many repeated values
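A minimal sketch of both ways to request compression follows. The data set and variable names are invented for illustration; the SAS log normally reports the percentage by which compression changed each file's size, which is the number to check before adopting a setting.

```sas
/* System option: becomes the default for data sets created afterwards. */
options compress=yes;

/* Data set option: overrides the system option for this one file.  */
/* A long, mostly blank character variable suits COMPRESS=CHAR.     */
data work.notes_c (compress=char);
   length note $200;
   do i = 1 to 10000;
      note = 'short text';
      output;
   end;
run;

/* Many numeric variables with repeated values: try COMPRESS=BINARY. */
data work.flags_b (compress=binary);
   array f{50} f1-f50;
   do i = 1 to 10000;
      do j = 1 to 50;
         f{j} = 0;
      end;
      output;
   end;
   drop j;
run;
```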

11 PROC FORMAT: variable-level compression
Context: files with many long text variables that have a limited number of distinct values.
PROC FORMAT can be used to recode long text values into shorter strings, reducing disk space.
Need 2 formats for each variable:
  Long string -> short
  Short string -> long (reports/apps typically need the long form)
Users need access to, and knowledge of, the formats.
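The pair of formats described above might be built like this. It is an illustrative sketch only: the format names $reg2cd and $cd2reg, the region values, and the data set names are all made up.

```sas
proc format;
   /* Long string -> short code, used when the compact file is built. */
   value $reg2cd
      'North America Retail Banking'   = 'NA'
      'Europe, Middle East and Africa' = 'EM'
      other                            = '??';
   /* Short code -> long string, used by reports and applications. */
   value $cd2reg
      'NA'  = 'North America Retail Banking'
      'EM'  = 'Europe, Middle East and Africa'
      other = 'Unknown';
run;

/* Small demonstration table holding the long text values. */
data work.long;
   length region $40;
   region = 'North America Retail Banking';   output;
   region = 'Europe, Middle East and Africa'; output;
run;

/* Store a 2-byte code instead of the 40-byte text. */
data work.compact (drop=region);
   set work.long;
   length region_cd $2;
   region_cd = put(region, $reg2cd.);
run;

/* Reports show the long form by applying the reverse format. */
proc print data=work.compact;
   format region_cd $cd2reg.;
run;
```

In practice the two formats would be stored in a permanent catalog that every user of the compact file can reach, which is the point of the last bullet on the slide.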

12 Logical deletion of rows
PROC SQL and FEDSQL support logical deletion of rows.
DATA step: MODIFY, UPDATE.
When # physical rows >> # logical rows, the difference is unused space.
Identify this situation by comparing the # of physical rows to the # of logical rows in a SAS data set (dictionary tables).
Tables with a large amount of unused space should be regenerated: copied in a way that frees the unused space while preserving sorts, indexes, and constraints.
The REUSE=YES option is relevant here.
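The comparison of physical and logical row counts can be sketched with DICTIONARY.TABLES, whose NOBS, NLOBS, and DELOBS columns hold the physical, logical, and deleted row counts. The table work.demo is a made-up example, and the example assumes an uncompressed data set in the default SAS engine.

```sas
/* Build a table, then logically delete most of its rows. */
data work.demo;
   do i = 1 to 1000;
      output;
   end;
run;

proc sql;
   delete from work.demo where i > 100;

   /* List WORK tables that still carry logically deleted rows. */
   select libname, memname, nobs, nlobs, delobs
      from dictionary.tables
      where libname = 'WORK'
        and memtype = 'DATA'
        and delobs > 0;
quit;

/* Simplest regeneration: rewrite the table to drop the dead rows.   */
/* Note: this plain copy does not carry over indexes or constraints; */
/* those would have to be re-created or copied by other means.       */
data work.demo;
   set work.demo;
run;
```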

13 Programming techniques
Replace multiple DATA/SORT steps with SQL, or vice versa.
Replace multiple small DATA steps with fewer, larger (in scope) DATA steps or VIEWs.
Limit columns and rows with the KEEP=, DROP=, and WHERE= data set options and similar statements.
PROC DS2 can replace some DATA steps and run in-database.
VIEWs plus a more normalized data structure: build large denormalized extracts only as needed for reporting.
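Two of these techniques, limiting columns and rows on input and using a view instead of a stored copy, can be sketched as below. The example assumes the SASHELP.CARS sample table that ships with most SAS installations; the output names are hypothetical.

```sas
/* KEEP= and WHERE= on the input data set: only the needed columns */
/* and rows are read, so the output file is as small as possible.  */
data work.cars_small;
   set sashelp.cars (keep=make model msrp where=(msrp > 50000));
run;

/* A DATA step view stores only the program, not the rows; the */
/* subset is materialized each time the view is read.          */
data work.v_cars / view=work.v_cars;
   set sashelp.cars (keep=make model msrp);
   where msrp > 50000;
run;

/* The view is used like a table. */
proc means data=work.v_cars n mean;
   var msrp;
run;
```

The trade-off is the usual one: the view saves disk space but re-runs its logic on every read, so it suits extracts that are read rarely or that must always reflect current data.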

14 Managing the SAS WORK library
WORK space is a shared, finite resource.
Jobs that hang can lock up WORK space.
Jobs that sequentially create large files: insert code to delete intermediate files.
USER= option: redirects files from the WORK library to a user-supplied libref.
  Applies to one-level (single-token) file names: DATA=B, not DATA=A.B. (Option not supported in CAS.)
  Similar functionality is available via macro variables.
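A rough sketch of the USER= redirection, the macro-variable alternative, and cleanup of intermediate files follows. The libref scratch, its path, and the macro variable tmplib are hypothetical.

```sas
/* Hypothetical directory; it must already exist and be writable. */
libname scratch '/bigdisk/scratch';

/* One-level names now resolve to SCRATCH instead of WORK. */
options user=scratch;

data b;          /* one-level name: stored as SCRATCH.B */
   x = 1;
run;

data work.c;     /* two-level name: still stored in WORK */
   x = 1;
run;

/* Similar effect with a macro variable holding the libref. */
%let tmplib = scratch;

data &tmplib..d;
   x = 1;
run;

/* Delete intermediate files as soon as they are no longer needed. */
proc datasets lib=scratch nolist nowarn;
   delete b d;
quit;
```

The macro-variable form requires every reference to be written with the variable, but it works for two-level names and makes the redirection visible in the code.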

15 Author contact
A list of the author’s SAS-related papers, including URLs for free access, is available at:
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

16 Monday tips
Simple, basic housekeeping: delete redundant SAS files and move files that are not in active use to backup. This is a best practice and promotes efficient use of disk space.
SAS file compression via data set and/or system options is a best practice and promotes efficient use of disk space.
Logical deletion of rows in a SAS data set without similar physical deletion can waste disk space. This situation can be detected by comparing the number of logical vs. physical rows in a file.

