Efficient Use of Disk Space in SAS® Application Programs


Efficient Use of Disk Space in SAS® Application Programs
Thomas E. Billings; MUFG Union Bank, N.A., San Francisco, California (working remotely from Florida)
Speaker bio: Thomas Billings has used SAS since the mid-1970s in multiple industries and applications. He currently works in banking, building and analyzing databases.
A list of the author’s SAS-related papers, including URLs for free access, is available at: https://goo.gl/uCUHoa
Author contact email: tebillings@gmail.com

This work by Thomas E. Billings is licensed (2018) under a Creative Commons Attribution 4.0 International License.

Disclaimer
The contents of the paper herein are solely the author’s thoughts and opinions, which do not represent those of MUFG Union Bank N.A. The bank does not endorse, recommend, or promote any of the computing architectures, platforms, software, programming techniques or styles referenced in this paper.

Setting Expectations
This paper and presentation are a high-level overview of multiple factors relevant to space management, multiple SAS tools for space management, and techniques for efficient space management and other related topics. Because there are many topics, coverage of each is limited: this is a 20-minute talk, not a 2-hour hands-on workshop.

Basic Housekeeping: File Cleanup
Reasons to save files: regulatory, legal, or audit requirements; a plan or expectation to reuse the files or programs in the future.
Files to save fall into two categories: (1) files that are active now, were active in the recent past, or are expected to be active in the future; (2) files that won’t be used in the near future but of which a copy should be saved.
Also consider versioning and rolling off old files.

Backup: same environment
Use the formal IT backup process for production files and for regulatory, legal, and audit files. Less formal processes apply to internal servers and your own workstation.
File format considerations:
SAS® data sets: is the native .sas7bdat format safe for the long term (i.e., will the OS stay the same, or will CEDA cover any changes)?
Compiled objects: back up the source code that creates them.
Formats: convert to data sets; keep the source code that creates them; for catalog migration, use PROC CPORT/CIMPORT.

Backup: same/cross-environment
Programs: zip, tar, or leave uncompressed (as-is) if the files are small.
Graphs: PROC CPORT, or keep the source code to regenerate them.
Reproducible research: zip, tar.
Cross-environment: transport format works across systems (PROC CPORT and CIMPORT; also the XPORT data engine). CEDA (cross-environment data access) makes some cross-environment transfers easier.
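As a rough illustration of the backup options above, the sketch below saves a user-defined format as a data set and writes a data set to a transport file. The format name, data set names, and file name are hypothetical, and the code assumes the SASHELP sample library is available.

```sas
/* Sketch only: $region, fmt_backup, class_copy and the .cpt file name are hypothetical. */

/* A small user-defined format to back up. */
proc format library=work;
  value $region "N" = "North"
                "S" = "South";
run;

/* Formats -> data set: CNTLOUT= writes the format definitions to a data set. */
proc format library=work cntlout=work.fmt_backup;
run;

/* The formats can be rebuilt later from that data set with CNTLIN=. */
proc format library=work cntlin=work.fmt_backup;
run;

/* A data set to move across environments. */
data work.class_copy;
  set sashelp.class;
run;

/* Transport file; written to the WORK directory here only so the sketch runs. */
filename tran "%sysfunc(pathname(work))/class_copy.cpt";

proc cport data=work.class_copy file=tran;
run;

/* On the target system, PROC CIMPORT restores the data set from the transport file. */
proc cimport library=work infile=tran;
run;

filename tran clear;
```

In practice the transport file would be written to a backup location and imported in a different session or on a different host.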

SAS Tools for File Management (1)
PROC DELETE is the easiest way to delete SAS data sets:
  proc delete data=a.b1 a.b2; run;
PROC DATASETS is more versatile and powerful:
  proc datasets lib=####;
    delete filename1 filename2 … / mtype=data;
    delete viewname1 viewname2 … / mtype=view;
  quit;
The NOLIST and NOWARN options can be useful.
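A minimal runnable sketch of the PROC DATASETS pattern, including NOLIST and NOWARN, might look like the following. All member names are hypothetical, and MEMTYPE= is the full spelling of the MTYPE= option shown on the slide.

```sas
/* Sketch only: tmp1, tmp2, v_tmp and no_such_view are hypothetical names. */

/* Create two throwaway data sets and one view in WORK. */
data work.tmp1 work.tmp2;
  x = 1;
run;

proc sql;
  create view work.v_tmp as
    select * from sashelp.class;
quit;

/* NOLIST suppresses the directory listing in the log;                  */
/* NOWARN suppresses the error raised when a named member is not found. */
proc datasets lib=work nolist nowarn;
  delete tmp1 tmp2 / memtype=data;
  delete v_tmp no_such_view / memtype=view;
quit;
```

NOWARN is convenient in cleanup code that may run when some of the files were never created.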

SAS Tools for File Management (2)
SQL DROP statement: TABLE, VIEW, INDEX.
SAS DATA step functions (more advanced): functions can be used to get a list of files; FOPTNAME returns the names of file information items, such as Unix/Linux permissions; FDELETE deletes a file, if you have permission.
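The two approaches on this slide can be sketched as follows. The table, view, and file names are hypothetical; the external file is created in the WORK directory only so that there is something to delete.

```sas
/* Sketch only: t1, v1 and scratch.txt are hypothetical names. */

data work.t1;
  set sashelp.class;
run;

proc sql;
  create index name on work.t1(name);   /* simple index on column NAME */
  create view work.v1 as
    select name, age from work.t1;

  drop index name from work.t1;
  drop view work.v1;
  drop table work.t1;
quit;

/* Create a small external file, then remove it with FDELETE. */
filename scratch "%sysfunc(pathname(work))/scratch.txt";

data _null_;
  file scratch;
  put "temporary content";
run;

data _null_;
  rc = fdelete("scratch");     /* 0 means the file was deleted */
  if rc ne 0 then do;
    msg = sysmsg();            /* text of the failure, e.g. no permission */
    put "Delete failed: " msg;
  end;
run;

filename scratch clear;
```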

File Compression
Compression is an important and effective way to reduce the use of disk space. It carries a small CPU penalty but reduces I/O, so it is usually more efficient overall.
In SAS, use the system option or data set option COMPRESS=YES|CHAR|BINARY|NO:
COMPRESS=YES|CHAR for character-dominant data;
COMPRESS=BINARY for numeric-dominant data with many repeated values.
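A short sketch of both forms of the option follows. The data set names and the generated data are made up for illustration; the SAS log reports how much each data set shrank (or grew) after compression.

```sas
/* Sketch only: text_heavy and numeric_heavy are hypothetical data sets. */

/* System option: default for data sets created later in the session. */
options compress=yes;

/* Data set option: overrides the system option for one data set. */
data work.text_heavy(compress=char);
  length comment $200;
  do id = 1 to 10000;
    comment = "mostly blank padding in a long character variable";
    output;
  end;
run;

data work.numeric_heavy(compress=binary);
  array v{50} v1-v50;
  do id = 1 to 10000;
    do i = 1 to 50;
      v{i} = 0;               /* many repeated numeric values */
    end;
    output;
  end;
  drop i;
run;

/* Restore the uncompressed default. */
options compress=no;
```

Compression does not always help: very short records can end up larger, so it is worth checking the log note for each data set.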

PROC FORMAT: variable-level compression
Context: files with many long text variables that have a limited number of distinct values. PROC FORMAT can be used to recode long text values into shorter strings, reducing disk space.
Two formats are needed for each variable: long string → short string, and short string → long string (reports and applications typically need the long form).
Users need access to, and knowledge of, the formats.
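The two-format approach could be sketched as below. The department values, codes, format names, and data set names are all invented for illustration.

```sas
/* Sketch only: all names and values here are hypothetical. */

proc format;
  /* long string -> short code, applied when storing the data */
  value $dept2cd
    "Commercial Real Estate Lending" = "CRE"
    "Consumer Deposit Operations"    = "CDO"
    other                            = "UNK";

  /* short code -> long string, applied when reporting */
  value $cd2dept
    "CRE" = "Commercial Real Estate Lending"
    "CDO" = "Consumer Deposit Operations"
    "UNK" = "Unknown";
run;

/* Input with a long text variable. */
data work.raw;
  length department $40;
  department = "Commercial Real Estate Lending"; output;
  department = "Consumer Deposit Operations";    output;
run;

/* Store a 3-byte code instead of the 40-byte text. */
data work.compact;
  set work.raw;
  length dept_cd $3;
  dept_cd = put(department, $dept2cd.);
  drop department;
run;

/* Reports show the long form by attaching the reverse format. */
proc print data=work.compact;
  format dept_cd $cd2dept.;
run;
```

For the formats to be usable by others, they would need to be stored in a shared format catalog rather than in WORK.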

Logical deletion of rows
PROC SQL and FEDSQL support logical deletion of rows; so does the DATA step (MODIFY, UPDATE). When the number of physical rows is much greater than the number of logical rows, the difference is unused space.
Identify this situation by comparing the number of physical rows with the number of logical rows in a SAS data set (available from the dictionary tables).
Tables with a large amount of unused space should be regenerated: copied in a way that frees the unused space while preserving sorts, indexes, and constraints. The REUSE=YES option is relevant here.
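One way to check for this situation is to query DICTIONARY.TABLES, which holds physical (NOBS), logical (NLOBS), and deleted (DELOBS) row counts for each data set. The sketch below uses a made-up data set, and the simple DATA step rebuild at the end is an assumption chosen for brevity: it does not carry over indexes or integrity constraints, which would have to be recreated.

```sas
/* Sketch only: work.demo is a hypothetical data set. */

data work.demo;
  do id = 1 to 1000;
    output;
  end;
run;

proc sql;
  /* Rows are marked as deleted; the space they occupy is not released. */
  delete from work.demo
    where id > 100;

  /* Compare physical and logical row counts. */
  select memname, nobs, nlobs, delobs
    from dictionary.tables
    where libname = "WORK" and memname = "DEMO";
quit;

/* Rebuild the data set so that only the remaining rows are stored. */
data work.demo;
  set work.demo;
run;
```

Running the same query after the rebuild should show no deleted rows.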

Programming techniques
Replace multiple DATA/SORT steps with SQL, or vice versa.
Replace multiple small DATA steps with fewer DATA steps that are larger in scope, or with VIEWs.
Limit columns and rows with the KEEP=, DROP=, and WHERE= data set options and similar statements.
PROC DS2 can replace some DATA steps and can run in-database.
Use VIEWs plus a more normalized data structure: build large denormalized extracts only as needed for reporting.
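A few of these techniques are sketched below using the SASHELP.CARS sample data, which is assumed to be installed. The output names and the price threshold are arbitrary.

```sas
/* Sketch only: output names and the 60000 cutoff are arbitrary choices. */

/* KEEP= and WHERE= limit columns and rows as the data are read. */
data work.luxury;
  set sashelp.cars(keep=make model type msrp
                   where=(msrp > 60000));
run;

/* A DATA step view stores the program, not the rows, */
/* so it takes almost no disk space.                  */
data work.luxury_v / view=work.luxury_v;
  set sashelp.cars(keep=make model type msrp
                   where=(msrp > 60000));
run;

/* One SQL step in place of a DATA step followed by PROC SORT. */
proc sql;
  create table work.luxury_sorted as
    select make, model, msrp
      from sashelp.cars
      where msrp > 60000
      order by msrp desc;
quit;
```

The trade-off with views is that the work is redone each time the view is read, so they suit data that are read rarely or only once.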

Managing the SAS WORK library
WORK space is a finite, shared resource. Jobs that hang can lock up WORK space.
For jobs that sequentially create large files, insert code to delete intermediate files.
Use the USER= option to redirect files from the WORK library to a user-supplied libref. It applies to single-token file names: DATA=B, not DATA=A.B. (The option is not supported in CAS.)
Similar functionality is available via macro variables.
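The USER= redirection, intermediate-file cleanup, and the macro-variable alternative might be combined as in this sketch. The libref SCRATCH, the step names, and the macro variable TMPLIB are hypothetical; the libref points at the WORK directory here only so that the code runs as written.

```sas
/* Sketch only: replace the path with a directory that has enough free space. */
libname scratch "%sysfunc(pathname(work))";

/* One-level data set names now resolve to SCRATCH instead of WORK. */
options user=scratch;

data step1;                       /* stored as SCRATCH.STEP1 */
  set sashelp.cars;
run;

data step2;                       /* stored as SCRATCH.STEP2 */
  set step1(keep=make model msrp);
run;

/* Delete the intermediate file as soon as it is no longer needed. */
proc delete data=step1;
run;

/* Macro-variable alternative: the library is named in one place. */
%let tmplib = scratch;

data &tmplib..step3;
  set &tmplib..step2;
run;
```

The macro-variable form requires two-level names throughout the program, but it also works where USER= is not supported.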

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Monday tips
Simple, basic housekeeping: delete redundant SAS files and move files that are not in active use to backup. This is a best practice and promotes efficient use of disk space.
SAS file compression via use of DATA step and/or system options is a best practice and promotes efficient use of disk space.
Logical deletion of rows in a SAS data set without similar physical deletion can waste disk space. This situation can be detected by comparing the number of logical vs. physical rows in a file.