1
PDX SUG 2010
2
Efficiently Manipulating the Meta Data of a Dataset
Presented by: Sumner Williams
3
Introduction
Our efficiency tool will be PROC DATASETS.
4
CAPABILITIES
Obtain the contents of datasets
Transform variables
SAS library management
Manipulate dataset attributes
5
SYNTAX
PROC DATASETS supports RUN-group processing.
Some statements form implied RUN groups.
Use a RUN statement to process the preceding statements.
Use a QUIT statement to exit PROC DATASETS.
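A minimal sketch of RUN-group processing follows. It assumes write access to WORK and uses SASHELP.CLASS, a sample data set shipped with SAS that the slides do not use. Each RUN executes the statements since the previous RUN, and QUIT ends the procedure.

```sas
/* Sketch of RUN-group processing; assumes write access to WORK. */
proc datasets library=work nolist;
   /* RUN group 1: COPY with its SELECT statement */
   copy in=sashelp out=work;
      select class;
   run;

   /* RUN group 2: MODIFY with a LABEL statement */
   modify class;
      label age = 'Age in years';
   run;

   /* RUN group 3: print a short list of the variables */
   contents data=class short;
   run;
quit;   /* QUIT ends PROC DATASETS */
```

Without the QUIT, PROC DATASETS stays active until the next DATA or PROC step begins.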
6
DATASET CONTENTS
PROC DATASETS LIBRARY=SASHELP;
   CONTENTS DATA=ZIPCODE DETAILS SHORT CENTILES OUT=TEST OUT2=TEST2;
RUN;
QUIT;
The library is specified with the LIBRARY= option of the PROC DATASETS statement.
A request for contents is made on the ZIPCODE data set.
DETAILS|NODETAILS controls whether details are printed to the listing window.
SHORT prints abbreviated output (when used with DETAILS).
CENTILES prints the centiles for indexed variables.
OUT= gives the same contents data set as PROC CONTENTS.
OUT2= gives details on the integrity constraints and indexes.
7
DATASET CONTENTS
PROC DATASETS LIBRARY=SASHELP;
   CONTENTS DATA=ZIPCODE NODETAILS SHORT CENTILES OUT=WORK.TEST OUT2=WORK.TEST2;
RUN;
QUIT;
WORK.TEST and WORK.TEST2 explicitly specify the WORK library. Otherwise the CONTENTS output data sets are written to the library specified in the PROC DATASETS statement.
8
DATASET CONTENTS
The OUT= data set is similar to the PROC CONTENTS output data set.
The OUT2= option produces the index (and integrity constraint) data set.
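A short sketch of using the OUT= data set instead of the printed listing. It assumes the NOPRINT option of the CONTENTS statement to suppress the listing; the name WORK.ZIP_VARS is illustrative.

```sas
/* Capture the contents of SASHELP.ZIPCODE without printing them, */
/* then list a few columns of the OUT= data set.                   */
proc datasets library=sashelp nolist;
   contents data=zipcode out=work.zip_vars noprint;
run;
quit;

proc print data=work.zip_vars noobs;
   var name type length label;   /* TYPE: 1 = numeric, 2 = character */
run;
```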
9
DATASET COPYING
PROC DATASETS;
   COPY IN=SASHELP OUT=WORK;
      SELECT ZIPCODE;
   MODIFY ZIPCODE (PW=PW);
RUN;
QUIT;
COPY IN=SASHELP OUT=WORK copies from the SASHELP library to the WORK library.
If the SELECT statement is not used, the entire library is copied.
MODIFY ZIPCODE (PW=PW) adds a password to protect the data set.
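SELECT accepts a list of members, so several data sets can be copied in one step. A minimal sketch, assuming WORK does not already hold password-protected copies of these members; SASHELP.CLASS is an extra sample data set not used on the slides.

```sas
/* Copy only two members; MEMTYPE=DATA restricts the copy to data sets. */
proc datasets library=work nolist;
   copy in=sashelp out=work memtype=data;
      select class zipcode;
   run;
quit;
```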
10
DATASET PASSWORDS
PROC DATASETS LIBRARY=WORK;
   MODIFY ZIPCODE (PW=PW1);
RUN;
QUIT;
There are several levels of passwords on datasets, listed here by hierarchy:
PW= sets all of the passwords (READ, WRITE, and ALTER) at once.
ALTER= allows the user to alter the dataset and read it.
READ= allows the user to read the dataset.
PW= in a MODIFY statement sets a password when none is set yet.
PW= in a MODIFY statement also supplies the existing password so that the dataset can be modified.
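A sketch of what a READ password means for later steps. The passwords READPW and ALTERPW and the scratch data set WORK.ZIP_PW are hypothetical. In a batch session, leaving out READ= typically causes an error; an interactive session may prompt for the password instead.

```sas
/* Hypothetical passwords READPW and ALTERPW; ZIP_PW is a scratch copy. */
data work.zip_pw;
   set sashelp.zipcode;
run;

proc datasets library=work nolist;
   modify zip_pw (read=readpw alter=alterpw);
   run;
quit;

/* Reading the data set now requires the READ password. */
proc print data=work.zip_pw (read=readpw obs=5);
   var zip city statecode;
run;
```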
11
DATASET PASSWORDS
All three passwords are being set:
PROC DATASETS;
   COPY IN=SASHELP OUT=WORK;
      SELECT ZIPCODE;
   MODIFY ZIPCODE (PW=PW ALTER=ALTER READ=READ);
RUN;
QUIT;
Deleting the protected data set:
PROC DATASETS NOLIST;
   DELETE ZIPCODE (PW=PW);
   DELETE ZIPCODE (ALTER=ALTER);
RUN;
QUIT;
If the ALTER password is different from PW, then the ALTER password must be used to make changes to the data set.
12
DATASET PASSWORDS
Notice that the passwords are in plain text for the world to see.
SAS passwords on datasets are very insecure; they are built in to prevent inopportune data handling, not to secure the data.
proc pwencode in='PW' method=sas001;
run;
Passwords can be encoded using PROC PWENCODE.
This prevents casual observation of plain-text passwords.
It is still not secure, as any competent hacker could break the encoding.
See the SAS 9.2 Online Documentation for more on passwords.
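By default PROC PWENCODE writes the encoded string to the log. The sketch below instead uses its OUT= option to write it to a file and reads it back; the fileref PWFILE is a hypothetical name pointing at a temporary file.

```sas
/* PWFILE is a hypothetical fileref pointing at a temporary file. */
filename pwfile temp;

proc pwencode in='PW' out=pwfile method=sas001;
run;

/* Read the encoded string back and show it in the log. */
data _null_;
   infile pwfile truncover;
   input encoded $char100.;
   put 'Encoded password: ' encoded;
run;
```

As the slide notes, this encoding only hides the password from casual view; it is not encryption.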
13
DATASET PASSWORDS
PROC DATASETS LIBRARY=WORK;
   MODIFY ZIPCODE (PW=PW1/PW2);
RUN;
QUIT;
PW=PW1/PW2 changes the password from PW1 to PW2.
I was unable to figure out how to set an encoded password on a SAS dataset using PROC DATASETS.
14
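DATASET MOVING
PROC DATASETS NOLIST;
   COPY IN=SASHELP OUT=WORK MOVE;
      SELECT ZIPCODE;
   MODIFY ZIPCODE (PW=PASSWORD);
RUN;
QUIT;
COPY ... MOVE moves ZIPCODE from the SASHELP library to the WORK library. MOVE deletes the member from the input library after copying it, so the IN= library must be writable (SASHELP is normally read-only).
SELECT picks the ZIPCODE dataset (the entire library is moved otherwise).
MODIFY ZIPCODE (PW=PASSWORD) adds a new password to ZIPCODE.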
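Because MOVE removes the source member, a sketch that moves from a writable library may be more practical. The libref ARCHIVE and the path below are hypothetical placeholders; the directory must already exist.

```sas
/* ARCHIVE and its path are hypothetical; the directory must exist. */
libname archive '/path/to/archive';

/* Stage a writable copy in WORK first, because MOVE deletes the source. */
data work.zip_stage;
   set sashelp.zipcode;
run;

proc datasets library=work nolist;
   copy in=work out=archive move;
      select zip_stage;
   run;
quit;
```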
15
DATASET RENAME
PROC DATASETS;
   CHANGE ZIPCODE = ZIPCODE2 / ALTER=PASSWORD;
RUN;
QUIT;
CHANGE renames the dataset ZIPCODE to ZIPCODE2.
ALTER= gives the password set on the previous slide.
16
AGING DATASETS
proc datasets;
   copy in=sashelp out=work;
      select zipcode;
   age zipcode zip1-zip5;
run;
quit;
COPY and SELECT get a copy of the ZIPCODE dataset into the WORK library.
AGE ages the ZIPCODE dataset:
ZIP5 is deleted and replaced by ZIP4.
ZIP3 becomes ZIP4, ZIP2 becomes ZIP3, and so on.
ZIPCODE becomes ZIP1.
ZIPCODE no longer exists and must be regenerated.
17
AGING DATASETS
Use for temporary datasets spanning a time period.
Example: the invoices for the last several days. It would be faster to query this small subset than the entire database.
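A sketch of the invoice scenario as a daily cycle. WORK.INVOICES and INV1-INV3 are hypothetical names, and SASHELP.CLASS rows stand in for real invoices. On a first run INV1-INV3 do not exist yet, and SAS may log warnings for them.

```sas
/* Hypothetical WORK.INVOICES holds today's invoices; INV1-INV3 keep the */
/* previous three days. SASHELP.CLASS rows stand in for real invoices.   */
data work.invoices;
   set sashelp.class;
run;

proc datasets library=work nolist;
   /* INV3 is deleted, INV2 -> INV3, INV1 -> INV2, INVOICES -> INV1 */
   age invoices inv1-inv3;
   run;
quit;

/* INVOICES no longer exists, so the next day's file is created anew. */
data work.invoices;
   set sashelp.class (obs=5);
run;
```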
18
VARIABLE MANIPULATIONS
PROC DATASETS;
   MODIFY ZIPCODE (ALTER=PASSWORD);
      RENAME COUNTYNM = COUNTYNAME STATE = STATENUM;
      LABEL COUNTYNM = "COUNTY NAME" STATE = "STATE NUMBER";
      FORMAT Y 6.3 X 6.3;
RUN;
QUIT;
MODIFY opens ZIPCODE to be modified.
RENAME renames two variables.
LABEL puts labels on two different variables.
FORMAT assigns formats to two variables.
19
VARIABLE MANIPULATIONS
PROC DATASETS;
   ATTR ZIPCODE (ALTER=PASSWORD);
      RENAME COUNTYNM = COUNTYNAME STATE = STATENUM;
      LABEL COUNTYNM = "COUNTY NAME" STATE = "STATE NUMBER";
      FORMAT Y 6.3 X 6.3;
RUN;
QUIT;
ATTR(IBUTE) can be used instead of MODIFY.
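A related tool is the ATTRIB statement inside a MODIFY run group, which sets the format and label of several variables in one statement. A minimal sketch on an unprotected scratch copy (the name WORK.ZIP_ATTR is illustrative); it assumes X and Y hold longitude and latitude, as in SASHELP.ZIPCODE.

```sas
/* ZIP_ATTR is a scratch, unprotected copy of SASHELP.ZIPCODE. */
data work.zip_attr;
   set sashelp.zipcode;
run;

proc datasets library=work nolist;
   modify zip_attr;
      /* one ATTRIB statement sets format and label for both variables */
      attrib x format=10.6 label='Longitude'
             y format=10.6 label='Latitude';
   run;
quit;
```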
20
DATASET APPEND
PROC SQL;
   CREATE TABLE ZIPCODE3 LIKE ZIPCODE2 (PW=PASSWORD);
QUIT;
PROC DATASETS;
   APPEND BASE=ZIPCODE3 DATA=ZIPCODE2 (PW=PASSWORD) FORCE;
RUN;
QUIT;
CREATE TABLE ... LIKE copies all of the attributes of ZIPCODE2 into ZIPCODE3 (no rows are copied).
APPEND adds the data in ZIPCODE2 to ZIPCODE3.
FORCE forces the append even when the DATA= data set has variables that are not in the BASE= data set, have a different type, or are longer than in the BASE= data set.
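A sketch of a case where FORCE is actually needed: the base data set lacks some of the variables in DATA=. WORK.BASE is an illustrative name and SASHELP.CLASS is an extra sample data set. Without FORCE, SAS would refuse the append; with it, the extra variables are dropped.

```sas
/* BASE has only NAME and AGE; SASHELP.CLASS also has SEX, HEIGHT, WEIGHT. */
data work.base;
   set sashelp.class (keep=name age obs=0);   /* structure only, no rows */
run;

proc datasets library=work nolist;
   /* FORCE lets the append proceed; SEX, HEIGHT, WEIGHT are not added */
   append base=base data=sashelp.class force;
   run;
quit;
```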
21
DATASET COPY IN THE SAME LIBRARY
PROC DATASETS;
   APPEND BASE=NEW DATA=ZIPCODE (PW=PASSWORD);
RUN;
QUIT;
If NEW does not exist, APPEND simply creates it as a copy of ZIPCODE.
If NEW already exists, the rows are appended to it. Make sure it does not exist first.
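One way to make sure NEW does not exist is to delete it in an earlier RUN group of the same step. A minimal sketch using SASHELP.CLASS as the source; if NEW is not there, the DELETE only writes a warning to the log.

```sas
proc datasets library=work nolist;
   delete new;                            /* remove any old NEW first */
   run;
   append base=new data=sashelp.class;    /* NEW is created as a copy  */
   run;
quit;
```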
22
REPAIRING SAS DATASETS
PROC DATASETS LIB=SHW_TEST;
   REPAIR ZIPCODE4;
RUN;
QUIT;
REPAIR will attempt to repair the dataset.
How does a dataset get damaged?
Moving a dataset into an area without enough disk space
An I/O error while moving a dataset
A system failure during an update
A repair cannot be done on a temporary directory.
23
EXCHANGING SAS DATASETS
PROC DATASETS;
   EXCHANGE ZIPCODE2 = ZIPCODE3;
RUN;
QUIT;
EXCHANGE really just swaps the names of the two datasets; the data themselves are not changed.
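A tiny sketch that makes the name swap visible. The data sets WORK.A and WORK.B are illustrative, with one and two rows respectively.

```sas
data work.a;             /* one row */
   x = 1;
run;

data work.b;             /* two rows */
   do x = 1 to 2;
      output;
   end;
run;

proc datasets library=work nolist;
   exchange a = b;
   run;
quit;
/* WORK.A now holds the two rows that were in B; WORK.B holds the one row. */
```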
24
INDEXING SAS DATASETS
The data set will not need to be sorted for many procedures, which can operate on the index instead.
The index can be seen in the OUT2= data set from the CONTENTS statement.
An index should only be used if it can extract a small subset of the data.
25
DELETING INDEXES IN SAS DATASETS
PROC DATASETS;
   MODIFY ZIPCODE (ALTER=PASSWORD);
      INDEX DELETE ZIP;
      INDEX DELETE _ALL_;
RUN;
QUIT;
INDEX DELETE ZIP deletes one index.
INDEX DELETE _ALL_ deletes all indexes.
26
CREATING INDEXES IN SAS DATASETS
PROC DATASETS;
   MODIFY ZIPCODE3 (ALTER=PASSWORD);
      INDEX CREATE ZIP;
      INDEX CREATE STATECITY = (STATECODE CITY);
RUN;
QUIT;
INDEX CREATE ZIP creates a simple index based on one variable.
INDEX CREATE STATECITY = (STATECODE CITY) creates one composite index named STATECITY, based on two variables.
To see which indexes exist, use OUT2= on the CONTENTS statement.
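A sketch that creates the same two indexes on an unprotected scratch copy (WORK.ZIP_IDX is an illustrative name) and then runs a WHERE selection. With OPTIONS MSGLEVEL=I, the log should report whether an index was selected for the WHERE clause.

```sas
/* ZIP_IDX is a scratch, unprotected copy of SASHELP.ZIPCODE. */
data work.zip_idx;
   set sashelp.zipcode;
run;

proc datasets library=work nolist;
   modify zip_idx;
      index create zip;                          /* simple index    */
      index create statecity = (statecode city); /* composite index */
   run;
quit;

options msglevel=i;   /* log notes about index usage */

data work.one_zip;
   set work.zip_idx;
   where zip = 97204;  /* small subset: a good case for an index */
run;
```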
27
INTEGRITY CONSTRAINTS FOR SAS DATASETS
ICs prevent data from being out of range (negative age, pregnant men, government surplus, etc.).
ICs are used with MODIFY run groups.
ICs can be simple or complex.
28
DELETING INTEGRITY CONSTRAINTS FOR SAS DATASETS
PROC DATASETS;
   MODIFY ZIPS;
      IC DELETE _ALL_;
RUN;
QUIT;
A MODIFY statement is required to alter ICs.
IC DELETE _ALL_ deletes all ICs.
29
IC Creation Introduced
The integrity constraint cannot already exist; otherwise SAS reports an error such as:
ERROR: An integrity constraint named NNCITY with the same definition already exists for file WORK.ZIPCODE6.DATA.
The integrity constraint will receive an automatic name if left unnamed:
Default name    Constraint type
_NMxxxx_        Not null
_UNxxxx_        Unique
_CKxxxx_        Check
_PKxxxx_        Primary key
_FKxxxx_        Foreign key
30
CREATING INTEGRITY CONSTRAINTS FOR SAS DATASETS
PROC DATASETS;
   MODIFY ZIPCODE5;
      IC CREATE NNCITY = NOT NULL(CITY);
      IC CREATE UNZIP = UNIQUE(ZIP);
RUN;
QUIT;
This creates two integrity constraints:
NNCITY requires the variable CITY to not have a null value.
UNZIP requires all ZIP values to be unique. DISTINCT could be used in place of UNIQUE.
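A sketch of the NOT NULL constraint rejecting a bad row. WORK.IC_TEST and WORK.BADROW are illustrative names; the bad row is built from IC_TEST itself so the variable types and lengths match. PROC APPEND should reject the row and report it in the log, leaving 5 rows.

```sas
/* IC_TEST: five rows of ZIP and CITY from SASHELP.ZIPCODE. */
data work.ic_test;
   set sashelp.zipcode (keep=zip city obs=5);
run;

proc datasets library=work nolist;
   modify ic_test;
      ic create nncity = not null(city);
   run;
quit;

/* A row with a missing CITY, which violates NNCITY. */
data work.badrow;
   set work.ic_test (obs=1);
   city = ' ';
run;

proc append base=work.ic_test data=work.badrow;
run;
```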
31
CREATING INTEGRITY CONSTRAINTS FOR SAS DATASETS
PROC DATASETS;
   MODIFY ZIPCODE5;
      IC CREATE CKSTATECODE = CHECK(WHERE=(STATENUM > 0));
      IC CREATE PKZIP = PRIMARY KEY(ZIP);
RUN;
QUIT;
CKSTATECODE is a check integrity constraint: values must meet the criteria set by the WHERE clause.
PKZIP creates a primary key for the dataset.
Note that the primary key and UNIQUE were used on the same variable (ZIP). The primary key has to be unique, so doing both is not required.
32
FOREIGN KEY INTEGRITY CONSTRAINTS FOR SAS DATASETS
Build a dataset that has a primary key:
data zips;
   set zipcode5 (keep=zip rename=(zip=newname));
run;
PROC DATASETS;
   MODIFY ZIPS;
      IC CREATE PK = PRIMARY KEY(NEWNAME);
RUN;
QUIT;
This is the same as what was done on the previous slide.
33
FOREIGN KEY INTEGRITY CONSTRAINTS FOR SAS DATASETS
PROC DATASETS;
   MODIFY ZIPCODE6;
      IC CREATE FKLOC = FOREIGN KEY (ZIP2) REFERENCES ZIPS
         ON UPDATE CASCADE ON DELETE RESTRICT;
RUN;
QUIT;
The variable ZIP2 in the ZIPCODE6 dataset is the foreign key.
It references the primary key (NEWNAME) of the just-built ZIPS data set.
The ZIPS dataset must have a primary key already established.
34
FOREIGN KEY INTEGRITY CONSTRAINTS FOR SAS DATASETS
In the FKLOC constraint on ZIPCODE6:
ON UPDATE CASCADE: when a primary key value in ZIPS is updated, the matching ZIP2 values in ZIPCODE6 are updated as well.
ON DELETE RESTRICT prevents deleting a row from ZIPS while ZIPCODE6 still has matching ZIP2 values.
35
FOREIGN KEY INTEGRITY CONSTRAINTS FOR SAS DATASETS
The ON UPDATE and ON DELETE clauses are not required.
The dataset being referenced (ZIPS) cannot be deleted or replaced while the foreign key exists.
This is a very complex topic (or at least SAS's implementation of it).
The examples in the SAS 9.2 Online Documentation are not well laid out.
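A self-contained sketch of the foreign key in action. It assumes ZIP values in SASHELP.ZIPCODE are unique (as the UNIQUE constraint on the earlier slides implies). ZIPCODE6 here is a hypothetical child built so that ZIP2 has the same type and length as NEWNAME. A child row whose ZIP2 has no match in ZIPS should be rejected by PROC APPEND.

```sas
/* Parent: one row per ZIP code, renamed to NEWNAME as on the slides. */
data work.zips;
   set sashelp.zipcode (keep=zip rename=(zip=newname));
run;

proc datasets library=work nolist;
   modify zips;
      ic create pk = primary key(newname);
   run;
quit;

/* Hypothetical child: ZIP2 comes from ZIP, so type and length match. */
data work.zipcode6;
   set sashelp.zipcode (keep=zip city obs=10 rename=(zip=zip2));
run;

proc datasets library=work nolist;
   modify zipcode6;
      ic create fkloc = foreign key(zip2) references zips
         on update cascade on delete restrict;
   run;
quit;

/* A child row whose ZIP2 has no match in ZIPS. */
data work.badchild;
   set work.zipcode6 (obs=1);
   zip2 = 0;
run;

proc append base=work.zipcode6 data=work.badchild;
run;
```

While FKLOC exists, re-running the DATA step that recreates ZIPS should fail; remove the constraint first with IC DELETE FKLOC in a MODIFY ZIPCODE6 run group.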
36
RECAP OF TOPICS
PROC DATASETS is very powerful and has many uses for manipulating SAS datasets.
It uses RUN-group processing.
Some of its functionality includes:
COPY libraries/data sets/variables/ICs/indexes
DELETE libraries/data sets/variables/ICs/indexes
EXCHANGE data sets
CREATE data sets/ICs/indexes
Password handling
37
MORE CAN BE FOUND
PROC DATASETS can do much more. More can be found at:
SAS 9.2 Online Documentation
PROC DATASETS; The Swiss Army Knife of SAS© Procedures
Cody's Data Cleaning Techniques Using SAS©
Plus many of the SUGI/SGF papers that have been written over the years
38
REFERENCES
SAS 9.2 Online Documentation for PROC DATASETS.
Raithel, M. A.: PROC DATASETS; The Swiss Army Knife of SAS© Procedures. Paper, SAS Global Forum 2010.
Cody, R.: Cody's Data Cleaning Techniques Using SAS©. SAS Publishing, 2008.
39
CONTACT INFORMATION
Sumner Williams
315 SW 5th Ave
Portland, OR 97204