Download presentation
Presentation is loading. Please wait.
Published byErin Perry Modified over 9 years ago
1
The New SAS Engine For CRSP Benefits of using SAS engine for CRSP Setting up the interface between SAS and CRSP Examples of use Performance Benchmarks
2
Benefits Access native CRSPAccess Databases. Most SAS Dataset options work without modification (including keep, rename, and drop). Native SAS datasets and SASECRSP dataset can be used in the same statements. Procedures can directly access SASECRSP datasets. Supports random access on PERMNO. Supports global & PERMNO specific date restrictions. Supports functions and formats specific to CRSP trading calendars. Often requires no intermediate files. Saving coding time, CPU time, I/O, & Disk space.
3
Setting up the Interface After CRSP Stock Data and SAS/ETS TM 8.2 are installed independently. Two notes on installation - 1. 1. CRSPDB_SASCAL environment variable must be set to you latest CRSP data. 2. 2. SASECRSP is currently only available for Windows and HP/UX. After that, the interface can be accessed using simple extensions to the SAS libname statement. For instance, To access CRSP daily or monthly sample database respectively libname dstk sasecrsp ‘c:\dasample\' setid=10; libname mstk sasecrsp ‘c:\masample\' setid=20; The data files being accessed are the same ones that are accessed by FORTRAN, ‘C’, ts_print, or stk_print. Therefore disk space is only needed for one installation regardless of how users access the data. This saves not only disk space, but labor each time new data is shipped.
4
Printing Headers Example 1: Use of keep statement, and direct use of the proc print procedure proc print data = mstk.stkhead (keep = permno hcomnam begdt enddt); Obs PERMNO BEGDT ENDDT HCOMNAM 1 10107 19860331 20010731 MICROSOFT CORP 2 10823 19251231 19980529 KEYSPAN ENERGY CORP 3 12079 19251231 20010731 GENERAL MOTORS CORP 4 12490 19251231 20010731 INTERNATIONAL BUSINESS MACHS COR 5 14218 19251231 19890428 R J R NABISCO INC 6 14322 19251231 20010731 SEARS ROEBUCK & CO 7 15069 19251231 20010731 U S X MARATHON GROUP 8 18948 19300430 19961231 ROCKWELL INTERNATIONAL CORP 9 24360 19500731 20010731 KEYSPAN CORP 10 28353 19620131 20010731 RALSTON PURINA GROUP 11 52070 19710331 19841031 ELECTRONIC DATA SYS CORP 12 66931 19841031 19960628 GENERAL MOTORS CORP 13 68451 19851231 20010731 GENERAL MOTORS CORP 14 75228 19850830 20010731 IVAX CORP 15 75390 19881031 19990831 DEVON ENERGY CORP 16 76597 19910328 20001229 NABISCO GROUP HOLDING CORP 17 78946 19930226 19970530 DEAN WITTER DISCOVER & CO 18 79323 19930630 20010731 ALLSTATE CORP 19 79348 19930831 19950531 RALSTON CONTINENTAL BAKING GROUP 20 81191 19950131 20001229 NABISCO HOLDINGS CORP 21 83596 19960628 20010731 ELECTRONIC DATA SYSTEMS CORP 22 84381 19961231 20010731 ROCKWELL INTERNATIONAL CORP NEW 23 85349 19971031 20010731 ARVINMERITOR INC 24 86496 19990129 20010731 CONEXANT SYSTEMS INC 25 87137 19990831 20010731 DEVON ENERGY CORP NEW
5
Average Returns Example 2: Demonstrates date restriction on a libname, direct use of proc means, and output of native SAS datasets It is easy to restrict data to a specified date range before passing it to SAS for calculation. To Calculate Average monthly returns for all the securities in the sample database for the period of January, 1996 – December, 2000, use the following commands. libname mth5yr sasecrsp 'c:\masample\' range='19960131-20001229' setid=20; proc means data=mth5yr.ret noprint; var ret; by permno; output mean=avgret out=avgret; run;
6
Works with Proc SQL Example 3: Demonstrates SASECRSP and native SAS dataset in the same statement. /* Join with the company names and restrict to securities with exactly five years (60 months) of non-missing returns, and sort output by the avgret. */ proc sql; select a.permno, a.hcomnam, b.avgret from mth5yr.stkhead a, avgret b where a.permno=b.permno order and b._FREQ_ = 60 by avgret; run; PERMNO Company Name Header Returns __________________________________________________________________________ 14322 SEARS ROEBUCK & CO 0.004601 12079GENERAL MOTORS CORP 0.010279 28353RALSTON PURINA GROUP 0.013447 15069U S X MARATHON GROUP 0.013855 79323ALLSTATE CORP 0.018369 68451GENERAL MOTORS CORP 0.022518 24360KEYSPAN CORP 0.024243 75228IVAX CORP 0.026936 12490INTERNATIONAL BUSINESS MACHS COR 0.027325 10107MICROSOFT CORP 0.031976
7
Performance Metrics Example 4: The SASECRSP engines performs at near CRSPAccess ‘C’ native speed. This was run on the full December 2000 daily database on a local drive of Pentium III 733 MHz running Windows 2000. This example reads over 50 million return observations, and calculates the minimum, maximum, mean, and standard deviation in approximately 2 ½ minutes. libname dlyall sasecrsp 'c:\da200012\' setid=10; proc means data=dlyall.ret; var ret; run; N Mean Std Dev Minimum Maximum ___________________________________________________________________ 51398971 0.000845599 0.0442346 -0.9716981 12.7777777 ___________________________________________________________________ (Reads 205 MB and over 51 million observations in 2 minutes and 34 seconds) WARNING: Defaulting to selecting all PERMNOs in CRSP database. NOTE: There were 52365201 observations read from the dataset DLYALL.RET. NOTE: PROCEDURE MEANS used: real time 2:34.78 cpu time 2:31.12
8
Performance Metrics Example 5: This example demonstrates that random access by PERMNO is very efficient. Accessing the same underlying 51,000,000 observation dataset as the previous example, accessing just a single permno, takes just a faction of a second. libname dlyibm sasecrsp 'c:\da200012\' setid=10 permno=12490; proc means data=dlyibm.ret; var ret; run; N Mean Std Dev Minimum Maximum __________________________________________________________________ 9691 0.000533690 0.0159335 -0.2296296 0.1316364 __________________________________________________________________ (Reads one security (IBM) and the 9,695 observations in 0.07 seconds) NOTE: There were 9695 observations read from the dataset DLYIBM.RET. NOTE: PROCEDURE MEANS used: real time 0.07 seconds cpu time 0.04 seconds
9
Performance Metrics Example 6: This example demonstrates how a full pass of the header data can be made to restrict the data and then used to create an input list with date ranges. This input list is then used to restrict access to the 51,000,000 observation dataset from the first performance example, and it access just the securities and dates requested. Creating the list of securities, takes less than two seconds. The restricted list takes a little more than ½ minute. /* This example has two parts. First it does a subset to just companies that delist due to mergers and creates a dataset of PERMNO, delist date, and 200 trading days before the delist date. The second part on the next screen, then then subsets based on that universe of perms */ data mrgdt (keep=permno date1 date2); set dlyall.stkhead; where dlstcd >= 200 and dlstcd <= 299; crspenddt = crspdicd(enddt,0); /* convert enddt to a CRSP calendar date */ date1 = crspdcid(crspenddt - 200); /* calculate 200 trading days prior to enddt and store in YYYYMMDD format */ if date1 ne. and date1 >= begdt; /* subsetting if - ignores companies that do not have 200 trading days of CRSP data available. */ date2 = enddt; run; NOTE: There were 7790 observations read from the dataset DLYALL.STKHEAD. WHERE ((dlstcd>=200 and dlstcd<=299)); NOTE: The dataset WORK.MRGDT has 7640 observations and 3 variables. NOTE: DATA statement used: real time 1.21 seconds cpu time 0.42 seconds
10
libname dlymrg sasecrsp 'c:\da200012\' setid=10 inset=‘mrgdt,PERMNO,PERMNO,date1,date2’; proc means data=dlymrg.ret; var ret; run; N Mean Std Dev Minimum Maximum ______________________________________________________ 1532194 0.0022860 0.0400804 -0.8750000 2.5000000 ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ NOTE: There were 1535640 observations read from the dataset DLYMRG.RET. NOTE: PROCEDURE MEANS used: real time 34.18 seconds cpu time 11.19 seconds Performance Metrics Example 6 – Continued A new libname needs to be created using the mrgdt dataset created in the previous step. Any datasets accessed using the dlymrg libname will automatically be restricted using random access to PERMNOs in the mrgdt dataset, and then after being read, but before being processed by SAS, restricted to the security specific date range defined by date1 and date2.
11
CRSP User Support Phone: 773.834.1025 9:30 A.M. – 2:00 P.M. Central Time Monday-Friday, Business Days EmailEmail: support@crsp.uchiccago.edu User Support Web Page FormUser Support Web Page Form: http://www.crsp.uchicago.edu/support/tech_hlp.html
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.