Time Series Data Processes by Tai Yu April 15, 2013
61 %macro get_data(mn_yr0,mn_yr1,mn_yr2,mn_yr3,mn_yr4,mn_yr5, mn_yr6); data asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1.; 64 set wfpf_delinq_data; if daysdelq<1 THEN DPD=0; else if daysdelq>180 THEN DPD=180; else DPD=daysdelq; if "28&mn_yr1.:00:00:00"dt<=ASOF_DT<"01&mn_yr0.:00:00:00"dt then do;dpd_1=dpd; output asof_&mn_yr1.; end; 69 else if "28&mn_yr2.:00:00:00"dt<=ASOF_DT<"01&mn_yr1.:00:00:00"dt then do; dpd_2=dpd; output asof_&mn_yr2.; end; 70 else if "28&mn_yr3.:00:00:00"dt<=ASOF_DT<"01&mn_yr2.:00:00:00"dt then do; dpd_3=dpd; output asof_&mn_yr3.; end; 71 else if "28&mn_yr4.:00:00:00"dt<=ASOF_DT<"01&mn_yr3.:00:00:00"dt then do; dpd_4=dpd; output asof_&mn_yr4.; end; 72 else if "28&mn_yr5.:00:00:00"dt<=ASOF_DT<"01&mn_yr4:00:00:00"dt then do; dpd_5=dpd; output asof_&mn_yr5.; end; 73 else if "28&mn_yr6.:00:00:00"dt<=ASOF_DT<"01&mn_yr5.:00:00:00"dt then do; dpd_6=dpd; output asof_&mn_yr6.; end; run; WHAT ARE YOU DOING??? 2
84 data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_6; run; 85 data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_6; run; 86 data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_6; run; 87 data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_6; run; 88 data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6 ; run; 89 data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5 ; run; data Cohort_12mn_dpd_&mn_yr1.; 92 merge _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. _asof_&mn_yr2. _asof_&mn_yr1. (in=a); 93 by acct_id; 94 if a; 95 run; %mend; get_data(Sep2006, Aug2006,Jul2006,Jun2006,May2006,Apr2006, Mar2006,); WHAT ARE YOU DOING??? 3
61 %macro get_data(mn_yr0,mn_yr1,mn_yr2,mn_yr3,mn_yr4,mn_yr5, mn_yr6, 62 mn_yr6, mn_yr7,mn_yr8,mn_yr9,mn_yr10,mn_yr11, mn_yr12); 63 data asof_&mn_yr12. asof_&mn_yr11. asof_&mn_yr10. asof_&mn_yr9. asof_&mn_yr8. asof_&mn_yr7. 64 asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1. ; 65 set wfpf_delinq_data; if daysdelq 180 THEN DPD=180; else DPD=daysdelq; if "28&mn_yr1.:00:00:00"dt<=ASOF_DT<"01& asof_&mn_yr0. :00:00:00"dt then do;dpd_1=dpd; output asof_&mn_yr1.; end; 70 else if "28&mn_yr2.:00:00:00"dt<=ASOF_DT<"01&mn_yr1.:00:00:00"dt then do; dpd_2=dpd; output asof_&mn_yr2.; end; 71 else if "28&mn_yr3.:00:00:00"dt<=ASOF_DT<"01&mn_yr2.:00:00:00"dt then do; dpd_3=dpd; output asof_&mn_yr3.; end; 72 else if "28&mn_yr4.:00:00:00"dt<=ASOF_DT<"01&mn_yr3.:00:00:00"dt then do; dpd_4=dpd; output asof_&mn_yr4.; end; 73 else if "28&mn_yr5.:00:00:00"dt<=ASOF_DT<"01&mn_yr4:00:00:00"dt then do; dpd_5=dpd; output asof_&mn_yr5.; end; 74 else if "28&mn_yr6.:00:00:00"dt<=ASOF_DT<"01&mn_yr5.:00:00:00"dt then do; dpd_6=dpd; output asof_&mn_yr6.; end; 75 else if "28&mn_yr7.:00:00:00"dt<=ASOF_DT<"01&mn_yr6.:00:00:00"dt then do; dpd_7=dpd; output asof_&mn_yr7.; end; 76 else if "28&mn_yr8.:00:00:00"dt<=ASOF_DT<"01&mn_yr7.:00:00:00"dt then do; dpd_8=dpd; output asof_&mn_yr8.; end; 77 else if "28&mn_yr9.:00:00:00"dt<=ASOF_DT<"01&mn_yr8.:00:00:00"dt then do; dpd_9=dpd; output asof_&mn_yr9.; end; 78 else if "28&mn_yr10.:00:00:00"dt<=ASOF_DT<"01&mn_yr9.:00:00:00"dt then do; dpd_10=dpd; output asof_&mn_yr10.; end; 79 else if "28&mn_yr11.:00:00:00"dt<=ASOF_DT<"01&mn_yr10.:00:00:00"dt then do;dpd_11=dpd; output asof_&mn_yr11.;end; 80 else if "28&mn_yr12.:00:00:00"dt<=ASOF_DT<"01&mn_yr11.:00:00:00"dt then do;dpd_12=dpd; output asof_&mn_yr12.; end; 81 run; WHAT ARE YOU DOING??? 4
82 83 data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_12; run; 84 data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_12; run; 85 data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_12; run; 86 data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_12; run; 87 data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6--dpd_12; run; 88 data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5 dpd_7--dpd_12; run; 89 data _asof_&mn_yr7.; set asof_&mn_yr7.; drop dpd_1--dpd_6 dpd_8--dpd_12; run; 90 data _asof_&mn_yr8.; set asof_&mn_yr8.; drop dpd_1--dpd_7 dpd_9--dpd_12; run; 91 data _asof_&mn_yr9.; set asof_&mn_yr9.; drop dpd_1--dpd_8 dpd_10--dpd_12; run; 92 data _asof_&mn_yr10.; set asof_&mn_yr10.; drop dpd_1--dpd_9 dpd_11--dpd_12; run; 93 data _asof_&mn_yr11.; set asof_&mn_yr11.; drop dpd_1--dpd_10 dpd_12; run; 94 data _asof_&mn_yr12.; set asof_&mn_yr12.; drop dpd_1--dpd_11 ; run; data Cohort_12mn_dpd_&mn_yr1.; 97 merge _asof_&mn_yr12. _asof_&mn_yr11. _asof_&mn_yr10. _asof_&mn_yr9. _asof_&mn_yr8. 98 _asof_&mn_yr7. _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. _asof_&mn_yr2. 99 _asof_&mn_yr1.(in=a); 100 by acct_id; 101 if a; 102 run; %mend; get_data(Sep2006, Aug2006,Jul2006,Jun2006,May2006,Apr2006, Mar2006, Feb2006,Jan2006,Dec2005,Nov2005,Oct2005,Sep2005); WHAT ARE YOU DOING??? 5
What is Time Series Data? Definition of Time Series: A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. (by Australian Bureau of Statistics) An ordered sequence of values of a variable at equally spaced time intervals. (by Engineering Statistics Handbook) 6
What is Time Series Data? For example, the monthly delinquent status of a customer over 12-month period 7
Stages of Time Series Analysis 1.Analyze data to obtain an understanding of the underlying drivers that produced the collected data. 2.Develop model(s) to forecast possible outcomes through the collected data. 3.Compare monitoring results with predicted outcomes to make appropriate control process modification(s). 8
Applications of Time Series Data Stock Market Inventory WorkloadSales 9
A Typical Time Series Data Process Transpose 12 monthly delinquent status observations of account to a single account observation with 12-month delinquent status 10
Time Series Data by SAS Procedure 001%macro DLQ_12_Month(perf_obs_date); data perf_12_month; 004 set acct_basel_dpd; format perf_obs_dt date9.; perf_obs_dt = intnx("MONTH","&perf_obs_date."d,0,'END'); dlq_status_month = intck("MONTH", datepart(asof_dt)), perf_obs_dt ) + 1; if 1 <= dlq_status_month <= 12; 013 run; proc sort data = perf_12_month; 016 by acct_id perf_obs_dt dlq_status_month; 017 run;
Time Series Data by SAS Procedure After executing line 003 to line 018, the data set is now with two additional variables: Performance Observed Date (perf_obs_dt) Delinquent Status Month (dlq_status_month) 12
Time Series Data by SAS Procedure proc transpose data = perf_12_month 020 out = Cohort_12mn_dpd_&perf_obs_date 021 (drop = _name_ where=(dpd1 ^=.)) 022 prefix = dpd 023 ; 024 by acct_id perf_obs_dt; id dlq_status_month; var basel_dpd; 029 run; %mend; %DLQ_12_Month(01AUG2006); PROC TRANSPOSE: 1.Transposes the variable basel_dpd by acct_id perf_obs_dt 2.Creates new variables dpd_1 to dpd_12 by PREFIX and ID options 13
Time Series Data by SAS Procedure After executing line 019 to line 029, the new data set is a single account observation with 12-month delinquent status 14
Time Series Data by SAS Data Step 019 data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id perf_obs_dt dpd1 - dpd12); 020 set perf_12_month; 021 by acct_id perf_obs_dt dlq_status_month; 022 array dpd[12] dpd1 - dpd12; 023 retain dpd1 - dpd12 j 0; if first.acct_id and first.perf_obs_dt 026 then do; 027 do i = 1 to 12; 028 dpd[i] = 0; 029 end; 030 j = 0; 031 end; j = j + 1; dpd[j] = basel_dpd; if last.acct_id and last.perf_obs_dt ; 038 run; 039 %mend; %DLQ_12_Month(01AUG2006); 15
Time Series Data by SAS Data Step SAS Data Step: 1.Declares ARRAY DPD to create new variable DPD_1 to DPD_12 2.Declares RETAIN to pass the values of variable DPD_1 to DPD_12 from one observation to the next observation 3.Initiates the values of variable DPD_1 to DPD_12 to 0s when the first account id and the first observation date are detected and neutralizes index J 4.Assigns the values of variable DPD_1 to DPD_12 by index J 5.Outputs the values of variable DPD_1 to DPD_12 to new data set only when the last account id and the last observation date is detected 16
Time Series Data by SAS Function 019 data Cohort_12mn_dpd_&perf_obs_date 020 (keep = acct_id perf_obs_dt dpd1 - dpd12); 021 set perf_12_month; 022 by acct_id perf_obs_dt dlq_status_month; 023 array dpd[12] dpd1 - dpd12;; %do i = 1 %to 12; 026 %let j = %eval(12 - &i); 027 dpd[&i] = lag&j(basel_dpd); 028 %end; if dlq_status_month = 12; 031 run; %mend; %DLQ_12_Month(01AUG2006); 17
Time Series Data by SAS Function SAS LAG Function: 1.Stores a value in a queue and returns a value stored previously in that queue. 2.Each occurrence of a LAGn function in a program generates its own queue of values. 3.When an occurrence of LAGn is executed, the value at the top of its queue is removed and returned, the remaining values are shifted upwards, and the new value of the argument is placed at the bottom of the queue. observation of the prior execution. 18 LAG0 LAG1 LAG2 LAG3 LAG4 LAG5 LAG6 LAG7 LAG8 LAG9 LAG10 LAG11
Time Series Data by SAS Function 019 data Cohort_12mn_dpd_&perf_obs_date 020 (keep = acct_id perf_obs_dt dpd1 - dpd12); 021 set perf_12_month; 022 by acct_id perf_obs_dt dlq_status_month; 023 array dpd[12] dpd1 - dpd12;; if dlq_status_month = 12; %do i = 1 %to 12; 028 %let j = %eval(12 - &i); 029 dpd[&i] = lag&j(basel_dpd); 030 %end; * if dlq_status_month = 12; 033 run; %mend; %DLQ_12_Month(01AUG2006); 19
Time Series Data by SAS Function SAS LAG Function: 1.The queue for each occurrence of LAGn is initialized with n missing values. 2.Missing values are returned for the first n executions of each occurrence of LAGn, after which the lagged values of the argument begin to appear. 3.Stores values at the bottom of the queue and returns values from the top of the queue occurs only when the function is executed. An occurrence of the LAGn function that is executed conditionally will store and return values only from the observations for which the condition is satisfied. 20
Time Series Data by SAS Function Special Case: When not all time series is populated. The sub-setting IF statement (if dlq_status_month = 12;) will never be satisfied. SAS returns no observation to the output dataset. 21
Time Series Data by SAS Function 019 proc sort data = perf_12_month; 020 by acct_id load_dt descending dlq_status_month; 021 run; data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id load_dt dpd1 - dpd12); 024 set perf_12_month; 025 by acct_id load_dt descending dlq_status_month; array dpd[12] dpd1 - dpd12;; %do i = 1 %to 12; 030 %let j = %eval(&i - 1); 031 dpd[&i] = lag&j(basel_dpd); 032 %end; if dlq_status_month = 1; 035 run; 036 %mend; %DLQ_12_Month(01AUG2006); 22
Time Series Data by SAS Function Special Case: When not all time series is populated. By 1.Sorting the variable dlq_status_month in descending order 2.Conditioning the sub-setting IF statement is “True” when the beginning of the time series date period (if dlq_status_month = 1) is reached. 23
Weakness and Strength of Each Approach Approach Strength Weakness =================================================== PROC TRANSPOSE DATA STEP LAG FUNCTION 24 Easy CodingLimited Variable Flexible Manipulating Initialization Self Explanatory Conditional Execution