Time Series Data Processes by Tai Yu April 15, 2013.

```sas
%macro get_data(mn_yr0, mn_yr1, mn_yr2, mn_yr3, mn_yr4, mn_yr5, mn_yr6);
  /* One output data set per month, mn_yr1 (most recent) to mn_yr6 */
  data asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1.;
    set wfpf_delinq_data;
    /* Floor days past due at 0 and cap it at 180 */
    if daysdelq < 1 then DPD = 0;
    else if daysdelq > 180 then DPD = 180;
    else DPD = daysdelq;
    /* Route each record by its as-of date (28th of one month up to the 1st of the next) */
    if "28&mn_yr1.:00:00:00"dt <= ASOF_DT < "01&mn_yr0.:00:00:00"dt then do; dpd_1 = dpd; output asof_&mn_yr1.; end;
    else if "28&mn_yr2.:00:00:00"dt <= ASOF_DT < "01&mn_yr1.:00:00:00"dt then do; dpd_2 = dpd; output asof_&mn_yr2.; end;
    else if "28&mn_yr3.:00:00:00"dt <= ASOF_DT < "01&mn_yr2.:00:00:00"dt then do; dpd_3 = dpd; output asof_&mn_yr3.; end;
    else if "28&mn_yr4.:00:00:00"dt <= ASOF_DT < "01&mn_yr3.:00:00:00"dt then do; dpd_4 = dpd; output asof_&mn_yr4.; end;
    else if "28&mn_yr5.:00:00:00"dt <= ASOF_DT < "01&mn_yr4.:00:00:00"dt then do; dpd_5 = dpd; output asof_&mn_yr5.; end;
    else if "28&mn_yr6.:00:00:00"dt <= ASOF_DT < "01&mn_yr5.:00:00:00"dt then do; dpd_6 = dpd; output asof_&mn_yr6.; end;
  run;
```

WHAT ARE YOU DOING???

```sas
  /* Keep only the month's own DPD column in each data set */
  data _asof_&mn_yr1.; set asof_&mn_yr1.; drop dpd_2--dpd_6; run;
  data _asof_&mn_yr2.; set asof_&mn_yr2.; drop dpd_1 dpd_3--dpd_6; run;
  data _asof_&mn_yr3.; set asof_&mn_yr3.; drop dpd_1--dpd_2 dpd_4--dpd_6; run;
  data _asof_&mn_yr4.; set asof_&mn_yr4.; drop dpd_1--dpd_3 dpd_5--dpd_6; run;
  data _asof_&mn_yr5.; set asof_&mn_yr5.; drop dpd_1--dpd_4 dpd_6; run;
  data _asof_&mn_yr6.; set asof_&mn_yr6.; drop dpd_1--dpd_5; run;

  /* Combine the months side by side, one row per account
     (MERGE with BY requires each data set to be sorted by acct_id) */
  data Cohort_12mn_dpd_&mn_yr1.;
    merge _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. _asof_&mn_yr2. _asof_&mn_yr1. (in = a);
    by acct_id;
    if a;
  run;
%mend;

%get_data(Sep2006, Aug2006, Jul2006, Jun2006, May2006, Apr2006, Mar2006);
```

WHAT ARE YOU DOING???

```sas
%macro get_data(mn_yr0, mn_yr1, mn_yr2, mn_yr3, mn_yr4, mn_yr5, mn_yr6,
                mn_yr7, mn_yr8, mn_yr9, mn_yr10, mn_yr11, mn_yr12);
  /* The 12-month version: twelve output data sets and twelve date windows */
  data asof_&mn_yr12. asof_&mn_yr11. asof_&mn_yr10. asof_&mn_yr9. asof_&mn_yr8. asof_&mn_yr7.
       asof_&mn_yr6. asof_&mn_yr5. asof_&mn_yr4. asof_&mn_yr3. asof_&mn_yr2. asof_&mn_yr1.;
    set wfpf_delinq_data;
    if daysdelq < 1 then DPD = 0;
    else if daysdelq > 180 then DPD = 180;
    else DPD = daysdelq;
    if "28&mn_yr1.:00:00:00"dt <= ASOF_DT < "01&mn_yr0.:00:00:00"dt then do; dpd_1 = dpd; output asof_&mn_yr1.; end;
    else if "28&mn_yr2.:00:00:00"dt <= ASOF_DT < "01&mn_yr1.:00:00:00"dt then do; dpd_2 = dpd; output asof_&mn_yr2.; end;
    else if "28&mn_yr3.:00:00:00"dt <= ASOF_DT < "01&mn_yr2.:00:00:00"dt then do; dpd_3 = dpd; output asof_&mn_yr3.; end;
    else if "28&mn_yr4.:00:00:00"dt <= ASOF_DT < "01&mn_yr3.:00:00:00"dt then do; dpd_4 = dpd; output asof_&mn_yr4.; end;
    else if "28&mn_yr5.:00:00:00"dt <= ASOF_DT < "01&mn_yr4.:00:00:00"dt then do; dpd_5 = dpd; output asof_&mn_yr5.; end;
    else if "28&mn_yr6.:00:00:00"dt <= ASOF_DT < "01&mn_yr5.:00:00:00"dt then do; dpd_6 = dpd; output asof_&mn_yr6.; end;
    else if "28&mn_yr7.:00:00:00"dt <= ASOF_DT < "01&mn_yr6.:00:00:00"dt then do; dpd_7 = dpd; output asof_&mn_yr7.; end;
    else if "28&mn_yr8.:00:00:00"dt <= ASOF_DT < "01&mn_yr7.:00:00:00"dt then do; dpd_8 = dpd; output asof_&mn_yr8.; end;
    else if "28&mn_yr9.:00:00:00"dt <= ASOF_DT < "01&mn_yr8.:00:00:00"dt then do; dpd_9 = dpd; output asof_&mn_yr9.; end;
    else if "28&mn_yr10.:00:00:00"dt <= ASOF_DT < "01&mn_yr9.:00:00:00"dt then do; dpd_10 = dpd; output asof_&mn_yr10.; end;
    else if "28&mn_yr11.:00:00:00"dt <= ASOF_DT < "01&mn_yr10.:00:00:00"dt then do; dpd_11 = dpd; output asof_&mn_yr11.; end;
    else if "28&mn_yr12.:00:00:00"dt <= ASOF_DT < "01&mn_yr11.:00:00:00"dt then do; dpd_12 = dpd; output asof_&mn_yr12.; end;
  run;
```

WHAT ARE YOU DOING???

```sas
  data _asof_&mn_yr1.;  set asof_&mn_yr1.;  drop dpd_2--dpd_12; run;
  data _asof_&mn_yr2.;  set asof_&mn_yr2.;  drop dpd_1 dpd_3--dpd_12; run;
  data _asof_&mn_yr3.;  set asof_&mn_yr3.;  drop dpd_1--dpd_2 dpd_4--dpd_12; run;
  data _asof_&mn_yr4.;  set asof_&mn_yr4.;  drop dpd_1--dpd_3 dpd_5--dpd_12; run;
  data _asof_&mn_yr5.;  set asof_&mn_yr5.;  drop dpd_1--dpd_4 dpd_6--dpd_12; run;
  data _asof_&mn_yr6.;  set asof_&mn_yr6.;  drop dpd_1--dpd_5 dpd_7--dpd_12; run;
  data _asof_&mn_yr7.;  set asof_&mn_yr7.;  drop dpd_1--dpd_6 dpd_8--dpd_12; run;
  data _asof_&mn_yr8.;  set asof_&mn_yr8.;  drop dpd_1--dpd_7 dpd_9--dpd_12; run;
  data _asof_&mn_yr9.;  set asof_&mn_yr9.;  drop dpd_1--dpd_8 dpd_10--dpd_12; run;
  data _asof_&mn_yr10.; set asof_&mn_yr10.; drop dpd_1--dpd_9 dpd_11--dpd_12; run;
  data _asof_&mn_yr11.; set asof_&mn_yr11.; drop dpd_1--dpd_10 dpd_12; run;
  data _asof_&mn_yr12.; set asof_&mn_yr12.; drop dpd_1--dpd_11; run;

  data Cohort_12mn_dpd_&mn_yr1.;
    merge _asof_&mn_yr12. _asof_&mn_yr11. _asof_&mn_yr10. _asof_&mn_yr9. _asof_&mn_yr8.
          _asof_&mn_yr7. _asof_&mn_yr6. _asof_&mn_yr5. _asof_&mn_yr4. _asof_&mn_yr3. _asof_&mn_yr2.
          _asof_&mn_yr1. (in = a);
    by acct_id;
    if a;
  run;
%mend;

%get_data(Sep2006, Aug2006, Jul2006, Jun2006, May2006, Apr2006, Mar2006,
          Feb2006, Jan2006, Dec2005, Nov2005, Oct2005, Sep2005);
```

WHAT ARE YOU DOING???

What is Time Series Data?

Definition of time series:
- A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. (Australian Bureau of Statistics)
- An ordered sequence of values of a variable at equally spaced time intervals. (Engineering Statistics Handbook)

What is Time Series Data?

For example, a customer's monthly delinquency status over a 12-month period.

Stages of Time Series Analysis

1. Analyze the data to understand the underlying drivers that produced it.
2. Develop model(s) to forecast possible outcomes from the collected data.
3. Compare monitoring results with the predicted outcomes and modify the control process as appropriate.

Applications of Time Series Data

- Stock market
- Inventory
- Workload
- Sales

A Typical Time Series Data Process

Transpose an account's 12 monthly delinquency-status observations into a single account observation that carries the full 12-month delinquency status.

Time Series Data by SAS Procedure

```sas
%macro DLQ_12_Month(perf_obs_date);
  /* Keep the 12 months of history ending with the month of PERF_OBS_DATE */
  data perf_12_month;
    set acct_basel_dpd;
    format perf_obs_dt date9.;
    /* Last day of the observation month, e.g. 31AUG2006 for 01AUG2006 */
    perf_obs_dt = intnx("MONTH", "&perf_obs_date."d, 0, 'END');
    /* 1 = observation month, 2 = one month earlier, ..., 12 = eleven months earlier */
    dlq_status_month = intck("MONTH", datepart(asof_dt), perf_obs_dt) + 1;
    if 1 <= dlq_status_month <= 12;
  run;

  proc sort data = perf_12_month;
    by acct_id perf_obs_dt dlq_status_month;
  run;
```

Time Series Data by SAS Procedure

After the DATA step and PROC SORT above run, the data set has two additional variables:
- Performance observation date (perf_obs_dt)
- Delinquency status month (dlq_status_month), where 1 is the month of perf_obs_dt and 12 is eleven months earlier
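As a quick sanity check of the month index, the hypothetical DATA step below (not from the slides) applies the same INTNX and INTCK calls to a few illustrative as-of dates, assuming PERF_OBS_DATE = 01AUG2006. It uses SAS date values directly, so DATEPART() is not needed.

```sas
/* Hypothetical check of the month index, assuming PERF_OBS_DATE = 01AUG2006.
   The as-of dates are illustrative SAS dates, not datetimes. */
data month_index_demo;
  format perf_obs_dt asof_date date9.;
  perf_obs_dt = intnx('MONTH', '01AUG2006'd, 0, 'END');   /* 31AUG2006 */
  do asof_date = '28AUG2006'd, '28JUL2006'd, '28SEP2005'd, '28AUG2005'd;
    dlq_status_month = intck('MONTH', asof_date, perf_obs_dt) + 1;
    in_window = (1 <= dlq_status_month <= 12);   /* 1 if the subsetting IF would keep it */
    output;
  end;
run;

proc print data = month_index_demo noobs;
run;
```

With this setup, 28AUG2006 maps to month 1, 28SEP2005 to month 12, and 28AUG2005 to month 13, which the subsetting IF would drop.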

Time Series Data by SAS Procedure

```sas
  proc transpose data = perf_12_month
                 out = Cohort_12mn_dpd_&perf_obs_date
                       (drop = _name_ where = (dpd1 ^= .))   /* keep accounts with a month-1 value */
                 prefix = dpd;
    by acct_id perf_obs_dt;
    id dlq_status_month;
    var basel_dpd;
  run;
%mend;

%DLQ_12_Month(01AUG2006);
```

PROC TRANSPOSE:
1. Transposes the variable basel_dpd within each acct_id and perf_obs_dt BY group.
2. Creates the new variables dpd1 to dpd12 from the PREFIX= option (dpd) and the ID variable (dlq_status_month values 1 to 12).

Time Series Data by SAS Procedure

After PROC TRANSPOSE runs, the new data set holds a single observation per account, carrying its 12-month delinquency status in dpd1 to dpd12.
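A small self-contained run makes the reshaping concrete. The sketch below uses a hypothetical data set PERF_DEMO with two accounts and only three months, in place of the slides' ACCT_BASEL_DPD; account 2 has no month-2 row, and the WHERE= filter on dpd1 is omitted.

```sas
/* Hypothetical long-form data: one row per account and status month */
data perf_demo;
  input acct_id perf_obs_dt :date9. dlq_status_month basel_dpd;
  format perf_obs_dt date9.;
  datalines;
1 31AUG2006 1 0
1 31AUG2006 2 30
1 31AUG2006 3 60
2 31AUG2006 1 0
2 31AUG2006 3 90
;
run;

/* Same BY/ID/VAR/PREFIX= pattern as the slide: one row per account */
proc transpose data = perf_demo out = wide_demo (drop = _name_) prefix = dpd;
  by acct_id perf_obs_dt;
  id dlq_status_month;
  var basel_dpd;
run;

proc print data = wide_demo noobs;
run;
```

Account 1 comes out as dpd1 = 0, dpd2 = 30, dpd3 = 60; account 2 gets a missing dpd2, because PROC TRANSPOSE places each value by its ID value rather than by row position.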

Time Series Data by SAS Data Step

```sas
  data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id perf_obs_dt dpd1 - dpd12);
    set perf_12_month;
    by acct_id perf_obs_dt dlq_status_month;
    array dpd[12] dpd1 - dpd12;
    retain dpd1 - dpd12 j 0;
    /* New account and observation date: reset all 12 slots and the counter */
    if first.acct_id and first.perf_obs_dt then do;
      do i = 1 to 12;
        dpd[i] = 0;
      end;
      j = 0;
    end;
    /* Fill the next slot with this month's value */
    j = j + 1;
    dpd[j] = basel_dpd;
    /* Write one row per account and observation date */
    if last.acct_id and last.perf_obs_dt;
  run;
%mend;

%DLQ_12_Month(01AUG2006);
```

Time Series Data by SAS Data Step

SAS DATA step:
1. Declares the array DPD to create the new variables dpd1 to dpd12.
2. Uses RETAIN to carry the values of dpd1 to dpd12 (and index J) from one observation to the next.
3. Initializes dpd1 to dpd12 to 0 and resets index J to 0 at the first observation of each account and observation date.
4. Assigns each basel_dpd value to dpd1 to dpd12 in turn, using index J.
5. Outputs dpd1 to dpd12 to the new data set only at the last observation of each account and observation date.
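Because J counts rows rather than months, a missing month shifts later values: for an account with rows for months 1 and 3 only, the step above would put the month-3 value in dpd2 and leave dpd3 at its initial 0. The sketch below is a variation, not the author's code, that indexes the array by dlq_status_month and resets the slots to missing instead of 0. It runs on a hypothetical three-month data set, PERF_DEMO.

```sas
/* Hypothetical long-form data; account 2 has no month-2 row */
data perf_demo;
  input acct_id perf_obs_dt :date9. dlq_status_month basel_dpd;
  format perf_obs_dt date9.;
  datalines;
1 31AUG2006 1 0
1 31AUG2006 2 30
1 31AUG2006 3 60
2 31AUG2006 1 0
2 31AUG2006 3 90
;
run;

/* Variation: place each value by its month, so a gap stays missing */
data wide_step (keep = acct_id perf_obs_dt dpd1 - dpd12);
  set perf_demo;
  by acct_id perf_obs_dt;
  array dpd[12] dpd1 - dpd12;
  retain dpd1 - dpd12;
  if first.perf_obs_dt then call missing(of dpd[*]);   /* new account/date group */
  dpd[dlq_status_month] = basel_dpd;
  if last.perf_obs_dt;                                  /* one row per group */
run;
```

Account 2 comes out with dpd1 = 0, dpd2 missing and dpd3 = 90, so a gap is visible instead of looking like a current (0 DPD) month.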

Time Series Data by SAS Function

```sas
  data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id perf_obs_dt dpd1 - dpd12);
    set perf_12_month;
    by acct_id perf_obs_dt dlq_status_month;
    array dpd[12] dpd1 - dpd12;
    /* Generates dpd[1] = lag11(basel_dpd), ..., dpd[12] = lag0(basel_dpd):
       on the month-12 row, dpd[k] holds the month-k value when all 12 months are present */
    %do i = 1 %to 12;
      %let j = %eval(12 - &i);
      dpd[&i] = lag&j(basel_dpd);
    %end;
    if dlq_status_month = 12;
  run;
%mend;

%DLQ_12_Month(01AUG2006);
```

Time Series Data by SAS Function

SAS LAG function:
1. Stores a value in a queue and returns a value stored previously in that queue.
2. Each occurrence of a LAGn function in a program generates its own queue of values.
3. When an occurrence of LAGn is executed, the value at the top of its queue is removed and returned, the remaining values are shifted upwards, and the new value of the argument is placed at the bottom of the queue. LAG1 thus returns the value from the observation of the prior execution.

[Diagram: the queues for LAG0 through LAG11]
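The queue behavior is easiest to see on a few rows. In the hypothetical step below (not from the slides), LAG1 and LAG2 run on every pass of the loop, so each returns the value from one and two executions earlier.

```sas
/* Each LAGn occurrence keeps its own queue, fed on every execution */
data lag_demo;
  do x = 1 to 5;
    x_lag1 = lag1(x);   /* value from the previous execution */
    x_lag2 = lag2(x);   /* value from two executions back */
    output;
  end;
run;

proc print data = lag_demo noobs;
run;
```

The first row has both lags missing; from x = 3 on, x_lag1 = x - 1 and x_lag2 = x - 2.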

Time Series Data by SAS Function

```sas
  data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id perf_obs_dt dpd1 - dpd12);
    set perf_12_month;
    by acct_id perf_obs_dt dlq_status_month;
    array dpd[12] dpd1 - dpd12;
    /* Subsetting IF moved ahead of the LAG calls: the LAGn queues now
       receive values only from month-12 observations */
    if dlq_status_month = 12;
    %do i = 1 %to 12;
      %let j = %eval(12 - &i);
      dpd[&i] = lag&j(basel_dpd);
    %end;
    * if dlq_status_month = 12;
  run;
%mend;

%DLQ_12_Month(01AUG2006);
```

Time Series Data by SAS Function

SAS LAG function:
1. The queue for each occurrence of LAGn is initialized with n missing values.
2. Missing values are returned for the first n executions of each occurrence of LAGn; after that, the lagged values of the argument begin to appear.
3. Storing values at the bottom of the queue and returning values from the top happens only when the function is executed. An occurrence of LAGn that is executed conditionally stores and returns values only for the observations where the condition is satisfied.
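To illustrate conditional execution, the hypothetical step below calls LAG twice: once on every pass and once only for even x. Because the second occurrence has its own queue and executes only on even rows, it returns the previous even value rather than the previous row's value.

```sas
data cond_lag_demo;
  do x = 1 to 6;
    lag_always = lag(x);                        /* executes on every pass */
    if mod(x, 2) = 0 then lag_even = lag(x);    /* separate queue, even x only */
    else lag_even = .;
    output;
  end;
run;

proc print data = cond_lag_demo noobs;
run;
```

At x = 6, lag_always is 5 while lag_even is 4; at x = 2, lag_even is still missing because that occurrence has not executed before.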

Time Series Data by SAS Function

Special case: not every month of the time series is populated. For an account with no month-12 record, the subsetting IF statement (if dlq_status_month = 12;) is never satisfied, so SAS writes no observation for that account to the output data set.
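The special case can be reproduced on a smaller scale. This sketch is a three-month analog of the LAG step, with the lags written out instead of generated by %DO, on hypothetical data in which account 2 has months 1 and 2 but no month 3.

```sas
/* Hypothetical data, sorted by acct_id and ascending month */
data perf_gap;
  input acct_id dlq_status_month basel_dpd;
  datalines;
1 1 0
1 2 30
1 3 60
2 1 0
2 2 30
;
run;

data lag_wide_demo (keep = acct_id dpd1 - dpd3);
  set perf_gap;
  dpd1 = lag2(basel_dpd);   /* month 1, two rows back */
  dpd2 = lag1(basel_dpd);   /* month 2, one row back */
  dpd3 = basel_dpd;         /* month 3, current row */
  if dlq_status_month = 3;  /* account 2 has no month-3 row */
run;
```

Only account 1 reaches the output; account 2 never has a month-3 row, so the subsetting IF never selects it.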

Time Series Data by SAS Function

```sas
  proc sort data = perf_12_month;
    by acct_id load_dt descending dlq_status_month;
  run;

  data Cohort_12mn_dpd_&perf_obs_date (keep = acct_id load_dt dpd1 - dpd12);
    set perf_12_month;
    by acct_id load_dt descending dlq_status_month;
    array dpd[12] dpd1 - dpd12;
    /* Rows now run from month 12 down to month 1, so on the month-1 row
       dpd[k] = lag(k-1) holds the month-k value when all 12 months are present */
    %do i = 1 %to 12;
      %let j = %eval(&i - 1);
      dpd[&i] = lag&j(basel_dpd);
    %end;
    if dlq_status_month = 1;
  run;
%mend;

%DLQ_12_Month(01AUG2006);
```

Time Series Data by SAS Function

Special case: not every month of the time series is populated. The workaround is to:
1. Sort dlq_status_month in descending order, so that month 1 is the last row of each account's group.
2. Make the subsetting IF statement true when month 1 is reached (if dlq_status_month = 1;).
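One caveat worth checking: the LAG queues are not reset between accounts, so when an account has fewer rows than the number of lags, the lagged values reach back into the previous account, and any gap shifts later months into the wrong slot. The three-month sketch below uses hypothetical data in which account 2 lacks month 2. The first step follows the descending approach; the second adds one possible guard (not from the slides) that also lags ACCT_ID and DLQ_STATUS_MONTH and places a lagged value by its own month only when it belongs to the same account.

```sas
/* Hypothetical data, sorted by acct_id and descending month */
data perf_desc;
  input acct_id dlq_status_month basel_dpd;
  datalines;
1 3 60
1 2 30
1 1 0
2 3 90
2 1 0
;
run;

/* Descending LAG approach, three-month analog: output at month 1 */
data lag_desc (keep = acct_id dpd1 - dpd3);
  set perf_desc;
  dpd1 = basel_dpd;
  dpd2 = lag1(basel_dpd);
  dpd3 = lag2(basel_dpd);
  if dlq_status_month = 1;
run;

/* Guarded variation: every LAG call runs on every row, and each lagged
   value is placed by its lagged month only if the lagged account matches */
data lag_desc_guarded (keep = acct_id dpd1 - dpd3);
  set perf_desc;
  array dpd[3] dpd1 - dpd3;
  l1_id = lag1(acct_id); l1_m = lag1(dlq_status_month); l1_v = lag1(basel_dpd);
  l2_id = lag2(acct_id); l2_m = lag2(dlq_status_month); l2_v = lag2(basel_dpd);
  if dlq_status_month = 1 then do;
    dpd[1] = basel_dpd;
    if l1_id = acct_id then dpd[l1_m] = l1_v;
    if l2_id = acct_id then dpd[l2_m] = l2_v;
    output;
  end;
run;
```

In the unguarded step, account 2 comes out as dpd1 = 0, dpd2 = 90 (really its month-3 value) and dpd3 = 0 (account 1's month-1 value); the guarded step gives dpd1 = 0, dpd2 missing, dpd3 = 90.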

Strengths and Weaknesses of Each Approach

| Approach | Strength | Weakness |
|---|---|---|
| PROC TRANSPOSE | Easy coding | Limited variables |
| DATA STEP | Flexible manipulation | Initialization |
| LAG FUNCTION | Self-explanatory | Conditional execution |