Data Preparation for Analytics Using SAS Gerhard Svolba, Ph.D. Reviewed by Madera Ebby, Ph.D.

Presentation transcript:


2 What is the purpose of this book?
• Introduces the reader to data preparation
• Why data preparation is not only important but a must prior to data analysis
• From data preparation process to data analytics

3 The Analysis Path: From raw data to results that can be implemented
• Data Sources: Different Data Sources; Relational Models, Star Schemas
• Data Preparation: Merges, Denormalization; Derived Variables; Transpositions, Aggregations
• Analytic Modeling: Modeling, Parameter Estimation, Tuning; Predictions, Classifications or Clustering; Profiling
• Results and Actions: Usage of Results; Interpretations

4 The Analysis Path: From raw data to results that can be implemented
• Data availability
• Adequate preparation
• Clever modeling
• Good results

5 Four Dimensions for Analytic Data Preparation
• Business and Process Knowledge
• Analytical Knowledge
• Efficient SAS Coding
• Documentation and Maintenance

6 Business question: How did students who met the provincial standard in grade 3 perform in grade 6?
• The question generates many other questions
• Answering it means working with people in other departments, such as IT, to carry out a data analytic process
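As a rough illustration of the data preparation such a question implies, the sketch below joins one row per student from two yearly tables and then compares grade 6 scores by grade 3 status. The tables datalib.grade3 and datalib.grade6 and the variables student_id, met_standard_g3 and math_score_g6 are hypothetical names chosen for this example; they do not come from the book or the slides.

```sas
/* Hypothetical inputs (assumed for illustration only):
   datalib.grade3: student_id, met_standard_g3 (1 = met standard, 0 = did not)
   datalib.grade6: student_id, math_score_g6                                   */
proc sql;
  create table work.g3_g6 as
  select g3.student_id,
         g3.met_standard_g3,
         g6.math_score_g6
  from datalib.grade3 as g3
       inner join datalib.grade6 as g6
       on g3.student_id = g6.student_id;   /* keeps students present in both years */
quit;

/* Compare grade 6 results for students who did and did not meet the grade 3 standard */
proc means data=work.g3_g6 n mean std;
  class met_standard_g3;
  var math_score_g6;
run;
```

The inner join silently drops students who appear in only one of the two years, which is one of the "other questions" such an analysis raises: whether those students should be excluded or reported separately.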

7 Why is this author qualified or not qualified to address this topic?
• He is an experienced SAS user, as exemplified by his many macros
• He addresses issues by presenting examples from different backgrounds

8 What are the strengths or weaknesses of this book?
• The book is written clearly and is easy to read
• It provides the reader with many examples of code, input, and output

9 Would you recommend this book? If so, who would you recommend it to and for what purpose?
• Those who prepare data marts for statistics, data mining, or time series analyses
• Those who provide the data used in creating data marts (IT and data warehousing)
• Both new and experienced SAS users who perform data analyses using data marts
• Those who prepare data in relational databases with SQL

10 Does the book achieve its purpose? Absolutely! It enables one to:
• Understand the business environment in which data preparation occurs
• Extract and structure your data
• Create derived variables from different tables
• Program SAS in an efficient way

11 What is the best tip or technique addressed in this book?
• There are many new techniques that I learnt from this book. For example:
• Examine the mean scores for math by board_mident

12 Continued…
    /* One output row per board with summary statistics of Math_score */
    proc means data=datalib.boards noprint nway;
      class board_mident;
      var Math_score;
      output out=datalib.aggr_static(drop=_type_ _freq_)
             mean= sum= n= std= min= max= / autoname;
    run;

13 Continued…
• To run the analysis by board_mident, we use a CLASS statement. A BY statement could also be used, but the data would have to be sorted by board_mident first.
• NWAY suppresses the grand total mean and all other totals, so that the output data set contains only the rows for the 5 boards, which are the analysis subjects.
• NOPRINT suppresses the printed output (the procedure's listing, not the log), which can otherwise run to thousands of descriptive measures.
• In the OUTPUT statement we specify the statistics that will be calculated. The AUTONAME option creates the new variable names in the form VARIABLENAME_STATISTIC.
• If we want to calculate different statistics for different input variables, we can specify this in the OUTPUT statement, e.g. SUM(VARIABLE)=sum_variable.
• In the OUTPUT statement we drop the _TYPE_ and _FREQ_ variables, although we could keep _FREQ_ and omit N from the statistics list.
Source: Chapter 18, Multiple Interval-Scaled Observations per Subject, page 183.
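The two alternatives mentioned in these notes, a BY statement in place of CLASS and statistics named per variable with _FREQ_ kept instead of N, might look like the sketch below. It assumes the same datalib.boards table as the PROC MEANS step on slide 12. The second analysis variable Reading_score and the output names work.boards_sorted, work.aggr_by, n_obs, mean_math, std_math and sum_reading are hypothetical and used only for illustration.

```sas
/* BY-group processing needs the data ordered by the BY variable */
proc sort data=datalib.boards out=work.boards_sorted;
  by board_mident;
run;

/* BY instead of CLASS: one output row per board and no overall-total row,
   so NWAY is not needed here */
proc means data=work.boards_sorted noprint;
  by board_mident;
  var Math_score Reading_score;   /* Reading_score is a hypothetical second variable */
  /* Keep _FREQ_ (renamed n_obs) instead of requesting N;
     request different statistics for different variables by naming them */
  output out=work.aggr_by(drop=_type_ rename=(_freq_=n_obs))
         mean(Math_score)=mean_math
         std(Math_score)=std_math
         sum(Reading_score)=sum_reading;
run;
```

One caveat on swapping N for _FREQ_: _FREQ_ counts every row in the group, while N counts only nonmissing values of the analysis variable, so the two differ whenever the score has missing values.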

14 CONTINUED…

15 Are there other books (or sources of information) available with similar content?
• Yes, but they tend to present bits and pieces of information
• E.g. resources on the internet
• The Little SAS Book by Delwiche and Slaughter
If so, how does this book compare?
• Comprehensive, well-illustrated presentation of material

16 What will your SAS log look like?

17 or

18 or

19 or