Comparing dataset metadata
Jim Groeneveld, OCS Consulting, 's-Hertogenbosch, Netherlands. PhUSE 2011
© OCS Consulting. The flexible extension to your IT team


AGENDA / CONTENTS
A. Comparing dataset data and metadata
   1. PROC COMPARE
   2. macro %CrossRef
B. Dataset and variable attributes
C. Example results (in dataset)
   1. Dataset attributes
   2. Variable attributes
D. Application of macro %CrossRef
E. Some technical information
F. Future features

A. Comparing dataset data and metadata
1. PROC COMPARE
   a. data oriented (attributes: NOVALUES option)
   b. only 2 datasets (or variables in one) at a time
   c. cumbersome output (summary: OUT= dataset)
   d. may be tuned as desired, yet limited to pairs
2. SAS macro %CrossRef
   a. structure oriented: dataset & variable attributes
   b. any number of specified datasets (from 1)
   c. tabular summarisation (in result dataset only)
   d. columns: dataset names; rows: attributes
   e. user specification of desired attributes
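The attribute-oriented use of PROC COMPARE can be sketched as follows. This is only an illustration: the two WORK datasets, the dropped variable and the changed label are hypothetical and do not come from the presentation.

```sas
/* Two hypothetical, nearly identical datasets to compare */
data work.class_v1;
  set sashelp.class;
run;

data work.class_v2;
  set sashelp.class(drop=weight);    /* one variable removed */
  label height = 'Height (inches)';  /* one label added      */
run;

/* NOVALUES suppresses the value comparison results,       */
/* LISTVAR lists variables that occur in only one dataset. */
proc compare base=work.class_v1 compare=work.class_v2
             novalues listvar;
run;
```

A third dataset would need a second PROC COMPARE run, since each run compares only one pair.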

B. Dataset and variable attributes
1. Dataset attributes
   a. MemName, MemLabel and LibName
   b. Creation and Modification date and time
   c. Number of variables and physical observations
2. Variable attributes
   a. Name (common name in first attribute column)
   b. Label: as value in above Name attribute record
      - if no label then text: "-no label-"
      - if no corresponding variable: empty
   c. optional variable's Type and Length (combined)
   d. optional variable's Informat and Format
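The attributes listed above are all available from the SAS dictionary tables. The sketch below shows one way to retrieve them, assuming SASHELP.CLASS and SASHELP.CARS as example datasets; it is not taken from the %CrossRef source, and the output dataset names are hypothetical.

```sas
proc sql;
  /* Dataset attributes: names, label, creation and modification */
  /* datetime, number of variables and physical observations     */
  create table work.ds_attr as
  select libname, memname, memlabel, crdate, modate, nvar, nobs
  from dictionary.tables
  where libname = 'SASHELP' and memname in ('CLASS', 'CARS');

  /* Variable attributes: name, label, type, length, (in)format */
  create table work.var_attr as
  select libname, memname, name, label, type, length, informat, format
  from dictionary.columns
  where libname = 'SASHELP' and memname in ('CLASS', 'CARS')
  order by name, memname;
quit;
```

Library and member names are stored in upper case in the dictionary tables, hence the upper-case literals in the WHERE clauses.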

C. Example results (in dataset) 1/2
Dataset attributes
[Example table not preserved in the transcript. Column headers: attribute column, dataset 1, dataset 2, dataset 3.]

C. Example results (in dataset) 2/2
Variable attributes
[Example table not preserved in the transcript. Column headers: attribute column, dataset 1, dataset 2, dataset 3.]
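A result layout of this kind (one row per variable name, one column per dataset, the label as cell value) can be approximated with PROC SQL and PROC TRANSPOSE. The following is a rough sketch under assumptions, not the %CrossRef macro itself: the datasets CLASS_A and CLASS_B and their labels are hypothetical.

```sas
/* Two hypothetical, similar datasets */
data work.class_a;
  set sashelp.class;
  label name = 'First name';
run;

data work.class_b;
  set sashelp.class(drop=weight);
  label name = 'Given name';
run;

/* One record per variable per dataset; unlabelled variables */
/* get the text "-no label-"                                 */
proc sql;
  create table work.varmeta as
  select name, memname,
         case when label = '' then '-no label-'
              else label
         end as label length=256
  from dictionary.columns
  where libname = 'WORK' and memname in ('CLASS_A', 'CLASS_B')
  order by name, memname;
quit;

/* Rows: variable names; columns: dataset names; cells: labels. */
/* A variable missing from a dataset yields an empty cell.      */
proc transpose data=work.varmeta out=work.xref(drop=_name_);
  by name;
  id memname;
  var label;
run;
```

In WORK.XREF the row for Weight has an empty CLASS_B cell, because that variable was dropped from the second dataset.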

D. Application of macro %CrossRef
1. not with entirely different datasets, but with a (limited) number of rather similar datasets, to view differences
   a. master datasets and subsets of them
   b. different versions of datasets
   c. same datasets with different names
   d. similar datasets with different data
2. Goal: to see whether several datasets could be combined into one dataset (or ignored if the data are identical)

E. Some technical information
1. all fields are of type character with length $256; the first (attribute) field has length $36
2. internally, SAS name literal variable names are applied
   a. OPTIONS VALIDVARNAME=ANY is set, and reset to the original state at the end of the macro
   b. internal variable names start with an asterisk (*) or end with an exclamation mark (!) and one digit; avoid such names in your datasets and limit your variable name length to at most 30
3. WORK dataset names start with __
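Setting VALIDVARNAME=ANY and restoring the original value afterwards is a common macro pattern. A minimal sketch follows, assuming a hypothetical macro %demo_vvn and a hypothetical internal variable name; how %CrossRef implements this is not shown in the slides.

```sas
%macro demo_vvn;
  %local old_vvn;

  /* Remember the current setting (e.g. V7) before changing it */
  %let old_vvn = %sysfunc(getoption(validvarname));
  options validvarname=any;

  /* Name literals such as '*attr'n are now allowed; the WORK */
  /* dataset name starts with __ to avoid clashes             */
  data work.__demo;
    '*attr'n = 'some attribute value';
  run;

  /* Reset the option to its original state */
  options validvarname=&old_vvn;
%mend demo_vvn;

%demo_vvn
```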

F. Future features 1/2
1. comparing all datasets in one or more libraries using a wildcard (LibName.*)
2. optional aggregated data for both numerical and character variables
   a. (non-deleted) logical number of observations
   b. number of non-missing values
   c. number of missing values
   d. frequency distribution of a limited number of distinct (formatted) values (categories)
   e. minimum and maximum (formatted) value (first and last non-missing character value)
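Until such features exist in the macro, comparable aggregates can be obtained with standard procedures. The sketch below uses SASHELP.CLASS as an assumed example; the column aliases are hypothetical.

```sas
proc sql;
  select count(*)                 as n_obs,          /* logical observations */
         count(height)            as n_nonmissing,
         count(*) - count(height) as n_missing,
         min(height)              as min_height,
         max(height)              as max_height,
         min(name)                as first_name,     /* first character value */
         max(name)                as last_name       /* last character value  */
  from sashelp.class;
quit;

/* Frequency distribution of a variable with few distinct values; */
/* MISSING treats missing values as a category of their own       */
proc freq data=sashelp.class;
  tables sex / missing;
run;
```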

F. Future features 2/2
3. optional aggregated, univariate data for (mainly) numerical variables
   a. mean value
   b. median value (also approximate middle, non-missing, sorted, character value)
   c. (formatted) mode value (also most frequent non-missing character value)
   d. standard deviation
   e. various percentiles
   f. and more, e.g. distribution information and the statistics that PROC COMPARE can generate
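For numerical variables, the univariate statistics named above are already available from PROC MEANS. A brief sketch with SASHELP.CLASS as an assumed example; the choice of percentiles is arbitrary.

```sas
/* Counts, location, spread and a few percentiles per variable */
proc means data=sashelp.class
           n nmiss mean median mode std p5 q1 q3 p95;
  var age height weight;
run;
```

Character variables are not covered by PROC MEANS; their middle or most frequent value would need PROC FREQ or a sorted DATA step instead.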

QUESTIONS & ANSWERS

Q&A: SAS name literal
A name expressed as a string within quotes, followed by the letter N.
Applicable to variable names, statement labels and imported variable and table names from DBMS tables (e.g. Excel).
Advantage: more compatibility.
Example: = 'a SAS name literal';
More information in: SAS Language Reference: Concepts.
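A short illustration of name literals used as variable names; the dataset and the variable names are hypothetical and were chosen only to show names that are not valid ordinary SAS names.

```sas
/* Name literals with blanks or special characters need VALIDVARNAME=ANY */
options validvarname=any;

data work.literal_demo;
  'a SAS name literal'n = 'some value';   /* character variable */
  'Total (EUR)'n        = 100;            /* numeric variable   */
run;

proc print data=work.literal_demo;
  var 'a SAS name literal'n 'Total (EUR)'n;
run;
```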

Q&A: Straightforward inventory of metadata
1. save results of PROC CONTENTS (or of the CONTENTS statement of PROC DATASETS for one or more libraries) to datasets;
2. if desired, keep the most important variables LibName, MemName, Name, Label, Type, Length, Format, FormatL, FormatD, Informat, InformL and InformD;
3. concatenate all metadata datasets (SET);
4. if desired, sort by variable NAME.
This generates all dataset and variable information in subsequent records.
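The four steps can be sketched as follows, assuming SASHELP.CLASS and SASHELP.CARS as example datasets; the macro variable KEEPVARS and the WORK dataset names are hypothetical.

```sas
/* Step 2: the variables to keep from the PROC CONTENTS output */
%let keepvars = libname memname name label type length
                format formatl formatd informat informl informd;

/* Step 1: save the metadata of each dataset to a dataset */
proc contents data=sashelp.class noprint
              out=work.meta_class(keep=&keepvars);
run;

proc contents data=sashelp.cars noprint
              out=work.meta_cars(keep=&keepvars);
run;

/* Step 3: concatenate all metadata datasets */
data work.meta_all;
  set work.meta_class work.meta_cars;
run;

/* Step 4: sort by variable name, so that equally named */
/* variables from different datasets become adjacent    */
proc sort data=work.meta_all;
  by name libname memname;
run;
```

The sort is case-sensitive, so variables whose names differ only in case end up in separate groups unless the names are upper-cased first.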