A guide to the unknown…

A dataset is longitudinal if it tracks the same type of information on the same subjects at multiple points in time or space. For example, part of a longitudinal dataset could contain specific students and their standardized test scores in six successive years. One type of longitudinal data is also known as "panel data": data from a (usually small) number of observations over time on a (usually large) number of cross-sectional units such as individuals, households, firms, or governments.
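As a minimal sketch of the student example above, the SAS step below builds a small longitudinal dataset with one observation per student per year. The dataset name scores_long, the variable names, and every value are made up for illustration.

```sas
/* Hypothetical data: two students, six successive yearly test scores each */
data scores_long;
    input student_id year score;
    datalines;
101 2001 78
101 2002 81
101 2003 80
101 2004 85
101 2005 88
101 2006 90
102 2001 65
102 2002 70
102 2003 72
102 2004 71
102 2005 75
102 2006 79
;
run;

/* List the data: the same subject appears once for every time point */
proc print data=scores_long noobs;
run;
```

This "long" layout (several observations per subject) is one of the two shapes that restructuring moves between; the other keeps one observation per subject with one score variable per year.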

Longitudinal data are a subset of hierarchical data: observations that are correlated because they are tied to the same unit. For example, in educational studies we observe student i in school u, and presumably there is some tie between the observations in the same school. In such data we observe y_{j,u}, where u indicates a unit and j indicates the j-th observation drawn from that unit. The index j only orders observations within a unit, so there is no relationship between y_{j,u} and y_{j,u'} even though they have the same first subscript. In true longitudinal data, the first subscript is t, which represents comparable time.

One approach to working with longitudinal data sets is to restructure the data set, going either from one observation per subject to several or vice versa. For example, you may have several diagnosis codes in a single observation (visit) and want to compute frequencies of each possible diagnosis code. To do this, you will find it more convenient to have one observation for each diagnosis code, resulting in possibly several observations per subject.
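The diagnosis-code case can be sketched in SAS with an ARRAY and a DO loop. This is an illustration under assumptions, not the presenter's code: the dataset names visits and dx_long, the variables patient_id, visit, dx1-dx3, and dx, and the code values themselves are all hypothetical.

```sas
/* Hypothetical visit-level data: up to three diagnosis codes per visit. */
/* A single period marks a missing code.                                 */
data visits;
    length dx1-dx3 $ 5;
    input patient_id visit dx1 $ dx2 $ dx3 $;
    datalines;
1 1 250.0 401.9 .
1 2 401.9 . .
2 1 493.9 250.0 401.9
;
run;

/* Restructure: write one observation for each non-missing diagnosis code */
data dx_long (keep=patient_id visit dx);
    set visits;
    array dxs{3} $ dx1-dx3;
    do i = 1 to dim(dxs);
        if not missing(dxs{i}) then do;
            dx = dxs{i};
            output;
        end;
    end;
run;

/* With one code per observation, a one-way frequency table is direct */
proc freq data=dx_long;
    tables dx;
run;
```

With these made-up rows, dx_long holds six observations, and PROC FREQ counts each code across all visits without having to inspect dx1, dx2, and dx3 separately.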

Data structure analysis includes making sure that all the components of a data structure are closely related, that closely related data are not split across separate structures, and that the best type of data structure is being used. Data can be much easier to manage and understand when its representation abstracts the relevant similarities. In data warehouses, data restructuring often involves changing some aspect of the way in which the database is logically or physically arranged.

There are generally four types of data restructuring operations: trimming, flattening, stretching, and grafting. In trimming, data extracted from the input is placed in the output without any change in the hierarchical relationships, but with some unwanted components of the data removed. In flattening, the operation produces a flat form from a branch of the input structure by extracting all information at the level of the values of the basic attributes of the branch. The stretching operation can produce an output data structure that has more hierarchical levels than the input. Finally, a grafting operation combines two hierarchies horizontally to form a wider hierarchy by matching common values.
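One way to read two of these operations in SAS terms is sketched below: trimming as a KEEP= data set option, and grafting as a match-merge on a common key. That mapping is an interpretation rather than something the slides state, and the datasets demog and labs, their variables, and their values are hypothetical.

```sas
/* Hypothetical inputs that share the key variable id */
data demog;
    input id sex $ site $;
    datalines;
1 F A
2 M B
3 F A
;
run;

data labs;
    input id chol;
    datalines;
1 210
2 185
;
run;

/* Trimming: copy the data but drop an unwanted component (site) */
data demog_trim;
    set demog (keep=id sex);
run;

/* Grafting: combine the two structures side by side by matching id */
proc sort data=demog_trim; by id; run;
proc sort data=labs;       by id; run;

data grafted;
    merge demog_trim labs;
    by id;
run;

proc print data=grafted noobs;
run;
```

In this sketch id 3 has no row in labs, so chol is missing for that subject in the merged output; whether unmatched rows should be kept is a design choice the slides do not address.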

In SPSS, choose Data > Restructure. This allows you to restructure your data from multiple variables (columns) in a single case to groups of related cases (rows) or vice versa, or you can choose to transpose your data.

SPSS syntax:

    VARSTOCASES
      /ID=id
      /MAKE trans1 FROM VAR00001 VAR00002 VAR00003 VAR00004
      /INDEX=Index1(4)
      /KEEP=
      /NULL=KEEP.

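In SAS, you can create observations using an ARRAY statement and a DO loop, or you can simply transpose the existing data.

    /* Read the tab-delimited file: one row per location, one column per year. */
    /* TRUNCOVER sets the remaining variables to missing when a line is short. */
    data neonatal;
        infile 'F:\Thesis Docs\Data\neonatal.txt' delimiter='09'x dsd truncover obs=104;
        input location $ _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
              _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
    run;

    /* PROC TRANSPOSE with a BY statement needs the data sorted by that variable */
    proc sort data=neonatal;
        by location;
    run;

    /* Turn each year column into a row. NAME= stores the source variable name   */
    /* (for example _1990_) in year; PREFIX= names the value columns neonatal1,  */
    /* neonatal2, ... with one column per input row within a location.          */
    proc transpose data=neonatal
                   out=neonatal2
                   name=year
                   prefix=neonatal;
        by location;
        var _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
            _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
    run;

    /* DROP= removes the variable neonatal2, which exists only when a location */
    /* has more than one row in the input                                      */
    data neonatal3 (drop=neonatal2);
        set neonatal2;
    run;

    proc print data=neonatal3 noobs;
    run;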
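The ARRAY and DO loop route mentioned above could look like the following sketch. It assumes the neonatal dataset read in by the DATA step above, with its eighteen year variables; the output dataset name neonatal_long and the variable name neonatal_value are hypothetical.

```sas
/* Wide to long with an array: one output observation per location per year */
data neonatal_long (keep=location year neonatal_value);
    set neonatal;
    /* Index the array by calendar year so the loop counter doubles as year */
    array yrs{1990:2007} _1990_ _1991_ _1992_ _1993_ _1994_ _1995_ _1996_ _1997_ _1998_
                         _1999_ _2000_ _2001_ _2002_ _2003_ _2004_ _2005_ _2006_ _2007_;
    do year = 1990 to 2007;
        neonatal_value = yrs{year};
        output;
    end;
run;

proc print data=neonatal_long noobs;
run;
```

Compared with PROC TRANSPOSE, this version stores year as a numeric variable rather than as a name such as _1990_, and it needs no prior sort.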

 Restructuring is fun!