2014 Nordic and Baltic Stata Users Group Meeting: Working sideways in Stata. Jakob Hjort, Data Manager, MPH, Department of Cardiology, Aarhus University Hospital.

Presentation transcript:

2014 Nordic and Baltic Stata Users Group Meeting
Working sideways in Stata
Jakob Hjort, Data Manager, MPH
Department of Cardiology, Aarhus University Hospital, DK-8200 Aarhus, Denmark

The rectangular dataset

Statistics The rectangular dataset

Statistics The rectangular dataset "It is not the data we want, it's the essence of data" results

Datamanagement The rectangular dataset

Datamanagement Statistics

Datamanagement Statistics The rectangular dataset - transpose?

The rectangular dataset - subset in matrix using Mata?

    use "family.dta", clear
    * Dataset with: fam_name, inc_mother & inc_father
    mata
    st_view(x=0,.,("inc_mother","inc_father"))
    income=colsum(x')'
    st_addvar("long","inc_household")
    st_store(.,"inc_household",income)
    end
    list fam_name inc_mother inc_father inc_household
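For readers without family.dta, the Mata slide above can be tried on a small made-up dataset. This is only a sketch under stated assumptions: the family names and incomes are invented, rowsum(x) is swapped in for colsum(x')' (both give one total per row), and the index returned by st_addvar() is stored in a variable so that Mata does not print it.

```stata
clear
* Hypothetical toy data standing in for family.dta
input str10 fam_name inc_mother inc_father
"Hansen" 300 250
"Jensen" 400 100
end

mata:
// x is a view onto the two income columns of the Stata dataset
st_view(x=., ., ("inc_mother", "inc_father"))
// one household total per row
income = rowsum(x)
// create the new variable and fill it with the totals
idx = st_addvar("long", "inc_household")
st_store(., idx, income)
end

list fam_name inc_mother inc_father inc_household
```

The same result could be had with generate or egen; the point of the slide is only that a subset of the rectangle can be handled as a matrix.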

The direct approach (Datamanagement)

    generate [type] newvar=exp [if] [in]

Ex.: generate BMI=Weight/Height^2
Columns shown: Weight, Height, BMI

The direct approach (Datamanagement)

    egen [type] newvar=fcn(arguments) [if] [in] [,options]

Functions: rowtotal, rowmin, rowmax, rowfirst, rowlast, rowmean, rowmedian, rowmiss, rownonmiss, rowpctile, rowsd, concat, anycount, anymatch, anyvalue, count, diff, fill, group, iqr, kurt, max, mdev, mean, median, min, mode, mtr, pc, pctile, rank, sd, seq, skew, std, tag, total

Ex.: egen income=rowtotal(inc*)
Columns shown: IncJan, IncFeb, IncMar, IncApr, IncMay, IncJun, IncJul, …, income
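A small sketch of why the egen row functions are often preferable to writing the sum out with generate. The three-month dataset and the variable name income_plus are made up for illustration. rowtotal() treats a missing value as zero, whereas an expression with + returns missing as soon as one term is missing.

```stata
clear
* Hypothetical toy data: three monthly incomes, one of them missing
input incJan incFeb incMar
10 20 30
 5  . 15
end

* Row total across all variables matching inc*
egen income = rowtotal(inc*)

* The same sum written out by hand
generate income_plus = incJan + incFeb + incMar

list
```

In the second row, income is 20 while income_plus is missing.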

Looking under the skirts - just for inspiration
viewsource _growmin.ado (the rowmin() function of egen)

    program define _growmin
        version 6, missing
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0
        syntax varlist [if] [in] [, BY(string)]
        if `"`by'"' != "" {
            _egennoby rowmin() `"`by'"'
        }
        tempvar touse
        mark `touse' `if' `in'
        quietly {
            gen `type' `g' = .                          // 1.
            tokenize `varlist'                          // 2.
            while "`1'"!="" {                           // 3.
                replace `g' = cond(`1' < `g',`1',`g')   // 4.
                mac shift
            }
        }
    end

1. Initialize target variable
2. Prepare the variable-list
3. Looping
4. In-the-loop-commands

2. Prepare the variable-list

Full specification of each and every variable: OK with 12, but what in case of hundreds? The list is stored in `vars'.

    local vars incJan incFeb incMar incApr incMay incJun ///
        incJul incAug incSep incOct incNov incDec

Variables can be specified with wildcards. The expanded list is stored in `vars' (unab stands for unabbreviate; the command name itself, however, cannot be written out in full).

    unab vars: inc*
    unab vars: incJan-incDec

Variables can be specified with wildcards. The list is stored in `r(varlist)'. Nice feature: the expanded list is shown for inspection.

    ds inc*
    ds incJan-incDec

    Output: incJan incFeb incMar incApr incMay incJun incJul incAug incSep incOct incNov incDec
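The three ways of preparing the list can be compared side by side on a throwaway dataset. This is a sketch only: the three inc variables and the extra variable other are invented so that the wildcard has something to exclude.

```stata
clear
set obs 1
* Hypothetical variables: three that match inc* and one that does not
foreach m in Jan Feb Mar {
    generate inc`m' = 1
}
generate other = 0

* 1) Full specification, typed by hand
local vars incJan incFeb incMar
display "`vars'"

* 2) unab expands the wildcard into the local macro vars
unab vars: inc*
display "`vars'"

* 3) ds lists the matches and leaves them in r(varlist)
ds inc*
display "`r(varlist)'"
```

All three display the same list, in dataset order, and other is left out.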

3. Looping

"foreach" is the quickest and the most transparent loop command.

    foreach lvar in incJan incFeb {
        // do stuff with "`lvar'"
    }

    unab lvar: inc*
    foreach lvar in `lvar' {
        // do stuff with "`lvar'"
    }

    ds inc*
    foreach lvar in `r(varlist)' {
        // do stuff with "`lvar'"
    }

Typing the macro quotes:
` = left single-quote: hold Alt + press … on the numeric keypad
' = right single-quote: hold Alt + press … on the numeric keypad

4. In the loop

With the cond() function:

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = cond(`lvar' < minimum,`lvar',minimum)
    }

With the if qualifier:

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = `lvar' if `lvar'<minimum
    }

! With the if command:

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        if `lvar'<minimum {
            replace minimum = `lvar'
        }
    }

Caution: in the last version, if is a programming command, not a qualifier. It evaluates `lvar'<minimum once, for the first observation only, so the replace is applied either to all observations or to none.
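The difference between the if qualifier and the if command in the loop variants above can be seen on a few made-up rows. This is a sketch with invented data; the variable names check and wrong are hypothetical, and egen's rowmin() is used only as a reference result.

```stata
clear
* Hypothetical toy data, including one missing value
input incJan incFeb incMar
100 200  50
300 150 400
  .  80  90
end

unab vars: inc*

* if qualifier: the comparison is made row by row
generate minimum = .
foreach lvar in `vars' {
    replace minimum = `lvar' if `lvar' < minimum
}

* Reference result from egen
egen check = rowmin(inc*)

* if command: the comparison uses the first observation only
generate wrong = .
foreach lvar in `vars' {
    if `lvar' < wrong {
        replace wrong = `lvar'
    }
}

list
```

Here minimum agrees with check in every row (50, 150, 80), while wrong ends up as a plain copy of incMar (50, 400, 90), because incMar happened to be the smallest value in the first row.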

Some of the Danish participants who know "the DREAM database" will probably be able to see how these approaches can be useful when working with this fantastic but difficult construction.

Thank you very much