Stata Workshop #1
Chiu-Hsieh (Paul) Hsu
Associate Professor
College of Public Health


Outline
  Do files
  Data entry
  Data management
  Data description
  Estimation: Confidence Interval
  Hypothesis testing

Do files
  Stata programs
    – Easy to add or skip comments
    – One click/command can run the whole program
  Reproducible
    – Don't need to retype all of the commands
  Interactive work vs. do files
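A minimal sketch of what a do-file can look like. The file name example.do and the use of Stata's built-in auto dataset are assumptions for illustration; they are not part of the workshop materials.

```stata
* example.do: a minimal do-file sketch (hypothetical file name)
clear all
sysuse auto, clear      // built-in example dataset shipped with Stata

* A line that starts with * is a comment
/* A block comment can also be used to skip commands temporarily:
summarize price
*/
summarize mpg           // only this summarize runs
```

Running `do example.do` from the Command window (or the Do button in the Do-file Editor) executes the whole file, so the same results can be reproduced without retyping anything.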

Data Entry

Stata Commands
  1. cd: Change directory
  2. dir or ls: Show files in current directory
  3. insheet: Read ASCII (text) data created by a spreadsheet
  4. infile: Read unformatted ASCII (text) data
  5. infix: Read ASCII (text) data in fixed format
  6. input: Enter data from keyboard
  7. save: Store the dataset currently in memory on disk in Stata data format
  8. use: Load a Stata-format dataset
  9. count: Show the number of observations
  10. list: List values of variables
  11. clear: Clear the entire dataset and everything else
  12. memory: Display a report on memory usage
  13. set memory: Set the size of memory

Ways to enter data
  Change the directory to the folder you like
      cd c:\Stata
  Comma-separated values (.csv) format files
      insheet using test.csv, clear (with variable names)
      infile gender id race ses schtyp str10 prgtype read write math science socst using hs0.raw, clear (without variable names)
  Stata (.dta) files
      use test
  Type in data one by one
      input id female race ses str3 schtype prog read write math science socst
      end (when you are done; must be lowercase)
  What's in the dataset?
      describe
      list
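The keyboard-entry route can be sketched as follows. The three records and the file name mytest are made up for illustration, and only a subset of the slide's variables is used.

```stata
clear
* str3 declares schtype as a string variable of up to 3 characters
input id female str3 schtype read
1 0 "pub" 57
2 1 "pri" 68
3 1 "pub" 44
end

describe                 // variable names, storage types, labels
list                     // print every observation
save mytest, replace     // hypothetical file name; writes mytest.dta
use mytest, clear        // load it back
count                    // number of observations
```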

Data Management

Stata Commands
  1. pwd: show current directory (pwd = print working directory)
  2. keep if: keep observations if condition is met
  3. keep: keep variables or observations
  4. drop: drop variables or observations
  5. append: append a data file to current file
  6. sort: sort observations
  7. merge: merge a data file with current file
  8. codebook: show codebook information for file
  9. label data: apply a label to a data set
  10. order: order the variables in a data set
  11. label variable: apply a label to a variable
  12. label define: define a set of labels for the levels of a categorical variable
  13. label values: apply value labels to a variable
  14. encode: create numeric version of a string variable
  15. rename: rename a variable
  16. recode: recode the values of a variable
  17. notes: apply notes to the data file
  18. generate: creates a new variable
  19. replace: replaces the values of an existing variable
  20. egen: extended generate; has special functions that can be used when creating a new variable

Merging two datasets
  test1 and test2 have the same variables but different subjects
      use test1
      append using test2
      save test12
  test3 and test4 have the same subjects and only share a link variable, e.g. ID
      use test3, clear
      sort id
      save test3, replace
      use test4, clear
      sort id
      save test4, replace
      use test3
      merge id using test4
      save test34
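The `merge id using` form on the slide is the older syntax. In Stata 11 and later the match type is written explicitly, as in the sketch below. The two tiny datasets and the temporary file name scores are invented for illustration.

```stata
* First dataset: test scores, saved to a temporary file
clear
input id read
1 57
2 68
3 44
end
tempfile scores
save `scores'

* Second dataset: same subjects, different variable
clear
input id female
1 0
2 1
3 1
end

* One-to-one match on the link variable id; no prior sort is required
merge 1:1 id using `scores'
tabulate _merge          // 3 = matched in both files
drop _merge
list
```

Checking `_merge` before dropping it is a cheap way to catch subjects that appear in only one of the two files.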

Play with Variables
      use test
      label variable gender "Male"
      rename gender male
      gen female=1-male
      order id male female
      encode prgtype, gen(prog)
      codebook prog
      keep if female==1 (drops the male observations)
      drop female
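The same sequence can be tried on Stata's built-in auto dataset, assumed here as a stand-in for the workshop's test file: foreign plays the role of gender and make the role of the string variable prgtype.

```stata
sysuse auto, clear
label variable foreign "Foreign car (1 = yes)"
rename foreign imported
generate domestic = 1 - imported       // flip the 0/1 coding
order make imported domestic           // move these variables to the front
encode make, generate(make_id)         // numeric, value-labelled copy of a string
codebook make_id
keep if domestic == 1                  // drops the imported cars
drop domestic
count
```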

Dummy Variables
  A categorical variable with K possible levels
  Need K-1 dummy variables (one level as the reference)
  Dummy variables are convenient for regression analysis
  How to create dummy variables?
  Use generate command
    – gen female=1-gender
  Use tabulate command
    – tabulate gender, gen(male)
  Use xi with i.varname notation
    – xi i.gender
    – list, clean
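A short sketch of the K-1 rule, assuming the built-in auto dataset, where rep78 is a categorical variable with K = 5 levels. The regression is only there to show how the dummies are used.

```stata
sysuse auto, clear
tabulate rep78, generate(rep)     // creates rep1 ... rep5, one dummy per level

* K = 5 levels, so K-1 = 4 dummies enter the model; rep1 is the reference
regress price rep2 rep3 rep4 rep5

* Stata 11 and later: factor-variable notation builds the dummies on the fly
regress price i.rep78
```

Both regressions give the same fit; the factor-variable form avoids cluttering the dataset with extra columns.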

Data Description

Stata Commands
  1. describe: describe a dataset
  2. log: create a log file
  3. summarize: descriptive statistics
  4. tabstat: table of descriptive statistics
  5. table: create a table of statistics
  6. stem: stem-and-leaf plot
  7. graph: high resolution graphs
  8. kdensity: kernel density plot
  9. histogram: histogram for continuous and categorical variables
  10. tabulate: one- and two-way frequency tables
  11. correlate: correlations
  12. pwcorr: pairwise correlations

Example: raw data
      log using test.txt, text replace
      use lead
      describe
      sum maxfwt, detail
      histogram maxfwt, by(Group) normal
      graph box maxfwt, by(Group)
      stem maxfwt
      kdensity maxfwt
      tab Group sex
      cor ageyrs maxfwt
      pwcorr ageyrs maxfwt, sig
      pwcorr ageyrs maxfwt if sex==1, sig (male only)
      pwcorr ageyrs maxfwt fwt_r, sig
      log close
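Since the lead dataset is not reproduced here, the same description workflow can be tried on the built-in auto dataset. This is a sketch under that assumption: mpg stands in for maxfwt, foreign for Group, and the log file name is hypothetical.

```stata
log using description.txt, text replace   // hypothetical log file name
sysuse auto, clear
describe
summarize mpg, detail                 // percentiles, skewness, kurtosis
histogram mpg, by(foreign) normal     // one panel per group, normal overlay
graph box mpg, by(foreign)
stem mpg
kdensity mpg
tabulate foreign rep78                // two-way frequency table
correlate mpg weight                  // correlation matrix only
pwcorr mpg weight price, sig          // pairwise correlations with p-values
log close
```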

Example: grouped data
      use group (a grouped dataset)
      sum age [fweight=freq], detail
      hist age [fweight=freq]
  Pretty much the same as raw data. Just need to specify the weight.
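A small made-up grouped dataset shows what the frequency weight does: each row counts freq times, so the three rows below behave like 10 observations.

```stata
clear
* Invented data: each row is an age value and how many subjects have it
input age freq
20 2
30 5
40 3
end

summarize age [fweight=freq], detail   // statistics computed over 2+5+3 subjects
histogram age [fweight=freq]
```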

Some Review
  Use both location and spread measures to summarize a dataset
  Mean, standard deviation and range are easily affected by extreme observations
  Median and inter-quartile range are less affected by extreme observations
  Coefficient of variation (standard deviation divided by mean) removes the scale effect
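These measures can all be read from the results that summarize leaves behind. A sketch, assuming the built-in auto dataset and its price variable:

```stata
sysuse auto, clear
summarize price, detail
* Measures that are sensitive to extreme observations
display "mean = " r(mean) "   sd = " r(sd) "   range = " r(max) - r(min)
* Measures that are less sensitive to extreme observations
display "median = " r(p50) "   IQR = " r(p75) - r(p25)
* Scale-free spread: standard deviation divided by mean
display "CV = " r(sd) / r(mean)
```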

Estimation

Estimation of Parameters
  Binomial distribution
    – Parameters n (usually known) and p
    – How to estimate p?
  Poisson distribution
    – Parameter λ
    – How to estimate λ?
  Normal distribution
    – Parameters µ and σ²
    – How to estimate µ and σ²?
    – σ² unknown → t distribution
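The slide leaves the questions open. For reference, the usual textbook estimators are sketched below; they are not taken from the slides. Here x is the number of successes in n trials for the binomial case, and x̄ and s² are the sample mean and sample variance of n independent observations.

```latex
\begin{align*}
\text{Binomial:}\quad & \hat{p} = \frac{x}{n} \\
\text{Poisson:}\quad & \hat{\lambda} = \bar{x} \\
\text{Normal:}\quad & \hat{\mu} = \bar{x}, \qquad
  \hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \\
\text{CI for } \mu \text{ with } \sigma^2 \text{ unknown:}\quad &
  \bar{x} \pm t_{n-1,\,1-\alpha/2}\,\frac{s}{\sqrt{n}}
\end{align*}
```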

Stata Commands
  Raw data
    – ci [varlist] [if] [in] [weight] [, options]
      confidence intervals for a mean (default), a proportion (option b, binomial) and a count (option p, poisson)
  Summary statistics
    – cii #obs #mean #sd [, ciin_option] (Normal)
    – cii #obs #succ [, ciib_options] (Binomial)

Examples
      gen female=sex-1
      tab female Group
  What's the average maxfwt for females in the exposed group?
    – ci maxfwt if female==1 & Group==2 (raw data)
    – sum maxfwt if female==1 & Group==2
    – cii ,level(95) (summary statistics)
  What proportion of females are in the exposed group?
    – gen expose=Group-1
    – ci expose if female==1, b
    – cii 48 16, level(95)
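A sketch of the raw-data and summary-statistics routes, assuming the built-in auto dataset (mpg in place of maxfwt, foreign in place of the 0/1 indicator). The `version 11` line is also an assumption: it asks newer releases to read ci and cii with the older syntax used on the slides, since Stata 14 and later document `ci means`, `ci proportions`, `cii means` and `cii proportions` instead.

```stata
version 11                       // assumption: interpret ci/cii with the slide-era syntax
sysuse auto, clear

* Mean: raw data, then the same interval from summary statistics
ci mpg if foreign == 1
summarize mpg if foreign == 1
cii `r(N)' `r(mean)' `r(sd)', level(95)

* Proportion: raw 0/1 variable, then #obs and #successes
ci foreign, binomial
cii 74 22, level(95)             // 22 foreign cars among 74
```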

Hypothesis Testing

Stata Commands (mean)
  ttest
    – Raw data
        ttest varname == # [if] [in] [, level(#)] (one sample)
        ttest varname1 == varname2 [if] [in], unpaired [unequal welch level(#)] (two samples stored as two variables)
        ttest varname1 == varname2 [if] [in] [, level(#)] (paired)
        ttest varname [if] [in], by(groupvar) [options1] (two samples defined by a group variable)
    – Summary statistics
        ttesti #obs #mean #sd #val [, level(#)]
        ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2 [, options2]

Examples
  One sample
    – Is the average maxfwt for females in the exposed group significantly lower than 45?
        ttest maxfwt==45 if female==1 & Group==2
        ttesti (summary statistics)
  Two samples
    – Do females have a higher average maxfwt than males in the exposed group?
        ttest maxfwt if Group==2, by(female)
        sum maxfwt if female==0 & Group==2
        ttesti
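The same tests can be tried on the built-in auto dataset, assumed here as a stand-in for the lead data. The hypothesized value 20 is arbitrary.

```stata
sysuse auto, clear

* One sample: H0 is mean mpg = 20
ttest mpg == 20

* Same test from summary statistics: #obs #mean #sd #val
summarize mpg
ttesti `r(N)' `r(mean)' `r(sd)' 20

* Two samples defined by a group variable
ttest mpg, by(foreign)             // assumes equal variances
ttest mpg, by(foreign) unequal     // allows unequal variances
```

The ttest output reports three p-values, one each for the alternatives "less than", "not equal to" and "greater than", so a one-sided question such as "significantly lower than 45" is answered by reading the matching column.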

Stata Commands (variance)
  sdtest
    – Raw data
        sdtest varname == # [if] [in] [, level(#)]
        sdtest varname1 == varname2 [if] [in] [, level(#)]
        sdtest varname [if] [in], by(groupvar) [level(#)]
    – Summary statistics
        sdtesti #obs {#mean |. } #sd #val [, level(#)]
        sdtesti #obs1 {#mean1 |. } #sd1 #obs2 {#mean2 |. } #sd2 [, level(#)]

Examples
  One sample
    – Is the variance of maxfwt for females in the exposed group significantly greater than 100?
        sdtest maxfwt==10 if female==1 & Group==2 (sdtest takes the standard deviation: variance 100 means sd 10)
        sdtesti (summary statistics)
  Two samples
    – Do females have a greater variation in maxfwt than males in the exposed group?
        sdtest maxfwt if Group==2, by(female)
        sum maxfwt if female==0 & Group==2
        sdtesti
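A sketch on the built-in auto dataset (an assumption, standing in for the lead data). The hypothesized standard deviation of 5 and the summary figures passed to sdtesti are illustrative values only.

```stata
sysuse auto, clear

* One sample: H0 is sd = 5, which is the same as variance = 25
sdtest mpg == 5

* Two samples: compare the variance of mpg between the two groups
sdtest mpg, by(foreign)

* Immediate form: #obs, mean (or . if unknown), #sd, hypothesized sd
sdtesti 74 . 5.8 5
```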

Stata Commands (proportion)
  prtest
    – Raw data
        prtest varname == #p [if] [in] [, level(#)]
        prtest varname1 == varname2 [if] [in] [, level(#)]
        prtest varname [if] [in], by(groupvar) [level(#)]
    – Summary statistics
        prtesti #obs1 #p1 #p2 [, level(#) count]
        prtesti #obs1 #p1 #obs2 #p2 [, level(#) count]

Examples
  One sample
    – Are more than 50% of females in the exposed group?
        prtest expose==0.5 if female==1
        prtesti
  Two samples
    – Are there more females in the exposed group than in the control group?
        prtest female, by(expose)
        tab expose female, r
        prtesti
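A sketch on the built-in auto dataset (an assumption). The indicator highmpg is created only for this illustration, and the cut-off of 25 is arbitrary.

```stata
sysuse auto, clear

* One sample: H0 is that the proportion of foreign cars is 0.5
prtest foreign == 0.5

* Same test from counts: 22 foreign cars among 74
prtesti 74 22 0.5, count

* Two samples: compare the proportion of high-mpg cars between groups
generate byte highmpg = mpg > 25       // hypothetical 0/1 indicator
prtest highmpg, by(foreign)
tabulate foreign highmpg, row          // row percentages behind the test
```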

Power and Sample Size

Stata Command (sample size)
  (p = power, a = alpha, sd = standard deviation; a dot marks a value to fill in)
  One sample
    – Continuous
        sampsi μ0 μ1, sd(.) p(.) a(.) onesam
        sampsi , sd(420) p(.9) onesam
    – Binary proportions
        sampsi p0 p1, p(.) onesam
        sampsi , p(0.9) onesam
  Two samples
    – Continuous
        sampsi μ1 μ2, p(.) sd1(.) sd2(.) a(.)
        sampsi , p(0.8) sd1(15.34) sd2(18.23)
    – Binary proportions
        sampsi p1 p2, p(.)
        sampsi , p(0.9)
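A sketch of the four sample-size cases with the options spelled out in full. All means, proportions and standard deviations below are hypothetical design values chosen for illustration; they are not the workshop's numbers.

```stata
* One sample, continuous: detect a shift from 100 to 110 with sd 20
sampsi 100 110, sd1(20) power(0.9) alpha(0.05) onesample

* One sample, binary: detect a change from 0.5 to 0.6
sampsi 0.5 0.6, power(0.9) onesample

* Two samples, continuous
sampsi 100 110, sd1(20) sd2(20) power(0.8)

* Two samples, binary
sampsi 0.25 0.40, power(0.9)
```

When power() is given and the sample sizes are left out, sampsi solves for the sample size. Stata 13 and later also provide the power command for the same calculations.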

Stata Command (power)
  One sample
    – Continuous
        sampsi μ0 μ1, sd(.) n(.) a(.) onesam
        sampsi , sd(10.3) n(5) onesam onesided
    – Binomial proportion
        sampsi p0 p1, n1(.) onesam
        sampsi , n1(100) onesam
  Two samples
    – Continuous
        sampsi μ1 μ2, n1(.) n2(.) sd1(.) sd2(.) a(.)
        sampsi 9 14, n1(100) n2(100) sd1(15.34) sd2(18.23)
    – Binomial proportions
        sampsi p1 p2, n1(.) n2(.)
        sampsi , n1(100) n2(150)
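A sketch of the power calculations. Supplying the sample size instead of power() makes sampsi report the power. Only the two-sample continuous line uses the slide's numbers; the other values are hypothetical.

```stata
* One sample, continuous (hypothetical values)
sampsi 100 110, sd1(20) n1(30) onesample

* One sample, binary (hypothetical values)
sampsi 0.5 0.6, n1(100) onesample

* Two samples, continuous (values from the slide)
sampsi 9 14, n1(100) n2(100) sd1(15.34) sd2(18.23)

* Two samples, binary (hypothetical proportions)
sampsi 0.25 0.40, n1(100) n2(150)
```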

Useful links
  Once the D2L site is created, all of the handouts and related materials will be posted on the D2L site.