Session I How to use STATA & Basic Data Management Commands.

What will be covered?
- Introduction to STATA software
- General guidelines for data entry
- Data management in STATA

Introduction to STATA

Open & Close the Output File
- To open the log file: log using "directory\path\filename.log"
  e.g. log using d:\trials\zinc.log
- To close the log file: log close

To Open Log (Output) File

To Close the Log File

Append & Replace the Existing Log File
- To append to the existing log file: log using d:\trials\zinc.log, append
- To replace the existing log file: log using d:\trials\zinc.log, replace

Open the Data File
- To open the data file: use "directory\path\filename.dta"
  e.g. use d:\trials\zinc.dta
- To save: save zinc.dta (add , replace to overwrite an existing file)

To Make A New Directory

To Change the Directory
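
A minimal sketch of the file commands in this session, assuming a throwaway folder: mkdir and cd cover the two directory slides, then a log is opened, a tiny dataset is saved and reused, and the log is closed. The folder zinc_demo, the files demo.log and demo.dta, and the toy values are hypothetical; Stata's temporary directory (c(tmpdir)) stands in for d:\trials, and the log is given a name (name(demo), available in Stata 12 and later) so it cannot clash with another open log.

```stata
* Hypothetical demo folder inside Stata's temporary directory
local demo "`c(tmpdir)'/zinc_demo"
capture mkdir "`demo'"                 // make a new directory (no error if it already exists)
cd "`demo'"                            // change the working directory
log using demo.log, replace text name(demo)
clear
input pcode age
1 5
2 14
3 30
end
save demo.dta, replace                 // replace allows an existing file to be overwritten
clear
use demo.dta                           // reuse the saved data
log close demo                         // close the named log
```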

General Guidelines in Data Entry
- Each row in the datasheet should contain one individual's information – a record.
- Each column should contain the values of a single attribute for all individuals – a variable.
- Keep variable names to eight characters or fewer (a limit in older software; current Stata accepts names of up to 32 characters).
- Variables can be numeric or string (alphanumeric).
- A numeric variable must contain only numbers.
- Every datasheet must include an identification number.

DATA DESCRIPTION

Data Management using STATA

- Inputting data
- Editing data
- Creating and changing variables
- Saving and reusing data
- Data reorganization
- Merging and appending datasets

Inputting Data
- Enter data from the keyboard
  - input varlist
  - input str25 name age str1 sex
- Copy the data from Excel and paste it directly into the STATA Data Editor (the easiest way)
- Transfer from other programs
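
As a sketch of keyboard entry, the input command below types in two hypothetical records (names and values are made up); input keeps reading rows until the line end. Quotes let a string value contain spaces. For spreadsheets saved as files, import excel is one alternative to copy-and-paste.

```stata
clear
* str25 and str1 declare string variables of width 25 and 1
input str25 name age str1 sex
"Child A" 18 "M"
"Child B" 24 "F"
end
list
```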

Arithmetic Operators
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- ^ (raise to a power)

Relational Operators
- > (greater than)
- < (less than)
- >= (greater than or equal)
- <= (less than or equal)
- == (equal)
- != (not equal)

Logical Operators
- & (and)
- | (or)
- ! (not)

Expressions
- if – restricts a command to the observations for which an expression is true, e.g. if centre==3 & age>25
- in – restricts a command to a range of observations, e.g. in 1/10
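
A small illustration of how the operators and the if/in qualifiers combine, on hypothetical data: the variable names follow the slides, but the values (ages in months) are made up and do not come from zinc.dta.

```stata
clear
input pcode centre age
1 3 20
2 3 30
3 4 28
4 3 40
5 1 35
end
* Arithmetic operator: age in years from age in months
generate age_yr = age / 12
* Relational (==, >) and logical (&) operators in an if qualifier
count if centre == 3 & age > 25
scalar n_c3 = r(N)
* ! negates a condition; in 1/4 restricts to the first four observations
list pcode age if !(centre == 3) in 1/4
```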

Editing Data
- Edit using the Data Editor
  - edit [varlist] [if] [in]
  - edit treatment centre age
  - edit treatment age if centre==3 & age>25

Browsing Data
- View (without changing) using the Data Editor
  - browse [varlist] [if] [in]
  - browse treatment centre age
  - browse treatment age if centre==3 & age>25

Do this Exercise…  Edit the following: –pcode, treatment and cough only for centre 4 –browse for the same and feel the difference zinc.dta

Creating & Changing Variables
- Create a new variable
  - generate newvar = exp [if] [in]
  - gen totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
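
One point the generate slide leaves implicit: with +, a single missing component makes the whole total missing. The sketch below (hypothetical values) contrasts generate with egen's rowtotal(), which treats missing as 0; which behaviour is right depends on how incomplete stool collections should be handled.

```stata
clear
input pcode s1_tstool_wt s2_tstool_wt s3_tstool_wt
1 120 80 60
2 150 .  90
end
* + propagates missing: pcode 2 gets a missing total
generate totstl24 = s1_tstool_wt + s2_tstool_wt + s3_tstool_wt
* rowtotal() adds the nonmissing components only
egen totstl24_rt = rowtotal(s1_tstool_wt s2_tstool_wt s3_tstool_wt)
list
```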

Do this Exercise… Generate total stool output from 0-48 hours (Dataset: zinc.dta)

Creating & Changing Variables …contd
- Change the contents of an existing variable
  - To replace
    replace oldvar = exp [if] [in]
    replace sodium1 = . if sodium1==0
  - To recode
    recode varlist (rule) [(rule) ...] [if] [in]
    recode age (min/6=1) (7/11=2) (12/max=3), gen(agecat)

Rule             Example          Meaning
# = #            3 = 1            3 recoded to 1
# # = #          2 4 = 5          2 and 4 recoded to 5
#/# = #          4/8 = 3          4 through 8 recoded to 3
nonmissing = #   nonmissing = 2   all other nonmissing to 2
missing = #      missing = 9      all other missing to 9
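
A runnable sketch of replace and recode on four hypothetical records. The rules are written in parentheses, the documented form when more than one rule is given.

```stata
clear
input pcode sodium1 age
1 135 4
2 0   9
3 128 20
4 140 6
end
* A sodium of 0 is treated as not measured
replace sodium1 = . if sodium1 == 0
* Age categories: <=6, 7-11, >=12 (new variable, original age kept)
recode age (min/6 = 1) (7/11 = 2) (12/max = 3), generate(agecat)
tabulate age agecat
```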

Do this Exercise… (Dataset: zinc.dta)
Ex 1: Replace all zeros in serum potassium with missing.
Ex 2: Recode pre-admission diarrhea duration into 0-24h, 25-72h and > 72h.

Creating & Changing Variables …contd (Dataset: zinc.dta)
- Rename an existing variable
  - rename oldvarname newvarname
  - ren tlc_t2 tlc2
  - ren tlc_t3 tlc3
- Eliminate existing variables
  - To drop: drop varlist
    drop name address
  - To keep: keep varlist
    keep idno age sodium albumin-tlc
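
A sketch of rename, drop and keep on one hypothetical record. potassium is an invented extra variable placed between sodium and albumin, so that the range albumin-tlc3 in keep (every variable from albumin through tlc3 in dataset order) visibly leaves it out.

```stata
clear
input idno str8 name str8 address age sodium potassium albumin tlc_t2 tlc_t3
1 "A" "X" 10 135 4.2 3.5 9000 8500
end
rename tlc_t2 tlc2
rename tlc_t3 tlc3
drop name address
* potassium is not in the kept list, so it is dropped
keep idno age sodium albumin-tlc3
describe
```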

Saving & Reusing Data in Stata Format (Dataset: zinc.dta)
- To save data
  - save filename.dta
  - save zinc, replace
  - clear
- To reuse data
  - use filename
  - use zinc

Data Reorganization (Dataset: zinc.dta)
- Sorting observations and changing variable order
  - To sort (in ascending order): sort varlist [in]
    sort pcode
  - Move specified variables to the front of the dataset: order varlist
  - Move one variable to a specified position: move varname1 varname2
    (in Stata 11 and later, move is superseded by order varname1, before(varname2))
  - Alphabetize specified variables and move them to the front of the dataset: aorder [varlist]
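
A sketch of the reordering commands on hypothetical data. gsort is not on the slide; it is the Stata command for descending sorts, which sort cannot do. order ..., before() is the Stata 11+ replacement for move.

```stata
clear
input age pcode sodium
20 3 135
5  1 140
14 2 128
end
sort pcode                       // ascending order of pcode
order pcode                      // move pcode to the front
order sodium, before(age)        // same effect as: move sodium age
gsort -age                       // descending order of age
list
```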

Data Reorganization …contd
- Convert data from wide to long
  - reshape long stubnames, i(varlist) j(varname)
  - reshape long albumin, i(pcode) j(time)
[Figure: wide-shape data and the corresponding long-shape data]

Data Reorganization …contd
- Convert data from long to wide
  - reshape wide stubnames, i(varlist) j(varname)
  - reshape wide albumin, i(pcode) j(time)
[Figure: long-shape data and the corresponding wide-shape data]
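
A round trip between the two shapes, assuming a hypothetical wide file with albumin measured at times 1 to 3 (albumin1-albumin3). The stub albumin and the j() variable time follow the slides; the values are made up.

```stata
clear
input pcode albumin1 albumin2 albumin3
101 3.1 3.4 3.6
102 2.8 3.0 3.3
end
* Wide to long: one row per pcode and time
reshape long albumin, i(pcode) j(time)
scalar n_long = _N
list
* Long back to wide: one row per pcode
reshape wide albumin, i(pcode) j(time)
```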

Do this Exercise… Convert serum zinc from wide to long shape using zinclab.dta

Answer!!! zinclab.dta

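Merging & Appending Datasets (Dataset: zinclab.dta)
- To append datasets: append using filename
    use zinc1.dta
    append using zinc2.dta
- To merge datasets: merge [varlist] using filename
    use zinclab
    sort pcode
    save zinclab, replace
    use zincprognostic
    sort pcode
    merge pcode using zinclab
  This is the pre-Stata 11 merge syntax, which requires both files to be sorted by the key variable. In Stata 11 and later the equivalent is merge 1:1 pcode using zinclab, and no prior sorting is needed.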
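
A self-contained sketch of append and merge in the Stata 11+ syntax. Two tempfiles stand in for zinc1.dta/zinc2.dta and zinclab.dta, and all values are hypothetical. The _merge variable created by merge records whether each observation came from the master data (1), the using data (2), or both (3).

```stata
tempfile part1 lab
* First hypothetical piece of the study data
clear
input pcode age
1 10
2 15
end
save `part1'
* Second piece, then append the first
clear
input pcode age
3 20
end
append using `part1'
sort pcode
* Hypothetical lab file, one record per pcode
preserve
clear
input pcode zinc0
1 10.2
2 8.7
3 11.5
end
save `lab'
restore
merge 1:1 pcode using `lab'
tabulate _merge
```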

Do this Exercise… Merge file 1 (zinclab.dta) with file 2 (zincprognosis.dta)

Session II Data Cleaning & Preparing Data for Analysis

Preparing Data for Analysis
Inclusion criterion: children aged ≤ 35 months

Preparing Data for Analysis …contd

Do this Exercise… (Dataset: zinc.dta)
The inclusion criterion for the study was pre-admission diarrhea duration < 7 days.
Ex 1: Convert pre-admission diarrhea duration from hours to days using zincclean.dta.
Ex 2: Find values beyond the expected range.
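
A sketch of Ex 1 and Ex 2 on hypothetical values; dur_h is a placeholder name, not necessarily the variable name used in the study files. Stata treats missing values as larger than any number, so a range check such as dur_d >= 7 would also flag missing durations unless they are excluded explicitly.

```stata
clear
input pcode dur_h
1 30
2 200
3 .
4 96
end
* Hours to days
generate dur_d = dur_h / 24
* Out-of-range values (>= 7 days), excluding missing ones
list pcode dur_h dur_d if dur_d >= 7 & !missing(dur_d)
count if dur_d >= 7 & !missing(dur_d)
scalar n_out = r(N)
```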

Answer!!!

Preparing Data for Analysis …contd

Do this Exercise… Do a similar exercise for hemoglobin using zinc.dta

Answer!!!

Preparing Data for Analysis …contd What do you mean by 1 & 2??? zinc.dta

Preparing Data for Analysis …contd Label name
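
To make codes such as 1 and 2 self-explanatory, Stata attaches a variable label and a value label. A sketch, assuming (hypothetically) that 1 means treatment A and 2 means treatment B; the label name trtlbl is also made up.

```stata
clear
input pcode treatment
1 1
2 2
3 1
end
label variable treatment "Treatment group"
* Define the code-to-text mapping, then attach it to the variable
label define trtlbl 1 "Treatment A" 2 "Treatment B"
label values treatment trtlbl
tabulate treatment
```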

Preparing Data for Analysis …contd What is wrong and how to correct it??? zinc.dta

Preparing Data for Analysis …contd

Do this Exercise… Generate total stool output for the first 48 hrs (Dataset: zinclean.dta)

Preparing Data for Analysis …contd

Do this Exercise… Draw a boxplot and identify extreme values, if any, for s2_tstool_wt using zincclean.dta
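
A sketch for the boxplot exercise, using hypothetical weights. graph box draws the plot; the lines after it list values outside the usual box-plot fences (more than 1.5 × IQR beyond the quartiles), computed from the stored results of summarize, detail.

```stata
clear
input s2_tstool_wt
100
120
130
140
150
160
900
end
graph box s2_tstool_wt
summarize s2_tstool_wt, detail
scalar q_iqr    = r(p75) - r(p25)
scalar fence_hi = r(p75) + 1.5*scalar(q_iqr)
scalar fence_lo = r(p25) - 1.5*scalar(q_iqr)
* Values beyond the fences, ignoring missing
list s2_tstool_wt if (s2_tstool_wt > scalar(fence_hi) | s2_tstool_wt < scalar(fence_lo)) & !missing(s2_tstool_wt)
count if (s2_tstool_wt > scalar(fence_hi) | s2_tstool_wt < scalar(fence_lo)) & !missing(s2_tstool_wt)
scalar n_ext = r(N)
```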

Session III Introduction to Basic Data Analysis

What will be Covered?
- Descriptive statistics
- Parametric tests
- Non-parametric tests

Analyses
- Univariate (one variable at a time)
- Bivariate (two variables at a time)
- Multivariate (more than two variables at a time)

Descriptive Statistics

Univariate Analysis
- Quantitative: mean, median, range/IQR, SD
- Categorical: frequency, percentage

Descriptive Statistics-Categorical Variable Can we label the variables???

Contingency Table

Contingency Table …contd

Immediate commands
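
Immediate commands take numbers typed on the command line instead of variables in memory; their names end in i. A sketch with tabi and hypothetical cell counts (rows separated by a backslash); the data in memory are left untouched.

```stata
* Rows: treatment A, treatment B; columns: outcome yes, no (hypothetical counts)
tabi 20 228 \ 12 240, chi2
```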

Do this Exercise… (Dataset: zinc.dta)
Ex 1: Draw a crosstab between treatment and withdrawn.
Ex 2: Draw crosstabs between treatment and diarr24, and between treatment and diarr48.

Descriptive Statistics-Quantitative Variable

Summary in Detail

Do this Exercise… Calculate summary statistics for the following variables (Dataset: zinc.dta):
1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in the 24h before admission
4. Serum zinc at admission

Summary Statistics by Group
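
A sketch of the detailed and by-group summaries on hypothetical data: summarize, detail gives the detailed summary, prefixing it with bysort treatment: repeats it for each group, and tabstat gives a compact table per group. The variable name stool is a placeholder.

```stata
clear
input treatment stool
1 200
1 350
1 410
2 180
2 260
2 300
end
summarize stool, detail
bysort treatment: summarize stool, detail
tabstat stool, by(treatment) statistics(n mean sd p50 iqr)
* Keep one group's mean for later use
summarize stool if treatment == 1
scalar mean_A = r(mean)
```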

Do this Exercise… Calculate summary statistics by treatment for the following variables (Dataset: zinc.dta):
1. Total stool output 0-48h
2. Total ORS intake 0-24h
3. Total stool frequency in the 24h before admission
4. Serum zinc at admission

Percentile Values
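
A sketch for percentile values on generated (hypothetical) data: centile reports the requested percentiles with confidence intervals, and _pctile stores them in r(r1), r(r2), ... for later use.

```stata
clear
set obs 100
* Two hypothetical groups of 50, with stool values 10, 20, ..., 1000
generate treatment = 1 + (_n > 50)
generate stool = _n * 10
centile stool, centile(3 97)
bysort treatment: centile stool, centile(3 97)
_pctile stool if treatment == 1, percentiles(3 97)
scalar p3_A  = r(r1)
scalar p97_A = r(r2)
```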

Do this Exercise… Calculate the 3rd and 97th percentile values by treatment for the following variables (Dataset: zinc.dta):
1. Total stool output 0-48h
2. Total ORS intake 0-24h

Session IV (A) Bi-variate Analyses

Analysis of Clinical Trial Data

1. Compare patient characteristics at the time of randomization and baseline measurements between the groups.
2. Assess the difference in outcome variable(s) between the groups (adjusting for any imbalance in patient characteristics or baseline outcome variables).

Bi-variate Analyses
1. Categorical vs Categorical
2. Categorical vs Quantitative

1. Categorical vs Categorical (X: group variable, Y: outcome variable)
- X = 2, Y = 2
  - Unrelated: Chi-square test, Fisher's exact test
  - Related: McNemar test
- X > 2, Y > 2
  - Unrelated: Chi-square test, Fisher's exact test

Chi-square test
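
A sketch of the chi-square test and Fisher's exact test on a hypothetical 2 × 2 table entered as counts; expand turns each row into count identical observations, mimicking one row per patient.

```stata
clear
input treatment ivfluid count
1 1 20
1 0 228
2 1 12
2 0 240
end
* One observation per hypothetical patient
expand count
tabulate treatment ivfluid, chi2 exact row
scalar p_chi2  = r(p)
scalar p_exact = r(p_exact)
```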

Do this Exercise… Is there a difference between the proportions of patients requiring IV fluids in the two treatment groups? (Dataset: zinc.dta)

Chi-square Test/Fisher’s exact Test by Group

Comparison of two proportions
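
When only summary figures are available, the immediate command prtesti compares two proportions (arguments: n1, p1, n2, p2). A sketch using the figures stated in the exercise below; with patient-level data the equivalent is prtest on a 0/1 outcome with by(treatment), where the outcome variable name would have to come from zinc.dta.

```stata
* Treatment A: n = 248, 91% recovered; treatment B: n = 252, 95% recovered
prtesti 248 0.91 252 0.95
```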

Do this Exercise… (Dataset: zinc.dta)
1. Is there a difference in the proportion of patients who recovered (became rotavirus negative) between the two treatment groups?
2. 91% of patients recovered in treatment A (n=248) and 95% in treatment B (n=252). Test these proportions and find the p-value.

McNemar’s Chi-square Test

McNemar’s Chi-square Test …contd
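
McNemar's test uses only the discordant pairs. A sketch with a hypothetical before/after table: b and c are the two off-diagonal cells, and the statistic (b - c)^2 / (b + c) (no continuity correction) is referred to chi-square with 1 degree of freedom. mcci, the immediate form of Stata's matched case-control command, takes the four cells a b c d of the same table and reports McNemar's test among its output.

```stata
* Hypothetical paired table for treatment B:
*                         after: deficient   after: not deficient
* before: deficient              40                  15
* before: not deficient           5                  40
scalar n_b = 15
scalar n_c = 5
scalar mcn_chi2 = (scalar(n_b) - scalar(n_c))^2 / (scalar(n_b) + scalar(n_c))
scalar mcn_p    = chi2tail(1, scalar(mcn_chi2))
display "McNemar chi2(1) = " scalar(mcn_chi2) "   p = " scalar(mcn_p)
* Immediate command on the same four cells
mcci 40 15 5 40
```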

Do this Exercise… Is there a shift in zinc deficiency from baseline after giving treatment B? (Dataset: zinc.dta)

2. Categorical vs Quantitative (X: group variable, Y: outcome variable)
Parametric
- X = 2, Y normal: unrelated – Student's t test; related – paired t test
- X > 2, Y normal: unrelated – one-way ANOVA; related – repeated measures ANOVA
Non-parametric
- X = 2, Y non-normal: unrelated – Wilcoxon rank-sum test; related – Wilcoxon signed-rank test
- X > 2, Y non-normal: unrelated – Kruskal-Wallis test; related – Friedman's test

Student’s ‘t’ Test for Independent Groups

Student’s ‘t’ Test for Independent Groups …contd
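
A sketch of the independent-groups t test on hypothetical ORS volumes; ors24 is a placeholder name. The unequal option drops the equal-variance assumption (Satterthwaite's degrees of freedom by default).

```stata
clear
input treatment ors24
1 800
1 950
1 1020
1 870
2 700
2 760
2 910
2 820
end
ttest ors24, by(treatment)
ttest ors24, by(treatment) unequal
```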

What is the Difference in the Total ORS Intake in the First 24h between the Two Groups?

Transformations

Transformations …contd

Do this Exercise… (Dataset: zinc.dta)
Ex 1: What is the difference in total stool output 0-48 hours between the two groups?
Ex 2: Is there a difference in total duration of diarrhea (in hours) (varname: tot_du_dia_h) between the two treatment groups?

Geometric Mean if Log Transformation is Used
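
A sketch of the log-transformation route on hypothetical stool weights: compare mean logs with ttest, then back-transform. exp(mean of ln x) is the geometric mean, and exp of the difference in mean logs is the ratio of the two geometric means. ln(0) is missing in Stata, so zero outputs would need a decision before transforming. ameans lists arithmetic, geometric and harmonic means directly.

```stata
clear
input treatment stool48
1 100
1 1000
1 10
2 200
2 400
2 800
end
generate double lnstool = ln(stool48)
ttest lnstool, by(treatment)
* Geometric mean of group 1 by back-transforming the mean log
summarize lnstool if treatment == 1
scalar gm_A = exp(r(mean))
display "Geometric mean, treatment 1: " scalar(gm_A)
ameans stool48 if treatment == 1
```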

Do this Exercise… Calculate the geometric mean for stool output 0-48 hours (Dataset: zinc.dta)

Paired t-Test
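
A sketch of the paired t test on hypothetical baseline (zinc0) and recovery (zinc_rec) values; both names are placeholders. Written as var1 == var2, ttest tests whether the mean of the within-patient differences is zero.

```stata
clear
input pcode zinc0 zinc_rec
1 9.8  11.2
2 10.5 10.9
3 8.7  10.1
4 11.0 11.8
5 9.2  9.9
end
ttest zinc_rec == zinc0
```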

Do this Exercise… Is there a change in zinc value from baseline after giving treatment B? (Dataset: zinc.dta)

Is there a Change in the Serum Zinc from Baseline to Recovery between Two Treatment Groups? Discuss………..

One-way ANOVA (Analysis of Variance)

Multiple Comparisons: difference in mean zinc values between the ≤ 6 and > 12 age groups, with its p-value
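
A sketch of one-way ANOVA with Bonferroni-adjusted pairwise comparisons on hypothetical data; the three age-group codes and their labels are assumptions (only the ≤ 6 and > 12 groups are named above).

```stata
clear
input agegrp zinc
1 9.1
1 9.8
1 10.2
2 10.4
2 11.0
2 10.7
3 11.9
3 12.3
3 11.6
end
label define agelbl 1 "<=6 m" 2 "7-12 m" 3 ">12 m"
label values agegrp agelbl
* tabulate shows group means; bonferroni adds adjusted pairwise comparisons
oneway zinc agegrp, tabulate bonferroni
```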

Non-Parametric Methods

Is there a difference in total stool output in the first 24h between the two treatment groups? Answer: Wilcoxon Ranksum test
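
A sketch of the Wilcoxon rank-sum (Mann-Whitney) test on hypothetical 24-hour stool outputs; stool24 is a placeholder name.

```stata
clear
input treatment stool24
1 150
1 420
1 380
1 900
2 120
2 200
2 260
2 310
end
ranksum stool24, by(treatment)
```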

Do this Exercise… Is there a difference in total diarrhea duration between the two groups? (Dataset: zinc.dta)

Is there a Change in zinc from baseline after giving treatment A? Answer: Wilcoxon signed-rank test
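
The signed-rank counterpart of the paired t test, sketched on the same kind of hypothetical baseline/recovery pairs (zinc0 and zinc_rec are placeholder names):

```stata
clear
input pcode zinc0 zinc_rec
1 9.8  11.2
2 10.5 10.9
3 8.7  10.1
4 11.0 11.8
5 9.2  9.9
end
signrank zinc_rec = zinc0
```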

Do this Exercise… Is there any difference in zinc from baseline after giving treatment B? (Dataset: zinc.dta)

Is there a difference in total stool output across age groups? Answer: Kruskal-Wallis test
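
A sketch of the Kruskal-Wallis test across three hypothetical age groups (codes and values are made up):

```stata
clear
input agegrp stool
1 300
1 450
1 380
2 500
2 620
2 410
3 700
3 820
3 650
end
kwallis stool, by(agegrp)
```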

Do this Exercise… Is there any difference in serum zinc (at admission) across the age groups? (Dataset: zinc.dta)