Published by Reynard Dean. Modified over 8 years ago.
1
Data Workshop H397
2
Data Cleaning
- Inputting data
- Missing values
- Converting string variables
- Creating scales
- Creating dummy variables
3
Inputting and Merging Data
Inputting
- Stata: insheet using "/Users/daphnepenn/Dropbox/CleaningPractice.csv"
- SPSS: dropdown menu (easy)
Merging
- Stata: merge m:1 sch_no using "C:\Users\dmp869\Desktop\bpsschools.dta"
- SPSS: dropdown menu (easy)
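The two Stata steps above can be sketched as one short do-file. This is only an illustration under stated assumptions: the file paths and the key sch_no are the ones on the slide, and import delimited is used on the assumption that Stata 13 or later is available (it supersedes insheet there).

```stata
* Read the CSV into memory, replacing any data already loaded
import delimited using "/Users/daphnepenn/Dropbox/CleaningPractice.csv", clear

* Many-to-one merge: many rows per school in memory,
* exactly one row per sch_no in the school-level file
merge m:1 sch_no using "C:\Users\dmp869\Desktop\bpsschools.dta"

* merge creates _merge; check how many rows matched before dropping anything
tab _merge
```

Tabulating _merge before any cleanup shows whether unmatched rows come from the data in memory or from the school file, which usually points to the cause (for example, a mistyped school number).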
4
Strategies for Missing Data
- Figure out why the data are missing!
- Analyze only the available data (i.e., ignore the missing data).
- Impute the missing data with replacement values and treat these as if they were observed.
- Impute the missing data and account for the uncertainty that imputation introduces.
- Use statistical models that allow for missing data, making assumptions about how the missing values relate to the available data.
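A first pass at "figuring out why" might look like the sketch below. It assumes the outcome ytdgpaw2 and the encoded grouping variable schoolethnicityw22 from the other slides already exist; miss_gpa is a hypothetical name introduced only for this illustration.

```stata
* Count missing values for every numeric variable in the dataset
misstable summarize

* Show which variables tend to be missing together
misstable patterns

* Flag rows with a missing outcome (miss_gpa is a hypothetical name)
gen miss_gpa = missing(ytdgpaw2)

* Compare the share of missing outcomes across groups
tab schoolethnicityw22 miss_gpa, row
```

If missingness differs sharply across groups, simply ignoring the missing rows is harder to justify and one of the imputation or model-based strategies may be worth considering.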
5
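Converting String Variables
Summarizing string variables… You can't! Convert them into numeric variables first.
- describe (shows which variables are stored as strings)
- destring, replace (for the entire dataset)
- destring var, replace (for a particular variable; destring needs either replace or generate())
- destring schoolethnicityw2, replace (only converts numbers stored as text; a variable containing non-numeric text is left unchanged)
- For text categories use encode, which needs a new variable name:
  encode schoolethnicityw2, generate(schoolethnicityw22)
  encode lowincomestatus, generate(lowincomestatus2)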
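The difference between destring and encode can be sketched as follows. Variable names come from the slides; treating ytdgpaw2 as a number that arrived as text is an assumption made only for this example.

```stata
* Check storage types: anything listed as str# is a string
describe

* destring: for numbers that were read in as text
* (assumption: ytdgpaw2 arrived as a string; if it is already numeric, Stata just says so)
destring ytdgpaw2, replace

* encode: for text categories; creates a new labelled numeric variable
encode schoolethnicityw2, generate(schoolethnicityw22)
encode lowincomestatus, generate(lowincomestatus2)

* See which number was assigned to which category
label list schoolethnicityw22
```

encode numbers the categories 1, 2, 3, … in alphabetical order, so the codes are labels rather than quantities and should not be averaged.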
6
Creating Scales
Stata
- Average: egen avg = rowmean(v1 v2 v3 v4)
- Sum: egen total = rowtotal(v1 v2 v3 v4)
SPSS
- Average: COMPUTE MPW2=MEAN(MP1W2,MP2W2,MP3W2,MP4W2,MP5W2,MP6W2,MP7W2,MP8W2,MP9W2R).
- Sum: COMPUTE AGW2=AG1W2+AG2W2+AG3W2+AG4W2+AG5W2+AG6W2+AG7W2.
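Missing items affect the Stata scale commands in different ways, which the sketch below tries to make visible. The items v1 to v4 are the placeholder names from the slide; nvalid and the "at least 3 answered items" rule are assumptions chosen only for illustration.

```stata
* Number of non-missing items per respondent (nvalid is a hypothetical name)
egen nvalid = rownonmiss(v1 v2 v3 v4)

* rowmean averages over whichever items are present
egen avg = rowmean(v1 v2 v3 v4)

* rowtotal counts missing items as 0; the missing option returns
* missing only when every item is missing
egen total = rowtotal(v1 v2 v3 v4), missing

* Illustrative rule: discard the average when fewer than 3 items were answered
replace avg = . if nvalid < 3

* Cronbach's alpha as a check on internal consistency
alpha v1 v2 v3 v4
```

Note that the SPSS sum written with + behaves differently again: it returns missing as soon as any one item is missing.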
7
Creating Dummy Variables
Stata
- General form: gen newvar = oldvar == __
- Example:
  gen male = 0
  replace male = 1 if schoolgenderw2 == "M"
SPSS
- Dropdown menu
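One caveat with the two-step version is that students with a blank gender end up coded 0. A hedged one-line alternative, plus a way to build a full set of dummies, is sketched below; the stub eth_ is a hypothetical name, and schoolethnicityw22 is assumed to be the encoded variable from the string-conversion slide.

```stata
* 1 if "M", 0 for any other non-blank value, missing if gender is blank
gen male = (schoolgenderw2 == "M") if !missing(schoolgenderw2)

* Check the recode against the original variable, including missing values
tab schoolgenderw2 male, missing

* One dummy per category: eth_1, eth_2, ... (eth_ is a hypothetical stub)
tab schoolethnicityw22, generate(eth_)
```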
8
Summarizing Data and Choosing Tests
- tabstat ytdgpaw2, stat(mean min median max)
- tab schoolgenderw2 schoolethnicityw2
- tab schoolethnicityw22 lowincomestatus2
- tabstat ytdgpaw2, s(mean median sd count) by(schoolethnicityw22)
Guide to choosing a test: http://www.som.soton.ac.uk/learn/resmethods/statisticalnotes/which_test.htm
9
Using appropriate statistics and graphs
Which statistics and graphs to report depends on the types of the variables of interest:
- For continuous (Normally distributed) variables
  - N, mean, standard deviation, minimum, maximum
  - histograms, dot plots, box plots, scatter plots
- For continuous (skewed) variables
  - N, median, lower quartile, upper quartile, minimum, maximum, geometric mean
  - histograms, dot plots, box plots, scatter plots
- For categorical variables
  - frequency counts, percentages
  - one-way tables, two-way tables
  - bar charts
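In Stata, the summaries listed above might be produced roughly as follows. This is a sketch that assumes ytdgpaw2 is the continuous variable and that schoolethnicityw22 and lowincomestatus2 are the encoded categorical variables from the earlier slides.

```stata
* Continuous, roughly Normal: N, mean, SD, min, max
summarize ytdgpaw2

* Continuous, skewed: N, median, quartiles, min, max
tabstat ytdgpaw2, stat(n median p25 p75 min max)

* Histogram, and box plots split by a categorical variable
histogram ytdgpaw2
graph box ytdgpaw2, over(schoolethnicityw22)

* Categorical: one-way table, then a two-way table with row percentages
tab schoolethnicityw22
tab schoolethnicityw22 lowincomestatus2, row

* Bar-style frequency plot for a categorical variable
histogram schoolethnicityw22, discrete frequency
```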
10
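Using appropriate statistics and graphs…
[Table of recommended graphs by variable type. Rows: X = categorical, X = continuous, X = time. Columns: Y = categorical and Y = continuous, each with and without a categorical grouping variable Z. The graph images were not preserved; the surviving cell entries are "Use 3-way table" and "N/A".]
All these graphs are available in Chart Builder, from the "Choose from:" list.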
11
Flow chart of commonly used descriptive statistics and graphical illustrations
Exploring data
- Descriptive statistics
  - Categorical data
    - Frequency
    - Percentage (Row, Column or Total)
  - Continuous data: Measure of location
    - Mean
    - Median
  - Continuous data: Measure of variation
    - Standard deviation
    - Range (Min, Max)
    - Inter-quartile range (LQ, UQ)
- Graphical illustrations
  - Categorical data
    - Bar chart
    - Clustered bar charts (two categorical variables)
  - Continuous data
    - Bar charts with error bars
    - Histogram (can be plotted against a categorical variable)
    - Box & Whisker plot (can be plotted against a categorical variable)
    - Dot plot (can be plotted against a categorical variable)
    - Scatter plot (two continuous variables)
12
Choosing appropriate statistical test
Having a well-defined hypothesis helps to distinguish the outcome variable from the exposure variable.
Answer the following questions to decide which statistical test is appropriate for analysing your data:
- What is the variable type for the outcome variable?
  - Continuous (Normal, Skew) / Binary /
  - If there is more than one outcome, are they paired or related?
- What is the variable type for the main exposure variable?
  - Categorical (1 group, 2 groups, >2 groups) / Continuous
  - For 2 or >2 groups: Independent (Unrelated) / Paired (Related)
- Are there any other covariates or confounding factors?
13
Flow chart of commonly used statistical tests (by outcome variable, then exposure variable)
- Outcome: Continuous, Normal
  - 1 group: One-sample t test
  - 2 groups: Two-sample t test
  - Paired: Paired t test
  - >2 groups: One-way ANOVA test
  - Continuous exposure: Pearson Corr / Linear Reg
- Outcome: Continuous, Skew
  - 1 group: Sign test / Signed rank test
  - 2 groups: Mann-Whitney U test
  - Paired: Wilcoxon signed rank test
  - >2 groups: Kruskal Wallis test
  - Continuous exposure: Spearman Corr / Linear Reg
- Outcome: Categorical
  - 1 group: Chi-square test / Exact test
  - 2 groups: Chi-square test / Fisher's exact test / Logistic regression
  - Paired: McNemar's test / Kappa statistic
  - >2 groups: Chi-square test / Fisher's exact test / Logistic regression
  - Continuous exposure: Logistic regression / Sensitivity & specificity / ROC
- Outcome: Survival
  - 2 groups or >2 groups: KM plot with Log-rank test
  - Continuous exposure: Cox regression
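Several branches of the flow chart map onto single Stata commands. The sketch below is one possible mapping, not a recommendation for any particular analysis; it assumes ytdgpaw2 is the continuous outcome, male is the 0/1 dummy from the dummy-variable slide, and schoolethnicityw22 and lowincomestatus2 are the encoded categorical variables.

```stata
* Normal outcome, 2 independent groups: two-sample t test
ttest ytdgpaw2, by(male)

* Skewed outcome, 2 independent groups: Mann-Whitney U (Wilcoxon rank-sum) test
ranksum ytdgpaw2, by(male)

* Normal outcome, >2 groups: one-way ANOVA with group summaries
oneway ytdgpaw2 schoolethnicityw22, tabulate

* Skewed outcome, >2 groups: Kruskal-Wallis test
kwallis ytdgpaw2, by(schoolethnicityw22)

* Categorical outcome, 2 groups: chi-square and Fisher's exact test
tabulate male lowincomestatus2, chi2 exact
```

Each command reports the number of observations it actually used, so it is worth checking that figure against the missing-data counts before reading the p-value.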
14
Other Issues
- Organizing quantitative data
- Choosing the right tests
- Sampling
15
Favorite Stats Resources
- YouTube
- http://www.ats.ucla.edu/stat/stata/
- http://www.ats.ucla