18 August 2015. Statistical Analysis with R: questionnaires, variables organization, descriptive analysis, graphs, statistical tests.

Presentation transcript:


R. A statistical package and 4th-generation programming language, extensible through functions and extensions. It is an environment for statistical computing and graphics that offers statistical and graphical techniques and is extensible through packages. Competitors: SPSS, Matlab.

Variables. Scale or numeric variables: time, age, weight, distance in kilometers, length, number of children, GDP. Nominal or categorical variables: country of residence, sex, degree course. Ordinal variables: education level, rankings, Likert scale; in statistical analysis they are often considered as nominal or scale variables. Questionnaire overview.

Missing values. NA means "not available"; you insert it manually whenever a datum is missing. NaN means "not a number"; it appears whenever a calculation cannot be done for this datum. Both are normally skipped in statistical analyses. Any math operation involving them gives NA or NaN.
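A minimal sketch of how missing values behave at the R command line. The `age` vector is hypothetical, invented for this illustration; note that base functions such as `mean()` skip missing values only when asked to with `na.rm = TRUE`.

```r
# Hypothetical ages with one missing answer
age <- c(25, 31, NA, 40)

is.na(age)               # TRUE only for the third case
age + 1                  # the missing case stays NA
mean(age)                # NA: base functions do not skip NA by default
mean(age, na.rm = TRUE)  # 32, the mean of the three observed cases

0 / 0                    # NaN: the calculation has no defined result
is.na(0 / 0)             # TRUE: NaN also counts as missing
```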

Portable R. Either use Portable R: download it from my website, already preconfigured, or download it from [URL missing], then uncompress it on your computer's hard disk or on a USB pendrive. Or install R on your computer: download it from [URL missing], install it on your computer, and try desperately to set the language to English.

Installing packages. To install R commander: Packages → Install Package(s)... → CRAN Mirror → Rcmdr, then wait for the installation of Rcmdr and additional packages. To load R commander: Packages → Load Package... → Rcmdr; to the warning on missing packages answer Yes, and answer to download them from CRAN. Learn to load an R package.

Running R commander. Whenever you want to run it: Packages → Load Package... → Rcmdr, then File → Change Working directory. R commander has problems navigating through your directory tree, so choose an easy-to-find directory, such as your Desktop or the place where you keep your R exercises.

Files to save. R commander windows: the script contains the written instructions (R commander → File → Save Script as…); the output contains the output (R commander → File → Save Output as…, or paste it into a text file). The workspace contains the data structure: File → Save Workspace… or R commander → File → Save R workspace As…; reload it with File → Load Workspace…

data.frame or dataset. A database table suited for statistical analysis. Case names are optional.

Building a new dataset. R commander → Data → New data set… Insert all variables first; only afterwards insert the data and build a codebook. Use numbers for nominal and ordinal variables. Convert nominal and ordinal variables to factor: R commander → Data → Manage variables in active data set → Convert numeric variables to factor. Convert ordinal variables to ordered: submit the 3 lines of code again with ordered instead of factor. Check the result with ls.str() and str(dataset).
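The conversion steps can also be typed by hand. The sketch below assumes a hypothetical three-variable questionnaire (`survey`, with an invented codebook in the comments); it is not the code R commander generates, only an equivalent in base R.

```r
# Hypothetical questionnaire data, coded with numbers as in a codebook
survey <- data.frame(
  age       = c(23, 35, 41, 29),
  sex       = c(1, 2, 2, 1),   # 1 = male, 2 = female
  education = c(2, 3, 1, 3)    # 1 = primary, 2 = secondary, 3 = tertiary
)

# Nominal variable: convert to factor
survey$sex <- factor(survey$sex, levels = c(1, 2),
                     labels = c("male", "female"))

# Ordinal variable: convert to ordered factor
survey$education <- ordered(survey$education, levels = c(1, 2, 3),
                            labels = c("primary", "secondary", "tertiary"))

str(survey)  # shows the type of every variable in the dataset
```

With an ordered variable, comparisons such as `survey$education >= "secondary"` become meaningful, which is not the case for a plain factor.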

Importing dataset. R commander → Data: import from a package (→ Data in packages); import from a text file (→ Import Data → from text file, clipboard or URL…); import from Excel, hoping that it works (→ Import Data → from Excel, Access or dBase data set…); export to a text file (→ Active data set → Export active data set…).

Importing dataset from SPSS. Written here just in case you ever need it; converting to a text file is better and easier! R commander → Data → Import Data → from SPSS data set… Pay attention to value labels and factors. Date importing is wrong! Fix it with library(chron) and then var <- as.chron(ISOdate(1582, 10, 14) + var).

Univariate descriptive analysis. Statistics → Summaries. For scale variables: → Numerical summaries. For ordinal and nominal variables: → Frequency distributions.
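Roughly the same summaries can be obtained at the command line. This sketch uses `mtcars`, a dataset shipped with R and chosen here only for illustration, treating `mpg` as a scale variable and `cyl` as a nominal one.

```r
# Scale variable: numerical summaries
summary(mtcars$mpg)  # minimum, quartiles, median, mean, maximum
sd(mtcars$mpg)       # standard deviation

# Nominal variable: frequency distribution, as counts and as percentages
cyl_counts <- table(mtcars$cyl)
cyl_counts
round(100 * prop.table(cyl_counts), 1)
```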

Graphs for one nominal variable: column plot.

Graphs for one nominal variable: pie chart, radar graph.

Graphs for one nominal variable: bar plot, line plot.

Graphs for one nominal variable: area plot, 3D variants.

Graphs for one nominal variable. R commander → Graphs → Color palette…, → Bar graph…, → Pie chart… To change colors, add the option col=c(numbers of the colors from the palette) to the text command, select the text command and submit it.

Graphs for one scale variable: building a histogram by grouping into bins.

Graphs for one scale variable: choosing the bins carefully.

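Graphs for one scale variable: boxplot. The median is the black line. The central 50% of the cases is in the rectangle. In R's default boxplot the whiskers reach the most extreme cases lying within 1.5 times the rectangle's height from the rectangle. Extremes beyond the whiskers are drawn as symbols.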

One scale variable, case by case. Only for a scale variable with few cases: use any appropriate nominal variable graph.

Graphs for one scale variable. R commander → Graphs → Histogram…, → Boxplot…, → Index plot…

Bivariate analysis: nominal vs nominal. Statistics → Contingency table → Two-way table… → Percentages. Understand clearly when to use row percentages and when to use column percentages.
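To make the difference between row and column percentages concrete, here is a sketch on the built-in `mtcars` data (an illustrative choice, with `cyl` and `gear` treated as nominal variables). Row percentages answer "among cars with 4 cylinders, what share has 5 gears?"; column percentages answer "among cars with 5 gears, what share has 4 cylinders?".

```r
# Two-way contingency table of two nominal variables
tab <- table(cylinders = mtcars$cyl, gears = mtcars$gear)
tab

# Row percentages: each row sums to 100
row_pct <- 100 * prop.table(tab, margin = 1)
round(row_pct, 1)

# Column percentages: each column sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)
round(col_pct, 1)
```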

Graphs for nominal vs nominal: side by side, stacked.

Graphs for nominal vs nominal: appropriate 3D variants.

Graphs for nominal vs nominal: a rare example of a useful stacked area chart.

Graphs for nominal vs nominal. No graph is available in R commander, as far as I know. How to export your graphics into Word: right-click → copy as bitmap.

Bivariate analysis: scale vs nominal. Statistics → Summaries → Numerical summaries → Summarize by groups…, or → Table of statistics…

Graphs for scale vs nominal: boxplots side by side, histograms one above the other.

Graphs for two variables. R commander → Graphs → Boxplot… → Plot by groups…

Bivariate analysis: scale vs scale. Statistics → Summaries → Correlation matrix: Pearson linear correlation or Spearman rank correlation.

Scale versus scale: scatterplot.

Scale versus scale: mathematical graph, regression line.

Graphs for two variables. R commander → Graphs → Scatterplot… (remove all the unnecessary options) or → Line graph… (mathematical graph; the X variable must have its values in order).

Multivariate analysis. Statistics: for three nominal variables, → Contingency table → Multi-way table; for three scale variables, → Summaries → Correlation matrix.

Graphs for three scale variables: surface plot.

Graphs for three scale variables: bubble chart.

Graphs for two scale and one nominal variables. R commander → Graphs → Scatterplot… → Plot by groups…

Restrict data set. R commander → Data → Active data set → Subset active data set… is used to restrict the data set to some cases; use labels and not numbers for nominal variables! → Remove cases with missing data…

Recode. Used to create or modify factor/ordered variables. R commander → Data → Manage variables in active data set → Recode variables… Examples: "Bolzano"="here"; c("Munich","Hannover","Bonn")="Germany" (do not use "Munich","Hannover","Bonn"="Germany" as suggested by the help); else="Others". For numerical variables we may also use 8:27="high", together with lo and hi. Massive recoding.

Binning. Used to group scale variables into ordered categories (but it produces a factor). R commander → Data → Manage variables in active data set → Bin numeric variable…
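At the command line, binning can be sketched with base R's `cut()`. The ages, break points and labels below are hypothetical; `ordered_result = TRUE` is one way to obtain an ordered variable instead of a plain factor.

```r
# Hypothetical ages to be grouped into three classes
age <- c(18, 25, 34, 47, 52, 68)

# Intervals are (0,30], (30,50] and (50,Inf]; the result is a plain factor
age_group <- cut(age, breaks = c(0, 30, 50, Inf),
                 labels = c("young", "middle", "senior"))
table(age_group)
is.ordered(age_group)       # FALSE

# Same binning, but producing an ordered factor
age_group_ord <- cut(age, breaks = c(0, 30, 50, Inf),
                     labels = c("young", "middle", "senior"),
                     ordered_result = TRUE)
is.ordered(age_group_ord)   # TRUE
```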

Compute. Used to create a new variable through math operations. R commander → Data → Manage variables in active data set → Compute new variable… The general form is newvector <- with(dataset, formula). For example, CO2$myname <- with(CO2, uptake*7-sqrt(conc)) is identical to CO2$myname <- CO2$uptake*7-sqrt(CO2$conc).

Computing (line command). The instruction produced by Compute, CO2$myname <- with(CO2, uptake*7-sqrt(conc)), can easily be typed directly by you! Or you can type CO2$myname <- CO2$uptake*7-sqrt(CO2$conc). Variables' names must be preceded by the dataset's name and $. The symbol <- means: take things from the right and put them on the left.
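A quick check that the two forms give the same result, run on the built-in `CO2` dataset used in the slides. The second variable name, `myname2`, is introduced here only for the comparison.

```r
# Form produced by R commander's Compute dialog
CO2$myname <- with(CO2, uptake * 7 - sqrt(conc))

# Same computation with every variable preceded by dataset$
CO2$myname2 <- CO2$uptake * 7 - sqrt(CO2$conc)

identical(CO2$myname, CO2$myname2)  # TRUE
head(CO2[, c("uptake", "conc", "myname")])
```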

Computing (line command). If you do not specify dataset$, the variable will be created outside the dataset, with only one case (unless otherwise specified). Use print(variable) to look at it. Variable assignment: variable <- value or formula, or value or formula -> variable. Operators: + - * / **

Computing (line command). A variable with many cases outside a dataset is called a "vector". Use vector <- c(list of items) to create it manually, vector[index] to access a specific element of the vector, and vector[from:to] to access a sequence of its elements.
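The assignment and vector notation above, tried on a few made-up values (all names here are hypothetical):

```r
single <- 7 * 3   # created outside any dataset: a single case
10 -> y           # rightward assignment, same effect as y <- 10
2 ** 3            # ** is read by R as ^ (power)

ages <- c(25, 26, 27, 28, 29)  # a vector created manually
ages[2]                         # the second element
ages[2:4]                       # elements from the second to the fourth
print(ages)
```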

Statistical tests. Example: we want to study the age of Internet users, checking whether the average age is 35 years or not. The only information we have is the observations on a sample of 100 users, which are: 25; 26; 27; 28; 29; 30; 31; 30; 33; 34; 35; 36; 37; 38; 30; 30; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 51; 52; 20; 54; 55; 56; 57; 20; 20; 20; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49; 50; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35; 36; 37; 35.

Statistical tests. Test's hypotheses: H0: the average age in the population is 35; H1: the average age in the population is not 35. We calculate the average age on the sample, 36.2, which is an estimate of the population's average age. We compare this result with the 35 of hypothesis H0 and we find a difference of 1.2. We ask ourselves whether this difference is: large, implying that the population's average age is not 35 and thus H0 must be rejected; or small, so that it can be caused by random fluctuation in the sample choice and therefore H0 must be accepted.

Statistical tests. In order to answer, the test provides us with a significance: the probability of obtaining a difference at least as large as the observed one if H0 were true. In this example the significance is 16%. If the significance is large, we accept H0; this implies that we do not know. If the significance is small, we reject H0; this implies that we are almost sure that H0 is false. Significance is also called p-value.
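The example can be reproduced at the command line. The sketch below types in the 100 observations and assumes the test meant here is Student's t test for one variable (`t.test` with `mu = 35`), which gives a p-value in line with the 16% quoted above.

```r
# The 100 sampled ages from the example
age <- c(25, 26, 27, 28, 29, 30, 31, 30, 33, 34,
         35, 36, 37, 38, 30, 30, 41, 42, 43, 44,
         45, 46, 47, 48, 49, 50, 51, 52, 20, 54,
         55, 56, 57, 20, 20, 20, 30, 31, 32, 33,
         34, 35, 36, 37, 38, 39, 40, 41, 42, 43,
         44, 45, 46, 47, 48, 49, 50, 20, 21, 22,
         23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
         33, 34, 35, 36, 37, 38, 39, 40, 35, 36,
         37, 35, 36, 37, 35, 36, 37, 35, 36, 37,
         35, 36, 37, 35, 36, 37, 35, 36, 37, 35)

length(age)  # 100
mean(age)    # 36.2

# H0: the population average is 35
result <- t.test(age, mu = 35)
result$p.value  # the significance (p-value)
```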

Typical univariate analysis techniques
Variables | Numerical description | Graphical description | Parametric test | Non-parametric test
nominal | Frequencies (one-dimensional contingency table) | Column plot, pie chart | --- | Chi-square for a one-dimensional contingency table
scale | Descriptive statistics | Histogram, boxplot | Student's t for one variable | Sign test

Tests for one scale variable. Student's t test for one variable: H0: average on the population = m. Statistics → Means → Single-sample t-test. Sign test: H0: median on the population = m. Not available in R commander.
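Although the sign test has no menu entry, one common way to carry it out by hand is a binomial test on the number of cases above and below the hypothesized median. The sketch below uses that approach with hypothetical ages and m = 35; it is an illustration, not an R commander feature.

```r
# Hypothetical ages; H0: the population median is 35
x <- c(28, 31, 36, 38, 40, 41, 43, 45, 47, 52)
m <- 35

above <- sum(x > m)  # cases above the hypothesized median
below <- sum(x < m)  # cases below it (cases equal to m are dropped)

# Under H0 a case falls above or below m with probability 0.5
result <- binom.test(above, above + below, p = 0.5)
result$p.value
```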

Tests for one nominal variable. Chi-square test for a one-dimensional contingency table. H0: the classification follows a predetermined distribution. Statistics → Summaries → Frequency distributions… → Chi-square.
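A command-line sketch of the same test with base R's `chisq.test()`. Both the observed counts and the predetermined distribution (25%, 50%, 25%) are hypothetical values chosen for the illustration.

```r
# Hypothetical observed frequencies of a nominal variable with 3 categories
counts <- c(A = 30, B = 50, C = 20)

# H0: the categories follow the predetermined distribution 25% / 50% / 25%
result <- chisq.test(counts, p = c(0.25, 0.50, 0.25))

result$expected   # frequencies expected under H0
result$statistic  # chi-square statistic
result$p.value    # significance
```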

Typical bivariate analysis techniques
Variables | Numerical description | Graphical description | Parametric test | Non-parametric test
nominal vs nominal | 2D contingency table | Clustered or stacked or 3D column plot | --- | Chi-square for a 2D contingency table
binary nominal vs scale | Descriptive statistics by groups | Boxplots or histograms by groups | Student's t for two populations | Mann-Whitney
non-binary nominal vs scale | Descriptive statistics by groups | Boxplots or histograms by groups | One-way analysis of variance (ANOVA) | Kruskal-Wallis
scale vs scale | Pearson's or Spearman's correlation | Scatterplot | Pearson's correlation; Student's t for paired data | Spearman's correlation; Wilcoxon signed-rank test

Tests for two nominal variables. Chi-square test for a two-dimensional contingency table. H0: the classification of the two variables is independent. Statistics → Contingency table → Two-way table… → Statistics → Chi-square test of independence. Warning: you should have no expected frequency less than 5.
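The warning about expected frequencies can be checked directly on the test result. This sketch uses two binary variables of the built-in `mtcars` data (`am` and `vs`, an illustrative choice); note that for 2×2 tables `chisq.test()` applies a continuity correction by default.

```r
# Two-way contingency table of two nominal variables
tab <- table(transmission = mtcars$am, engine = mtcars$vs)
tab

# H0: the two classifications are independent
result <- chisq.test(tab)

result$expected       # expected frequencies under independence
min(result$expected)  # should not be below 5
result$p.value
```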

Test for binary nominal vs scale. Student's t test for two populations. H0: average of group 1 = average of group 2. Statistics → Means → Independent samples t-test. Warning: the scale variable should be normally distributed in both groups.

Non-parametric test for binary nominal vs scale. Mann-Whitney test, also called Wilcoxon rank-sum test. It tests the ranks. H0: position of group 1 = position of group 2. Statistics → Nonparametric tests → Two-sample Wilcoxon test.
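Both tests for a binary nominal variable against a scale variable can be sketched as follows, using the built-in `mtcars` data as an illustrative example. `exact = FALSE` is set only to avoid R's warning about ties in the ranks.

```r
# mpg is the scale variable, am (0 = automatic, 1 = manual) the binary one

# Student's t test for two populations; H0: the two averages are equal
t_res <- t.test(mpg ~ am, data = mtcars)
t_res$p.value

# Mann-Whitney / Wilcoxon rank-sum test; H0: the two positions are equal
w_res <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)
w_res$p.value
```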

Test for non-binary nominal vs scale. ANOVA (ANalysis Of VAriance). H0: the average is the same for all groups. Statistics → Means → One-way ANOVA. The test rejects even if just one population's average is different from the others. Warning: the scale variable should be normally distributed in each group.

Non-parametric test for non-binary nominal vs scale. Kruskal-Wallis test. It tests the ranks. H0: the position is the same for all groups. Statistics → Nonparametric tests → Kruskal-Wallis test.
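A sketch of the two tests for more than two groups, on the built-in `PlantGrowth` data (three groups of plant weights, chosen here only as an example):

```r
# weight is the scale variable, group a nominal variable with 3 categories

# One-way ANOVA; H0: the average is the same for all groups
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]  # p-value of the group effect

# Kruskal-Wallis; H0: the position is the same for all groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw$p.value
```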

Tests for two scale variables. Pearson's and Spearman's correlation tests. H0: correlation = 0. Statistics → Summaries → Correlation test.
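Both correlation tests are available through `cor.test()`. The sketch uses two scale variables of the built-in `mtcars` data as an example; `exact = FALSE` avoids the warning R gives for tied ranks in Spearman's test.

```r
# H0: the correlation between mpg and wt is 0

pearson <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")
pearson$estimate  # Pearson linear correlation
pearson$p.value

spearman <- cor.test(mtcars$mpg, mtcars$wt, method = "spearman",
                     exact = FALSE)
spearman$estimate  # Spearman rank correlation
spearman$p.value
```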

Tests for difference of two scale variables. Used when testing the difference between two variables. Student's t test for paired data. H0: average (var 1 – var 2) = 0. Statistics → Means → Paired t test. Warning: the distribution of the difference of the scale variables must be normal.

Nonparametric test for two scale paired variables. Wilcoxon signed-rank test. It tests the ranks. H0: var 1 – var 2 is positioned around 0. Statistics → Nonparametric tests → Paired-samples Wilcoxon test.
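The paired tests can be sketched on the built-in `sleep` data, where the same 10 subjects were measured under two drugs (an illustrative choice; `drug1` and `drug2` are names introduced here). The sketch also shows that the paired t test is a one-variable t test on the differences.

```r
# Two paired scale variables: extra sleep under drug 1 and under drug 2
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

# Student's t test for paired data; H0: average (drug1 - drug2) = 0
t_paired <- t.test(drug1, drug2, paired = TRUE)
t_paired$p.value

# Same result from a one-variable t test on the differences
t_diff <- t.test(drug1 - drug2, mu = 0)
t_diff$p.value

# Wilcoxon signed-rank test; H0: drug1 - drug2 is positioned around 0
w_paired <- wilcox.test(drug1, drug2, paired = TRUE, exact = FALSE)
w_paired$p.value
```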

Is a variable normally distributed? Histogram with normal curve: find out the average a and the standard deviation s; build a histogram with appropriate binning; close it, add prob=TRUE to the command and rebuild it; this time do not close it! Then submit curve(dnorm(x, mean=a, sd=s), col="blue", lwd=2, add=TRUE, yaxt="n"). Q-Q plot (the data must lie on the line): Graphs → Quantile-comparison plot.

Is a variable normally distributed? Skewness: negative means tail on the left, positive means tail on the right. Excess kurtosis: negative means flat, 0 normal, positive too pointy. Statistics → Summaries → Numerical summaries → Options. Shapiro-Wilk normality test. H0: the variable comes from a normal distribution. Statistics → Summaries → Shapiro-Wilk test of normality.
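A sketch of these checks at the command line, on two artificial variables built for the illustration: one made of evenly spaced normal quantiles and one made of exponential quantiles (tail on the right). The `skewness` helper is a simple moment-based formula written here, not a function shipped with base R, and other definitions differ slightly in small samples.

```r
# Artificial data: one symmetric, bell-shaped variable and one skewed variable
symmetric <- qnorm(ppoints(100))  # evenly spaced normal quantiles
skewed    <- qexp(ppoints(100))   # evenly spaced exponential quantiles

# Moment-based skewness (hypothetical helper defined for this example)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skewness(symmetric)  # practically 0: no tail on either side
skewness(skewed)     # positive: tail on the right

# Shapiro-Wilk test; H0: the variable comes from a normal distribution
p_symmetric <- shapiro.test(symmetric)$p.value
p_skewed    <- shapiro.test(skewed)$p.value
p_symmetric          # large: H0 is not rejected
p_skewed             # small: H0 is rejected
```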