Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Statistical Software Package

Similar presentations


Presentation on theme: "Introduction to Statistical Software Package"— Presentation transcript:

1 Introduction to Statistical Software Package
l Chapter 4 l Introduction to Statistical Software Package 4.1 Data Input 4.2 Data Editor 4.3 Data Computation and Transformation

2 Introduction to Statistical Software Package
The statistical software package used – SPSS (Statistical Package for the Social Sciences) v. 21/v. 22 Sophisticated software used by social scientists and related professional for statistical analysis. Compatible with the Windows environment. Sample average Population average “X-bar” “mu”

3 4.1 Data Input Defining variables (in Variable view): Variable names
Labels (Variable and value) Missing values Variable types Column format Measurement level 2 5 10 15 20 Defects per lot Frequency (lots) Average is 5.1 defects per lot

4 4.1 Data Input Variable names 5+7 2
Must begin with letter. The remaining can be letter, digit, period or the #, _ or $. Cannot end with period or underscore Blanks and special characters cannot be used (E.g.: !, ?, ‘ and *) Must be unique; Duplication is not allowed The length in one variable cannot exceed 64 characters Reserved keywords cannot be used as variable names (E.g.: ALL, AND, BY, EQ, GE, LE, LT, NE, NOT, OR, TO and WITH) Are not case sensitive 5+7 2

5 4.1 Data Input Labels Variable label is a full description of the variable name Optional It has a value label that can be in alphanumeric and numeric code Alphanumeric code Numeric code

6 4.1 Data Input Missing values Variable types
How to dealt with missing data – Leave the cell blank or assign missing value codes Rules for assigning missing value codes: Must be the same data type as they represent (E.g.: Missing numeric data must use numeric missing value code) Missing value code cannot occur as data in the data set By default, the choice of digit is usually 9 Variable types By default, SPSS assumes all new variables are numeric with two decimal places. Can be changed to other type (date, currency, etc) and have different number of decimal Data Ranks Rank of median = (1+8)/2 = 4.5 6 4 9 Median Average

7 4.1 Data Input Column format Measurement level
It is possible to adjust the width of the Data Editor columns and change the alignment of data in the column (left, centre or right) Measurement level Can be specified as nominal, ordinal, interval or ratio 5 -20% -10% 0% Percent change at opening Frequency Average = -8.2% Median = -8.6%

8 4.2 Data Editor Specifically, in editing and viewing data and result in SPSS, it has seven different types of windows: Data editor (Our focus) Viewer and draft viewer (Output) Pivot table editor Chart editor Text output editor Syntax editor Script editor 10 20 30 40 50 $0 $100,000 $200,000 Income Average = $38,710 Median = $27,216 Frequency

9 4.2 Data Editor Mode Data view

10 4.2 Data Editor Variable view Average, median, and mode are identical
in the case of a normal distribution

11 4.2 Data Editor Menu-driven and has variety of pull-down menus.
The main menu bar contains 12 menu: Graphs* File Utilities Edit Add-ons View Windows Data Help Transform *Available in all windows (Easier to generate new output) Analyze* Direct marketing Average Median Mode

12 4.2 Data Editor Toolbar Average Median Mode Variable File open
File print Undo Go to case Find Insert variable Weight cases Value labels Show all variables Average Median Mode File save Dialog recall Redo Go to variable Insert cases Split file Select cases Use sets Spell check Run descriptive statistics

13 4.2 Data Editor Exercise Enter the following cases: Average Median
Mode

14 4.3 Data Computation and Transformation
Data screening Objectives: Making sure that data have been entered correctly Making sure that the distributions are normal The importance of data screening May affect the validity of the results that are produced May assist in deciding; the need of data transformation, the usage of nonparametric techniques This subtopic will be demonstrated using data of work3.sav Average Median Mode

15 4.3 Data Computation and Transformation
Errors in data entry Error in data entry are common and therefore data files must be carefully screened Use Frequencies or Descriptive command Output example: Average Median Mode

16 4.3 Data Computation and Transformation
Assessing normality The assumption of normality is prerequisite of many inferential statistical techniques Normality can be explored using a combined assessment of graphic: Histogram Boxplot Normal probability plot Detrended normal plot And normality test: Kolmogorov-Smirnov and Shapiro-Wilk statistics Skewness Kurtosis Average Median Mode

17 4.3 Data Computation and Transformation
Assessing normality Performed using the Explore command in Descriptive Statistics under Analyze Histogram Need to be Bell shaped /symmetry Average Median Mode

18 4.3 Data Computation and Transformation
Assessing normality Boxplot Median should be at the center of the body (symmetric) Average Median Mode

19 4.3 Data Computation and Transformation
Assessing normality Normal probability plot Plot should fall more or less in a straight line Average Median Mode

20 4.3 Data Computation and Transformation
Assessing normality Detrended normal plot There should be no pattern to the clustering of points (The point should assemble around a horizontal line through zero) Average Median Mode

21 4.3 Data Computation and Transformation
Assessing normality Kolmogorov-Smirnov and Shapiro-Wilk statistics Kolmogorov-Smirnov Sample size Shapiro-Wilk Sample size < 100 If the significance level (Sig.) is greater than 0.05, normality is assumed Average Median Mode

22 4.3 Data Computation and Transformation
Assessing normality Skewness and Kurtosis Their value need to be close to zero to be assumed normal Basically, the data of age can be assumed normal There is no need for transformation Data can be analyzed using parametric test Average Median Mode

23 4.3 Data Computation and Transformation
Variable transformation The decision to transform variables depends on the severity of the departure from normality Most appropriate transformation method need to be selected* *Suggested by Tabachnick and Fidell (2007) and Howell (2007) C = A constant added to each score so that the smallest score is 1 K = A constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1 Data distribution Transformation method [Num. Expression] Moderately positive skewness Square-Root [SQRT(X)] Substantially positive skewness Log 10 [LG10(X)] (with zero values) Log 10 [LG10 (X + C) Moderately negative skewness Square-Root [SQRT(K - X)] Substantially negative skewness Log 10 [LG10(K - X)] Average Median Mode

24 4.3 Data Computation and Transformation
To illustrate the process of transformation, the hours of exercise (hourex) will was examined using the preceding steps. Variable transformation The decision to transform variables depends on the severity of the departure from normality Most appropriate transformation method need to be selected* *Suggested by Tabachnick and Fidell (2007) and Howell (2007) C = A constant added to each score so that the smallest score is 1 K = A constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1 Average Median Mode

25 4.3 Data Computation and Transformation
It is obvious that variable hours of exercise (hourex) is not normal. Variable transformation need to be performed Based on previous assessment, we know that variable hourex is not normally distributed but is significantly positively skewed and because of the is no zero value involved, Log 10 transformation method should be performed. Variable transformation The decision to transform variables depends on the severity of the departure from normality Most appropriate transformation method need to be selected* *Suggested by Tabachnick and Fidell (2007) and Howell (2007) C = A constant added to each score so that the smallest score is 1 K = A constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1 Average Median Mode

26 4.3 Data Computation and Transformation
Results Variable transformation The decision to transform variables depends on the severity of the departure from normality Most appropriate transformation method need to be selected* *Suggested by Tabachnick and Fidell (2007) and Howell (2007) C = A constant added to each score so that the smallest score is 1 K = A constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1 Average Median Mode It is apparent that Log 10 transformation was appropriate because the distribution of hourex is now relatively normal

27 4.3 Data Computation and Transformation
Data transformation Recoding Collapsing continuous variables into categorical variables Recoding negatively worded scale items Replacing missing values and bringing outlying cases into the distribution Computing To obtain composite scores for items on a scale Data selection In the Select Cases option in the Data menu, data can be selected: On specified cases using If option Randomly using the Sample option Based on time or case range using the Range option Variable transformation The decision to transform variables depends on the severity of the departure from normality Most appropriate transformation method need to be selected* *Suggested by Tabachnick and Fidell (2007) and Howell (2007) C = A constant added to each score so that the smallest score is 1 K = A constant from which each score is subtracted so that the smallest score is 1; usually equal to the largest score + 1 Average Median Mode


Download ppt "Introduction to Statistical Software Package"

Similar presentations


Ads by Google