Introduction to SPSS
Asst. Prof. Dr. Emrah Oney
Topics we will cover today
SPSS at a glance, the basic structure of SPSS, cleaning your data, descriptive statistics, charts, and other resources.
SPSS at a glance
SPSS stands for Statistical Package for the Social Sciences. SPSS was designed to be easier to use than other statistical software such as S-Plus, R, or SAS. At the time of writing, the newest version of SPSS is SPSS 17.0; today we will be working in SPSS 16.0.
How to open SPSS
Go to START > PROGRAMS > SPSS INC. The computers in the lab typically also have SPSS on the desktop: look for a red box with SPSS written on the top.
Opening a data file
Click FILE > OPEN > DATA. Browse to MY COMPUTER > LOCAL DISK C: > PROGRAM FILES > SPSS > TUTORIAL > SAMPLE FILES and select CATALOG.SAV.
Basic structure of SPSS
There are two different windows in SPSS:
1st – Data Editor window: shows the data in two forms, Data View and Variable View.
2nd – Output Viewer window: shows the results of data analysis.
*You must save the Data Editor window and the Output Viewer window separately. Make sure to save both if you want to keep your changes to the data and your analysis results.*
Data View vs. Variable View
Data View: rows are cases and columns are variables.
Variable View: rows define the variables (Name, Type, Width, Decimals, Label, Missing, etc.), including each variable's level of measurement:
Scale – numeric measurements such as age, weight, income
Nominal – categories that cannot be ranked (e.g., ID number)
Ordinal – categories that can be ranked (e.g., level of satisfaction)
Entering Data
You can either enter data by hand or import it from other programs (e.g., Excel).
When entering data by hand, insert a variable by:
Right-clicking one of the rows in Variable View and selecting “Insert Variable”
Entering a “Name” in Variable View and pressing Enter or Tab
Right-clicking a column in Data View and selecting “Insert Variable”
Clicking the “Insert Variable” icon in the toolbar
Clicking “Data” > “Insert Variable”
Entering Data
Define variables in Variable View:
Name = name of the variable displayed in Data View
Type = Numeric, Comma, Dot, Scientific notation, Date, Dollar, Custom currency, or String
Width = number of digits (or characters) displayed in Data View
Decimals = number of decimal places displayed in Data View
Label = descriptive name of the variable displayed in the output when running analyses
Values = value labels, e.g., 1 = Male, 0 = Female
Missing = values that the system will treat as missing
Columns = width of the variable's column in Data View
Align = left, right, or center alignment of the variable
Measure = level of measurement: Nominal, Ordinal, or Scale (interval or ratio)
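As a rough analogy only (this is not how SPSS stores a .sav file), the two views can be pictured in Python: Data View as a list of cases, and Variable View as per-variable metadata such as the label, the measurement level, and the value labels. The variables id, gender, and satisfaction and all their values are hypothetical.

```python
# Data View: each row is a case, each column is a variable (hypothetical data).
data_view = [
    {"id": 1, "gender": 1, "satisfaction": 3},
    {"id": 2, "gender": 0, "satisfaction": 5},
    {"id": 3, "gender": 1, "satisfaction": 4},
]

# Variable View: one entry per variable, holding a few of its attributes.
variable_view = {
    "id": {"label": "Subject ID", "measure": "Nominal", "values": {}},
    "gender": {"label": "Gender", "measure": "Nominal",
               "values": {1: "Male", 0: "Female"}},
    "satisfaction": {"label": "Level of satisfaction", "measure": "Ordinal",
                     "values": {}},
}

def show(case, var):
    """Return the value label if one is defined, else the raw value,
    much like Data View with value labels switched on."""
    value = case[var]
    return variable_view[var]["values"].get(value, value)

print([show(case, "gender") for case in data_view])  # ['Male', 'Female', 'Male']
```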
Entering Data
Importing data: click “File” > “Open” > “Data” and select the file type in question.
If importing from Excel: make sure the top row of the Excel file lists the variable names and that every variable has a different name. After selecting the file, click Open, and make sure the box “Read variable names from the first row of data” is checked.
Finally, make sure your variables are defined properly in Variable View.
Menus
File & Edit menus: work the same way as in other Windows programs.
View menu: allows you to customize the SPSS desktop:
Status Bar – the “Processor Area” at the very bottom of the screen
Toolbars
Fonts
Grid Lines
Value Labels – make sure this is selected if you want value labels displayed in Data View
Variables/Data View
Menus
Data menu:
Define Dates… = inserts date variables
Insert Variable
Insert Case
Go to Case…
Sort Cases… = sorts cases in ascending or descending order
Transpose… = switches cases and variables (cases become columns and variables become rows)
Merge Files – more on this later
Split Files – more on this later
Select Cases = keeps only the cases you choose, using one of these options: If condition is satisfied, Random sample of cases, Based on time or case range, or Use filter variable
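Sort Cases and Transpose can be sketched in Python on a small table of cases. The variables id, age, and income and their values are hypothetical, chosen only to make the reordering visible.

```python
# Hypothetical data: each inner list is a case, in the order of `variables`.
variables = ["id", "age", "income"]
cases = [
    [3, 41, 52000],
    [1, 29, 38000],
    [2, 35, 45000],
]

# Sort Cases: order the cases by one variable.
age = variables.index("age")
ascending = sorted(cases, key=lambda case: case[age])
descending = sorted(cases, key=lambda case: case[age], reverse=True)

# Transpose: each former variable becomes a row, each former case a column.
transposed = [list(column) for column in zip(*cases)]

print(ascending)   # [[1, 29, 38000], [2, 35, 45000], [3, 41, 52000]]
print(transposed)  # [[3, 1, 2], [41, 29, 35], [52000, 38000, 45000]]
```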
Splitting and Merging Files
Splitting: click “Data” > “Split File”, then click “Organize output by groups”. The grouping variable should be discrete (e.g., gender, hair color). Move the grouping variable to the “Groups Based on” box and click “OK”.
Merging: you can add either variables or cases. If adding variables:
Make sure both files share at least one identical variable, the key variable (e.g., SubID).
Make sure both files are sorted by this variable.
Make sure that, in both files, every case has a value for this variable and there are no duplicate cases.
Click “Data” > “Merge Files” > “Add Variables” and find the file you wish to merge with the one you have open.
The key variable should appear in the “Excluded Variables” box, marked with a (+) indicating that it is present in both files.
Click “Match cases on key variables in sorted files” and move the key variable to the “Key Variables” box.
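The merge checklist above can be sketched as a one-to-one match on a key variable in Python. This is a simplified illustration, not SPSS's implementation, and it rests on several assumptions: each file is a list of dicts, the files share no variables other than the hypothetical key SubID, and cases found in only one file are kept with None standing in for system-missing.

```python
def _check_key(rows, key, name):
    """Enforce the checklist: every case has a key value, and the keys are
    sorted with no duplicates (i.e. strictly increasing)."""
    keys = [row.get(key) for row in rows]
    if any(k is None for k in keys):
        raise ValueError(f"{name}: every case needs a value for {key}")
    if any(a >= b for a, b in zip(keys, keys[1:])):
        raise ValueError(f"{name}: cases must be sorted by {key} with no duplicates")

def add_variables(active, other, key):
    """Walk both sorted files in step, matching cases on `key` and
    combining their variables into one case per key value."""
    _check_key(active, key, "active file")
    _check_key(other, key, "other file")
    active_vars = [v for v in active[0] if v != key] if active else []
    other_vars = [v for v in other[0] if v != key] if other else []

    merged, i, j = [], 0, 0
    while i < len(active) or j < len(other):
        a_key = active[i][key] if i < len(active) else None
        o_key = other[j][key] if j < len(other) else None
        if o_key is None or (a_key is not None and a_key < o_key):
            case_key, a_row, o_row = a_key, active[i], {}  # only in active file
            i += 1
        elif a_key is None or o_key < a_key:
            case_key, a_row, o_row = o_key, {}, other[j]   # only in other file
            j += 1
        else:  # same key in both files: a matched case
            case_key, a_row, o_row = a_key, active[i], other[j]
            i += 1
            j += 1
        case = {key: case_key}
        case.update({v: a_row.get(v) for v in active_vars})
        case.update({v: o_row.get(v) for v in other_vars})
        merged.append(case)
    return merged

active = [{"SubID": 1, "age": 20}, {"SubID": 2, "age": 31}, {"SubID": 4, "age": 27}]
other = [{"SubID": 1, "score": 88}, {"SubID": 2, "score": 75}, {"SubID": 3, "score": 90}]
merged = add_variables(active, other, "SubID")
for case in merged:
    print(case)
```

Walking two sorted lists in step is why the sort requirement matters: each file is read once, front to back, and a case can only be matched against the current position in the other file.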
Menus
Transform menu:
Compute… = name the new variable in the “Target Variable” box and type the equation in the “Numeric Expression” box.
Recode – Into Same/Different Variables = select the variable(s) to recode and move them to the “Variables” box, click “Old and New Values” to define the recoding, then click “OK”.
Obtaining Descriptive Statistics
Click “Analyze” > “Descriptive Statistics” > “Frequencies”.
Use Frequencies to count how often each value of a variable occurs and to obtain cut scores and percentiles.
SPSS Output for Frequency Distribution
Relative Frequency Distribution
Relative Frequency Distribution of IQ for Two Classes
IQ       Frequency   Percent   Valid Percent   Cumulative Percent
82.00    1           4.2       4.2             4.2
87.00    1           4.2       4.2             8.3
89.00    1           4.2       4.2             12.5
93.00    2           8.3       8.3             20.8
96.00    1           4.2       4.2             25.0
97.00    1           4.2       4.2             29.2
98.00    1           4.2       4.2             33.3
102.00   1           4.2       4.2             37.5
103.00   1           4.2       4.2             41.7
105.00   1           4.2       4.2             45.8
106.00   1           4.2       4.2             50.0
107.00   1           4.2       4.2             54.2
109.00   1           4.2       4.2             58.3
111.00   1           4.2       4.2             62.5
115.00   1           4.2       4.2             66.7
119.00   1           4.2       4.2             70.8
120.00   1           4.2       4.2             75.0
127.00   1           4.2       4.2             79.2
128.00   1           4.2       4.2             83.3
131.00   2           8.3       8.3             91.7
140.00   1           4.2       4.2             95.8
162.00   1           4.2       4.2             100.0
Total    24          100.0     100.0
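Each column of a frequency table follows directly from the raw scores. A minimal Python sketch, using the 24 IQ scores summarized in this table; with no missing data, Valid Percent equals Percent, so that column is omitted here.

```python
from collections import Counter

# The 24 IQ scores summarized in the frequency table.
iq = [82, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106,
      107, 109, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

def frequency_table(values):
    """Return (value, frequency, percent, cumulative percent) rows,
    with percentages rounded to one decimal place."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     round(100 * counts[value] / n, 1),
                     round(100 * running / n, 1)))
    return rows

table = frequency_table(iq)
for row in table:
    print(*row)
```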
Grouped Relative Frequency Distribution
Relative Frequency Distribution of IQ for Two Classes
IQ             Frequency   Percent   Cumulative Percent
80–89          3           12.5      12.5
90–99          5           20.8      33.3
100–109        6           25.0      58.3
110–119        3           12.5      70.8
120–129        3           12.5      83.3
130–139        2           8.3       91.7
140–149        1           4.2       95.8
150 and over   1           4.2       100.0
Total          24          100.0     100.0
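Grouping works the same way, except that scores are counted per class interval. A Python sketch assuming intervals 10 points wide starting at 80, plus one open-ended class for 150 and over; the helper grouped_table is hypothetical.

```python
iq = [82, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106,
      107, 109, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

def grouped_table(values, start=80, width=10, open_from=150):
    """Count scores in class intervals `width` points wide beginning at
    `start`, with one open-ended class for scores >= open_from.
    Scores below `start` are not counted in this sketch."""
    n = len(values)
    classes = []
    for lo in range(start, open_from, width):
        count = sum(lo <= v < lo + width for v in values)
        classes.append((f"{lo}-{lo + width - 1}", count))
    classes.append((f"{open_from} and over", sum(v >= open_from for v in values)))

    rows, running = [], 0
    for label, count in classes:
        running += count
        rows.append((label, count, round(100 * count / n, 1),
                     round(100 * running / n, 1)))
    return rows

rows = grouped_table(iq)
for row in rows:
    print(*row)
```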
Descriptives
Click “Analyze” > “Descriptive Statistics” > “Descriptives”.
Use Descriptives to get summary statistics such as central tendency and variability, and to convert variables to z-scores (check “Save standardized values as variables”).
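A z-score expresses each value as its distance from the mean in standard-deviation units, z = (x - mean) / SD. A Python sketch on the Class A IQ scores used in this lecture, assuming the sample standard deviation (n - 1 denominator) is the one used for standardizing.

```python
import statistics

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

def z_scores(values):
    """Standardize: subtract the mean, divide by the sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

z = z_scores(class_a)
print([round(value, 2) for value in z])
```

Whatever the original units, the resulting z-scores have a mean of 0 and a standard deviation of 1, which is what makes scores on different scales comparable.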
Explore
Click “Analyze” > “Descriptive Statistics” > “Explore”.
Use Explore to examine descriptive statistics separately for each group of a grouping variable (placed in the “Factor List”).
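What Explore does with a grouping variable can be sketched in Python: split the cases by group, then summarize each group separately. The scores are the Class A and Class B IQs from this lecture; the handful of statistics reported here is an assumption, and SPSS's Explore output contains many more.

```python
from collections import defaultdict
import statistics

# One (group, score) pair per case; the group label is the grouping variable.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]
cases = [("A", iq) for iq in class_a] + [("B", iq) for iq in class_b]

def explore(cases):
    """Summarize the scores separately for each level of the grouping variable."""
    groups = defaultdict(list)
    for group, score in cases:
        groups[group].append(score)
    return {
        group: {
            "n": len(scores),
            "mean": round(statistics.mean(scores), 2),
            "sd": round(statistics.stdev(scores), 2),
            "min": min(scores),
            "max": max(scores),
        }
        for group, scores in sorted(groups.items())
    }

summary = explore(cases)
for group, stats in summary.items():
    print(group, stats)
```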
Cleaning your data – missing data
There are two types of missing values in SPSS: system-missing and user-defined. System-missing values are assigned by SPSS when a function cannot be performed (for example, when dividing a number by zero) and to numeric cells left blank. SPSS marks a system-missing value with a single period (.) in the data cell.
Cleaning your data – missing data
User-defined missing values are values that the researcher tells SPSS to treat as missing. For example, 9999 is a common user-defined missing value. To define a variable's user-defined missing values:
Go to VARIABLE VIEW and find the column labeled MISSING.
Find the variable you would like to work with and click the gray box at the right of that variable's Missing cell.
Enter up to three discrete missing values, or a range of values (plus one optional discrete value) if you want to exclude part of a scale from analysis.
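Why declaring 9999 as missing matters can be shown with a short Python sketch. Here float("nan") stands in for system-missing and 9999 is the user-defined missing code; both are assumptions for illustration, and safe_divide and is_missing are hypothetical helpers.

```python
import math
import statistics

SYSTEM_MISSING = float("nan")  # stand-in for SPSS's system-missing "."
USER_MISSING = {9999}          # user-defined missing code (assumed)

def safe_divide(a, b):
    """Return system-missing instead of failing when dividing by zero."""
    return a / b if b != 0 else SYSTEM_MISSING

def is_missing(value, user_missing=USER_MISSING):
    """True for system-missing (NaN) or a declared user-missing code."""
    if isinstance(value, float) and math.isnan(value):
        return True
    return value in user_missing

responses = [4, 5, 9999, 3, safe_divide(1, 0), 4]

# Declared: both kinds of missing value are excluded.
valid = [v for v in responses if not is_missing(v)]
# Not declared: only system-missing is excluded, so 9999 counts as a score.
undeclared = [v for v in responses if not (isinstance(v, float) and math.isnan(v))]

print(statistics.mean(valid))       # 4
print(statistics.mean(undeclared))  # 2003
```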
Cleaning your data – missing data cont.
When your data set has missing values, you can replace them with estimates based on the surrounding data so that those cases can still be used in your analysis.
Click TRANSFORM > REPLACE MISSING VALUES.
Select the variable with missing values and move it to the right using the arrow. SPSS will create and name a new variable containing the filled-in data.
Click METHOD to choose how SPSS should replace the missing values (series mean, mean of nearby points, median of nearby points, linear interpolation, or linear trend at point).
Click OK and view the new variable in Data View.
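Two of the replacement methods can be sketched in Python to show what they do: series mean fills every gap with the mean of the valid values, while linear interpolation draws a straight line between the nearest valid values on either side of a gap. None stands in for system-missing, the sales figures are hypothetical, and this sketch simply leaves gaps at the very start or end of the series unfilled.

```python
import statistics

def replace_series_mean(values):
    """Fill every missing value (None) with the mean of the valid values."""
    fill = statistics.mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def replace_linear_interpolation(values):
    """Fill each gap between two valid values along a straight line joining
    them. Gaps at the very start or end are left as None in this sketch."""
    result = list(values)
    valid = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(valid, valid[1:]):
        step = (values[b] - values[a]) / (b - a)
        for i in range(a + 1, b):
            result[i] = values[a] + step * (i - a)
    return result

sales = [10, None, 14, 16, None, None, 22]  # hypothetical series
print(replace_series_mean(sales))           # [10, 15.5, 14, 16, 15.5, 15.5, 22]
print(replace_linear_interpolation(sales))  # [10, 12.0, 14, 16, 18.0, 20.0, 22]
```

The series mean ignores where a gap falls, whereas interpolation uses the neighboring values, which usually suits data ordered in time.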
Graphing Data
Click GRAPHS > CHART BUILDER and choose HISTOGRAM. Put MEN on the X axis.
Click ELEMENT PROPERTIES and check the box labeled DISPLAY NORMAL CURVE; this overlays a normal curve on your histogram. You can also change the style of your graph in the Element Properties window.
You can copy and paste these graphs into Word and Excel files.
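Assuming the normal curve is drawn with the sample's own mean and standard deviation, its height over each histogram bar corresponds to the count a normal distribution would predict for that bin. A Python sketch using the combined Class A and B IQ scores from this lecture, with bins 10 points wide (the bin choice is an assumption):

```python
from statistics import NormalDist, mean, stdev

iq = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
      109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

def histogram_with_normal(values, start, width, bins):
    """Observed count per bin next to the count a fitted normal curve predicts."""
    n = len(values)
    curve = NormalDist(mean(values), stdev(values))
    rows = []
    for k in range(bins):
        lo = start + k * width
        hi = lo + width
        observed = sum(lo <= v < hi for v in values)
        expected = n * (curve.cdf(hi) - curve.cdf(lo))
        rows.append((lo, hi, observed, round(expected, 1)))
    return rows

rows = histogram_with_normal(iq, start=80, width=10, bins=9)
for lo, hi, observed, expected in rows:
    print(f"{lo}-{hi - 1}: {'#' * observed:<8} normal curve: {expected}")
```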
Graphing Continued
There are other ways to make graphs. Click ANALYZE > DESCRIPTIVE STATISTICS > FREQUENCIES and select the SERVICE variable. Click CHARTS, select BAR CHARTS, and choose PERCENTAGES as the chart values.
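Data manipulation – select cases
By selecting cases, the researcher can restrict an analysis to certain cases only. Click DATA > SELECT CASES, choose RANDOM SAMPLE OF CASES, click SAMPLE, and set your preferences (approximately a given percentage of all cases, or exactly a given number of cases).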
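The two random-sample options (approximately a percentage of all cases, or exactly a number of cases from the first N) can be sketched in Python. The helper names and case numbers are hypothetical, and SPSS uses its own random number generator, so the cases drawn will differ from Python's.

```python
import random

def sample_approximately(cases, percent, rng):
    """Keep each case independently with probability percent / 100,
    so the sample size is only approximately percent% of the cases."""
    return [case for case in cases if rng.random() < percent / 100]

def sample_exactly(cases, k, first_m, rng):
    """Draw exactly k cases at random from the first first_m cases."""
    return rng.sample(cases[:first_m], k)

rng = random.Random(2024)     # fixed seed so the draw is reproducible
cases = list(range(1, 101))   # hypothetical case numbers 1 to 100

approx = sample_approximately(cases, 10, rng)
exact = sample_exactly(cases, 10, 50, rng)
print(len(approx), sorted(exact))
```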
Data manipulation – compute new variable
Computing a new variable creates it from one or more existing variables. Click TRANSFORM > COMPUTE. Enter TOTALSALES as the target variable and men+women+jewel as the numeric expression. To add a condition, click the IF button, click INCLUDE IF CASE SATISFIES CONDITION, and enter the condition MAIL>10000.
The new variable TOTALSALES gives the total sales only for cases in which more than 10,000 catalogs were mailed; all other cases receive a system-missing value.
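The conditional compute can be sketched in Python, with None standing in for system-missing. The variable names follow the slide, but the three cases and their values are made up for illustration (they are not from catalog.sav), and compute_if is a hypothetical helper.

```python
# Hypothetical cases (not real catalog.sav values), one dict per case.
cases = [
    {"men": 100, "women": 150, "jewel": 50, "mail": 12000},
    {"men": 80, "women": 120, "jewel": 40, "mail": 9000},
    {"men": 90, "women": 130, "jewel": 60, "mail": 10000},
]

def compute_if(cases, target, expression, condition):
    """Set target = expression(case) where condition(case) holds;
    otherwise leave it missing (None), like a system-missing value."""
    for case in cases:
        case[target] = expression(case) if condition(case) else None
    return cases

compute_if(
    cases,
    "totalsales",
    expression=lambda c: c["men"] + c["women"] + c["jewel"],
    condition=lambda c: c["mail"] > 10000,  # strictly more than 10,000
)
print([c["totalsales"] for c in cases])  # [300, None, None]
```

The third case shows that the condition is strict: exactly 10,000 catalogs mailed does not qualify.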
Mean
Class A (IQs of 13 students): 102 115 128 109 131 89 98 106 140 119 93 97 110
Class B (IQs of 13 students): 127 162 131 103 96 111 80 109 93 87 120 105 109
$\bar{Y}_A = \frac{\sum Y_i}{n} = \frac{1437}{13} = 110.54 \qquad \bar{Y}_B = \frac{\sum Y_i}{n} = \frac{1433}{13} = 110.23$
Mean
The mean is the “balance point.” Think of each person's score as a 1-lb weight placed at the score's position on a see-saw. For three scores placed on a 200 cm see-saw at 93 cm, 106 cm, and 131 cm, the mean is 110 cm, the point where a fulcrum finds balance: 93 is 17 units below it, 106 is 4 units below it, and 131 is 21 units above it. The see-saw balances because 17 + 4 on the left = 21 on the right.
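The balance-point property means that the deviations from the mean always sum to zero. A Python check using exact fractions (so no rounding error creeps in), on the three see-saw scores and on the Class A IQ scores:

```python
from fractions import Fraction

def deviations(values):
    """Return the exact mean and each value's deviation from it."""
    mean = Fraction(sum(values), len(values))
    return mean, [x - mean for x in values]

seesaw_mean, seesaw_devs = deviations([93, 106, 131])
print(seesaw_mean, [int(d) for d in seesaw_devs])  # 110 [-17, -4, 21]

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_a_mean, class_a_devs = deviations(class_a)
print(class_a_mean, sum(class_a_devs))  # 1437/13 0
```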
Mean
Means can be badly affected by outliers (data points with extreme values unlike the rest). Outliers can make the mean a poor measure of central tendency or of common experience.
[Figure: Income in the U.S., showing All of Us, the Bill Gates outlier, and the mean]
Median
The median is the middle value when a variable's values are ranked in order: the point that divides a distribution into two equal halves. When the data are listed in order, 50% of the cases lie above the median and 50% below it. The median is the 50th percentile.
Median
Class A (IQs of 13 students, ranked): 89 93 97 98 102 106 109 110 115 119 128 131 140
Median = 109 (six cases above, six below)
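Median
If the first student in the ranked list (IQ 89) were to drop out of Class A, there would be a new median:
93 97 98 102 106 109 110 115 119 128 131 140
With an even number of cases, the median is the average of the two middle values: Median = (109 + 110) / 2 = 219 / 2 = 109.5 (six cases above, six below)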
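The rule for odd and even numbers of cases can be written as a short Python function and checked on Class A, before and after the first ranked student drops out (Python's statistics.median applies the same rule):

```python
def median(values):
    """Middle value of the ranked data; the average of the two middle
    values when the number of cases is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

class_a = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]
print(median(class_a))      # 109
print(median(class_a[1:]))  # 109.5 once the first ranked student (89) drops out
```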
Median
The median is resistant to outliers: making an extreme value even more extreme does not change it. This makes the median a better measure of central tendency than the mean when data are skewed, because it better describes the “typical person.”
[Figure: All of Us, with Bill Gates as an outlier]
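The difference shows up quickly in a Python sketch with hypothetical incomes: one extreme value drags the mean far upward, while the median barely moves, and it does not change at all when the outlier becomes even more extreme.

```python
import statistics

incomes = [30, 35, 40, 45, 50]          # hypothetical incomes, in $1,000s
with_outlier = incomes + [100_000]      # one extreme value added
more_extreme = incomes + [1_000_000]    # the same outlier, even larger

for data in (incomes, with_outlier, more_extreme):
    print(statistics.mean(data), statistics.median(data))
# 40 40
# 16700 42.5
# 166700 42.5
```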
Median
If the recorded values of a variable form a symmetric distribution, the median and the mean are identical. In skewed data, the mean lies further toward the skew (the long tail) than the median.
[Figure: symmetric and skewed distributions with the mean and median marked]
Median
The middle score or measurement in a set of ranked scores or measurements; the point that divides a distribution into two equal halves. With the data listed in order, the median is the point at which 50% of the cases are above and 50% below: the 50th percentile.
Mode
A la mode!
The most common data point is called the mode. The combined IQ scores for Classes A & B:
80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120 127 128 131 131 140 162
Here the mode is 109, which occurs three times. Note that it is possible to have more than one mode!
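Finding the mode amounts to counting how often each value occurs and keeping the most frequent ones; returning a list allows for more than one mode. A Python sketch on the combined IQ scores (Python's statistics.multimode does the same job):

```python
from collections import Counter
import statistics

def modes(values):
    """Return every value tied for the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

combined = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
            109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]
print(modes(combined))                 # [109]
print(modes([1, 1, 2, 3, 3]))          # [1, 3]: two modes (bimodal)
print(statistics.multimode(combined))  # [109]
```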
Mode
The mode may not be at the center of a distribution. A distribution with two peaks is called “bimodal” (even statistics can be open-minded).
[Figure: a bimodal distribution]
Mode
The mode may give you the most likely experience rather than the “typical” or “central” experience. In symmetric, unimodal distributions, the mean, median, and mode are the same. In skewed data, the mean and median lie further toward the skew than the mode.
[Figure: symmetric and skewed distributions with the mean, median, and mode marked]
Descriptive Statistics
Summarizing data:
Central tendency (or groups' “middle values”): mean, median, mode
Variation (or summary of differences within groups): range, interquartile range, variance, standard deviation
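The four measures of variation can be computed for Class A's IQ scores with Python's statistics module. Quartiles (and therefore the IQR) depend on the percentile definition used; this sketch uses the module's default "exclusive" method, which may not match every SPSS procedure.

```python
import statistics

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

value_range = max(class_a) - min(class_a)
q1, _, q3 = statistics.quantiles(class_a, n=4)  # default method="exclusive"
iqr = q3 - q1
variance = statistics.variance(class_a)  # sample variance, n - 1 denominator
sd = statistics.stdev(class_a)           # square root of the sample variance

print(value_range, q1, q3, iqr)  # 51 97.5 123.5 26.0
print(round(variance, 2), round(sd, 2))
```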
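Data manipulation – recode a variable
Recoding lets a researcher create a new variable by regrouping the values of an existing one (for example, turning the number of catalogs mailed into categories). Click TRANSFORM > RECODE INTO DIFFERENT VARIABLES, move MAIL to the right, enter a name for the new variable (mailcategories), click CHANGE, and then click OLD AND NEW VALUES to define which old values map to each new value.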
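The old-to-new value mapping behind a recode into a different variable can be sketched in Python. The cut points and labels for mailcategories below are hypothetical (the slide does not specify them), and the original MAIL values are left untouched, as with a recode into a different variable.

```python
def recode_mail(mail):
    """Hypothetical cut points: 1 = low (under 5,000 catalogs mailed),
    2 = medium (5,000 to 9,999), 3 = high (10,000 or more)."""
    if mail is None:
        return None  # keep missing values missing
    if mail < 5000:
        return 1
    if mail < 10000:
        return 2
    return 3

value_labels = {1: "low", 2: "medium", 3: "high"}
mail = [3200, 7500, 10000, 15000, None]  # hypothetical MAIL values
mailcategories = [recode_mail(m) for m in mail]
print([value_labels.get(c) for c in mailcategories])
# ['low', 'medium', 'high', 'high', None]
```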
Other Resources
There are many resources online to help you learn SPSS (tutorials, blogs, etc.).
CSSCR has a QuickTime SPSS class on its website.
CSSCR offers SPSS handouts, which are also on its website.
CSSCR offers classes on SPSS each quarter: come back for the SPSS Beyond the Basics class!