Organizing & Reporting Data: An Intro
– Statistical analysis works with data sets
– A data set is a collection of data values on some variables, recorded for a number of cases (records)
– For example, the student data from last week:
Organizing & Reporting Data (cont.):
Structure of most data sets = “rectangular”
– Columns = Variables
– Rows = Cases
– Cells = individual values
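The rectangular layout can be sketched in a few lines of Python. This is only an illustration: the variable names (`id`, `age`, `gpa`) and all values are made up and are not the student data referred to in the lecture.

```python
# Hypothetical demonstration values; not the student data from the lecture.
variables = ["id", "age", "gpa"]   # columns = variables
cases = [                          # rows = cases
    [1, 19, 3.2],
    [2, 22, 2.8],
    [3, 20, 3.6],
]

# A cell is the value of one variable for one case.
row, col = 1, variables.index("age")
cell = cases[row][col]             # age of the second case

# "Rectangular" means every case has one slot for every variable.
is_rectangular = all(len(case) == len(variables) for case in cases)

# Reading down a column gives all values of one variable.
ages = [case[col] for case in cases]
```

Statistical packages such as SPSS show the same arrangement in their data editor: one row per case, one column per variable.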
Managing Data: Basic Tasks
NOTE: Reliance on the Codebook for the Data Set
– Specifies information about the variables in the data set
– Indicates Variable Names & Labels
– Indicates Variable Values (codes) & Value Labels
– Indicates “missing values”
Can Modify the Overall Arrangement of the Data Set
– Sorting: change the order of the cases in the file
– Selecting: identify a subset of cases to work on
– Transforming: modify the values of a variable
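The three arrangement tasks can be sketched as follows. The cases, the variable names (`age`, `income`) and the cut-off of 30 are hypothetical values chosen only for the demonstration.

```python
# Hypothetical demonstration cases (made-up values).
cases = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 19, "income": 18000},
    {"id": 3, "age": 51, "income": 75000},
    {"id": 4, "age": 27, "income": 41000},
]

# Sorting: change the order of the cases (here, by increasing age).
by_age = sorted(cases, key=lambda c: c["age"])

# Selecting: identify a subset of cases to work on (here, age 30 or over).
age_30_plus = [c for c in cases if c["age"] >= 30]

# Transforming: modify the values of a variable (income in thousands).
# New dictionaries are built so the original cases stay unchanged.
in_thousands = [{**c, "income": c["income"] / 1000} for c in cases]
```

Sorting and selecting leave the values themselves alone; only transforming changes what is stored for a variable.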
Organizing & Reporting Data (cont.):
Where do the data values come from?
a) Raw Data: recorded from responses, records, or observations
– In their (more-or-less) original form
– Some coding (or editing) operations usually involved
– Usually coded into numerical values (for ease of use)
b) Transformed Data: modified from original values
– Computed values (e.g., rates, %, sums, “imputations”)
– Recoded values (into more correct, meaningful, or useful values)
c) Created Data: values are “made up”
– Simulated values
– Demonstration values
Managing Data: Basic Tasks
Transforming Data: Variable Transformations
a) Computing new variables from prior ones
– Index = Q1 + Q2 + Q3 + Q4
– Utility = probability * outcome
b) Recoding a variable by changing its values
– Change missing values (“blanks”) to “0”
c) Recoding a variable into a new variable
– Age (yrs) → Child (1-11); Juvenile (12-17); Adult (18 and over)
– Age (yrs) → yrs; yrs; yrs; yrs; yrs; yrs; yrs; yrs; yrs.
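A minimal sketch of the three transformations, using the slide's Index formula and three-category age recode. The records are made up, `None` is assumed to stand for a blank answer, and the helper name `age_group` is hypothetical; ages below 1 are treated as Child here, which is an assumption.

```python
# Hypothetical survey records; None marks a blank (missing) answer.
cases = [
    {"Q1": 3, "Q2": 4, "Q3": None, "Q4": 5, "age": 9},
    {"Q1": 2, "Q2": 2, "Q3": 1, "Q4": 3, "age": 15},
    {"Q1": 5, "Q2": None, "Q3": 4, "Q4": 4, "age": 42},
]
ITEMS = ["Q1", "Q2", "Q3", "Q4"]


def age_group(age):
    """Recode age in years into the slide's three categories."""
    if age <= 11:
        return "Child"
    if age <= 17:
        return "Juvenile"
    return "Adult"


for case in cases:
    # b) Recode in place: change missing values (blanks) to 0.
    for item in ITEMS:
        if case[item] is None:
            case[item] = 0
    # a) Compute a new variable from prior ones: Index = Q1 + Q2 + Q3 + Q4.
    case["index"] = sum(case[item] for item in ITEMS)
    # c) Recode into a new variable; the original age is left as is.
    case["age_group"] = age_group(case["age"])
```

Whether 0 is a sensible replacement for a blank depends on the variable; the sketch simply follows the slide's example.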
Computed Data: Some Useful Forms
– Rates: numbers divided by populations
– Ratios: one number divided by another
– Indexes: new variable = a sum (or other combination) of multiple prior variables
– Rescaled Data: a raw score modified by some mathematical function (e.g., logarithm)
– Standardized scores: rescaled to standard units, e.g., Z-scores
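The computed forms above can be illustrated with a short sketch. All numbers are made up. The Z-scores here use the population standard deviation (`statistics.pstdev`); using the sample standard deviation (`statistics.stdev`) instead is an equally common choice.

```python
import math
import statistics

# Hypothetical demonstration values (made up for illustration).
events, population = 45, 15000
rate_per_1000 = events * 1000 / population   # rate: events per 1,000 people

women, men = 60, 40
ratio = women / men                          # ratio: one number over another

scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)               # population standard deviation

# Standardized scores: distance from the mean in standard-deviation units.
z_scores = [(x - mean) / sd for x in scores]

# Rescaled data: raw values passed through a function (base-10 logarithm).
log_scores = [math.log10(x) for x in [1, 10, 100, 1000]]
```

After standardizing, the Z-scores have a mean of 0 and, with the population formula, a standard deviation of 1.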
Recoded Data: Some Useful Forms
Collapsed (& abbreviated) scores
Grouped scores: recoding a numeric variable into a discrete (numeric or ordinal) variable
– Uniform (or fixed-width) groupings: widths of groups are all the same [Note the standard rules for forming grouped variables]
– Non-uniform (variable or flexible) groupings: widths of groups are not all the same
– Normed groupings: grouped by proportions of cases, e.g., percentiles, quartiles, median-splits [a special form of non-uniform grouping]
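The three kinds of grouping can be sketched as below. The ages, the group width of 20, the cut-points, and the helper names `fixed_width_group` and `cutpoint_group` are all assumptions made for the demonstration.

```python
import statistics

# Hypothetical ages in years (made-up demonstration values).
ages = [3, 8, 14, 19, 23, 31, 38, 45, 52, 67, 71, 84]


def fixed_width_group(value, width=20):
    """Uniform grouping: every group spans the same width."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"


def cutpoint_group(value, cutpoints, labels):
    """Non-uniform grouping: groups defined by unequal upper bounds."""
    for upper, label in zip(cutpoints, labels):
        if value <= upper:
            return label
    return labels[-1]   # above the last cut-point


uniform = [fixed_width_group(a) for a in ages]
non_uniform = [
    cutpoint_group(a, [11, 17], ["Child", "Juvenile", "Adult"]) for a in ages
]

# Normed grouping: a median split puts about half the cases in each group.
median = statistics.median(ages)
median_split = ["low" if a <= median else "high" for a in ages]
```

In the median split the group boundaries come from the data themselves, so the group widths depend on how the cases are distributed.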
How to recode variables in SPSS?
Use the Transform option on the top menu bar to change the data (see Appendix B in Kirkpatrick/Feeney for details)
– Compute allows for computing a new variable from prior variables
– Recode allows for modifying how a variable is coded
a) ‘Into same variables’ (change the original variable)
b) ‘Into different variables’ (create a new variable with different codes & leave the original variable as is)
Representing Data Distributions:
– In statistics, we are working with a collection of many data points
– Our focus is on the distribution of the whole set of points
Three forms of presentation for summarizing distributions of data points:
1. Tabular: tables and lists of numbers
2. Graphical: pictures, shapes, and lines (in charts, graphs, and diagrams)
3. Verbal: words and phrases
Tabular Presentations: Basic Formats
1) Data Listing: simple inventory of points in the data set
2) Ordered Data Listing: inventory of data sorted into groups or arranged in increasing or decreasing order
3) Frequency Table: summary showing each value and the number of cases having that value (most relevant for discrete variables)
4) Percentage Table: table with percentages of total cases given rather than (or in addition to) numerical counts
5) Cumulative Percentage Table: table reporting, for each value, the percentage of total cases that have that value or lower
6) Cross-Tab Table: a “bivariate” frequency distribution of the values of one variable across the values of another variable
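Formats 3) to 5) can be built from a simple count of values. The sketch below is one way to do it; the responses are made-up demonstration values on a 1-5 scale.

```python
from collections import Counter

# Hypothetical responses on a 1-5 scale (made-up demonstration values).
values = [3, 1, 2, 3, 3, 5, 2, 4, 3, 2]

counts = Counter(values)   # frequency of each distinct value
n = len(values)

table = []                 # rows: (value, frequency, percent, cumulative percent)
running = 0
for value in sorted(counts):
    freq = counts[value]
    running += freq        # cases with this value or lower
    table.append((value, freq, 100 * freq / n, 100 * running / n))

print(f"{'Value':>5} {'Freq':>5} {'Pct':>7} {'CumPct':>7}")
for value, freq, pct, cum_pct in table:
    print(f"{value:>5} {freq:>5} {pct:>7.1f} {cum_pct:>7.1f}")
```

The cumulative column only makes sense when the values can be ordered, which is why the rows are sorted before accumulating.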
Cross-Tabulations (cont.)
What are the parts of a cross-tab?
a) Cells
b) Rows and columns
c) Marginals
d) Grand total
How to set up a cross-tab?
a) Which variables are in the rows and columns?
b) Use Percentages or Frequencies?
c) How to percentage a cross-tab?
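The parts of a cross-tab can be sketched from paired values. The cases and category labels below are made up, and percentaging by rows is only one possible choice; a common convention is to percentage within the categories of the independent variable, whichever way that variable is placed.

```python
from collections import Counter

# Hypothetical cases as (row variable, column variable) pairs.
cases = [
    ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("male", "yes"), ("male", "no"), ("male", "no"),
    ("female", "yes"), ("male", "no"),
]

cells = Counter(cases)                       # cell frequencies
row_totals = Counter(r for r, _ in cases)    # row marginals
col_totals = Counter(c for _, c in cases)    # column marginals
grand_total = len(cases)                     # grand total

# Row percentages: each cell as a percentage of its row total.
row_pct = {
    (r, c): 100 * cells[(r, c)] / row_totals[r]
    for r in row_totals
    for c in col_totals
}
```

With row percentages, each row sums to 100 and the rows can be compared directly even when their totals differ.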
Representing Distributions Graphically: Basic Formats
Pie Charts
Bar Charts
– Vertical or Horizontal
– Simple or Grouped
– Stacked
Histograms
Line Charts
– Frequency polygons
– Time (Trend) plots
– Relationship plots
Representing Distributions Graphically: Basic Formats
Other Charts (to be dealt with later):
– Box Plots (aka “Box-and-Whiskers”)
– Scatter Plots