Stretching Your Data Management Skills Chuck Humphrey University of Alberta Atlantic DLI Workshop 2003.



Outline Two Topics
- Aggregation: a review of the CTS
- Finding the ‘smoking gun’: a review of variables and CTUMS

Aggregation In the 2002 Atlantic DLI workshop, we spent some time examining the importance of the unit of analysis in defining statistical data structure.

Unit of Analysis The unit of analysis is the object(s) about which data have been collected and about which generalizations are being sought.

Statistical Data Structure Each member of the unit of analysis is a separate row in the data structure.

Statistical Data Structure [Diagram: one row per case: Case 1, Case 2, Case 3, ..., Case n]

Statistical Data Structure All of the information collected for each member of the unit of analysis is organized in fixed locations in the file. These fixed locations are called variables.

Statistical Data Structure [Diagram: a grid with one row per case (Case 1, Case 2, Case 3, ..., Case n) and one column per variable (Variable 1, Variable 2, Variable 3, ..., Variable k-1, Variable k)]

Canadian Travel Survey Our exercise last year used two of four files from the Canadian Travel Survey: the person file and trip file.

Canadian Travel Survey Our assignment last year was to link information from the trip file about the respondents’ modes of transportation with information about the traveller in the person file.

Canadian Travel Survey [Diagram: the Person microdata file (non-travellers and travellers) and the Trip microdata file. Key: linkable via UNIQID]

Canadian Travel Survey The data management problem was finding a way to attach the information from the many trips taken by one person to the single record for this individual in the person file.

Canadian Travel Survey For the person who took only one trip, the match between the trip and person files is one-to-one.
Person File: Uniqid = 1, Tottrip = 1
Trip File: Uniqid = 1, Tripnum = 1

Canadian Travel Survey For the person who took two or more trips, the match between the trip and person files is many-to-one.
Person File: Uniqid = 19, Tottrip = 2
Trip File: Uniqid = 19, Tripnum = 1; Uniqid = 19, Tripnum = 2
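
The difference between the two kinds of match can be checked by counting trip records per UNIQID. The following is a minimal Python sketch, not the SPSS syntax used in the workshop; the records and the lower-case field names are made up for illustration.

```python
from collections import Counter

# Hypothetical trip-file records: one dict per trip (key fields only).
trips = [
    {"uniqid": 1, "tripnum": 1},
    {"uniqid": 19, "tripnum": 1},
    {"uniqid": 19, "tripnum": 2},
]

# Count how many trip records share each UNIQID.
trips_per_person = Counter(t["uniqid"] for t in trips)

# One trip record gives a one-to-one match; several give many-to-one.
match_type = {
    uid: "one-to-one" if n == 1 else "many-to-one"
    for uid, n in trips_per_person.items()
}
print(match_type)
```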

Canadian Travel Survey Our strategy was to summarize the mode information in the trip file for each traveller to create a one-to-one match between the summarized trip file and the person file.

Canadian Travel Survey That is, we needed to summarize the two trips for respondent 19 into one record for this person while capturing the mode of travel information.
Person File: Uniqid = 19, Tottrip = 2
Trip File: Uniqid = 19, Tripnum = 1; Uniqid = 19, Tripnum = 2

Canadian Travel Survey This summary strategy relied on aggregating over trips to produce one record per respondent in the trip file. More about this in a minute.

Canadian Travel Survey The first step was to read the raw data from the person file into SPSS, to sort the cases by UNIQID, and to write the result to a .sav file. [Diagram: Person File, sorted by UNIQID]

Canadian Travel Survey The next step was to read the raw data from the trip file, to sort the cases by UNIQID and TRIPNUM, and to save this file. [Diagram: Trip File, sorted by UNIQID and TRIPNUM]
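
The two-key sort can be pictured with a short Python sketch. This is an illustration only, not the SPSS commands from the workshop, and the records are made up.

```python
from operator import itemgetter

# Hypothetical, unsorted trip records (key fields only).
trips = [
    {"uniqid": 19, "tripnum": 2},
    {"uniqid": 1, "tripnum": 1},
    {"uniqid": 19, "tripnum": 1},
]

# Sort by UNIQID first, then by TRIPNUM within each UNIQID.
trips_sorted = sorted(trips, key=itemgetter("uniqid", "tripnum"))

for rec in trips_sorted:
    print(rec["uniqid"], rec["tripnum"])
```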

Canadian Travel Survey How do we summarize the modes of transportation in the trip file?

Canadian Travel Survey The strategy was to convert the six categories of mode into six variables where each category of travel mode was represented by one of these new variables.

Canadian Travel Survey [Diagram: the six categories of Mode (1. Car, 2. Air, 3. Bus, 4. Rail, 5. Boat, 6. Other) each become one new variable: Car, Air, Bus, Rail, Boat, Other]

Canadian Travel Survey For each trip, the value of the mode variable was used to assign a value of one to the variable representing this mode of travel.
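
As a rough illustration of this recoding step, the Python sketch below turns one mode code into six 0/1 variables. It is not the workshop's SPSS syntax; the function name and the example record are hypothetical, while the six mode codes follow the list given for the trip file.

```python
# Mode codes as listed for the trip file: 1 Car ... 6 Other.
MODE_NAMES = {1: "car", 2: "air", 3: "bus", 4: "rail", 5: "boat", 6: "other"}

def add_mode_indicators(trip):
    """Return a copy of a trip record with six 0/1 mode variables added."""
    coded = dict(trip)
    for code, name in MODE_NAMES.items():
        # The variable for the trip's own mode gets 1; all others get 0.
        coded[name] = 1 if trip["mode"] == code else 0
    return coded

# Hypothetical trip record: a trip taken by air (mode = 2).
trip = {"uniqid": 1157, "tripnum": 1, "mode": 2}
coded = add_mode_indicators(trip)
print(coded)
```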

Canadian Travel Survey [Table columns: Uniqid, Tripnum, Mode, Car, Air, Bus, Rail, Boat, Other]

Canadian Travel Survey After creating these six new variables for each trip in the trip file, the next step was to add within each unique id the number of trips taken using each of the six modes of transportation.

Canadian Travel Survey [Table columns before aggregation: Uniqid, Tripnum, Mode, Car, Air, Bus, Rail, Boat, Other. Table columns after aggregation: Uniqid, Car, Air, Bus, Rail, Boat, Other]
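
The summing step can be sketched in Python as follows. This is an illustration under assumptions, not the SPSS aggregate command itself: the helper `trip` and the individual trip records are hypothetical, chosen so that respondent 1157 ends up with two car trips and one air trip.

```python
from collections import defaultdict

MODES = ["car", "air", "bus", "rail", "boat", "other"]

def trip(uniqid, tripnum, mode):
    """Build a hypothetical trip record that is already dummy-coded."""
    rec = {"uniqid": uniqid, "tripnum": tripnum}
    rec.update({m: int(m == mode) for m in MODES})
    return rec

trips = [
    trip(19, 1, "car"), trip(19, 2, "bus"),
    trip(1157, 1, "car"), trip(1157, 2, "air"), trip(1157, 3, "car"),
]

# Add up the six indicator variables within each UNIQID.
totals = defaultdict(lambda: dict.fromkeys(MODES, 0))
for rec in trips:
    for m in MODES:
        totals[rec["uniqid"]][m] += rec[m]

# One output record per UNIQID.
aggregated = [{"uniqid": uid, **counts} for uid, counts in sorted(totals.items())]
for row in aggregated:
    print(row)
```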

Canadian Travel Survey The aggregate command produced a new file with one record for each UNIQID. This data structure then matched the one record for each UNIQID in the person file.

Canadian Travel Survey This new aggregate trip file was then merged with the person file to pass the mode of transportation information to the person file.

Canadian Travel Survey Now case 1157 has only one record in the new trip file to match with the person file.
Person File: Uniqid = 1157, Tottrip = 3
New Trip File: Uniqid = 1157, Car = 2, Air = 1, Bus = 0, Rail = 0, Boat = 0, Other = 0
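
A one-to-one merge on UNIQID might look like the Python sketch below. It is only an illustration, not the SPSS procedure from the workshop. The non-traveller record is hypothetical, and giving non-travellers None for the mode variables is an assumption made here to stand in for missing values.

```python
MODES = ["car", "air", "bus", "rail", "boat", "other"]

persons = [
    {"uniqid": 7, "tottrip": 0},      # hypothetical non-traveller
    {"uniqid": 1157, "tottrip": 3},
]
trip_summary = [
    {"uniqid": 1157, "car": 2, "air": 1, "bus": 0, "rail": 0, "boat": 0, "other": 0},
]

# Index the aggregated trip file by its key so each person finds at most one record.
summary_by_id = {rec["uniqid"]: rec for rec in trip_summary}

# Assumption: persons without trips get None (missing) for every mode variable.
no_trips = dict.fromkeys(MODES, None)

merged = [
    {**person, **summary_by_id.get(person["uniqid"], no_trips)}
    for person in persons
]
for row in merged:
    print(row)
```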

Aggregate The Aggregate procedure sorts all of the cases by a grouping variable (called the break variable) and then creates a new data file containing a case for each unique value in this grouping variable.

Aggregate The variables in this new file are created by assigning summary functions to the variables in the original file.
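
The general idea of a break variable plus summary functions can be sketched in Python. This is an analogy rather than SPSS's own implementation; the function `aggregate`, its argument layout, and the sample cases are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

def aggregate(cases, break_var, summaries):
    """Return one record per unique value of break_var.

    summaries maps each new variable name to (source variable, function).
    """
    # groupby only groups adjacent records, so sort by the break variable first.
    ordered = sorted(cases, key=itemgetter(break_var))
    out = []
    for value, group in groupby(ordered, key=itemgetter(break_var)):
        group = list(group)
        row = {break_var: value}
        for new_name, (source, func) in summaries.items():
            row[new_name] = func([case[source] for case in group])
        out.append(row)
    return out

# Made-up person-level cases, summarized to the level of a region.
cases = [
    {"region": "A", "age": 30, "smoker": 1},
    {"region": "B", "age": 40, "smoker": 1},
    {"region": "A", "age": 50, "smoker": 0},
]
result = aggregate(cases, "region", {
    "mean_age": ("age", mean),
    "n_smokers": ("smoker", sum),
    "n_cases": ("age", len),
})
for row in result:
    print(row)
```

Each entry in `summaries` plays the role of one summary function assigned to one variable of the original file.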


Why all this emphasis on Aggregate? We will be using the aggregate command in SPSS with the Canadian Community Health Survey tomorrow to summarize information at the person level to the level of health regions in Atlantic Canada.

Outline Two Topics
- Aggregation: a review of the CTS
- Finding the ‘smoking gun’: a review of variables and CTUMS

The Smoking Gun For the remainder of this session, we will explore a range of topics related to variables using content from the Canadian Tobacco Use Monitoring Survey (CTUMS).

Variables One might say that variables represent the ‘smoking gun’ of research data. Somewhere in a variable is the answer to a who-done-it mystery of a research project.

Variables Variables are the content vessels in research. They carry the information associated with the unit of analysis discussed earlier. As carriers of content, variables act as organizational instruments in research.

Instruments of Organization Variables help organize the content of research in two contexts.
- Data
- Analysis

Data and Analysis The use of variables differs somewhat in each of these contexts. As a result, variables serve different purposes and can be grouped into different classes.

Data and Analysis [Diagram: Research, with its two contexts, Data and Analysis]

Data and Analysis The vocabularies of data and analysis use different labels for the various functions that variables perform. Let’s look at each category separately to understand these differences better.

Variables and Data In the building of data files, variables can be classified into three general categories.
- Administrative
- Observed
- Derived

Administrative Variables Administrative variables are those that data producers include to describe characteristics of:
- the administration of the survey,
- the survey design, and
- the record management used with the original questionnaires.

Administrative Variables The types of variables that are created as a record of administering the survey include the date and time when the interview was conducted, the identification of the interviewer, the number of call-backs before the interview was completed, etc.

Administrative Variables The types of variables that are created to reflect the survey design will include information about the strata in a stratified sample design, geographic identification in a cluster design, and weight variables for estimating populations.

Administrative Variables The types of variables that are created as part of the record management system include unique identification numbers for each respondent, project numbers for cycles, membership in panels, linkage identification with other files, etc.

Observed Variables Observed variables are those that are created from the answers given by respondents to the items in a survey’s questionnaire.

Derived Variables Derived variables are those that are created by the data producer from variables that were observed or from contextual information that was added.

Variables and Analysis For analysis purposes, variables tend to be grouped according to analytic technique. There are two general categories of analytic techniques.
- Categorical
- Analytic

Categorical Variables Categorical statistical techniques use variables employing a nominal level of measurement, that is, numbers are assigned to represent categories. These techniques focus on tables and methods that model frequencies (e.g., log-linear analysis).

Analytic Variables Analytic statistical techniques use variables employing an ordinal, interval or ratio level of measurement. These techniques focus on the means and standard deviations of variables or correlations and covariances among groups of variables.

Modeling Language Categorical and analytic variables can both be used with statistical modeling techniques. Modeling introduces new names for variables.

Modeling Language Dependent variables: these are variables that are seen to be caused or predicted by other variables in a model. They are said to depend on the values of other variables.

Modeling Language Independent variables: these are variables that are seen to be the causal agents in a model. They are the variables that determine the response in the dependent variable.

Modeling Language Dummy variables: these are variables that are used in analytic statistical techniques to represent categorical information. Each dummy variable represents one of the values from a categorical variable. The coding of modes of transportation employed a dummy variable coding scheme.

Modeling Language Latent variables: These are variables in a causal model that are not directly observed or measured. Instead, variables serving as indicators of the latent concept are included in the model.

Modeling Language Manifest variables: These are variables in a causal model that have been directly measured.

Combining Data & Analysis Comparing the two classifications of variables, one for data and one for analysis, some combinations are natural. Observed and derived variables are often used as categorical or analytic variables.

Combining Data & Analysis Administrative variables are not often used in analytic techniques but can be used to identify groups of cases to study subpopulations or to group cases for comparative techniques.