Data Reference (the very, very basics)
Data reference: what do we need?
– An understanding of what we are looking for: not books or articles, and not facts
– Terminology
– Strategies
– Tools
René Magritte, La trahison des images (The Treachery of Images)
This is not "data." It's statistics!
Data: raw (for analysis); intended for use by a computer; collected using social science methodologies or administrative procedures; computer-readable.
Statistics: cooked (facts); for human use (eye-readable charts, tables, graphs); produced from data; can be print, microform, or computer-readable.
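To make the raw/cooked distinction concrete, here is a minimal Python sketch (Python is just an illustrative choice). It treats the church-attendance answers from the example survey later in this section as the "data" and cooks them into one "statistic"; the function name percent_yes is hypothetical.

```python
# "Data": raw answers to "Did you go to church last week?",
# one value per respondent, taken from the example survey.
church = ["y", "n", "n", "y", "y", "y", "y", "n", "y"]


def percent_yes(answers: list[str]) -> float:
    """Cook raw answers into a single statistic: the percent answering 'y'."""
    if not answers:
        raise ValueError("no answers to summarize")
    return 100 * answers.count("y") / len(answers)


# "Statistics": one eye-readable number produced from the data.
print(f"{percent_yes(church):.1f}% went to church last week")  # 66.7% went to church last week
```

Going from the one number back to the nine answers is impossible, which is why a user who wants to compare groups or model relationships needs the data, not the statistic.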
Data
Statistics
Where do statistical babies come from?
Data or statistics: why does it matter? Each calls for different search strategies and tools. Knowing which one the user needs defines your goal, and it helps you know when you've found it!
Tip: Data or statistics? Determine whether the user wants (needs) statistics or data.
– Do you want one number?
– Are you looking for a fact or figure?
– Do you want to know "how many?"
– Or… do you want a series of numbers?
– Do you want to identify trends, make comparisons, or model relationships?
– Will you be using statistical software (not Excel)?
ftp://ftp.bls.gov/pub/special.requests/lf/aat44.txt
From survey to data to statistics…
Survey instrument (R = respondent):
Q1. [enter zip code]
Q2. [enter R's first name]
Q3. [enter sex of R]
Q4. What was your major in college?
Q5. What was your income last year?
Q6. Did you go to church last week?
Answers to questions:
Zip   Name     Sex  Major    Income  Church
      Wilma    F    lit      0       y
      Barney   M    engin    10      n
      Betty    F.            0       n
      Ethel    F    theater  1000    y
      Fred M.  M    PE               y
      Lucy     F    lit      700     y
      Ricky    M    music            y
      Fred A.  M    dance            n
      Ginger   F    math     9500    y
Must anonymize the data! Names identify individual respondents, so the Name column has to be removed.
Anonymized:
Zip   Name     Sex  Major    Income  Church
               F    lit      0       y
               M    engin    10      n
               F.            0       n
               F    theater  1000    y
               M    PE               y
               F    lit      700     y
               M    music            y
               M    dance            n
               F    math     9500    y
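A minimal sketch of the anonymizing step, assuming each response is stored as a Python dict keyed by the survey's column names (zip codes do not appear in the example, so they are left out). Following the example, only the name is dropped; which fields count as identifying is a judgment call, so the IDENTIFIERS set here is an assumption.

```python
# Each respondent's answers as collected (income is None where it was left blank).
responses = [
    {"name": "Wilma", "sex": "F", "major": "lit", "income": 0, "church": "y"},
    {"name": "Barney", "sex": "M", "major": "engin", "income": 10, "church": "n"},
    {"name": "Fred M.", "sex": "M", "major": "PE", "income": None, "church": "y"},
]

# Hypothetical: the set of fields treated as identifying.
IDENTIFIERS = {"name"}


def anonymize(rows, identifiers=IDENTIFIERS):
    """Return copies of the rows with every identifying field removed."""
    return [{k: v for k, v in row.items() if k not in identifiers} for row in rows]


for row in anonymize(responses):
    print(row)
```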
Change Text to Numeric Codes: each text answer (sex, major, church) is replaced by a numeric code.
The "codebook" must document the numeric codes used! For example:
Variable: "sex"
1 = female
2 = male
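The recoding step can be sketched in Python as a lookup driven by the codebook. The "sex" codes (1 = female, 2 = male) come from the example codebook; the "church" codes (1 = yes, 2 = no) are hypothetical, as is the small cleanup table that maps messy text such as "F." onto codebook labels.

```python
# Codebook: what each numeric code means.
# "sex" follows the example codebook; "church" codes are hypothetical.
CODEBOOK = {
    "sex": {1: "female", 2: "male"},
    "church": {1: "yes", 2: "no"},
}

# Hypothetical cleanup table: raw text as typed -> codebook label.
# "F." (a data-entry variant in the example answers) maps to "female".
TEXT_TO_LABEL = {
    "sex": {"F": "female", "F.": "female", "M": "male"},
    "church": {"y": "yes", "n": "no"},
}


def encode(variable: str, text: str) -> int:
    """Replace one text answer with its numeric code from the codebook."""
    label = TEXT_TO_LABEL[variable][text]
    label_to_code = {lbl: code for code, lbl in CODEBOOK[variable].items()}
    return label_to_code[label]


def decode(variable: str, code: int) -> str:
    """Go the other way: look a numeric code up in the codebook."""
    return CODEBOOK[variable][code]


print([encode("sex", s) for s in ["F", "M", "F."]])  # [1, 2, 1]
print(decode("sex", 2))  # male
```

Without the codebook, the 1s and 2s in the file carry no meaning at all.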
Sometimes even numeric variables are encoded in ranges. For example:
Variable: "income"
1 = less than …
… = …
… = more than …
… = not reported
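Range coding can be sketched the same way. The cut-points and codes below are made up for illustration (they are not the example codebook's actual ranges); a blank answer gets its own "not reported" code.

```python
# Hypothetical income ranges: (upper bound, exclusive; code).
INCOME_RANGES = [
    (1000, 1),  # 1 = less than 1,000
    (5000, 2),  # 2 = 1,000 to 4,999
]
TOP_CODE = 3      # 3 = 5,000 or more
NOT_REPORTED = 9  # 9 = not reported


def income_code(income):
    """Collapse an exact income (or None, if left blank) into a range code."""
    if income is None:
        return NOT_REPORTED
    for upper, code in INCOME_RANGES:
        if income < upper:
            return code
    return TOP_CODE


# Incomes from the example answers, in order (None = left blank).
incomes = [0, 10, 0, 1000, None, 700, None, None, 9500]
print([income_code(x) for x in incomes])  # [1, 1, 1, 2, 9, 1, 9, 9, 3]
```

Once coded, the exact amounts are gone from the file; the codebook is the only record of what each range covers.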
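Data files do not need "headers": the row of variable names (Zip Name Sex Major Income Church) stays out of the data file, because the codebook documents the variables instead.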
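Data files do not need extra space: the coded values can sit side by side, because the codebook records which columns each variable occupies.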
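A minimal sketch of writing such a file: no header row, no separators, each value zero-padded to a fixed width. The layout (an id in columns 1-3, sex in column 4, church in column 5) is hypothetical; in practice it is whatever the codebook records.

```python
# Hypothetical layout: (variable, width), in column order.
LAYOUT = [("id", 3), ("sex", 1), ("church", 1)]


def to_fixed_width(row: dict) -> str:
    """Pack one coded record into a fixed-width line: no header, no spaces."""
    parts = []
    for name, width in LAYOUT:
        value = str(row[name]).zfill(width)  # left-pad with zeros to the width
        if len(value) > width:
            raise ValueError(f"{name}={row[name]!r} does not fit in {width} column(s)")
        parts.append(value)
    return "".join(parts)


rows = [
    {"id": 1, "sex": 1, "church": 1},
    {"id": 2, "sex": 2, "church": 2},
]
print("\n".join(to_fixed_width(r) for r in rows))
# 00111
# 00222
```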
Codebook must document locations. For example:
Variable: "sex"
location: column 9
width: 1
Codebook documents question, location, codes. For example:
Q3. [enter sex of R]
Variable: "sex"; location: column 9; width: 1
Codes: 1 = female, 2 = male
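Reading a value back out is then just slicing by column. In this sketch the "sex" entry uses the example's location (column 9, width 1) and codes; the "church" entry and the record itself are hypothetical. Columns are counted from 1, as in the codebook, so the code subtracts 1 to get a Python index.

```python
# Codebook entries: 1-based start column, width, and code labels.
CODEBOOK = {
    "sex": {"start": 9, "width": 1, "codes": {1: "female", 2: "male"}},  # from the example
    "church": {"start": 10, "width": 1, "codes": {1: "yes", 2: "no"}},   # hypothetical
}


def read_variable(line: str, name: str) -> str:
    """Pull one variable out of a fixed-width record and decode it."""
    entry = CODEBOOK[name]
    begin = entry["start"] - 1  # convert 1-based column to 0-based index
    raw = line[begin : begin + entry["width"]]
    return entry["codes"][int(raw)]


record = "0000000121"  # hypothetical record; columns 1-8 hold other variables
print(read_variable(record, "sex"), read_variable(record, "church"))  # male yes
```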
To use data you need 3 things:
– Data: the data file (the raw numbers)
– Metadata: the "codebook" (where the numbers are and what they mean)
– Statistical software (for reading the data file and analyzing the data)
[Figure: statistical software, codebook, and data, illustrated with the codebook entry for Q3 (enter sex of R): variable "sex", location column 9, width 1; 1 = female, 2 = male]
How the pieces fit: the student writes an SPSS program (SPSS commands) to analyze the data. SPSS reads the program, then reads the data, and produces charts, tables, analysis, etc.
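The three things can be sketched together in Python, standing in for the SPSS program described above (this is not SPSS syntax). The in-memory data file and its values are hypothetical; the codebook entry is the "sex" example (column 9, width 1). The "analysis" is a simple frequency table.

```python
import io
from collections import Counter

# 1. Data: a raw fixed-width data file (simulated in memory; values hypothetical).
DATA = "0000000111\n0000000221\n0000000312\n"

# 2. Metadata: the codebook entry for "sex".
SEX = {"start": 9, "width": 1, "codes": {1: "female", 2: "male"}}


# 3. Software: a tiny program that reads the file via the codebook and tabulates.
def frequencies(datafile, entry) -> Counter:
    counts = Counter()
    begin = entry["start"] - 1  # 1-based column -> 0-based index
    for line in datafile:
        if not line.strip():
            continue  # skip blank lines
        code = int(line[begin : begin + entry["width"]])
        counts[entry["codes"][code]] += 1
    return counts


table = frequencies(io.StringIO(DATA), SEX)
for label, n in sorted(table.items()):
    print(f"{label:<8}{n}")
```

Take away any one of the three (the file, the codebook entry, or the program) and no table comes out.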
Codebook entry for variable PRES92: question text and responses
Codebook entry for variable DEGREE: question text and responses
Decoded with the codebook, one respondent's answers read: voted for Clinton (PRES92), junior college (DEGREE), female, 49 years old.
Tip: "variables" hold the essential content of data files.
Tip: Data reference is not about searching for an answer.
Data reference is often less about searching to find an answer (that's a statistical reference question) and more about exploring to find data that will enable users to ask a question.
What have we learned?
– Data and statistics are not the same.
– Data reference leads to primary research material, not facts or statistics.
– To use data, a user must have data, metadata, and statistical software.
– And…
– "Variables" contain the critical content of data files, which means that the gold standard of data reference is variable-level searching.
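Variable-level searching can be sketched as a keyword search over codebook question text. The tiny codebook here reuses the question wording from the example survey; the variable names and the search_variables function are hypothetical.

```python
# A tiny codebook: variable name -> question text (wording from the example survey).
CODEBOOK = {
    "sex": "Q3. [enter sex of R]",
    "major": "Q4. What was your major in college?",
    "income": "Q5. What was your income last year?",
    "church": "Q6. Did you go to church last week?",
}


def search_variables(codebook: dict, keyword: str) -> list:
    """Return the variables whose question text mentions the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [name for name, text in codebook.items() if kw in text.lower()]


print(search_variables(CODEBOOK, "church"))  # ['church']
print(search_variables(CODEBOOK, "last"))    # ['income', 'church']
```

Each hit is a single variable (a question), not a whole study, which is what lets a user judge whether a data file can support the question they want to ask.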
[Figure: question text for Variable 34, from a study of July 2003]