Data Discovery The reference interview
Always begin by clarifying the distinction between statistics and data with your patron. Never assume that the patron clearly knows this distinction. Ask a question that will help you understand what they might be seeking using our frameworks from yesterday. Asking them if they want statistics or data isn’t a good starting question, though.
Frameworks Table Dimensions: Geography Time Subject content
The reference interview What does the patron intend or need to do with the numbers? What is their objective? –Does the patron need them for a report or for data analysis? What geographic area is needed? –Smallest geographic area to be described What time period is needed? What subject matter (variables) expressed in numbers is needed?
The reference interview If you determine the patron does need data: Population (unit of observation) to be described Do they need aggregate data, microdata, spatial data? What software does the patron intend to use? How would the patron like the data delivered?
level of service How much you do depends on the level of service you are offering. –Finding a resource –Retrieving a resource from an online service –Tailoring a product for the patron –Creating a product for a patron (e.g., postal code conversion linkage)
Does the person want one number? Are they pursuing a fact or figure? Want to know “how many?” Statistics in print or ready-ref. electronic source? YES Go to print or ready ref. electronic source. NO Are the data accessible in computer-readable form? YES Go to computer-readable source. Extract relevant data from computer-readable source and compile statistics using appropriate software.
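The decision flow on this slide can be sketched as a small function. This is only an illustration: the function name, parameter names, and the final fallback message are assumptions, since the slide does not say what to do when neither source exists.

```python
def next_step(statistic_in_ready_reference: bool,
              data_in_computer_readable_form: bool) -> str:
    """Return the next step suggested by the flowchart on the slide."""
    # First question: is the fact or figure already published as a statistic?
    if statistic_in_ready_reference:
        return "Go to print or ready-reference electronic source."
    # Second question: can the underlying data be read by a computer?
    if data_in_computer_readable_form:
        return ("Go to computer-readable source; extract relevant data "
                "and compile statistics using appropriate software.")
    # Assumption: the slide shows no branch for this case.
    return "No branch shown on the slide; continue the reference interview."


if __name__ == "__main__":
    print(next_step(True, False))
    print(next_step(False, True))
```

Checking the published statistic first reflects the slide's ordering: compiling a figure from raw data is the fallback, not the starting point.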
To Use Data You Need 3 Things Datafile (the raw numbers) “Codebook” (where the numbers are and what they mean) Statistical Software (for reading the datafile and analyzing the data)
Field California Poll (newsletter) September 24, 1996 as reproduced on microfiche in the collection, American Public Opinion Data. The Statistics
The Data
VARIABLE 15 RATE PERFORMANCE-BARBARA BOXER DECK 2/17 Q7. WHAT KIND OF JOB DO YOU THINK BARBARA BOXER IS DOING AS U.S. SENATOR - A VERY GOOD, GOOD, FAIR, POOR OR VERY POOR JOB? N OF CASES VALUE VALUE LABEL 33 1 VERY GOOD GOOD FAIR 63 4 POOR 43 5 VERY POOR NO OPINION NOT APPLICABLE (NOT FORM B) ____ 1023 TOTAL From the codebook for the data: The Field (California) Poll #96-04 THE FIELD INSTITUTE INTERVIEWING PERIODS: AUGUST 29 - SEPTEMBER 7, 1996 NUMBER OF CASES: 1023 The Codebook
Statistical Software Designed to read large files of raw numeric data Not a spreadsheet! –Can handle many more variables and cases. –Can do more elaborate and accurate statistics. –Designed to handle data (cases, observations, variables, weights), not unstructured “cells.”
GAUSS JMP MiniTab S-Plus SAS SPSS Stata Systat
SPSS Codebook Describe data layout Write commands to analyze data (data)
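The two steps named here (describe the data layout, then write commands to analyze the data) can be sketched in Python instead of SPSS. This is a minimal sketch under several assumptions: "DECK 2/17" is read as deck 2, column 17; each record is assumed to carry its deck number in column 1; value codes 2 and 3 are inferred from the order of the labels in the codebook; and the sample records are invented for illustration, not taken from the Field Poll datafile.

```python
from collections import Counter

# Step 1: describe the data layout, as given in the codebook entry.
DECK = 2      # assumption: "DECK 2/17" means deck (record) 2 ...
COLUMN = 17   # ... and column 17, counted from 1
VALUE_LABELS = {
    "1": "VERY GOOD",
    "2": "GOOD",       # code inferred from label order (assumption)
    "3": "FAIR",       # code inferred from label order (assumption)
    "4": "POOR",
    "5": "VERY POOR",
}


def make_record(deck: int, values: dict, width: int = 80) -> str:
    """Build one invented fixed-width record with the deck number in column 1."""
    chars = [" "] * width
    chars[0] = str(deck)
    for column, code in values.items():
        chars[column - 1] = code
    return "".join(chars)


# Invented sample: three respondents, two decks each.
records = [
    make_record(1, {}), make_record(2, {17: "1"}),
    make_record(1, {}), make_record(2, {17: "4"}),
    make_record(1, {}), make_record(2, {17: "4"}),
]


# Step 2: analyze the data by tabulating the variable.
def tabulate(lines, deck, column, labels):
    """Count the labeled values found at one column of one deck."""
    counts = Counter()
    for line in lines:
        if line[0] != str(deck):
            continue  # skip records belonging to other decks
        code = line[column - 1]
        counts[labels.get(code, "UNLABELED")] += 1
    return counts


counts = tabulate(records, DECK, COLUMN, VALUE_LABELS)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Without the codebook there is no way to know that column 17 of deck 2 holds the rating or that a 4 means POOR, which is why the datafile, the codebook, and the software are needed together.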
reference strategies Gov publications approach –What agency would produce such a statistic? Does the mandate or goals include the scope of content? Who are the members of the agency, if the agency is a membership organization? –What jurisdiction is responsible for this content? –Is this likely an official or non-official statistic? –What publication titles are related to this content? –What is the availability of statistics from the agency? Data librarian approach –What data source would be used to produce such a statistic? –Who would collect such data? –What unit of observation would be needed to produce such a statistic? –What would the structure of the table look like given time, geography and attributes of the unit of observation? –Would the source be in the realm of official or non-official statistics? –Use the literature trail and its indexes (non-official vs. official publications)
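The data librarian's question about table structure can be made concrete with a small sketch. It assumes the unit of observation is a person and builds a geography-by-time table from microdata. The records, field names, and function name are invented for illustration only.

```python
from collections import defaultdict

# Invented microdata: one row per person (the unit of observation).
microdata = [
    {"region": "North", "year": 2001, "employed": True},
    {"region": "North", "year": 2001, "employed": False},
    {"region": "North", "year": 2002, "employed": True},
    {"region": "South", "year": 2001, "employed": True},
    {"region": "South", "year": 2002, "employed": True},
    {"region": "South", "year": 2002, "employed": False},
]


def build_table(rows, attribute):
    """Count rows where `attribute` is true, by geography (rows) and time (columns)."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        # Adding 0 for false values keeps empty cells visible in the table.
        table[row["region"]][row["year"]] += int(bool(row[attribute]))
    return {region: dict(years) for region, years in table.items()}


table = build_table(microdata, "employed")
for region, years in table.items():
    print(region, years)
```

Each cell of the result is a statistic, while the rows it was compiled from are the data, so sketching the expected table first helps identify what source and unit of observation to look for.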
the data reference interview process The information-seeking context is as important in statistics and data reference interviews as in other reference interviews. How is the data reference interview similar to general reference interviews? How is the data reference interview different?
research on the data reference interview process A colleague is developing a model from which comparisons can be made between the general and data reference interviews. One aspect of the model, namely the discovery and clarification of concepts and language, is being investigated using items from a specialist discussion list and a blog.