Upgrading ABC News/Washington Post Data Collections Using DDI and Legacy Databases
Marc Maynard
The Roper Center for Public Opinion Research, University of Connecticut
IASSIST Conference 2005, Edinburgh, Scotland
Upgrading Data Collections
Introduction
Background
Scope
Challenges & Opportunities
Prototype System
Summary
The Roper Center Archives
Public opinion data archive established in 1946
Commercial and academic surveys from 1936 to the present
The Archives house ~8,000 US and ~7,000 non-US surveys, including data files and documentation
ABC News/Washington Post Survey Collection
–Over 850 surveys, 1979 to the present
Background: Metadata Integration
Catalog of Holdings
–study level
–15,000 records
–only studies for which raw data is housed at the Center
iPOLL Databank
–variable level
–nearly 500,000 records
–includes studies for which data is not housed at the Center
Background: Metadata Integration
External Review (2001)
Overall Integrated Vision (IASSIST 2002)
DDI – Archive Catalog Mapping (IASSIST 2003)
–Study- and File-Level Integration (DDI Sections 2 & 3)
iPOLL Archive Catalog Links ( )
Enhance Question/Variable Metadata (IASSIST 2005)
Background: Prototype Project
ABC and the Post want to be able to access and analyze all of their survey data easily
SPSS system files exist for post-1997 surveys
Pre-1998 studies are a hodgepodge of available ASCII data, documentation and survey reports
ABC experimented with various alternative strategies
Determined that the major cost factor would be variable and response labeling
Scope
>600 ABC/WP surveys, with more than 16,000 questions in the iPOLL system
Fairly consistent documentation and data structure
All ASCII data files
An average of about 35 variables per study, not including standard socio-demographic variables
Employ a prioritized, phased approach
–Focus on joint monthly surveys (216 studies)
Challenges
1. iPOLL includes only surveys of the US adult population
2. iPOLL does not store standard socio-demographic variables
3. Published results are the source for many items
4. iPOLL does not store enough metadata at the variable level
Opportunities
Enhance metadata available in iPOLL
Repurpose iPOLL’s store of question text and response categories
Capitalize on:
–the fact that response categories are stored as individual items
–linkages between question-level information and existing data files
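The linkage opportunity can be pictured as a join between iPOLL questions and Catalog of Holdings records: a question whose study has a data file at the Center can have its stored text and response categories reused as variable and value labels. The classes and functions here (`CatalogStudy`, `Question`, `ResponseCategory`, `linkable_questions`, `value_labels`) are a hypothetical, simplified model for illustration, not the actual iPOLL or catalog schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified data model (not the actual iPOLL or catalog schema).


@dataclass
class ResponseCategory:
    resp_code: str  # cf. Resp_Code on the mapping slide
    resp_txt: str   # cf. Resp_Txt


@dataclass
class Question:
    study_id: str   # hypothetical key linking an iPOLL question to a study
    qstn_txt: str
    categories: List[ResponseCategory] = field(default_factory=list)


@dataclass
class CatalogStudy:
    # The Catalog of Holdings lists only studies whose raw data is at the Center.
    study_id: str
    title: str
    data_file: str  # hypothetical path to the housed ASCII data file


def linkable_questions(catalog, questions):
    """Keep questions whose study appears in the catalog, i.e. has a data file."""
    housed = {s.study_id for s in catalog}
    return [q for q in questions if q.study_id in housed]


def value_labels(question):
    """Each stored response category becomes one (code, label) pair."""
    return [(c.resp_code, c.resp_txt) for c in question.categories]


catalog = [CatalogStudy("S1", "Example monthly poll", "data/s1.dat")]
questions = [
    Question("S1", "Example question?",
             [ResponseCategory("1", "Yes"), ResponseCategory("2", "No")]),
    Question("S2", "Question from a study without data at the Center"),
]
for q in linkable_questions(catalog, questions):
    print(q.study_id, value_labels(q))
```

Because categories are already separate items rather than free text, turning them into value labels is a direct per-item mapping rather than a parsing problem.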
Addressing the Challenges
1. iPOLL includes only surveys of the US adult population
–State/local surveys are lower priorities
2. iPOLL does not store standard socio-demographic variables
–Add a standard demogs (demographics) menu to the system
3. Published results are the source for many items
–Must allow for modifications to the variables
4. iPOLL does not store enough metadata at the variable level
–Extend the iPOLL Databank with DDI elements
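One way to read responses 2 and 3 together is a menu of reusable demographic variable templates that are placed at each study's column positions and then adjusted per study. The sketch below assumes that design; the template names, labels and codes in `STANDARD_DEMOGS`, and the function `add_demogs`, are hypothetical and do not describe the Roper Center's actual coding.

```python
import copy

# Hypothetical "standard demogs" menu: templates for socio-demographic
# variables that iPOLL does not store. Names, labels and codes are illustrative.
STANDARD_DEMOGS = {
    "SEX": {"label": "Respondent's sex",
            "values": [("1", "Male"), ("2", "Female")]},
    "AGE": {"label": "Respondent's age in years", "values": []},
    "PARTY": {"label": "Party identification",
              "values": [("1", "Democrat"), ("2", "Republican"),
                         ("3", "Independent"), ("4", "Other")]},
}


def add_demogs(selection, columns, overrides=None):
    """Instantiate the selected templates for one study.

    selection: template names picked from the menu
    columns:   {name: (start, end)} positions in this study's ASCII file
    overrides: {name: {field: value}} per-study modifications, since a
               study's codes may differ from the template
    """
    overrides = overrides or {}
    result = []
    for name in selection:
        var = copy.deepcopy(STANDARD_DEMOGS[name])  # never mutate the menu itself
        var["name"] = name
        var["start"], var["end"] = columns[name]
        var.update(overrides.get(name, {}))
        result.append(var)
    return result


demo = add_demogs(
    ["SEX", "AGE"],
    {"SEX": (40, 40), "AGE": (41, 42)},
    {"SEX": {"values": [("1", "Male"), ("2", "Female"), ("9", "Refused")]}},
)
for d in demo:
    print(d["name"], d["start"], d["end"], d["values"])
```

Deep-copying each template keeps per-study overrides from leaking back into the shared menu.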
Mapping Scheme - Sec. 4
Question/Variable
DDI Element | Database Field
4.3 Name | VarName
QstnLit | Qstn_txt
4.3.1 RecSegNo |
4.3.1 StartPos - EndPos | Location
4.3.1 Width | varWidth
varFormat |
Response Categories
DDI Element | Database Field
labl | Resp_Txt
catValu | Resp_Code
Missing |
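To illustrate the mapping, this Python sketch turns one iPOLL-style question record and its response categories into a DDI Codebook 2.x `<var>` element using the standard library's `xml.etree.ElementTree`. The record layout is hypothetical: its keys are borrowed from the database-field column, plus hypothetical `RecSegNo`, `Missing` and `varFormat` values. The output is not validated against the DDI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical iPOLL-style records; keys follow the mapping slide where possible.
question = {
    "VarName": "Q1",
    "Qstn_txt": "Example question text?",
    "Location": (12, 12),   # (StartPos, EndPos) in the ASCII record
    "varWidth": 1,
    "RecSegNo": 1,          # hypothetical value
    "varFormat": "numeric", # hypothetical value: "numeric" or "character"
}
responses = [
    {"Resp_Code": "1", "Resp_Txt": "Approve", "Missing": False},
    {"Resp_Code": "2", "Resp_Txt": "Disapprove", "Missing": False},
    {"Resp_Code": "8", "Resp_Txt": "No opinion", "Missing": True},
]


def to_ddi_var(q, cats):
    """Build a DDI Codebook 2.x-style <var> element (sketch, not schema-validated)."""
    start, end = q["Location"]
    var = ET.Element("var", name=q["VarName"])              # 4.3 name
    loc = ET.SubElement(var, "location", StartPos=str(start),  # 4.3.1
                        EndPos=str(end), width=str(q["varWidth"]))
    if "RecSegNo" in q:
        loc.set("RecSegNo", str(q["RecSegNo"]))
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = q["Qstn_txt"]
    for c in cats:
        catgry = ET.SubElement(var, "catgry")
        if c["Missing"]:
            catgry.set("missing", "Y")
        ET.SubElement(catgry, "catValu").text = c["Resp_Code"]
        ET.SubElement(catgry, "labl").text = c["Resp_Txt"]
    ET.SubElement(var, "varFormat", type=q.get("varFormat", "numeric"))
    return var


print(ET.tostring(to_ddi_var(question, responses), encoding="unicode"))
```

Emitting one `<catgry>` per stored response item mirrors the way iPOLL already keeps response categories as separate records.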
File Preparation
[Diagram: iPOLL SPSS: enhanced variable-level metadata. Elements: iPOLL (q/v), Project Application, ASCII Data File, SPSS Portable File, SPSS Syntax File, Archive Catalog, Standard Demogs]
Application Requirements
Edit and add missing metadata for each variable
–Variable names, location, type
Review and edit response category coding
Select and add standard socio-demographic variables
Specify any recodes within variables or into new variables
Handle string as well as numeric value labeling and recoding
Generate an SPSS syntax file that includes study metadata, creation date, and data file path and structure
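The syntax-generation requirement can be sketched in Python. This is a minimal sketch, not the project application itself: the metadata layout, the example variables and the function names (`build_syntax`, `quote`, `fmt_value`) are hypothetical. It emits standard SPSS commands (DOCUMENT, DATA LIST, VARIABLE LABELS, VALUE LABELS, MISSING VALUES, RECODE, EXECUTE), but the output has not been run through a particular SPSS release here.

```python
from datetime import date

# Hypothetical variable metadata for one study (not an actual iPOLL export):
# name, label (question text), fixed columns, string flag, value labels, missing codes.
VARIABLES = [
    {"name": "Q1", "label": "Example approval question",
     "start": 12, "end": 12, "string": False,
     "values": [("1", "Approve"), ("2", "Disapprove"), ("8", "No opinion")],
     "missing": ["8"]},
    {"name": "STATE", "label": "Respondent's state",
     "start": 20, "end": 21, "string": True,
     "values": [("CT", "Connecticut")], "missing": []},
]


def quote(text):
    """Single-quote text for SPSS syntax, doubling any embedded apostrophes."""
    return "'" + str(text).replace("'", "''") + "'"


def fmt_value(value, is_string):
    """String values are quoted in SPSS syntax; numeric codes are written bare."""
    return quote(value) if is_string else str(value)


def build_syntax(title, data_path, variables, recodes=(), created=None):
    """Return SPSS syntax for a fixed-column ASCII file (one record per case assumed).

    recodes: (source, target, [(old, new), ...]); target=None recodes in place.
    Only numeric recodes are handled; string targets would need a STRING command.
    """
    created = created or date.today()
    # Study metadata and creation date, stored with the file as documentation.
    out = [f"DOCUMENT {title}; syntax created {created.isoformat()}; data file {data_path}."]

    # Data file path and structure: name plus start[-end] columns, (A) for strings.
    cols = []
    for v in variables:
        span = str(v["start"]) if v["start"] == v["end"] else f"{v['start']}-{v['end']}"
        cols.append(f"{v['name']} {span}" + (" (A)" if v["string"] else ""))
    out.append(f"DATA LIST FILE={quote(data_path)} FIXED RECORDS=1")
    out.append("  /1 " + " ".join(cols) + ".")

    # Variable labels from the question text.
    out.append("VARIABLE LABELS " +
               "\n  /".join(f"{v['name']} {quote(v['label'])}" for v in variables) + ".")

    # Value labels from the response categories.
    specs = [v["name"] + " " + " ".join(f"{fmt_value(c, v['string'])} {quote(t)}"
                                         for c, t in v["values"])
             for v in variables if v["values"]]
    if specs:
        out.append("VALUE LABELS " + "\n  /".join(specs) + ".")

    for v in variables:
        if v["missing"]:
            codes = ", ".join(fmt_value(c, v["string"]) for c in v["missing"])
            out.append(f"MISSING VALUES {v['name']} ({codes}).")

    for source, target, rules in recodes:
        pairs = " ".join(f"({old}={new})" for old, new in rules)
        into = f" INTO {target}" if target else ""
        out.append(f"RECODE {source} {pairs}{into}.")

    out.append("EXECUTE.")
    return "\n".join(out) + "\n"


print(build_syntax("Example ABC News/Washington Post poll", "data/example.dat",
                   VARIABLES, recodes=[("Q1", "Q1R", [(1, 1), (2, 0)])],
                   created=date(2005, 5, 1)))
```

Quoting string codes such as `'CT'` while writing numeric codes bare is what lets a single generator cover both string and numeric value labeling.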
Prototype System
Summary
Continuation of metadata enhancement and integration efforts begun in 2001
Will provide practical feedback and suggestions for extending the capabilities of iPOLL
Promising beginning for expanding coverage to other data collections
iPOLL Databank can be found at: