BLS Metadata Repository – Issues and Progress
Daniel Gillman
US Bureau of Labor Statistics
Outline
  BLS Programs
  Time Series
  Data Dissemination
  Metadata
  Model
  BLS Repository
Wolfram Data Summit 9/10/2010
BLS Programs
  8 Major Program Areas
    Inflation & Prices
    Employment
    Unemployment
    Pay & Benefits
    Spending & Time Use
    Productivity
    Workplace Injuries
    International
Time Series
  Measure or index over time
    Index: number relative to fixed point
  30 series types
    Subset by
      Industry
      Occupation
      Geography (state, county, MSA, etc.)
  Tables
    Generated from time series data
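The idea of an index as a number relative to a fixed point can be illustrated with a short sketch. The prices and base period below are made up for illustration and are not BLS data, and the function name `to_index` is hypothetical.

```python
def to_index(values, base_period, base_value=100.0):
    """Rescale a time series so the base period equals base_value.

    values: dict mapping a period label to a raw measure.
    """
    base = values[base_period]
    if base == 0:
        raise ValueError("base period value must be non-zero")
    # Every observation is expressed relative to the fixed base period.
    return {period: base_value * v / base for period, v in values.items()}


# Made-up average prices for three months (illustrative only).
prices = {"2010-01": 50.0, "2010-02": 51.0, "2010-03": 52.5}
index = to_index(prices, base_period="2010-01")
```

Under these assumed inputs the base month maps to 100 and later months read directly as change since the base, which is why an index is convenient for comparing series over time.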
Data Dissemination
  Web site: http://www.bls.gov
  8 major numbers (m = monthly, q = quarterly)
    Unemployment rate (m)
    Consumer price index (m)
    Producer price index (m)
    Employment cost index (q)
    Average hourly earnings (m)
    Payroll employment (m)
    Productivity (q)
    Import price index (m)
  All time series
  Tables
Data Dissemination
Data Dissemination
  Organized by programs
    Time series in ASCII files by FTP
    Some tables
    Crude database search
  Little metadata
    Web site itself
    Hidden in FTP directories
    Handbook of Methods
    Seasonal adjustment
Data Dissemination
  Requires knowing
    Organization of BLS
    Specific surveys or programs
    Specific series
    Terms & technical meaning
      E.g., earnings
  Relies on “Series ID”
    Brittle scheme for identifying series
    Known by power users
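To see why a Series ID is a brittle way to identify a series, consider a sketch that splits a Consumer Price Index identifier by character position. The positional layout used here (survey prefix, seasonal adjustment code, periodicity code, area code, item code) is an assumption based on the published BLS series ID format for the CU survey and should be checked against current BLS documentation; the names `CpiSeriesId` and `parse_cpi_series_id` are hypothetical.

```python
from typing import NamedTuple


class CpiSeriesId(NamedTuple):
    survey: str       # positions 1-2, e.g. "CU"
    seasonal: str     # position 3, seasonal adjustment code
    periodicity: str  # position 4
    area: str         # positions 5-8
    item: str         # positions 9 onward


def parse_cpi_series_id(series_id: str) -> CpiSeriesId:
    """Split a CPI series ID purely by character position (assumed layout)."""
    if len(series_id) < 9 or not series_id.startswith("CU"):
        raise ValueError(f"not a CU series ID: {series_id!r}")
    return CpiSeriesId(
        survey=series_id[0:2],
        seasonal=series_id[2],
        periodicity=series_id[3],
        area=series_id[4:8],
        item=series_id[8:],
    )


parsed = parse_cpi_series_id("CUUR0000SA0")
```

All of the meaning sits in fixed positions and opaque codes, so a user has to know both the survey and its code lists before the identifier is of any use, and each survey needs its own parser.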
Metadata
  Supports
    Dissemination
    Support Data.Gov
    Time series and tables
  Does not support
    Internal processing
    Describing survey life-cycle
    Microdata (respondent level)
Metadata
  Hard to collect
    Need “simple” model
      Maybe not so easy
    Basic metadata already on FTP sites
  Support finding data by
    Traditional means
      Series ID, BLS structure
    New means
      Subject matter
Metadata
  Previous BLS focus group study
    Users find data by
      Time
      Place
      Subject (title or keywords)
    Structure of agencies not known
    Technical terms not known
  Metadata must support this
Model
  Time Series
  Data Element
  Classification
  Concept
  Naming Convention
Model
BLS Repository
  Under development
    Requirement – fast response
    Testing – Flat single table design
  Using Apache Lucene Solr
    Open source enterprise search
  Various interface approaches
    Visual Basic
    Java
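A flat single-table design maps naturally onto a search index in which each time series is one document. The sketch below is an assumption about how such a record and a Solr query might look, not the actual BLS schema: the field names, host, and core name (`series`) are hypothetical. Only the standard Solr request parameters `q`, `fq`, and `wt` are relied on.

```python
from urllib.parse import urlencode

# One flat record per time series; field names and values are illustrative.
flat_record = {
    "series_id": "CUUR0000SA0",
    "title": "Consumer price index, all items",
    "geography": "U.S. city average",
    "periodicity": "monthly",
}


def build_select_url(base_url, text, filters):
    """Build a Solr select URL with a free-text query and exact-match filters."""
    params = [("q", text), ("wt", "json")]
    for field, value in sorted(filters.items()):
        # Each filter query narrows the result set without affecting ranking.
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/series",  # hypothetical host and core
    "price index",
    {"geography": "U.S. city average"},
)
```

Keeping every searchable attribute on the same record avoids joins at query time, which is one plausible way to meet the fast-response requirement.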
BLS Repository
  Need term map
    Common terms to technical terms
    Definitions for technical terms
  Concept based management
    Link terms to relevant data
    Manage multi-faceted search
  Development schedule
    Still research project
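A term map of the kind described above can be sketched as a lookup from everyday words to technical terms, applied to a query before searching. The mappings below are assumptions for illustration; the technical terms are taken from the eight major numbers listed earlier, and the names `TERM_MAP` and `expand_query` are hypothetical.

```python
# Common term -> technical terms (illustrative mappings, not an official list).
TERM_MAP = {
    "inflation": ["Consumer price index", "Producer price index"],
    "jobs": ["Payroll employment", "Unemployment rate"],
    "wages": ["Average hourly earnings", "Employment cost index"],
}


def expand_query(user_text):
    """Replace common terms with technical terms; keep unknown words as typed."""
    terms = []
    for word in user_text.lower().split():
        for term in TERM_MAP.get(word, [word]):
            if term not in terms:  # preserve order, drop repeats
                terms.append(term)
    return terms


expanded = expand_query("jobs in Ohio")
```

A user who types "jobs" is then matched to series they would not have found by name, without needing to know BLS terminology or structure.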
Daniel Gillman gillman.daniel@bls.gov