Upcoming Improvements to the Longitudinal Business Database and the Business Dynamics Statistics
  Martha Stinson*, T. Kirk White*, James Lawrence**
  *Center for Economic Studies, U.S. Census Bureau
  **Economy-Wide Statistics Division, U.S. Census Bureau
  Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.
Short History of the LBD
  1980s: CES develops Longitudinal Research Database (LRD)
    Longitudinally linked Annual Surveys and Censuses of Manufactures (ASM/CMF)
  Late 1990s: CES develops Longitudinal Business Database (LBD)
    Longitudinally linked establishment-level employment and payroll for entire non-farm employer economy from Census Bureau’s Business Register
    Jarmin & Miranda (2002) describes earliest vintages
Shorter History of the BDS
  Lots of influential research has used the LBD
    Jarmin, Klimek, and Miranda (2005); Davis, Haltiwanger, Jarmin, and Miranda (2007)
  Late 2000s: CES creates and releases the Business Dynamics Statistics (BDS), tabulations from the LBD
    Annual measures of business dynamics, 1977-“present”
    Recent releases include tabulations by firm age, establishment age, size, state, MSA, and 2-digit industry
  See: https://www.census.gov/ces/dataproducts/bds/
Related Census Bureau Data Product
  Early 1990s: at request of SBA, Census Bureau’s Economy-Wide Division develops Business Information Tracking System (BITS)
  Annual statistics on business dynamics using establishment-level data from the County Business Patterns (CBP) program, 1988-present
  Tabulated as Statistics of US Businesses (SUSB): https://www.census.gov/programs-surveys/susb.html
  Similar to BDS, but doesn’t publish firm age or firm-level dynamics
  Most-downloaded Econ statistics
Related BLS Data Product
  Since 2002, Bureau of Labor Statistics (BLS) has published Business Employment Dynamics (BED)
  Statistics generated from Quarterly Census of Employment and Wages (QCEW)
  Quarterly data series on gross job gains and gross job losses, 1992 forward
  Different data source and firm definition than BDS, but similar statistics at national level, annual frequency
  See https://www.bls.gov/bdm/
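The gross job gains and gross job losses named above can be illustrated with a minimal sketch. It assumes a simplified input of one employment count per establishment per period; the function name `gross_job_flows` and the dictionary layout are hypothetical and do not reflect the actual BED or BDS processing.

```python
def gross_job_flows(emp_prev, emp_curr):
    """Return (gross_gains, gross_losses, net_change) between two periods.

    emp_prev and emp_curr map a hypothetical establishment id to its
    employment. An establishment missing from one period is treated as
    having zero employment there (an entrant or an exit).
    """
    gains = 0
    losses = 0
    for estab in set(emp_prev) | set(emp_curr):
        change = emp_curr.get(estab, 0) - emp_prev.get(estab, 0)
        if change > 0:
            gains += change      # expanding or entering establishment
        else:
            losses += -change    # contracting or exiting establishment
    return gains, losses, gains - losses


# Illustrative data only: B exits, D enters, A expands, C contracts.
previous = {"A": 10, "B": 5, "C": 8}
current = {"A": 12, "C": 3, "D": 4}
gains, losses, net = gross_job_flows(previous, current)
```

The net change equals gross gains minus gross losses, so the same net figure can hide very different amounts of underlying job reallocation.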
Transitioning LBD/BDS to Production/Integration with BITS
  Until now, the LBD/BDS have been research products
  Core BDS will become an official Census Bureau data product
  Core LBD/BDS production will transition to a production division, Economy-Wide Statistics
    Requires bringing code and specifications up to production standards
    Makes sense to integrate LBD/BDS with BITS
    Good opportunity to make systematic improvements
Upcoming Major Improvements to LBD/BDS
  Consistent longitudinal linking methodology over entire LBD/BDS time series, 1976-forward
  Use recently recovered CBP and BR microdata from 1976-1986 to fill in data gaps and improve quality
  Integration with BITS, keeping best features of both
  Streamline and document code: easier to maintain and improve in future
  Publish entire time series on NAICS basis
  Implement new disclosure avoidance method
1. Making Longitudinal Linking Methods Consistent over LBD Time Series
  Vast majority of longitudinal links use establishment-level numeric identifiers
  However, numeric identifiers can change if:
    Single-unit (SU) firm becomes multi-unit (esp. before 2002)
    Firm “reorganizes”/changes its EIN
  LBD has always included fixes to these broken links by name-and-address matching (for SUs), but...
    Improvements to name-and-address matching methods have been applied only to most recent years of time series
1. Making Longitudinal Linking Methods Consistent over LBD Time Series
  Integrate LBD method with BITS name/address matching method
  Include physical address data for entire time series
  Use clerical review, matched W2 records, and machine learning to determine which types of links/matches are best
  Rebuild entire LBD time series using best, consistent matching methods
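The two-stage idea described on these slides (link on numeric identifiers first, then repair broken links by name-and-address matching) can be sketched as follows. This is an illustration under stated assumptions, not the LBD or BITS method: the record keys `id`, `name`, and `address`, the functions `normalize` and `link_years`, and the exact-match rule after normalization are all hypothetical. The production methods described above also draw on clerical review, matched W2 records, and machine learning, none of which is modeled here.

```python
import re


def normalize(text):
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^A-Z0-9 ]", " ", text.upper())
    return " ".join(text.split())


def _index_by_name_address(records, skip_ids):
    """Group ids of not-yet-linked records by normalized (name, address)."""
    index = {}
    for rec in records:
        if rec["id"] not in skip_ids:
            key = (normalize(rec["name"]), normalize(rec["address"]))
            index.setdefault(key, []).append(rec["id"])
    return index


def link_years(prev_year, curr_year):
    """Return {current_id: previous_id} links between two annual files.

    Each record is a dict with hypothetical keys 'id', 'name', 'address'.
    """
    prev_ids = {rec["id"] for rec in prev_year}

    # Pass 1: link establishments whose numeric identifier is unchanged.
    links = {rec["id"]: rec["id"] for rec in curr_year if rec["id"] in prev_ids}

    # Pass 2: among the leftovers, link on normalized name and address,
    # accepting a match only when it is unambiguous on both sides.
    prev_left = _index_by_name_address(prev_year, set(links.values()))
    curr_left = _index_by_name_address(curr_year, set(links))
    for key, curr_candidates in curr_left.items():
        prev_candidates = prev_left.get(key, [])
        if len(curr_candidates) == 1 and len(prev_candidates) == 1:
            links[curr_candidates[0]] = prev_candidates[0]
    return links
```

Requiring a one-to-one match in the second pass trades some recall for fewer false links; a real system would likely score near-matches rather than demand exact equality after normalization.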
2. Filling In Data Gaps and Improving Data Quality
  Current LBD:
    Several BR files for SUs are completely or partially missing in late 1970s/early 1980s
    Only recent years use microdata edits from CBP analysts
  New LBD:
    Fill in data gaps and improve quality with CES-recovered CBP microdata files for 1976-1984 and missing BR files for 1978-1985
    Improve data quality with CBP microdata files from EWD for 1988-forward
3. Integrating LBD with BITS
  Keep best parts of both programs:
  BITS:
    Uses CBP-edited data as inputs for entire time series
    Code and methodology easier to maintain and consistent over time
  LBD:
    Longer time series (starts 1976 vs. 1989 for BITS)
    More extensive name/address matching
    Firm age and firm dynamics
    Birth/death retiming
4. Streamlining and Documenting Code
  LBD/BDS were developed in CES as research products
    SAS code mostly written by economists
    Not well-documented
  Transition to production as official data product requires:
    Entire program must be reproducible from detailed written specifications
    Code written and maintained to programming division’s standards (e.g., code versioning and change control)
  Will make LBD/BDS easier to maintain and enhance
5. Publish entire BDS time series on NAICS basis
  Current BDS published on an SIC basis
  Most Census economic programs switched to NAICS beginning with 1997 reference year
  We will use the Fort and Klimek (2016) method to assign latest NAICS coding to every establishment for the entire BDS time series
6. Implement new disclosure avoidance method
  Current BDS tables use cell suppression to avoid disclosing confidential information
  Research is currently underway to implement a disclosure avoidance method called differential privacy (Dwork 2006)
  Intuitively, the published tables will be synthesized from the actual tabulations in a way that puts an acceptable bound on the probability of disclosure, while preserving the data’s usefulness
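To make the intuition concrete, here is a minimal sketch of the Laplace mechanism, a standard textbook way to release counts under differential privacy. The slides do not say which mechanism the BDS will adopt, so this is an assumption chosen for illustration only. The names `laplace_noise` and `noisy_counts`, the cell labels, and the default sensitivity of 1 (one record changes one cell count by at most 1) are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution centered at zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity/epsilon to each cell count.

    counts:      {cell_label: true_count}
    epsilon:     privacy-loss parameter; smaller means more noise
    sensitivity: assumed maximum change in any count from one record
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {cell: n + laplace_noise(scale, rng) for cell, n in counts.items()}


# Illustrative table of establishment counts by a hypothetical cell label.
true_table = {"firm age 0": 120, "firm age 1-5": 340, "firm age 6+": 910}
released = noisy_counts(true_table, epsilon=0.5, seed=42)
```

Unlike cell suppression, every cell is published, and the noise scale makes the trade-off between privacy and accuracy explicit. Totals such as employment have a much larger sensitivity than simple counts and would need additional treatment not shown here.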
Questions?
  Contact:
    Martha Stinson: Martha.Stinson@census.gov
    T. Kirk White: Thomas.Kirk.White@census.gov
    James Lawrence: James.Lawrence@census.gov