Improving researcher access to USDA’s Agricultural Resource Management Survey
Charles Towe and Mitch Morehart
Economic Research Service, USDA
What is the Agricultural Resource Management Survey?
ARMS is USDA’s primary survey for the annual collection of data from farm operators about their:
- Farm: ownership, governance, management, and performance
- Choice of practices, inputs, and expenditures to produce crop and livestock commodities
- Household: demographic attributes, economic and financial activities
Program Activities Supported by ARMS
1. Responding to mandates: income for farms, costs for commodities, status of family farms
2. Support for U.S. National Economic Accounts (GDP, Personal Income)
3. Providing data to respond to USDA policies & programs
4. Enabling research to inform decision makers on a variety of issues
Data delivery (pre 2004)
- ARMS is a complex survey that has existed, in one form or another, for approximately 20 years.
- Since 1996 the data collection methodology has been standardized.
Data delivery
Project goals
- Allow users to customize table requests
- Allow two-way tables
- Add state-level analysis
- Support graphical representation of data
- Provide advanced users access to a suite of regression-type methods
- Provide all of this in an environment that protects survey participants' confidentiality, to ensure future participation
Primary and complementary cell suppression algorithm
Primary
- looking at individual cells
- class disclosure
Secondary
- solving from totals or known formulae
- combining data from different tables and sources
- using non-suppressed information to infer things
- much more difficult to check
Primary disclosure
1) Threshold rule
   - no cells with fewer than 3 units (enterprises)
2) Dominance rule
   - the sum of the sample minus the two largest observations (C) must exceed 60% of the largest value (U), i.e. C > 3/5 * U
[Figure: observations U, V, W, X, Y, Z, with the two largest observations and the remainder C marked]
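The two primary disclosure rules can be sketched in a few lines of Python. This is a minimal illustration, not the ARMS implementation: it assumes non-negative cell contributions and reads the dominance rule as "a cell is safe only when C > 3/5 * U", where U is the largest observation and C is the sum of everything except the two largest. The function names and constants are hypothetical.

```python
from typing import Sequence

MIN_UNITS = 3          # threshold rule: at least 3 units (enterprises) per cell
DOMINANCE_SHARE = 0.6  # the 3/5 factor in the dominance rule


def passes_threshold(values: Sequence[float], min_units: int = MIN_UNITS) -> bool:
    """True when the cell has enough contributing units."""
    return len(values) >= min_units


def passes_dominance(values: Sequence[float], share: float = DOMINANCE_SHARE) -> bool:
    """True when C (the sum without the two largest values) exceeds share * U."""
    ranked = sorted(values, reverse=True)
    if not ranked:
        return False
    largest = ranked[0]          # U
    remainder = sum(ranked[2:])  # C
    return remainder > share * largest


def is_primary_disclosure(values: Sequence[float]) -> bool:
    """A cell is flagged when it fails either rule (assumed reading of the slide)."""
    return not (passes_threshold(values) and passes_dominance(values))


# Hypothetical cells, each a list of per-enterprise contributions.
print(is_primary_disclosure([10.0, 20.0]))               # too few units
print(is_primary_disclosure([100.0, 50.0, 5.0]))         # one enterprise dominates
print(is_primary_disclosure([100.0, 90.0, 40.0, 30.0]))  # passes both rules
```

Sorting each cell is the simplest way to find U and C; for large tables a single pass that tracks the two largest values would avoid the sort.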
Secondary disclosure
1) Algorithm (equation check) determines additional cells for obfuscation in order to keep primary disclosure intact
2) Factored in solving from totals and across cells using the relationship of data in a single table
3) Could not prevent cross-table searches
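Why complementary suppression is needed can be shown with a single table row and its published total. The sketch below is illustrative only: the slides do not say how complementary cells were chosen, so picking the smallest remaining cell is a hypothetical selection rule, and all names and values are made up.

```python
from typing import List, Optional, Set


def publish(values: List[float], suppressed: Set[int]) -> List[Optional[float]]:
    """Replace suppressed cells with None, as they would appear in a released table."""
    return [None if i in suppressed else v for i, v in enumerate(values)]


def recover_from_total(cells: List[Optional[float]], total: float) -> Optional[float]:
    """Equation check: if exactly one cell is hidden, it equals total minus the rest."""
    hidden = [i for i, v in enumerate(cells) if v is None]
    if len(hidden) != 1:
        return None
    return total - sum(v for v in cells if v is not None)


def add_complementary(values: List[float], suppressed: Set[int]) -> Set[int]:
    """If a lone suppressed cell could be solved from the total, hide one more cell.

    Hypothetical selection rule: hide the smallest remaining cell.
    """
    result = set(suppressed)
    if len(result) == 1 and len(values) > 1:
        candidates = [i for i in range(len(values)) if i not in result]
        result.add(min(candidates, key=lambda i: values[i]))
    return result


row = [120.0, 45.0, 300.0, 80.0]  # made-up cell values
row_total = sum(row)              # published alongside the row
primary = {2}                     # cell flagged by the primary rules

print(recover_from_total(publish(row, primary), row_total))  # hidden value leaks
final = add_complementary(row, primary)
print(recover_from_total(publish(row, final), row_total))    # no longer solvable
```

This covers only one row of one table. As the slide notes, a check of this kind cannot prevent inference that combines several tables.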
Primary and complementary cell suppression algorithm
[Flowchart boxes: Data request made; Key variables; Primary Disclosure Rules on Data (plus statistical reliability); Collect list of violating variables; Candidate List; Identify the method for selecting complementary cells; Equation Check; Final List; Build table for display; Done!]
Final implementation
- Prototype built in 2003 and presented to a peer review panel
  - highlighted the need to disseminate data further and illustrated the associated risks
- Resulted in approval of a weighting scheme for all data which, theoretically, eliminates the need for secondary suppression
- Pre-generated each data point
  - faster response time, which
  - allowed greater graphic capabilities
2004 Project Timeline and Milestones
Phases: Design; Application functionality; Data preparation; Testing & evaluation; eGov integration
- 4/21/03 Kickoff: Team Charter; Identify Goals, Project Plan, & Resources
- 11/25/03 First prototype presented
- 3/26/04 V-1 Release: Intranet, by IP address
- 4/29/04 V-2 Release: ltd, secure Extranet by IP address
- 5/10/04 Peer Review
- 6/21/04 Noise implemented
- 6/30/04 V-3 Release: ltd, secure Extranet open outside of ERS
- 8/5/04 Security Evaluation
- 8/21/04 Audit logging
- 9/24/04 Extranet Tool Released
- 11/9/04 Tailored Reporting Tool Public Release; public website overhauled
Enclave Basics
Mission
- To Promote Access to sensitive micro data
- To Protect Confidentiality
- To Archive, Index and Curate Micro-data
Background
- Started by NIST/ATP
- Went live July 2007
- Current participants/data producers: NIST/ATP, USDA/ERS (pilot), Kauffman Foundation
Innovations
- Secure remote access
- Collaboratory: a collaborative environment for researchers to work, share code and ideas, and work with online discovery tools
- Standardized metadata documentation techniques (IHSN’s microdata management toolkit; DDI compliance)
NORC Data Enclave: Mechanics of Portfolio Approach to Protection
Provision of access:
a) Technical protection (IT and operational)
b) Agency-specific data protection requirements (Legal)
c) Statistical protection (Statistical)
d) Researcher training (Educational)