BTS Confidentiality Seminar Series June 11, 2003 FCSM/CDAC Disclosure Limiting Auditing Software: DAS Mark A. Schipper Ruey-Pyng Lu Energy Information.

Slides:



Advertisements
Similar presentations
Characterization and Management of Multiple Components of Cost and Risk in Disclosure Protection for Establishment Surveys Discussion of Advances in Disclosure.
Advertisements

Integer Optimization Basic Concepts Integer Linear Program(ILP): A linear program except that some or all of the decision variables must have integer.
Optimization problems using excel solver
BU Decision Models Integer_LP1 Integer Optimization Summer 2013.
Chapter 5: Linear Programming: The Simplex Method
Confidentiality risks of releasing measures of data quality Jerry Reiter Department of Statistical Science Duke University
Lecture 3 Linear Programming: Tutorial Simplex Method
Linear Programming. Introduction: Linear Programming deals with the optimization (max. or min.) of a function of variables, known as ‘objective function’,
Wyndor Example; Enter data Organize the data for the model on the spreadsheet. Type in the coefficients of the constraints and the objective function.
DETAILED DESIGN, IMPLEMENTATIONA AND TESTING Instructor: Dr. Hany H. Ammar Dept. of Computer Science and Electrical Engineering, WVU.
Linear Programming Problem
Introduction to Sensitivity Analysis Graphical Sensitivity Analysis
What is GAMS?. While they are not NLP solvers, per se, attention should be given to modeling languages like: GAMS- AIMMS-
Optimization Models Module 9. MODEL OUTPUT EXTERNAL INPUTS DECISION INPUTS Optimization models answer the question, “What decision values give the best.
CIPSEA, Confidentiality and the ALMIS Database Roger Therrien Director, Office of Research Connecticut Department of Labor ALMIS Database Seminar San Diego,
Computational Methods for Management and Economics Carla Gomes
1 Linear Programming Using the software that comes with the book.
5.6 Maximization and Minimization with Mixed Problem Constraints
Computational Methods for Management and Economics Carla Gomes Module 4 Displaying and Solving LP Models on a Spreadsheet.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Spreadsheet Modeling & Decision Analysis:
Linear Programming Operations Research – Engineering and Math Management Sciences – Business Goals for this section  Modeling situations in a linear environment.
FORS 4710 / 6710 Forest Planning FORS 8450 Advanced Forest Planning Lecture 2 Linear Programming.
Stevenson and Ozgur First Edition Introduction to Management Science with Spreadsheets McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies,
Introduction to Mathematical Programming OR/MA 504 Chapter 3.
Chapter 19 Linear Programming McGraw-Hill/Irwin
1. “Software for Tabular Data Protection” Joe Fred Gonzalez, Jr. Lawrence H. Cox National Center for Health Statistics NCHS Data Users Conference July.
Chapter 3 Linear Programming Methods 高等作業研究 高等作業研究 ( 一 ) Chapter 3 Linear Programming Methods (II)
1. The Simplex Method.
Linear Programming Chapter 13 Supplement.
Integer Programming Models
Overview of 2002 CIPSEA: Methods to Protect Confidential Tabular Data Amrut Champaneri, Ph.D. U.S. Department of Transportation Bureau of Transportation.
Special Conditions in LP Models (sambungan BAB 1)
27-30 August, Issyk-kul, Kyrgyzstan National Statistical Committee of Kyrgyz Republic (NSC KR) SPECA PWG on statistics August, Issyk-kul, Kyrgyzstan.
Example 15.1 Daily Scheduling of Postal Employees Workforce Scheduling Models.
Linear Programming: Basic Concepts
Discussion of “ Statistical Disclosure Limitation: Releasing Useful Data for Statistical Analysis” Nancy J. Kirkendall Energy Information Administration.
Types of IP Models All-integer linear programs Mixed integer linear programs (MILP) Binary integer linear programs, mixed or all integer: some or all of.
1 1 Slide © 2000 South-Western College Publishing/ITP Slides Prepared by JOHN LOUCKS.
Kerimcan OzcanMNGT 379 Operations Research1 Linear Programming: The Simplex Method Chapter 5.
1 1 © 2003 Thomson  /South-Western Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
1 1 © 2003 Thomson  /South-Western Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Linear Programming McGraw-Hill/Irwin Copyright © 2012 by The McGraw-Hill Companies, Inc. All rights reserved.
Disclosure risk when responding to queries with deterministic guarantees Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University.
Workforce scheduling – Days off scheduling 1. n is the max weekend demand n = max(n 1,n 7 ) Surplus number of employees in day j is u j = W – n j for.
Goal Seek and Solver. Goal seeking helps you n Find a specific value for a target cell by adjusting the value of one other cell whose value is allowed.
1 Using Fixed Intervals to Protect Sensitive Cells Instead of Cell Suppression By Steve Cohen and Bogong Li U.S. Bureau of Labor Statistics UNECE/Work.
 Minimization Problem  First Approach  Introduce the basis variable  To solve minimization problem we simple reverse the rule that is we select the.
1 1 Slide © 2005 Thomson/South-Western Linear Programming: The Simplex Method n An Overview of the Simplex Method n Standard Form n Tableau Form n Setting.
Chapter 4 Linear Programming: The Simplex Method
1 1 Slide © 2000 South-Western College Publishing/ITP Slides Prepared by JOHN LOUCKS.
Protection of frequency tables – current work at Statistics Sweden Karin Andersson Ingegerd Jansson Karin Kraft Joint UNECE/Eurostat.
Spreadsheet Modeling & Decision Analysis A Practical Introduction to Management Science 5 th edition Cliff T. Ragsdale.
SAS® Macros for Constraining Arrays of Numbers Charles D. Coleman Economic Statistical Methods Division US Census Bureau.
McGraw-Hill/Irwin Copyright © 2009 by The McGraw-Hill Companies, Inc. All Rights Reserved. Supplement 6 Linear Programming.
3 Components for a Spreadsheet Optimization Problem  There is one cell which can be identified as the Target or Set Cell, the single objective of the.
LINEAR PROGRAMMING 3.4 Learning goals represent constraints by equations or inequalities, and by systems of equations and/or inequalities, and interpret.
Integer Bounds on Suppressed Cells in Multi-Way Tables Stephen F. Roehrig Carnegie Mellon University For.
1 Confidentiality and Data Access Committee Jacob Bournazian, Chair Energy Information Administration BTS Confidentiality Seminar Series June 11, 2003.
Searching a Linear Subspace Lecture VI. Deriving Subspaces There are several ways to derive the nullspace matrix (or kernel matrix). ◦ The methodology.
Chapter 4 The Simplex Algorithm and Goal Programming
1 Chapter 13 Mathematical models of networks give us algorithms so computationally efficient that we can employ them to evaluate problems too big to be.
An Introduction to Linear Programming
“Software for Tabular Data Protection”
Basic Project Scheduling
Wyndor Example; Enter data
Chap 9. General LP problems: Duality and Infeasibility
Optimization Models Module 9.
BUS-221 Quantitative Methods
Presentation transcript:

BTS Confidentiality Seminar Series June 11, 2003 FCSM/CDAC Disclosure Limiting Auditing Software: DAS Mark A. Schipper Ruey-Pyng Lu Energy Information Administration

2 Background To protect confidentiality, agencies suppress table cells that might reveal individual data. Software exists to select cells for suppression, provides no evaluation ( Auditing finds the lower and upper bounds on the values of a withheld (suppressed) cell. EIA lead an inter-agency project to prepare table auditing software, produced FCSM DAS.

3 Common Problem Seeking A Common Solution Seven Agencies Funded Software ($250k) –Bureau of Labor Statistics –Bureau of Economic Analysis –Bureau of the Census –National Center for Education Statistics –Internal Revenue Service –National Science Foundation –Energy Information Administration

4 Planned Uses of DAS Bureau of Labor Statistics (BLS) –DAS was tested and approved for use on Windows NT –Future BLS Statistical Order will require the use of DAS with the following: ES-202 – Covered Employment and Wages OSHS - Occupational Safety and Health Statistics CES - Current Employment Statistics OES - Occupational Employment Statistics

5 Planned Uses Continued… Energy Information Administration –Joint project with US Bureau of the Census working on developing auditing tools for processing of the 2002 Manufacturing Energy Consumption Survey National Science Foundation –Initial contact with NSF’s contractor on executing DAS software

6 SWP Paper 22: Report on Statistical Disclosure Limitation Methodology Auditing Software (mid 1970’s) –U.S. Census Bureau (Cox, 1980) –Statistics Canada (Sande, 1984) Audit systems produce upper and lower estimates for the suppressed cell based on linear combinations of published cells If software is already available, why DAS?

7 Software Requirements must be written in SAS © code, using macros language; must use the PROC LP (SAS/OR Software) as the linear optimizer; must be able to specify (as a LP model) and efficiently audit tables of up to 5 dimensions ;

8 Requirements Continued... must display model results (e.g., minimum, maximum, protection range, and appropriate quality warnings) for all suppressed values; must use ASCII format for model statement input files; and, must pre-verify internal consistency of audit tables.

9 Modules of Software Front-End User Interface Pre-Verification of Audit Table(s) –Ensure Feasible Linear Model Published Cell Values Sum to Published Totals –Rounding of Continuous Cell Values –Negative Cell Values Linear Program Modeling Results Display

10 Auditing Schematic

11 Pre-Verification Verify Aggregates Dimension Totals and Marginal Totals Assume Maximum from Rounding Process e = Max {e i }  i e is dictated by the rounding process; if rounded to integer e = 0.5 e is a variable defined by the user Pre-Verification Satisfies Inequality  X i - ne  X.  e   X i + ne

12 2-D Example: Unrounded Table Total

13 2-D Example: Unrounded and Suppressed Table Total

14 Operations Research Linear Programming (LP) Model –Objective Min or Max v; Subject to: v1 + v2 = 2.6(1) v3 + v4 = 3.0(2) v1 + v3 = 2.6(3) v2 + v4 = 3.8(4) v1 + v2 + v3 + v4 = 9.0 (5) v  0 –Feasible LP Model

15 LP Model Solutions

16 2-D Example: Suppressed and Rounded Total

17 Operations Research Linear Programming (LP) Model 1 –Objective Min or Max v; Subject to: 1 + v1 + v2 = 3 (1) 1 + v3 + v4 = 3(2) 1 + v1 + v3 = 3(3) 2 + v2 + v4 = 4(4) v1 + v2 + v3 + v4 = 9 (5) v  0 –Infeasible LP Model 1 due to Independent Rounding!

18 Infeasibility via Rounding Adding LP Constraints (1) and (2) v1 + v2 + v3 + v4 = 4 Adding LP Constraints (3) and (4) v1 + v2 + v3 + v4 = 4 However, reducing Constraint (5) yields v1 + v2 + v3 + v4 = 3 Hence, the LP model is not feasible. What to do?

19 How To Ensure Feasibility? Accounting for Independent Rounding –Add Surplus and Slack Variables to LP Equality Constraints - Not Used –Directly Adjust Table(s) - Not Used –Represent Rounding Found in Each Published Cell – Option in Current Use –“Best Fit” table approach (Stephen F. Roehrig, Carnegie Mellon University) – Future ?

20 From Tables to Constraints For each non-zero, unsuppressed cell value (u), create a new variable x and add the following constraint for each non-zero, unsuppressed cell. u - e  x  u + e For withheld cells, associate a variable x, constrained only by non-negativity.

21 New LP Model Format Total

22 Revised LP Model Linear Programming (LP) Model 2 –Objective Min or Max x; Subject to: x1 + x2 + x3 = x10 (row 1) x4 + x5 + x6 = x11(row 2) x7 + x8 + x9 = x12(row 3)...and so forth u - e  X  u + e or X is non-negative –where u denotes non-zero, unsuppressed cell values and e is the max (+) rounding value

23 Revised Model Solutions Bounds Expand

24 Is there a another way? Assuming all e’s take the maximum value has some ill effects –With large tables (i.e., large n) likely to obtain wide inequality bounds in verification and optimal solution sets (Kirkendall, Lu, Schipper, Roehrig 2001) Is there a better ways to assign values to e i ? –Heuristically assign a value to e –Best-Fit Approach

25 One Approach – Best-Fit Continuous Table (Roehrig) Directly adjust table cells in the LP model –Goal: Produce an additive table that generates the published table, given independent rounding “Best-Fit” table exists where objective function is the sum of absolute deviations –Minimize Z =  | a ij – x ij | where i,j range over table rows and columns, a ij are the published values, and x ij are the LP variables

26 Software Status Distributed Beta Version in August 2000 to agencies on CDAC Sub-Committee Demonstration at EIA – March 2, 2001 –Test files (csv format) provided by BEA and EIA Potential Additions –Add a user-friendly display manager system –Add a make-tables-add function (e.g., “Best Fit”) –Add a non-SAS optimizer for optimization speed – CPLEX ( Completed inter-agency agreements in August 2001 and distributed copies to those agencies.

27 System Requirements Operating Systems –Windows 95, 98, NT, and 2000 –UNIX Operating Platforms –Stand-Alone PC –Windows “box” –UNIX “box”