Re-development of the Cell Suppression Methodology at the US Census Bureau Philip Steel, James Fagan, Paul Massell, Richard Moore Jr., John Slanta, Bei Wang

Background
– Jewett’s network flow program
– Need for a new program
– 2012 Economic Census
– LP (linear programming) methodology
– R&M cell suppression team

Processing Model
– Preprocessing
  – Create table description
  – Determine primaries
  – Unduplicate
– Sequential processing of primaries
– Queue reduction
– Test company protection (aggregate/supercell)
– Sequential processing of supercells
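The sequential pass with queue reduction can be pictured as a loop over a queue of primaries. This is a minimal Python sketch under stated assumptions, not the Census Bureau's implementation: it assumes queue reduction means dropping queued primaries that an earlier solution already protects, `solve_lp` (solve the LP for one target primary) and `is_covered` (decide whether a solution protects another primary) are hypothetical callables, and preprocessing and the supercell pass are left out.

```python
from collections import deque
from typing import Any, Callable, Dict, Hashable, Iterable, List


def process_primaries(
    primaries: Iterable[Hashable],
    solve_lp: Callable[[Hashable], Any],
    is_covered: Callable[[Any, Hashable], bool],
) -> Dict[str, List[Hashable]]:
    """Solve one LP per target primary, dropping queued primaries that the
    latest solution already protects (assumed meaning of queue reduction)."""
    queue = deque(primaries)
    solved: List[Hashable] = []
    skipped: List[Hashable] = []
    while queue:
        target = queue.popleft()
        solution = solve_lp(target)  # hypothetical LP call for this target
        solved.append(target)
        remaining = deque()
        for p in queue:
            if is_covered(solution, p):
                skipped.append(p)  # protected without an LP of its own
            else:
                remaining.append(p)
        queue = remaining
    return {"solved": solved, "skipped": skipped}


# Toy stand-ins: each "solution" is just the set of primaries it protects.
toy_solutions = {"P1": {"P1", "P3"}, "P2": {"P2", "P4"}}
result = process_primaries(
    ["P1", "P2", "P3", "P4"],
    solve_lp=lambda t: toy_solutions.get(t, {t}),
    is_covered=lambda sol, p: p in sol,
)
print(result)  # {'solved': ['P1', 'P2'], 'skipped': ['P3', 'P4']}
```

Only two LPs are solved for four primaries in the toy run; how much the real queue shrinks depends entirely on what the real coverage test accepts.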

Table relations
– Marginals are the sum of interior cells.
– Geographic relationships tend to generate our most complex sets of table relations:
  – A state is the sum of the metropolitan areas within the state plus the balance of the state.
  – A state is also the sum of its counties.
– Relations have the form A = B + … + Z, where A, B, …, Z are all rows, all columns, or all levels of a Cartesian integer space indexed by (i, j, k).
– Duplicates are recorded as A = B (e.g., a county that is also a place).
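A relation A = B + … + Z can be held as a parent index plus a list of child indices along one dimension, and a duplicate A = B is simply the one-child case. The Python sketch below checks row relations on a small 2-d table; the toy geography (a state, one metropolitan area, the balance of the state, and two counties) and all numbers are invented for illustration. Column relations are checked the same way on the transposed table.

```python
from typing import List, Tuple

# (parent index, child indices) for a relation A = B + ... + Z along one dimension.
Relation = Tuple[int, List[int]]


def additivity_failures(table: List[List[int]], relations: List[Relation]) -> List[Tuple[int, int]]:
    """Return (relation number, column) pairs where a row relation does not add up."""
    failures = []
    for r, (parent, children) in enumerate(relations):
        for j in range(len(table[parent])):
            if table[parent][j] != sum(table[c][j] for c in children):
                failures.append((r, j))
    return failures


# Rows: 0 state, 1 metro area, 2 balance of state, 3 county X, 4 county Y.
# Columns: two interior cells and their marginal total in column 2.
table = [
    [10, 5, 15],
    [6, 3, 9],
    [4, 2, 6],
    [7, 1, 8],
    [3, 4, 7],
]
row_relations = [(0, [1, 2]), (0, [3, 4])]  # state = metro + balance; state = counties
col_relations = [(2, [0, 1])]               # marginal = sum of interior cells
transposed = [list(col) for col in zip(*table)]

print(additivity_failures(table, row_relations))       # []
print(additivity_failures(transposed, col_relations))  # []
```

The same state row appears as the parent of two relations, which is exactly what makes geographic tables "linked": a change to the state cell must be balanced in both decompositions at once.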

Objective Function

Additivity constraint generator (based on row relations)
(b) for $ii = 1,\dots,rr$, $j = 1,\dots,cols$, $k = 1,\dots,levs$ such that $\mathit{limr}(ii) \ge 1$ and $\mathit{ws}(ii,j,k) = 0$
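One way to read the generator: each row relation ii yields one equality per column j and level k, so that any perturbation y of the table keeps it additive, i.e. y at the parent minus the sum of y over the children equals 0. The Python sketch below makes that assumption; it uses 0-based indices, stores each constraint as a sparse coefficient map, and takes a hypothetical `include(ii, j, k)` predicate as a stand-in for the slide's conditions on limr and ws, whose definitions are not given here.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int, int]
Relation = Tuple[int, List[int]]  # (parent row, child rows)


def additivity_constraints(
    row_relations: List[Relation],
    cols: int,
    levs: int,
    include: Callable[[int, int, int], bool] = lambda ii, j, k: True,
) -> List[Dict[Cell, int]]:
    """One constraint sum(coeff * y[cell]) == 0 per relation ii, column j, level k."""
    constraints = []
    for ii, (parent, children) in enumerate(row_relations):
        for j in range(cols):
            for k in range(levs):
                if not include(ii, j, k):  # hypothetical filter (see lead-in)
                    continue
                coeffs: Dict[Cell, int] = {(parent, j, k): 1}
                for child in children:
                    key = (child, j, k)
                    coeffs[key] = coeffs.get(key, 0) - 1
                constraints.append(coeffs)
    return constraints


for row in additivity_constraints([(0, [1, 2])], cols=2, levs=1):
    print(row)
# {(0, 0, 0): 1, (1, 0, 0): -1, (2, 0, 0): -1}
# {(0, 1, 0): 1, (1, 1, 0): -1, (2, 1, 0): -1}
```

Because every right-hand side is 0, these constraints stay satisfied when a whole perturbation is multiplied by a scalar, a property that matters for rescaling solutions between primaries.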

Bounds
$h_{i,j,k} = \max(0, v_{i,j,k})$ for $i = 1,\dots,rows$, $j = 1,\dots,cols$, $k = 1,\dots,levs$ such that $(i,j,k) \in A$
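The bound is a direct clip of each cell value at zero, applied only to cells in A. A minimal Python sketch, assuming cell values are held in a dict keyed by (i, j, k) and that A is passed as an iterable of such keys (both representation choices are ours, not the slide's):

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int, int]


def cell_bounds(v: Dict[Cell, float], A: Iterable[Cell]) -> Dict[Cell, float]:
    """h[i,j,k] = max(0, v[i,j,k]) for every (i,j,k) in A."""
    return {cell: max(0, v[cell]) for cell in A}


v = {(0, 0, 0): 12, (0, 1, 0): -3, (1, 0, 0): 0}
print(cell_bounds(v, v.keys()))  # {(0, 0, 0): 12, (0, 1, 0): 0, (1, 0, 0): 0}
```

Cells outside A receive no entry, so any solver input built from this dict would need its own rule for them.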

For the primary

Skip P
The model changes only in the constraints on the target primary. How can the minimal solution for one target be transformed into a solution for another target? By applying a scalar that rescales the flow through the second P to the value fixed in the model! This can be done when the rescaled solution does not violate the bounding conditions and the complementary flow in the target is 0, i.e., when the solution's flow through the secondary target exceeds its protection requirement.
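Why rescaling works can be spelled out, assuming (as in the usual perturbation formulation) that the additivity constraints all equal 0: multiplying every flow by one scalar keeps them satisfied, and when the flow through the second primary already exceeds its requirement the scalar is at most 1, so non-negative flows stay non-negative and only shrink toward 0. The Python sketch below checks the Skip P conditions. It is a sketch under assumptions, not the production code: a solution is assumed to be a dict from cell to an (up, down) pair of non-negative flows, protection of the new target is assumed to be in the up direction (so its down flow is the complementary flow that must be 0), and the bound dictionaries are hypothetical inputs.

```python
from typing import Dict, Optional, Tuple

Cell = Tuple[int, int, int]
Flows = Dict[Cell, Tuple[float, float]]  # assumed format: cell -> (up flow, down flow)


def rescale_for_target(
    flows: Flows,
    target: Cell,
    requirement: float,
    up_bound: Dict[Cell, float],
    down_bound: Dict[Cell, float],
) -> Optional[Flows]:
    """Return the solution rescaled so the up flow through `target` equals
    `requirement`, or None if Skip P does not apply to this target."""
    up, down = flows.get(target, (0.0, 0.0))
    if down != 0 or up <= 0 or up < requirement:
        return None  # complementary flow present, or too little flow through target
    s = requirement / up  # 0 <= s <= 1 when requirement >= 0
    scaled = {cell: (s * u, s * d) for cell, (u, d) in flows.items()}
    # Additivity is homogeneous, so scaling preserves it; only bounds need checking.
    inf = float("inf")
    for cell, (u, d) in scaled.items():
        if u > up_bound.get(cell, inf) or d > down_bound.get(cell, inf):
            return None
    return scaled


# A 2x2 cycle of flows (+4, -4 / -4, +4) that leaves row and column sums unchanged.
flows = {
    (0, 0, 0): (4.0, 0.0), (0, 1, 0): (0.0, 4.0),
    (1, 0, 0): (0.0, 4.0), (1, 1, 0): (4.0, 0.0),
}
print(rescale_for_target(flows, (1, 1, 0), 2.0, {}, {}))
# {(0, 0, 0): (2.0, 0.0), (0, 1, 0): (0.0, 2.0), (1, 0, 0): (0.0, 2.0), (1, 1, 0): (2.0, 0.0)}
print(rescale_for_target(flows, (1, 1, 0), 5.0, {}, {}))  # None: flow 4 < requirement 5
```

The explicit bound loop is redundant when the original solution was feasible and all bounds are upper bounds on non-negative flows; it is kept so the sketch still behaves sensibly if that assumption does not hold.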

Empirical confirmation
In our large sparse tables, we saw many objective-0 results: the solver finds a zero-cost pattern to protect the primary, because the primary is already protected! Skip P eliminated most objective-0 results and left intact the sequence of positive objectives and their solutions.

Fat solution
CPLEX uses a dual simplex method to find solutions. The solutions have a growing zero-cost component, with many more cells than are required to protect the target P. The flow in the zero-cost cells far exceeds what is required to protect the target P (except in very small or dense examples). The solution “lights up” the possible flows in the table’s current state, giving a “fat” solution.

[Figure: Skip P and the fat solution. Count of P with flow and running total of skipped P, by optimization number.]

dg10 sector 44
– Cartesian cells: 367,605 (2d)
– Non-zero cells: 159,849
– Relations: 283 (row and column)
  – 14,000 potential tables, linked
– P: 95,062
– LP problems: 10,604
– Typical LP size
  – Reduced LP has rows, columns, and nonzeros
– Time: 8 hr 37 min (includes everything)

Comparison between network flow and LP on one dataset (of hundreds) from 2007

Measure             Network flow              LP
C                         14,551          11,283
Cvalue             1,813,213,710     598,886,234
PubValue          12,348,960,578  13,563,288,054
Time                      24 min    8 hrs 37 min

Undersuppressions (#): 0

Statistics based on unduplicated data with an approximation of a published status flag.
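The comparison table's value rows can be cross-checked with a little arithmetic. In the Python sketch below the figures are copied from the table; reading Cvalue as the total value of complementary suppressions and PubValue as the total published value is our assumption. The arithmetic itself shows that Cvalue + PubValue is identical for both methods, so the value the LP run removes from complementary suppression reappears exactly as published value.

```python
# Figures copied from the network-flow vs LP comparison table (one 2007 dataset).
network = {"C": 14_551, "Cvalue": 1_813_213_710, "PubValue": 12_348_960_578, "minutes": 24}
lp = {"C": 11_283, "Cvalue": 598_886_234, "PubValue": 13_563_288_054, "minutes": 8 * 60 + 37}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Cvalue + PubValue is the same for both methods.
print(network["Cvalue"] + network["PubValue"], lp["Cvalue"] + lp["PubValue"])
# 14162174288 14162174288

# Value moved out of complementary suppression equals value gained in publication.
print(network["Cvalue"] - lp["Cvalue"], lp["PubValue"] - network["PubValue"])
# 1214327476 1214327476

print(round(pct_change(network["C"], lp["C"]), 1))            # -22.5
print(round(pct_change(network["Cvalue"], lp["Cvalue"]), 1))  # -67.0
print(round(lp["minutes"] / network["minutes"], 1))           # 21.5
```

Read together: on this dataset the LP run used about 22.5% fewer complementary cells and about 67% less complementary value, at roughly 21.5 times the run time.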

Thank you!