Integer Bounds on Suppressed Cells in Multi-Way Tables Stephen F. Roehrig Carnegie Mellon University For.

Outline
– Cell suppression: what and why
– Auditing
– Difficulties with multi-way tables
– The integer rounding property
– Test sets and Gröbner bases
– Related work, and questions for you

Cell Suppression: What and Why A typical ES-202 table (fictitious data, not real). Rows are SICs; columns are the State (10) and its Counties (101 to 104).
SIC    10   101   102   103   104
0     119    28    47    17    27
1      34    14    15     2     3
2      40     5    10    10    15
3      36     8    20     4     4
4       9     1     2     1     5

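Suppressed Table (Yet Unprotected) Fictitious data, not real. Rows are SICs; columns are the State (10) and its Counties (101 to 104); * marks a suppressed cell.
SIC    10   101   102   103   104
0     119    28     *    17     *
1       *    14    15     *     3
2      40     *    10    10     *
3       *     8    20     *     4
4       9     *     *     *     5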

A 3-Dimensional Example [Figure: a 3 × 3 × 3 table drawn as three i/j layers, k=1, k=2 and k=3, with its marginal totals; the grand total is 59.]

Some Choices for CSP (the cell suppression problem) Network algorithms for 2-D tables. Heuristics for 3-D and higher. Fischetti & Salazar González: integer programming using branch-and-cut; they have solved CSP for tables with 16K cells (2-D) and 5K cells (3-D). Special structure can help: Duncan et al. solved random 3-D problems of up to 4K cells in seconds.

Other Kinds of “Suppression” Start with an N-dimensional “base table” Publish derived marginal tables Think of the unpublished base table, or other margins, as suppressed

Auditing First, determine what constitutes disclosure. –Exact cell value? –Within some range? –High probability for some values? Then, choose a method. –Simultaneous linear equations? –Linear programming? –Enumeration or simulation?

Exact Cell Values Published values are used to write linear equations over the suppressed cells (non-negative variables). Solve the simultaneous set with, e.g., Gauss-Jordan elimination. Exact disclosures pop out, and multiple disclosures are found in one “pass”. Optimization is unnecessary.
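The elimination step can be sketched in a few lines of Python. This is a minimal illustration, not the auditing code behind the slides: the five-variable system `EQUATIONS` is hypothetical, and the helper names `gauss_jordan` and `exact_disclosures` are made up for this sketch. A variable counts as exactly disclosed when some row of the reduced matrix involves that variable alone.

```python
from fractions import Fraction


def gauss_jordan(rows):
    """Return the reduced row echelon form of an augmented matrix [A | b]."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) - 1  # last column holds the right-hand sides
    pivot_row = 0
    for col in range(n_cols):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [v / pivot for v in m[pivot_row]]
        # Clear the column in every other row.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


def exact_disclosures(rows):
    """Map variable index -> value for variables the equations pin down exactly."""
    disclosed = {}
    for row in gauss_jordan(rows):
        nonzero = [j for j, v in enumerate(row[:-1]) if v != 0]
        if len(nonzero) == 1:
            disclosed[nonzero[0]] = row[-1]
    return disclosed


# Hypothetical system over suppressed cells x1..x5 (columns), right-hand side last.
EQUATIONS = [
    [1, 1, 0, 0, 0, 10],  # x1 + x2 = 10
    [0, 0, 1, 1, 0, 7],   # x3 + x4 = 7
    [1, 0, 1, 0, 0, 8],   # x1 + x3 = 8
    [0, 1, 0, 1, 1, 12],  # x2 + x4 + x5 = 12
]
```

Calling `exact_disclosures(EQUATIONS)` reports only index 4, i.e. x5 = 3, while x1 to x4 remain undetermined by the equations alone.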

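Generating Equations Rows are SICs; columns are the State (10) and its Counties (101 to 104); each suppressed cell becomes a variable.
SIC    10   101   102   103   104
0     119    28    x1    17    x2
1      x3    14    15    x4     3
2      40    x5    10    10    x6
3      x7     8    20    x8     4
4       9    x9   x10   x11     5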

Feasibility Intervals (Zayatz) The same equations are used, now called “constraints”. Maximize/minimize each variable, subject to the constraints, to get bounds. Is the interval wide enough? How computationally hard is this? –BEA example, 16K table cells: SAS 7 hours; CPLEX 3 minutes; LP_SOLVE ??
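To make the idea of a feasibility interval concrete without depending on an LP solver, the sketch below swaps in brute-force enumeration of non-negative integer solutions for the linear programs the slide describes. The constraint set is hypothetical and the function name `feasibility_intervals` is an assumption of this sketch; it only scales to toy problems.

```python
from itertools import product

# Hypothetical constraints over suppressed cells x1..x5: (0/1 coefficients, total).
CONSTRAINTS = [
    ((1, 1, 0, 0, 0), 10),
    ((0, 0, 1, 1, 0), 7),
    ((1, 0, 1, 0, 0), 8),
    ((0, 1, 0, 1, 1), 12),
]


def feasibility_intervals(constraints):
    """Return (min, max) for each cell over all non-negative integer solutions.

    Assumes 0/1 coefficients and that every cell appears in some constraint.
    """
    n = len(constraints[0][0])
    # A non-negative cell cannot exceed any total it contributes to.
    upper = [min(b for coeffs, b in constraints if coeffs[j]) for j in range(n)]
    lows, highs = [None] * n, [None] * n
    for x in product(*(range(u + 1) for u in upper)):
        if all(sum(c * v for c, v in zip(coeffs, x)) == b for coeffs, b in constraints):
            for j, v in enumerate(x):
                lows[j] = v if lows[j] is None else min(lows[j], v)
                highs[j] = v if highs[j] is None else max(highs[j], v)
    return list(zip(lows, highs))
```

For this toy system the intervals come out as (1, 8), (2, 9), (0, 7), (0, 7) and (3, 3): the last cell is exactly disclosed, and the auditor still has to judge whether the other intervals are wide enough.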

Rounded Tables (Kirkendall et al.) Cell entries are sometimes rounded. Suppressions determined before or after rounding? Auditing assumes rounded values are exact? What sorts of errors? What about “blunders”?

Higher-Dimensional Tables Integer table, non-integer bounds? Possibility for 3-D and above, since network structure breaks down. Cox shows fractional bound examples are not mathematical anomalies. Experience: –3-way base table, all 2-way margins given: rare. –5-way base table, all 2-way margins given: common.

Why Fractional Bounds? Basically, some marginal sum values cause the constraints to intersect at non-integer points. [Figure: feasible region in the (x1, x2) plane, with the LP maximum and the integer maximum marked.]

Could This Happen? Standard example from integer programming: [Figure: feasible region in the (x1, x2) plane, with the LP maximum and the integer maximum marked far apart.]
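A small instance of this kind of gap can be checked by hand or with the sketch below. The two constraints are a hypothetical textbook-style example, not necessarily the one drawn on the slide. The LP optimum is found by enumerating vertices of the (bounded) feasible region with exact fractions, and the integer optimum by scanning a small grid.

```python
from fractions import Fraction
from itertools import combinations

# Constraints a1*x1 + a2*x2 <= b (hypothetical instance).
CONSTRAINTS = [
    (-3, 1, 0),  # x2 <= 3*x1
    (3, 1, 3),   # x2 <= 3 - 3*x1
    (-1, 0, 0),  # x1 >= 0
    (0, -1, 0),  # x2 >= 0
]


def feasible(x1, x2):
    return all(a1 * x1 + a2 * x2 <= b for a1, a2, b in CONSTRAINTS)


def lp_max_x2():
    """Maximize x2 over the polygon by checking every vertex (Cramer's rule)."""
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(CONSTRAINTS, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines never meet
        x1 = Fraction(b * c2 - a2 * d, det)
        x2 = Fraction(a1 * d - b * c1, det)
        if feasible(x1, x2) and (best is None or x2 > best):
            best = x2
    return best


def integer_max_x2(limit=5):
    """Maximize x2 over integer points with 0 <= x1, x2 <= limit."""
    return max(
        x2
        for x1 in range(limit + 1)
        for x2 in range(limit + 1)
        if feasible(x1, x2)
    )
```

Here the LP maximum of x2 is 3/2 (at x1 = 1/2), but the integer maximum is 0, so simply rounding the LP bound down to 1 would give a value no integer solution attains. This is the situation the integer rounding property rules out.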

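The Integer Rounding Property A rational system $Ax \le b$ has the integer rounding property if
\[ \lceil \min\{\, yb : y \ge 0,\ yA = c \,\} \rceil = \min\{\, yb : y \ge 0,\ yA = c,\ y \text{ integral} \,\} \]
for each integer vector c for which the l.h.s. is finite.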

Hilbert Bases A finite set of vectors $\{a_1, \dots, a_t\}$ is a Hilbert basis if each integer vector b in cone$\{a_1, \dots, a_t\}$ is a non-negative integer combination of $a_1, \dots, a_t$.

Example $\{a_1, a_2\} = \{(1,1), (1,-1)\}$ is not a Hilbert basis, since there are integer vectors in the cone that can’t be written as $z_1 a_1 + z_2 a_2$ with $z_1, z_2$ non-negative integers; for instance, $(1,0) = \tfrac12 a_1 + \tfrac12 a_2$. $\{a_1, a_2, a_3\} = \{(1,1), (1,-1), (1,0)\}$ is a Hilbert basis. [Figure: the cone in the (x1, x2) plane.]
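The example can be spot-checked by brute force. The sketch below is an illustration rather than a general Hilbert basis test: it hard-codes the cone of (1,1) and (1,-1) as the set of points with x1 >= |x2|, and only inspects integer points with x1 up to a small bound. The names `representable`, `in_cone` and `unrepresentable_points` are invented for this sketch.

```python
from itertools import product

GENS_2 = [(1, 1), (1, -1)]
GENS_3 = [(1, 1), (1, -1), (1, 0)]  # adding (1, 0) does not change the cone


def representable(b, gens, max_coeff):
    """True if b is a combination of gens with integer coefficients in 0..max_coeff."""
    for zs in product(range(max_coeff + 1), repeat=len(gens)):
        if all(sum(z * g[i] for z, g in zip(zs, gens)) == b[i] for i in range(2)):
            return True
    return False


def in_cone(b):
    """Membership in cone{(1,1), (1,-1)}, i.e. x1 >= |x2|."""
    return b[0] >= abs(b[1])


def unrepresentable_points(gens, n):
    """Integer cone points with x1 <= n that are not non-negative integer combinations."""
    return [
        (x, y)
        for x in range(n + 1)
        for y in range(-n, n + 1)
        # Every generator has first coordinate 1, so coefficients never exceed x <= n.
        if in_cone((x, y)) and not representable((x, y), gens, n)
    ]
```

With `GENS_2` the points whose coordinates have an odd sum, starting with (1, 0), are left out; with `GENS_3` the list is empty for every bound tried, consistent with the slide's claim.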

Big Theorem (Giles & Orlin) $Ax \le b$ (feasible, with b integral) has the integer rounding property iff the rows of the matrix $\begin{pmatrix} A & b \\ 0 & 1 \end{pmatrix}$ form a Hilbert basis.

What Is Matrix A? Bounds are obtained by maximizing/minimizing cell values, subject to constraints. We usually think of constraints as $Ax = b$, where each row of A is one marginal sum. To use Giles and Orlin, we need to “turn this around”, i.e., transpose. So think of $A^T$ as the matrix of constraint coefficients.

3-Way Table, All 2-Way Margins Given $A^T$ is the matrix of constraint equations. $A^T$ has $I \times J \times K$ variables and $(I \times J) + (I \times K) + (J \times K)$ rows. When row-reduced, $A^T$ has the form $[I, B]$, where –$I$ is the identity matrix –$B$ has a special form (a row is either all non-positive or all non-negative)

3-Way Table (cont.) The matrix A in Giles and Orlin looks like $\begin{pmatrix} I \\ B^T \end{pmatrix}$ (the transpose of $[I, B]$), where the columns of $B^T$ are either all non-positive or all non-negative.

A Fragment of A
 1  0  0  0
 0  1  0  0
 0  0  1  0
 0  0  0  1
 1 -1  0 -1
 1  0 -1 -1
 1 -1  0  0
 1  0 -1  0

Why Is This a Hilbert Basis?
\[ (2,-1,-1,-1) = \tfrac12\,(1,-1,0,-1) + \tfrac12\,(1,0,-1,-1) + \tfrac12\,(1,-1,0,0) + \tfrac12\,(1,0,-1,0) \]
\[ (2,-1,-1,-1) = 0\,(1,-1,0,-1) + 1\,(1,0,-1,-1) + 1\,(1,-1,0,0) + 0\,(1,0,-1,0) \]

Putting It All Together The rows of matrix A form a Hilbert basis. The rows of the matrix in the Giles & Orlin theorem (A extended with b) do as well, because b is a vector with a single 1. So we get the integer rounding property, at least in this case.

Finding An Integer Solution The integer rounding property lets us use simplex to find sharp integer bounds: –Round fractional lower bound up –Round fractional upper bound down What if we need to exhibit an all-integer solution achieving a bound? Use a single “Gomory cut” based on the objective function at the fractional optimum.
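The rounding step itself is tiny; a sketch follows. The function name `integer_bounds` and the tolerance `tol` are assumptions of this sketch (the tolerance guards against floating-point noise in solver output), and the result is only a sharp pair of bounds when the integer rounding property actually holds for the system.

```python
import math


def integer_bounds(lp_lower, lp_upper, tol=1e-9):
    """Round an LP feasibility interval inward to the nearest integers.

    tol absorbs solver round-off, e.g. 4.9999999999 is treated as 5.
    """
    lower = math.ceil(lp_lower - tol)
    upper = math.floor(lp_upper + tol)
    return lower, upper
```

For example, an LP interval of [0.5, 3.5] becomes the integer interval [1, 3].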

Digression: Gröbner Bases What’s the pattern? 0, 2, 4, 6, 8, … = {0,1,2,3,4,…} × {2}. What’s the pattern? 0, 2, 4, 6, 8, …, 7, 9, 11, 13, 15, …, 14, 16, 18, 20, 22, … = {0,1,2,3,4,5,…} × {2,7} = {$z_1 b_1 + z_2 b_2$}. These are “ideals”; {2,7} is called a generating set of the ideal.
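The two patterns can be reproduced by listing all non-negative integer combinations of the generators below some cutoff. This is a small illustrative sketch; the function name `generated` and the cutoff parameter `limit` are not from the slides, and the generators are assumed to be positive integers.

```python
def generated(generators, limit):
    """Sorted non-negative integer combinations of the generators that are < limit."""
    reachable = {0}
    frontier = [0]
    while frontier:
        value = frontier.pop()
        for g in generators:
            candidate = value + g
            if candidate < limit and candidate not in reachable:
                reachable.add(candidate)
                frontier.append(candidate)
    return sorted(reachable)
```

`generated([2], 10)` gives the even numbers 0, 2, 4, 6, 8, while `generated([2, 7], 16)` gives every number below 16 except 1, 3 and 5.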

Gröbner Bases (cont.) If we have a generating set, we can produce the ideal. Sometimes we only know properties of the ideal, not a generating set –E.g., “consecutive numbers are spaced four terms apart” How to find a good generating set (a basis)? There is an algorithm to find a Gröbner basis.

3-D Table, 2-D Margins Known [Figure: a 3 × 3 × 3 table drawn as three i/j layers, k=1, k=2 and k=3, showing only the 2-D marginal totals; the grand total is 59.]

Gröbner Basis “Moves” Suppose we know a table that matches the published margins (i.e., is feasible). How can we move to another feasible table? Example move (shown as three 3 × 3 layers):
 +  -  0      -  +  0      0  0  0
 -  +  0      +  -  0      0  0  0
 0  0  0      0  0  0      0  0  0
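A move of this kind can be checked mechanically: adding it to a feasible table must leave every 2-way margin unchanged. The sketch below assumes a table stored as nested lists indexed `[i][j][k]`; the starting table of all 2s is hypothetical, and the helper names `margins`, `basic_move` and `apply_move` are invented for this illustration.

```python
def margins(table):
    """Return the three 2-way margins (ij, ik, jk) of a 3-way table."""
    I, J, K = len(table), len(table[0]), len(table[0][0])
    ij = [[sum(table[i][j][k] for k in range(K)) for j in range(J)] for i in range(I)]
    ik = [[sum(table[i][j][k] for j in range(J)) for k in range(K)] for i in range(I)]
    jk = [[sum(table[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    return ij, ik, jk


def basic_move(i1, i2, j1, j2, k1, k2, shape=(3, 3, 3)):
    """Build the 2 x 2 x 2 move on rows i1,i2, columns j1,j2 and layers k1,k2."""
    I, J, K = shape
    move = [[[0] * K for _ in range(J)] for _ in range(I)]
    signed_cells = [
        (i1, j1, k1, +1), (i2, j2, k1, +1), (i1, j2, k2, +1), (i2, j1, k2, +1),
        (i1, j2, k1, -1), (i2, j1, k1, -1), (i1, j1, k2, -1), (i2, j2, k2, -1),
    ]
    for i, j, k, sign in signed_cells:
        move[i][j][k] = sign
    return move


def apply_move(table, move):
    """Cell-wise sum of a table and a move."""
    return [
        [[t + m for t, m in zip(t_row, m_row)] for t_row, m_row in zip(t_layer, m_layer)]
        for t_layer, m_layer in zip(table, move)
    ]


# Hypothetical feasible table: every cell equals 2.
TABLE = [[[2] * 3 for _ in range(3)] for _ in range(3)]
MOVE = basic_move(0, 1, 0, 1, 0, 1)
NEW_TABLE = apply_move(TABLE, MOVE)
```

`NEW_TABLE` differs from `TABLE` in eight cells, yet `margins(NEW_TABLE) == margins(TABLE)`, so an auditor who sees only the margins cannot tell the two tables apart.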

How to Find All Such Moves Moves can be described as polynomials in variables corresponding to the unknown cells. Apply Buchberger’s algorithm to find the Gröbner basis of an ideal of such polynomials.
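As an illustration of the encoding (a standard one in the algebraic statistics literature, and not necessarily the example shown on the slide), a move $m$ is split into its positive and negative parts and mapped to a binomial, with one indeterminate $x_{ijk}$ per cell. For the move that adds 1 to cells (1,1,1), (2,2,1), (1,2,2), (2,1,2) and subtracts 1 from cells (1,2,1), (2,1,1), (1,1,2), (2,2,2):

```latex
\[
  m = m^{+} - m^{-}
  \quad\longmapsto\quad
  \prod_{i,j,k} x_{ijk}^{\,m^{+}_{ijk}} \;-\; \prod_{i,j,k} x_{ijk}^{\,m^{-}_{ijk}}
\]
\[
  x_{111}\,x_{221}\,x_{122}\,x_{212} \;-\; x_{121}\,x_{211}\,x_{112}\,x_{222}
\]
```

Both monomials have the same 2-way margins, which is exactly the statement that the move leaves the margins unchanged.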

Computing the Gröbner Basis The general-purpose program Macaulay can find the 3 × 3 × 3 basis in about 7 hours (300 MHz PC). A specialized program does this in 25 ms. The 4 × 3 × 3 basis takes 20 minutes (628 moves). The 5 × 3 × 3 basis takes 3 months (3236 moves)…

All Moves for 3 × 3 × 3 Table Diaconis and Sturmfels show there are 110 basic moves. We can find every table matching the margins by applying these moves: –Try a (random) move –If it leaves the cells non-negative, use it –If not, pick another move and repeat
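The try/accept/repeat loop can be sketched as a random walk. Note the assumption: this sketch uses only the 54 signed 2 × 2 × 2 moves, not the full set of 110 moves cited on the slide, so it is not guaranteed to reach every table with the given margins. Tables are stored as dictionaries keyed by `(i, j, k)`, and all names here are invented for the illustration.

```python
import random
from itertools import combinations


def basic_moves(n=3):
    """All 2 x 2 x 2 moves on an n x n x n table, as (plus_cells, minus_cells) pairs."""
    moves = []
    pairs = list(combinations(range(n), 2))
    for i1, i2 in pairs:
        for j1, j2 in pairs:
            for k1, k2 in pairs:
                plus = [(i1, j1, k1), (i2, j2, k1), (i1, j2, k2), (i2, j1, k2)]
                minus = [(i1, j2, k1), (i2, j1, k1), (i1, j1, k2), (i2, j2, k2)]
                moves.append((plus, minus))
                moves.append((minus, plus))  # the same move with opposite sign
    return moves


def margins(table):
    """All 2-way margins of a table stored as {(i, j, k): count}."""
    out = {}
    for (i, j, k), v in table.items():
        for key in (("ij", i, j), ("ik", i, k), ("jk", j, k)):
            out[key] = out.get(key, 0) + v
    return out


def random_walk(table, steps, seed=0):
    """Return the tables visited by repeatedly trying random moves."""
    rng = random.Random(seed)
    moves = basic_moves()
    current = dict(table)
    visited = [dict(current)]
    for _ in range(steps):
        plus, minus = rng.choice(moves)            # try a (random) move
        if all(current[c] >= 1 for c in minus):    # cells would stay non-negative
            for c in plus:
                current[c] += 1
            for c in minus:
                current[c] -= 1
            visited.append(dict(current))
        # otherwise the move is rejected and another is drawn on the next pass
    return visited


# Hypothetical starting table: every cell equals 1.
START = {(i, j, k): 1 for i in range(3) for j in range(3) for k in range(3)}
```

Every table returned by `random_walk(START, 200)` has the same 2-way margins as `START` and no negative cells; tallying cell values along the walk gives an empirical picture of which values are consistent with the margins.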

Connections With Simplex One move for the 3 × 3 × 3 table looks like (shown as three 3 × 3 layers):
-2  +  +      +  0  -      +  -  0
 +  0  -      0  0  0      -  0  +
 +  -  0      -  0  +      0  +  -
All other moves have only 0, ±1 (i.e., network moves). This is the move that can give a fractional bound.

Simplex (cont.) Simplex pivots construct new Gröbner basis moves “on the fly”. Simplex decides this direction is useful, but goes only “half way”. Conjecture: Gröbner basis moves for N-way tables, (N-1)-way margins known, contain only 0’s, ±1’s and ±2’s.

Related Work Dobra looks at N-way tables with only some lower-dimensional margins known. He constructs a graph to represent the known margins. Example: 3-way table, two 2-way margins known. [Figure: graph on the nodes I, J, K.]

Related Work (cont.) If the graph is decomposable, then all GB moves are simple (i.e., network). Dobra develops a basis (larger than GB) and uses it to sample the space of tables matching the margins. He computes distributions of cell values, rather than absolute bounds. In general, 95% confidence intervals are far tighter than bounds.

Questions If confidence intervals on unknown cell values prove to be easy to compute, are bounds still of interest? If the confidence interval is tight, but bounds are wide, what does this mean for disclosure limitation practice?
