Inference Problem Privacy Preserving Data Mining.


1 Inference Problem Privacy Preserving Data Mining

2 CSCE 522 - Farkas 2 Lecture 19 Readings and Assignments
  I. Moskowitz, M. H. Kang: Covert Channels – Here to Stay? http://citeseer.nj.nec.com/cache/papers/cs/1340/http:zSzzSzwww.itd.nrl.navy.milzSzITDzSz5540zSzpublicationszSzCHACSzSz1994zSz1994moskowitz-compass.pdf/moskowitz94covert.pdf
  Jajodia, Meadows: Inference Problems in Multilevel Secure Database Management Systems, essay 24, http://www.acsac.org/secshelf/book001/book001.html

3 Indirect Information Flow Channels
  Covert channels
  Inference channels

4 Communication Channels
  Overt channel: designed into a system and documented in the user's manual.
  Covert channel: not documented. Covert channels may be deliberately inserted into a system, but most such channels are accidents of the system design.

5 Covert Channel
  Timing channels: based on system times
  Storage channels: communication that is not time related
  Each kind can be turned into the other

6 Inference Channels
  Non-sensitive information + Meta-data = Sensitive information

7 Inference Channels
  Statistical database inferences
  General-purpose database inferences

8 Statistical Databases
  Goal: provide aggregate information about groups of individuals
    E.g., average grade point of students
  Security risk: specific information about a particular individual
    E.g., grade point of student John Smith
  Meta-data:
    Working knowledge about the attributes
    Supplementary knowledge (not stored in database)

9 Types of Statistics
  Macro-statistics: collections of related statistics presented in 2-dimensional tables
    Sex\Year  1997  1998  Sum
    Female       4     1    5
    Male         6    13   19
    Sum         10    14   24
  Micro-statistics: individual data records used for statistics after identifying information is removed
    Sex  Course    GPA  Year
    F    CSCE 590  3.5  2000
    M    CSCE 590  3.0  2000
    F    CSCE 790  4.0  2001

10 Statistical Compromise
  Exact compromise: find the exact value of an attribute of an individual (e.g., John Smith’s GPA is 3.8)
  Partial compromise: find an estimate of an attribute value corresponding to an individual (e.g., John Smith’s GPA is between 3.5 and 4.0)

11 Methods of Attacks and Protection
  Small/Large Query Set Attack
    C: characteristic formula that identifies groups of individuals
    If C identifies a single individual I, i.e., count(C) = 1:
      Find out existence of a property:
        count(C and D) = 1 means I has property D
        count(C and D) = 0 means I does not have D
      OR
      Find value of a property:
        sum(C, D) gives the value of D
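
The small query set attack on this slide can be sketched in a few lines. This is only an illustration: the three records reuse the micro-statistics rows from slide 9, while the `name` and `probation` fields and the helper names `count` and `total` are hypothetical additions, not part of the lecture.

```python
# Toy records: sex/course/gpa/year follow slide 9; name and probation are made up.
RECORDS = [
    {"name": "Ann", "sex": "F", "course": "CSCE 590", "year": 2000, "gpa": 3.5, "probation": False},
    {"name": "Bob", "sex": "M", "course": "CSCE 590", "year": 2000, "gpa": 3.0, "probation": True},
    {"name": "Cat", "sex": "F", "course": "CSCE 790", "year": 2001, "gpa": 4.0, "probation": False},
]

def count(pred):
    """Statistical query: number of records satisfying pred."""
    return sum(1 for r in RECORDS if pred(r))

def total(pred, attr):
    """Statistical query: sum of attr over the records satisfying pred."""
    return sum(r[attr] for r in RECORDS if pred(r))

# C: public characteristics the attacker already knows about one person.
C = lambda r: r["sex"] == "M" and r["course"] == "CSCE 590"
# D: the sensitive property being probed.
D = lambda r: r["probation"]

isolates_one = count(C) == 1                      # C identifies a single individual
has_d = count(lambda r: C(r) and D(r)) == 1       # existence of property D
leaked_gpa = total(C, "gpa")                      # a sum over one record is the value itself
```

No record is ever returned directly; only aggregate answers are used, yet both the property and the GPA of the isolated individual are disclosed.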

12 Small/Large Query Set Attack cont.
  Protection from small/large query set attack: query-set-size control
  A query q(C) is permitted only if N-n ≥ |C| ≥ n, where n ≥ 0 is a parameter of the database and N is the number of records in the database
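
A minimal sketch of query-set-size control, assuming the database is a plain Python list and that a refused query simply raises an error. The function names are hypothetical. The upper bound N-n matters because the complement of a very large query set is a very small one.

```python
def size_permitted(set_size, n, total_records):
    """True when n <= |C| <= N - n."""
    return n <= set_size <= total_records - n

def guarded_count(records, pred, n):
    """Answer count(C) only if the query set size passes the control."""
    size = sum(1 for r in records if pred(r))
    if not size_permitted(size, n, len(records)):
        raise PermissionError("query set size outside [n, N - n]")
    return size

records = list(range(10))   # N = 10 stand-in records
n = 2

ok = guarded_count(records, lambda r: r < 5, n)   # |C| = 5, permitted
too_small = size_permitted(1, n, len(records))    # |C| = 1, refused
too_large = size_permitted(9, n, len(records))    # |C| = 9 > N - n, refused
```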

13 Tracker attack
  q(C) is disallowed
  C = C1 and C2
  Tracker: T = C1 and ~C2
  q(C) = q(C1) - q(T)

14 Tracker attack
  q(C and D) is disallowed
  C = C1 and C2
  Tracker: T = C1 and ~C2
  q(C and D) = q(T or (C and D)) - q(T)
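
The two tracker identities can be checked on a toy database protected by query-set-size control. Everything below is an assumption made for illustration: the ten records, the attributes `dept`, `sex`, and `honors`, the bound `N_MIN = 2`, and the helper names.

```python
N_MIN = 2  # the size-control parameter n (assumed value)

RECORDS = [
    {"dept": "CS", "sex": "F", "honors": True},   # the targeted individual
    {"dept": "CS", "sex": "M", "honors": False},
    {"dept": "CS", "sex": "M", "honors": True},
    {"dept": "CS", "sex": "M", "honors": False},
    {"dept": "EE", "sex": "F", "honors": False},
    {"dept": "EE", "sex": "F", "honors": True},
    {"dept": "EE", "sex": "F", "honors": False},
    {"dept": "EE", "sex": "M", "honors": False},
    {"dept": "EE", "sex": "M", "honors": True},
    {"dept": "EE", "sex": "M", "honors": False},
]

def allowed(pred):
    """Query-set-size control: n <= |C| <= N - n."""
    size = sum(1 for r in RECORDS if pred(r))
    return N_MIN <= size <= len(RECORDS) - N_MIN

def q(pred):
    """Count query that is answered only when the size control permits it."""
    if not allowed(pred):
        raise PermissionError("query set size not permitted")
    return sum(1 for r in RECORDS if pred(r))

C1 = lambda r: r["dept"] == "CS"
C2 = lambda r: r["sex"] == "F"
C = lambda r: C1(r) and C2(r)          # isolates one record, so q(C) is refused
T = lambda r: C1(r) and not C2(r)      # the tracker
D = lambda r: r["honors"]

# q(C) = q(C1) - q(T)
count_c = q(C1) - q(T)
# q(C and D) = q(T or (C and D)) - q(T)
count_c_and_d = q(lambda r: T(r) or (C(r) and D(r))) - q(T)
```

Every query actually issued has a set size of 3 or 4, so the size control never triggers, yet the attacker learns that C matches exactly one person and that this person has property D.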

15 Query overlap attack
  [Figure: query sets C1 and C2 over the individuals John, Kathy, Max, Fred, Eve, Paul, Mitch]
  q(John) = q(C1) - q(C2)
  Protection: query-overlap control
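
A sketch of the overlap attack and of one simple form of query-overlap control. The set memberships are an assumption (the figure's layout is not recoverable from the transcript), the salary values are invented, and `OverlapControl` with its `max_overlap` threshold is a hypothetical helper rather than a method taken from the lecture.

```python
# Invented sensitive values for the individuals named on the slide.
SALARY = {"John": 52, "Kathy": 61, "Max": 48, "Fred": 55, "Eve": 70, "Paul": 44, "Mitch": 66}

def q_sum(names):
    """Statistical query: total salary over a set of individuals."""
    return sum(SALARY[name] for name in names)

# Assumed memberships: C1 and C2 differ only in John.
C1 = {"John", "Kathy", "Max", "Fred"}
C2 = {"Kathy", "Max", "Fred"}
john_salary = q_sum(C1) - q_sum(C2)

class OverlapControl:
    """Refuse a query that shares too many individuals with an answered one."""

    def __init__(self, max_overlap):
        self.max_overlap = max_overlap
        self.answered = []

    def query(self, names):
        names = frozenset(names)
        if any(len(names & earlier) > self.max_overlap for earlier in self.answered):
            raise PermissionError("overlap with an earlier query is too large")
        self.answered.append(names)
        return q_sum(names)
```

With `OverlapControl(max_overlap=1)`, the query over C1 is answered and the follow-up over C2 is refused because the two sets share three individuals.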

16 Insertion/Deletion Attack
  Observing changes over time
    q1 = q(C)
    insert(i)
    q2 = q(C)
    q(i) = q2 - q1
  Protection: insertion/deletion performed as pairs
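
The before/after observation can be sketched as follows. The records and the `dept_salary_sum` query are invented for illustration. The second half shows one possible reading of the pairing protection, in which records only ever enter the database two at a time, so a difference reveals a combined value rather than an individual one.

```python
def dept_salary_sum(db, dept):
    """Statistical query q(C): total salary of everyone in a department."""
    return sum(r["salary"] for r in db if r["dept"] == dept)

db = [
    {"name": "Ann", "dept": "CS", "salary": 50},
    {"name": "Bob", "dept": "CS", "salary": 60},
]

# Attack: query, wait for a single insertion, query again.
q1 = dept_salary_sum(db, "CS")
db.append({"name": "Eve", "dept": "CS", "salary": 75})
q2 = dept_salary_sum(db, "CS")
leaked = q2 - q1                      # Eve's exact salary

def insert_pair(db, first, second):
    """Assumed protection: records are only added two at a time."""
    db.extend([first, second])

q3 = dept_salary_sum(db, "CS")
insert_pair(db,
            {"name": "Dan", "dept": "CS", "salary": 65},
            {"name": "Fay", "dept": "CS", "salary": 80})
q4 = dept_salary_sum(db, "CS")
pair_sum = q4 - q3                    # only the sum of the two new salaries
```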

17 Statistical Inference Theory
  Given an unlimited number of statistics and correct statistical answers, all statistical databases can be compromised (Ullman)

18 Inferences in General-Purpose Databases
  Queries based on sensitive data
  Inference via database constraints
  Inferences via updates

19 Queries based on sensitive data
  Sensitive information is used in the selection condition but not returned to the user.
  Example: Salary: secret, Name: public
    π_Name σ_Salary=$25,000
  Protection: apply the query to database views at different security levels
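
A small sketch of why this query leaks and how evaluating it against a view helps. The employee rows and the helpers `names_with_salary` and `public_view` are hypothetical; the idea is only that a public user's query should run over data at the public level, where the secret Salary attribute does not exist.

```python
EMPLOYEES = [
    {"name": "Black", "salary": 25000},   # salary is secret, name is public
    {"name": "Red", "salary": 42000},
]

def names_with_salary(rows, amount):
    """Project Name from the rows selected by Salary = amount."""
    return [r["name"] for r in rows if r.get("salary") == amount]

def public_view(rows):
    """View for a public user: the secret Salary attribute is dropped."""
    return [{"name": r["name"]} for r in rows]

# Evaluated on the base table, the answer contains only public names,
# yet it reveals that Black earns $25,000.
leak = names_with_salary(EMPLOYEES, 25000)

# Evaluated on the public view, the selection cannot see Salary at all.
safe = names_with_salary(public_view(EMPLOYEES), 25000)
```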

20 Database Constraints
  Integrity constraints
  Database dependencies
  Key integrity

21 Integrity Constraints
  C = A + B
  A = public, C = public, and B = secret
  B can be calculated from A and C, i.e., secret information can be calculated from public data
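
The arithmetic behind this slide, as a tiny sketch. The numeric values and the function name are invented; the only thing taken from the slide is the constraint C = A + B with A and C public.

```python
def infer_secret_b(a_public, c_public):
    """Recover B from the integrity constraint C = A + B."""
    return c_public - a_public

# Hypothetical row; a public user can read A and C but not B.
row = {"A": 1200, "B": 300, "C": 1500}
constraint_holds = row["C"] == row["A"] + row["B"]

inferred_b = infer_secret_b(row["A"], row["C"])
```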

22 Database Dependencies
  Metadata:
    Functional dependencies
    Multi-valued dependencies
    Join dependencies
    etc.

23 Functional Dependency
  FD: A → B, that is, for any two tuples in the relation, if they have the same value for A, they must have the same value for B.
  Example: FD: Rank → Salary
  Secret information: Name and Salary together
    Query1: Name and Rank
    Query2: Rank and Salary
    Combine answers for Query1 and Query2 to reveal Name and Salary together
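
Combining the two permitted query answers amounts to a join on Rank. A minimal sketch, with invented names, ranks, and salaries:

```python
# Answers to the two individually permitted queries (invented data).
query1 = [("Black", "Assistant"), ("Red", "Full")]      # Name, Rank
query2 = [("Assistant", 38000), ("Full", 42000)]        # Rank, Salary

# The FD Rank -> Salary guarantees one salary per rank,
# so the second answer can be treated as a lookup table.
salary_by_rank = dict(query2)

# Join on Rank: the secret Name-Salary association is reconstructed.
name_salary = {name: salary_by_rank[rank] for name, rank in query1}
```

Without the functional dependency, a rank could map to several salaries and the join would only narrow the candidates, which corresponds to a partial rather than exact disclosure.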

24 Key integrity
  Every tuple in the relation has a unique key
  Users at different levels see different versions of the database
  Users might attempt to update data that is not visible to them

25 Example
  Secret View
    Name (key)  Salary    Address
    Black P     38,000 P  Columbia S
    Red S       42,000 S  Irmo S
  Public View
    Name (key)  Salary    Address
    Black P     38,000 P  Null P

26 Updates
  Public user sees:
    Name (key)  Salary    Address
    Black P     38,000 P  Null P
  1. Update Black’s address to Orlando
  2. Add new tuple: (Red, 22,000, Manassas)
  If
    Refuse update: covert channel
    Allow update:
      Overwrite high data: may be incorrect
      Create new tuple: which data is correct? (polyinstantiation); violates key constraints

27 Updates
  Secret user sees:
    Name (key)  Salary    Address
    Black P     38,000 P  Columbia S
    Red S       42,000 S  Irmo S
  1. Update Black’s salary to 45,000
  If
    Refuse update: denial of service
    Allow update:
      Overwrite low data: covert channel
      Create new tuple: which data is correct? (polyinstantiation); violates key constraints
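
The "create new tuple" option can be sketched with a table whose effective key is (name, level). This is a simplification under stated assumptions: labels are attached to whole tuples rather than to individual attributes as on the slides, and `PolyTable`, `LEVELS`, and the method names are hypothetical.

```python
LEVELS = {"P": 0, "S": 1}   # public below secret

class PolyTable:
    """Toy relation in which the same name may exist once per security level."""

    def __init__(self):
        self.rows = {}   # (name, level) -> (salary, address)

    def insert(self, name, level, salary, address):
        # Never refused and never overwrites another level's tuple,
        # so a low user learns nothing about high data from the outcome.
        self.rows[(name, level)] = (salary, address)

    def view(self, clearance):
        """Tuples visible at the given clearance, ordered by name and level."""
        visible = [
            (name, level, salary, address)
            for (name, level), (salary, address) in self.rows.items()
            if LEVELS[level] <= LEVELS[clearance]
        ]
        return sorted(visible, key=lambda row: (row[0], row[1]))

table = PolyTable()
table.insert("Black", "P", 38000, None)
table.insert("Red", "S", 42000, "Irmo")

# The public user's insert from slide 26 succeeds next to the secret Red tuple.
table.insert("Red", "P", 22000, "Manassas")

public_rows = table.view("P")
secret_rows = table.view("S")
```

The secret view now holds two tuples named Red, which is exactly the ambiguity the slides point out: the name alone is no longer a key, and the user must decide which tuple is correct.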

28 Inference Problem
  No general technique is available to solve the problem
  Need assurance of protection
  Hard to incorporate outside knowledge

29 Web Evolution
  Past: Human usage
    Static Web pages (HTML, XML)
  Present: Human & Automated usage
    Semantic Web, WS, SOA
  Future: Mobile Computing

30 Web Data Security
  Access Control Models
    XML
    Heterogeneous Data: XML, Stream, Text
  Limitations:
    Syntax-based
    No association protection
    Limited handling of updates
    No data or application semantics
    No inference control

31 Secure XML Views - Example
  [Figure: XML document and its tree. Root medicalFiles with subtrees countyRec and milBaseRec, each holding a patient. Security labels shown: UC, S, TS. Labeled values: name John Smith (UC), phone 111-2222 (S), physician Jim Dale (UC), name Harry Green (UC), phone 333-4444 (S), physician Joe White (UC), milTag MT78 (TS). Caption: View over UC data]

32 Secure XML Views - Example
  [Figure: the tree without the phone and milTag nodes: medicalFiles, countyRec, milBaseRec, patient, name John Smith, physician Jim Dale, name Harry Green, physician Joe White. Caption: View over UC data]

33 Secure XML Views - Example
  [Figure: same tree as on slide 32. Caption: View over UC data]

34 Secure XML Views - Example
  [Figure: same tree with the remaining security labels (UC, S, TS); John Smith, Jim Dale, Harry Green, and Joe White are labeled UC. Caption: View over UC data]

35 Secure XML Views - Example
  [Figure: final view: medicalFiles with children name John Smith, physician Jim Dale, name Harry Green, physician Joe White. Caption: View over UC data]

36 The Inference Problem
  General Purpose Database:
    Non-confidential data + Metadata → Undesired Inferences
  Semantic Web:
    Non-confidential data + Metadata (data and application semantics) + Computational Power + Connectivity → Undesired Inferences

37 Correlated Inference
  [Figure labels: address + fort: Public; district + basin: Public; water source + base: Confidential]
  Ontology:
    Object[].
    waterSource :: Object
    basin :: waterSource
    place :: Object
    district :: place
    address :: place
    base :: Object
    fort :: base

38 [Figure labels: Organizational Data (Confidential, Public, Misinfo), Attacker, Access Control, Ontology, Web Data, Data Integration and Inferences, Inference Control]

39 Inference Control
  [Figure labels: Organizational Data (Confidential, Public, Misinfo); Access and Inference Control Policy]
  Logic-based inference detection
  Exact and partial disclosure
  Data and metadata protection
  Heterogeneous data manipulation
  Metadata discovery

40 Data Mining and Privacy
  Statistical inference:
    K-anonymity
    Correlation
  General inference:
    Pattern
    Metadata
    Biased learning

41 Future

42 Next Class
  Midterm exam

