1 Lecture 15-16: Security Wednesday, May 17, 2006
2 Outline Traditional data security Two attacks Data security research today Conclusions
3 Data Security Dorothy Denning, 1982: Data Security is the science and study of methods of protecting data (...) from unauthorized disclosure and modification Data Security = Confidentiality + Integrity
4 Data Security Distinct from systems and network security –Assumes these are already secure Tools: –Cryptography, information theory, statistics, … Applications: –An enabling technology
5 Discretionary Access Control in SQL GRANT privileges ON object TO users [WITH GRANT OPTIONS] GRANT privileges ON object TO users [WITH GRANT OPTIONS] privileges = SELECT | INSERT(column-name) | UPDATE(column-name) | DELETE | REFERENCES(column-name) object = table | attribute
6 Examples GRANT INSERT, DELETE ON Customers TO Yuppy WITH GRANT OPTIONS GRANT INSERT, DELETE ON Customers TO Yuppy WITH GRANT OPTIONS Queries allowed to Yuppy: Queries denied to Yuppy: INSERT INTO Customers(cid, name, address) VALUES(32940, ‘Joe Blow’, ‘Seattle’) DELETE Customers WHERE LastPurchaseDate < 1995 INSERT INTO Customers(cid, name, address) VALUES(32940, ‘Joe Blow’, ‘Seattle’) DELETE Customers WHERE LastPurchaseDate < 1995 SELECT Customer.address FROM Customer WHERE name = ‘Joe Blow’ SELECT Customer.address FROM Customer WHERE name = ‘Joe Blow’
7 Examples GRANT SELECT ON Customers TO Michael Now Michael can SELECT, but not INSERT or DELETE
8 Examples GRANT SELECT ON Customers TO Michael WITH GRANT OPTIONS Michael can say this: GRANT SELECT ON Customers TO Yuppi Now Yuppi can SELECT on Customers
9 Examples GRANT UPDATE (price) ON Product TO Leah Leah can update, but only Product.price, but not Product.name
10 Examples GRANT REFERENCES (cid) ON Customer TO Bill Customer(cid, name, address, balance) Orders(oid, cid, amount) cid= foreign key Now Bill can INSERT tuples into Orders Bill has INSERT/UPDATE rights to Orders. BUT HE CAN’T INSERT ! (why ?)
11 Views and Security CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers GRANT SELECT ON PublicCustomers TO Fred CREATE VIEW PublicCustomers SELECT Name, Address FROM Customers GRANT SELECT ON PublicCustomers TO Fred David says NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 David owns Customers: Fred is not allowed to see this
12 Views and Security NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0 GRANT SELECT ON BadCreditCustomers TO John CREATE VIEW BadCreditCustomers SELECT * FROM Customers WHERE Balance < 0 GRANT SELECT ON BadCreditCustomers TO John David says David owns Customers: John is allowed to see only <0 balances
13 Views and Security Each customer should see only her/his record CREATE VIEW CustomerMary SELECT * FROM Customers WHERE name = ‘Mary’ GRANT SELECT ON CustomerMary TO Mary CREATE VIEW CustomerMary SELECT * FROM Customers WHERE name = ‘Mary’ GRANT SELECT ON CustomerMary TO Mary Doesn’t scale. Need row-level access control ! NameAddressBalance MaryHuston SueSeattle-240 JoanSeattle AnnPortland-520 David says CREATE VIEW CustomerSue SELECT * FROM Customers WHERE name = ‘Sue’ GRANT SELECT ON CustomerSue TO Sue CREATE VIEW CustomerSue SELECT * FROM Customers WHERE name = ‘Sue’ GRANT SELECT ON CustomerSue TO Sue...
14 Revocation REVOKE [GRANT OPTION FOR] privileges ON object FROM users { RESTRICT | CASCADE } Administrator says: REVOKE SELECT ON Customers FROM David CASCADE John loses SELECT privileges on BadCreditCustomers
15 Revocation Joe: GRANT [….] TO Art … Art: GRANT [….] TO Bob … Bob: GRANT [….] TO Art … Joe: GRANT [….] TO Cal … Cal: GRANT [….] TO Bob … Joe: REVOKE [….] FROM Art CASCADE Same privilege, same object, GRANT OPTION What happens ??
16 Revocation Admin JoeArt CalBob Revoke According to SQL everyone keeps the privilege
17 Summary of SQL Security Limitations: No row level access control Table creator owns the data: that’s unfair ! … or spectacular failure: Only 30% assign privileges to users/roles –And then to protect entire tables, not columns Access control = great success story of the DB community...
18 Summary (cont) Most policies in middleware: slow, error prone: –SAP has 10**4 tables –GTE over 10**5 attributes –A brokerage house has 80,000 applications –A US government entity thinks that it has 350K Today the database is not at the center of the policy administration universe [Rosenthal&Winslett’2004]
19 Security in Statistical DBs Goal: Allow arbitrary aggregate SQL queries Hide confidential data SELECT count(*) FROM Patients WHERE age=42 and sex=‘M’ and diagnostic=‘schizophrenia’ SELECT count(*) FROM Patients WHERE age=42 and sex=‘M’ and diagnostic=‘schizophrenia’ OK SELECT name FROM Patient WHERE age=42 and sex=‘M’ and diagnostic=‘schizophrenia’ SELECT name FROM Patient WHERE age=42 and sex=‘M’ and diagnostic=‘schizophrenia’ Not OK [ Adam&Wortmann’89]
20 Security in Statistical DBs What has been tried: Query restriction –Query-size control, query-set overlap control, query monitoring –None is practical Data perturbation –Most popular: cell combination, cell suppression –Other methods, for continuous attributes: may introduce bias Output perturbation –For continuous attributes only [ Adam&Wortmann’89]
21 Summary on Security in Statistical DB Original goal seems impossible to achieve Cell combination/suppression are popular, but do not allow arbitrary queries
22 Outline Traditional data security Two attacks Data security research today Conclusions
23 Search claims by: SQL Injection Your health insurance company lets you see the claims online: Now search through the claims : Dr. Lee First login: User: Password: fred ******** SELECT…FROM…WHERE doctor=‘Dr. Lee’ and patientID=‘fred’ [Chris Anley, Advanced SQL Injection In SQL]
24 SQL Injection Now try this: Search claims by: Dr. Lee’ OR patientID = ‘suciu’; -- Better: Search claims by: Dr. Lee’ OR 1 = 1; -- …..WHERE doctor=‘Dr. Lee’ OR patientID=‘suciu’; --’ and patientID=‘fred’
25 SQL Injection When you’re done, do this: Search claims by: Dr. Lee’; DROP TABLE Patients; --
26 SQL Injection The DBMS works perfectly. So why is SQL injection possible so often ? Quick answer: –Poor programming: use stored procedures ! Deeper answer: –Move policy implementation from apps to DB
27 Latanya Sweeney’s Finding In Massachusetts, the Group Insurance Commission (GIC) is responsible for purchasing health insurance for state employees GIC has to publish the data: GIC(zip, dob, sex, diagnosis, procedure,...)
28 Latanya Sweeney’s Finding Sweeney paid $20 and bought the voter registration list for Cambridge Massachusetts: GIC(zip, dob, sex, diagnosis, procedure,...) VOTER(name, party,..., zip, dob, sex) GIC(zip, dob, sex, diagnosis, procedure,...) VOTER(name, party,..., zip, dob, sex)
29 Latanya Sweeney’s Finding William Weld (former governor) lives in Cambridge, hence is in VOTER 6 people in VOTER share his dob only 3 of them were man (same sex) Weld was the only one in that zip Sweeney learned Weld’s medical records ! zip, dob, sex
30 Latanya Sweeney’s Finding All systems worked as specified, yet an important data has leaked How do we protect against that ? Some of today’s research in data security address breaches that happen even if all systems work correctly
31 Summary on Attacks SQL injection: A correctness problem: –Security policy implemented poorly in the application Sweeney’s finding: Beyond correctness: –Leakage occurred when all systems work as specified
32 Outline Traditional data security Two attacks Data security research today Conclusions
33 Research Topics in Data Security Rest of the talk: Information Leakage Privacy Fine-grained access control Data encryption Secure shared computation
34 FirstLastAgeRace HarryStone34Afr-Am JohnReyser36Cauc BeatriceStone47Afr-am JohnRamos22Hisp FirstLastAgeRace *Stone30-50Afr-Am JohnR*20-40* *Stone30-50Afr-am JohnR*20-40* Information Leakage: k-Anonymity Definition: each tuple is equal to at least k-1 others Anonymizing: through suppression and generalization Hard: NP-complete for supression only Approximations exists [Samarati&Sweeney’98, Meyerson&Williams’04]
35 Information Leakage: Query-view Security Secret QueryView(s)Disclosure ? S(name)V(name,phone) S(name,phone) V1(name,dept) V2(dept,phone) S(name)V(dept) S(name) where dept=‘HR’ V(name) where dept=‘RD’ TABLE Employee(name, dept, phone) Have data: total big tiny none [Miklau&S’04, Miklau&Dalvi&S’05,Yang&Li’04]
36 Summary on Information Disclosure The theoretical research: –Exciting new connections between databases and information theory, probability theory, cryptography The applications: –many years away [Abadi&Warinschi’05]
37 Privacy “Is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others” More complex than confidentiality [Agrawal’03]
38 Privacy Involves: Data Owner Requester Purpose Consent Example: Alice gives her to a web service Privacy policy: P3P
39 Hippocratic Databases DB support for implementing privacy policies. Purpose specification Consent Limited use Limited retention … [Agrawal’03, LeFevrey’04] Privacy policy: P3P Hippocratic DB Protection against: Sloppy organizations Malicious organizations
40 Privacy for Paranoids Idea: rely on trusted agents Agent foreign keys ? [Aggarwal’04] Protection against: Sloppy organizations Malicious attackers
41 Summary on Privacy Major concern in industry –Legislation –Consumer demand Challenge: –How to enforce an organization’s stated policies
42 Fine-grained Access Control Control access at the tuple level. Policy specification languages Implementation
43 Policy Specification Language CREATE AUTHORIZATION VIEW PatientsForDoctors AS SELECT Patient.* FROM Patient, Doctor WHERE Patient.doctorID = Doctor.ID and Doctor.login = %currentUser CREATE AUTHORIZATION VIEW PatientsForDoctors AS SELECT Patient.* FROM Patient, Doctor WHERE Patient.doctorID = Doctor.ID and Doctor.login = %currentUser Context parameters No standard, but usually based on parameterized views.
44 Implementation SELECT Patient.name, Patient.age FROM Patient WHERE Patient.disease = ‘flu’ SELECT Patient.name, Patient.age FROM Patient WHERE Patient.disease = ‘flu’ SELECT Patient.name, Patient.age FROM Patient, Doctor WHERE Patient.disease = ‘flu’ and Patient.doctorID = Doctor.ID and Patient.login = %currentUser SELECT Patient.name, Patient.age FROM Patient, Doctor WHERE Patient.disease = ‘flu’ and Patient.doctorID = Doctor.ID and Patient.login = %currentUser e.g. Oracle
45 Two Semantics The Truman Model = filter semantics –transform reality –ACCEPT all queries –REWRITE queries –Sometimes misleading results The non-Truman model = deny semantics –reject queries –ACCEPT or REJECT queries –Execute query UNCHANGED –May define multiple security views for a user [Rizvi’04] SELECT count(*) FROM Patients WHERE disease=‘flu’
46 Summary of Fine Grained Access Control Trend in industry: label-based security Killer app: application hosting –Independent franchises share a single table at headquarters (e.g., Holiday Inn) –Application runs under requester’s label, cannot see other labels –Headquarters runs Read queries over them Oracle’s Virtual Private Database [Rosenthal&Winslett’2004]
47 Data Encryption for Publishing Users and their keys: Complex Policies: All authorized users: K user Patient: K pat Doctor: K dr Nurse: K nu Administrator : K admin All authorized users: K user Patient: K pat Doctor: K dr Nurse: K nu Administrator : K admin What is the encryption granularity ? Doctor researchers may access trials Nurses may access diagnostic Etc… Doctor researchers may access trials Nurses may access diagnostic Etc… Scientist wants to publish medical research data on the Web
48 Data Encryption for Publishing An XML tree protection: JoeDoe 28 Seattle flu K user K pat (K nu K adm )K nu K dr K dr K pat K master TylenolCandy [Miklau&S.’03] Doctor: K user, K dr Nurse: K user, K nu Nurse+admin: K user, K nu, K adm
49 Summary on Data Encryption Industry: –Supported by all vendors: Oracle, DB2, SQL-Server –Efficiency issues still largely unresolved Research: –Hard theoretical security analysis [Abadi&Warinschi’05]
50 Secure Shared Processing Alice has a database DB A Bob has a database DB B How can they compute Q(DB A, DB B ), without revealing their data ? Long history in cryptography Some database queries are easier than general case
51 Secure Shared Processing [Agrawal’03] Alice Bob a b c dc d e h(a) h(b) h(c) h(d)h(c) h(d) h(e) Compute one-way hash Exchange h(c) h(d) h(e)h(a) h(b) h(c) h(d) What’s wrong ? Task: find intersection without revealing the rest
52 Secure Shared Processing AliceBob a b c d c d e E B (c) E B (d) E B (e)E A (a) E A (b) E A (c) E A (d) commutative encryption: h(x) = E A (E B (x)) = E B (E A (x)) E A (a) E A (b) E A (c) E A (d) E B (c) E B (d) E B (e) EAEA EBEB h(c) h(d) h(e)h(a) h(b) h(c) h(d) EAEA EBEB h(c) h(d) h(e) [Agrawal’03]
53 Summary on Secure Shared Processing Secure intersection, joins, data mining But are there other examples ?
54 Outline Traditional data security Two attacks Data security research today Conclusions
55 Conclusions Traditional data security confined to one server –Security in SQL –Security in statistical databases Attacks possible due to: –Poor implementation of security policies: SQL injection –Unintended information leakage in published data
56 Conclusions State of the industry: –Data security policies: scattered throughout applications –Database no longer center of the security universe –Needed: automatic means to translate complex policies into physical implementations State of research: data security in global data sharing –Information leakage, privacy, secure computations, etc. –Database research community has an increased appetite for cryptographic techniques