
Sovereign Information Sharing and Mining in a Connected World
R. Agrawal, Intelligent Information Systems Research, IBM Almaden Research Center, San Jose, CA
Joint work with: D. Asonov, P. Baliga, A. Evfimievski, L. Liang, B. Porst, R. Srikant

Outline
– Information sharing today
– The new world
– Some solution approaches
– Observations on privacy-preserving data mining
– Musings about the future

R. Agrawal, A. Evfimievski, R. Srikant. Information Sharing Across Private Databases. SIGMOD 03.
R. Agrawal, D. Asonov, R. Srikant. Enabling Sovereign Information Sharing Using Web Services. SIGMOD 04 (Industrial Track).
R. Agrawal, D. Asonov, P. Baliga, L. Liang, B. Porst, R. Srikant. A Reusable Platform for Building Sovereign Information Sharing Applications. DIVO 04.

Information Sharing Today
[Figure: three architectures (Centralized, Federated, Mediator), each answering a query Q with a result R.]
Assumption: Information in each database can be freely shared.

Need for a new style of information sharing
Compute queries across databases so that no more information than necessary is revealed (without using a trusted third party).
The need is driven by several trends:
– End-to-end integration of information systems across companies (virtual organizations)
– Simultaneously compete and cooperate
– Security: need-to-know information sharing

Security Application
[Figure: the Agency's Suspect List and the Airline's Passenger List.]
The security agency learns which passengers are on its list of suspects, but not the names of other passengers.
The airline does not learn anything.

Epidemiological Research
Validate the hypothesis of a link between an adverse reaction to a drug and a specific DNA sequence.
The researcher should not learn anything beyond four counts:

                     Adverse Reaction    No Adv. Reaction
  Sequence Present          ?                   ?
  Sequence Absent           ?                   ?

[Figure: the Medical Research Inst. combining Drug Reactions data with DNA Sequences held by another party.]

Minimal Necessary Sharing
[Figure: R = {a, u, v, x} and S = {b, u, v, y}, so R ∩ S = {u, v}.]
– R ∩ S: R must not learn that S has b and y; S must not learn that R has a and x.
– Count(R ∩ S): R and S do not learn anything except that the result is 2.

Problem Statement: Minimal Sharing
Given:
– Two parties (honest-but-curious): R (receiver) and S (sender)
– A query Q spanning the tables R and S
– Additional (pre-specified) categories of information I
Compute the answer to Q and return it to R without revealing any additional information to either party, except for the information contained in I.
– For example, in the upcoming intersection protocols I = { |R|, |S| }

A Possible Approach
Secure Multi-Party Computation
– Given two parties with inputs x and y, compute f(x, y) such that the parties learn only f(x, y) and nothing else.
– Can be solved by building a combinatorial circuit and simulating that circuit [Yao86].
Prohibitive cost for database-size problems:
– Intersection of two relations of a million records each would require 144 days (Yao's protocol).

Intersection Protocol
[Figure: R holds secret key a; S holds secret key b.]
Notation: $f_b(S)$ is shorthand for $\{ f_b(s) \mid s \in S \}$.
Commutative encryption: $f_a(f_b(s)) = f_b(f_a(s))$, for example $f(s, b, p) = s^b \bmod p$.

Intersection Protocol (continued)
S sends $f_b(S)$ to R. R applies its own key to obtain $f_a(f_b(S))$, which by the commutative property equals $f_b(f_a(S))$.

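Intersection Protocol (continued)
R sends $f_a(R)$ to S, and S returns each $f_a(r)$ paired with $f_b(f_a(r))$. R now holds $f_b(f_a(R))$ and $f_b(f_a(S))$; since R knows which of its own values $r$ produced each $f_a(r)$, matching the two sets tells R exactly which elements of R are in $R \cap S$.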
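The three protocol slides above can be simulated in a few lines. This is a minimal single-process sketch, assuming the exponentiation cipher $f(x, k, p) = x^k \bmod p$ from the slides; the modulus (the Mersenne prime $2^{127}-1$), the SHA-256 mapping of values into the group, and the helper names `hash_to_group`, `keygen`, `f`, and `intersect` are illustrative assumptions, not parameters or code from the original work. Keys are chosen coprime to $p-1$ so that encryption is a bijection, which is what lets equal double-encrypted values imply equal plaintexts.

```python
import hashlib
import math
import secrets

# Assumption: public prime modulus chosen for illustration only (M127 = 2**127 - 1).
P = 2**127 - 1

def hash_to_group(value: str) -> int:
    """Hypothetical helper: map a value to an integer in [1, P-1]."""
    digest = int.from_bytes(hashlib.sha256(value.encode()).digest(), "big")
    return digest % (P - 1) + 1

def keygen() -> int:
    """Pick a secret exponent coprime to P-1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def f(x: int, key: int) -> int:
    """Commutative encryption: f(x, key) = x^key mod P."""
    return pow(x, key, P)

def intersect(r_values, s_values):
    """Simulate both parties; returns what R learns, i.e. R ∩ S."""
    a, b = keygen(), keygen()  # R holds a, S holds b
    # Step 1: S sends f_b(S) (a real S would also sort/shuffle it);
    # R applies a to get f_a(f_b(S)).
    s_double = {f(f(hash_to_group(s), b), a) for s in s_values}
    # Step 2: R sends f_a(R); S returns each f_a(r) paired with f_b(f_a(r)).
    r_single = {r: f(hash_to_group(r), a) for r in r_values}
    returned = {fa: f(fa, b) for fa in r_single.values()}
    # Step 3: by commutativity, f_b(f_a(h(r))) == f_a(f_b(h(r))) exactly when r is in S.
    return {r for r, fa in r_single.items() if returned[fa] in s_double}
```

With the example from the Minimal Necessary Sharing slide, `intersect({"a", "u", "v", "x"}, {"b", "u", "v", "y"})` yields `{"u", "v"}`, and its size gives Count(R ∩ S) = 2. Note that the set sizes |R| and |S| are visible to both parties, matching I = { |R|, |S| }.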

Related Work
[Naor & Pinkas 99]: Two protocols for the list intersection problem
– Oblivious evaluation of n polynomials of degree n each
– Oblivious evaluation of n² linear polynomials
[Huberman et al 99]: Find people with common preferences, without revealing the preferences
– Intersection protocols are similar
[Clifton et al, 03]: Secure set union and set intersection
– Similar protocols

Implementation: Grid of Data Services
[Figure: a User works with an Application built on the SIS Client; the SIS Client talks to SIS Servers 1 to n, each fronting a Data Provider's DP DB Server and its meta data. The Application Developer works with the SIS Platform and the Client Metadata.]
– SIS Client: constructs web service query requests against multiple data providers and collects responses.
– Client Metadata: mapping information and data provider access information.
– Application: a thin layer on top of the SIS client; invokes the required SIS operations and provides an interface to a SIS user.
– Data provider meta data: includes view information to retrieve data from the data provider database, database access information, and context information.
– SIS Server: provides the necessary functionality on the data provider side to enable sovereign sharing.
– SIS Platform: templates to aid application development.

System Issues
How does the application developer find the necessary data sources and their schemas? (resource discovery mechanism)
– Employ a UDDI registry to store and search:
  – data providers and the operations they support
  – available schemas for each data provider
How does the application developer link the data between different providers? (schema mapping mechanism)
– Data providers publish schemas in their own vocabularies.
– Developers link the schemas.
How can we ensure that only eligible users can carry out the computation? (authentication mechanism)
– Authentication across multiple domains

Implementation Environment
Data resides in DB2 v8.1 database systems, installed on 2.4 GHz / 512 MB RAM Intel workstations connected by a 100 Mbit LAN.
Web services run on top of IBM WebSphere Application Server v5.0 and use the Apache Axis v1.1 SOAP library for messaging.
An IBM private UDDI registry is installed on one of the machines.

Performance: exponentiation time for one number (Intel P3)

  Implementation                       Time (ms)
  Java program                         32
  Java DB2 UDF                         65
  MS Visual C++ (Crypto++ library)

Making Encryption Faster: Software Approaches
The main component of encryption is exponentiation: $\mathrm{enc}(x, k, p) = x^k \bmod p$.
Tried custom implementations of exponentiation that used preprocessing based on:
– a fixed exponent (k)
– a fixed base (x)
The fixed-exponent implementation turned out to be slower than the native Java implementation.
A fixed base is beneficial if the same value is encrypted multiple times with different keys (not useful for intersection, where each value is encrypted once).
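To make the fixed-base idea concrete, here is a minimal sketch of the textbook precomputation technique (not the implementation the slides benchmarked): the powers $x^{2^i} \bmod p$ are computed once, after which each $x^k \bmod p$ for a new key $k$ needs only multiplications and no squarings. The class name `FixedBaseExp` and the table size `max_bits` are hypothetical choices for illustration.

```python
class FixedBaseExp:
    """Fixed-base modular exponentiation (hypothetical helper).

    Precompute x^(2^i) mod p once; each later x^k mod p then costs only
    one multiplication per set bit of k, with no squarings.
    """

    def __init__(self, x: int, p: int, max_bits: int):
        self.p = p
        self.table = []
        cur = x % p
        for _ in range(max_bits):
            self.table.append(cur)   # table[i] = x^(2^i) mod p
            cur = cur * cur % p

    def power(self, k: int) -> int:
        if k.bit_length() > len(self.table):
            raise ValueError("exponent larger than precomputed table")
        result = 1
        i = 0
        while k:
            if k & 1:                # bit i of k is set: multiply in x^(2^i)
                result = result * self.table[i] % self.p
            k >>= 1
            i += 1
        return result
```

The precomputation pays off only when the same base is raised to many different exponents, which is why the slide notes it does not help the intersection protocol, where each value is encrypted only once per key.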

Making Encryption Faster: Hardware Accelerator
Use an SSL card to speed up exponentiation.
Multiple threads (100+) must post exponentiation requests simultaneously to the card API to get the advertised speed-up.
The AEP scheduler distributes exponentiation requests among multiple cards automatically; the speed-up is linear in the number of cards.
Example: AEP SSL card Runner 2000, about $2K.
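The requirement that 100+ threads post requests at once is a standard way to keep a blocking device queue full. The sketch below shows only that client-side pattern, under stated assumptions: `accelerator_modexp` is a hypothetical stand-in for the card API (here it simply calls Python's built-in `pow`), and `encrypt_all` and the thread count are illustrative, not the AEP interface.

```python
from concurrent.futures import ThreadPoolExecutor

def accelerator_modexp(base: int, exp: int, mod: int) -> int:
    """Hypothetical stand-in for a blocking accelerator-card call.

    A real card API would block while the card works; here we just use pow().
    """
    return pow(base, exp, mod)

def encrypt_all(values, key: int, p: int, threads: int = 100):
    """Keep many exponentiation requests in flight at once, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda v: accelerator_modexp(v, key, p), values))
```

With a real card, each worker thread would spend most of its time waiting on the device, so many concurrent requests are needed to use it fully; in pure Python with `pow` the threads give no speed-up, and the sketch only demonstrates the request pattern.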

Execution time: Encryption UDF

  Encryption engine             1,000 rows   5,000 rows   10,000 rows
  CPU (Intel III, 2.0 GHz)      34 s         175 s        320 s
  AEP Runner                                 19 s         37 s

Application Performance
Encryption speed is 20K encryptions per minute using one accelerator card ($2K per card).
Airline application: 150,000 (daily) passengers and 1 million people in the watch list:
– 120 minutes with one accelerator card
– 12 minutes with ten accelerator cards
Epidemiological research: 1 million patient records in the hospital and 10 million records in the Genebank:
– 37 hours with one accelerator card
– 3.7 hours with ten accelerator cards
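As a rough check on the airline figure, count the exponentiations in the intersection protocol: S encrypts each of its values with b and R re-encrypts them with a, while R encrypts each of its values with a and S re-encrypts them with b, for $2(|R| + |S|)$ encryptions in total. Assuming the 20K-per-minute rate dominates and other costs are negligible (our assumption, not stated on the slide):

```latex
\begin{aligned}
N_{\text{enc}} &= 2\,(|R| + |S|) = 2\,(150{,}000 + 1{,}000{,}000) = 2{,}300{,}000 \\
T_{1} &= \frac{2{,}300{,}000}{20{,}000\ \text{per minute}} = 115\ \text{minutes} \approx 120\ \text{minutes} \\
T_{10} &= T_{1} / 10 = 11.5\ \text{minutes} \approx 12\ \text{minutes}
\end{aligned}
```

The same count gives $2(1{,}000{,}000 + 10{,}000{,}000) / 20{,}000 = 1{,}100$ minutes (about 18 hours) for the epidemiological case, roughly half the stated 37 hours, so that estimate evidently assumes more encryptions per record; the slide does not say why.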

Current Work
Use of secure coprocessors to address:
– Richer join operations
– Performance
– Semi-dishonesty
Incentive compatibility and auditing to address maliciousness
[Figure: IBM 4764 cryptographic coprocessor]

Privacy Preserving Data Mining: The Randomization Approach
To hide original values $x_1, x_2, \ldots, x_n$
– from probability distribution X (unknown),
we use $y_1, y_2, \ldots, y_n$
– from probability distribution Y.
Problem: Given
– $x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n$
– the probability distribution of Y,
estimate the probability distribution of X.
Use the estimated distribution of X to build the classification model.
Extended subsequently to mining association rules while preserving the privacy of individual transactions.
R. Agrawal, R. Srikant. Privacy Preserving Data Mining. SIGMOD 00.
A. Evfimievski, R. Srikant, R. Agrawal, J. Gehrke. Privacy Preserving Mining of Association Rules. SIGKDD 02.
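The reconstruction problem above can be sketched with an iterative Bayesian update in the style of the SIGMOD 00 paper: starting from a uniform guess, each round reassigns every observed $w_i = x_i + y_i$ to the candidate values $a$ in proportion to $f_Y(w_i - a)\,\hat f_X(a)$. This is a minimal sketch, assuming uniform noise $Y \sim U[-\alpha, \alpha]$ and a domain discretized into equal-width bins; the function names, bin count, and fixed iteration count are illustrative choices, not the paper's exact procedure or stopping rule.

```python
import random

def randomize(xs, alpha, rng):
    """Each client independently adds noise y ~ Uniform[-alpha, alpha] to its value."""
    return [x + rng.uniform(-alpha, alpha) for x in xs]

def reconstruct(ws, alpha, lo, hi, bins=20, iters=50):
    """Estimate the distribution of X over `bins` intervals of [lo, hi] from w_i = x_i + y_i.

    Update: f_new(a) = (1/n) * sum_i f_Y(w_i - a) f(a) / sum_z f_Y(w_i - z) f(z)
    """
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]

    def f_y(d):
        # density of Uniform[-alpha, alpha]
        return 1.0 / (2 * alpha) if -alpha <= d <= alpha else 0.0

    est = [1.0 / bins] * bins  # start from the uniform distribution
    for _ in range(iters):
        new = [0.0] * bins
        for w in ws:
            weights = [f_y(w - m) * p for m, p in zip(mids, est)]
            total = sum(weights)
            if total == 0:
                continue  # w is inconsistent with every bin; ignore it
            for k in range(bins):
                new[k] += weights[k] / total
        s = sum(new)
        if s == 0:
            break
        est = [v / s for v in new]
    return mids, est
```

For example, hiding a bimodal X (half the values in [1, 3], half in [7, 9]) behind noise with α = 2 and reconstructing over [0, 10] with 10 bins concentrates most of the estimated mass back on the two original clusters, even though no individual $x_i$ is disclosed.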

Distributed Setting
Application scenario: a central server interested in building a data mining model using data obtained from a large number of clients, while preserving their privacy
– Web commerce, e.g., a recommendation service
Desiderata:
– Must not slow down client interaction
– Must scale to a very large number of clients
During the application phase:
– Ship the model to the clients
– Use oblivious computations
Implications:
– Action taken to preserve the privacy of a record must not depend on other records
– Fast, per-transaction perturbation (potential loss in accuracy)

Inter-Enterprise Setting
A party has access to all the records in its database
– Considerable increase in available options
Cryptographic approaches
– Lindell & Pinkas [Crypto 2000]
– Purdue Toolkit [Clifton et al 2003]
Global approaches (e.g., swapping) from statistical disclosure control (SDC)
Model combination and voting
– Potential for leakage from individual models
The tradeoff between generality, performance, accuracy, and potential disclosure is not well understood.

Outlook
Three stages of the network era*
– Brochure stage (informational websites)
– Transaction stage (e-commerce, online banking, etc.)
– E-business on demand (integrate business processes within and with external parties; dynamic virtual organizations)
The on demand era presents research opportunities for discontinuous thinking.
Sovereign information sharing is one such key opportunity, but challenges abound:
– Fast, scalable, and composable protocols
– A new framework for thinking about ownership, privacy, and security (the zero-leakage model does not scale)
* IBM. Living in an On Demand World. October 2002.