Sovereign Information Sharing, Searching and Mining Rakesh Agrawal IBM Almaden Research Center.

Presentation transcript:

Sovereign Information Sharing, Searching and Mining Rakesh Agrawal IBM Almaden Research Center

Thesis
Organizational boundaries are blurring in the emerging networked economy
– Compete and co-operate simultaneously
– International value chain
Need to rethink information sharing, searching, and mining in the brave new world of virtual organizations

Sovereign Information Sharing
Separate databases due to statutory, competitive, or security reasons.
– Selective, minimal sharing on a need-to-know basis.
Example: Among those who took a particular drug, how many had an adverse reaction and have DNA that contains a specific sequence?
– Researchers must not learn anything beyond counts.
Minimal Necessary Sharing
– R = {a, u, v, x}, S = {b, u, v, y}, R ∩ S = {u, v}
– R must not know that S has b & y
– S must not know that R has a & x
– Count(R ∩ S): R & S do not learn anything except that the result is 2.
Commutative Encryption: E1(E2(T)) = E2(E1(T))
SIGMOD 00
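
The commutative-encryption property can be turned into an intersection-count protocol. The sketch below is one illustrative way to do it, not the implementation from the talk: it assumes modular exponentiation as the commutative cipher, a demo-sized prime `P`, and hypothetical helper names (`hash_to_group`, `make_key`, `intersection_size`). Both parties are simulated in a single process.

```python
import hashlib
import math
import secrets

# Assumption: a demo-sized prime (2**127 - 1 is prime). A real deployment
# would need a much larger, carefully chosen modulus.
P = 2**127 - 1


def hash_to_group(value: str) -> int:
    """Map a value to an integer in [1, P-1] via SHA-256."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def make_key() -> int:
    """Pick a secret exponent coprime to P-1 so encryption is a bijection."""
    while True:
        key = secrets.randbelow(P - 3) + 2  # key in [2, P-2]
        if math.gcd(key, P - 1) == 1:
            return key


def encrypt(x: int, key: int) -> int:
    """E_key(x) = x^key mod P; commutative because (x^a)^b == (x^b)^a mod P."""
    return pow(x, key, P)


def intersection_size(r_values, s_values) -> int:
    key_r, key_s = make_key(), make_key()  # each key stays with its owner
    # Each side encrypts its own hashed values; sorting hides the original order.
    y_r = sorted(encrypt(hash_to_group(v), key_r) for v in set(r_values))
    y_s = sorted(encrypt(hash_to_group(v), key_s) for v in set(s_values))
    # Each side encrypts the other side's ciphertexts with its own key.
    z_r = {encrypt(y, key_s) for y in y_r}  # E_s(E_r(h(v))) for v in R
    z_s = {encrypt(y, key_r) for y in y_s}  # E_r(E_s(h(v))) for v in S
    # Doubly encrypted values coincide exactly for the common elements.
    return len(z_r & z_s)


print(intersection_size(["a", "u", "v", "x"], ["b", "u", "v", "y"]))  # prints 2
```

Because only doubly encrypted values are compared, neither side sees the other's plaintext values; what is revealed in this sketch is the set sizes and the count.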

Privacy Preserving Data Mining
[Figure: original records (50 | 40K | ..., 30 | 70K | ...) pass through a Randomizer and become randomized records (65 | 20K | ..., 25 | 60K | ...). The distributions of Age and Salary are reconstructed from the randomized records and passed to the Data Mining Algorithms, which produce the Data Mining Model. Labels: Alice’s age, Alice’s salary, Bob’s age.]
Insight: Preserve privacy at the individual level, while still building accurate data mining models at the aggregate level.
– Add random noise to individual values to protect privacy.
– EM algorithm to estimate the original distribution of values given the randomized values + randomization function.
– Algorithms for building classification models and discovering association rules on top of privacy-preserved data with only a small loss of accuracy.
SIGMOD 00
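
The randomize-then-reconstruct idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm as published: it assumes Gaussian noise with a known standard deviation, ages discretized into six bins, and an EM-style iterative Bayesian update over those bins. The function names, bin midpoints, and the synthetic ages are all hypothetical.

```python
import math
import random


def gaussian_pdf(x: float, sigma: float) -> float:
    """Density of the (publicly known) noise distribution."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def randomize(values, sigma, rng):
    """Each individual adds independent random noise before disclosing a value."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def reconstruct(randomized, midpoints, sigma, iterations=50):
    """Estimate the probability of each bin from randomized values only."""
    n = len(randomized)
    estimate = [1.0 / len(midpoints)] * len(midpoints)  # start from uniform
    # Likelihood of observing w if the true value sat at bin midpoint m.
    like = [[gaussian_pdf(w - m, sigma) for m in midpoints] for w in randomized]
    for _ in range(iterations):
        updated = [0.0] * len(midpoints)
        for row in like:
            weights = [l * f for l, f in zip(row, estimate)]
            total = sum(weights)
            for j, weight in enumerate(weights):
                updated[j] += weight / total  # posterior share of bin j
        estimate = [u / n for u in updated]
    return estimate


rng = random.Random(0)
# Synthetic ages (assumption): 70% in [20, 40), 30% in [60, 80).
ages = [rng.uniform(20, 40) for _ in range(700)] + [rng.uniform(60, 80) for _ in range(300)]
midpoints = [25, 35, 45, 55, 65, 75]
noisy_ages = randomize(ages, sigma=15.0, rng=rng)
reconstructed = reconstruct(noisy_ages, midpoints, sigma=15.0)
print([round(p, 3) for p in reconstructed])
```

Only `noisy_ages` and the noise parameters enter `reconstruct`; the individual true ages are never used, which is the point of the aggregate-level approach.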

Finessing Schema Chaos
Use a simple regular expression extractor to get numbers
Do simple data extraction to get hints
– Hint for unit: the word following the number.
– Hint for attribute name: k following numbers.
Use only numbers in the queries
– Treat any attribute name in the query also as a hint
Reflectivity estimates accuracy
WWW 03
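
A simple extractor of this kind might look like the sketch below. It covers only the number and the unit hint (the word directly following the number); the regular expression and the function name `extract_numbers` are assumptions made for illustration.

```python
import re

# A number (integer or decimal), optionally followed by whitespace and a word.
NUMBER_WITH_NEXT_WORD = re.compile(r"(\d+(?:\.\d+)?)(?:\s*([A-Za-z]+))?")


def extract_numbers(text: str):
    """Return (number, unit_hint) pairs; unit_hint is None when no word follows."""
    results = []
    for match in NUMBER_WITH_NEXT_WORD.finditer(text):
        number = float(match.group(1))
        unit_hint = match.group(2)  # the word following the number, if any
        results.append((number, unit_hint))
    return results


print(extract_numbers("Supply voltage 3.3 V, access time 70 ns, 256 pins"))
# prints [(3.3, 'V'), (70.0, 'ns'), (256.0, 'pins')]
```

The hint is kept as a hint only: nothing here checks that "pins" is really a unit, which matches the idea of querying on the numbers and treating surrounding words as soft evidence.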

Privacy Preserving Indexing
A public mapping function that maps a query to a set of providers P that may contain the desired document
P contains false positives
Providers return a document only if the searcher is authorized to access the document
VLDB 03
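
One way to picture the public mapping function is the sketch below. It is an illustrative assumption rather than the published construction: providers are placed in groups, the public index records terms only at group level, and every provider in a matching group is returned, so the index alone does not reveal which provider holds a document. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    # doc_id -> (terms in the document, users authorized to read it)
    documents: dict

    def search(self, term: str, user: str):
        """Return matching documents only if the searcher is authorized."""
        return [
            doc_id
            for doc_id, (terms, allowed) in self.documents.items()
            if term in terms and user in allowed
        ]


def build_public_index(groups):
    """Map each term to all providers of every group that contains the term."""
    index = {}
    for group in groups:
        names = {p.name for p in group}
        group_terms = set()
        for p in group:
            for terms, _allowed in p.documents.values():
                group_terms |= terms
        for term in group_terms:
            index.setdefault(term, set()).update(names)
    return index


a = Provider("A", {"d1": ({"merger"}, {"alice"})})
b = Provider("B", {"d2": ({"budget"}, {"alice"})})
c = Provider("C", {"d3": ({"merger"}, {"bob"})})
index = build_public_index([[a, b], [c]])

# B is listed for "merger" although it holds no matching document.
print(sorted(index["merger"]))  # prints ['A', 'B', 'C']
# Each listed provider applies its own access control to the query.
print({p.name: p.search("merger", "alice") for p in (a, b, c)})
# prints {'A': ['d1'], 'B': [], 'C': []}
```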

Some Interesting Topics
Current integration approaches do not scale
– Information integration per se is not interesting
– Static vs. dynamic plumbing
Incentive compatibility
Auditing interactions