Published by Harry Gallagher; modified over 8 years ago
1
Sovereign Information Sharing, Searching and Mining
Rakesh Agrawal
IBM Almaden Research Center
2
Thesis
Organizational boundaries are blurring in the emerging networked economy
– Compete and co-operate simultaneously
– Int'l value chain
Need to rethink information sharing, searching, and mining in the brave new world of virtual organizations
3
Sovereign Information Sharing
Separate databases due to statutory, competitive, or security reasons.
Selective, minimal sharing on a need-to-know basis.
Example: Among those who took a particular drug, how many had an adverse reaction and have DNA containing a specific sequence?
– Researchers must not learn anything beyond counts.
Commutative Encryption: E1(E2(T)) = E2(E1(T))
[Figure: Minimal Necessary Sharing. R = {a, u, v, x}, S = {b, u, v, y}, R ∩ S = {u, v}. R must not know that S has b & y; S must not know that R has a & x. For Count(R ∩ S), R & S do not learn anything except that the result is 2.]
SIGMOD 00
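The counting idea can be sketched with modular exponentiation, which commutes: (x^k1)^k2 = (x^k2)^k1 mod p. This is a minimal sketch, not the paper's exact protocol: it assumes a Pohlig-Hellman-style cipher, SHA-256 as the hash into the group, and the Mersenne prime 2^127 - 1 as a toy modulus. The helpers `hash_to_group`, `keygen`, `encrypt`, and `intersection_size` are hypothetical names, and both parties are simulated in one process.

```python
import hashlib
import math
import secrets

# Toy public modulus (assumption): the Mersenne prime 2**127 - 1.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hash an item to a nonzero residue mod P (SHA-256 assumed as the hash)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def keygen() -> int:
    """Secret exponent coprime to P - 1, so x -> x**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def encrypt(key: int, x: int) -> int:
    """Commutative cipher: encrypt(k1, encrypt(k2, x)) == encrypt(k2, encrypt(k1, x))."""
    return pow(x, key, P)


def intersection_size(r_items, s_items) -> int:
    """Simulate R and S together; in a real run each key never leaves its owner."""
    k_r, k_s = keygen(), keygen()
    shuffler = secrets.SystemRandom()
    # Each party encrypts its own hashed items, shuffles them, and sends them over.
    r_once = [encrypt(k_r, hash_to_group(v)) for v in r_items]
    s_once = [encrypt(k_s, hash_to_group(v)) for v in s_items]
    shuffler.shuffle(r_once)
    shuffler.shuffle(s_once)
    # Each party re-encrypts what it received with its own key.
    r_twice = {encrypt(k_s, c) for c in r_once}  # computed by S
    s_twice = {encrypt(k_r, c) for c in s_once}  # computed by R
    # Doubly encrypted values coincide exactly when the underlying items coincide
    # (up to hash collisions), so comparing them yields the count without decryption.
    return len(r_twice & s_twice)
```

With the slide's sets, `intersection_size(["a", "u", "v", "x"], ["b", "u", "v", "y"])` returns 2. Shuffling before sending is what keeps either side from linking a match back to a specific item of the other.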
4
Privacy Preserving Data Mining
[Figure: Original records (e.g., 50 | 40K | ..., 30 | 70K | ...) pass through a Randomizer, producing randomized records (e.g., 65 | 20K | ..., 25 | 60K | ...); labels mark Alice's age, Alice's salary, and Bob's age, with Alice's age 30 becoming 65 (30 + 35). The distributions of Age and Salary are reconstructed and fed to Data Mining Algorithms, which produce a Data Mining Model.]
Insight: Preserve privacy at the individual level, while still building accurate data mining models at the aggregate level.
Add random noise to individual values to protect privacy.
EM algorithm to estimate the original distribution of values given the randomized values and the randomization function.
Algorithms for building classification models and discovering association rules on top of privacy-preserved data with only a small loss of accuracy.
SIGMOD 00
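The randomize-then-reconstruct loop can be sketched as follows. This is a minimal sketch under assumptions the slide does not state: noise is uniform on [-alpha, alpha], the original domain is discretized into hypothetical bins represented by their midpoints, and the update is an iterative Bayesian (EM-style) rule that splits each randomized value across bins in proportion to noise density times the current estimate. The function names are illustrative.

```python
import random


def randomize(values, alpha, rng):
    """Add independent Uniform[-alpha, alpha] noise to each individual value."""
    return [v + rng.uniform(-alpha, alpha) for v in values]


def noise_density(offset, alpha):
    """Density of the Uniform[-alpha, alpha] randomization function at `offset`."""
    return 1.0 / (2 * alpha) if -alpha <= offset <= alpha else 0.0


def reconstruct(randomized, bin_midpoints, alpha, iterations=100):
    """Estimate P(original value falls in each bin) from randomized values only.

    Each iteration is an EM-style update: every randomized value w is split across
    bins in proportion to noise_density(w - midpoint) * current estimate.
    """
    k = len(bin_midpoints)
    estimate = [1.0 / k] * k  # start from a uniform prior
    for _ in range(iterations):
        mass = [0.0] * k
        for w in randomized:
            weights = [noise_density(w - m, alpha) * p
                       for m, p in zip(bin_midpoints, estimate)]
            total = sum(weights)
            if total == 0.0:
                continue  # no bin midpoint can explain w; skip it
            for j in range(k):
                mass[j] += weights[j] / total
        norm = sum(mass)
        estimate = [x / norm for x in mass]
    return estimate
```

For example, ages drawn from [20, 30] and [50, 60], randomized with alpha = 20 and reconstructed over midpoints 15, 25, ..., 65, yield an estimate that shifts mass toward the 25 and 55 bins even though no individual value is ever de-noised; only the aggregate distribution is recovered.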
5
Finessing Schema Chaos
Use a simple regular-expression extractor to get numbers.
Do simple data extraction to get hints:
– Hint for unit: the word following the number.
– Hint for attribute name: k following numbers.
Use only numbers in the queries.
– Treat any attribute name in the query as a hint as well.
Reflectivity estimates accuracy.
WWW 03
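The extraction step can be sketched with Python's `re` module. This is a minimal sketch: the regular expression, the `extract` and `score` helpers, and the distance-based ranking with a unit-hint bonus are hypothetical illustrations rather than the paper's method; attribute-name hints and reflectivity are not modeled.

```python
import re

# Hypothetical pattern: a number (optional thousands separators and decimals),
# optionally followed by a word that serves as the unit hint.
NUMBER_WITH_NEXT_WORD = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*([A-Za-z]+)?")


def extract(text):
    """Return (value, unit_hint) pairs; unit_hint is the word following the number."""
    pairs = []
    for m in NUMBER_WITH_NEXT_WORD.finditer(text):
        value = float(m.group(1).replace(",", ""))
        unit = m.group(2).lower() if m.group(2) else None
        pairs.append((value, unit))
    return pairs


def score(query, doc_text):
    """Naive ranking (lower is better): for each query number, the relative distance
    to the closest document number, halved when unit hints match (arbitrary weight)."""
    doc = extract(doc_text)
    if not doc:
        return float("inf")
    total = 0.0
    for qv, qu in extract(query):
        best = float("inf")
        for dv, du in doc:
            d = abs(qv - dv) / max(abs(qv), 1e-9)
            if qu is not None and qu == du:
                d *= 0.5
            best = min(best, d)
        total += best
    return total
```

For instance, `extract("2.4 GHz processor, 512 MB RAM, 1,299 dollars")` gives `[(2.4, "ghz"), (512.0, "mb"), (1299.0, "dollars")]`, and the query `"512 MB"` scores a page listing 512 MB ahead of one listing 128 MB, without any schema mapping.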
6
Privacy Preserving Indexing
A public mapping function maps a query to a set of providers P that may contain the desired document.
– P contains false positives.
Providers return a document only if the searcher is authorized to access it.
VLDB 03
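The index-then-authorize flow can be sketched as follows. This is a minimal sketch under hypothetical choices: providers are grouped in fixed order by `group_size`, a term maps to every provider in any group where some member holds it, and each `Provider` stores a term-to-authorized-users map as its access model. None of these names or the grouping rule come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    name: str
    # Hypothetical access model: term -> users authorized to see the matching document.
    docs: dict = field(default_factory=dict)


def build_index(providers, group_size):
    """Public index: a term maps to every provider in any group where some member has it."""
    index = {}
    for start in range(0, len(providers), group_size):
        group = providers[start:start + group_size]
        terms = set().union(*(p.docs.keys() for p in group))
        for term in terms:
            index.setdefault(term, set()).update(p.name for p in group)
    return index


def search(term, user, index, providers_by_name):
    """Contact every provider in P; each one enforces its own access control."""
    hits = []
    for name in sorted(index.get(term, set())):
        provider = providers_by_name[name]
        if user in provider.docs.get(term, set()):
            hits.append(name)
    return hits
```

In this sketch grouping only ever adds providers to P, so a provider that holds a match is never omitted; the extra providers hide which group member actually has the document, and the searcher only receives documents the provider agrees to release.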
7
Some Interesting Topics
Current integration approaches do not scale
– Information integration per se is not interesting
– Static vs. dynamic plumbing
Incentive compatibility
Auditing interactions