Demo, May 2005
Privacy Preserving Database Application Testing
Xintao Wu, Yongge Wang, Yuliang Zheng, UNC Charlotte
Overview
Milestones
- Initial investigation from May 2002 to Dec 2002
- Officially started in Sept 2003, supported by NSF CCR-0310974 ($200K, Sept 2003 – August 2005)
- Prototype system finished in April 2005; developed in C++ with Oracle, 22K lines of source code
- Demos at several banks, May 2005 …
Personnel
- Faculty: Xintao Wu, Yongge Wang, Yuliang Zheng
- Current graduate students: Songtao Guo, Ying Wu, Chintan Sanghvi, Guodong Jiao
- Previous graduate students: Jing Jin, Amol Kedar
- Several senior undergraduate students
More info
- http://www.cs.uncc.edu/~xwu/privacy
- xwu@uncc.edu
Motivation
- To generate synthetic data for database application testing, especially performance testing.
- Many applications involve large-scale databases containing sensitive information.
- Complete testing is essential for database applications to function correctly and to deliver acceptable performance.
Our Approach
- Generate synthetic databases based on a-priori knowledge about the current production databases.
- The needed a-priori knowledge is generally available from the ER model, DDL, and data dictionary: schema, data integrity rules, and basic statistical information.
- Detailed statistical information can be extracted if the original data, or samples from the production database, are available.
- The amount of generated data can match the production volume or be set to any size.
- Better controllability, observability, and privacy.
Three Characteristics of Synthetic Data
- Valid
  - The synthetic data must satisfy all the same constraints and business rules as the live data.
  - Necessary for functional testing.
- Privacy preserving
  - No disclosure of any confidential information that needs to be protected.
- Resembling real data
  - The synthetic data need statistical distributions or patterns similar to those of the live data.
  - Necessary for performance testing, since the statistical nature of the data determines query performance.
  - We will show that if the data distributions are not similar, the execution time of the same workload may be totally different.
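The validity and resemblance requirements can be checked mechanically. The sketch below is a minimal illustration, not the tool's implementation: the rules in `RULES` (a gender domain and a non-negative balance) are hypothetical stand-ins for constraints that would be read from the DDL and data dictionary, and total variation distance is one assumed choice of similarity measure for a categorical column.

```python
from collections import Counter

# Hypothetical business rules; real ones would come from the DDL / data dictionary.
RULES = {
    "gender in domain": lambda row: row["gender"] in {"F", "M"},
    "balance non-negative": lambda row: row["balance"] >= 0,
}


def violations(rows):
    """Return (row index, rule name) pairs for every rule a row breaks."""
    return [
        (i, name)
        for i, row in enumerate(rows)
        for name, rule in RULES.items()
        if not rule(row)
    ]


def total_variation(real, synthetic):
    """Total variation distance between two categorical columns.

    0.0 means identical relative frequencies; 1.0 means disjoint categories.
    """
    p, q = Counter(real), Counter(synthetic)
    n_p, n_q = len(real), len(synthetic)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))
```

For example, `total_variation(["A", "A", "B", "B"], ["A", "A", "A", "B"])` gives 0.25, while a synthetic row with gender "X" and a negative balance is reported once per broken rule.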
Architecture
[Figure: system architecture. Components shown: ER, Data, DDL, Catalog; RNRS; Schema & Domain; Filter; Schema’ and Domain’; Disclosure Assessment; Performance Assessment; Data Generator; General Location Model; Synthetic database.]
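The architecture includes a General Location Model. As a rough sketch of what sampling from such a model involves, assuming the standard formulation (a categorical cell c drawn with probability π_c, then the continuous attributes drawn from a multivariate normal with a cell-specific mean μ_c and a covariance Σ shared by all cells), the function below generates records. The name `sample_glm` and its parameters are hypothetical; in practice the probabilities, means, and covariance would be estimated from the extracted production statistics.

```python
import numpy as np


def sample_glm(n, cells, probs, means, cov, seed=0):
    """Draw n records from a general location model (hypothetical helper).

    cells: labels of the categorical cells, e.g. ("F", "Gold")
    probs: cell probabilities pi_c (must sum to 1)
    means: one mean vector mu_c of the continuous attributes per cell
    cov:   covariance matrix Sigma shared by all cells
    """
    rng = np.random.default_rng(seed)
    # Categorical part: pick a cell index for each record.
    idx = rng.choice(len(cells), size=n, p=probs)
    means = np.asarray(means, dtype=float)
    # Continuous part: X | cell c ~ N(mu_c, Sigma), built as mu_c + N(0, Sigma) noise.
    noise = rng.multivariate_normal(np.zeros(means.shape[1]), cov, size=n)
    continuous = means[idx] + noise
    return [cells[i] for i in idx], continuous
```

For instance, `sample_glm(1000, [("F",), ("M",)], [0.3, 0.7], [[0.0], [10.0]], [[1.0]])` returns 1000 cell labels, roughly 30% of them ("F",), together with a 1000 × 1 array of continuous values centred near 0 for "F" records and near 10 for "M" records.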
Building a Project
Data Dictionary Information
Statistical Information Extraction: Basic
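The slide does not list which statistics the basic extraction step computes. Assuming typical column profiles (counts, NULLs, range, mean, standard deviation, distinct values, and category frequencies), a sketch over in-memory column values might look like this. `numeric_profile` and `categorical_profile` are illustrative names only; a real tool would more likely compute these aggregates with SQL inside the database.

```python
import statistics
from collections import Counter


def numeric_profile(values):
    """Basic statistics for one numeric column; None stands for SQL NULL.

    Assumes the column has at least one non-NULL value.
    """
    xs = [v for v in values if v is not None]
    return {
        "count": len(xs),
        "nulls": len(values) - len(xs),
        "min": min(xs),
        "max": max(xs),
        "mean": statistics.fmean(xs),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(xs) if len(xs) > 1 else 0.0,
        "distinct": len(set(xs)),
    }


def categorical_profile(values):
    """Relative frequency of each non-NULL category in one column."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {category: c / total for category, c in counts.items()}
```

For a column `[1, 2, 3, None]`, `numeric_profile` reports 3 values, 1 NULL, range 1 to 3, mean 2.0, and standard deviation 1.0.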
Statistical Information Extraction: Advanced
Generating Meta & Data File
Generating Confidential File
Disclosure Analysis: Categorical
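The slide does not describe the categorical disclosure test itself. One common heuristic from statistical disclosure control, shown here only as an assumed illustration and not as the tool's method, flags combinations of categorical attributes that occur fewer than k times, since a rare combination can single out an individual. The attribute names and the default threshold k = 3 are hypothetical.

```python
from collections import Counter


def risky_cells(rows, attrs, k=3):
    """Return the combinations of `attrs` values that occur fewer than k times.

    Small cells in a cross-tabulation of quasi-identifying attributes are a
    classic disclosure risk, because they may identify a single person.
    """
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return {cell: c for cell, c in counts.items() if c < k}
```

With three rows for ("28223", "F") and one row for ("28262", "M"), `risky_cells(rows, ["zip", "gender"])` flags only the single ("28262", "M") record.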
Numerical Disclosure: Basic, Batch Mode
Numerical Disclosure: Basic, Single Mode
Creating Final Categorical File
Creating Final Rule File (GLM Format)
Generating Data