Published by Simon Hollingshed. Modified over 10 years ago.
1
A Data Masking Technique for Data Warehouses
Ricardo Jorge Santos & Marco Vieira, CISUC – DEI – FCTUC, University of Coimbra - Portugal
Jorge Bernardino, CISUC – DEIS – ISEC, Polytechnic Institute of Coimbra - Portugal
INTERNATIONAL DATABASE ENGINEERING AND APPLICATIONS SYMPOSIUM
ISEL, Lisbon – September/2011
2
Agenda
- Background
- Motivation
- MOBAT: A MOD Based Data Masking Technique
- Optimization Features
- Experimental Results
- Conclusions and Future Work
Ricardo J. Santos – A Data Masking Technique for Data Warehouses – IDEAS 2011 – ISEL, Lisbon – September/2011
3
Security Concerns in Data Warehousing
- A Data Warehouse (DW) is a critical asset for many enterprises
- It stores all relevant historical and current business information needed for supporting decision making (sensitive data)
- DWs are main targets for stealing or compromising sensitive data
- Attack rate and complexity have increased in the recent past
4
Data Security Domains
- Data Confidentiality: Only the right users should access the right data
- Data Integrity: Data should always be correct, authentic and consistent
- Data Availability: Users should always be able to access data whenever needed
5
Data Privacy Issues in Today's DWs (Our Focus)
- Masking solutions are not considered an acceptable solution
- Encryption techniques introduce too much overhead in:
  - Storage Space
  - Data Loading Time
  - Query Response Time
6
Data Privacy Issues in Today's DWs (Our Focus)
- Important feature: Facts in DWs are mainly numerical-based columns!
7
MOBAT – MOd BAsed data masking Technique for DWs
MOBAT System Architecture
8
MOBAT – MOd BAsed data masking Technique for DWs
Suppose table T has a set of N numerical columns $C_i = \{C_1, C_2, C_3, \ldots, C_N\}$ to mask and a total set of M rows $R_j = \{R_1, R_2, R_3, \ldots, R_M\}$.
Each value to mask in the table is identified as a pair $(R_j, C_i)$, where $R_j$ and $C_i$ respectively represent the row and column to which the value refers.
Each new masked value, written $(R_j, C_i)'$ here to distinguish it from the original value, is obtained by applying the following formula (1) for row j and column i of table T:
\[ (R_j, C_i)' = (R_j, C_i) - ((K_{3,j} \bmod K_1) \bmod K_{2,i}) + K_{2,i} \tag{1} \]
The inverse formula (2) for retrieving the original value is:
\[ (R_j, C_i) = (R_j, C_i)' + ((K_{3,j} \bmod K_1) \bmod K_{2,i}) - K_{2,i} \tag{2} \]
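As a quick check that formula (2) undoes formula (1), the derivation below uses shorthand that is assumed here and does not appear on the slides: $v$ for the original value, $v'$ for the masked value, and $m_{j,i}$ for the nested modulo term. It assumes non-negative $K_{3,j}$ and positive $K_1$ and $K_{2,i}$.

```latex
\[
\begin{aligned}
m_{j,i} &= (K_{3,j} \bmod K_1) \bmod K_{2,i}, \qquad 0 \le m_{j,i} < K_{2,i} \\
v'      &= v - m_{j,i} + K_{2,i} \\
v' + m_{j,i} - K_{2,i} &= (v - m_{j,i} + K_{2,i}) + m_{j,i} - K_{2,i} = v
\end{aligned}
\]
```

Under those assumptions the offset $K_{2,i} - m_{j,i}$ always lies between 1 and $K_{2,i}$, so a masked value exceeds its original by at most $K_{2,i}$.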
9
MOBAT – Example Dataset
Supposing K_1 = 7432, K_{2,1} = 34 and K_{2,2} = 17252
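The example table from this slide is not preserved in the transcript, so the following is only a minimal sketch of the masking and unmasking arithmetic using the keys given above. The row values and the per-row K3 keys are hypothetical, chosen purely for illustration, and the function names `mask` and `unmask` are assumptions.

```python
# Keys taken from the slide.
K1 = 7432
K2 = {1: 34, 2: 17252}  # K_{2,i} for columns i = 1 and i = 2


def mask(value, k3, k1, k2):
    """Masking: value - ((K3 MOD K1) MOD K2) + K2."""
    return value - ((k3 % k1) % k2) + k2


def unmask(masked, k3, k1, k2):
    """Inverse: masked + ((K3 MOD K1) MOD K2) - K2."""
    return masked + ((k3 % k1) % k2) - k2


# Hypothetical rows: (K3 key of the row, value in column 1, value in column 2).
rows = [
    (100000, 20, 1500),
    (250001, 7, 32000),
]

masked_rows = [
    (k3, mask(c1, k3, K1, K2[1]), mask(c2, k3, K1, K2[2]))
    for k3, c1, c2 in rows
]
restored_rows = [
    (k3, unmask(m1, k3, K1, K2[1]), unmask(m2, k3, K1, K2[2]))
    for k3, m1, m2 in masked_rows
]

print(masked_rows)    # [(100000, 36, 15368), (250001, 22, 44507)]
print(restored_rows)  # [(100000, 20, 1500), (250001, 7, 32000)]
```

Because the offset depends on the row key K3, equal original values in different rows will generally be stored as different masked values.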
10
MOBAT – Example Dataset
Supposing K_1 = 9264, K_{2,1} = 12 and K_{2,2} = 78254
11
MOBAT – Querying
Using the TPC-H benchmark with four numerical fact columns (i = 1, …, 4: L_Quantity, L_ExtendedPrice, L_Tax and L_Discount) masked by MOBAT.
A new column L_KeyK3 in the LineItem table stores the K_{3,j} key of each row j.
Keys: K_1 = 9342, K_{2,L_Quantity} = 12, K_{2,L_ExtendedPrice} = 51234, K_{2,L_Tax} = 6, K_{2,L_Discount} = 4
Original query:
    SELECT SUM(L_ExtendedPrice * L_Discount) AS Total_Revenue
    FROM LineItem
    WHERE L_ShipDate>=TO_DATE('1994-01-01','YYYY-MM-DD')
      AND L_ShipDate<TO_DATE('1995-01-01','YYYY-MM-DD')
      AND L_Discount BETWEEN 0.05 AND 0.07
      AND L_Quantity<24
Rewritten query, applying the inverse formula (2) to each masked column:
    SELECT SUM((L_ExtendedPrice+MOD(MOD(L_KeyK3,9342),51234)-51234) * (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4)) AS Total_Revenue
    FROM LineItem
    WHERE L_ShipDate>=TO_DATE('1994-01-01','YYYY-MM-DD')
      AND L_ShipDate<TO_DATE('1995-01-01','YYYY-MM-DD')
      AND (L_Discount+MOD(MOD(L_KeyK3,9342),4)-4) BETWEEN 0.05 AND 0.07
      AND (L_Quantity+MOD(MOD(L_KeyK3,9342),12)-12)<24
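The rewriting step shown on this slide can be sketched as a textual substitution of each masked column name by its unmasking expression. This is a hedged illustration, not the authors' implementation: the helper names `unmask_sql` and `rewrite` are hypothetical, and the regex-based replacement is deliberately naive (it would also rewrite column names that appear inside string literals or aliases, which a real rewriter would need a SQL parser to avoid).

```python
import re

# Keys and key column name as given on the slide.
K1 = 9342
K2 = {"L_Quantity": 12, "L_ExtendedPrice": 51234, "L_Tax": 6, "L_Discount": 4}
KEY_COLUMN = "L_KeyK3"


def unmask_sql(column):
    """Return the SQL expression that recovers the original value of a masked column."""
    k2 = K2[column]
    return f"({column}+MOD(MOD({KEY_COLUMN},{K1}),{k2})-{k2})"


# One pattern matching any masked column name as a whole identifier.
_MASKED_COLUMN = re.compile(r"\b(" + "|".join(map(re.escape, K2)) + r")\b")


def rewrite(sql):
    """Replace every masked column reference in one pass (naive, text-based)."""
    return _MASKED_COLUMN.sub(lambda m: unmask_sql(m.group(1)), sql)


print(rewrite("SUM(L_ExtendedPrice * L_Discount)"))
print(rewrite("L_Quantity<24"))
```

Doing the substitution in a single pass keeps already inserted expressions from being rewritten a second time; unmasked columns such as L_ShipDate pass through unchanged.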
12
MOBAT – Optimizing Features & Performance
- The inclusion of K_{3,j} requires additional storage space
- K_{3,j} can be created in several ways, each with a different impact on performance:
  - Simply adding a new column to the previously existing fact table
  - Recreating the fact table including K_{3,j} from the start
  - Using a 128-bit integer column already existing in the fact table (typically the primary key column)
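The first option above (adding a key column to an existing fact table) can be sketched end to end with Python's built-in `sqlite3` module. Everything here is an assumption made for illustration: the slides use Oracle 11g rather than SQLite, the three-row table is made up, and drawing K3 at random is only one possible way of generating the keys, since the slides do not say how K3 values are produced. SQLite's `%` operator stands in for Oracle's `MOD`.

```python
import random
import sqlite3

K1, K2_QUANTITY = 9342, 12  # keys from the querying slide

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LineItem (L_OrderKey INTEGER PRIMARY KEY, L_Quantity INTEGER)")
original = [(1, 17), (2, 36), (3, 8)]  # hypothetical rows
conn.executemany("INSERT INTO LineItem VALUES (?, ?)", original)

# Option 1: add the key column to the existing fact table and fill it per row.
conn.execute("ALTER TABLE LineItem ADD COLUMN L_KeyK3 INTEGER")
rng = random.Random(2011)  # fixed seed only to make the sketch repeatable
conn.executemany(
    "UPDATE LineItem SET L_KeyK3 = ? WHERE L_OrderKey = ?",
    [(rng.randrange(1, 2**31), order_key) for order_key, _ in original],
)

# Mask the stored values in place with the masking formula.
conn.execute(
    "UPDATE LineItem SET L_Quantity = L_Quantity - ((L_KeyK3 % ?) % ?) + ?",
    (K1, K2_QUANTITY, K2_QUANTITY),
)
stored = conn.execute(
    "SELECT L_OrderKey, L_Quantity FROM LineItem ORDER BY L_OrderKey"
).fetchall()

# Read back through the inverse formula, as a rewritten query would.
recovered = conn.execute(
    "SELECT L_OrderKey, L_Quantity + ((L_KeyK3 % ?) % ?) - ? "
    "FROM LineItem ORDER BY L_OrderKey",
    (K1, K2_QUANTITY, K2_QUANTITY),
).fetchall()

print(recovered == original)  # True
```

The third option would skip the `ALTER TABLE` step and use an existing integer column in place of `L_KeyK3`, trading the extra storage for keys that are not independent of the data.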
13
Experimental Evaluation
- 2.8GHz CPU, 2GB RAM (512MB for Oracle SGA), 1.5TB SATA HD
- Oracle 11g DBMS
- One standard benchmark and one real-world DW:
  - TPC-H Decision Support Benchmark with 1GB and 10GB scale
  - Real-world Sales DW (2GB storage size)
14
Experimental Evaluation
15
Experimental Evaluation
16
Experimental Evaluation
17
Conclusions
- Our technique decreases data storage space and processing overheads, while still providing a significant level of security
- Transparent method with minimal network bandwidth consumption overheads, due to only rewriting queries
- Extremely easy and simple to implement in any DBMS / DW, with low costs
- Querying the database directly will produce only realistic results (stored data is masked at all times)
18
Future Work
- Developing the technique for also masking alphanumeric values
- Assessing its security strength in comparison with other solutions
- Developing the technique for increasing its security strength
- Using higher-sized keys
- Enabling data integrity checks
- Implementing false data injection
19
THANK YOU!
Questions and Comments?
Ricardo Jorge Santos
lionsoftware.ricardo@gmail.com