Predicate-Tree based Pretty Good Protection of Data

William Perrizo, Arjun G. Roy
Department of Computer Science, North Dakota State University, Fargo, ND 58102, USA.

Introduction

* The growth of the Internet has led to the generation of vast amounts of data.
* Security of that data is a prime concern.
* Vertical data processing has recently been in focus because it performs better than horizontal data processing in data mining applications.
* Predicate Trees (P-Trees), patented by NDSU, are an effective data mining technology with uses in spatial association rule mining, text clustering, etc.
* We propose a Predicate Tree based solution for data security.

Predicate Trees

Predicate Trees, or pTrees, are:
* Compressed
* Data-mining-ready
* Vertical data structures

Predicate Trees - Demonstration

Vertical Slices

R[A1]   Bits
2761    010111110001
3760    011111110000
2751    010111101001
2757    010111101111
5214    101010001100
2215    010010001101
7014    111000001100
7014    111000001100

Predicate Tree - Operation

* The most frequently used operations on P-Trees are AND, OR and NOT.
* Operations are carried out level by level, starting from the root node.
* We are able to compute the following important operations vertically:
    * Sum / Mean
    * Median / Rank-k
    * Square Sum / Variance
    * Maximum / Minimum
    * Top-k values
    * Mode
    * Addition, subtraction and multiplication of a P-TreeSet by a constant value
    * Comparison of P-TreeSets

PGP-D: Pretty Good Privacy of Data

* A mechanism to scramble P-Tree information without compromising speed.
* Scrambled data reveals nothing about the raw data except to the person issuing the data mining request.

PGP-D Operation

* To retrieve P-Tree information, we require:
    * ordering: the mapping of bit position to table row
    * predicate: the table column id and bit slice or bit map
    * location
* The key of the data is an array of two-tuples, each storing the location and the pad.
* A typical key could be [{5, 54}, {7, 539}, {87, 3}, {209, 126}, {25, 896}, {888, 23}, …]
* We make all P-Trees the same length and pad them at the front to hide the start position.
* We also scramble the locations of the P-Trees.
* For basic P-Trees, the key K reveals the offset and the pre-pad.
* In the example above, the first P-Tree is at offset 5, i.e. it has been shuffled by 5 P-Tree slots, and the real information starts after 54 bits.

Discussion

* Since P-Trees are data-mining-ready data structures, we do not favor expensive encryption and decryption operations.
* The more P-Trees there are, the better the protection.
* A database of 5000 tables with 50 columns each, with each column represented by 32 bits, would have 8 million P-Trees (5000 × 50 × 32).
* In a distributed database scenario, it would make sense to fully replicate the P-Trees to allow local retrievals.
* A case could arise where a hacker extracts the first bit of every P-Tree and shuffles those bits until something meaningful appears or starts to appear.
* To get around this possibility, we store the entire database as a "Big Bit String" and have it as part of the key.

References and Acknowledgement

* Ding Q., Ding Q., Perrizo W.: PARM - An Efficient Algorithm to Mine Association Rules from Spatial Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B 38(6), 1513-1524 (2008)
* Rahal I., Perrizo W.: An optimized approach for KNN text categorization using P-trees. ACM Symposium on Applied Computing, 613-617 (2004)
* Perrizo W.: Predicate Count Tree Technology. Technical Report NDSU-CSOR-TR-01-1 (2001)
* Wang Y., Lu T., Perrizo W.: A Novel Combinatorial Score for Feature Selection with P-Tree in DNA Microarray Data Analysis. 19th International Conference on Software Engineering and Data Engineering, 295-300 (2010)
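
Illustrative sketch

The vertical slicing shown in the demonstration and the pad-and-shuffle key described under PGP-D Operation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It works on uncompressed bit slices rather than compressed P-Trees, reads the R[A1] values as octal digits (which matches the bit strings in the demonstration), and fills the tail of each slot with random bits to equalize lengths. The function names vertical_slices, vertical_sum, protect and retrieve are hypothetical.

```python
import random


def vertical_slices(values, width):
    """Split a column into `width` bit slices, most significant bit first.

    Slice j holds bit j of every row, so each slice is as long as the table.
    """
    return [[(v >> (width - 1 - j)) & 1 for v in values] for j in range(width)]


def vertical_sum(slices):
    """Column sum computed vertically: count the 1-bits per slice, then weight."""
    width = len(slices)
    return sum(sum(bits) << (width - 1 - j) for j, bits in enumerate(slices))


def protect(slices, max_pad, rng):
    """Pad each slice at the front and scramble the slot positions.

    Returns (store, key), where key[i] = (location, pad) for slice i.
    Assumption: every slot is filled to the same length with random tail bits.
    """
    rows = len(slices[0])
    slot_length = rows + max_pad
    locations = list(range(len(slices)))
    rng.shuffle(locations)  # scramble where each slice is stored
    store = [None] * len(slices)
    key = []
    for i, bits in enumerate(slices):
        pad = rng.randint(0, max_pad)  # hides the start position
        front = [rng.randint(0, 1) for _ in range(pad)]
        tail = [rng.randint(0, 1) for _ in range(slot_length - pad - rows)]
        store[locations[i]] = front + bits + tail
        key.append((locations[i], pad))
    return store, key


def retrieve(store, key, i, rows):
    """Recover slice i using its (location, pad) entry from the key."""
    location, pad = key[i]
    return store[location][pad:pad + rows]


# Column R[A1] from the demonstration, read as octal (an assumption).
column = [0o2761, 0o3760, 0o2751, 0o2757, 0o5214, 0o2215, 0o7014, 0o7014]
slices = vertical_slices(column, 12)
store, key = protect(slices, max_pad=16, rng=random.Random(2008))

# Round trip: with the key, every slice comes back unchanged.
recovered = [retrieve(store, key, i, len(column)) for i in range(len(slices))]
print(recovered == slices, vertical_sum(recovered) == sum(column))
```

Without the key, a reader of store sees equal-length slots in scrambled order with unknown start positions. With the key, retrieval is a slice lookup, so no decryption step sits between the stored data and the vertical operations.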