Predicate-Tree based Pretty Good Protection of Data, by William Perrizo and Arjun G. Roy

Presentation transcript:

Predicate-Tree based Pretty Good Protection of Data
William Perrizo, Arjun G. Roy
Department of Computer Science, North Dakota State University, Fargo, ND 58102, USA

Introduction
* The growth of the Internet has led to the generation of vast amounts of data.
* Security of that data is a prime concern.
* Vertical data processing has recently been in focus because it performs better than horizontal data processing in data mining applications.
* Predicate Trees (P-Trees), patented by NDSU, are an effective data mining technology used in spatial association rule mining, text clustering, etc.
* We propose a Predicate Tree based solution for data security.

Predicate Trees
Predicate Trees, or pTrees, are:
* Compressed
* Data-mining-ready
* Vertical data structures

Predicate Trees - Demonstration
[Figure: vertical slices of R[A1]]

Predicate Tree - Operation
* The most frequently used operations on P-Trees are AND, OR and NOT.
* Operations are carried out level by level, starting from the root node.
* We are able to compute the following important operations vertically:
  * Sum / Mean
  * Median / Rank-k
  * Square Sum / Variance
  * Maximum / Minimum
  * Top-k values
  * Mode
  * Addition, Subtraction, Multiplication of a P-TreeSet by a constant value
  * Comparison of P-TreeSets

PGP-D
Pretty Good Protection of Data (PGP-D):
* A mechanism to scramble P-Tree information without compromising speed.
* The scrambled data reveals nothing about the raw data, except to the person issuing the data mining request.

PGP-D Operation
To retrieve P-Tree information, we require:
* ordering: the mapping of bit position to table row
* predicate: the table column id and bit slice or bit map
* location
The key to the data is an array of 2-tuples, each storing a location and a pad. A typical key could be [{5, 54}, {7, 539}, {87, 3}, {209, 126}, {25, 896}, {888, 23}, …].
* We make all P-Trees the same length and pad them in the front to hide the start position.
* We also scramble the location of the P-Trees.
* For basic P-Trees, the key K reveals the offset and the pre-pad. In the example above, the first P-Tree is at offset 5, i.e. it has been shuffled by 5 P-Tree slots, and its real information starts after 54 bits.

Discussion
* Since P-Trees are data-mining-ready data structures, we do not favor expensive encryption and decryption operations.
* The more P-Trees there are, the better the protection. For a database of 5000 tables with 50 columns each, and each column represented by 32 bits, we would have 8 million P-Trees (5000 × 50 × 32 = 8,000,000).
* In a distributed database scenario, it would make sense to fully replicate the P-Trees to allow local retrievals.
* A case could arise where a hacker extracts the first bit of every P-Tree and shuffles those bits until something meaningful appears or starts to appear. To get around this possibility, we store the entire database as a "Big Bit String" and have it as a part of the key.

References and Acknowledgement
* Ding Q., Ding Q., Perrizo W.: PARM - An Efficient Algorithm to Mine Association Rules from Spatial Data. IEEE Transactions on Systems, Man, and Cybernetics, Part B 38(6) (2008)
* Rahal I., Perrizo W.: An Optimized Approach for KNN Text Categorization Using P-trees. ACM Symposium on Applied Computing (2004)
* Perrizo W.: Predicate Count Tree Technology. Technical Report NDSU-CSOR-TR (2001)
* Wang Y., Lu T., Perrizo W.: A Novel Combinatorial Score for Feature Selection with P-Tree in DNA Microarray Data Analysis. 19th International Conference on Software Engineering and Data Engineering (2010)
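
The PGP-D retrieval step described in the transcript (equal-length P-Trees, padded in the front, stored in scrambled locations, and recovered through a key of {location, pad} pairs) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the poster: plain uncompressed bit lists stand in for P-Trees, "location" is read as the index of the storage slot, the padding and trailing filler are random bits, and the function names (bit_slices, scramble, retrieve, vertical_sum) are hypothetical.

```python
import random


def bit_slices(values, width):
    """Vertical decomposition: slice j holds bit j (most significant first) of every row."""
    return [[(v >> (width - 1 - j)) & 1 for v in values] for j in range(width)]


def scramble(slices, slot_len, rng):
    """Pad every slice in the front, fill it to slot_len, and store it in a shuffled slot.

    Returns (store, key), where key[i] = (slot, pad) for slice i.
    Assumption: 'location' in the poster is taken to mean the slot index.
    """
    if any(len(s) > slot_len for s in slices):
        raise ValueError("slot_len must be at least the slice length")
    slots = list(range(len(slices)))
    rng.shuffle(slots)                      # scramble the locations
    store = [None] * len(slices)
    key = []
    for i, s in enumerate(slices):
        pad = rng.randint(0, slot_len - len(s))   # front pad hides the start position
        front = [rng.randint(0, 1) for _ in range(pad)]
        back = [rng.randint(0, 1) for _ in range(slot_len - pad - len(s))]
        store[slots[i]] = front + s + back        # all stored slices have equal length
        key.append((slots[i], pad))
    return store, key


def retrieve(store, key, i, n_rows):
    """Use the key entry for slice i to find its slot and skip its front pad."""
    slot, pad = key[i]
    return store[slot][pad:pad + n_rows]


def vertical_sum(slices):
    """Column sum computed from the slices alone: count of 1-bits times the bit weight."""
    width = len(slices)
    return sum(sum(s) << (width - 1 - j) for j, s in enumerate(slices))


if __name__ == "__main__":
    rng = random.Random(42)                 # fixed seed only to make the demo repeatable
    column = [5, 3, 7, 2]                   # one 3-bit column with four rows
    slices = bit_slices(column, 3)
    store, key = scramble(slices, slot_len=16, rng=rng)
    recovered = [retrieve(store, key, i, len(column)) for i in range(len(slices))]
    print("key:", key)
    print("recovered matches original:", recovered == slices)
    print("vertical sum:", vertical_sum(recovered))
```

Without the key, every stored slot looks like a 16-bit string with no visible start position. With the key, retrieval is an index lookup plus a slice, so no decryption step is needed before a vertical operation such as the sum.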