
Compressed Accessibility Map: Efficient Access Control for XML
Ting Yu (University of Illinois), Divesh Srivastava (AT&T Labs), Laks V.S. Lakshmanan (University of British Columbia), H.V. Jagadish (University of Michigan)

Information Sharing in business over the Internet
- XML as a standard information exchange/sharing format
- Direct access to XML documents
  - Offers advantages in terms of cost, accuracy and timeliness
- Security is crucial
  - Nature of selective access in this context is complex

Access Control for XML
- Fine-grained access control
  - Business relationships are sophisticated
  - Constraints at the tag/attribute level instead of only at the document level
  - Complex access control rules
- Efficient evaluation of data's accessibility is desired
  - Focus of this talk

An Example XML Document with Access Control Info.
[Figure: annotated XML document; the markup was lost in extraction. Surviving text fragments: "The purpose of …", "Access Control …".]
*Based on examples in [Damiani et al. 2000]

Two Potential Approaches
- Approach 1: use access control rules directly
  - Pros: flexible
  - Cons: time-inefficient
- Approach 2: fully materialized accessibility map (access control list)
  - Pros: time-efficient
  - Cons: space-inefficient

Our Approach
- Compressed Accessibility Map (CAM)
  - Takes advantage of structural locality of accessibility
  - Indexes accessibility information in a compressed way
  - Both time-efficient and space-efficient

Structural Locality of Accessibility
- Data items grouped together have similar accessibility properties
- Common in hierarchically-structured data like XML [Bertino et al. 1999][Damiani et al. 2000]
  - Declarative authorization rules based on hierarchical structures
  - Accessibility propagation and overriding

Compressed Accessibility Map (CAM)
- Essentially an accessibility index
- Maintain a CAM for each user and access type
- Identify "crucial" data items and store extra accessibility information on them
- Other data items' accessibility can be inferred efficiently

Identify Crucial Data Items
[Figure: tree over nodes A to J with accessible and inaccessible nodes marked; nodes A and B are labeled, using the labels (d+,s+) and (d-,s+).]

Ancestor Accessibility and Unit Regions
- If a node is accessible, so are its ancestors
- A unit region is a maximal subgraph of an XML database such that ancestor accessibility holds
- Easy to partition an XML database into unit regions
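The partition step can be sketched as follows. This is one plausible reading of the definition, not a procedure stated on the slide: it assumes a new unit region starts at the root and at every accessible node whose parent is inaccessible, since that is exactly where ancestor accessibility breaks. The function name and the dictionary-based tree representation are hypothetical.

```python
def partition_unit_regions(root, children, accessible):
    """Map every node to the node that starts its unit region.

    children:   dict node -> list of child nodes
    accessible: set of accessible nodes
    Assumption: a region starts at the root and at each accessible node
    whose parent is inaccessible.
    """
    region = {root: root}
    stack = [root]
    while stack:
        n = stack.pop()
        for k in children.get(n, []):
            starts_new = k in accessible and n not in accessible
            region[k] = k if starts_new else region[n]
            stack.append(k)
    return region


# Hypothetical example: B is inaccessible but its child D is accessible,
# so D starts a second unit region.
children = {"A": ["B", "C"], "B": ["D"], "D": ["E"]}
accessible = {"A", "C", "D", "E"}
region = partition_unit_regions("A", children, accessible)
```

Under this reading the partition takes a single pass over the tree, which fits the slide's remark that partitioning is easy.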

Unit Region Partition
[Figure: tree over nodes A to J partitioned into unit regions, with accessible and inaccessible nodes marked.]

CAM for Unit Regions
- Allowed labels in a unit region CAM
  - (d+,s+), (d-,s+) and (d-,s-)
- Inference rules
  - A label on a node is most specific, and thus overrides other inferences
  - Ancestor accessibility overrides descendants' inference
  - Nearest labeled ancestor overrides other labeled ancestors

Valid CAM
[Figure: example tree over nodes A to M with accessible, inaccessible and accessibility-unknown nodes marked; some nodes carry the labels (d-,s+) and (d+,s+).]

CAM Lookup Algorithm
- Given a node e, look up the CAM
  - If e is labeled, check the sign of its self label s
  - If e has labeled descendants, e is accessible
  - Otherwise, get e's nearest labeled ancestor f; e's accessibility is determined by the sign of f's label d
- Complexity: proportional to the product of the depth of e in the XML tree and the log of the size of the CAM
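The three lookup rules can be sketched as below. The representation is an assumption, not taken from the slides: nodes are numbered in preorder so that the descendants of e occupy a contiguous range, the labeled nodes are kept as a sorted list of preorder numbers, and a label is a pair of booleans (d, s). Treating a node with no labeled ancestor as inaccessible is also an assumption. All function and variable names are hypothetical.

```python
from bisect import bisect_right


def number_nodes(root, children):
    """Preorder-number the tree; end[n] is the largest number in n's subtree."""
    start, end = {}, {}
    counter = 0

    def visit(n):
        nonlocal counter
        start[n] = counter
        counter += 1
        for c in children.get(n, []):
            visit(c)
        end[n] = counter - 1

    visit(root)
    return start, end


def cam_lookup(e, parent, start, end, cam, labeled):
    """Infer whether node e is accessible.

    cam:     dict labeled node -> (d, s), both booleans
    labeled: sorted preorder numbers of the labeled nodes
    """
    # Rule 1: a label on e itself decides, via its self sign s.
    if e in cam:
        return cam[e][1]
    # Rule 2: a labeled node numbered in (start[e], end[e]] is a descendant.
    i = bisect_right(labeled, start[e])
    if i < len(labeled) and labeled[i] <= end[e]:
        return True
    # Rule 3: walk up to the nearest labeled ancestor and use its d sign.
    f = parent.get(e)
    while f is not None and f not in cam:
        f = parent.get(f)
    # Assumption: with no labeled ancestor, default to inaccessible.
    return cam[f][0] if f is not None else False


# Hypothetical example: A, B and D are accessible; C, E and F are not.
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
parent = {c: p for p, kids in children.items() for c in kids}
start, end = number_nodes("A", children)
cam = {"B": (True, True), "E": (False, False)}
labeled = sorted(start[n] for n in cam)
results = {n: cam_lookup(n, parent, start, end, cam, labeled) for n in "ABCDEF"}
```

In this sketch the descendant test is one binary search over the labeled nodes and the ancestor walk takes at most depth-of-e steps, which stays within the bound quoted on the slide.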

Optimal Unit Region CAM
- CAM with minimum size
  - Space-efficient
  - Also reduces lookup time
- Build optimal CAM
  - Assign labels to each data node in a bottom-up way
  - Remove redundant labels

Redundant Labels: Induced Labels
- Labels that are the same as what is inferred from their ancestors' labels
[Figure: tree over nodes A to E with accessible and inaccessible nodes marked and the labels (d+,s+), (d-,s+) and (d-,s-); one label is marked redundant.]

Redundant Labels: Upward Redundant Labels
- Labels that can be inferred from their descendants' labels
[Figure: tree over nodes A to F with accessible and inaccessible nodes marked and the labels (d-,s+), (d+,s+) and (d-,s+); one label is marked redundant.]

Build Optimal CAM
- Assign labels in a bottom-up way
  - Accessible leaf (d+,s+), inaccessible leaf (d-,s-)
  - Internal nodes' labels are assigned according to children's labels
- Remove redundant labels
  - First remove induced labels
  - Then remove upward redundant labels
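The slide does not spell out how an internal node's label is derived from its children, so the following is only a simplified greedy sketch, not the authors' optimal construction. It assumes a majority heuristic for the d sign, a default of "inaccessible" when no ancestor is labeled, and input that satisfies ancestor accessibility. It never emits an induced label (a node is labeled only when its d sign differs from the inherited one) and skips upward redundant labels (an accessible node with a labeled descendant stays unlabeled). The result is a valid CAM but is not guaranteed to be of minimum size. All names are hypothetical.

```python
def build_cam(root, children, accessible):
    """Greedy construction of a valid, not necessarily minimum, CAM.

    children:   dict node -> list of child nodes
    accessible: set of accessible nodes (ancestor accessibility must hold)
    Returns dict node -> (d, s) with boolean signs.
    """
    cam = {}

    def preferred_d(n):
        # Assumed heuristic: d+ if n is accessible and at least half of its
        # children are accessible (accessible leaves get d+).
        if n not in accessible:
            return False
        kids = children.get(n, [])
        return 2 * sum(k in accessible for k in kids) >= len(kids)

    def visit(n, inherited):
        """Return True if n or one of its descendants ends up labeled."""
        s = n in accessible
        d = preferred_d(n)
        if d != inherited:          # label changes the default below n
            cam[n] = (d, s)
            inherited = d
        below = False
        for k in children.get(n, []):
            if visit(k, inherited):
                below = True
        # Label n for its own sake only if nothing else implies its sign.
        if n not in cam and not below and s != inherited:
            cam[n] = (inherited, s)
        return n in cam or below

    visit(root, False)              # assumed default: inaccessible
    return cam


def infer(n, parent, children, cam):
    """Naive lookup used only to check the result."""
    if n in cam:
        return cam[n][1]
    stack = list(children.get(n, []))
    while stack:
        m = stack.pop()
        if m in cam:
            return True
        stack.extend(children.get(m, []))
    f = parent.get(n)
    while f is not None and f not in cam:
        f = parent.get(f)
    return cam[f][0] if f is not None else False


# Hypothetical example: only the leaves F and G are inaccessible.
children = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
parent = {c: p for p, kids in children.items() for c in kids}
accessible = {"A", "B", "C", "D", "E"}
cam = build_cam("A", children, accessible)
valid = all(infer(n, parent, children, cam) == (n in accessible) for n in "ABCDEFG")
```

Here two labels cover seven nodes: A carries (d+,s+) and C carries (d-,s+), and every other node's accessibility is inferred.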

Build Optimal CAM
[Figure: the example tree over nodes A to M with accessible and inaccessible nodes marked and a label on each node, drawn from (d+,s+), (d-,s+) and (d-,s-), plus one (d?,s+).]

CAM for Multiple Unit Regions
- Only need to mark out those nodes (marker nodes) that start a unit region
  - Build an optimal CAM for each unit region
  - Combine the CAMs of the unit regions
- Lookup algorithm is almost the same, but needs to take marker nodes into consideration
  - Complexity remains the same

Further Compression in CAM for Multiple Unit Regions
[Figure: tree over nodes A to J split into unit regions; the label (d+,s+) appears in the figure.]

Experimental Verification
- Metric: compression ratio
  - Size of CAM / size of fully materialized accessibility map
- Synthetic data set
  - Generated by IBM XML generator
  - Study accessibility locality's impact on compression ratio of CAM
- Real data set
  - Large file systems with real access control data
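As a small illustration of the metric, the sketch below measures both sizes in number of entries, which is an assumption; the slides do not say whether size is counted in entries or bytes. The function name and the example values are hypothetical.

```python
def compression_ratio(cam, all_nodes):
    """Size of the CAM divided by the size of the fully materialized map.

    Assumption: both sizes are counted in entries, one entry per labeled node
    in the CAM and one entry per node in the fully materialized map.
    """
    return len(cam) / len(all_nodes)


# Hypothetical example: 2 labeled nodes in a tree of 8 nodes.
example_cam = {"A": (True, True), "C": (False, True)}
example_nodes = ["A", "B", "C", "D", "E", "F", "G", "H"]
ratio = compression_ratio(example_cam, example_nodes)
```

A smaller ratio means better compression; a ratio of 1 would mean the CAM is as large as the fully materialized map.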

Impact of Accessibility Locality
[Chart: compression ratio when accessible nodes are uniformly distributed in the XML tree.]

Impact of Accessibility Locality
[Chart: compression ratio when accessibility locality is high.]

Conclusion
- Compressed accessibility map as an efficient way to evaluate access control data for XML documents
  - Time-efficient and space-efficient
- Future work
  - Better support for incremental CAM updates
  - Take advantage of commonalities of users' access rights and globally optimize CAM