Managing XML and Semistructured Data Lecture 19: Compressing XML Data Prof. Dan Suciu Spring 2001.



In this lecture
XML Compression
–Motivation
–XMill approach and results
Resources
XMILL: An Efficient Compressor for XML Data by Liefke and Suciu, in SIGMOD'2001

Compression: The Problem
XML for exchange (space or time)
but XML is verbose
users prefer application-specific formats:
–Web Server Logs
–EMBL
–G2
is XML doomed to fail?

An Example: Web Server Logs
ASCII file, 15.9 MB (gzipped 1.6 MB):
|GET / HTTP/1.0|text/html|200|1997/10/01-00:00:02|-|4478|-|-|
XML-ized, inflates to 24.2 MB (gzipped 2.1 MB):
GET / HTTP/1.0 text/html /10/01-00:00: Mozilla/3.1[ja](I)
(the element tags of the XML version are not preserved in this transcript)
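The size comparison on this slide can be reproduced in spirit with a minimal sketch. Everything below is an assumption made for illustration: the records are synthetic, the field names in `FIELDS` are hypothetical (the slide's actual tags are not available), and `zlib` stands in for gzip. The numbers it produces are not the slide's numbers.

```python
import zlib

# Hypothetical field names; the tags used on the original slide are unknown.
FIELDS = ["request", "content_type", "status", "time", "bytes"]

def make_records(n):
    """Build n synthetic log records that loosely resemble the slide's example."""
    return [
        ["GET / HTTP/1.0", "text/html", "200",
         "1997/10/01-00:%02d:%02d" % (i // 60 % 60, i % 60),
         str(4000 + (i * 37) % 900)]
        for i in range(n)
    ]

def as_ascii(records):
    """Pipe-delimited text, one record per line."""
    return "\n".join("|".join(r) for r in records).encode()

def as_xml(records):
    """The same records with every field wrapped in an element."""
    rows = []
    for r in records:
        cells = "".join("<%s>%s</%s>" % (f, v, f) for f, v in zip(FIELDS, r))
        rows.append("<entry>" + cells + "</entry>")
    return ("<log>" + "".join(rows) + "</log>").encode()

records = make_records(2000)
ascii_bytes, xml_bytes = as_ascii(records), as_xml(records)

for name, blob in [("ASCII", ascii_bytes), ("XML", xml_bytes)]:
    print(name, len(blob), "bytes raw,", len(zlib.compress(blob)), "bytes compressed")
```

The raw XML is larger because every value carries an opening and a closing tag; how much of that overhead survives general-purpose compression depends on the data.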

XMill
specialized compressor for XML data
makes XML look “small”
Download:
–Now:
–Soon:

How XMill Works: Three Ideas
Compress the structure separately from the data:
Data values, e.g.: GET / HTTP/1.0 text/html 200 …
gzip Structure + gzip Data = 1.75MB
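The first idea, separating structure from data, might be sketched as follows. This is not XMill's actual encoding: the token format (`<tag`, `#`, `>`) is invented for the example, attributes and tail text are ignored, and `zlib` stands in for gzip.

```python
import xml.etree.ElementTree as ET
import zlib

def split(root):
    """Return (structure tokens, data values) for an element tree.

    Assumption: only element text is handled; attributes and tail text are ignored.
    """
    structure, data = [], []

    def walk(el):
        structure.append("<" + el.tag)      # element start
        if el.text and el.text.strip():
            structure.append("#")           # placeholder: "a data value goes here"
            data.append(el.text)
        for child in el:
            walk(child)
        structure.append(">")               # element end

    walk(root)
    return structure, data

def rebuild(structure, data):
    """Inverse of split: replay the tokens and pull values in order."""
    values = iter(data)
    stack, root = [], None
    for tok in structure:
        if tok == "#":
            stack[-1].text = next(values)
        elif tok == ">":
            root = stack.pop()
        else:
            el = ET.Element(tok[1:])
            if stack:
                stack[-1].append(el)
            stack.append(el)
    return root

doc = ("<log>"
       "<entry><request>GET / HTTP/1.0</request><status>200</status></entry>"
       "<entry><request>GET / HTTP/1.1</request><status>404</status></entry>"
       "</log>")
root = ET.fromstring(doc)
structure, data = split(root)

# Each stream is compressed on its own.
packed_structure = zlib.compress("\n".join(structure).encode())
packed_data = zlib.compress("\n".join(data).encode())
```

Because the structure stream contains only tags and placeholders, it is highly repetitive for regular documents such as logs, which is what makes compressing it separately attractive.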

How XMill Works: Three Ideas
Group the data values according to their types:
Values of one type, e.g.: GET / HTTP/1.0 GET / HTTP/1.1 …
gzip Structure + gzip Data1 + gzip Data2 = 1.33MB
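A rough sketch of the second idea, grouping values before compressing them. The assumption here is that a value's "type" is approximated by the tag of its enclosing element; the container layout and token format are invented for the example, and `zlib` stands in for gzip.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET
import zlib

def group_values(root):
    """Split a tree into structure tokens plus one value container per tag.

    Assumption: values are grouped by the tag of their enclosing element.
    """
    structure = []
    containers = defaultdict(list)

    def walk(el):
        structure.append("<" + el.tag)
        if el.text and el.text.strip():
            structure.append("#" + el.tag)   # which container holds the value
            containers[el.tag].append(el.text)
        for child in el:
            walk(child)
        structure.append(">")

    walk(root)
    return structure, dict(containers)

def compressed_size(structure, containers):
    """Total bytes when the structure and every container are compressed separately."""
    total = len(zlib.compress("\n".join(structure).encode()))
    for values in containers.values():
        total += len(zlib.compress("\n".join(values).encode()))
    return total

doc = ("<log>"
       "<entry><request>GET / HTTP/1.0</request><status>200</status></entry>"
       "<entry><request>GET / HTTP/1.1</request><status>404</status></entry>"
       "</log>")
structure, containers = group_values(ET.fromstring(doc))
print(containers["request"])   # all request strings end up side by side
```

Placing similar values next to each other gives the compressor longer repeated runs than it sees when requests, status codes, and timestamps are interleaved. On a document as small as this one the per-stream overhead dominates, so the benefit only shows on larger inputs.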

How XMill Works: Three Ideas
Apply semantic (specialized) compressors:
gzip Structure + gzip c1(Data1) + gzip c2(Data2) + ... = 0.82MB
Examples:
–8, 16, 32-bit integer encoding (signed/unsigned)
–differential compression (e.g. 1999, 1995, 2001, 2000, 1995,...)
–compress lists, records (e.g.  4 bytes)
Need user input to select the semantic compressor
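Two of the semantic compressors listed above, differential compression and fixed-width integer encoding, can be illustrated with a small sketch. The function names are hypothetical and this is not XMill's implementation; it only shows the idea on the years from the slide, using signed 16-bit little-endian integers as an assumed width.

```python
import struct

def delta_encode(values):
    """Replace each value by its difference from the previous one (first vs. 0)."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out

def delta_decode(deltas):
    """Inverse of delta_encode: running sum of the differences."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out

def pack_int16(values):
    """Encode integers as signed 16-bit little-endian binary (2 bytes each)."""
    return struct.pack("<%dh" % len(values), *values)

def unpack_int16(blob):
    """Decode a blob produced by pack_int16."""
    return list(struct.unpack("<%dh" % (len(blob) // 2), blob))

years = [1999, 1995, 2001, 2000, 1995]
deltas = delta_encode(years)
packed = pack_int16(deltas)

as_text = " ".join(str(y) for y in years).encode()
print(len(as_text), "bytes as text vs", len(packed), "bytes packed")
```

The packed stream can then be handed to a general-purpose compressor as in the formula on the slide. Choosing the right encoder requires knowing that a container holds, say, years rather than free text, which is why the slide notes that user input is needed.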

XML Compression

Compression Tradeoff

Summary of XML Data Management
XML =
–old data type (trees)
–with new interpretation (data)
We discussed traditional management techniques for XML:
–Data model
–Query language
–Optimizations
–...
Many traditional problems still unsolved (storage, processing, optimization,...)

Summary of XML Data Management
More interesting question:
–what are the novel applications enabled by XML?
Some ideas:
Approximate queries over unfamiliar data instances
–“Search the database for a pattern similar to this one”
–Rank results based on their similarity to the pattern
–What is an appropriate query language for that?
Linking independent databases
–We have XLink, how do we use it?