TECHNIQUES FOR OPTIMIZING THE QUERY PERFORMANCE OF DISTRIBUTED XML DATABASE - NAHID NEGAR.


PROBLEM STATEMENT
EXPLORING THE RESEARCH SCOPE FOR IMPROVING THE PERFORMANCE OF DISTRIBUTED QUERY PROCESSING FOR XML DATABASES.
THE RESEARCH PAPER:
- DESCRIBES THE ISSUES AND CONSIDERATIONS FOR DISTRIBUTED XML QUERY PROCESSING.
- EXPLORES CLASSICAL QUERY OPTIMIZATION TECHNIQUES.
- PRESENTS SIMILAR RESEARCH WORK DONE BY OTHERS.
- ANALYZES THE RESEARCH SCOPE AND DIRECTIONS.

DISTRIBUTED XML DATABASE
- XML FILES ARE IDEAL FOR DESCRIBING SEMI-STRUCTURED DATA.
- AS THE AMOUNT OF DATA INCREASES, XML DATABASES EXPAND [1].
- A LARGE NUMBER OF XML FILES IS STORED, PRESERVING THE HIERARCHICAL FORMAT.
- DATA IS DISTRIBUTED OR FRAGMENTED ACROSS DIFFERENT LOCATIONS, POSSIBLY IN DIFFERENT GEOGRAPHIC REGIONS.
- DATA INTEGRATION IS NEEDED WHEN PROCESSING A QUERY ON A DISTRIBUTED DATABASE [2].

WHY DISTRIBUTED XML DATABASE IS NEEDED [6]
- LOWER COSTS
- INCREASED SCALABILITY
- INCREASED AVAILABILITY
- DISTRIBUTION OF SOFTWARE MODULES
- NEW APPLICATIONS BASED ON DISTRIBUTION
- MARKET FORCES

XML DATABASE AND QUERY PROCESSING
- XML DDL: DTD
- XML SCHEMA: XSD
- XML DML
- XML QUERY LANGUAGES (EXAMPLE: XQUERY)
- ATTRIBUTES OF XML DATABASE:
  - MULTIPLE LEVELS OF VALIDITY
  - ENTITIES AND URI
  - TRANSFORMATIONS

DISTRIBUTED XML QUERY PROCESSING CONSIDERATIONS [7]
- ARCHITECTURE OF DISTRIBUTED QUERY PROCESSING SYSTEMS
- CENTRALIZED VS. DISTRIBUTED PROCESSING OF DISTRIBUTED QUERY
- STATIC VS. DYNAMIC QUERY PROCESSING
- DATA VS. QUERY SHIPPING
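
THE DATA VS. QUERY SHIPPING TRADE-OFF LISTED ABOVE CAN BE ILLUSTRATED WITH A MINIMAL PYTHON SKETCH. THE FRAGMENT `SITE_XML` AND THE FUNCTIONS `data_shipping` AND `query_shipping` ARE HYPOTHETICAL NAMES INTRODUCED FOR ILLUSTRATION, NOT TAKEN FROM THE PAPER, AND THE NETWORK IS ONLY SIMULATED BY COUNTING SERIALIZED BYTES.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment stored at one remote site (illustrative data).
SITE_XML = """<books>
  <book year="1999"><title>A</title></book>
  <book year="2005"><title>B</title></book>
  <book year="2010"><title>C</title></book>
</books>"""


def data_shipping(site_xml, min_year):
    """Ship the whole fragment, then filter at the coordinator."""
    shipped = site_xml.encode("utf-8")  # bytes that would cross the network
    root = ET.fromstring(shipped)
    titles = [b.findtext("title") for b in root.findall("book")
              if int(b.get("year")) >= min_year]
    return titles, len(shipped)


def query_shipping(site_xml, min_year):
    """Evaluate the predicate at the site and ship only the matches."""
    root = ET.fromstring(site_xml)
    result = ET.Element("result")
    for b in root.findall("book"):
        if int(b.get("year")) >= min_year:
            result.append(b)
    shipped = ET.tostring(result)  # bytes that would cross the network
    received = ET.fromstring(shipped)
    titles = [b.findtext("title") for b in received.findall("book")]
    return titles, len(shipped)


if __name__ == "__main__":
    print(data_shipping(SITE_XML, 2005))
    print(query_shipping(SITE_XML, 2005))
```

BOTH STRATEGIES RETURN THE SAME TITLES. FOR A SELECTIVE PREDICATE, QUERY SHIPPING TYPICALLY MOVES FEWER BYTES; WHEN MOST OF THE FRAGMENT MATCHES, OR WHEN THE SITE CANNOT EVALUATE THE PREDICATE, DATA SHIPPING MAY BE PREFERABLE.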

DISTRIBUTED XML QUERY PROCESSING ISSUES [7]
- DIFFERENT QUERY PROCESSING CAPABILITIES OF THE DATA SOURCES
- UNAVAILABILITY OF STATISTICAL INFORMATION ON THE DATA SOURCES
- UNRELIABLE RESPONSE TIMES
- DATA REDUNDANCY
- TIME TO LAST VS. TIME TO FIRST ELEMENT

POPULAR PERFORMANCE IMPROVEMENT TECHNIQUES FOR DISTRIBUTED XML QUERIES [6]
- SELECTIVITY: GIVE THE QUERY PLANNER THE ABILITY TO ESTIMATE SELECTIVITY.
- SELECTION PUSHDOWN: PERFORM SELECTIONS AS EARLY AS POSSIBLE IN THE QUERY TREE.
- INCREMENTAL UPDATES: THE MATERIALIZED VIEW IS UPDATED TO REFLECT CHANGES IN THE DATA.
- VIEW QUERYING: QUERIES CAN BENEFIT FROM EXPLOITING EXISTING MATERIALIZED VIEWS.
- QUERY CONTAINMENT: FIND THE COMMON SUB-QUERIES AND EXECUTE THEM JUST ONCE.
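
SELECTION PUSHDOWN CAN BE SKETCHED AS FOLLOWS. THE DOCUMENTS `CUSTOMERS` AND `ORDERS`, THE NESTED-LOOP `join`, AND THE TWO PLAN FUNCTIONS ARE ASSUMPTIONS MADE FOR ILLUSTRATION; THE SKETCH ONLY COUNTS HOW MANY ELEMENT PAIRS THE JOIN COMPARES UNDER EACH PLAN.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents, standing in for two fragments of a database.
CUSTOMERS = ET.fromstring(
    '<customers>'
    '<customer id="1" country="BR"/>'
    '<customer id="2" country="NL"/>'
    '<customer id="3" country="BR"/>'
    '</customers>')
ORDERS = ET.fromstring(
    '<orders>'
    '<order id="10" customer="1"/>'
    '<order id="11" customer="2"/>'
    '<order id="12" customer="3"/>'
    '<order id="13" customer="2"/>'
    '</orders>')


def join(customers, orders):
    """Nested-loop join on customer id; also reports the pairs compared."""
    pairs = [(c, o) for c in customers for o in orders]
    matches = [(c, o) for c, o in pairs if c.get("id") == o.get("customer")]
    return matches, len(pairs)


def select_late(country):
    """Plan 1: join everything, then apply the selection."""
    matches, compared = join(list(CUSTOMERS), list(ORDERS))
    rows = [o.get("id") for c, o in matches if c.get("country") == country]
    return rows, compared


def select_pushed_down(country):
    """Plan 2: apply the selection first, then join the survivors."""
    kept = [c for c in CUSTOMERS if c.get("country") == country]
    matches, compared = join(kept, list(ORDERS))
    return [o.get("id") for c, o in matches], compared


if __name__ == "__main__":
    print(select_late("BR"))
    print(select_pushed_down("BR"))
```

BOTH PLANS RETURN THE SAME ORDER IDS, BUT THE PUSHED-DOWN PLAN FEEDS FEWER CUSTOMERS INTO THE JOIN. IN A DISTRIBUTED SETTING THE SAME IDEA ALSO REDUCES THE DATA SENT BETWEEN SITES.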

APPROACHES TAKEN BY OTHERS
- AN OPTIMIZING QUERY PROCESSING WITH AN EFFECTIVE CACHING MECHANISM FOR DISTRIBUTED DATABASE [5]
- EFFICIENTLY PROCESSING XML QUERIES OVER FRAGMENTED REPOSITORIES WITH PARTIX [8]
- A METHODOLOGY FOR QUERY PROCESSING OVER DISTRIBUTED XML DATABASES [4]
- SCALABLE AND DISTRIBUTED PROCESSING OF SCIENTIFIC XML DATA [3]

AN OPTIMIZING QUERY PROCESSING WITH AN EFFECTIVE CACHING MECHANISM FOR DISTRIBUTED DATABASE [5]
- A DATABASE OPTIMIZATION FRAMEWORK IS DESCRIBED.
- THE SQL STATEMENT CONTAINS ELEMENTS WHICH ARE ACCEPTED BY AN XML-ORIENTED COMMON DATA.
- A HISTORICAL DATABASE AND QUERY-BASED CACHE REPLACEMENT ARE USED.
- AN XML DATABASE SYSTEM IS SUITABLE FOR IMPLEMENTING DATA ANALYSIS APPLICATIONS.
- A COMMON OPTIMIZATION QUERY PROCESSING MODEL IS ALSO USED.
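
THE IDEA OF CACHING QUERY RESULTS CAN BE SKETCHED WITH A SMALL PYTHON CLASS. THIS IS NOT THE REPLACEMENT POLICY OF [5], WHICH THE SLIDE DOES NOT SPECIFY; A PLAIN LEAST-RECENTLY-USED POLICY IS SWAPPED IN AS AN ASSUMPTION, AND `QueryCache` AND ITS `execute` CALLBACK ARE HYPOTHETICAL NAMES.

```python
from collections import OrderedDict


class QueryCache:
    """Query-result cache with least-recently-used replacement (assumed policy)."""

    def __init__(self, capacity, execute):
        self.capacity = capacity
        self.execute = execute          # callable that runs a query remotely
        self.entries = OrderedDict()    # query text -> cached result
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.entries:
            # Cache hit: mark the entry as most recently used.
            self.entries.move_to_end(query)
            self.hits += 1
            return self.entries[query]
        # Cache miss: run the query and remember its result.
        self.misses += 1
        result = self.execute(query)
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return result


if __name__ == "__main__":
    cache = QueryCache(2, lambda q: "result of " + q)
    for q in ["//book", "//author", "//book"]:
        cache.get(q)
    print(cache.hits, cache.misses)
```

REPEATED QUERIES ARE ANSWERED FROM THE CACHE WITHOUT CONTACTING THE REMOTE SITES. A POLICY THAT USES QUERY HISTORY, AS THE SLIDE SUGGESTS, WOULD REPLACE ONLY THE EVICTION STEP.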

EFFICIENTLY PROCESSING XML QUERIES OVER FRAGMENTED REPOSITORIES WITH PARTIX [8]
- THE DATA VOLUME OF XML REPOSITORIES AND THE RESPONSE TIME OF QUERY PROCESSING HAVE BECOME CRITICAL ISSUES.
- THE TRADITIONAL FRAGMENTATION DEFINITIONS DO NOT DIRECTLY APPLY TO XML DOCUMENTS.
- THE FOCUS IS ON HIGH PERFORMANCE OF XML DATA SERVERS.
- PARTIX IS USED FOR THE EXPERIMENTS.

A METHODOLOGY FOR QUERY PROCESSING OVER DISTRIBUTED XML DATABASES [4]
- A METHODOLOGY FOR XQUERY QUERY PROCESSING OVER DISTRIBUTED XML DATABASES IS PRESENTED.
- THE TECHNIQUE CAN BE USED WITH XML DATABASES THAT ALLOW FRAGMENTATION AND WITH HOMOGENEOUS XML DATABASES.
- A MEDIATOR-BASED ARCHITECTURE WITH ADAPTERS ATTACHED TO REMOTE DATABASES IS PROPOSED.
- THREE TYPES OF FRAGMENTATION (HORIZONTAL, VERTICAL AND HYBRID) WERE USED IN SEVERAL EXPERIMENTS.
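
HORIZONTAL FRAGMENTATION AND A MEDIATOR WITH ADAPTERS CAN BE SKETCHED AS FOLLOWS. THIS IS A SIMPLIFIED ILLUSTRATION, NOT THE ARCHITECTURE OF [4]: THE DOCUMENT `DOC` AND THE NAMES `horizontal_fragments`, `Adapter` AND `Mediator` ARE ASSUMPTIONS, AND THE QUERIES USE THE XPATH SUBSET OF PYTHON'S `xml.etree.ElementTree` RATHER THAN XQUERY.

```python
import xml.etree.ElementTree as ET

# Hypothetical document to be fragmented (illustrative data).
DOC = ET.fromstring(
    '<store>'
    '<item section="CD"><name>X</name><price>10</price></item>'
    '<item section="DVD"><name>Y</name><price>25</price></item>'
    '<item section="CD"><name>Z</name><price>30</price></item>'
    '</store>')


def horizontal_fragments(root, attr):
    """Group the children of root into one fragment per attribute value."""
    fragments = {}
    for child in root:
        frag = fragments.setdefault(child.get(attr), ET.Element(root.tag))
        frag.append(child)
    return fragments


class Adapter:
    """Stands in for a remote database that holds one fragment."""

    def __init__(self, fragment):
        self.fragment = fragment

    def run(self, path):
        return self.fragment.findall(path)


class Mediator:
    """Sends the same sub-query to every adapter and unions the results."""

    def __init__(self, adapters):
        self.adapters = adapters

    def query(self, path):
        results = []
        for adapter in self.adapters:
            results.extend(adapter.run(path))
        return results


if __name__ == "__main__":
    fragments = horizontal_fragments(DOC, "section")
    mediator = Mediator([Adapter(f) for f in fragments.values()])
    print([e.text for e in mediator.query("item/name")])
```

BECAUSE THE FRAGMENTS ARE DISJOINT, THE UNION OF THE PARTIAL RESULTS EQUALS THE RESULT ON THE WHOLE DOCUMENT. A MEDIATOR THAT KNOWS THE FRAGMENTATION PREDICATE COULD ALSO SKIP ADAPTERS WHOSE FRAGMENT CANNOT MATCH.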

SCALABLE AND DISTRIBUTED PROCESSING OF SCIENTIFIC XML DATA [3]
- A BIG DATA TECHNIQUE IS APPLIED TO XML METADATA INDEXING FOR DISTRIBUTED XML DATABASES.
- MAPREDUCE PROCESSING IS INCORPORATED.
- DATASET PROCESSING IS CRITICAL TO ENSURE EFFECTIVE USE.
- AN AUTOMATED PROCESS CAN BE HELPFUL.
- THE PAPER TESTED PERFORMANCE USING TWO MAPREDUCE IMPLEMENTATIONS, APACHE HADOOP AND LEMO-MR.
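
THE MAPREDUCE STYLE OF XML METADATA INDEXING CAN BE SKETCHED IN PLAIN PYTHON. THIS RUNS IN A SINGLE PROCESS AND USES NEITHER APACHE HADOOP NOR LEMO-MR; THE DOCUMENTS IN `DOCS` AND THE FUNCTIONS `map_doc`, `shuffle`, `reduce_key` AND `build_index` ARE HYPOTHETICAL AND ONLY SHOW THE MAP, SHUFFLE AND REDUCE PHASES.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical metadata documents keyed by document id (illustrative data).
DOCS = {
    "d1": "<meta><instrument>radar</instrument><site>north</site></meta>",
    "d2": "<meta><instrument>lidar</instrument><site>north</site></meta>",
}


def map_doc(doc_id, xml_text):
    """Map phase: emit ((tag, value), doc_id) for every element with text."""
    for elem in ET.fromstring(xml_text).iter():
        if elem.text and elem.text.strip():
            yield (elem.tag, elem.text.strip()), doc_id


def shuffle(pairs):
    """Shuffle phase: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_key(key, doc_ids):
    """Reduce phase: one sorted, duplicate-free posting list per key."""
    return key, sorted(set(doc_ids))


def build_index(docs):
    pairs = [p for doc_id, text in docs.items() for p in map_doc(doc_id, text)]
    return dict(reduce_key(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    for key, postings in build_index(DOCS).items():
        print(key, postings)
```

EACH CALL TO `map_doc` DEPENDS ON ONE DOCUMENT ONLY, WHICH IS WHAT ALLOWS A MAPREDUCE FRAMEWORK TO SPREAD THE MAP PHASE OVER MANY NODES.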

RESEARCH SCOPE IN DISTRIBUTED XML QUERY PROCESSING PERFORMANCE
- STRUCTURED-NESS – HOW TO DETERMINE THE STRUCTURE AND THE INDEXES.
- SCHEMA HETEROGENEITY – HOW TO INTEGRATE HETEROGENEOUS SCHEMAS.
- RELATION DEFINITION – HOW TO DEFINE RELATIONS AND COMPARISONS BETWEEN XML ELEMENTS.
- DATA SOURCE PROCESSING POWER – HOW TO PLAN DISTRIBUTED QUERY PROCESSING.
- ANSWER QUALITY – HOW TO PRODUCE AND VERIFY THE BEST RESULT.
- ANSWERING SPEED – HOW TO KEEP DB STATISTICS AND IMPROVE OPERATIONS.
- DATA SOURCE AND USER QUANTITY – PARALLEL QUERY PROCESSING ALGORITHMS.

CONCLUSION
- XML IS A WIDELY ACCEPTED AND WIDELY USED FORMAT FOR STORING DATA.
- WITH THE LARGE AMOUNT OF DATA PRODUCED AT DIFFERENT LOCATIONS, A DISTRIBUTED XML DATABASE IS OFTEN USED.
- IT IS IMPORTANT TO MAINTAIN REASONABLE PERFORMANCE FOR QUERY PROCESSING IN A DISTRIBUTED DATABASE.
- THE GOAL OF THE PAPER IS TO IDENTIFY THE RESEARCH SCOPE FOR IMPROVING DISTRIBUTED XML QUERY PROCESSING PERFORMANCE.

REFERENCES
1. G. FIGUEIREDO, V. BRAGANHOLO, M. MATTOSO, "PROCESSING QUERIES OVER DISTRIBUTED XML DATABASES," JOURNAL OF INFORMATION AND DATA MANAGEMENT, 1(3), OCTOBER.
2. A. M. KULKARNI, J. THIRUNAVUKKARASU, P. S. PILLAI, S. S. SULEGAI, S. RAO, "INSERTION AND QUERYING MECHANISM FOR A DISTRIBUTED XML DATABASE SYSTEM," IN: PROCEEDINGS OF THE 5TH ACM COMPUTE
3. E. DEDE, Z. FADIKA, C. GUPTA, M. GOVINDARAJU, "SCALABLE AND DISTRIBUTED PROCESSING OF SCIENTIFIC XML DATA," IEEE/ACM INTERNATIONAL CONFERENCE ON GRID COMPUTING (GRID).
4. G. FIGUEIREDO, V. BRAGANHOLO, M. MATTOSO, "A METHODOLOGY FOR QUERY PROCESSING OVER DISTRIBUTED XML DATABASES," PROGRAMA DE ENGENHARIA DE SISTEMAS E COMPUTAR IM/UFRJ, BRAZIL.
5. S. PRABHA, A. KANNAN, P. A. KUMAR, "AN OPTIMIZING QUERY PROCESSING WITH AN EFFECTIVE CACHING MECHANISM FOR DISTRIBUTED DATABASE."
6. DONALD KOSSMANN, "THE STATE OF THE ART IN DISTRIBUTED QUERY PROCESSING," ACM COMPUTING SURVEYS, VOL. 32, NO. 4, 2000.
7. M. SMILJANIĆ, H. BLANKEN, M. V. KEULEN, W. JONKER, "DISTRIBUTED XML DATABASE SYSTEMS."
8. R. ANDRADE, G. RUBERG, A. BAIÃO, V. BRAGANHOLO, M. MATTOSO, "PARTIX: PROCESSING XQUERY QUERIES OVER FRAGMENTED XML REPOSITORIES," TECHNICAL REPORT ES-691, DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING - COPPE/FEDERAL UNIVERSITY OF RIO DE JANEIRO, BRAZIL, DEPARTMENT OF APPLIED INFORMATICS - UNIRIO, BRAZIL, DEC.
9. J. SMITH AND P. WATSON, "FAULT-TOLERANCE IN DISTRIBUTED QUERY PROCESSING," IN 9TH INTERNATIONAL DATABASE ENGINEERING AND APPLICATION SYMPOSIUM, IDEAS 2005, PAGES 329-338, JULY 2005.