eXist Indexing
Using the right index for your data
Date: 9/29/2008
Dan McCreary, President, Dan McCreary & Associates, (952) 931-9198

Copyright 2008 Dan McCreary & Associates

Overview
–Using eXist indexes
–Types of indexes
–Configuring indexes
–Testing indexes

Index Types
–Structural indexes: index the nodal structure (elements, i.e. tags, and attributes) of the documents in a collection.
–Range indexes: ideal for indexing measurements (integers, doubles, floats, currency, or discrete-value measurements).
–Full-text indexes: map specific text nodes and attributes of the documents in a collection to text tokens.
–NGram indexes: map specific text nodes and attributes of the documents in a collection to overlapping tokens of n characters (n = 3 by default). They are very efficient for exact substring searches and for queries on program source code. Source code cannot easily be split into whitespace-separated tokens, so it is a poor match for the full-text index.
–Spatial indexes (experimental): map elements that contain geo-referenced geometries to dedicated data structures that allow efficient spatial queries.
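
The n-gram idea can be sketched in a few lines of Python. This is a conceptual model, not eXist's implementation. The class and method names are hypothetical, and lower-casing the text is an assumption. A matching node must contain every n-gram of the query, so intersecting the postings lists produces a candidate set. Each candidate is then checked against the stored text.

```python
from collections import defaultdict


def ngrams(text: str, n: int = 3) -> set:
    """Return the set of overlapping n-character substrings of text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class NGramIndex:
    """Toy n-gram index: n-gram -> set of node ids whose text contains it."""

    def __init__(self, n: int = 3):
        self.n = n
        self.postings = defaultdict(set)
        self.texts = {}

    def add(self, node_id: int, text: str) -> None:
        self.texts[node_id] = text
        for gram in ngrams(text, self.n):
            self.postings[gram].add(node_id)

    def contains(self, substring: str) -> set:
        needle = substring.lower()
        grams = ngrams(needle, self.n)
        if not grams:
            # query shorter than n: there is no n-gram to look up, so scan every node
            candidates = set(self.texts)
        else:
            # a match must contain every n-gram of the query
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # verify: sharing all n-grams does not guarantee the exact substring
        return {nid for nid in candidates if needle in self.texts[nid].lower()}


idx = NGramIndex()
idx.add(1, "for (i = 0; i < n; i++)")
idx.add(2, "while (x != null)")
print(idx.contains("i++"))  # {1}
```

A whitespace tokenizer would never produce a token such as "i++". N-grams make that kind of substring findable without scanning every node.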

Structural Indexes
–Keep track of the elements (tags), attributes, and nodal structure of all XML documents in a collection
–Created and maintained automatically by eXist
–Cannot be reconfigured or disabled by the user
–Used by all non-wildcard XPath and XQuery expressions in eXist (but not by wildcard steps such as "//*")
–Stored in the database file elements.dbx

How Do Structural Indexes Work?
–Map every element and attribute qname (qualified name) in a document collection to a list of <documentId, nodeId> pairs
–The query engine uses this mapping to resolve a given XPath expression
Example:
–//book/section
–eXist performs two index lookups: the first for the <book> nodes and the second for the <section> nodes
–eXist then computes the structural join between these node sets to determine which <section> elements are in fact children of <book> elements
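
Here is a minimal Python sketch of the two-lookups-plus-join strategy for //book/section. It assumes Dewey-style node IDs: tuples in which a node's parent ID is its own ID minus the last component. This scheme is chosen here only because it turns the parent test into a prefix check. eXist's actual node numbering and on-disk format are not modeled, and all names are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """docs: doc_id -> list of (qname, node_id). Returns qname -> [(doc_id, node_id)]."""
    index = defaultdict(list)
    for doc_id, elements in docs.items():
        for qname, node_id in elements:
            index[qname].append((doc_id, node_id))
    return index


def child_step(index: dict, parent_qname: str, child_qname: str) -> list:
    """Resolve parent/child, e.g. //book/section, as two lookups plus a join."""
    parents = set(index.get(parent_qname, []))  # lookup 1: every <book>
    children = index.get(child_qname, [])       # lookup 2: every <section>
    # structural join: keep a child whose parent id, in the same document, is a <book>
    return [(doc, nid) for doc, nid in children if (doc, nid[:-1]) in parents]


# node ids are Dewey-style tuples: (1, 2) is the 2nd child of node (1,)
docs = {
    1: [("library", (1,)), ("book", (1, 1)), ("section", (1, 1, 1)),
        ("section", (1, 1, 2)), ("appendix", (1, 2)), ("section", (1, 2, 1))],
    2: [("library", (1,)), ("appendix", (1, 1)), ("section", (1, 1, 1))],
}
index = build_index(docs)
print(child_step(index, "book", "section"))  # [(1, (1, 1, 1)), (1, (1, 1, 2))]
```

The sections under <appendix> are found by the second lookup but dropped by the join. Pairing the document ID with the node ID keeps nodes from different documents apart.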

Range Index
–Range indexes give the database a shortcut for selecting nodes directly by their typed values
–Used when matching or comparing nodes with standard XPath operators and functions
–Without a range index, comparison operators such as =, > and < default to a "brute-force" inspection of the DOM. This can be extremely slow if eXist has to search through millions of nodes, because each node has to be loaded and cast to the target type

Example
–You have a catalog of 50,000 items
–You want to find all items with a price under $100
–XPath: //item[price < 100.0]
–Without a range index, each search has to perform up to 50,000 comparisons
–With a range index, the subset of items priced under $100 is found with a single index lookup
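
The difference can be sketched in Python, with a sorted list standing in for the index. A real database would typically use a B-tree; that detail is an assumption here. Values are cast to numbers once, when the index is built. A binary search then locates the qualifying range instead of testing every item. The class and function names are hypothetical.

```python
import bisect


class RangeIndex:
    """Toy range index: (typed value, node id) pairs kept sorted by value."""

    def __init__(self, items: dict):
        # cast each untyped price string to float once, when the index is built
        self.entries = sorted((float(price), node_id) for node_id, price in items.items())
        self.keys = [value for value, _ in self.entries]

    def less_than(self, limit: float) -> list:
        # one binary search finds where the run of values below `limit` ends
        end = bisect.bisect_left(self.keys, limit)
        return [node_id for _, node_id in self.entries[:end]]


def brute_force_less_than(items: dict, limit: float) -> list:
    # no index: every value is loaded, cast, and compared
    return [node_id for node_id, price in items.items() if float(price) < limit]


# hypothetical catalog of 50,000 items priced 0.0, 0.5, 1.0, ...
items = {f"item{i}": str(i * 0.5) for i in range(50_000)}
index = RangeIndex(items)
cheap = index.less_than(100.0)
print(len(cheap))  # 200
```

Both functions return the same items. The brute-force version pays for 50,000 casts and comparisons on every query, while the index pays for them once.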

Restrictions on Range Indexes
–All collections included in the search must be indexed
–The data types must match
–There must be no context dependencies

All Collections Must Be Indexed
–The range index must be defined on all items in the input sequence
–If you search collections A and B but only A is range indexed, the query will not use the index
[Diagram: an XQuery searching Collection A (with range index) and Collection B (no range index)]

Fulltext Fallback
–If the collections do not all have exactly the same type of range index, the search automatically reverts to the default full-text indexes (slow)
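
The rule can be expressed as a small planning check. This is a hypothetical model, not eXist's optimizer. Here `index_config` maps each collection to the range-index types it defines, and "fallback" stands for whatever slower strategy the engine uses instead.

```python
def plan_comparison(collections: list, index_config: dict, qname: str) -> str:
    """Hypothetical planner: use the range index only if every searched collection
    defines one on `qname` with the same type; otherwise fall back."""
    types = {index_config.get(c, {}).get(qname) for c in collections}
    if len(types) == 1 and None not in types:
        return "range-index " + types.pop()
    return "fallback"


config = {
    "/db/a": {"price": "xs:double"},  # collection A: range index on price
    "/db/b": {},                      # collection B: no range index
}
print(plan_comparison(["/db/a"], config, "price"))           # range-index xs:double
print(plan_comparison(["/db/a", "/db/b"], config, "price"))  # fallback
```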

Data Types Must Match
–The data type of the index (the first argument) must match the data type of the value being tested (the second argument)
Wrong:
–//item[price = '1000.0']
Right:
–//item[price < xs:double($max-price)]
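
A short Python analogy shows why the types matter, with `str` and `float` standing in for xs:string and xs:double. In XPath, comparing an untyped node such as price with a string literal is a string comparison. Its result, and its ordering, can differ from the numeric comparison that a double-typed index supports. The stored prices below are made up for illustration.

```python
# hypothetical stored <price> text values
prices = ["9.99", "100.0", "1000", "25.5"]

# xs:double semantics: the order a double-typed range index uses
numeric_order = sorted(prices, key=float)
# xs:string semantics: what a comparison against a string literal uses
string_order = sorted(prices)

print(numeric_order)  # ['9.99', '25.5', '100.0', '1000']
print(string_order)   # ['100.0', '1000', '25.5', '9.99']

# price = '1000.0' compared as strings misses the stored "1000"; as doubles it matches
print("1000" == "1000.0")                # False
print(float("1000") == float("1000.0"))  # True
```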

Context Dependencies
–The right-hand argument must not depend on the current context item
Wrong:
–//item[price = self]
Right:
–//item[price < xs:double($max-price)]

Fulltext Index
–Used to query for a sequence of separate "words" or tokens within a longer stream of text
–While the index is built, the text is parsed into single tokens, which are then stored in the index
–Historically, eXist has created a default full-text index on all text nodes and attribute values. This will likely change, because the index is undergoing a major redesign. As the index becomes more configurable, the current index-everything default may be dropped
–As with the other index types, the full-text index can be configured in the collection configuration, and the eXist developers aim to keep the new index backwards compatible with that configuration
–We therefore recommend creating a collection configuration file, disabling the default index-all behaviour, and defining explicit full-text indexes on your documents
–The full-text index is only used in combination with eXist's full-text search extensions. In particular, the following eXist-specific operators and functions use a full-text index:

Fulltext Operators and Functions
Operators:
–&=
–|=
Main functions:
–text:match-all()
–text:match-any()
–near()
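
An inverted index with all-terms and any-term lookups can be sketched in Python. The mapping to &= (every term must occur) and |= (at least one term must occur) is an analogy. The tokenizer (lower-cased runs of word characters) and all names are assumptions, and proximity search as in near() is not modeled.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    # assumption: a token is a lower-cased run of word characters
    return re.findall(r"\w+", text.lower())


class FullTextIndex:
    """Toy inverted index: token -> set of node ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, node_id: int, text: str) -> None:
        for token in tokenize(text):
            self.postings[token].add(node_id)

    def match_all(self, query: str) -> set:
        # like &= : nodes that contain every term of the query
        sets = [self.postings.get(t, set()) for t in tokenize(query)]
        return set.intersection(*sets) if sets else set()

    def match_any(self, query: str) -> set:
        # like |= : nodes that contain at least one term of the query
        return set().union(*(self.postings.get(t, set()) for t in tokenize(query)))


idx = FullTextIndex()
idx.add(1, "eXist is a native XML database.")
idx.add(2, "A relational database stores tables.")
idx.add(3, "XML documents use markup.")
print(idx.match_all("xml database"))  # {1}
print(idx.match_any("xml database"))  # {1, 2, 3}
```

A node that was never added to the index is invisible to both lookups. This matches the behaviour on the next slide: unindexed text returns no matches rather than falling back to a scan.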

Disabling Indexes
–If you have disabled full-text indexing for certain elements, these operators and functions are effectively disabled for those elements as well. They return no matches, even for queries that would have matched had full-text indexing been enabled
–This is in direct contrast to range indexing, which falls back to a full search of the documents when no range index applies

Geospatial Indexing (Beta)
–A working proof-of-concept index that listens for spatial geometries described in the Geography Markup Language (GML)

Sample Geospatial Data
[Figure: sample geospatial data]

Sample Geospatial Queries
–What is the distance from point X to point Y?
–What items are within X miles of this point?
–What items are inside county Y?
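
The first two questions can be answered by brute force with the haversine great-circle formula. That per-item work is exactly what a spatial index exists to avoid. This Python sketch treats the Earth as a sphere with a mean radius of 3958.8 miles and uses approximate coordinates chosen for illustration. It does not use GML or any eXist spatial function, and the containment query ("inside county Y") is not covered.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (spherical approximation)


def distance_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_miles(places: dict, lat: float, lon: float, radius: float) -> list:
    """Names of places (name -> (lat, lon)) within `radius` miles, by brute force."""
    return sorted(name for name, (plat, plon) in places.items()
                  if distance_miles(lat, lon, plat, plon) <= radius)


# approximate coordinates, for illustration only
places = {
    "Minneapolis": (44.98, -93.27),
    "St. Paul": (44.95, -93.09),
    "Duluth": (46.79, -92.10),
}
print(within_miles(places, 44.98, -93.27, 20))  # ['Minneapolis', 'St. Paul']
```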

Custom Indexing
–eXist version 1.2 and later feature a modularized indexing architecture
–Arbitrary indexes can be plugged into an indexing pipeline
–Requires Java development skills
See –

For the eXist Database Administrator
–For each collection you want to configure, such as /db/foo, create a file named collection.xconf and store it as /db/system/config/db/foo/collection.xconf
Inheritance:
–Subcollections that have no collection.xconf file of their own are governed by the configuration of the closest ancestor collection that does have one

Inheritance Example
–Collections: /db, /db/foo, /db/foo/bar
–Configuration file: /db/system/config/db/foo/collection.xconf
–If a collection (here /db/foo/bar) has no collection.xconf of its own, it defaults to its parent's collection configuration
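
The lookup rule can be sketched in Python. The path mapping (/db/foo to /db/system/config/db/foo/collection.xconf) follows the administrator slide. The function names and the set of existing configuration files are hypothetical.

```python
import posixpath
from typing import Optional


def config_path(collection: str) -> str:
    # /db/foo -> /db/system/config/db/foo/collection.xconf
    return "/db/system/config" + collection.rstrip("/") + "/collection.xconf"


def effective_config(collection: str, existing: set) -> Optional[str]:
    """Return the collection.xconf governing `collection`: its own file if it
    has one, otherwise that of the closest ancestor collection that does."""
    current = collection.rstrip("/")
    while current.startswith("/db"):
        path = config_path(current)
        if path in existing:
            return path
        current = posixpath.dirname(current)  # step up to the parent collection
    return None  # no configuration found anywhere up to /db


existing = {"/db/system/config/db/foo/collection.xconf"}
print(effective_config("/db/foo/bar", existing))
# /db/system/config/db/foo/collection.xconf
```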

Thank You!
Please contact me for more information:
–Native XML Databases
–Metadata Management
–Metadata Registries
–Service Oriented Architectures
–Business Intelligence and Data Warehouse
–Semantic Web
Dan McCreary, President
Dan McCreary & Associates
Metadata Strategy Development
(952)

Index Creation and Updates
–The eXist index system automatically maintains and updates the indexes defined by the user
–You therefore do not need to update an index when you update a database document or collection. eXist even updates indexes after partial document updates made with XUpdate or XQuery Update expressions
–The only exception to this automatic maintenance occurs when you add a new index definition to an existing database collection. The new definition applies only to data added afterwards, so the collection must be reindexed manually to cover documents that were already stored
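
The update behaviour can be modeled with a toy collection in Python. Every store or update maintains the indexes immediately. An index definition added later, however, covers only documents stored after it, until `reindex()` is called. The class and its methods are hypothetical and do not reflect eXist's storage.

```python
class Collection:
    """Toy collection whose value indexes are maintained on every store."""

    def __init__(self):
        self.docs = {}     # document name -> {field: value}
        self.indexes = {}  # field -> {value: set of document names}

    def add_index(self, field: str) -> None:
        # new definition: documents already stored are NOT indexed here
        self.indexes[field] = {}

    def remove(self, name: str) -> None:
        old = self.docs.pop(name, None)
        if old is None:
            return
        for field, postings in self.indexes.items():
            if field in old:
                postings.get(old[field], set()).discard(name)

    def store(self, name: str, doc: dict) -> None:
        self.remove(name)  # an update first drops the old index entries
        self.docs[name] = doc
        for field, postings in self.indexes.items():
            if field in doc:
                postings.setdefault(doc[field], set()).add(name)

    def reindex(self) -> None:
        # rebuild every index from scratch over all stored documents
        docs = dict(self.docs)
        self.docs.clear()
        for field in self.indexes:
            self.indexes[field] = {}
        for name, doc in docs.items():
            self.store(name, doc)

    def lookup(self, field: str, value: str) -> set:
        return set(self.indexes.get(field, {}).get(value, set()))


c = Collection()
c.store("a.xml", {"price": "10"})  # stored before the index definition exists
c.add_index("price")
print(c.lookup("price", "10"))     # set(): a.xml is not covered yet
c.reindex()
print(c.lookup("price", "10"))     # {'a.xml'}
```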

Sample Collection Index