Unlocking Hidden Gems in Oracle Text


Presenter: Bill Coulam (www.dbartisans.com)

Agenda
- What is Oracle Text
- Installation
- CONTEXT Indexes & Features
- CONTAINS Queries
- Multi-column index
- Multi-table, multi-column index

Oracle Text
- Built into Oracle DB (PE, SE, SE One, EE)
- Free to use with existing DB license
- Formerly ConText Cartridge (8) and interMedia Text (8i); named “Oracle Text” since 9i
- Technology built into Oracle that extends indexing capabilities to text, XML, CLOB, documents stored as BLOB, BFILE and web pages
- Build document classification and cataloging applications
- Special XML and HTML features as well

Oracle Text
Has searching text columns ever been hindered by the limitations of =, LIKE, SUBSTR, INSTR?
- User misspellings
- Case problems
- International characters
- Searching many columns at once for term(s)
- Querying large LOB columns is inefficient
Oracle CONTEXT indexes solve these.

Misspelled Search Terms/Data
Traditional:
- Maybe SOUNDEX
- Generally “No Data for You!”
Oracle Text:
- Partial term with basic CONTAINS, substring index or wildcarding
- EQUIV, Fuzzy, Stemming, Soundex, Thesaurus
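The operators named above could be exercised roughly as follows. This is a sketch that assumes the CONTEXT index on places(place_nm) that this talk builds as its first example; the search terms are invented for illustration, and what each query matches depends on the data and on the index's wordlist settings.

```sql
-- Fuzzy: matches indexed terms spelled similarly to a (possibly misspelled) input
SELECT place_nm FROM places
 WHERE CONTAINS(place_nm, 'fuzzy(missisipi)') > 0;

-- Stemming ($): matches words sharing the same linguistic root
SELECT place_nm FROM places
 WHERE CONTAINS(place_nm, '$spring') > 0;

-- Soundex (!): matches words that sound like the input
SELECT place_nm FROM places
 WHERE CONTAINS(place_nm, '!smythe') > 0;

-- Wildcard (%): partial-term match
SELECT place_nm FROM places
 WHERE CONTAINS(place_nm, 'spring%') > 0;

-- EQUIV (=): treat two terms as interchangeable
SELECT place_nm FROM places
 WHERE CONTAINS(place_nm, 'creek=brook') > 0;
```

Thesaurus operators such as SYN are left out here because they need a thesaurus to be loaded first.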

Case Problems
Traditional:
- Store original source as all one case and convert all searches to a single case
  - Data looks nasty when displayed or printed
- Store single-case copy columns in the table or an MV
  - Extra space required, trigger, moving parts, redundant data
Oracle Text:
- All tokens indexed in UPPER case by default
- Searches are case-insensitive by default

International Characters/Diacritics
Traditional:
- Equivalence (=) by NLSSORT(UPPER(column),'NLS_SORT=BINARY')
- LIKE by removing diacritics from both sides of the comparison using frontend libraries and/or Oracle TRANSLATE:
    TRANSLATE(column,
      'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçşèéêëìíîïðñóôõöøùúûüýÿ',
      'AAAAAACEEEEIIIIDNOOOOOOUUUUYaaaaaacseeeeiiiionooooouuuuyy')
Oracle Text:
- base_letter attribute of context index
- Host of features for language nuances
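A minimal sketch of the base_letter approach, assuming the places(place_nm) column used in this talk's index examples. The preference name places_lexer is hypothetical, and the CREATE INDEX here would be run in place of the default one, since a column can carry only one CONTEXT index.

```sql
-- Create a lexer preference that folds accented characters to their base letter
BEGIN
  ctx_ddl.create_preference('places_lexer', 'BASIC_LEXER');
  ctx_ddl.set_attribute('places_lexer', 'base_letter', 'YES');
END;
/

-- Build the index with that lexer instead of the default one
CREATE INDEX place_nm_cidx ON places(place_nm)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER places_lexer');

-- Both the indexed tokens and the query term are folded,
-- so an unaccented search term can find accented data
SELECT place_nm FROM places
 WHERE CONTAINS(place_nm, 'Zurich') > 0;
```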

Proximity, Relevance, Theme, Exclusion Searches
Traditional:
- Only exclusion, by use of AND NOT LIKE
Oracle Text:
- ACCUM, SCORE, ABOUT, boolean operators, MINUS, NOT, NEAR, and THRESHOLD
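The operators listed above might look like this in practice. The sketch assumes the CONTEXT index on places(place_notes) from this talk's first-index example; the search terms are made up, and ABOUT results in particular depend on whether theme indexing and a knowledge base are available for the index language.

```sql
-- NEAR: both terms within 5 tokens of each other
SELECT place_nm, SCORE(1) AS relevance FROM places
 WHERE CONTAINS(place_notes, 'NEAR((hot, spring), 5)', 1) > 0;

-- ACCUM: rows matching more of the terms score higher
SELECT place_nm, SCORE(1) AS relevance FROM places
 WHERE CONTAINS(place_notes, 'lake ACCUM river ACCUM falls', 1) > 0
 ORDER BY SCORE(1) DESC;

-- NOT: exclude rows that contain the second term
SELECT place_nm FROM places
 WHERE CONTAINS(place_notes, 'lake NOT salt') > 0;

-- MINUS: keep the rows but lower their score when the second term is present
SELECT place_nm, SCORE(1) AS relevance FROM places
 WHERE CONTAINS(place_notes, 'lake MINUS salt', 1) > 0;

-- Threshold (>): only rows whose expression score exceeds 10
SELECT place_nm FROM places
 WHERE CONTAINS(place_notes, '(lake ACCUM river) > 10') > 0;

-- ABOUT: theme (concept) search rather than exact-word search
SELECT place_nm FROM places
 WHERE CONTAINS(place_notes, 'ABOUT(fishing)') > 0;
```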

Search multiple columns or tables
Traditional:
- Joins with OR LIKE
- View using UNION [ALL] or concatenation
- Materialized view
- Concatenated copy column
Oracle Text:
- MULTI_COLUMN_DATASTORE
- USER_DATASTORE

Check Installation
- Installed by default if DB created with DBCA.
- CTXSYS schema
  - 9iR2: 260+ objects
  - 10gR2: 340+ objects
  - Security issues fixed in 10g
  - Account usually in locked state
- Check version
    SELECT * FROM ctxsys.ctx_version;

Installation
- Drop CTXSYS if it exists.
- As SYS, run
    $ORACLE_HOME/ctx/admin/catctx.sql pwd SYSAUX TEMP NOLOCK
- As CTXSYS, run
    $ORACLE_HOME/ctx/admin/defaults/drdefus.sql
- Now check your installation
  - 260 objects in 9iR2
  - 345 objects in 10gR2
  - 340 objects in 11gR1
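The installation check could be scripted along these lines. This is a sketch to be run by a user who can read the DBA_ views; the object counts it returns are what you would compare against the per-release figures quoted above.

```sql
-- Version of the installed Oracle Text dictionary and code
SELECT * FROM ctxsys.ctx_version;

-- Is the CTXSYS account present, and is it locked?
SELECT username, account_status
  FROM dba_users
 WHERE username = 'CTXSYS';

-- How many objects does CTXSYS own, by type?
SELECT object_type, COUNT(*) AS object_count
  FROM dba_objects
 WHERE owner = 'CTXSYS'
 GROUP BY object_type
 ORDER BY object_type;

-- Any invalid CTXSYS objects suggest an incomplete or broken install
SELECT object_name, object_type
  FROM dba_objects
 WHERE owner = 'CTXSYS'
   AND status <> 'VALID';
```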

First Context Index
Prerequisite: a source of text to index
- Character column (VARCHAR2, CLOB, etc)
- Binary column that contains text (BLOB, BFILE)
- XMLType
- URIType
- Column that is path to file
- Column that is path to web page

First Context Index
- Set up/verify the account that contains the data to index
  - Must have EXECUTE on CTXSYS.CTX_DDL
  - Can have CTXAPP role (necessary before 10g, optional after)
  - Might want private synonyms to CTXSYS packages, especially CTX_DDL.
- Set up/verify the data to be indexed
- Create the index
    CREATE INDEX place_nm_cidx ON places(place_nm) INDEXTYPE IS CTXSYS.CONTEXT;
    CREATE INDEX place_notes_cidx ON places(place_notes) INDEXTYPE IS CTXSYS.CONTEXT;
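The account and data setup that precedes those CREATE INDEX statements might look like the following. This is only a sketch: the schema name app_owner, the place_id column, the column sizes and the sample rows are all assumptions, since the talk shows only the table and column names being indexed.

```sql
-- Run as a privileged user; app_owner is a hypothetical schema name
GRANT EXECUTE ON ctxsys.ctx_ddl TO app_owner;
GRANT ctxapp TO app_owner;  -- needed before 10g, optional afterwards

-- Run as app_owner: private synonym so CTX_DDL can be called without a schema prefix
CREATE SYNONYM ctx_ddl FOR ctxsys.ctx_ddl;

-- Hypothetical definition of the table being indexed
CREATE TABLE places (
  place_id    NUMBER          PRIMARY KEY,
  place_nm    VARCHAR2(200)   NOT NULL,
  place_notes VARCHAR2(4000)
);

-- Invented sample rows so the indexes have something to tokenize
INSERT INTO places VALUES (1, 'Bear Lake', 'Large freshwater lake with a hot spring nearby');
INSERT INTO places VALUES (2, 'Great Salt Lake', 'Salt lake with no outlet river');
COMMIT;
```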

What Happened?
What is going on when using the defaults?
- Detects the column type and filters binary column types.
- Decides text language is same as DB lang
- Uses the default stoplist
- Enables fuzzy and stemming queries
- Feeds data from datastore to filter, sectioner, lexer, then indexer.
Datastore -> Filter -> Sectioner -> Lexer -> Indexer -> Context Index!

What Happened?
- Added some metadata to tables in CTXSYS: DR$INDEX, DR$INDEX_VALUE and DR$INDEX_OBJECT
- Created some DR$indexname$ tables in index-owning account:
  - DR$PLACE_NM_CIDX$I (tokens)
  - DR$PLACE_NM_CIDX$K (keymap)
  - DR$PLACE_NM_CIDX$N (negative list)
  - DR$PLACE_NM_CIDX$R (rowid)
  - DR$indexname$P (substring indexes)
  - DR$indexname$S (new to 11g, filtered/ordered)
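One way to see what the index creation left behind is to query the Oracle Text dictionary views and the token table directly. The sketch below assumes it is run as the index owner and that the index is named place_nm_cidx as in the example above.

```sql
-- Text indexes owned by the current user, with their status
SELECT idx_name, idx_table, idx_text_name, idx_status
  FROM ctx_user_indexes;

-- The preferences and attribute values the index actually ended up with,
-- including the defaults that were applied silently
SELECT ixv_class, ixv_object, ixv_attribute, ixv_value
  FROM ctx_user_index_values
 WHERE ixv_index_name = 'PLACE_NM_CIDX'
 ORDER BY ixv_class, ixv_attribute;

-- The DR$ tables created in the index owner's schema
SELECT table_name
  FROM user_tables
 WHERE table_name LIKE 'DR$PLACE_NM_CIDX$%';

-- The tokens the lexer produced, most frequent first
SELECT token_text, token_count
  FROM dr$place_nm_cidx$i
 ORDER BY token_count DESC;
```

Looking at the $I table is a quick way to confirm how case, stopwords and diacritics were handled during lexing.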

CONTAINS Queries
Having a Context index opens up the world of CONTAINS queries for you.
- CONTAINS(indexed_column, query_expr [, label])
  - Returns a numeric relevance between 0 and 100.
  - Used in the WHERE clause with > 0, so a row that scores 0 is not included in the result set.
- SCORE(label)
  - Optional. Used in the SELECT list, returns the relevance as a virtual column. The label is a number that ties SCORE to its CONTAINS.
    SELECT col1, col2, SCORE(1) FROM table WHERE CONTAINS(col1, expr, 1) > 0;
CONTAINS sports all sorts of query options.
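As a concrete illustration of the syntax above, here is a sketch against the places table from the index example, with an invented search term. The LIKE query is included only for contrast with the traditional approach.

```sql
-- Traditional approach: case handling by hand, no relevance ranking
SELECT place_nm
  FROM places
 WHERE UPPER(place_nm) LIKE '%LAKE%';

-- Oracle Text: case-insensitive by default, ranked by relevance.
-- The label 1 links SCORE(1) to the CONTAINS call that carries the same label.
SELECT place_nm, SCORE(1) AS relevance
  FROM places
 WHERE CONTAINS(place_nm, 'lake', 1) > 0
 ORDER BY SCORE(1) DESC;

-- Two CONTAINS calls in one query need two distinct labels
SELECT place_nm, SCORE(1) AS name_score, SCORE(2) AS notes_score
  FROM places
 WHERE CONTAINS(place_nm, 'lake', 1) > 0
    OR CONTAINS(place_notes, 'lake', 2) > 0;
```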

Customizing the Context Index
    CREATE INDEX... name ON table(column(s))
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS ('
      [DATASTORE datastore_pref]
      [FILTER filter_pref]
      [LEXER lexer_pref]
      [FORMAT COLUMN format_column_name]
      [SECTION GROUP section_group]
      [STOPLIST stoplist]
      [WORDLIST wordlist_pref]
      [STORAGE storage_pref]
      [CHARSET COLUMN charset_column_name]
      [LANGUAGE COLUMN language_column_name]
      [MEMORY memsize]
      [POPULATE | NOPOPULATE]                          -- 11g
      [SYNC (MANUAL | EVERY "interval" | ON COMMIT)]   -- 10g
      [TRANSACTIONAL]')                                -- 10g
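To make the PARAMETERS string less abstract, here is a sketch that builds two preferences and passes them to CREATE INDEX. The names places_stoplist and places_wordlist, the stopwords, and the prefix lengths are all hypothetical choices; the DROP INDEX is there because a column can carry only one CONTEXT index.

```sql
BEGIN
  -- Custom stoplist: words that should not be indexed
  ctx_ddl.create_stoplist('places_stoplist', 'BASIC_STOPLIST');
  ctx_ddl.add_stopword('places_stoplist', 'the');
  ctx_ddl.add_stopword('places_stoplist', 'of');

  -- Wordlist with prefix indexing to speed up right-truncated wildcard queries
  ctx_ddl.create_preference('places_wordlist', 'BASIC_WORDLIST');
  ctx_ddl.set_attribute('places_wordlist', 'PREFIX_INDEX', 'TRUE');
  ctx_ddl.set_attribute('places_wordlist', 'PREFIX_MIN_LENGTH', '3');
  ctx_ddl.set_attribute('places_wordlist', 'PREFIX_MAX_LENGTH', '6');
END;
/

-- Replace the default index with a customized one
DROP INDEX place_notes_cidx;

CREATE INDEX place_notes_cidx ON places(place_notes)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('STOPLIST places_stoplist WORDLIST places_wordlist');
```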

Keeping a Text Index Current
- <= 9i: Write jobs to sync
  - CTX_DDL.sync_index
  - CTX_DDL.optimize_index
- >= 10g: Allow Oracle to write the job
  - SYNC (on commit or by interval)
- >= 10g: In-memory cache of changes
  - TRANSACTIONAL
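Both styles of index maintenance can be sketched as follows, assuming the index names from the earlier example. The one-hour interval is an arbitrary choice, and interval-based SYNC relies on the scheduler, so the index owner is assumed to have the privilege to create jobs.

```sql
-- Rows changed since the last sync wait in the pending queue
SELECT pnd_index_name, COUNT(*) AS pending_rows
  FROM ctx_user_pending
 GROUP BY pnd_index_name;

-- Manual maintenance (the only option up to 9i), typically wrapped in a scheduled job
BEGIN
  ctx_ddl.sync_index('place_nm_cidx');              -- index the pending rows
  ctx_ddl.optimize_index('place_nm_cidx', 'FULL');  -- defragment and remove deleted entries
END;
/

-- 10g and later: let Oracle schedule the sync and make uncommitted-sync
-- changes visible to queries (run instead of the default CREATE INDEX)
DROP INDEX place_notes_cidx;

CREATE INDEX place_notes_cidx ON places(place_notes)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('SYNC (EVERY "SYSDATE+1/24") TRANSACTIONAL');
```

Even with automatic sync, a periodic optimize_index run is still worth scheduling, because frequent small syncs fragment the token table.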

Multi-Column Text Index
Pre-requisites:
- Data
- Dummy column
- Trigger
- Preferences
The virtual document is composed of the contents of the columns concatenated in the listing order, with column name tags automatically added.
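The prerequisites above fit together roughly as shown below. This is a sketch under assumptions: the dummy column search_flag, the preference places_mcds, and the index and trigger names are hypothetical, and on older releases creating a MULTI_COLUMN_DATASTORE preference may be restricted to CTXSYS. The trigger flips the dummy column's value so that a change to either real column marks the row for reindexing.

```sql
-- Dummy column: the index is declared on it, but its content is not what gets indexed
ALTER TABLE places ADD (search_flag VARCHAR2(1) DEFAULT 'Y');

-- Preference listing the columns that make up the virtual document
BEGIN
  ctx_ddl.create_preference('places_mcds', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('places_mcds', 'columns', 'place_nm, place_notes');
END;
/

CREATE INDEX places_multi_cidx ON places(search_flag)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE places_mcds');

-- Touch the dummy column whenever one of the real columns changes
CREATE OR REPLACE TRIGGER places_search_trg
  BEFORE UPDATE OF place_nm, place_notes ON places
  FOR EACH ROW
BEGIN
  :NEW.search_flag := CASE WHEN :OLD.search_flag = 'Y' THEN 'N' ELSE 'Y' END;
END;
/

-- One CONTAINS now searches both columns; note it names the dummy column
SELECT place_nm
  FROM places
 WHERE CONTAINS(search_flag, 'lake') > 0;
```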

Multi-Column, Multi-Table Index
Pre-requisites:
- More tables, more data
- Preferences and optional Section Group
- Procedure to concatenate text
  - IN ROWID
  - OUT [CLOB, BLOB, CLOB_LOC, BLOB_LOC, or VARCHAR2]
- Dummy column
- Triggers
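A USER_DATASTORE setup along those lines might look like the sketch below. Everything beyond the places table is an assumption made for illustration: the child table place_aliases, the place_id key, the dummy column all_text_flag, and every object name. It also assumes place_notes is a VARCHAR2 column, and on older releases the procedure may need to be owned by CTXSYS.

```sql
-- Hypothetical child table holding extra searchable text per place
CREATE TABLE place_aliases (
  place_id NUMBER        NOT NULL REFERENCES places(place_id),
  alias_nm VARCHAR2(200) NOT NULL
);

-- Dummy column that the index is declared on
ALTER TABLE places ADD (all_text_flag VARCHAR2(1) DEFAULT 'Y');

-- Called once per row being indexed; writes the virtual document into p_doc
CREATE OR REPLACE PROCEDURE places_doc_proc (
  p_rowid IN     ROWID,
  p_doc   IN OUT NOCOPY CLOB
) IS
  -- Append one piece of text plus a separating space, skipping NULLs
  PROCEDURE append_text (p_text IN VARCHAR2) IS
  BEGIN
    IF p_text IS NOT NULL THEN
      DBMS_LOB.WRITEAPPEND(p_doc, LENGTH(p_text) + 1, p_text || ' ');
    END IF;
  END append_text;
BEGIN
  FOR p IN (SELECT place_id, place_nm, place_notes
              FROM places
             WHERE ROWID = p_rowid) LOOP
    append_text(p.place_nm);
    append_text(p.place_notes);
    FOR a IN (SELECT alias_nm FROM place_aliases
               WHERE place_id = p.place_id) LOOP
      append_text(a.alias_nm);
    END LOOP;
  END LOOP;
END places_doc_proc;
/

BEGIN
  ctx_ddl.create_preference('places_uds', 'USER_DATASTORE');
  ctx_ddl.set_attribute('places_uds', 'procedure', 'places_doc_proc');
  ctx_ddl.set_attribute('places_uds', 'output_type', 'CLOB');
END;
/

CREATE INDEX places_all_cidx ON places(all_text_flag)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE places_uds');

-- Changes to the child table touch the parent's dummy column to force a reindex
CREATE OR REPLACE TRIGGER place_aliases_search_trg
  AFTER INSERT OR UPDATE OR DELETE ON place_aliases
  FOR EACH ROW
BEGIN
  UPDATE places
     SET all_text_flag = CASE WHEN all_text_flag = 'Y' THEN 'N' ELSE 'Y' END
   WHERE place_id = NVL(:NEW.place_id, :OLD.place_id);
END;
/
```

A similar trigger on places itself, covering updates to place_nm and place_notes, would complete the picture.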

Questions?

Thank You for Attending! Please fill out your evaluation form. Contact: bcoulam@yahoo.com