'Google-ized' search in your business data
Search within your Oracle table data like searching the web with Google
Author: Krasen Paskalev, Certified Oracle 8i/9i DBA, Senior Oracle Consultant
Semantec GmbH, Benzstr. 32, D Herrenberg, Germany
2 Agenda
- Motivation
  - Applications contain valuable data
  - How difficult it is to search for it
  - How easy it is in Google
- What makes a good search engine
- Semantec's Direct Info – demo
- Direct Info concepts and architectural elements
3 Applications contain valuable data
4 Classical approach - Instring search with LIKE
- Too complex to use
- Too slow – often results in a full table scan
- No advanced search expressions
- No text fragments
- No word matching: a search for CAT also finds APPLICATION and VACATION
- Not flexible – expensive to add or remove searchable fields
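The LIKE problems listed above can be seen in a minimal sketch. The table customers and its columns are hypothetical and not taken from the talk.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE customers (
  id    NUMBER PRIMARY KEY,
  name  VARCHAR2(100),
  notes VARCHAR2(4000)
);

-- Substring search with LIKE.
-- The leading wildcard rules out an ordinary index range scan,
-- so this typically ends up as a full table scan.
-- It also matches rows containing APPLICATION or VACATION,
-- because LIKE compares characters, not words.
SELECT id, name
FROM   customers
WHERE  UPPER(name)  LIKE '%CAT%'
   OR  UPPER(notes) LIKE '%CAT%';  -- one predicate per searchable column
```

Every additional searchable field means another predicate in every query, which is the inflexibility the slide refers to.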
5 How easy it is in Google
- Results presented in pages
- Link to open the document
- Highlighted text fragments
- Full document location (document context)
6 How to search here?
8 What makes a good search engine?
- Fast search
- Order by relevance
- Options to narrow and judge the hits
- Advanced search expressions
- More information about the object hit
- Text fragments with highlighted keywords
- Keyword context – where the keyword is found
- Object context – extended object information
- Search by object type
- Search within a specific object attribute
- Direct access to the object found
- Accessible to a wide user group
10 Direct Info
- Framework developed by Semantec
- Builds on the Oracle Text platform
- Built with pure PL/SQL
- All code is stored in Oracle
11 Data Model
12 Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info – demo
- Direct Info concepts and architectural elements
  - What is Oracle Text
  - Indexing data
  - Search results presentation
13 What is Oracle Text?
- Formerly known as ConText (8.0) and interMedia Text (8i)
- Uses standard SQL to index, search and analyze text and documents stored in the Oracle database, in files and on the Web
- Allows advanced searching, including keyword search, pattern matching, boolean expressions, etc.
- Supports multiple languages
14 Oracle Text Index Usage
Oracle Text index creation:
    CREATE INDEX DOC_INDEX_01 ON DOC_TABLE_01(location)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('DATASTORE USER_DATASTORE_01');
Oracle Text index search:
    SELECT doc_name
    FROM DOC_TABLE_01
    WHERE CONTAINS(location, 'mouse AND wireless', 1) > 0
    ORDER BY score(1) DESC
15 Boolean expressions, Proximity search
- AND (&) – mouse AND wireless
- OR (|) – mouse OR wireless
- NOT (~) – mouse NOT wireless
- ACCUMulate (,) – mouse, monitor, cd
- NEAR – NEAR((mouse,wireless),5)
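As a sketch of how these operators are used, the queries below reuse the table and column from the index usage slide (DOC_TABLE_01, location) and assume a CONTEXT index exists on that column.

```sql
-- AND: both words must occur in the document.
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'mouse AND wireless', 1) > 0
ORDER  BY SCORE(1) DESC;

-- NOT: documents with "mouse" but without "wireless".
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'mouse NOT wireless', 1) > 0;

-- ACCUM: any of the words qualifies; documents matching more
-- of the terms are ranked higher.
SELECT doc_name, SCORE(1) AS relevance
FROM   doc_table_01
WHERE  CONTAINS(location, 'mouse, monitor, cd', 1) > 0
ORDER  BY SCORE(1) DESC;

-- NEAR: both words must occur within a span of 5 words.
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'NEAR((mouse, wireless), 5)', 1) > 0;
```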
16 Expansion operators
Expand the list of words searched for
- Wildcard (%, _) – only a portion of the word is given
    _ing -> sing king ping
    monito% -> monitor monitoring
- Soundex (!) – words that sound similar
    !sing -> sing sink
- Fuzzy – words that are spelled similarly
    fuzzy(sing,70,10,weight) -> sing king sink
- Stem ($) – words having the same linguistic root
    $sing -> sing sang sung
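The expansion operators are written inside the CONTAINS query string. This sketch again assumes the DOC_TABLE_01 table and location column from the index usage slide. Which words each expansion actually returns depends on the indexed data and on the index's language settings.

```sql
-- Wildcard: any word starting with "monito".
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'monito%', 1) > 0;

-- Soundex: words that sound like "sing".
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, '!sing', 1) > 0;

-- Fuzzy: similarity score of at least 70, at most 10 expansions,
-- results weighted by similarity.
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'fuzzy(sing, 70, 10, weight)', 1) > 0;

-- Stem: words sharing the linguistic root of "sing".
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, '$sing', 1) > 0;
```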
17 Thesauri examples
- Theme search – ABOUT(economics)
- Broader term – BT(cat) -> animal
- Narrower term – NT(animal) -> cat dog
- Associative relation – RT(cat) -> kitten
- Translated term – TR(cat) -> cat gato
- Synonym – SYN(cat) -> cat tiger
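A sketch of thesaurus operators inside a query, using the DOC_TABLE_01 table from the index usage slide. It assumes a thesaurus defining these terms has already been loaded as the default thesaurus; without one, the BT/NT/SYN operators have nothing to expand.

```sql
-- Narrower terms: expands "animal" using the thesaurus
-- (per the slide's example, to terms such as cat and dog).
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'NT(animal)', 1) > 0;

-- Synonyms of "cat" as defined in the thesaurus.
SELECT doc_name
FROM   doc_table_01
WHERE  CONTAINS(location, 'SYN(cat)', 1) > 0;

-- Theme search: documents about the concept rather than
-- documents containing the literal word.
SELECT doc_name, SCORE(1) AS relevance
FROM   doc_table_01
WHERE  CONTAINS(location, 'ABOUT(economics)', 1) > 0
ORDER  BY SCORE(1) DESC;
```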
18 Datastore: Direct and Multi-column
- Direct: table documents (doc_name, author, text)
- Multi-column: table documents (doc_name, author, text)
- Allowed datatypes: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, XMLType
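A multi-column datastore is declared as an Oracle Text preference and then named when the index is created. The sketch below uses the documents table from the slide; the preference and index names are made up for illustration.

```sql
BEGIN
  -- Preference name docs_multi_ds is hypothetical.
  ctx_ddl.create_preference('docs_multi_ds', 'MULTI_COLUMN_DATASTORE');
  -- The listed columns are concatenated into one document per row.
  ctx_ddl.set_attribute('docs_multi_ds', 'COLUMNS', 'doc_name, author, text');
END;
/

-- The index is created on one column, but the datastore
-- supplies the content of all three columns.
CREATE INDEX documents_multi_idx ON documents(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE docs_multi_ds');
```

With the direct datastore no preference is needed: the index simply covers the single column it is created on.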
19 Datastore: Detail and Nested
- Detail: master table documents (doc_name, author) with detail table doc_details (doc_name, seq_no, text)
- Nested: table documents (doc_name, author, doc_nst), where doc_nst is a nested table (seq_no, text)
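A sketch of a detail datastore for the master/detail layout above. Table and column names follow the slide; the preference and index names are hypothetical, and doc_name is assumed to be the primary key of documents.

```sql
BEGIN
  -- Preference name docs_detail_ds is hypothetical.
  ctx_ddl.create_preference('docs_detail_ds', 'DETAIL_DATASTORE');
  ctx_ddl.set_attribute('docs_detail_ds', 'BINARY',        'FALSE');
  ctx_ddl.set_attribute('docs_detail_ds', 'DETAIL_TABLE',  'doc_details');
  ctx_ddl.set_attribute('docs_detail_ds', 'DETAIL_KEY',    'doc_name'); -- join column
  ctx_ddl.set_attribute('docs_detail_ds', 'DETAIL_LINENO', 'seq_no');   -- row order
  ctx_ddl.set_attribute('docs_detail_ds', 'DETAIL_TEXT',   'text');     -- content
END;
/

-- The index is created on a column of the master table; the
-- indexed text itself is assembled from the doc_details rows.
CREATE INDEX documents_detail_idx ON documents(author)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE docs_detail_ds');
```

Changes to detail rows alone do not mark the document for reindexing; the indexed master column has to be updated for the change to be picked up.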
20 Indexing data - Data Model
21 Indexing Data: Oracle Text Features
- User datastore – a PL/SQL procedure delivers the contents to be indexed
- AUTO_SECTION_GROUP – instructs Oracle to create a separate section for each XML tag and index only its value
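The two features can be combined roughly as follows. This is a sketch under assumptions, not the Direct Info implementation: the persons table, its columns, the XML tag names and all object names are hypothetical. Depending on the Oracle version, the datastore procedure may have to be owned by CTXSYS.

```sql
-- Hypothetical procedure: builds a small XML document for one row.
-- Oracle Text calls it with the ROWID of the row being indexed
-- and a temporary CLOB to fill.
CREATE OR REPLACE PROCEDURE person_to_xml (
  rid  IN            ROWID,
  tlob IN OUT NOCOPY CLOB
) IS
  v_xml VARCHAR2(32767);
BEGIN
  -- XML escaping of the column values is omitted for brevity.
  SELECT '<person><name>' || p.name || '</name>'
         || '<country>' || p.country || '</country></person>'
  INTO   v_xml
  FROM   persons p
  WHERE  p.ROWID = rid;

  DBMS_LOB.WRITEAPPEND(tlob, LENGTH(v_xml), v_xml);
END;
/

BEGIN
  -- User datastore pointing at the procedure above.
  ctx_ddl.create_preference('person_ds', 'USER_DATASTORE');
  ctx_ddl.set_attribute('person_ds', 'PROCEDURE', 'person_to_xml');
  ctx_ddl.set_attribute('person_ds', 'OUTPUT_TYPE', 'CLOB');
  -- One section per XML tag, created automatically.
  ctx_ddl.create_section_group('person_sg', 'AUTO_SECTION_GROUP');
END;
/

CREATE INDEX persons_text_idx ON persons(name)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE person_ds SECTION GROUP person_sg');
```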
22 Indexing data: Putting it all together
Stages shown in the diagram: Data + Metadata, Extraction, Data Indexing, Oracle Text Index
Sample extracted document: Person Jurgen Claus Software Engineer Germany Germany München Dachauer Str. 665 Germany...
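Once each attribute is indexed as its own XML section, a search can be restricted to one attribute with the WITHIN operator. The sketch below is illustrative only: the persons table, the persons.name indexed column and the country tag are assumed names, since the slide's markup did not survive.

```sql
-- Finds "Germany" only where it occurs inside the <country> tag.
-- The tag name is written as it appears in the indexed document.
SELECT p.name, SCORE(1) AS relevance
FROM   persons p
WHERE  CONTAINS(p.name, 'Germany WITHIN country', 1) > 0
ORDER  BY SCORE(1) DESC;
```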
23 How easy it is in Google
- Results presented in pages
- Link to open the document
- Highlighted text fragments
- Full document location (document context)
24 Search Results Presentation
- Results presented in pages
- Link to open the customer edit application
- Location of the keyword found
- Extended customer info in balloon window
- Most important info: Address and contacts
- Highlighted text fragments
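Paging through results ordered by relevance can be sketched with the classic ROWNUM pattern. The table and column come from the index usage slide; the page size of 10 and the choice of the second page are arbitrary. This shows paging only, not how Direct Info builds its result pages.

```sql
-- Second page of results (hits 11 to 20), best matches first.
SELECT doc_name, relevance
FROM  (SELECT t.doc_name, t.relevance, ROWNUM AS rn
       FROM  (SELECT doc_name, SCORE(1) AS relevance
              FROM   doc_table_01
              WHERE  CONTAINS(location, 'mouse AND wireless', 1) > 0
              ORDER  BY SCORE(1) DESC) t
       WHERE  ROWNUM <= 20)   -- upper bound of the page
WHERE  rn > 10;               -- lower bound of the page
```

The innermost query does the sorting, so ROWNUM is assigned to rows that are already in relevance order.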
25 Summary
Direct Info uses Oracle Text as a solid platform for creating an advanced full-text search solution
- Powerful text search capabilities
- Advanced results presentation features
- Rich features to judge the results
- Pluggable into existing applications
26 Want to know more?
Company: Semantec GmbH
Name: Krasen Paskalev, Armin Singer
Address: Benzstr. 32, D Herrenberg, Germany
Telephone, Fax, Internet: +49(7032) (7032) (7032)
Meet us here -> booth C10 on the ground floor