Published by Rosalind Austin. Modified over 9 years ago.
1
"Google-ized" search in your business data
Search within your Oracle table data like searching the web with Google
Author: Krasen Paskalev, Certified Oracle 8i/9i DBA, Senior Oracle Consultant
Semantec GmbH, Benzstr. 32, D-71083 Herrenberg, Germany
www.semantec.de
2
Agenda
- Motivation
  - Applications contain valuable data
  - How difficult it is to search for it
  - How easy it is in Google
- What makes a good search engine
- Semantec's Direct Info: demo
- Direct Info concepts and architectural elements
3
Applications contain valuable data
4
Classical approach: substring search with LIKE
- Too complex to use
- Too slow: often results in a full table scan
- No advanced search expressions
- No text fragments
- Imprecise: a search for CAT also finds APPLICATION and VACATION
- Not flexible: expensive to add or remove searchable fields
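The false hits listed above are easy to reproduce. The following is a minimal sketch; the table `products` and its column `name` are hypothetical and exist only for illustration.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE products (name VARCHAR2(100));

INSERT INTO products VALUES ('CAT FOOD');
INSERT INTO products VALUES ('JOB APPLICATION FORM');
INSERT INTO products VALUES ('VACATION PLANNER');

-- The pattern matches the letters CAT anywhere inside the value,
-- so all three rows are returned, not only the one about cats.
-- Because the pattern starts with a wildcard, an ordinary B-tree
-- index on name cannot be range-scanned, which is why such queries
-- often end in a full table scan.
SELECT name
  FROM products
 WHERE name LIKE '%CAT%';
```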
5
How easy it is in Google
- Results presented in pages
- Link to open the document
- Highlighted text fragments
- Full document location (document context)
6
How to search here?
7
Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info: demo
- Direct Info concepts and architectural elements
8
What makes a good search engine?
- Fast search
- Order by relevance
- Options to narrow and judge the hits
- Advanced search expressions
- More information about the object hit
- Text fragments with highlighted keywords
- Keyword context: where the keyword is found
- Object context: extended object information
- Search by object type
- Search within a specific object attribute
- Direct access to the object found
- Accessible to a wide user group
9
Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info: demo
- Direct Info concepts and architectural elements
10
Direct Info
- Framework developed by Semantec
- Builds on the Oracle Text platform
- Built with pure PL/SQL
- All code is stored in Oracle
11
Data Model
12
Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info: demo
- Direct Info concepts and architectural elements
  - What is Oracle Text
  - Indexing data
  - Search results presentation
13
What is Oracle Text?
- Formerly known as ConText (8.0) and interMedia Text (8i)
- Uses standard SQL to index, search and analyze text and documents stored in the Oracle database, in files and on the Web
- Allows advanced searching, including keyword search, pattern matching, Boolean expressions, etc.
- Supports multiple languages
14
Oracle Text Index Usage

Oracle Text index creation:

    CREATE INDEX DOC_INDEX_01 ON DOC_TABLE_01(location)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('DATASTORE USER_DATASTORE_01');

Oracle Text index search:

    SELECT doc_name
      FROM DOC_TABLE_01
     WHERE CONTAINS(location, 'mouse AND wireless', 1) > 0
     ORDER BY score(1) DESC;
15
Boolean expressions, proximity search
- AND (&): mouse AND wireless
- OR (|): mouse OR wireless
- NOT (~): mouse NOT wireless
- ACCUMulate (,): mouse, monitor, cd
- NEAR: NEAR((mouse,wireless),5)
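As a sketch of how these operators appear inside a query, the statements below reuse the table `DOC_TABLE_01` and the indexed column `location` from the index usage slide. They assume that the CONTEXT index shown there already exists.

```sql
-- NOT: rows that mention mouse but not wireless.
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'mouse NOT wireless', 1) > 0;

-- ACCUM: rows that mention any of the terms; SCORE is used to
-- put the rows that Oracle Text ranks highest first.
SELECT doc_name, SCORE(1) AS relevance
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'mouse, monitor, cd', 1) > 0
 ORDER BY SCORE(1) DESC;

-- NEAR: mouse and wireless close together (max_span of 5).
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'NEAR((mouse, wireless), 5)', 1) > 0;
```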
16
Expansion operators
Expansion operators expand the list of words searched for.
- Wildcard (%, _): matches only a portion of the word
    _ing -> sing king ping
    monito% -> monitor monitoring
- Soundex (!): words that sound similar
    !sing -> sing sink
- Fuzzy: words that are spelled similarly
    fuzzy(sing,70,10,weight) -> sing king sink
- Stem ($): words having the same linguistic root
    $sing -> sing sang sung
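The expansion operators are written directly into the query string passed to CONTAINS. A minimal sketch, again assuming the table `DOC_TABLE_01` and the indexed column `location` from the index usage slide; which words are actually matched depends on the indexed data and on the index's wordlist and language settings.

```sql
-- Wildcard: every indexed word that starts with "monito".
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'monito%', 1) > 0;

-- Stem: words sharing the linguistic root of "sing".
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, '$sing', 1) > 0;

-- Soundex: words that sound like "sing".
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, '!sing', 1) > 0;

-- Fuzzy: words spelled similarly to "sing"; the arguments are the
-- similarity score (70), the maximum number of expansions (10),
-- and the weight option.
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'fuzzy(sing, 70, 10, weight)', 1) > 0;
```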
17
Thesauri examples
- Theme search: ABOUT(economics)
- Broader term: BT(cat) -> animal
- Narrower term: NT(animal) -> cat dog
- Associative relation: RT(cat) -> kitten
- Translated term: TR(cat) -> cat gato
- Synonym: SYN(cat) -> cat tiger
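The thesaurus operators are used in the same way as the other query operators. The sketch below assumes that a thesaurus containing the relations shown above has been loaded as the default thesaurus, and it reuses `DOC_TABLE_01` and `location` from the index usage slide. When no thesaurus name is given, the operators look up the thesaurus named DEFAULT.

```sql
-- Synonyms of "cat" as defined in the thesaurus.
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'SYN(cat)', 1) > 0;

-- Narrower terms of "animal" (for example cat, dog).
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'NT(animal)', 1) > 0;

-- Theme search: documents about economics. The quality of the
-- result depends on whether the index has a theme component.
SELECT doc_name
  FROM DOC_TABLE_01
 WHERE CONTAINS(location, 'ABOUT(economics)', 1) > 0;
```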
18
Datastore: Direct and Multi-column
- Direct: table documents (doc_name, author, text)
- Multi-column: table documents (doc_name, author, text)
- Allowed datatypes: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, XMLType
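A multi-column datastore is configured through a preference that lists the columns to be combined into one indexed document. The following is a minimal sketch using the `documents` table from the slide; the preference name `doc_multi_ds` and the index name `documents_multi_idx` are made up for illustration.

```sql
BEGIN
  -- Combine three columns of each row into one document to index.
  ctx_ddl.create_preference('doc_multi_ds', 'MULTI_COLUMN_DATASTORE');
  ctx_ddl.set_attribute('doc_multi_ds', 'COLUMNS', 'doc_name, author, text');
END;
/

-- The index is declared on one column, but the datastore
-- preference decides what content is actually indexed.
CREATE INDEX documents_multi_idx ON documents(text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE doc_multi_ds');
```

With the direct datastore, which is the default, no preference is needed and only the column named in CREATE INDEX is indexed.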
19
Datastore: Detail and Nested
- Detail: master table documents (doc_name, author) with detail table doc_details (doc_name, seq_no, text)
- Nested: table documents (doc_name, author, doc_nst), where doc_nst is a nested table (seq_no, text)
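For the detail case, a preference tells Oracle Text where the detail rows live and how to order them. This is a sketch based on the table and column names in the slide; the preference name `doc_detail_ds` and the index name `documents_detail_idx` are assumptions.

```sql
BEGIN
  ctx_ddl.create_preference('doc_detail_ds', 'DETAIL_DATASTORE');
  -- FALSE: treat the detail rows as lines of text, not binary data.
  ctx_ddl.set_attribute('doc_detail_ds', 'BINARY', 'FALSE');
  -- Detail table and the column that joins it to the master table.
  ctx_ddl.set_attribute('doc_detail_ds', 'DETAIL_TABLE', 'doc_details');
  ctx_ddl.set_attribute('doc_detail_ds', 'DETAIL_KEY', 'doc_name');
  -- Column that orders the detail rows, and the column holding the text.
  ctx_ddl.set_attribute('doc_detail_ds', 'DETAIL_LINENO', 'seq_no');
  ctx_ddl.set_attribute('doc_detail_ds', 'DETAIL_TEXT', 'text');
END;
/

-- The index is created on a column of the master table. That column
-- acts as a placeholder; the indexed text comes from doc_details.
CREATE INDEX documents_detail_idx ON documents(author)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE doc_detail_ds');
```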
20
Indexing data: Data Model
21
Indexing Data: Oracle Text Features
- User datastore: a PL/SQL procedure delivers the contents to be indexed
- AUTO_SECTION_GROUP: instructs Oracle to create a separate section for each XML tag and index only its value
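The two features combine naturally: the procedure emits a small XML document per row, and the section group makes each tag searchable on its own. The sketch below is not Direct Info's code. The table `customers`, its columns `name` and `country`, and all object names are hypothetical. It also omits escaping of XML special characters, and in some Oracle releases the procedure must be owned by CTXSYS.

```sql
-- Hypothetical user datastore procedure: Oracle Text calls it once
-- per row and indexes whatever it writes into tlob.
CREATE OR REPLACE PROCEDURE customer_doc (
  rid  IN     ROWID,
  tlob IN OUT NOCOPY CLOB
) IS
  v_doc VARCHAR2(4000);
BEGIN
  SELECT '<name>' || name || '</name>' ||
         '<country>' || country || '</country>'
    INTO v_doc
    FROM customers
   WHERE ROWID = rid;
  DBMS_LOB.WRITEAPPEND(tlob, LENGTH(v_doc), v_doc);
END;
/

BEGIN
  ctx_ddl.create_preference('customer_ds', 'USER_DATASTORE');
  ctx_ddl.set_attribute('customer_ds', 'PROCEDURE', 'customer_doc');
  -- One section per XML tag found in the generated document.
  ctx_ddl.create_section_group('customer_sg', 'AUTO_SECTION_GROUP');
END;
/

CREATE INDEX customers_text_idx ON customers(name)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('DATASTORE customer_ds SECTION GROUP customer_sg');

-- Search only inside the <country> section.
SELECT name
  FROM customers
 WHERE CONTAINS(name, 'Germany WITHIN country', 1) > 0;
```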
22
Indexing data: Putting it all together
- Stages shown: Data + Metadata, Extraction, Data Indexing, Oracle Text Index
- Sample document values: 50 635 Person Jurgen Claus Software Engineer Germany 28.05.1935 Germany 80995 München Dachauer Str. 665 Germany...
23
How easy it is in Google
- Results presented in pages
- Link to open the document
- Highlighted text fragments
- Full document location (document context)
24
Search Results Presentation
- Results presented in pages
- Link to open the customer edit application
- Location of the keyword found
- Extended customer info in balloon window
- Most important info: address and contacts
- Highlighted text fragments
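Presenting results in pages, ordered by relevance, can be sketched with the classic nested ROWNUM pattern. This is an illustration and not necessarily how Direct Info does it. It reuses `DOC_TABLE_01` and `location` from the index usage slide, and the bind variables `:page_no` (starting at 1) and `:page_size` are assumptions.

```sql
SELECT doc_name, relevance
  FROM (
        SELECT ranked.*, ROWNUM AS rn
          FROM (
                -- Innermost query: all hits, best score first.
                -- doc_name is a tie-breaker so that the page
                -- boundaries stay stable between requests.
                SELECT doc_name, SCORE(1) AS relevance
                  FROM DOC_TABLE_01
                 WHERE CONTAINS(location, 'mouse AND wireless', 1) > 0
                 ORDER BY SCORE(1) DESC, doc_name
               ) ranked
         -- Stop after the last row of the requested page.
         WHERE ROWNUM <= :page_no * :page_size
       )
 -- Skip the rows that belong to earlier pages.
 WHERE rn > (:page_no - 1) * :page_size;
```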
25
Summary
Direct Info uses Oracle Text as a solid platform for creating an advanced full-text search solution:
- Powerful text search capabilities
- Advanced results presentation features
- Rich features to judge the results
- Pluggable into existing applications
26
Want to know more?
Company: Semantec GmbH
Name: Krasen Paskalev, Armin Singer
Address: Benzstr. 32, D-71083 Herrenberg, Germany
Telephone / Fax: +49(7032)9130-0, +49(7032)9130-12, +49(7032)9130-22
E-Mail: krasen.paskalev@semantec.bg, singer@semantec.de
Internet: www.semantec.de
Meet us here -> booth C10 on the ground floor