www.semantec.de: 'Google-ized' search in your business data. Author: Krasen Paskalev, Certified Oracle 8i/9i DBA, Senior Oracle Consultant, Semantec GmbH, Benzstr.


1 'Google-ized' search in your business data
Search within your Oracle table data like searching the web with Google
Author: Krasen Paskalev, Certified Oracle 8i/9i DBA, Senior Oracle Consultant
Semantec GmbH, Benzstr. 32, D-71083 Herrenberg, Germany, www.semantec.de

2 Agenda
- Motivation
  - Applications contain valuable data
  - How difficult it is to search for it
  - How easy it is in Google
- What makes a good search engine
- Semantec's Direct Info – demo
- Direct Info concepts and architectural elements

3 Applications contain valuable data

4 Classical approach: instring search with LIKE
- Too complex to use
- Too slow – often results in a full table scan
- No advanced search expressions
- No text fragments
- CAT also finds: APPLICATION, VACATION
- Not flexible – expensive to add or remove searchable fields
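
The false-positive problem with LIKE can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for an Oracle table; the table name, column name and rows are assumptions made up for illustration, not taken from the presentation.

```python
import sqlite3

# In-memory stand-in for an application table (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_name TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?)",
    [("CAT",), ("APPLICATION",), ("VACATION",), ("DOG",)],
)

# Substring match: every row that contains the letters C-A-T qualifies,
# whether or not CAT appears as a separate word.
rows = conn.execute(
    "SELECT doc_name FROM documents WHERE doc_name LIKE '%CAT%' ORDER BY doc_name"
).fetchall()
like_hits = [row[0] for row in rows]
print(like_hits)
```

The leading wildcard in the pattern is also what typically prevents an ordinary index from being used, which is the full table scan mentioned on the slide.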

5 How easy it is in Google
- Results presented in pages
- Link to open the document
- Highlighted text fragments
- Full document location (document context)

6 How to search here?

7 Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info – demo
- Direct Info concepts and architectural elements

8 What makes a good search engine?
- Fast search
- Order by relevance
- Options to narrow and judge the hits
- Advanced search expressions
- More information about the object hit
- Text fragments with highlighted keywords
- Keyword context – where the keyword is found
- Object context – extended object information
- Search by object type
- Search within a specific object attribute
- Direct access to the object found
- Accessible to a wide user group

9 Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info – demo
- Direct Info concepts and architectural elements

10 Direct Info
- Framework developed by Semantec
- Builds on the Oracle Text platform
- Built with pure PL/SQL
- All code is stored in Oracle

11 Data Model

12 Agenda
- Motivation
- What makes a good search engine
- Semantec's Direct Info – demo
- Direct Info concepts and architectural elements
  - What is Oracle Text
  - Indexing data
  - Search results presentation

13 What is Oracle Text?
- Formerly known as ConText (8.0) and interMedia Text (8i)
- Uses standard SQL to index, search and analyze text and documents stored in the Oracle database, in files and on the Web
- Allows advanced searching, including keyword search, pattern matching, boolean expressions, etc.
- Supports multiple languages

14 Oracle Text Index Usage
Oracle Text index creation:
    CREATE INDEX DOC_INDEX_01 ON DOC_TABLE_01(location)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('DATASTORE USER_DATASTORE_01');
Oracle Text index search:
    SELECT doc_name
      FROM DOC_TABLE_01
     WHERE CONTAINS(location, 'mouse AND wireless', 1) > 0
     ORDER BY score(1) DESC;

15 Boolean expressions, Proximity search
- AND (&) – mouse AND wireless
- OR (|) – mouse OR wireless
- NOT (~) – mouse NOT wireless
- ACCUMulate (,) – mouse, monitor, cd
- NEAR – NEAR((mouse,wireless),5)
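
What the boolean operators select can be pictured as set operations over an inverted index. The following is only a toy model in Python with made-up documents; it is not how Oracle Text is implemented, and it leaves out ACCUM scoring and NEAR proximity matching.

```python
# Toy document collection (hypothetical contents).
docs = {
    1: "wireless mouse with usb receiver",
    2: "wired mouse",
    3: "wireless keyboard",
}

# Inverted index: word -> set of ids of the documents containing it.
index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)


def hits(word):
    """Return the ids of all documents that contain the word."""
    return index.get(word, set())


and_hits = hits("mouse") & hits("wireless")  # mouse AND wireless
or_hits = hits("mouse") | hits("wireless")   # mouse OR wireless
not_hits = hits("mouse") - hits("wireless")  # mouse NOT wireless

print(and_hits, or_hits, not_hits)
```

Note that NOT is modeled here as a binary operator: documents that contain the left term but not the right one.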

16 Expansion operators
Expand the list of words searched for
- Wildcard (%, _) – only a portion of the word is given: _ing -> sing king ping; monito% -> monitor monitoring
- Soundex (!) – words that sound similar: !sing -> sing sink
- Fuzzy – words that are spelled similarly: fuzzy(sing,70,10,weight) -> sing king sink
- Stem ($) – words having the same linguistic root: $sing -> sing sang sung
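
Two of these expansions are easy to imitate outside the database. The Python sketch below expands the % and _ wildcards against a small word list and groups words by the classic Soundex code. The vocabulary is made up, and the Soundex shown is the textbook algorithm, which is assumed rather than confirmed to match what Oracle Text uses internally; words are assumed to be non-empty and alphabetic.

```python
import re


def expand_wildcard(pattern, vocabulary):
    """Expand an SQL-style pattern: % matches any run of characters, _ exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    regex = re.compile("".join(parts))
    return [word for word in vocabulary if regex.fullmatch(word)]


# Consonant groups of the textbook Soundex algorithm.
SOUNDEX_CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        SOUNDEX_CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of a non-empty alphabetic word."""
    word = word.lower()
    result = word[0].upper()
    previous = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            previous = code
    return (result + "000")[:4]


vocabulary = ["sing", "king", "ping", "sink", "singing", "monitor", "monitoring"]
wildcard_ing = expand_wildcard("_ing", vocabulary)
wildcard_monito = expand_wildcard("monito%", vocabulary)
sounds_like_sing = [w for w in vocabulary if soundex(w) == soundex("sing")]

print(wildcard_ing, wildcard_monito, sounds_like_sing)
```

With this word list the three expansions reproduce the examples on the slide: the single-character wildcard keeps only four-letter words ending in "ing", and "sing" and "sink" share one Soundex code.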

17 Thesauri examples
- Theme search – ABOUT(economics)
- Broader term – BT(cat) -> animal
- Narrower term – NT(animal) -> cat dog
- Associative relation – RT(cat) -> kitten
- Translated term – TR(cat) -> cat gato
- Synonym – SYN(cat) -> cat tiger

18 Datastore: Direct and Multi-column
- Direct: table documents (doc_name, author, text)
- Multi-column: table documents (doc_name, author, text)
- Allowed datatypes: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, XMLType

19 Datastore: Detail and Nested
- Detail: table documents (doc_name, author) with detail table doc_details (doc_name, seq_no, text)
- Nested: table documents (doc_name, author, doc_nst) with nested table doc_nst (seq_no, text)

20 Indexing data - Data Model

21 Indexing Data: Oracle Text Features
- User datastore – a PL/SQL procedure delivers the contents to be indexed
- AUTO_SECTION_GROUP – instructs Oracle to create a separate section for each XML tag and index only its value

22 Indexing data: putting it all together
Data + Metadata -> Extraction -> Data Indexing -> Oracle Text Index
[Sample extracted document, tags not preserved: 50 635 Person Jurgen Claus Software Engineer Germany 28.05.1935 Germany 80995 München Dachauer Str. 665 Germany...]
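
The extraction step can be pictured as follows: one row is turned into a small XML document, and each tag then acts as a separately searchable section. The Python sketch below is a conceptual stand-in for the PL/SQL user datastore procedure and for AUTO_SECTION_GROUP. The tag names (object, name, job, city) and the helper functions are hypothetical, since the markup of the original sample was not preserved.

```python
import xml.etree.ElementTree as ET


def build_document(row):
    """Wrap every column of a row in its own XML tag (hypothetical tag names)."""
    root = ET.Element("object")
    for column, value in row.items():
        ET.SubElement(root, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def sections(xml_text):
    """Split the document into one section per tag, keeping only the tag's value."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


def within(word, section, xml_text):
    """Check whether a word occurs inside one specific section."""
    value = sections(xml_text).get(section) or ""
    return word.lower() in value.lower().split()


row = {"name": "Jurgen Claus", "job": "Software Engineer", "city": "München"}
xml_doc = build_document(row)
by_section = sections(xml_doc)

print(xml_doc)
print(within("engineer", "job", xml_doc), within("engineer", "name", xml_doc))
```

Keeping the attribute name as the tag is what makes a search within a specific object attribute possible later on.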

23 How easy it is in Google
- Results presented in pages
- Link to open the document
- Highlighted text fragments
- Full document location (document context)

24 Search Results Presentation
- Results presented in pages
- Link to open the customer edit application
- Location of the keyword found
- Extended customer info in balloon window
- Most important info: address and contacts
- Highlighted text fragments
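
Two of these presentation features, paging and highlighted text fragments, are simple enough to sketch. The Python below is an illustration only: the function names, the `<b>` markup and the sample text are assumptions, and Direct Info itself is described in the presentation as pure PL/SQL.

```python
import re


def snippet(text, keyword, width=20):
    """Return a fragment around the first keyword match, with the match highlighted."""
    match = re.search(re.escape(keyword), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    fragment = (text[start:match.start()]
                + "<b>" + match.group(0) + "</b>"
                + text[match.end():end])
    # Mark the fragment as cut off where it does not reach the text boundary.
    return ("..." if start > 0 else "") + fragment + ("..." if end < len(text) else "")


def page(hits, page_no, page_size=10):
    """Return the hits that belong on the given 1-based result page."""
    first = (page_no - 1) * page_size
    return hits[first:first + page_size]


fragment = snippet("Wireless optical mouse with USB receiver", "mouse", width=8)
third_page = page(list(range(25)), page_no=3, page_size=10)

print(fragment)
print(third_page)
```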

25 Summary
Direct Info uses Oracle Text as a solid platform for creating an advanced full text search solution
- Powerful text search capabilities
- Advanced results presentation features
- Rich features to judge the results
- Pluggable into existing applications

26 Want to know more?
Company: Semantec GmbH.
Name: Krasen Paskalev, Armin Singer
Address: Benzstr. 32, D-71083 Herrenberg, Germany
Telephone / Fax: +49(7032)9130-0, +49(7032)9130-12, +49(7032)9130-22
E-Mail: krasen.paskalev@semantec.bg, singer@semantec.de
Internet: www.semantec.de
Meet us here -> booth C10 on the ground floor

