www.semantec.de: 'Google-ized' search in your business data. Author: Krasen Paskalev, Certified Oracle 8i/9i DBA, Senior Oracle Consultant, Semantec GmbH, Benzstr.

Presentation transcript:

'Google-ized' search in your business data. Search within your Oracle table data like searching the web with Google. Author: Krasen Paskalev, Certified Oracle 8i/9i DBA, Senior Oracle Consultant, Semantec GmbH, Benzstr. 32, D Herrenberg, Germany.

2 Agenda: Motivation (applications contain valuable data; how difficult it is to search for it; how easy it is in Google); what makes a good search engine; Semantec's Direct Info demo; Direct Info concepts and architectural elements.

3 Applications contain valuable data

4 Classical approach: substring (in-string) search with LIKE. Too complex to use; too slow, since a LIKE '%...%' search often results in a full table scan; no advanced search expressions; no text fragments; no word boundaries: CAT also finds APPLICATION and VACATION; not flexible: expensive to add or remove searchable fields.
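To make the CAT example concrete, the sketch below runs a LIKE substring query against an in-memory SQLite table (standing in for the Oracle table) and compares it with a simple whole-word match. The sample rows are invented for illustration, and the whole-word function only approximates what a word-based text index returns; it is not Oracle Text.

```python
import re
import sqlite3

# Hypothetical sample rows; the table and column names are illustrative only.
ROWS = ["CAT", "APPLICATION", "VACATION", "BLACK CAT FOOD", "DOG"]

def like_search(term: str) -> list[str]:
    """Substring search, as with WHERE text LIKE '%term%'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE docs (text TEXT)")
    con.executemany("INSERT INTO docs VALUES (?)", [(r,) for r in ROWS])
    rows = con.execute(
        "SELECT text FROM docs WHERE text LIKE ? ORDER BY rowid",
        (f"%{term}%",),
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

def word_search(term: str) -> list[str]:
    """Whole-word match: the row must contain term as a separate word."""
    return [r for r in ROWS if term.upper() in re.findall(r"\w+", r.upper())]

print(like_search("CAT"))  # ['CAT', 'APPLICATION', 'VACATION', 'BLACK CAT FOOD']
print(word_search("CAT"))  # ['CAT', 'BLACK CAT FOOD']
```

The substring query cannot tell a word from a fragment of a longer word, which is exactly the APPLICATION/VACATION problem on the slide.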

5 How easy it is in Google: results presented in pages; a link to open the document; highlighted text fragments; full document location (document context).

6 How to search here?

7 Agenda: Motivation; what makes a good search engine; Semantec's Direct Info demo; Direct Info concepts and architectural elements.

8 What makes a good search engine? Fast search; ordering by relevance; options to narrow and judge the hits; advanced search expressions; more information about the object hit: text fragments with highlighted keywords, keyword context (where the keyword was found) and object context (extended object information); search by object type; search within a specific object attribute; direct access to the object found; accessible to a wide user group.

9 Agenda: Motivation; what makes a good search engine; Semantec's Direct Info demo; Direct Info concepts and architectural elements.

10 Direct Info: a framework developed by Semantec. Builds on the Oracle Text platform; built with pure PL/SQL; all code is stored in Oracle.

11 Data Model

12 Agenda: Motivation; what makes a good search engine; Semantec's Direct Info demo; Direct Info concepts and architectural elements: what is Oracle Text, indexing data, search results presentation.

13 What is Oracle Text? Formerly known as ConText (8.0) and interMedia Text (8i). Uses standard SQL to index, search and analyze text and documents stored in the Oracle database, in files and on the Web. Allows advanced searching, including keyword search, pattern matching, Boolean expressions, etc. Supports multiple languages.

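14 Oracle Text Index Usage
Oracle Text index creation:
  -- USER_DATASTORE_01 is a datastore preference created beforehand with CTX_DDL.CREATE_PREFERENCE
  CREATE INDEX DOC_INDEX_01 ON DOC_TABLE_01(location)
    INDEXTYPE IS CTXSYS.CONTEXT
    PARAMETERS ('DATASTORE USER_DATASTORE_01');
Oracle Text index search:
  -- label 1 ties CONTAINS to SCORE(1); hits are returned most relevant first
  SELECT doc_name
    FROM DOC_TABLE_01
   WHERE CONTAINS(location, 'mouse AND wireless', 1) > 0
   ORDER BY SCORE(1) DESC;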
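As a rough model of what the search query asks for, the Python sketch keeps only documents whose score is above zero and sorts them by score, highest first. The documents are hypothetical, and the score is a plain occurrence count chosen for readability; Oracle Text computes its own relevance score, so real SCORE(1) values will differ.

```python
import re
from collections import Counter

# Hypothetical stand-in for DOC_TABLE_01: doc_name -> indexed text.
DOCS = {
    "catalog.txt": "wireless mouse, wired mouse and wireless keyboard",
    "memo.txt": "the mouse in the kitchen",
    "offer.txt": "new wireless mouse with USB receiver",
}

def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

def and_score(text: str, terms: list[str]) -> int:
    """Toy score: 0 unless every term occurs, else the total number of occurrences."""
    counts = Counter(tokens(text))
    if all(counts[t] > 0 for t in terms):
        return sum(counts[t] for t in terms)
    return 0

def search(terms: list[str]) -> list[tuple[str, int]]:
    # WHERE CONTAINS(...) > 0 ORDER BY SCORE(1) DESC, in miniature
    scored = [(name, and_score(text, terms)) for name, text in DOCS.items()]
    hits = [(name, s) for name, s in scored if s > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

print(search(["mouse", "wireless"]))  # [('catalog.txt', 4), ('offer.txt', 2)]
```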

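15 Boolean expressions, proximity search
  AND (&): mouse AND wireless
  OR (|): mouse OR wireless
  NOT (~): mouse NOT wireless (mouse, but not wireless)
  ACCUMulate (,): mouse, monitor, cd (any of the terms; documents containing more of them score higher)
  NEAR: NEAR((mouse,wireless),5) (both words, within a span of 5 words)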
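The operators on this slide can be illustrated with a toy positional index that records the word offsets of every term. This is a minimal sketch over a hypothetical corpus, not Oracle Text's implementation: here NEAR means "some occurrences at most max_dist words apart", and ACCUM ranks by the number of distinct matching terms, whereas Oracle defines its own span and weighting rules.

```python
import re

# Hypothetical mini-corpus; names and texts are illustrative only.
DOCS = {
    "d1": "wireless optical mouse",
    "d2": "mouse pad for a wired mouse",
    "d3": "wireless keyboard",
    "d4": "the mouse works with any wireless receiver in range",
}

def positions(text: str) -> dict[str, list[int]]:
    """Map each word to the word offsets at which it occurs."""
    pos: dict[str, list[int]] = {}
    for i, word in enumerate(re.findall(r"\w+", text.lower())):
        pos.setdefault(word, []).append(i)
    return pos

# Toy positional index: document -> word -> offsets.
INDEX = {name: positions(text) for name, text in DOCS.items()}

def has(doc: str, term: str) -> bool:
    return term in INDEX[doc]

def q_and(a: str, b: str) -> list[str]:
    return sorted(d for d in INDEX if has(d, a) and has(d, b))

def q_or(a: str, b: str) -> list[str]:
    return sorted(d for d in INDEX if has(d, a) or has(d, b))

def q_not(a: str, b: str) -> list[str]:
    # "a NOT b": documents containing a but not b
    return sorted(d for d in INDEX if has(d, a) and not has(d, b))

def q_near(a: str, b: str, max_dist: int) -> list[str]:
    # Both words present, with some pair of occurrences at most max_dist words apart
    return sorted(
        d for d in INDEX
        if has(d, a) and has(d, b)
        and any(abs(i - j) <= max_dist for i in INDEX[d][a] for j in INDEX[d][b])
    )

def q_accum(*terms: str) -> list[tuple[str, int]]:
    # Any term matches; more distinct matching terms means a higher rank
    scored = [(d, sum(has(d, t) for t in terms)) for d in INDEX]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: (-s[1], s[0]))

print(q_near("mouse", "wireless", 2))  # ['d1']
```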

16 Expansion operators: expand the list of words searched for
  Wildcard (%, _): matches on only a portion of the word; _ing -> sing, king, ping; monito% -> monitor, monitoring
  Soundex (!): words that sound similar; !sing -> sing, sink
  Fuzzy: words that are spelled similarly; fuzzy(sing,70,10,weight) -> sing, king, sink
  Stem ($): words with the same linguistic root; $sing -> sing, sang, sung
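Each expansion operator turns one query word into a list of index words. The sketch below shows that idea over a hypothetical word list. The wildcard translation follows the slide's % and _ semantics. Soundex is the classic American Soundex, so with this list it also returns sang and sung, because Soundex ignores vowels. Fuzzy matching uses Python's difflib similarity as a stand-in, not Oracle's fuzzy algorithm or its parameters. Stemming needs a language-specific stemmer and is left out.

```python
import difflib
import re

# Hypothetical word list standing in for the words stored in a text index.
VOCAB = ["sing", "king", "ping", "bring", "sink", "sang", "sung",
         "monitor", "monitoring", "mouse"]

def wildcard(pattern: str) -> list[str]:
    """% matches any run of characters, _ matches exactly one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return [w for w in VOCAB if re.fullmatch(regex, w)]

SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"), "R": "6",
}

def soundex(word: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    word = word.upper()
    result, prev = word[0], SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]

def sounds_like(word: str) -> list[str]:
    key = soundex(word)
    return [w for w in VOCAB if soundex(w) == key]

def fuzzy(word: str, similarity: float = 0.7, max_results: int = 10) -> list[str]:
    """Spelling similarity via difflib (a stand-in, not Oracle's fuzzy matching)."""
    return difflib.get_close_matches(word, VOCAB, n=max_results, cutoff=similarity)

print(wildcard("_ing"))     # ['sing', 'king', 'ping']
print(wildcard("monito%"))  # ['monitor', 'monitoring']
print(soundex("sing"), soundex("sink"))  # S520 S520
```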

17 Thesauri examples
  Theme search: ABOUT(economics)
  Broader term: BT(cat) -> animal
  Narrower term: NT(animal) -> cat, dog
  Associative relation: RT(cat) -> kitten
  Translated term: TR(cat) -> cat, gato
  Synonym: SYN(cat) -> cat, tiger
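In Oracle Text the thesaurus is loaded into the database (for example with ctxload), and operators such as BT or SYN expand the query at search time. The dictionary below is a hypothetical stand-in that holds the slide's example relations. The expand function, which rewrites a term into an OR query over the term and its related terms, is this sketch's own simplification. TR is modelled with a single target language only.

```python
# Hypothetical thesaurus holding the slide's example relations:
# term -> relation -> related terms.
THESAURUS: dict[str, dict[str, list[str]]] = {
    "cat": {"BT": ["animal"], "RT": ["kitten"], "SYN": ["tiger"], "TR": ["gato"]},
    "animal": {"NT": ["cat", "dog"]},
}

def related(term: str, relation: str) -> list[str]:
    """Look up the terms linked to term by relation (BT, NT, RT, SYN, TR)."""
    return THESAURUS.get(term, {}).get(relation, [])

def expand(term: str, relation: str) -> str:
    """Rewrite a term as an OR query over itself plus its related terms."""
    return " | ".join([term] + related(term, relation))

print(expand("cat", "SYN"))     # cat | tiger
print(related("animal", "NT"))  # ['cat', 'dog']
```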

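18 Datastore: Direct and Multi-column. Direct: in the table documents (doc_name, author, text), a single column (text) holds the document. Multi-column: in the same table, several columns (doc_name, author, text) are indexed together as one document. Allowed datatypes: CHAR, VARCHAR, VARCHAR2, BLOB, CLOB, BFILE, XMLType.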
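The difference between the two datastores is what counts as "the document" for one row. The sketch below builds both views of a hypothetical row. The multi-column version wraps each column in a tag named after the column, which is similar to the default layout documented for Oracle's multi-column datastore. Escaping XML special characters is this sketch's own choice.

```python
from xml.sax.saxutils import escape

def direct_document(row: dict[str, str], column: str = "text") -> str:
    """Direct datastore: the document is simply the content of one column."""
    return row[column]

def multi_column_document(row: dict[str, str], columns: list[str]) -> str:
    """Concatenate the chosen columns, each wrapped in a tag named after the column."""
    return "\n".join(f"<{col}>{escape(row[col])}</{col}>" for col in columns)

# Hypothetical row of the table documents(doc_name, author, text).
ROW = {"doc_name": "Q3 report", "author": "J. Smith", "text": "Sales grew by 5% & more"}
print(multi_column_document(ROW, ["doc_name", "author", "text"]))
```

Because each column keeps its own tag, the combined document can later be searched within a single column as well as across all of them.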

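19 Datastore: Detail and Nested. Detail: the master table documents (doc_name, author) has a detail table doc_details (doc_name, seq_no, text); the detail rows of each document, in seq_no order, form one document. Nested: the table documents (doc_name, author, doc_nst) stores the text in the nested table column doc_nst (seq_no, text).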
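Both datastores on this slide assemble one logical document from several child rows. The sketch below shows that assembly for the detail case, assuming the slide's columns (doc_name, seq_no, text) and invented sample rows. A nested table would be handled the same way, except that the child rows come from the doc_nst column instead of a separate table.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of the detail table doc_details: (doc_name, seq_no, text).
DOC_DETAILS = [
    ("manual", 2, "Insert the batteries."),
    ("memo", 1, "Meeting moved to Friday."),
    ("manual", 1, "Unpack the wireless mouse."),
    ("manual", 3, "Plug in the USB receiver."),
]

def detail_documents(rows: list[tuple[str, int, str]]) -> dict[str, str]:
    """Build one virtual document per master key from its detail rows in seq_no order."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # sort by doc_name, then seq_no
    return {
        name: " ".join(text for _, _, text in group)
        for name, group in groupby(ordered, key=itemgetter(0))
    }

print(detail_documents(DOC_DETAILS)["manual"])
# Unpack the wireless mouse. Insert the batteries. Plug in the USB receiver.
```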

20 Indexing data - Data Model

21 Indexing data: Oracle Text features. User datastore: a PL/SQL procedure delivers the contents to be indexed. AUTO_SECTION_GROUP: instructs Oracle to create a separate section for each XML tag and to index only the tag's value.
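In Oracle the user datastore is a PL/SQL procedure registered as the PROCEDURE attribute of a USER_DATASTORE preference, and the section group is created with CTX_DDL.CREATE_SECTION_GROUP using the type AUTO_SECTION_GROUP; queries then restrict words with WITHIN. The Python sketch only models the idea. A hypothetical "datastore" function turns a record into an XML document, one section per tag is kept with only its text, and a word is looked up inside a named section. Tag names and the sample record (taken from the slide's example data) are illustrative, and only top-level tags are handled.

```python
import xml.etree.ElementTree as ET

def user_datastore(person: dict[str, str]) -> str:
    """Hypothetical user-datastore step: assemble one XML document per record."""
    root = ET.Element("person")
    for field, value in person.items():
        ET.SubElement(root, field).text = value
    return ET.tostring(root, encoding="unicode")

def auto_sections(xml_doc: str) -> dict[str, str]:
    """One section per top-level tag, holding only the tag's text value."""
    root = ET.fromstring(xml_doc)
    return {child.tag: (child.text or "") for child in root}

def within(xml_doc: str, word: str, section: str) -> bool:
    """True if word occurs inside the named section (cf. 'word WITHIN section')."""
    text = auto_sections(xml_doc).get(section, "")
    return word.lower() in text.lower().split()

PERSON = {
    "name": "Jurgen Claus",
    "job": "Software Engineer",
    "city": "München",
    "street": "Dachauer Str. 665",
    "country": "Germany",
}
DOC = user_datastore(PERSON)
print(within(DOC, "Germany", "country"))  # True
print(within(DOC, "Germany", "city"))     # False
```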

22 Indexing data: putting it all together. Data (sample record: Person, Jurgen Claus, Software Engineer, Germany, München, Dachauer Str. 665, ...) -> data + metadata extraction -> data indexing -> Oracle Text index.

23 How easy it is in Google: results presented in pages; a link to open the document; highlighted text fragments; full document location (document context).

24 Search results presentation: results presented in pages; a link to open the customer edit application; location of the keyword found; extended customer info in a balloon window (most important info: address and contacts); highlighted text fragments.
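Two of these presentation features, paging and highlighted fragments, can be sketched independently of the database. Oracle Text's CTX_DOC package offers server-side highlighting. The toy below is not that API and not Direct Info's code. It shows a few words of context around the first hit, wraps every hit in <b> tags, and slices a result list into 1-based pages. The sample text is invented from the slide's example data.

```python
PUNCTUATION = ".,;:!?()\"'"

def _is_hit(word: str, keyword: str) -> bool:
    # Compare case-insensitively, ignoring punctuation stuck to the word.
    return word.strip(PUNCTUATION).lower() == keyword.lower()

def highlight_fragment(text: str, keyword: str, context: int = 3) -> str:
    """Show context words on each side of the first hit; mark every hit with <b>."""
    words = text.split()
    hits = [i for i, w in enumerate(words) if _is_hit(w, keyword)]
    if not hits:
        return ""
    lo = max(0, hits[0] - context)
    hi = min(len(words), hits[0] + context + 1)
    shown = [f"<b>{w}</b>" if _is_hit(w, keyword) else w for w in words[lo:hi]]
    prefix = "... " if lo > 0 else ""
    suffix = " ..." if hi < len(words) else ""
    return prefix + " ".join(shown) + suffix

def page(hits: list, page_no: int, page_size: int = 10) -> list:
    """Return the hits shown on 1-based page number page_no."""
    start = (page_no - 1) * page_size
    return hits[start:start + page_size]

TEXT = ("Jurgen Claus works as a software engineer in München "
        "and travels across Germany for customer projects")
print(highlight_fragment(TEXT, "engineer", context=2))
# ... a software <b>engineer</b> in München ...
```

Working on whole words rather than character offsets keeps the fragment from cutting a word in half at its edges.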

25 Summary: Direct Info uses Oracle Text as a solid platform for an advanced full-text search solution: powerful text search capabilities; advanced results presentation features; rich features to judge the results; pluggable into existing applications.

26 Want to know more? Company: Semantec GmbH. Name: Krasen Paskalev, Armin Singer. Address: Benzstr. 32, D Herrenberg, Germany. Telephone, fax: +49(7032) (7032) (7032). Meet us here: booth C10 on the ground floor.