1
DB2 Net Search Extender Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)
2
Topics to discuss:
– Information retrieval
– Text indexing
– DB2 Text Extenders
– DB2 Net Search Extender
– References
– Questions
3
A Little Background…
Information Retrieval (IR): Extraction of “relevant” information from large volumes of data scattered across different databases. Examples: text search, image search, video search.
The efficiency (response time) of IR depends on the indexing technology used; indexing improves system performance. One example of an indexing technology is text indexing, which is used for text search.
4
A Little Background…
Text indexing: The process of deciding what will be used to represent a given document. A text index consists of significant terms extracted from the text documents, each term stored together with information about the documents that contain it. A search is then handled as a query that looks up the index.
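The idea of a text index can be illustrated with a minimal Python sketch of an inverted index. This is a teaching aid, not how DB2 stores its index; the function names `build_index` and `lookup` and the two sample documents are assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            term = raw.strip(".,;:!?")  # drop surrounding punctuation
            if term:
                index[term].add(doc_id)
    return index

def lookup(index, *terms):
    """Return the ids of documents that contain all given terms (Boolean AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "A man was running down the street.",
    2: "The cat hunts some mice.",
}
index = build_index(docs)
print(sorted(lookup(index, "cat")))            # [2]
print(sorted(lookup(index, "the", "street")))  # [1]
```

A search never scans the documents themselves: it only intersects the sets stored under each query term.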
5
A Little Background…
Text indexing (continued) involves the following:
– Parsing the documents to recognize their structure, e.g. title, date, and other fields.
– Scanning for word tokens, handling numbers, special characters, hyphenation, capitalization, etc.
– Stopword removal, based on a short list of common words such as “the”, “and”, “or”.
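The tokenizing and stopword-removal steps above can be sketched in a few lines of Python. The stopword list here is an assumption for illustration (only “the”, “and” and “or” come from the slide), and the regular expression is one simple way to keep hyphenated words together; a real indexer uses language-specific rules.

```python
import re

# Illustrative stopword list; only "the", "and", "or" are from the slide.
STOPWORDS = {"the", "and", "or", "a", "was", "some"}

def significant_terms(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    # A token is a run of letters/digits, optionally joined by hyphens.
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in stopwords]

print(significant_terms("The cat hunts some mice."))
# ['cat', 'hunts', 'mice']
```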
6
Indexing only Significant Terms
7
DB2 Extenders
– A family of IBM products that support data beyond the traditional character and numeric data types.
– Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
– Trial and beta versions are available for testing.
– Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html
8
DB2 Text Extenders
– To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
  DB2 Net Search Extender
  DB2 Text Information Extender
  DB2 Text Extender
– When to use which? Link for a comparison of the three: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html
9
DB2 Net Search Extender
Replaces DB2 Text Information Extender Version 7.2.
Some important features:
– Indexing speed of about 1 GB per hour.
– Different text formats: ASCII plain text, HTML, XML, GPP.
– Base support for 37 languages, including English, Spanish, French, Japanese and Chinese.
– Sub-second search response times.
– No decrease in search performance with up to 1000 concurrent queries per second.
10
DB2 Net Search Extender
Some text-search capabilities:
– Searches can be performed using SQL (a fourth-generation language, with queries that read almost like English).
– Searches can include:
  Boolean operations.
  Proximity search for words in the same sentence or paragraph (for HTML, XML and GPP).
  “Fuzzy” searches for words with a spelling similar to the search term, e.g. Andrew and Andru.
  Thesaurus-related search.
  Restricting the search to sections within documents.
– The user can limit the search results with a “hit count” and can also specify how the results are to be sorted.
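To give a feel for what a “fuzzy” search does, the sketch below uses Python's standard `difflib` module to find vocabulary words spelled similarly to a query term. This is only an approximation of the idea: the matching algorithm Net Search Extender actually uses is not described in these slides, and the helper name `fuzzy_lookup`, the cutoff value and the author list are assumptions.

```python
from difflib import get_close_matches

def fuzzy_lookup(term, vocabulary, cutoff=0.7):
    """Return up to three vocabulary words whose spelling is close to `term`."""
    return get_close_matches(term, vocabulary, n=3, cutoff=cutoff)

# Hypothetical vocabulary of indexed author names.
authors = ["Andrew", "Angela", "Mike"]
print(fuzzy_lookup("Andru", authors))  # ['Andrew']
```

Raising the cutoff toward 1.0 makes the match stricter; lowering it admits more distant spellings.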
11
DB2 Net Search Extender
System requirements:
– DB2 Version 8.1
– Java Runtime Environment (JRE) Version 1.3.1
Windows installation:
– Administrative rights are required.
– Call db2text start to start the DB2 Net Search Extender Instance Services.
12
DB2 Net Search Extender
Simple example with SQL queries
– The following steps are required to do a basic text search in DB2 Net Search Extender:
1. Creating a database
2. Enabling a database for text search
3. Creating a table
4. Creating a full-text index
5. Loading sample data
6. Synchronizing the text index
7. Searching with the text index
13
DB2 Net Search Extender
1. Creating a database:
   db2 "create database sample"
2. Enabling a database for text search:
   To start the Net Search Extender service:
   db2text "START"
   To prepare the database for use with DB2 Net Search Extender:
   db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"
14
DB2 Net Search Extender
3. Creating a table:
   db2 "CREATE TABLE books (isbn VARCHAR(18) NOT NULL PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
   db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"
15
DB2 Net Search Extender
5. Loading sample data:
   db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
   db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
   db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"
16
DB2 Net Search Extender
7. Searching with the text index:
   Using the CONTAINS scalar search function (the inner double quotes are escaped for the shell; the escape character may differ by shell):
   db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
   The following result table is returned:
   AUTHOR  STORY
   Mike    The cat hunts some mice.
NOTE:
– To create a text index, the text column must be one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.
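The logic of the query above can be mimicked in plain Python to show why only Mike's row is returned. This is a stand-in under simplifying assumptions: the real CONTAINS function consults the text index and supports a much richer search syntax, whereas the hypothetical `contains` helper below just performs a case-insensitive whole-word match.

```python
import re

# The two sample rows: (isbn, author, story, year).
books = [
    ("0-13-086755-1", "John", "A man was running down the street.", 2001),
    ("0-13-086755-2", "Mike", "The cat hunts some mice.", 2000),
]

def contains(text, word):
    """Return 1 if `word` occurs in `text` as a whole word, else 0."""
    pattern = rf"\b{re.escape(word)}\b"
    return 1 if re.search(pattern, text, re.IGNORECASE) else 0

# Same conditions as the WHERE clause: CONTAINS(...) = 1 AND year >= 2000.
result = [
    (author, story)
    for _, author, story, year in books
    if contains(story, "cat") == 1 and year >= 2000
]
print(result)  # [('Mike', 'The cat hunts some mice.')]
```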
17
DB2 Net Search Extender
Thesaurus support:
– A thesaurus is structured like a network of nodes linked together by relations:
  Associative relations: RELATED_TO
  Synonym relations: SYNONYM_OF
  Hierarchical relations: LOWER_THAN, HIGHER_THAN
– Creating and compiling a thesaurus:
1. Create a thesaurus definition file (explained below).
2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.
18
DB2 Net Search Extender
Create a thesaurus definition file.
– Define its content in a definition file using a text editor.
Example of some definition groups:
   :WORDS
     football
     .RELATED_TO goal
     .SYNONYM_OF soccer
   :WORDS
     chapel
     .LOWER_THAN skyscraper
     .HIGHER_THAN house
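How a thesaurus can widen a search is sketched below in Python. The relation triples copy the two definition groups from the slide; the `expand` helper and the one-hop, one-direction expansion are assumptions made for illustration and do not reflect how the compiled thesaurus dictionary works internally.

```python
# (term, relation, related term) triples taken from the definition groups.
RELATIONS = [
    ("football", "RELATED_TO", "goal"),
    ("football", "SYNONYM_OF", "soccer"),
    ("chapel", "LOWER_THAN", "skyscraper"),
    ("chapel", "HIGHER_THAN", "house"),
]

def expand(term, relation, relations=RELATIONS):
    """Return the term plus every term linked to it by `relation` (one hop)."""
    expanded = {term}
    for source, rel, target in relations:
        if source == term and rel == relation:
            expanded.add(target)
    return expanded

print(sorted(expand("football", "SYNONYM_OF")))  # ['football', 'soccer']
```

A search for “football” expanded this way would also match documents that mention only “soccer”.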
19
DB2 Net Search Extender
An example of the structure of a thesaurus:
[Figure: thesaurus network containing the terms Game, Ball Game, Tennis, Soccer and Football, linked by HIGHER_THAN and SYNONYM_OF relations.]
20
DB2 Net Search Extender
References:
– http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
– Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
– Net Search Extender Administration and User’s Guide, Version 8.1 (can be downloaded with the software)
21
Any questions?