DB2 Net Search Extender Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)

1 DB2 Net Search Extender Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)

2 Topics to discuss:
– Information retrieval
– Text indexing
– DB2 Text Extenders
– DB2 Net Search Extender
– References
– Questions

3 A Little Background…
• Information Retrieval (IR): the extraction of "relevant" information from huge volumes of data scattered across different databases. Examples: text search, image search, video search, etc.
• The efficiency (time and speed) of IR depends on the indexing technology used; indexing improves the performance of the system.
• An example of an indexing technology: text indexing, used for text search.

4 A Little Background…
• Text indexing: the process of deciding what will be used to represent a given document. A text index consists of significant terms extracted from the text documents, each term stored together with information about the document that contains it. A search is then handled as a query that looks up terms in the index.
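The idea of a text index as "terms stored with information about the documents that contain them" can be sketched as a toy inverted index. This is a minimal illustration, not how DB2 Net Search Extender stores its index; the function names build_index and lookup are hypothetical.

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build a toy inverted index: each term maps to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            term = token.strip(".,;:!?\"'")  # drop surrounding punctuation
            if term:
                index[term].add(doc_id)
    return dict(index)

def lookup(index: dict[str, set[str]], term: str) -> set[str]:
    """Answer a single-term query by looking the term up instead of scanning every document."""
    return index.get(term.lower(), set())

# The two sample rows used later in this presentation, keyed by ISBN.
docs = {
    "0-13-086755-1": "A man was running down the street.",
    "0-13-086755-2": "The cat hunts some mice.",
}
index = build_index(docs)
print(sorted(lookup(index, "cat")))  # ['0-13-086755-2']
```

The search cost is now a dictionary lookup per query term rather than a pass over every document, which is where the performance gain of indexing comes from.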

5 A Little Background…
• Text indexing (continued) involves the following steps:
– Parsing the documents to recognize their structure, e.g., title, date and other fields.
– Scanning for word tokens, handling numbers, special characters, hyphenation, capitalization, etc.
– Stopword removal, based on a short list of common words such as "the", "and", "or".
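The token-scanning and stopword-removal steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the stopword list is a hypothetical short list, only ASCII letters and digits are treated as word characters, and document-structure parsing is not modelled.

```python
import re

# Hypothetical short stopword list; real systems use larger, language-specific lists.
STOPWORDS = {"a", "an", "and", "or", "the", "was"}

def tokenize(text: str) -> list[str]:
    """Lower-case the text (capitalization) and split it into word tokens.

    Runs of ASCII letters/digits form a token; internal hyphens are kept, so
    "pre-dawn" stays one token, while other special characters act as separators.
    """
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def significant_terms(text: str) -> list[str]:
    """Tokenize, then drop stopwords, leaving the terms worth indexing."""
    return [t for t in tokenize(text) if t not in STOPWORDS]

print(significant_terms("A man was running down the street."))
# ['man', 'running', 'down', 'street']
```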

6 Indexing Only Significant Terms

7 DB2 Extenders
– A family of IBM products that add support for data beyond the traditional character and numeric data types.
– Extenders are available for images, voice, video, complex documents (full-text search), spatial objects, etc.
– Trial and beta versions are available for testing.
– Link for extenders: http://www-3.ibm.com/software/data/db2/extenders/index.html

8 DB2 Text Extenders
– To meet the increasing demands of content management, IBM has introduced three full-text retrieval applications for DB2 Universal Database (DB2 UDB):
  DB2 Net Search Extender
  DB2 Text Information Extender
  DB2 Text Extender
– When to use which? A comparison of the three: http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html

9 DB2 Net Search Extender
• Replaces DB2 Text Information Extender Version 7.2.
• Some important features:
– Indexing speed of about 1 GB per hour.
– Supported text formats: ASCII plain text, HTML, XML and GPP.
– Base support for 37 languages, including English, Spanish, French, Japanese and Chinese.
– Sub-second search response times.
– No decrease in search performance with up to 1000 concurrent queries per second.

10 DB2 Net Search Extender
• Some text-search capabilities:
– Searches are expressed in SQL (a fourth-generation language whose queries read almost like English).
– Searches can include:
  Boolean operations.
  Proximity search for words in the same sentence or paragraph (for HTML, XML and GPP documents).
  "Fuzzy" search for words with a spelling similar to the search term, e.g., Andrew and Andru.
  Thesaurus-related search.
  Restricting the search to sections within documents.
– Users can limit the search results with a "hit count" and can also specify how the results are sorted.
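Three of the capabilities listed above (Boolean operations, fuzzy matching, and a hit-count limit on sorted results) can be illustrated over a toy index. This is a sketch of the general techniques, not Net Search Extender's query syntax or matching algorithm: fuzzy matching here uses Levenshtein edit distance with a hypothetical max_distance threshold, and the index contents and document IDs are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (or match)
        prev = cur
    return prev[-1]

def fuzzy_search(index: dict[str, set[str]], term: str, max_distance: int = 2) -> set[str]:
    """Documents containing any indexed term within max_distance edits of term."""
    hits: set[str] = set()
    for indexed_term, doc_ids in index.items():
        if levenshtein(indexed_term, term.lower()) <= max_distance:
            hits |= doc_ids
    return hits

def boolean_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing all of the terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing at least one of the terms."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def top_hits(doc_ids: set[str], hit_count: int, key=None) -> list[str]:
    """Sort the results (by ID unless a key is given) and keep at most hit_count."""
    return sorted(doc_ids, key=key)[:hit_count]

# Hypothetical toy index: term -> IDs of documents containing it.
index = {
    "andrew": {"d1"},
    "andru": {"d2"},
    "cat": {"d2", "d4"},
    "mice": {"d4"},
    "street": {"d1"},
}
print(sorted(fuzzy_search(index, "Andrew")))   # ['d1', 'd2']
print(sorted(boolean_and(index, "cat", "mice")))  # ['d4']
print(top_hits(boolean_or(index, "cat", "street"), 2))  # ['d1', 'd2']
```

"Andru" matches "Andrew" because it is two edits away (substitute e with u, delete w). Lowering max_distance makes fuzzy search stricter.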

11 DB2 Net Search Extender
• System requirements:
– DB2 Version 8.1
– Java Runtime Environment (JRE) Version 1.3.1
• Windows installation:
– Administrative rights are required.
– Run db2text start to start the DB2 Net Search Extender instance services.

12 DB2 Net Search Extender
• A simple example with SQL queries
– The following steps are required for a basic text search with DB2 Net Search Extender:
  1. Creating a database
  2. Enabling the database for text search
  3. Creating a table
  4. Creating a full-text index
  5. Loading sample data
  6. Synchronizing the text index
  7. Searching with the text index

13 DB2 Net Search Extender
1. Creating a database:
   db2 "create database sample"
2. Enabling the database for text search:
   To start the Net Search Extender instance services:
   db2text "START"
   To prepare the database for use with DB2 Net Search Extender:
   db2text "ENABLE DATABASE FOR TEXT CONNECT TO sample"

14 DB2 Net Search Extender
3. Creating a table:
   db2 "CREATE TABLE books (isbn VARCHAR(18) NOT NULL PRIMARY KEY, author VARCHAR(30), story LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
   db2text "CREATE INDEX db2ext.myTextIndex FOR TEXT ON books (story) CONNECT TO sample"

15 DB2 Net Search Extender
5. Loading sample data:
   db2 "INSERT INTO books VALUES ('0-13-086755-1', 'John', 'A man was running down the street.', 2001)"
   db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
   db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT CONNECT TO sample"
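Why step 6 is needed can be shown with a toy model: a text index kept separately from the table only reflects the rows present at its last update, so rows inserted in step 5 are invisible to searches until the index is synchronized. The class ToyTextIndex and its methods are hypothetical; the sketch rebuilds the whole index on every update, a simplification that does not model how Net Search Extender actually tracks changes.

```python
import re

class ToyTextIndex:
    """Hypothetical stand-in for an external full-text index on one column.

    It only reflects the table as of the last update() call.
    """

    def __init__(self, table: dict, column: str):
        self.table = table
        self.column = column
        self.postings: dict[str, set[str]] = {}

    def update(self) -> None:
        """Rebuild all postings from the current rows (a full rebuild, for simplicity)."""
        self.postings = {}
        for key, row in self.table.items():
            for term in re.findall(r"\w+", row[self.column].lower()):
                self.postings.setdefault(term, set()).add(key)

    def search(self, term: str) -> set[str]:
        return self.postings.get(term.lower(), set())

books: dict[str, dict] = {}
index = ToyTextIndex(books, "story")  # index created on the empty table (step 4)

# Step 5: load the sample rows.
books["0-13-086755-1"] = {"author": "John", "story": "A man was running down the street.", "year": 2001}
books["0-13-086755-2"] = {"author": "Mike", "story": "The cat hunts some mice.", "year": 2000}

print(index.search("cat"))  # set(): the index has not been synchronized yet

index.update()              # step 6: synchronize
print(index.search("cat"))  # {'0-13-086755-2'}
```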

16 DB2 Net Search Extender
7. Searching with the text index, using the CONTAINS scalar search function:
   db2 "SELECT author, story FROM books WHERE CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
   The following result table is returned:
   AUTHOR   STORY
   Mike     The cat hunts some mice.
• NOTE:
– To create a text index, the text column must be of one of the following data types: CHAR, VARCHAR, LONG VARCHAR, CLOB.
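The semantics of the query above (CONTAINS returns 1 when the column contains the search term, and is combined with an ordinary SQL predicate) can be mimicked with Python's built-in sqlite3 module by registering a user-defined function. This is only an analogy: text_contains is a hypothetical function that scans each row, whereas Net Search Extender answers CONTAINS from its text index and supports a much richer search syntax.

```python
import re
import sqlite3

def text_contains(column_value, search_term):
    """Return 1 if search_term occurs as a whole word in column_value, else 0.

    Hypothetical stand-in for CONTAINS: a plain word match, no Boolean,
    fuzzy, proximity or thesaurus options, and no index.
    """
    if column_value is None:
        return 0
    words = re.findall(r"\w+", column_value.lower())
    return int(search_term.lower() in words)

conn = sqlite3.connect(":memory:")
conn.create_function("text_contains", 2, text_contains)
conn.execute("CREATE TABLE books (isbn VARCHAR(18) NOT NULL PRIMARY KEY, "
             "author VARCHAR(30), story LONG VARCHAR, year INTEGER)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?)", [
    ("0-13-086755-1", "John", "A man was running down the street.", 2001),
    ("0-13-086755-2", "Mike", "The cat hunts some mice.", 2000),
])
rows = conn.execute("SELECT author, story FROM books "
                    "WHERE text_contains(story, 'cat') = 1 AND year >= 2000").fetchall()
print(rows)  # [('Mike', 'The cat hunts some mice.')]
```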

17 DB2 Net Search Extender
• Thesaurus support:
– A thesaurus is structured like a network of nodes linked together by relations:
  Associative relations: RELATED_TO
  Synonym relations: SYNONYM_OF
  Hierarchical relations: LOWER_THAN, HIGHER_THAN
– Creating and compiling a thesaurus:
  1. Create a thesaurus definition file (explained below).
  2. Compile the definition file into a thesaurus dictionary using the DB2EXTTH utility.

18 DB2 Net Search Extender
• Create a thesaurus definition file:
– Define its content in a definition file using a text editor. Examples of some definition groups:
  :WORDS
  football
  .RELATED_TO goal
  .SYNONYM_OF soccer
  :WORDS
  chapel
  .LOWER_THAN skyscraper
  .HIGHER_THAN house

19 DB2 Net Search Extender
• An example of the structure of a thesaurus:
  [Figure: a thesaurus network with the nodes Game, Ball Game, Tennis, Soccer and Football, linked by HIGHER_THAN and SYNONYM_OF relations.]
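A thesaurus network like the one in the figure can be modelled as a graph whose edges carry relation names, and a search term can then be expanded along a chosen relation. The sketch below rests on an assumed reading of the figure (Game HIGHER_THAN Ball Game; Ball Game HIGHER_THAN Tennis and Soccer; Soccer SYNONYM_OF Football), treats "X HIGHER_THAN Y" as "X is broader than Y", and uses hypothetical function names; it is not the format produced by DB2EXTTH.

```python
from collections import defaultdict

# Assumed reading of the figure: (source term, relation, target term).
RELATIONS = [
    ("game", "HIGHER_THAN", "ball game"),
    ("ball game", "HIGHER_THAN", "tennis"),
    ("ball game", "HIGHER_THAN", "soccer"),
    ("soccer", "SYNONYM_OF", "football"),
]

# Each relation seen from the other end of the edge.
INVERSE = {"HIGHER_THAN": "LOWER_THAN", "LOWER_THAN": "HIGHER_THAN",
           "SYNONYM_OF": "SYNONYM_OF", "RELATED_TO": "RELATED_TO"}

def build_graph(relations):
    """Store every relation in both directions so it can be followed from either term."""
    graph = defaultdict(list)
    for source, rel, target in relations:
        graph[source].append((rel, target))
        graph[target].append((INVERSE[rel], source))
    return graph

def expand(graph, term, relation, depth=1):
    """Collect terms reachable from term by following relation up to depth hops."""
    found, frontier = set(), {term}
    for _ in range(depth):
        next_frontier = set()
        for t in frontier:
            for rel, other in graph.get(t, []):
                if rel == relation and other != term and other not in found:
                    next_frontier.add(other)
        found |= next_frontier
        frontier = next_frontier
    return found

graph = build_graph(RELATIONS)
print(expand(graph, "soccer", "SYNONYM_OF"))                  # {'football'}
print(sorted(expand(graph, "game", "HIGHER_THAN", depth=2)))  # ['ball game', 'soccer', 'tennis']
```

A thesaurus-related search for "soccer" could then be run as a Boolean OR over {"soccer"} plus its expanded synonyms.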

20 DB2 Net Search Extender
• References:
– http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
– Information retrieval site containing good lecture slides: http://ciir.cs.umass.edu/cmpsci646/
– Net Search Extender Administration and User's Guide, Version 8.1 (can be downloaded with the software)

21 Any questions?
