By Chung-Hong Lee (李俊宏), Assistant Professor, Dept. of Information Management, Chang Jung Christian University. Integration of Database and Information Retrieval Systems: A Development Study of a Document Database System.


1 Integration of Database and Information Retrieval Systems: A Development Study of a Document Database System, by Chung-Hong Lee (李俊宏), Assistant Professor, Dept. of Information Management, Chang Jung Christian University

2 AGENDA
- Introduction
- Comparison of the DBMS and IR approaches for document retrieval
- Proposed signature-based IR technique
- System architecture
- Integration method
- Conclusions

3 A Document Retrieval Model
[Diagram] Query side: User query -> Query formulation -> Signature of query
[Diagram] Document side: Document collections -> Indexing -> Signature of documents
[Diagram] Both signatures feed Matching of similarity -> Results of retrieval -> Relevance feedback
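The flow in this model can be sketched in a few lines of Python. This is a minimal illustration, not the system described in the talk: the 64-bit signature width, the MD5-based bit selection, the whitespace tokenizer, and the names `term_bits`, `signature`, `similarity`, and `retrieve` are all assumptions made for the example. Relevance feedback is not modeled.

```python
import hashlib

SIG_BITS = 64  # assumed signature width, chosen only for illustration


def term_bits(term, k=2):
    """Pick k bit positions for a term (superimposed coding, assumed scheme)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return [int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
            for i in range(k)]


def signature(text):
    """Indexing step: OR the bits of every term into one signature."""
    sig = 0
    for term in text.lower().split():
        for pos in term_bits(term):
            sig |= 1 << pos
    return sig


def similarity(query_sig, doc_sig):
    """Matching step: fraction of the query's set bits also set in the document."""
    q_bits = bin(query_sig).count("1")
    if q_bits == 0:
        return 0.0
    return bin(query_sig & doc_sig).count("1") / q_bits


def retrieve(query, doc_sigs, threshold=1.0):
    """Return (score, doc_id) pairs at or above the threshold, best first."""
    q = signature(query)
    scored = [(similarity(q, s), doc_id) for doc_id, s in doc_sigs.items()]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


docs = {"a": "object database systems", "b": "signature file retrieval"}
doc_sigs = {doc_id: signature(text) for doc_id, text in docs.items()}
results = retrieve("signature retrieval", doc_sigs)
```

A document that contains every query term always scores 1.0, so it is never missed. A document that does not contain them can still reach the threshold by chance (a false drop), which is why signature matching is normally followed by a check against the document itself.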

4 Convergence of Information Retrieval and Database
Information Retrieval: proprietary; application dependent; rich search capability
Database: standardization; application independent; limited search capability
Convergence: standardization; application independent; powerful modeling capability; powerful search capability; rich application development tools

5 Related work
- A text-search extension to the ORION OODBMS developed by Lee (1991).
- The integration of the INQUERY text retrieval system and the IRIS OODBMS proposed by Croft (1992).
- Mapping SGML document structures into the OODBMS's data models: Christophides (1994), Macleod (1995), Volz (1996), etc.
Unlike those of the above efforts that aim to model only SGML documents in a DBMS, our system is aimed at handling heterogeneous types of documents, such as textual and multimedia documents, and at providing content-based retrieval functions for the stored document objects.

6 Why OODBMS?
The core features of OODBMS supported by most such systems are:
1. Complex objects
2. Object identity
3. Encapsulation
4. Types and classes
5. Class or type inheritance
6. Overriding, overloading and late binding
7. Computational completeness

7 Signature file approach

8 Concept of the scalable signature file approach
- Document signatures are generated from the Chinese characters the documents contain.
- Each document signature is divided into two segments: the first segment represents the occurrence of commonly used Chinese characters, while the second segment represents the occurrence of the remaining Chinese characters and of English character bigrams.
- The signature size can be adjusted according to the average document length.
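The two-segment idea can be illustrated with a short Python sketch. Everything concrete here is an assumption for the example, not the method from the talk: the ten-character `COMMON_CHARS` list, one dedicated bit per common character, CRC32 hashing for the second segment, and the sizing rule in `second_segment_bits` are all hypothetical.

```python
import zlib

# Hypothetical list of commonly used Chinese characters; a real system
# would derive such a list from corpus statistics.
COMMON_CHARS = "的一是不了人我在有他"


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def second_segment_bits(avg_doc_length, bits_per_char=4, minimum=64):
    """Assumed sizing rule: grow the hashed segment with average document length."""
    return max(minimum, avg_doc_length * bits_per_char)


def features(text):
    """Yield each Chinese character, then each English character bigram."""
    for ch in text:
        if is_cjk(ch):
            yield ch
    word = []
    for ch in text.lower() + " ":
        if ch.isascii() and ch.isalpha():
            word.append(ch)
        else:
            for a, b in zip(word, word[1:]):
                yield a + b
            word = []


def make_signature(text, seg2_bits):
    """Return (segment 1, segment 2) as two integers used as bit vectors."""
    seg1 = 0  # one dedicated bit per common character
    seg2 = 0  # hashed bits for all other characters and bigrams
    for feat in features(text):
        if len(feat) == 1 and feat in COMMON_CHARS:
            seg1 |= 1 << COMMON_CHARS.index(feat)
        else:
            seg2 |= 1 << (zlib.crc32(feat.encode("utf-8")) % seg2_bits)
    return seg1, seg2


def may_contain(doc_sig, query_sig):
    """True when every query bit is also set in the document signature."""
    return all(q & d == q for q, d in zip(query_sig, doc_sig))


m = second_segment_bits(30)
doc_sig = make_signature("資料庫的整合 database", m)
query_sig = make_signature("資料庫", m)
```

Giving frequent characters their own bits keeps them from saturating the hashed segment, which is one plausible reason for splitting the signature; the talk itself does not state the motivation.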

9 System Architecture (1)
[Diagram labels: GUI; document input; interface for full-text search; OODBMS; retrieved document objects]

10 System Architecture (2)
[Diagram labels: IR queries (word, term, phrase); OQL queries; Search engine; Signature file; OODBMS; Search Engine & OODB; retrieved document objects]
Key features:
- Two-stage search
- Both IR and OQL queries are available
- Signature file as a preprocessor for IR queries
- Documents are stored as BLOB objects in the OODBMS
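A two-stage search of this kind can be sketched as follows. This is an illustration under stated assumptions: the `DocumentStore` class is a hypothetical in-memory stand-in for the OODBMS, documents are kept as UTF-8 bytes to mimic BLOB storage, and the 128-bit bigram signature in `sig` is chosen only for the example.

```python
import zlib

SIG_BITS = 128  # assumed signature width


def sig(text):
    """Hash every character bigram of the lowercased text into a bit vector."""
    s = 0
    t = text.lower()
    for i in range(len(t) - 1):
        s |= 1 << (zlib.crc32(t[i:i + 2].encode("utf-8")) % SIG_BITS)
    return s


class DocumentStore:
    """Hypothetical stand-in for the OODBMS: BLOBs keyed by object id."""

    def __init__(self):
        self.blobs = {}
        self.signatures = {}

    def add(self, oid, text):
        self.blobs[oid] = text.encode("utf-8")
        self.signatures[oid] = sig(text)

    def search(self, phrase):
        q = sig(phrase)
        # Stage 1: the signature file filters out documents that cannot match.
        candidates = [oid for oid, s in self.signatures.items() if s & q == q]
        # Stage 2: fetch only the candidates and verify against the stored
        # text, which removes false drops left by the filter.
        needle = phrase.lower()
        hits = [oid for oid in candidates
                if needle in self.blobs[oid].decode("utf-8").lower()]
        return candidates, hits


store = DocumentStore()
store.add(1, "Signature files speed up text retrieval")
store.add(2, "Object databases store complex objects")
candidates, hits = store.search("text retrieval")
```

Because every bigram of a matching phrase also occurs in the document, stage 1 never discards a true match; it can only let extra candidates through, and stage 2 pays the cost of reading a BLOB just for those.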

11 Signature file as a pre-processor of the database queries

12 Query text processing
How the system formulates the query: the system incrementally transforms quasi-natural-language queries into complex structured queries in the query language.
Goal: free-format queries
Related techniques:
- Key term extraction from the queries
- IR-queries-to-OQL conversion
- Query optimization
- User interface
- NLP
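Key term extraction followed by IR-to-OQL conversion might look like the sketch below. The stopword list, the `Documents` extent, and the `contains` method are hypothetical; the talk does not show the actual schema or the conversion rules, so this only illustrates the general shape of the step.

```python
# Hypothetical stopword list for quasi-natural-language queries.
STOPWORDS = {"find", "show", "me", "all", "the", "a", "an", "of", "on",
             "and", "about", "that", "mention", "documents"}


def extract_key_terms(query):
    """Lowercase, strip punctuation, drop stopwords, keep first occurrences."""
    terms = []
    for raw in query.lower().split():
        word = raw.strip(".,;:?!\"'")
        if word and word not in STOPWORDS and word not in terms:
            terms.append(word)
    return terms


def quote(term):
    """Wrap a term in double quotes, escaping any embedded double quote."""
    return '"' + term.replace('"', '\\"') + '"'


def to_oql(terms, extent="Documents", method="contains"):
    """Build an OQL-style select over an assumed extent and text-search method."""
    if not terms:
        return f"select d from d in {extent}"
    conditions = " and ".join(f"d.{method}({quote(t)})" for t in terms)
    return f"select d from d in {extent} where {conditions}"


terms = extract_key_terms("Find all documents about signature files")
oql = to_oql(terms)
```

Here `oql` is a single conjunctive query; an incremental transformation as described on the slide would presumably refine such a query step by step, which this sketch does not attempt.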

13 Conclusions
The distinctive features of the system developed:
- IR-OODBMS integration
  - OODBMS-based document repository
  - a loose coupling approach
  - signature file filter as a preprocessor for query processing
  - two-stage search
  - a novel query model
  - easy to maintain, including the signature file and database schema
- Signature generation
  - a character-based signature method designed for Chinese and English documents
- Applicable to a digital library infrastructure

