1 IR Project
黃楹芸 90522017
孫怡明 90522026
2 Reference Collections
The TREC Collection
  Built under the TIPSTER program
  Documents from all sub-collections are tagged with SGML to allow easy parsing.
FBIS (Foreign Broadcast Information Service)
  Size: 470 MB
  Number of documents: 130,471
  Words per document (median): 322
  Words per document (mean): 543.6
3 Document Parsing: sample document
<DOC>
FBIS3-50
"cr00000015994001"
<HEADER>
23 March 1994
Article Type:FBIS
Document Type:FOREIGN MEDIA NOTE--FB PN 94-036--JAPAN
JAPAN: SPOTLIGHT ON JAPAN ASSOCIATION OF DEFENSE INDUSTRY
</HEADER>
<TEXT>
The Japan Association of Defense Industry (JADI), existing in its present form since 1988 and tracing its origin back to 1951, is an industry association under the supervision of the Ministry of International Trade and Industry (MITI) and the Japan Defense Agency (JDA). JADI promotes the development of Japanese defense technology and equipment, monitors foreign technology, lobbies on behalf of its corporate members for government defense spending, and cooperates with the government on export controls.
(AUTHOR: MERCADO. QUESTIONS AND/OR COMMENTS, PLEASE CALL CHIEF,
</TEXT>
</DOC>
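A minimal sketch of how a document in this layout could be split into its ID, header, and text. It is written in Python for brevity and is not the authors' Java code. The function name `parse_documents` and the shortened `SAMPLE` string are illustrative assumptions, and the rule "the first token after <DOC> is the document ID" is inferred from how the sample appears on the slide; the original TREC files may mark the ID differently.

```python
import re

# Non-greedy patterns so that each <DOC> ... </DOC> pair is matched separately.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
HEADER_RE = re.compile(r"<HEADER>(.*?)</HEADER>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_documents(sgml):
    """Return one dict per <DOC> element with its id, header, and text."""
    docs = []
    for match in DOC_RE.finditer(sgml):
        body = match.group(1)
        # Assumption: the first token before <HEADER> is the document ID.
        preamble = body.split("<HEADER>", 1)[0].split()
        header = HEADER_RE.search(body)
        text = TEXT_RE.search(body)
        docs.append({
            "id": preamble[0] if preamble else None,
            # Collapse runs of whitespace and newlines into single spaces.
            "header": " ".join(header.group(1).split()) if header else "",
            "text": " ".join(text.group(1).split()) if text else "",
        })
    return docs


# Shortened version of the sample document, for illustration only.
SAMPLE = """<DOC> FBIS3-50 "cr00000015994001"
<HEADER> 23 March 1994 Article Type:FBIS </HEADER>
<TEXT> The Japan Association of Defense Industry (JADI) promotes
the development of Japanese defense technology. </TEXT></DOC>"""

if __name__ == "__main__":
    for doc in parse_documents(SAMPLE):
        print(doc["id"], "|", doc["header"], "|", doc["text"])
```

Regular expressions are enough here because the tags are flat and never nested; a full SGML parser would only be needed for more irregular markup.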
4 Document Parsing
Process each document:
  Extract the document ID
  Segment the text into tokens (in our case, split the text on white space and newlines)
  Case conversion (make all tokens lowercase)
  Discard stopwords and other non-content words (e.g. numbers)
  Word stemming
  Count term frequencies and record positions
  Update indices
  Write the index out to file in alphabetical order, from a to z
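The steps above can be sketched as follows, in Python rather than the project's Java. Everything named here is an assumption made for illustration: the tiny `STOPWORDS` set, the crude suffix-stripping `stem` function (a stand-in for whatever stemmer the project used), the stripping of surrounding punctuation, and the tab-separated output format are not taken from the slides.

```python
import string
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a longer one.
STOPWORDS = {"the", "of", "and", "in", "its", "is", "an", "to", "a", "on", "for"}


def stem(token):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def new_index():
    """term -> doc_id -> list of token positions."""
    return defaultdict(lambda: defaultdict(list))


def index_document(doc_id, text, index):
    """Tokenize on white space, normalize, filter, stem, and record positions."""
    for position, raw in enumerate(text.split()):
        token = raw.lower().strip(string.punctuation)
        # Drop empty tokens, numbers, mixed tokens, and stopwords.
        if not token.isalpha() or token in STOPWORDS:
            continue
        index[stem(token)][doc_id].append(position)


def format_index(index):
    """One line per term in alphabetical order: term<TAB>doc:tf:pos,pos;..."""
    lines = []
    for term in sorted(index):
        postings = ";".join(
            f"{doc_id}:{len(positions)}:{','.join(map(str, positions))}"
            for doc_id, positions in sorted(index[term].items())
        )
        lines.append(f"{term}\t{postings}")
    return lines


if __name__ == "__main__":
    idx = new_index()
    index_document("FBIS3-50", "JADI promotes the development of Japanese "
                   "defense technology in 1994.", idx)
    print("\n".join(format_index(idx)))
```

The term frequency is simply the length of each position list, so counting and position recording share one data structure.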
5 Project Introduction
Platform
  a. CPU: Celeron 450 MHz
  b. RAM: 256 MB
  c. Operating system: Windows 2000 Server
  d. Implementation: Java + JDBC
  e. Data storage: SQL Server 2000
Indexing method
  Inverted indexing
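To illustrate what an inverted index held in a relational database can look like, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server 2000 accessed through JDBC. The `postings` table, its columns, the function names, and the document ID FBIS3-51 are all hypothetical; the slides do not show the project's actual schema.

```python
import sqlite3


def create_index(conn):
    """Create a hypothetical postings table: one row per (term, document)."""
    conn.execute(
        "CREATE TABLE postings ("
        " term TEXT NOT NULL,"
        " doc_id TEXT NOT NULL,"
        " tf INTEGER NOT NULL,"
        " PRIMARY KEY (term, doc_id))"
    )


def add_document(conn, doc_id, terms):
    """Count term frequencies for one document and insert its postings."""
    counts = {}
    for term in terms:
        counts[term] = counts.get(term, 0) + 1
    conn.executemany(
        "INSERT INTO postings (term, doc_id, tf) VALUES (?, ?, ?)",
        [(term, doc_id, tf) for term, tf in counts.items()],
    )


def search(conn, term):
    """Return (doc_id, tf) pairs for a term, highest frequency first."""
    cursor = conn.execute(
        "SELECT doc_id, tf FROM postings WHERE term = ? "
        "ORDER BY tf DESC, doc_id",
        (term,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    add_document(conn, "FBIS3-50", ["defense", "industry", "defense"])
    add_document(conn, "FBIS3-51", ["industry"])
    print(search(conn, "defense"))
```

With the term as the leading column of the primary key, a single-term lookup becomes a key-range scan rather than a scan over every posting.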
6 System Architecture
7 Implementation
Our user interface
  http://140.115.156.81/IR/
Indexing time
  120 to 140 seconds per file
  Total: about 16 hours
Searching time
  "Information": 13,999 records, about 15 seconds
  "mobilize": 866 records, about 3 seconds
Index file size
  850 MB
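As a rough back-of-the-envelope check on these figures, the sketch below derives two quantities the slides do not state directly: the number of source files implied by the indexing times, and the search throughput in records per second. It assumes the 16-hour total is simply the per-file time multiplied by the number of files; the function names are illustrative.

```python
def estimated_file_count(total_hours, sec_per_file_fast, sec_per_file_slow):
    """Return (fewest, most) files consistent with the reported timings."""
    total_seconds = total_hours * 3600
    # Slower per-file time means fewer files fit in the total, and vice versa.
    return total_seconds / sec_per_file_slow, total_seconds / sec_per_file_fast


def records_per_second(records, seconds):
    """Average number of matching records returned per second of search time."""
    return records / seconds


if __name__ == "__main__":
    fewest, most = estimated_file_count(16, 120, 140)
    print(f"Implied file count: about {fewest:.0f} to {most:.0f}")
    print(f"'Information': {records_per_second(13999, 15):.0f} records/s")
    print(f"'mobilize': {records_per_second(866, 3):.0f} records/s")
```

Under that assumption the timings imply roughly 411 to 480 files, and the more frequent term is returned at a higher per-record rate (about 933 records per second against about 289), which would be consistent with a fixed per-query overhead.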