
1 IR Project 90522017 黃楹芸 90522017 孫怡明 90522026

2 Reference Collections
The TREC Collection
- Built under the TIPSTER program
- Documents from all sub-collections are tagged with SGML to allow easy parsing
- FBIS (Foreign Broadcast Information Service)
  - Size: 470 MB
  - Number of documents: 130,471
  - Words/doc (median): 322
  - Words/doc (mean): 543.6

3 Document Parsing: sample document
<DOC>
FBIS3-50
"cr00000015994001"
<HEADER>
23 March 1994
Article Type:FBIS
Document Type:FOREIGN MEDIA NOTE--FB PN 94-036--JAPAN
JAPAN: SPOTLIGHT ON JAPAN ASSOCIATION OF DEFENSE INDUSTRY
</HEADER>
<TEXT>
The Japan Association of Defense Industry (JADI), existing in its present form since 1988 and tracing its origin back to 1951, is an industry association under the supervision of the Ministry of International Trade and Industry (MITI) and the Japan Defense Agency (JDA). JADI promotes the development of Japanese defense technology and equipment, monitors foreign technology, lobbies on behalf of its corporate members for government defense spending, and cooperates with the government on export controls.
(AUTHOR: MERCADO. QUESTIONS AND/OR COMMENTS, PLEASE CALL CHIEF,
</TEXT>
</DOC>
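The sample above suggests how a collection file can be split into individual documents before parsing. The Java sketch below assumes that each record carries its ID inside <DOCNO> tags, as in the TREC distribution files (the tags around "FBIS3-50" were lost in this transcript), and that a regular-expression pass is acceptable in place of a full SGML parser. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits a TREC collection file into (ID, text) pairs with regular expressions. */
final class TrecSplitter {

    /** One parsed document: its identifier and the body between the TEXT tags. */
    static final class TrecDoc {
        final String id;
        final String text;

        TrecDoc(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // DOTALL lets "." cross newlines; the reluctant ".*?" stops at the first closing tag.
    private static final Pattern DOC = Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);
    // Assumption: the ID sits inside <DOCNO> tags, as in the TREC distribution files.
    private static final Pattern DOCNO = Pattern.compile("<DOCNO>\\s*(.*?)\\s*</DOCNO>", Pattern.DOTALL);
    private static final Pattern TEXT = Pattern.compile("<TEXT>(.*?)</TEXT>", Pattern.DOTALL);

    static List<TrecDoc> split(String file) {
        List<TrecDoc> docs = new ArrayList<>();
        Matcher doc = DOC.matcher(file);
        while (doc.find()) {
            String body = doc.group(1);
            Matcher id = DOCNO.matcher(body);
            if (!id.find()) {
                continue; // without an ID the document cannot be indexed
            }
            Matcher text = TEXT.matcher(body);
            String content = text.find() ? text.group(1).trim() : "";
            docs.add(new TrecDoc(id.group(1), content));
        }
        return docs;
    }

    public static void main(String[] args) {
        String sample = "<DOC>\n<DOCNO> FBIS3-50 </DOCNO>\n<HEADER>\n23 March 1994\n</HEADER>\n"
                + "<TEXT>\nThe Japan Association of Defense Industry (JADI) ...\n</TEXT>\n</DOC>\n";
        for (TrecDoc d : split(sample)) {
            System.out.println(d.id + " -> " + d.text);
        }
    }
}
```

The header is ignored here because only the ID and the TEXT body feed the later parsing steps.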

4 Document Parsing
Process each document:
- Extract the document ID
- Segment the text into tokens (in our case, split the text on whitespace and newlines)
- Case conversion (make all tokens lowercase)
- Discard stopwords and other non-content words (e.g. numbers)
- Word stemming
- Count term frequencies and record positions
- Update the indices
Write out the index to file in alphabetical order, from a to z.
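The per-document steps above might look like the following Java sketch. Several pieces are assumptions for illustration: the tiny stopword list, a crude suffix-stripping stemmer standing in for a real one (for example, the Porter stemmer), and an extra step that trims punctuation from token edges, since splitting on whitespace alone leaves tokens such as "equipment,". A TreeMap keeps terms in a-to-z order, and the length of each term's position list is its term frequency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Turns one document's text into a term -> positions map (term frequency = list size). */
final class DocumentTokenizer {

    // Hypothetical, deliberately tiny stopword list; a real run would load a full one.
    private static final Set<String> STOPWORDS =
            Set.of("a", "an", "and", "are", "as", "by", "for", "in", "is", "of", "on", "the", "to");

    /** Placeholder stemmer (NOT Porter): strips one common English suffix. */
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    static Map<String, List<Integer>> termPositions(String text) {
        Map<String, List<Integer>> terms = new TreeMap<>(); // sorted a-z for the write-out step
        String[] tokens = text.split("\\s+");               // whitespace and newlines
        for (int pos = 0; pos < tokens.length; pos++) {
            // Lowercase, then trim punctuation from both ends (an added assumption).
            String word = tokens[pos].toLowerCase(Locale.ROOT)
                    .replaceAll("^[^a-z0-9]+|[^a-z0-9]+$", "");
            // Drop empty tokens, stopwords, and pure numbers such as "1988" or "94-036".
            if (word.isEmpty() || STOPWORDS.contains(word) || word.matches("[0-9.,-]+")) {
                continue;
            }
            terms.computeIfAbsent(stem(word), k -> new ArrayList<>()).add(pos);
        }
        return terms;
    }

    public static void main(String[] args) {
        String text = "JADI promotes the development of Japanese defense technology "
                + "and equipment, monitors foreign technology, and cooperates on export controls.";
        termPositions(text).forEach((term, positions) ->
                System.out.println(term + " tf=" + positions.size() + " positions=" + positions));
    }
}
```

Positions are counted over the raw whitespace-separated tokens, before stopwords are dropped, so distances between terms still reflect the original text.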

5 Project Introduction
Platform:
a. CPU: Celeron 450 MHz
b. RAM: 256 MB
c. Operating system: Windows 2000 Server
d. Implementation: Java + JDBC
e. Data storage: SQL Server 2000
Indexing method used:
- Inverted indexing
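An inverted index maps each term to the documents that contain it. The in-memory Java sketch below merges per-document term positions into postings and writes one line per term in alphabetical order. The class names and the tab-separated line format are hypothetical; the project itself stored its data in SQL Server 2000 through JDBC, which this sketch does not model.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory inverted index: term -> postings (document ID plus positions). */
final class InvertedIndex {

    /** One posting: a document containing the term and where the term occurs in it. */
    static final class Posting {
        final String docId;
        final List<Integer> positions;

        Posting(String docId, List<Integer> positions) {
            this.docId = docId;
            this.positions = positions;
        }

        int tf() {
            return positions.size();
        }
    }

    // TreeMap keeps terms sorted a-z, so the write-out needs no separate sort.
    private final Map<String, List<Posting>> postings = new TreeMap<>();

    /** Merges one document's term -> positions map into the index. */
    void addDocument(String docId, Map<String, List<Integer>> termPositions) {
        termPositions.forEach((term, positions) ->
                postings.computeIfAbsent(term, k -> new ArrayList<>())
                        .add(new Posting(docId, new ArrayList<>(positions))));
    }

    List<Posting> postingsFor(String term) {
        return postings.getOrDefault(term, List.of());
    }

    /** Writes "term, df, docId:tf:positions..." per line (a hypothetical format). */
    void writeTo(Writer out) throws IOException {
        for (Map.Entry<String, List<Posting>> e : postings.entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey()).append('\t').append(e.getValue().size());
            for (Posting p : e.getValue()) {
                line.append('\t').append(p.docId).append(':').append(p.tf()).append(':');
                for (int i = 0; i < p.positions.size(); i++) {
                    if (i > 0) {
                        line.append(',');
                    }
                    line.append(p.positions.get(i));
                }
            }
            out.write(line.append('\n').toString());
        }
    }

    public static void main(String[] args) throws IOException {
        InvertedIndex index = new InvertedIndex();
        index.addDocument("D1", Map.of("defense", List.of(6), "technology", List.of(7, 12)));
        index.addDocument("D2", Map.of("technology", List.of(3)));
        StringWriter out = new StringWriter();
        index.writeTo(out);
        System.out.print(out);
    }
}
```

Postings for a term stay in the order documents were added, which is also document order when the collection is read sequentially.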

6 System Architecture

7 Implementation
Our user interface: http://140.115.156.81/IR/
Indexing time:
- 120 sec to 140 sec per file
- About 16 hours in total
Searching time:
- "Information": 13,999 records, about 15 sec
- "mobilize": 866 records, about 3 sec
Index file size: 850 MB
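Answering a query such as "mobilize" amounts to looking up that term's postings. The Java sketch below is hypothetical: it uses a made-up in-memory index (term to document ID to term frequency), assumes single-term queries, and ranks hits by term frequency. It shows only lowercasing of the query, although a real query would need the same stopword and stemming treatment as the documents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Answers a single-term query from an in-memory index: term -> (document ID -> tf). */
final class SingleTermSearch {

    static List<String> search(Map<String, Map<String, Integer>> index, String query) {
        // Queries need the same normalization as documents; only lowercasing is shown
        // here (stopword removal and stemming are omitted in this sketch).
        String term = query.trim().toLowerCase(Locale.ROOT);
        Map<String, Integer> postings = index.getOrDefault(term, Map.of());
        List<Map.Entry<String, Integer>> hits = new ArrayList<>(postings.entrySet());
        // Highest term frequency first; ties broken by document ID for a stable order.
        hits.sort((a, b) -> a.getValue().equals(b.getValue())
                ? a.getKey().compareTo(b.getKey())
                : Integer.compare(b.getValue(), a.getValue()));
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Integer> hit : hits) {
            ids.add(hit.getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up postings for illustration only.
        Map<String, Map<String, Integer>> index = Map.of(
                "mobilize", Map.of("D1", 1, "D2", 3, "D3", 1),
                "information", Map.of("D1", 2));
        System.out.println(search(index, "Mobilize")); // prints [D2, D1, D3]
    }
}
```

Lookup cost grows with the length of the term's postings list, which is consistent with the common term "Information" taking longer than the rarer "mobilize".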

