Implementation and Analysis of an FTP-specific Full-text Search System (適用於FTP之全文檢索系統實作與分析)
  I-Shou University, Department of Information Engineering (義守大學資訊工程學系)
  Authors: 郭東黌, 張佑康
  Presenter: 徐碩利
  Date: 2006/11/01
  2018/2/15
Outline
  Introduction
  Proposed architecture
  Experimental results and analysis
  Conclusion
Introduction
  FTP sites with a large number of files
  Filename search
  Full-text search
Problem
  Indexing a large amount of data
  Improving search performance
  Refining the ranking of results
Purpose
  FTP-specific full-text search system
  Performance benchmark
  Precision evaluation
Current systems
  Google
    Architecture proposed in 1998
    Large-scale web search engine
    Inverted index scheme
  Gais
  FTPLocate
  SmartArchie
    An intelligent file search engine
    ProxyLog-based file search engine
Proposed architecture
  Data indexing (data crawler)
    File list retrieval and analysis
    File description extraction from FTP
    File name and description indexing
  Query processing
    Query pre-parsing
    Data searching
    Result sorting
Data indexing (data crawler)
  File list retrieval and analysis
    File lists can be retrieved in several ways
      FTP: LIST
      Rsync
      Unix: find, ls
  File description extraction
    Specific formats of file descriptions
      XML Package Metadata
      RFC-index
      00_index
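The file list retrieval step can be illustrated with a minimal sketch that parses one line of an FTP LIST response. The LIST output format is not standardized, so this assumes the common Unix `ls -l` layout; `FileEntry` and `parse_list_line` are hypothetical names chosen for illustration and are not taken from the presented system.

```python
import re
from typing import NamedTuple, Optional


class FileEntry(NamedTuple):
    """One parsed listing line (hypothetical record type)."""
    is_dir: bool
    size: int
    name: str


# Assumed Unix-style listing line, for example:
# "-rw-r--r--   1 ftp  ftp   1048576 Nov 01  2006 rfc-index.txt"
_LIST_RE = re.compile(
    r"^(?P<perm>[-dl][rwxsStT-]{9})\s+"   # type flag and permission bits
    r"\d+\s+\S+\s+\S+\s+"                 # link count, owner, group
    r"(?P<size>\d+)\s+"                   # size in bytes
    r"\w{3}\s+\d{1,2}\s+[\d:]{4,5}\s+"    # month, day, year or hh:mm
    r"(?P<name>.+)$"                      # file name (may contain spaces)
)


def parse_list_line(line: str) -> Optional[FileEntry]:
    """Return a FileEntry, or None for lines that do not match (e.g. 'total 8')."""
    m = _LIST_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    name = m.group("name")
    # Symbolic links are listed as "name -> target"; keep only the name.
    if m.group("perm").startswith("l") and " -> " in name:
        name = name.split(" -> ", 1)[0]
    return FileEntry(m.group("perm").startswith("d"), int(m.group("size")), name)
```

Lines that do not match are skipped rather than guessed at, since servers with other listing styles would need their own pattern.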
Data indexing (data crawler)
  File name and description indexing
    Simple string segmentation method
      Splitting runs of Chinese text into individual Chinese characters
      Splitting English words at spaces or symbols
    Full-text indexing
      Based on a modified version of the MySQL full-text parser
      Multi-language full-text index support
      Stopwords
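The simple string segmentation method can be sketched as follows. This is an illustration only, not the modified MySQL parser: the character range used for Chinese, the lowercasing step, and the `STOPWORDS` set are all assumptions made for the example.

```python
import re
from typing import List

# One token per CJK ideograph, or one token per run of ASCII letters/digits.
# Everything else (spaces, dots, hyphens, ...) acts as a separator.
_TOKEN_RE = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and"}


def segment(text: str) -> List[str]:
    """Split text into index terms and drop stopwords."""
    tokens = [t.lower() for t in _TOKEN_RE.findall(text)]
    return [t for t in tokens if t not in STOPWORDS]


print(segment("linux-2.4.25.tar.gz 核心原始碼"))
# ['linux', '2', '4', '25', 'tar', 'gz', '核', '心', '原', '始', '碼']
```

Indexing Chinese one character at a time needs no dictionary when the index is built, which leaves word-level segmentation to the query side.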
Query processing
  Query pre-parsing
    Chinese segmentation with exhaustive search using the Libtabe dictionary
  Term frequency clustering
    Using the k-means algorithm
  Score computation
    Group A is required, Group B is optional
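Exhaustive dictionary-based segmentation of a query can be sketched as below. The sketch does not call Libtabe; `dictionary` is a plain Python set standing in for it, and accepting every single character as a fallback word is an assumption made so that any input has at least one segmentation.

```python
from typing import List, Set


def all_segmentations(text: str, dictionary: Set[str]) -> List[List[str]]:
    """Enumerate every split of text into dictionary words.

    Single characters are always accepted (assumption), so the result is
    never empty. The number of splits can grow exponentially with the
    length of the text, which is tolerable only for short queries.
    """
    if not text:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(text) + 1):
        word = text[:end]
        if end == 1 or word in dictionary:
            for rest in all_segmentations(text[end:], dictionary):
                results.append([word] + rest)
    return results


# Hypothetical three-word dictionary.
words = {"全文", "檢索", "全文檢索"}
for split in all_segmentations("全文檢索", words):
    print(split)
```

With this dictionary the query "全文檢索" yields five candidate splits, ranging from four single characters to the single four-character word.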
k-means clustering
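As a rough illustration of term frequency clustering, the sketch below runs one-dimensional k-means with k = 2 over the frequencies of the query terms. The slides do not say how the clusters map to Group A and Group B, so the function only returns a low-frequency and a high-frequency cluster; the function name, the choice of initial centroids, and the sample frequencies are assumptions.

```python
from typing import Dict, List, Tuple


def kmeans_two_groups(
    freqs: Dict[str, int], iterations: int = 100
) -> Tuple[List[str], List[str]]:
    """Split terms into (low-frequency, high-frequency) clusters, k = 2."""
    if not freqs:
        return [], []
    values = list(freqs.values())
    # Assumed initialization: start the centroids at the two extremes.
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iterations):
        # Assignment step: each value goes to the nearer centroid.
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        # Update step: move each centroid to the mean of its cluster.
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    low_terms = [t for t, v in freqs.items() if abs(v - lo) <= abs(v - hi)]
    high_terms = [t for t, v in freqs.items() if abs(v - lo) > abs(v - hi)]
    return low_terms, high_terms


# Made-up frequencies, for illustration only.
sample = {"linux": 90000, "iso": 80000, "gentoo": 1200, "libtabe": 15}
print(kmeans_two_groups(sample))
# (['gentoo', 'libtabe'], ['linux', 'iso'])
```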
Query results cache method
  Cache method
    Query segmentation and clustering
    MD5
  Benefits
    Reducing search time
    Avoiding duplicated searches
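One plausible reading of the cache method is that the segmented query is hashed with MD5 and the digest is used as the cache key. The sketch below follows that reading; the `QueryCache` class, the in-memory dictionary store, and sorting the terms before hashing are assumptions, not details given in the slides.

```python
import hashlib
from typing import Callable, Dict, List


class QueryCache:
    """In-memory result cache keyed by the MD5 digest of the query terms."""

    def __init__(self, search: Callable[[List[str]], List[str]]) -> None:
        self._search = search                  # the real (slow) search function
        self._store: Dict[str, List[str]] = {}
        self.misses = 0

    @staticmethod
    def key(terms: List[str]) -> str:
        # Assumption: sort the terms so that the same set of terms gives
        # the same key regardless of the order in which they were typed.
        canonical = "\x1f".join(sorted(terms))
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    def lookup(self, terms: List[str]) -> List[str]:
        k = self.key(terms)
        if k not in self._store:
            self.misses += 1                   # only a miss runs the search
            self._store[k] = self._search(terms)
        return self._store[k]
```

A repeated query, or the same terms in a different order, is answered from the store without a second search, which is the "avoiding duplicated searches" benefit listed above.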
Experimental results and analysis
  Platform
    CPU: Dual P4 2.6 GHz
    RAM: 8 GBytes
    OS: Gentoo Linux (Linux 2.4.25)
    Storage: 3320.26 GB used
  Statistics
    Files: 5,414,639
    Segmented terms: 219,167,352
    Terms in the dictionary: 236,275
Frequency distribution
Average search time
Search result (1)
Search result (2)
Performance evaluation
  Search and download times
  Hit rate
Search and download times
Hit rate
Conclusion
  We implemented the method on the I-Shou University FTP server
    http://ftp.isu.edu.tw
  Full-text search on an FTP server gives better results than traditional search.
  The average hit rate is greater than 0.6.
Thank you for your attention!