Department of Information Engineering, I-Shou University. Authors: 郭東黌, 張佑康. Presenter: 徐碩利. Date: 2006/11/01

Presentation transcript:

Implementation and Analysis of an FTP-specific Full-text Search System (適用於FTP之全文檢索系統實作與分析)
Department of Information Engineering, I-Shou University
Authors: 郭東黌, 張佑康
Presenter: 徐碩利
Date: 2006/11/01

Outline
  Introduction
  Proposed architecture
  Experimental results and analysis
  Conclusion

Introduction
  FTP sites with a large number of files
  Filename search
  Full-text search

Problem
  Indexing a large amount of data
  Improving search performance
  Refining the ranking of results

Purpose
  FTP-specific full-text search system
  Performance benchmark
  Precision evaluation

Current systems
  Google
    Proposed architecture in 1998
    Large-scale web search engine
    Inverted index scheme
  Gais
    FTPLocate
  SmartArchie
    An intelligent file search engine
    ProxyLog-based file search engine
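
The inverted index scheme named on this slide can be illustrated with a minimal sketch. It is not the implementation of any of the systems listed; the sample documents and the function names `build_inverted_index` and `search_all` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, query):
    """Return the ids of documents that contain every term of the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample data, for illustration only.
docs = {
    1: "linux kernel source",
    2: "linux install guide",
    3: "kernel patch notes",
}
index = build_inverted_index(docs)
matches = search_all(index, "linux kernel")
```

A lookup touches only the posting lists of the query terms, so its cost depends on how common those terms are rather than on the total number of indexed files.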

Proposed architecture
  Data indexing (data crawler)
    File list retrieval and analysis
    File description extraction from FTP
    File name and description indexing
  Query processing
    Query pre-parsing
    Data searching
    Result sorting

Data indexing (data crawler)
  File list retrieval and analysis
    File lists can be retrieved in several ways
      FTP: LIST
      Rsync
      Unix: find, ls
  File description extraction
    Specific formats of file descriptions
      XML Package Metadata
      RFC-index
      00_index
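
As a rough sketch of the "file list retrieval and analysis" step, the following parses Unix-style `LIST` output into records. The line layout is an assumption: the FTP specification does not fix the `LIST` format, and the slides do not say how the authors parsed it. The function name `parse_list_line` and the sample lines are hypothetical.

```python
def parse_list_line(line):
    """Parse one Unix-style LIST line into a dict, or return None if it does not fit."""
    parts = line.rstrip("\r\n").split(None, 8)
    if len(parts) < 9 or not parts[4].isdigit():
        return None  # e.g. a "total 12" header or a non-Unix listing format
    perms, _links, _owner, _group, size, month, day, year_or_time, name = parts
    kind = {"d": "dir", "l": "link"}.get(perms[0], "file")
    if kind == "link" and " -> " in name:
        name = name.split(" -> ", 1)[0]  # keep the link name, drop its target
    return {
        "name": name,
        "size": int(size),
        "kind": kind,
        "mtime": f"{month} {day} {year_or_time}",
    }


# Hypothetical sample listing, for illustration only.
sample = [
    "total 12",
    "drwxr-xr-x    2 ftp      ftp          4096 Oct 30 12:41 pub",
    "-rw-r--r--    1 ftp      ftp          1024 Nov 01  2006 README.txt",
    "lrwxrwxrwx    1 ftp      ftp             3 Nov 01  2006 latest -> pub",
]
entries = [e for e in map(parse_list_line, sample) if e is not None]
```

Against a live server, the lines could be collected with the standard library call `ftplib.FTP.retrlines("LIST", callback)` and fed to the same parser.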

Data indexing (data crawler)
  File name and description indexing
    Simple string segmentation method
      Splitting runs of Chinese text into individual characters
      Splitting English words on spaces or symbols
    Full-text indexing
      Based on a modified version of the MySQL full-text parser
      Multi-language full-text index support
      Stopwords
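
The simple string segmentation described on this slide might look like the sketch below: each Chinese character becomes its own token, and English or numeric runs are split on spaces and symbols. The character range used for "Chinese", the lowercasing, and the stopword list are assumptions, not details from the slides.

```python
import re

# One CJK ideograph (basic block only, an assumption) or one run of ASCII letters/digits.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"the", "of", "a", "an"})


def segment(text, stopwords=STOPWORDS):
    """Split text into index terms: single Chinese characters and lowercased words."""
    tokens = (t.lower() for t in _TOKEN.findall(text))
    return [t for t in tokens if t not in stopwords]


terms = segment("linux-2.6.18.tar.gz 核心原始碼")
```

Indexing Chinese one character at a time needs no dictionary at indexing time, at the cost of many very frequent single-character terms.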

Query processing
  Query pre-parsing
    Chinese segmentation with exhaustive search using the Libtabe dictionary
  Term frequency clustering
    Using the k-means algorithm
  Score computation
    Group A is required, Group B is optional
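
Exhaustive-search segmentation of a query can be sketched as enumerating every way to cut the string into dictionary words, with single characters always allowed as a fallback. The tiny dictionary below stands in for Libtabe and is hypothetical, as are the function name `all_segmentations` and the `max_len` limit; the slides do not say how one segmentation is chosen among the candidates.

```python
from functools import lru_cache


def all_segmentations(text, dictionary, max_len=4):
    """Return every split of text into dictionary words or single characters."""

    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(text):
            return ((),)
        results = []
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            piece = text[i:j]
            # A single character is always accepted; longer pieces must be known words.
            if j == i + 1 or piece in dictionary:
                for rest in split_from(j):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(s) for s in split_from(0)]


# Hypothetical stand-in for the Libtabe dictionary.
dictionary = frozenset({"全文", "檢索", "全文檢索"})
candidates = all_segmentations("全文檢索", dictionary)

# One possible heuristic (an assumption): prefer the split with the fewest pieces.
shortest = min(candidates, key=len)
```

The number of candidates can grow quickly with query length, which is tolerable for short search queries but is one reason the indexing side uses the simpler per-character split.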

k-means clustering
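
A minimal sketch of clustering query terms by frequency with k-means, assuming two clusters and one-dimensional frequencies. The term frequencies are invented, and the slides do not state which cluster corresponds to the required Group A and which to the optional Group B, so the sketch only separates low-frequency from high-frequency terms.

```python
def two_means(values, max_iter=100):
    """Lloyd's algorithm for k=2 on scalars; returns ((low, high) centroids, labels)."""
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        raise ValueError("need at least two distinct values")
    labels = []
    for _ in range(max_iter):
        # Assignment step: label 0 for the low centroid, 1 for the high centroid.
        new_labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return (lo, hi), labels


def group_terms(term_freq):
    """Split terms into (low-frequency, high-frequency) lists."""
    terms = list(term_freq)
    _, labels = two_means([term_freq[t] for t in terms])
    rare = [t for t, lab in zip(terms, labels) if lab == 0]
    common = [t for t, lab in zip(terms, labels) if lab == 1]
    return rare, common


# Hypothetical term frequencies, for illustration only.
freqs = {"linux": 52000, "kernel": 8100, "2": 910000, "6": 870000, "patch": 4300}
rare, common = group_terms(freqs)
```

Starting the two centroids at the minimum and maximum frequency keeps the run deterministic and guarantees that neither cluster is ever empty.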

Query results cache method
  Cache method
    Query segmentation and clustering
    MD5
  Benefits
    Reducing search time
    Avoiding duplicated search
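
One way to read this slide is that the segmented and clustered query is reduced to an MD5 digest, which then serves as the cache key. The sketch below follows that reading; the class `QueryCache`, the canonical key layout, and the in-memory dictionary are assumptions rather than the authors' design.

```python
import hashlib


class QueryCache:
    """Cache search results under an MD5 digest of the normalized query."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(required_terms, optional_terms):
        # Sorting makes the key independent of the order the terms were typed in.
        canonical = " ".join(sorted(required_terms)) + "|" + " ".join(sorted(optional_terms))
        return hashlib.md5(canonical.encode("utf-8")).hexdigest()

    def get_or_compute(self, required_terms, optional_terms, search_fn):
        k = self.key(required_terms, optional_terms)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = search_fn(required_terms, optional_terms)
        self._store[k] = result
        return result


# Hypothetical search function that records how often it is really executed.
calls = []


def fake_search(required, optional):
    calls.append((tuple(required), tuple(optional)))
    return ["/pub/linux/kernel/README"]


cache = QueryCache()
first = cache.get_or_compute(["linux", "kernel"], ["patch"], fake_search)
second = cache.get_or_compute(["kernel", "linux"], ["patch"], fake_search)
```

MD5 is used here only to obtain a short fixed-length key, not for security. Because the key is built after segmentation and clustering, differently typed queries that normalize to the same term groups share one cache entry.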

Experimental results and analysis
  Platform
    CPU: Dual P4 2.6 GHz
    RAM: 8 GB
    OS: Gentoo Linux (Linux 2.4.25)
    Storage: 3320.26 GB used
  Statistics
    Files: 5,414,639
    Segmented terms: 219,167,352
    Terms in the dictionary: 236,275
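
For scale, the statistics above work out to roughly 40 segmented terms per indexed file. The short calculation below derives that figure from the two reported counts; it assumes that "segmented terms" counts all term occurrences across the indexed file names and descriptions, which the slide does not state explicitly.

```python
# Figures reported on the slide.
files = 5_414_639
segmented_terms = 219_167_352

# Derived value (assumes the term count covers all indexed files).
terms_per_file = segmented_terms / files
print(f"average segmented terms per file: {terms_per_file:.1f}")
```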

Frequency distribution

Average search time

Search result (1)

Search result (2)

Performance evaluation
  Search and download times
  Hit rate

Search and download times

Hit rate

Conclusion
  We implemented the method on the I-Shou University FTP server (http://ftp.isu.edu.tw).
  Full-text search on the FTP server gives better results than traditional search.
  The average hit rate is greater than 0.6.
Thank you for your attention!