Employing Web Search indexing for fast creation of filtered view of large text files

Mostafa Agbaria, Ahmad Atamlh
Department of Electrical Engineering, Technion
Software System Laboratory, Spring 2010
Supervisor: Oved Itzhak, Lab Engineer: Dr. Ilana David

Abstract

In today's Internet-scale services it is not uncommon to have logs that contain huge amounts of data. Inspecting such logs can easily overwhelm a human, so specialized tools that make the data easier to manage are essential. In this project we implement a plug-in for the existing VLTF (Very Large Text File Viewer) application. The plug-in takes the text file and creates an index, based on inverted indexing, that enables very fast search in the file. The VLTF provides the GUI for searching and for quickly navigating to the found locations in the text file.

Background

As network bandwidth increases, network servers (e.g. Web, Mail, etc.) create exceedingly large log files. The problem of searching in such files resembles the Web Search problem, where it is prohibitively slow to search all the data simplistically. This project continues the VLTFV (Very Large Text File Viewer) project, an application whose responsiveness is independent of input file size.

Project Goals

In this project we plan to implement a new type of index for the VLTFV application that supports fast creation of a filtered view of large text files, using a Web Search indexing technique. The implementation is in Microsoft .NET and C#. A database is created using inverted indexing to pre-process the data in the log files, which gives the user an easy and fast way to search the log file.

Pre-Processing Data

The conventional approach previously used requires going over the entire text file to perform a search, which is time-consuming and impractical. This motivates pre-processing the text file, which lets us search in a faster and more reliable way. The index, which is the pre-processed database, solves the speed problem: a search no longer requires going over the entire file, and this is where the time is saved.

Inverted Indexing

An inverted index is an index data structure that stores a mapping from content, such as words or numbers, to its locations in a database file, in a document, or in a set of documents. The purpose of an inverted index is to allow fast full-text searches, at the cost of increased processing when a document is added to the database.

Multi-Threaded Indexing

With serial parsing, the indexer takes more time than expected to build the database. We therefore build the database using multi-threading: the file is indexed in parallel by a specific number of threads, each indexing a different part of the file. Each thread creates a new database for its section of the file and sends that database to the Web Technique Searcher. After receiving all the sub-databases, we merge them into a Main Database.

[Figure: Multi-threaded vs. single-threaded execution. Labels: single thread running; page fault, disk access, CPU idle; single thread in an ideal world; multi-threading with threads 1 to 4 running. Axis: time.]

[Figure: time to build the database for various numbers of threads (file size = 100 MB).]

[Diagram labels: Sub Database, Main Database.]

User Interface

[Screenshot of the user interface with labeled elements: Open File, Go to Line, Search, Conventional Scroll Bar, Scroll Knob, Line Numbers, Search Results Pane, Progress Bar, File Lines Counter, Text View Area.]

Summary

The plug-in developed in this project makes searching and inspecting very large text files easier, faster and more reliable. It uses an advanced algorithm based on a Web indexing technique together with the VLTF, which makes moving between lines in such large text files more practical for humans.
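
The inverted indexing and multi-threaded indexing steps described in this poster can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's C# implementation. All names (`index_chunk`, `build_index`, `search`) are hypothetical, and the sketch assumes that a "location" is a 1-based line number, that a word is any run of word characters, and that matching is case-insensitive. Each worker builds a sub-index for its part of the file, and the sub-indexes are then merged into one main index.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

WORD = re.compile(r"\w+")  # assumption: a "word" is a run of word characters


def index_chunk(lines, first_line_no):
    """Build a sub-index (word -> ascending line numbers) for one part of the file."""
    sub = defaultdict(list)
    for offset, line in enumerate(lines):
        line_no = first_line_no + offset
        # set() records each word at most once per line
        for word in set(WORD.findall(line.lower())):
            sub[word].append(line_no)
    return sub


def build_index(lines, num_threads=4):
    """Index contiguous chunks in parallel, then merge the sub-indexes."""
    chunk = max(1, -(-len(lines) // num_threads))  # ceiling division
    starts = range(0, len(lines), chunk)
    main = defaultdict(list)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in chunk order, so merged lists stay ascending
        subs = pool.map(lambda s: index_chunk(lines[s:s + chunk], s + 1), starts)
        for sub in subs:
            for word, line_nos in sub.items():
                main[word].extend(line_nos)
    return dict(main)


def search(index, word):
    """Look up a word without scanning the file; returns matching line numbers."""
    return index.get(word.lower(), [])
```

For example, with `lines = ["ERROR disk full", "info ok", "error again", "warn disk"]`, calling `search(build_index(lines, num_threads=2), "error")` returns `[1, 3]`. Note that in CPython the global interpreter lock limits the speed-up of CPU-bound parsing with threads, so the sketch shows only the partition-and-merge structure, not the timings reported for the project.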