Networked Software Systems Laboratory
Department of Electrical Engineering, Technion
Employing Web Search Indexing for Fast Creation of Filtered Views of Large Text Files
Students: Agbaria Mostafa, Atamlh Ahmad
Supervisor: Oved Itzhak
Lab Engineer: Dr. Ilana David
Background
As network bandwidth increases, network servers (e.g., Web and Mail servers) create exceedingly large log files. Searching such files resembles the Web Search problem, where scanning all the data naively takes prohibitively long. This project continues the VLTFV (Very Large Text File Viewer) project, in which application responsiveness is independent of input file size.
Motivation
Log files require human inspection, both for analyzing incidents and for gaining insight into server operation for tuning. Inspecting very large log files verbatim is impractical for humans. Simplistic filtering (à la grep) requires going over the entire file for every filter, which is very time-consuming.
Project Goals
In this project we plan to implement a new type of index for the VLTFV application that supports fast creation of filtered views of large text files using a Web Search indexing technique. The implementation is in Microsoft .NET and C#. The log file is pre-processed into a database using an inverted index, giving the user an easy and fast way to search it.
The reverse-index data structure
[Figure: a dictionary of words, each entry pointing to its list of containing lines.]
The dictionary stores all the distinct words. Each word has a list of the lines that contain it.
List of appearances: contains the numbers of the lines in which the word appears.
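A minimal sketch of this structure, written in Python for brevity (the project itself is in C#/.NET). The tokenizer (`\w+` runs) and all function names are assumptions for illustration; the slides do not say how words are delimited or whether matching is case-sensitive.

```python
import re
from collections import defaultdict

# Assumed tokenizer: a "word" is a run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def build_inverted_index(lines):
    """Return {word: [line numbers]}, with 1-based line numbers in
    increasing order (the dictionary plus its lists of appearances)."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # set() records a word at most once per line
        for word in set(WORD_RE.findall(line)):
            index[word].append(line_no)
    return dict(index)


def lines_containing(index, word):
    """Filtered view for one word: a dictionary lookup, no file scan."""
    return index.get(word, [])


def lines_containing_all(index, words):
    """Filtered view for several words: intersect their line lists."""
    result = None
    for word in words:
        postings = set(index.get(word, []))
        result = postings if result is None else result & postings
    return sorted(result) if result else []


if __name__ == "__main__":
    log = [
        "INFO server started",
        "ERROR disk full",
        "INFO request served",
        "ERROR disk full again",
    ]
    idx = build_inverted_index(log)
    print(lines_containing(idx, "ERROR"))                 # [2, 4]
    print(lines_containing_all(idx, ["ERROR", "again"]))  # [4]
```

Once the index is built, a filter costs a lookup per word plus an intersection, instead of a pass over the whole file as with grep. The actual VLTFV database may store and combine these lists differently.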
Architecture
Sequence Overview (initialization)
Multi-Threaded Indexing
Problem: Run time is very important. It is dominated by I/O requests rather than CPU processing, and the indexer takes more time to build the database than expected.
Solution: Multi-Threaded Indexing.
Multi-Threaded Indexing
We build the database using multi-threading: the file is indexed in parallel by a specific number of threads, each indexing a different part of the file. Each thread creates a new database for its section of the file and sends that database to the Web Technique Searcher. After receiving all the sub-databases, we merge them into a main database. The following figure shows the time for building the database using various numbers of threads (file size = 100 MB).
[Figure: database build time (sec) vs. number of threads]
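The partition, index and merge steps can be sketched as follows, again in Python rather than the project's C#. For simplicity this hypothetical version splits by line count rather than by bytes, and `merge` is a plain function standing in for the Web Technique Searcher. In CPython, threads do not speed up the CPU-bound tokenizing shown here; the sketch illustrates the structure only, since the slides attribute the gain to overlapping I/O.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

WORD_RE = re.compile(r"\w+")  # assumed tokenizer


def index_chunk(lines, first_line_no):
    """Build the sub-database for one section of the file.
    Line numbers are global: the section starts at first_line_no."""
    sub = defaultdict(list)
    for offset, line in enumerate(lines):
        for word in set(WORD_RE.findall(line)):
            sub[word].append(first_line_no + offset)
    return sub


def merge(sub_indexes):
    """Merge sub-databases into the main database. They arrive in file
    order, so concatenating each word's list keeps it sorted."""
    main = defaultdict(list)
    for sub in sub_indexes:
        for word, postings in sub.items():
            main[word].extend(postings)
    return dict(main)


def build_index_parallel(lines, num_threads):
    """Split lines into up to num_threads contiguous sections, index
    each section in its own thread, then merge the sub-databases."""
    n = len(lines)
    size = max(1, -(-n // num_threads))  # ceiling division
    sections = [(lines[i:i + size], i + 1) for i in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in submission order, i.e. file order
        subs = list(pool.map(lambda s: index_chunk(*s), sections))
    return merge(subs)
```

Because each section carries its starting line number, no renumbering is needed during the merge, and the result is identical for any thread count.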
Multi-Threaded vs. Single-Threaded
[Figure: timelines along a Time axis. Single thread: running periods interrupted by page faults and disk access, during which the CPU is idle. Multi-threading, in an ideal world: Threads 1, 2, 3 and 4 running.]
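The effect in the figure can be reproduced with a small simulation. Here `time.sleep` is a hypothetical stand-in for a blocking disk read: it releases the CPU (and Python's GIL), so while one thread waits, others can proceed. The 0.1 s delay is an arbitrary assumption, and real gains depend on the storage device.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_DELAY = 0.1  # hypothetical per-chunk I/O latency, in seconds


def read_chunk(chunk_id):
    """Simulated blocking read: the thread waits while the CPU is idle."""
    time.sleep(IO_DELAY)
    return chunk_id


def run_single_threaded(n_chunks):
    start = time.perf_counter()
    results = [read_chunk(i) for i in range(n_chunks)]
    return results, time.perf_counter() - start


def run_multi_threaded(n_chunks, n_threads):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(read_chunk, range(n_chunks)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t1 = run_single_threaded(4)
    _, t4 = run_multi_threaded(4, 4)
    print(f"single thread: {t1:.2f}s, four threads: {t4:.2f}s")
```

With four chunks the single-threaded run waits for four delays back to back, while the four threads wait concurrently, so the total is roughly one delay plus thread overhead.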
Testing
Using multi-threaded indexing makes careful checking necessary. Partitioning the file among threads based on size in bytes presents several interesting cases, e.g.:
- A chunk ends in the middle of a line.
- A line ends exactly at the end of a chunk.
- The file is smaller than the chunk size (1).
- Trying to use more threads than permitted.
- Storing the correct total number of lines for each chunk.
- Empty lines.
- The original file changes by having lines appended at its end.
- Serialization and deserialization.
(1) The chunk size was set to 1 MB and is used to determine the number of threads running in the program. Note that the maximum number of threads is limited.
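Several of these cases come down to how byte-based chunks are aligned to line boundaries. The Python sketch below handles a chunk ending mid-line, a line ending exactly at a chunk boundary, a file smaller than the chunk size, a cap on the thread count, empty lines, and per-chunk line counts. `MAX_THREADS = 8` and the function names are hypothetical (the slides only say a limit exists); the 1 MB chunk size comes from footnote (1). Appending lines and serialization are not covered.

```python
MAX_THREADS = 8              # hypothetical cap on the number of threads
CHUNK_SIZE = 1024 * 1024     # 1 MB, as in footnote (1)


def chunk_boundaries(data: bytes, chunk_size: int = CHUNK_SIZE,
                     max_threads: int = MAX_THREADS):
    """Split data into at most max_threads byte ranges (start, end) such
    that every range ends right after a newline byte or at end of data."""
    size = len(data)
    if size == 0:
        return []
    # One thread per chunk_size bytes: at least 1, at most max_threads.
    n_threads = min(max_threads, max(1, -(-size // chunk_size)))
    target = -(-size // n_threads)  # nominal bytes per thread (ceiling)
    bounds = []
    start = 0
    while start < size:
        end = min(start + target, size)
        if end < size and data[end - 1:end] != b"\n":
            # Chunk would end mid-line: extend it to the end of that line.
            nl = data.find(b"\n", end)
            end = size if nl == -1 else nl + 1
        bounds.append((start, end))
        start = end
    return bounds


def lines_per_chunk(data: bytes, bounds):
    """Count lines in each chunk; empty lines count as lines."""
    counts = []
    for start, end in bounds:
        n = data.count(b"\n", start, end)
        if end == len(data) and not data.endswith(b"\n"):
            n += 1  # final line has no trailing newline
        counts.append(n)
    return counts
```

Because only the last chunk can be shorter than `target` bytes, the number of chunks never exceeds the computed thread count, and the per-chunk counts give each thread its starting line number.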
Web Technique Searching Plug-In
* When starting the program, the user can choose the Web Technique Searching plug-in.
User Interface Design
[Figure: annotated screenshot of the user interface: 3.1. Open File, 3.2. Go to Line, 3.3. Search, 3.4. Conventional Scroll Bar, 3.5. Scroll Knob, 3.6. Line Numbers, 3.7. Search Results Pane, 3.8. File Lines Counter, 3.9. Progress Bar, 3.10. Text View Area]
Questions