Networked Software Systems Laboratory

Presentation transcript:

Networked Software Systems Laboratory, Department of Electrical Engineering, Technion. Employing Web Search Indexing for Fast Creation of Filtered Views of Large Text Files. Students: Agbaria Mostafa, Atamlh Ahmad. Supervisor: Oved Itzhak. Lab Engineer: Dr. Ilana David.

Background: As network bandwidth increases, network servers (e.g. Web, mail) create exceedingly large log files. Searching such files resembles the Web search problem, where a simplistic scan of all the data takes prohibitively long. This project continues the VLTFV (Very Large Text File Viewer) project, an application whose responsiveness is independent of input file size.

Motivation: Logs require human inspection, both for analyzing incidents and for gaining insight into server operation for tuning. It is impractical for humans to inspect very large log files verbatim. Simplistic filtering (à la grep) requires going over the entire file for every filter, which is very time consuming.

Project Goals: In this project we plan to implement a new type of index for the VLTFV application that supports fast creation of filtered views of large text files using a Web search indexing technique. The implementation is in Microsoft .NET and C#. The log files are pre-processed into a database built with inverted indexing, which gives the user an easy and fast way to search the log file.

The reverse-index data structure: a dictionary of words, plus lists of containing lines. The dictionary stores all the distinct words. Each word has a list of appearances, which contains the numbers of the lines in which the word appears.
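The structure described above can be sketched in a few lines. This is an illustrative Python sketch, not the project's C# code: the names `build_index` and `lines_containing`, the sample log lines, and the rule that a "word" is a run of letters, digits, or underscores are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits, or underscores.
WORD_RE = re.compile(r"\w+")

def build_index(lines):
    """Map each distinct word to the list of line numbers that contain it."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # set() records a word once per line even if it repeats on that line.
        for word in set(WORD_RE.findall(line.lower())):
            index[word].append(line_no)
    return dict(index)

def lines_containing(index, word):
    """Look up a word; the cost does not depend on the size of the file."""
    return index.get(word.lower(), [])

# Hypothetical sample log, for illustration only.
log = [
    "GET /index.html 200",
    "POST /login 500 error",
    "",
    "GET /login 200",
]
index = build_index(log)
print(lines_containing(index, "login"))  # [2, 4]
```

Because lines are visited in order, every list of appearances is already sorted by line number. A filter then becomes a dictionary lookup rather than a pass over the whole file, which is the contrast with grep-style filtering drawn in the Motivation slide.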

Architecture

Sequence Overview (initialization)

Multi-Threaded Indexing. Problem: Run time is very important. Run time is mainly due to I/O requests, not CPU processing. The indexer takes more time to build the database than expected. Solution: multi-threaded indexing.

Multi-Threaded Indexing: We built the database using multi-threading: the file is indexed in parallel by a specified number of threads, each indexing a different part of the file, for faster indexing. Each thread creates a new database for its section of the file and sends the database to the Web Technique Searcher. After receiving all the sub-databases, we merge them into a main database. The following figure shows the time for building the database using various numbers of threads (file size = 100 MB). [Figure: build time in seconds versus number of threads]
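The build-then-merge flow can be sketched as follows. This is a hedged Python illustration rather than the project's C# implementation: `index_section`, `merge`, and `build_parallel` are hypothetical names, words are split on whitespace for brevity, and the input is an in-memory list of lines instead of a file read from disk.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def index_section(first_line_no, lines):
    """Build the sub-index for one section, using global line numbers."""
    sub = defaultdict(list)
    for offset, line in enumerate(lines):
        for word in set(line.lower().split()):
            sub[word].append(first_line_no + offset)
    return sub

def merge(sub_indexes):
    """Concatenate the lists in section order, so each merged list stays sorted."""
    main = defaultdict(list)
    for sub in sub_indexes:
        for word, line_nos in sub.items():
            main[word].extend(line_nos)
    return dict(main)

def build_parallel(lines, num_threads):
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Ceiling division: split the lines into at most num_threads sections.
    section_len = max(1, -(-len(lines) // num_threads))
    sections = [(start + 1, lines[start:start + section_len])
                for start in range(0, len(lines), section_len)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in submission order, i.e. in file order.
        subs = list(pool.map(lambda s: index_section(*s), sections))
    return merge(subs)

main_index = build_parallel(["a b", "b c", "c a", "a"], num_threads=2)
print(main_index["a"])  # [1, 3, 4]
```

Merging in section order means no re-sorting is needed, since each section covers a later range of line numbers than the one before it. Note that in CPython, threads mostly help when the work is I/O-bound, which matches the slides' observation that run time is dominated by I/O requests.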

Multi-Threaded vs. Single-Threaded. [Figure: execution timelines. Single thread: the thread's running time is interleaved with page faults, disk access, and CPU idle time. Multi-threading, in an ideal world: Thread 1, Thread 2, Thread 3, and Thread 4 running.]

Testing: Multi-threaded indexing calls for careful testing. Partitioning the file among threads, based on size in bytes, presents several interesting cases, e.g.: the chunk ends in the middle of a line; a line ends exactly at the end of a chunk; the file size is smaller than the chunk size (1); trying to use more threads than permitted; storing the correct total number of lines for each chunk; empty lines; changing the original file by adding lines at the end of the file; serialization and de-serialization. (1) The chunk size was set to 1 MB and is used to determine the number of threads running in the program. Note that there is a limit on the maximum number of threads.
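Several of the boundary cases above come down to one rule: a chunk should end at the start of a line. The following is a minimal Python sketch of such a rule, under the assumptions that lines end with `\n` and that the file fits in a bytes object. The function names are hypothetical and the project's actual partitioning logic is not shown in the slides.

```python
def align_to_line_start(data, pos):
    """Return the smallest offset >= pos at which a line starts (or len(data))."""
    if pos <= 0:
        return 0
    if pos >= len(data):
        return len(data)
    if data[pos - 1:pos] == b"\n":
        return pos  # a line ends exactly at the chunk boundary
    newline = data.find(b"\n", pos)
    # The boundary fell mid-line: extend the chunk to the end of that line.
    return len(data) if newline == -1 else newline + 1

def split_into_chunks(data, chunk_size):
    """Split data into (start, end) byte ranges that each hold whole lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    start = 0
    while start < len(data):
        end = align_to_line_start(data, min(start + chunk_size, len(data)))
        chunks.append((start, end))
        start = end
    return chunks

def count_lines(chunk):
    """Count lines in a chunk, including a final line with no trailing newline."""
    n = chunk.count(b"\n")
    if chunk and not chunk.endswith(b"\n"):
        n += 1
    return n

data = b"aaaa\nbb\n\ncccccc\n"  # four lines, one of them empty
chunks = split_into_chunks(data, chunk_size=4)
print(chunks)  # [(0, 5), (5, 9), (9, 16)]
print(sum(count_lines(data[s:e]) for s, e in chunks))  # 4
```

With this rule, every line belongs to exactly one chunk, so the per-chunk line counts add up to the file's total and each thread can convert its local line numbers to global ones by adding the counts of the chunks before it. A file smaller than the chunk size simply yields a single chunk.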

Web Technique Searching Plug-In: When starting the program, the user can choose the Web Technique Searching plug-in.

User Interface Design: 3.1. Open File; 3.2. Go to Line; 3.3. Search; 3.4. Conventional Scroll Bar; 3.5. Scroll Knob; 3.6. Line Numbers; 3.7. Search Results Pane; 3.8. File Lines Counter; 3.9. Progress Bar; 3.10. Text View Area.

Questions