Dispatching Java agents to user for data extraction from third party web sites Alex Roque F.I.U. HPDRC.

Introduction
Since the WWW has grown exponentially, data retrieval has become an intensive research topic. However, mechanisms and tools that give users more power over the data on the web have not kept pace with the growth of that data. For example, no tools exist that allow the user to extract data from its HTML context and use it in an external application.

The Data Extractor system was created to allow more coherent and wider-ranging automatic data extraction; it treats any Web site as a data source. Data Extractor has two implementations: a standalone server solution, and a set of functions that can be embedded in applications to provide them with data from the Internet.

Data Extractor Inefficiencies
Known problems: performance in multi-client conditions, network performance issues, and legal issues.
Installing an exclusive local server for each client is an option; however, it is expensive. Our alternative is MDRA: Mobile Data Retrieval Agents.

MDRA Composition and Delivery
The mobile agents server contains a wrapper portal and a knowledgebase. Functionality is as follows:
1) Users connect to the wrapper portal and request a wrapper.
2) In response, a package to extract the data is constructed and sent to the client.
3) Data extraction takes place on the client.

Wrapper portal: Lists and packages wrappers, authenticates users, and allows them to change and save their queries (references to wrappers).
Knowledgebase: Contains information about available wrappers, their parameters, and their status.
Wrappers: Can be thought of as lightweight programs that use a predefined OO library to “strip” the desired information.

MDRA Architecture
Mobile wrapper controller: Responsible for controlling the behavior of wrappers and the flow of data.
Wrappers: The same as the ones used in Data Extractor; the processes that strip data from a web site.
Data Extraction Library: Contains functionality essential for extraction and network operations. Compact; can be cached if no update is required.
Outer packaging: Interface for uniting numerous wrappers and controllers.
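The relationship between a wrapper and its controller might be sketched in Java as follows. This is only an illustration: the `Wrapper` interface, the `RegexWrapper` class, and the `runAll` method are hypothetical names, and regular-expression matching stands in for whatever the real Data Extraction Library does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WrapperSketch {

    /** Hypothetical wrapper contract: turn raw page text into extracted values. */
    interface Wrapper {
        List<String> extract(String html);
    }

    /** A wrapper that "strips" the first capturing group of every match from the page. */
    static final class RegexWrapper implements Wrapper {
        private final Pattern pattern;

        RegexWrapper(String regex) {
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        }

        @Override
        public List<String> extract(String html) {
            List<String> values = new ArrayList<>();
            Matcher m = pattern.matcher(html);
            while (m.find()) {
                values.add(m.group(1).trim());
            }
            return values;
        }
    }

    /** Hypothetical controller step: run each wrapper over a page and collect the results. */
    static List<String> runAll(List<Wrapper> wrappers, String html) {
        List<String> collected = new ArrayList<>();
        for (Wrapper w : wrappers) {
            collected.addAll(w.extract(html));
        }
        return collected;
    }

    public static void main(String[] args) {
        // A made-up page fragment; a real wrapper would fetch the page over the network.
        String page = "<table><tr><td class=\"price\"> 19.99 </td></tr>"
                    + "<tr><td class=\"price\">5.00</td></tr></table>";
        Wrapper prices = new RegexWrapper("<td class=\"price\">(.*?)</td>");
        System.out.println(runAll(Arrays.asList(prices), page));
    }
}
```

Keeping the wrapper behind a small interface is what would let the controller treat many different site-specific wrappers uniformly.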

How does execution take place?
1) Query formulation
2) Agent construction and delivery
3) Agent execution
4) Data delivery

Query Formulation
The user connects to the wrapper portal, the wrappers are listed, and the user selects the desired wrapper(s) and configures their execution parameters. This configuration can be saved for future reference.
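Saving a query configuration could look roughly like the sketch below. It assumes the configuration is a flat set of key/value pairs and uses the standard `java.util.Properties` text format. The class name and the keys (`wrapper`, `symbol`, `refreshMinutes`) are invented for illustration and are not taken from the MDRA system.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

final class SavedQuerySketch {

    /** Serializes a query configuration to text that the portal could store. */
    static String save(Properties query) throws IOException {
        StringWriter out = new StringWriter();
        query.store(out, "saved wrapper query");
        return out.toString();
    }

    /** Restores a previously saved query configuration. */
    static Properties load(String text) throws IOException {
        Properties query = new Properties();
        query.load(new StringReader(text));
        return query;
    }

    public static void main(String[] args) throws IOException {
        Properties query = new Properties();
        query.setProperty("wrapper", "stock-quotes");   // hypothetical wrapper name
        query.setProperty("symbol", "IBM");             // hypothetical execution parameter
        query.setProperty("refreshMinutes", "15");      // hypothetical execution parameter

        String stored = save(query);
        Properties restored = load(stored);
        System.out.println(restored.getProperty("wrapper") + " / " + restored.getProperty("symbol"));
    }
}
```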

Agent Construction and Delivery
The wrapper portal assembles the package, which includes the outer packaging module, the wrapper parameter information, the wrapper controller, the wrapper, and the Data Extraction Library. Components that change frequently are packaged separately from the ones that do not, which aids caching. The package may then be compressed or digitally signed.
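The idea of splitting the agent into a rarely changing archive and a frequently changing one can be sketched with the standard `java.util.zip` classes. The component names and archive contents below are placeholders, and the real MDRA packaging format is not specified in the slides.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class AgentPackagerSketch {

    /** Compresses the named components into one in-memory ZIP archive. */
    static byte[] pack(Map<String, byte[]> components) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> c : components.entrySet()) {
                zip.putNextEntry(new ZipEntry(c.getKey()));
                zip.write(c.getValue());
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    /** Lists the entry names of an archive, i.e. what the client would receive. */
    static List<String> entryNames(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    private static byte[] placeholder(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Rarely changing parts: the client can cache this archive between requests.
        Map<String, byte[]> stable = new LinkedHashMap<>();
        stable.put("outer-packaging", placeholder("outer packaging module"));
        stable.put("wrapper-controller", placeholder("wrapper controller"));
        stable.put("extraction-library", placeholder("data extraction library"));

        // Frequently changing parts: rebuilt for each request.
        Map<String, byte[]> volatileParts = new LinkedHashMap<>();
        volatileParts.put("wrapper", placeholder("site-specific wrapper"));
        volatileParts.put("wrapper-parameters", placeholder("symbol=IBM"));

        System.out.println("cacheable archive:   " + entryNames(pack(stable)));
        System.out.println("per-request archive: " + entryNames(pack(volatileParts)));
    }
}
```

With this split, only the small per-request archive has to be downloaded again when a wrapper or its parameters change.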

Agent Execution
Once delivered to the client, the wrappers interact with WWW sites and extract the desired data. The data is passed to the outer packaging controller, where it can be used in applications or stored in various media.

Data Delivery
The retrieved data may be transferred to other applications programmatically, stored in various formats (Excel, XML, text), or stored in databases. It may also be used for statistical data collection.
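As a rough illustration of the text and XML delivery options, the sketch below renders extracted rows as CSV (which spreadsheet programs such as Excel can open) and as a simple XML document. The class name, the `records`/`record` element names, and the sample data are assumptions made for this example, and the column names are assumed to be valid XML element names.

```java
import java.util.ArrayList;
import java.util.List;

final class DataDeliverySketch {

    /** Quotes a CSV field when it contains a comma, a quote, or a line break. */
    static String csvField(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }

    /** Renders a header line followed by one line per extracted row. */
    static String toCsv(String[] columns, List<String[]> rows) {
        StringBuilder out = new StringBuilder();
        appendCsvLine(out, columns);
        for (String[] row : rows) {
            appendCsvLine(out, row);
        }
        return out.toString();
    }

    private static void appendCsvLine(StringBuilder out, String[] fields) {
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(csvField(fields[i]));
        }
        out.append('\n');
    }

    /** Escapes the characters that are not allowed in XML element text. */
    static String escapeXml(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders each row as a record element with one child element per column. */
    static String toXml(String[] columns, List<String[]> rows) {
        StringBuilder out = new StringBuilder("<records>\n");
        for (String[] row : rows) {
            out.append("  <record>");
            for (int i = 0; i < columns.length; i++) {
                out.append('<').append(columns[i]).append('>')
                   .append(escapeXml(row[i]))
                   .append("</").append(columns[i]).append('>');
            }
            out.append("</record>\n");
        }
        return out.append("</records>\n").toString();
    }

    public static void main(String[] args) {
        String[] columns = {"item", "price"};
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Tea, green", "4.50"});
        rows.add(new String[] {"Salt & pepper", "2.25"});
        System.out.print(toCsv(columns, rows));
        System.out.print(toXml(columns, rows));
    }
}
```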

Source Code Implementation
Because the system needs a high degree of portability, Java was used for the implementation. The previous Data Extractor was also written in Java, so using Java again allowed its modules to be reused. Speed performance issues were addressed [7].

MDRA Framework
MDRA is delivered to clients as a Java applet. Applets provide portability, which allows clients on different platforms to take part in the data retrieval. Since the framework code and libraries do not change often, browsers that cache Java applets will keep the parts that do not change.

Security
Applets must be digitally signed in order to access the system and network resources needed for the retrieval. Alternatively, a proxy server may be set up on the host the applet was downloaded from, giving applets the ability to download third-party web sites. However, this option is prone to becoming a congestion bottleneck.
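The principle behind digital signing can be illustrated with the standard `java.security` API. This is not how applets are signed in practice (signed applets rely on signed JAR files and certificates). The sketch, with a hypothetical class name and a throwaway key pair, only shows that a client can detect a package that was altered after signing.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class PackageSigningSketch {

    /** Produces a signature over the package bytes with the publisher's private key. */
    static byte[] sign(byte[] packageBytes, PrivateKey key) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(packageBytes);
        return signer.sign();
    }

    /** Checks the signature with the publisher's public key. */
    static boolean verify(byte[] packageBytes, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(packageBytes);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Throwaway key pair for the demonstration; a real publisher would use a certified key.
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair keys = generator.generateKeyPair();

        byte[] agent = "agent package bytes".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(agent, keys.getPrivate());

        byte[] tampered = "agent package bytez".getBytes(StandardCharsets.UTF_8);
        System.out.println("original verifies: " + verify(agent, signature, keys.getPublic()));
        System.out.println("tampered verifies: " + verify(tampered, signature, keys.getPublic()));
    }
}
```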

Conclusion
MDRA “leases” data extraction services to users; these services retrieve data that can be exported to other applications. This distributed approach takes the load off the centralized server architecture. Future research includes different MDRA implementations (standalone, embedded on the client side) and tuning of agent performance.