Dispatching Java agents to user for data extraction from third party web sites Alex Roque F.I.U. HPDRC.


1 Dispatching Java agents to user for data extraction from third party web sites Alex Roque F.I.U. HPDRC

2 Introduction Since the WWW has grown exponentially, data retrieval has become an intensive research topic. However, mechanisms and tools that give users more power over the data on the web have not grown in parallel with the increase in data. For example, no tools exist that allow the user to extract data from its HTML context and use it in an external application.

3 One tool created to allow a more coherent and wider set of automatic data extraction is the Data Extractor system, which treats any Web site as a data source. Data Extractor has two kinds of implementation: a standalone server solution, and a set of functionality that can be embedded in applications to provide them with data from the Internet.

4 Data Extractor Inefficiencies Performance in multi-client conditions. Network performance issues. Legal issues. Installing an exclusive local server for each client is one solution; however, it is expensive. Our alternative is MDRA: Mobile Data Retrieval Agents.

5 MDRA Composition and Delivery The mobile agents server contains a wrapper portal and a knowledgebase. Functionality is as follows: 1) Users connect to the wrapper portal and request a wrapper. 2) In response, a package to extract the data is constructed and sent to the client. 3) Data extraction takes place on the client.

6 Wrapper portal: Lists and packages wrappers, authenticates users, and allows them to change and save their queries (references to wrappers). Knowledgebase: Contains information about available wrappers, their parameters and status. Wrappers can be thought of as lightweight programs which use a predefined OO library to “strip” desired information.
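To make the idea of a wrapper concrete, here is a minimal sketch of a lightweight program that strips information from a page. The class name `TableCellWrapper` is hypothetical, and a regular expression stands in for the predefined OO library, whose API the slides do not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper: pulls the text of every table cell out of an HTML page.
// A regular expression stands in for the predefined extraction library (assumption).
class TableCellWrapper {
    private static final Pattern CELL =
        Pattern.compile("<td[^>]*>(.*?)</td>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed contents of each <td> element, in document order.
    static List<String> extract(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            cells.add(m.group(1).trim());
        }
        return cells;
    }
}
```

Calling `TableCellWrapper.extract(pageSource)` on a page's HTML returns the cell contents as a list of strings. Nested tables and malformed markup are not handled; a real wrapper would likely rely on a proper HTML parser.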

7 MDRA Architecture Mobile wrapper controller: Responsible for controlling the behavior of wrappers and the flow of data. Wrappers: The same as the ones used in Data Extractor; processes that strip data from a web site. Data Extraction Library: Contains functionality essential for extraction and network operations. Compact; can be cached if no update is required. Outer packaging: Interface for uniting numerous wrappers and controllers.
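The division of roles above might be sketched as follows. All names (`Wrapper`, `WrapperController`, `runAll`) are illustrative assumptions, not taken from the MDRA code base, and the page fetcher is passed in as a function so that the network access the library would provide stays outside the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical role of a wrapper: knows its target site and how to strip data from it.
interface Wrapper {
    String sourceUrl();                        // site the wrapper targets
    List<String> extract(String pageContent);  // strips the desired data
}

// Stand-in for the mobile wrapper controller: runs wrappers and routes their data.
class WrapperController {
    // Fetching is injected; in MDRA this would come from the extraction library (assumption).
    private final Function<String, String> fetcher;

    WrapperController(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    // Runs each wrapper in turn and collects everything it extracts.
    List<String> runAll(List<Wrapper> wrappers) {
        List<String> results = new ArrayList<>();
        for (Wrapper w : wrappers) {
            String page = fetcher.apply(w.sourceUrl());
            results.addAll(w.extract(page));
        }
        return results;
    }
}
```

Injecting the fetcher keeps the controller testable without network access, and it mirrors the slide's separation between control flow (controller) and network operations (library).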

8 How does execution take place? 1) Query formulation 2) Agent construction and delivery 3) Agent execution 4) Data delivery

9 Query Formulation The user connects to the wrapper portal, the available wrappers are listed, and the user selects the desired wrapper(s) and configures their execution parameters. This configuration can be saved for future reference.
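One way a saved query could be represented is as a wrapper reference plus its execution parameters, serialized to text. The sketch below uses the standard `java.util.Properties` format; the class `SavedQuery`, the `wrapper` key, and the parameter names are assumptions for illustration, since the slides do not describe the storage format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical saved query: a wrapper reference plus its execution parameters.
class SavedQuery {
    // Serializes the wrapper name and parameters to properties-format text.
    static String save(String wrapperName, Properties params) {
        Properties p = new Properties();
        p.putAll(params);
        p.setProperty("wrapper", wrapperName); // key name is an assumption
        StringWriter out = new StringWriter();
        try {
            p.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Restores a query from text previously produced by save().
    static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }
}
```

A portal could store the returned text per user and pass it back through `SavedQuery.load` when the user returns.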

10 Agent construction and delivery The wrapper portal begins packaging, including the outer packaging module, wrapper parameter information, wrapper controller, wrapper and Data Extraction Library. Components that change frequently are packaged separately from the ones that do not (this aids caching). Compression or digital signing then takes place.
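The packaging step could look roughly like this: components are compressed into archives, with stable and frequently changing parts kept in separate archives so the stable one can be cached. The class `AgentPackager` and the component names are hypothetical, and plain ZIP archives are used here as an assumption about the container format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical packager: compresses named components into an in-memory ZIP archive.
class AgentPackager {
    // Call once for stable components and once for volatile ones to get two archives.
    static byte[] pack(Map<String, byte[]> components) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : components.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    // Lists the entry names of an archive, in the order they were written.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

A portal following this scheme would call `pack` once with the library and controller, and again with the wrapper and its parameters, so that only the second archive needs to be re-sent when a wrapper changes.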

11 Agent execution Once delivered to the client, the wrappers interact with WWW sites and extract the desired data. The data is passed to the outer packaging controller, where it can be used in applications or stored in various media.

12 Data Delivery The retrieved data may be transferred to other applications programmatically, stored in various formats (Excel, XML, text), or stored in databases. It may also be used for statistical data collection.
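As an illustration of delivering extracted data in more than one format, the sketch below renders the same rows as comma-separated text (which spreadsheet programs such as Excel can open) and as a minimal XML document. The class `DataExporter` and the element names `rows`, `row` and `field` are assumptions; the slides do not specify the output schemas.

```java
import java.util.List;

// Hypothetical exporter for extracted rows; formats and element names are assumptions.
class DataExporter {
    // Renders rows as comma-separated text, quoting fields that need it.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(quote(row.get(i)));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Wraps a field in double quotes when it contains a comma, quote, or newline.
    private static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Renders rows as a minimal XML document with escaped text content.
    static String toXml(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<rows>");
        for (List<String> row : rows) {
            sb.append("<row>");
            for (String field : row) {
                sb.append("<field>").append(escape(field)).append("</field>");
            }
            sb.append("</row>");
        }
        return sb.append("</rows>").toString();
    }

    // Escapes the characters that are unsafe inside XML text nodes.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```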

13 Source Code Implementation Because the system needs a high degree of portability, the Java language was used for the implementation. The previous Data Extractor was also written in Java, so using Java again allowed its modules to be reused. Speed performance issues were addressed [7].

14 MDRA Framework MDRA is delivered to clients as a Java applet. Applets provide portability, which allows clients on different platforms to participate in this data retrieval. Since the framework code and libraries do not change often, browsers that cache Java applets will keep the parts that do not change.

15 Security Applets must be digitally signed in order for them to access the system and network resources needed for the retrieval. Alternatively, a proxy server may be set up on the host the applet was downloaded from, giving applets the ability to download third-party web sites through it. However, this option is prone to becoming a bottleneck.
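The idea behind digital signing can be sketched with the standard `java.security` API: the server signs the package bytes with a private key, and the client verifies them with the matching public key. This shows only the underlying principle, not the applet signing tooling itself. The class `PackageSigner`, the RSA key size, and the `SHA256withRSA` algorithm are assumptions chosen for the example.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical helper illustrating sign-then-verify for an agent package.
class PackageSigner {
    // Generates a fresh RSA key pair (2048 bits chosen for the example).
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Produces a signature over the payload using the server's private key.
    static byte[] sign(byte[] payload, PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(payload);
            return s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true only if the signature matches the payload and public key.
    static boolean verify(byte[] payload, byte[] signature, PublicKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(payload);
            return s.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the package bytes are altered after signing, `verify` returns false, which is what lets the client decide whether to grant the code access to system and network resources.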

16 Conclusion MDRA “leases” data extraction services to users, who retrieve data that can be exported to other applications. This distributed approach takes the load off the centralized server architecture. Future research includes different MDRA implementations (standalone, embedded on the client side) and tuning of agent performance.

