An Automatic Wrapper Constructor Agent for E-trading


1 An Automatic Wrapper Constructor Agent for E-trading
Elektrotehniška in Računalniška Konferenca 2002 (Electrotechnical and Computer Science Conference), Portorož, Slovenia
Aleksander Pivk
Department of Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia
25 September 2002

2 What is an (intelligent) agent?
An intelligent agent is a computer system capable of flexible, autonomous action in some environment.
Examples:
  By environment: internet agent, OS agent, desktop agent, WWW agent, etc.
  By task: information agent, shopping agent, interface agent, notification agent, etc.
Picture: an ongoing process in which a system takes data as input from the environment, transforms the data (performs actions), and returns the output to the environment. The process is a never-ending loop in which the agent exploits the dynamics of the environment.
Properties:
  autonomy: the capability to act independently and to control its own internal state;
  reactiveness: maintains an ongoing interaction with its environment and responds to changes that occur in it (in time for the response to be useful);
  pro-activeness: the ability to generate and achieve goals, and to take the initiative when it recognizes an opportunity;
  social ability: the ability to interact with other agents (and/or humans) via some kind of agent-communication language, and perhaps to cooperate with others;
  intelligence: the ability to acquire knowledge through learning.

3 Information agents
Task: access/integrate information from a variety of data sources
Types:
  Information Retrieval Agents: search engines
  Information Filtering Agents: mail agents, news-delivery agents
  Information Extraction Agents: wrappers
  Information Integration Agents: meta-search engines, comparison shopping

4 Information Extraction
IE is the task of identifying the specific fragments of a single document that constitute its core semantic content.
Examples:
  a) from a weather report → identify locations, dates, temperatures (high and low);
  b) from online stores → get product names, their images, and prices.
Note: "constitute" here means "determine, represent".
  NAME Casablanca Restaurant
  STREET 220 Lincoln Boulevard
  CITY Venice
  PHONE (310)

5 Wrappers
A wrapper is …
  a procedure or a rule that explains how to extract information from an information source
  tailored to a particular document collection
  appropriate for semi-structured information sources
Why use wrappers?
  heterogeneous information sources
  different styles of user interface and different formats of output display
As the quantity and diversity of the information available online increases, more of the common information access tasks are done by programs such as web wrappers. Wrappers facilitate access to Web-based information sources by providing a uniform querying and data extraction capability. For example, a Web wrapper for a yellow pages source can take a query for a Mexican restaurant near Marina del Rey, CA, and extract the restaurant's name, address, and phone number, in the same way that information is extracted from a database.
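The idea of a wrapper as "a rule tailored to a particular document collection" can be sketched in a few lines of Python. This is only an illustration, not the system described in the talk: the listing markup, the second sample entry, and the names `PAGE`, `RULE`, and `wrap` are all assumptions made for the example. The phone field is left out because the sample record in the slides is truncated.

```python
import re

# Hypothetical listing markup; the tag layout is an assumption for illustration.
PAGE = """
<li><b>Casablanca Restaurant</b>, 220 Lincoln Boulevard, Venice</li>
<li><b>Cafe Example</b>, 1 Sample Street, Marina del Rey</li>
"""

# The extraction rule: one regular expression tailored to this one source.
RULE = re.compile(
    r"<li><b>(?P<name>[^<]+)</b>,\s*(?P<street>[^,<]+),\s*(?P<city>[^<]+)</li>"
)

def wrap(page: str) -> list[dict[str, str]]:
    """Apply the rule to a page and return one record per listing."""
    return [
        {key.upper(): value.strip() for key, value in match.groupdict().items()}
        for match in RULE.finditer(page)
    ]

records = wrap(PAGE)
print(records[0])
```

A rule like this breaks as soon as the source changes its layout, which is the motivation for constructing wrappers automatically rather than by hand.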

6 Implemented Systems
EMA – Employment Agent
  memory-based approach
  hand-coded wrappers
  depends upon the profession ontology (domain knowledge)
ShinA – Customized Comparison Shopping Agent
  simple heuristic-based approach
  little domain knowledge used

7 ShinA – Shopping Assistant

8 Our focus
Wrapper learning in real time
  to realize a customized comparison shopper
Little use of domain knowledge
  rather, use simple heuristics
  exploit the characteristics of semi-structured documents
Flexible and practical
  handle both table-type and list-type displays
  handle noisy product descriptions (missing attributes)
  handle a single product description spread over multiple lines

9 Learning Query Scheme Templates
<form site="amazon.com">
  <name>searchform</name>
  <method>post</method>
  <action></action>
  <input type="text" name="field-keywords" size="15" />
  <input type="image" name="Go" />
  <select name="index">
    <option value="all products" selected />
    <option value="books" />
    <option value="…" />
  </select>
</form>

10 Learning product descriptions
Table-type display of 5 different PDUs (PDU: Product Description Unit)
Task:
  recognize each PDU
  recognize attributes within a PDU
  learn rules to extract attributes

11 PDU Pattern Learning: Algorithm
First phase
  remove irrelevant parts of the HTML source (header, advertisements, footer)
  break the remaining HTML source into logical lines
Second phase
  categorize each logical line
  9 different categories (PRICE, TITLE, IMAGE, URL_LINK, TTAG, LBTAG, etc.)
Third phase
  find the most frequent pattern(s) for PDU(s) in the sequence of logical line categories

12 PDU Pattern Learning: Example
A fragment of the HTML source of the search result for the query "intelligent agent" sent to the Amazon bookstore:
<img src="..." width="80" height="80" vspace="2" alt="">
</td> <td> <p>
<a href="..."> --3
Intelligent Internet Agents: Agent-Based Information Discovery on the Internet --1
</a> <br>
$
Category codes: { 0: price; 1: title; 2: image; 3: link; 4: table tag; 5: line tag; 9: other tag }
Extracted PDU pattern:

13 Simple Heuristics
Recognizing a title
  contains at least one query word
  text line that corresponds to the title position in the pre-determined pattern
Recognizing a price
  contains a currency symbol ($, €)
  contains a currency token (EUR, SIT)
  contains digit(s) with relevant delimiters (','; '.')
Recognizing an image
  unique image URL within the pattern
Able to recognize attributes with heuristic rules
  examples: ISBN numbers, dates, discount rates
Unable to recognize other attributes
  authors, review comments, recommendation status
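The title and price heuristics translate almost directly into code. The sketch below is a minimal reading of them, not the agent's implementation: the slides do not state how the three price cues combine, so requiring a currency marker together with a number is an assumption, and the function names and sample lines are hypothetical.

```python
import re

CURRENCY_SYMBOLS = ("$", "€")
CURRENCY_TOKENS = ("EUR", "SIT")
# Digits, optionally grouped with ',' or '.' delimiters, e.g. 12.500,00
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def looks_like_price(line: str) -> bool:
    """Assumed rule: (currency symbol or token) and a delimited number."""
    has_currency = any(symbol in line for symbol in CURRENCY_SYMBOLS) or any(
        re.search(rf"\b{token}\b", line) for token in CURRENCY_TOKENS
    )
    return has_currency and NUMBER.search(line) is not None

def looks_like_title(line: str, query: str) -> bool:
    """A line is a title candidate if it contains at least one query word."""
    words = set(re.findall(r"\w+", line.lower()))
    return any(word in words for word in query.lower().split())

# Hypothetical logical lines.
print(looks_like_price("$ 34.95"))
print(looks_like_price("12.500,00 SIT"))
print(looks_like_title("Intelligent Internet Agents", "intelligent agent"))
```

Exact word matching means the query word "agent" does not match "Agents" here; a stemming step would be needed if that behaviour were wanted.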

14 Conclusion
Limitations
  a query search box must exist
  price information must exist
  extracts only a few attributes (title, price, image, link, …)
Future work
  more use of domain knowledge (ontologies)
  extract other non-price attributes
  use of XML-based wrappers
  applications to other domains

