
1 Query Rewriting for Extracting Data Behind HTML Forms Xueqi Chen Department of Computer Science Brigham Young University March, 2003 Funded by National Science Foundation

2 Motivation: Web information is stored in databases; databases are accessed through forms; automated agents are of great value; the process is difficult because of the nature of forms.

3 System Flowchart (diagram components: Input Analyzer, Retrieved Page(s), Application Ontology, User Query, Site Form, Output Analyzer, Extracted Information)

4 User Query Acquisition: Our system provides a form generated from an application-specific ontology.

5 Site Form Analysis Understand type, name, and/or values for each field
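As a rough illustration of what "understanding type, name, and/or values for each field" could look like in practice, here is a minimal sketch using Python's standard `html.parser`. The class name `FormFieldParser`, the helper `analyze_form`, and the shape of the result dictionary are assumptions made for this example; the slides do not describe the system's actual implementation.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the type, name, and listed values of each form field (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # field name -> {"type": str, "values": [str, ...]}
        self._select = None   # name of the <select> element currently open, if any

    def _field(self, name, field_type):
        # Create the record on first sight; radio/checkbox groups reuse one record.
        return self.fields.setdefault(name, {"type": field_type, "values": []})

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("name"):
            field = self._field(a["name"], (a.get("type") or "text").lower())
            if a.get("value") is not None:
                field["values"].append(a["value"])
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self._field(self._select, "select")
        elif tag == "option" and self._select is not None:
            if a.get("value") is not None:
                self.fields[self._select]["values"].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def analyze_form(html):
    """Return a dict describing every named field found in the given HTML."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields
```

For a form with a text box named `make` and a drop-down named `color`, `analyze_form` would report `make` as a `text` field with no predefined values and `color` as a `select` field with its option values listed.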

6 Form Filling: name matching; regular expressions (for fields with values provided); stemming; Levenshtein edit distance; longest common subsequence; Soundex; WordNet; value matching.
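Two of the techniques listed on this slide, Levenshtein edit distance and longest common subsequence, can be sketched as follows. How the original system combines them is not stated in the slides; the `name_similarity` score (taking the better of the two normalized scores), the `best_match` helper, and the 0.6 threshold are all assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def name_similarity(a, b):
    """Score in [0, 1]; an assumed combination, not the slides' actual formula."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    edit_score = 1 - levenshtein(a, b) / longest
    lcs_score = lcs_length(a, b) / longest
    return max(edit_score, lcs_score)


def best_match(query_name, site_names, threshold=0.6):
    """Return the site field name most similar to query_name, or None if none is close enough."""
    if not site_names:
        return None
    best = max(site_names, key=lambda name: name_similarity(query_name, name))
    return best if name_similarity(query_name, best) >= threshold else None
```

With these definitions, a user-query field named `color` would be paired with a site field named `colour` rather than `make`, while a field with no sufficiently similar counterpart yields `None` and would need another technique, such as value matching.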

7 Value Matching: Case 1

8 Value Matching: Case 2

9 Value Matching: Case 3 (figure label: Color?)

10 Value Matching: Case 4

11 Value Matching: Case 5

12 Value Matching: Case 6

13 Value Matching: Case 7

14 Measurements Matching Efficiency Submission Efficiency Post-processing Efficiency

15 Measurements (cont.) Matching Efficiency

16 Measurements (cont.) Matching Efficiency Submission Efficiency

17 Measurements (cont.) Matching Efficiency Submission Efficiency Post-processing Efficiency

18 Contributions: It enhances the effectiveness of the data-extraction process. It presents another technique, in addition to [RGa01], for accessing data behind HTML forms.

