1
Query Rewriting for Extracting Data Behind HTML Forms
Xueqi Chen
Department of Computer Science, Brigham Young University
March 2003
Funded by the National Science Foundation
2
Motivation
Web information is stored in databases
Databases are accessed through forms
Automated agents are of great value
Process is difficult because of the nature of forms
3
System Flowchart
[Figure: flowchart with the components Input Analyzer, Retrieved Page(s), Application Ontology, User Query, Site Form, Output Analyzer, Extracted Information]
4
User Query Acquisition
Our system provides a form generated from an application-specific ontology
5
Site Form Analysis
Understand the type, name, and/or values of each field
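A minimal sketch of what this analysis step could look like, using only Python's standard `html.parser`. The class name `FormFieldParser`, the helper `analyze_form`, and the dictionary layout are assumptions made for illustration; the slides do not describe how the original system parses forms.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect the type, name, and offered values of each form field (illustrative)."""

    def __init__(self):
        super().__init__()
        self.fields = []      # one dict per field, in document order
        self._select = None   # the <select> currently being read, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            value = a.get("value")
            self.fields.append({
                "type": a.get("type", "text"),  # HTML defaults <input> to text
                "name": a.get("name"),
                "values": [value] if value is not None else [],
            })
        elif tag == "select":
            self._select = {"type": "select", "name": a.get("name"), "values": []}
            self.fields.append(self._select)
        elif tag == "option" and self._select is not None:
            # Simplification: options without a value attribute are skipped.
            if a.get("value") is not None:
                self._select["values"].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def analyze_form(html):
    """Return a list of field descriptions found in the given HTML."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields
```

Fields that list their values (select boxes, radio buttons, checkboxes) end up with a non-empty `values` list, while free-text fields have an empty one.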
6
Form Filling
Name matching
  Regular expressions (for fields with values provided)
  Stemming
  Levenshtein edit distance
  Longest common subsequences
  Soundex
  WordNet
Value matching
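Two of the string measures listed for name matching, Levenshtein edit distance and longest common subsequence, can be sketched as follows. The slides do not say how the system combines or thresholds these measures, so the helper `best_field`, which simply picks the site-form field name with the smallest edit distance, is a hypothetical illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_field(label, field_names):
    """Hypothetical helper: the field name closest to label by edit distance."""
    label = label.lower()
    return min(field_names, key=lambda name: levenshtein(label, name.lower()))
```

For example, `best_field("colour", ["make", "color", "price"])` selects `"color"`, which is one edit away from the query label.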
7
Value Matching: Case 1
8
Value Matching: Case 2
9
Value Matching: Case 3
[Figure label: Color?]
10
Value Matching: Case 4
11
Value Matching: Case 5
12
Value Matching: Case 6
13
Value Matching: Case 7
14
Measurements
Matching Efficiency
Submission Efficiency
Post-processing Efficiency
15
Measurements (cont'd)
Matching Efficiency
16
Measurements (cont'd)
Matching Efficiency
Submission Efficiency
17
Measurements (cont'd)
Matching Efficiency
Submission Efficiency
Post-processing Efficiency
18
Contributions
It enhances the effectiveness of the data-extraction process
It presents another technique, in addition to [RGa01], to access data behind HTML forms