1
Data-Extraction Ontology Generation by Example
Yuanqiu (Joe) Zhou
Data Extraction Group, Brigham Young University
Sponsored by NSF
2
Motivation
Semi-structured Web data must be extracted before it can be manipulated further.
In contrast to other wrapper-generation techniques, the BYU ontology-based data-extraction technique is resilient.
A by-example approach lets ordinary users generate ontologies easily.
3
Web-based System GUI
[Screenshot: form filled in with sample values Canon, PowerShot S40, 4.0, 1600 x 1200, 1024 x 768, 640 x 480]
4
Architecture
[Diagram components:]
Data Frame Library
User Defined Form
System GUI
Sample Pages
Ontology Generator
Extraction Engine
Test Pages
Populated Database
Extraction Ontology
5
Extraction Ontology
Object and Relationship Sets and Constraints
Extraction Patterns
Keywords and Context Expressions
6
Ontology Generation
Object and Relationship Sets and Constraints
Base [0:1] A [1:*]
Base [0:2] B [1:*]
Base [0:2] D1 [1:*] D2 [1:*]
Base [0:*] C [1:*]
Base [0:*] E1 [1:*] E2 [1:*]
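The constraint lines on this slide follow a regular shape, which can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `relationship_constraint` and the idea that each form field supplies a column list plus a maximum number of entries are hypothetical, inferred from the notation on the slide rather than taken from the actual generator.

```python
from typing import Optional, Sequence


def relationship_constraint(base: str, columns: Sequence[str],
                            max_values: Optional[int]) -> str:
    """Render one relationship-set line in the slide's notation.

    base       -- the object set the form field hangs off (e.g. "Base")
    columns    -- one name for a simple field, several for a multi-column field
    max_values -- assumed upper bound on entries; None means unbounded ("*")
    """
    upper = "*" if max_values is None else str(max_values)
    parts = [f"{base} [0:{upper}]"]
    # Every related object set participates one or more times, as on the slide.
    parts += [f"{column} [1:*]" for column in columns]
    return " ".join(parts)


if __name__ == "__main__":
    print(relationship_constraint("Base", ["A"], 1))
    print(relationship_constraint("Base", ["D1", "D2"], 2))
    print(relationship_constraint("Base", ["E1", "E2"], None))
```

Called with the three argument sets shown, the function reproduces the first, third, and fifth constraint lines of the slide.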
7
Ontology Generation
Object and Relationship Sets and Constraints
A [0:1] F [1:*]
B1 [0:1] G [1:*]
B2 [0:1] H [1:*] I [1:*]
…
B1, B2 : B
8
Ontology Generation
Extraction Patterns
Data Frame Library
  Lexicons
  Synonym dictionaries or thesauri
  Regular expressions
Matching extraction patterns:
  Only one
  More than one (use extraction pattern filters)
  None (create one)
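The three matching outcomes listed on this slide can be sketched as follows. The library contents, the names `LIBRARY`, `matching_frames`, and `choose_frame`, and the rule that a data frame matches only when its pattern covers every sample value in full are all assumptions made for illustration; the slide does not say how the real system decides a match.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical data-frame library: frame name -> regular expression.
LIBRARY: Dict[str, str] = {
    "Decimal": r"\d+(\.\d+)?",
    "Integer": r"\d+",
    "Resolution": r"\d+\s*x\s*\d+",
}


def matching_frames(samples: List[str], library: Dict[str, str]) -> List[str]:
    """Return the frames whose pattern fully matches every sample value."""
    return [name for name, pattern in library.items()
            if all(re.fullmatch(pattern, sample) for sample in samples)]


def choose_frame(samples: List[str],
                 library: Dict[str, str]) -> Tuple[str, List[str]]:
    """Classify the outcome into the slide's three cases."""
    matches = matching_frames(samples, library)
    if len(matches) == 1:
        return ("use", matches)       # only one pattern matches
    if len(matches) > 1:
        return ("filter", matches)    # several match: a filter must pick one
    return ("create", [])             # none match: a new pattern is needed


if __name__ == "__main__":
    print(choose_frame(["4.0", "3.3"], LIBRARY))
    print(choose_frame(["3", "2"], LIBRARY))
    print(choose_frame(["Canon"], LIBRARY))
```

The sketch stops at classification; how the extraction pattern filters choose among several candidates, or how a new pattern is created, is not described on the slide and is left out.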
9
Ontology Generation
Keywords and Context Expressions
3.5x optical zoom (2.5x digital)
a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom
optical 3X /digital 6X zoom
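The three sample phrases show why a bare number pattern is not enough: each contains two zoom factors, and only the surrounding words tell them apart. The sketch below pairs each factor with the nearest keyword. The nearest-keyword heuristic and the names `VALUE_IN_CONTEXT`, `KEYWORD`, and `zoom_values` are assumptions chosen for illustration, not the method the presented system is known to use.

```python
import re
from typing import Dict

# Context: a number immediately followed by "x" (either case); group 1 is the value.
VALUE_IN_CONTEXT = re.compile(r"\b(\d+(?:\.\d+)?)x\b", re.IGNORECASE)
KEYWORD = re.compile(r"\b(optical|digital)\b", re.IGNORECASE)


def zoom_values(text: str) -> Dict[str, str]:
    """Assign each zoom factor to the keyword closest to it in the text."""
    keywords = [(m.start(), m.end(), m.group(1).lower())
                for m in KEYWORD.finditer(text)]
    result: Dict[str, str] = {}
    for match in VALUE_IN_CONTEXT.finditer(text):
        v_start, v_end = match.span(1)
        if not keywords:
            continue

        def gap(keyword):
            k_start, k_end, _ = keyword
            # Distance in characters between the value and the keyword.
            return max(k_start - v_end, v_start - k_end, 0)

        nearest = min(keywords, key=gap)
        result[nearest[2]] = match.group(1)
    return result


if __name__ == "__main__":
    for phrase in [
        "3.5x optical zoom (2.5x digital)",
        "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom",
        "optical 3X /digital 6X zoom",
    ]:
        print(zoom_values(phrase))
```

On the three phrases from the slide, this assigns 3.5, 4, and 3 to optical zoom and 2.5, 4, and 6 to digital zoom.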
10
User Defined Forms
Object and Relationship Sets and Constraints
DigitalCamera [-> object]
DigitalCamera [0:1] Brand [1:*]
DigitalCamera [0:1] Model [1:*]
DigitalCamera [0:1] CCDResolution [1:*]
DigitalCamera [0:1] ImageResolution [1:*]
DigitalCamera [0:1] Zoom [1:*]
Zoom [0:1] DigitalZoom [1:*]
Zoom [0:1] OpticalZoom [1:*]
Sample Web Page
PowerShot G2 Canon 4.0 2272 x 1074 3 2
11
Extraction Ontology
DigitalCamera [-> object];
DigitalCamera [0:1] Brand [1:*];
DigitalCamera [0:1] ImageResolution [1:*];
DigitalCamera [0:1] Zoom [1:*];
DigitalCamera [0:1] CCDResolution [1:*];
Zoom [0:1] OpticalZoom [1:*];

Brand matches [10]
  constant
    { extract "\bNikon\b"; },
    { extract "\bCanon\b"; },
    { extract "\bOlympus\b"; },
    { extract "\bMinolta\b"; },
    { extract "\bSony\b"; };
end;

CCDResolution matches [20]
  constant
    { extract "\b\d(\.\d{1,2})?\b"; };
  keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
end;

OpticalZoom matches [10]
  constant
    { extract "\b\d(\.\d)"; context "\b\d(\.\d)?(x)\b"; };
  keyword "\boptical\b";
end;
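To make the roles of `extract` and `keyword` concrete, here is a minimal Python sketch that applies the Brand and CCDResolution patterns from the slide to a line of text. The regular expressions are copied from the slide; everything else is an assumption for illustration, including the dictionary layout, the function `extract_value`, the 20-character keyword window, and the sample sentence. It does not reproduce the BYU extraction engine.

```python
import re
from typing import Dict, List, Optional

# Patterns copied from the slide; the dictionary layout is hypothetical.
RULES: Dict[str, Dict[str, List[str]]] = {
    "Brand": {
        "extract": [r"\bNikon\b", r"\bCanon\b", r"\bOlympus\b",
                    r"\bMinolta\b", r"\bSony\b"],
        "keyword": [],
    },
    "CCDResolution": {
        "extract": [r"\b\d(\.\d{1,2})?\b"],
        "keyword": [r"\bMegapixel\b", r"\bCCD\b", r"\bCCD Resolution\b"],
    },
}


def extract_value(rule: Dict[str, List[str]], text: str,
                  window: int = 20) -> Optional[str]:
    """Return the first constant that satisfies the rule, or None.

    Assumption: when a rule lists keywords, a constant counts only if some
    keyword lies within `window` characters of it.
    """
    keyword_spans = [m.span() for kw in rule["keyword"]
                     for m in re.finditer(kw, text)]
    for pattern in rule["extract"]:
        for m in re.finditer(pattern, text):
            if not rule["keyword"]:
                return m.group(0)
            if any(k_start - m.end() <= window and m.start() - k_end <= window
                   for k_start, k_end in keyword_spans):
                return m.group(0)
    return None


# Hypothetical sample text, not taken from a real page.
TEXT = "Canon PowerShot G2 with a 4.0 Megapixel CCD and 3x optical zoom"

if __name__ == "__main__":
    for name, rule in RULES.items():
        print(name, "->", extract_value(rule, TEXT))
```

In the sample sentence the digit in "G2" and the "3" in "3x" are rejected by the word boundaries in the CCDResolution pattern, so "4.0" is the only candidate, and the adjacent keyword "Megapixel" confirms it.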
12
Measurements
How much of the ontology was generated with respect to how much could have been generated?
How many components generated should not have been generated?
What comparisons can we make about the precision and recall ratios of extraction data between a system-generated ontology and an expert-generated ontology?
How many sample pages are necessary for acceptable system performance?
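The precision and recall ratios mentioned in the third question use the standard definitions: precision is the share of extracted values that are correct, and recall is the share of expected values that were extracted. A small sketch follows, in which the function name `precision_recall` and the example field/value pairs are hypothetical and do not come from the presentation's experiments.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (object set, extracted value)


def precision_recall(extracted: Set[Fact],
                     expected: Set[Fact]) -> Tuple[float, float]:
    """Compute precision and recall of extracted facts against a gold set."""
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up data: what an ontology extracted vs. what a human marked as correct.
    extracted = {("Brand", "Canon"), ("Model", "G2"), ("Zoom", "6")}
    expected = {("Brand", "Canon"), ("Model", "G2"),
                ("OpticalZoom", "3"), ("CCDResolution", "4.0")}
    p, r = precision_recall(extracted, expected)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function once on the output of a system-generated ontology and once on the output of an expert-generated ontology, over the same test pages, would give the comparison the slide asks about.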
13
Contributions
Proposes a by-example approach to semi-automatically generate data-extraction ontologies
Constructs a Web-based tool to generate data-extraction ontologies