1
Data-Extraction Ontology Generation by Example
Yuanqiu (Joe) Zhou
Data Extraction Group, Brigham Young University
Sponsored by NSF
2
Motivation
  Semi-structured Web data need to be extracted for further manipulation.
  In contrast to other wrapper-generation techniques, the BYU ontology-based data-extraction technique is resilient.
  A by-example approach lets ordinary users generate ontologies easily.
3
Web-based System GUI
  (Screenshot of the form-based GUI. Recoverable sample values: Canon, PowerShot S40, 4.0, 1600 x 1200, 1024 x 768, 640 x 480.)
4
Architecture
  (Diagram. Components: Data Frame Library, User Defined Form, System GUI, Sample Pages, Ontology Generator, Extraction Ontology, Extraction Engine, Test Pages, Populated Database.)
5
Extraction Ontology
  Object and Relationship Sets and Constraints
  Extraction Patterns
  Keywords
  Context Expressions
6
Ontology Generation: Object and Relationship Sets and Constraints
  Form fields: Base; A; B; C; D1, D2; E1, E2
  Generated constraints:
    Base [0:1] A [1:*]
    Base [0:2] B [1:*]
    Base [0:*] C [1:*]
    Base [0:2] D1 [1:*] D2 [1:*]
    Base [0:*] E1 [1:*] E2 [1:*]
7
Ontology Generation: Object and Relationship Sets and Constraints
  Diagram labels: Base, A, B, …, A, B1, B2, G, H, I, F
  Generated constraints:
    B1, B2 : B
    A [0:1] F [1:*]
    B1 [0:1] G [1:*]
    B2 [0:1] H [1:*] I [1:*]
8
Sample Web Page and User-Created Form
  Form fields: Digital Camera; Brand; Model; CCD Resolution; Image Resolution; Zoom (Optical Zoom, Digital Zoom)
  Sample values: PowerShot G2, Canon, 4.0, 2272 x 1074, 3, 2
Object and Relationship Sets and Constraints
  DigitalCamera [-> object]
  DigitalCamera [0:1] Brand [1:*]
  DigitalCamera [0:1] Model [1:*]
  DigitalCamera [0:1] CCDResolution [1:*]
  DigitalCamera [0:1] ImageResolution [1:*]
  DigitalCamera [0:1] Zoom [1:*]
  Zoom [0:1] DigitalZoom [1:*]
  Zoom [0:1] OpticalZoom [1:*]
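The mapping from a user-created form to object and relationship sets can be sketched in a few lines. This is only an illustration under stated assumptions: the FormField class, its max_values parameter, and the constraints function are hypothetical names, not part of the BYU tool, and the sketch covers only binary relationship sets between a field and its parent.

```python
from dataclasses import dataclass, field


@dataclass
class FormField:
    """Hypothetical model of one form field (not the tool's actual data model)."""
    name: str
    max_values: str = "1"  # "1" for a single entry, "2", or "*" for unbounded
    children: list = field(default_factory=list)


def constraints(base, fields):
    """Emit one participation-constraint line per field, parents before children."""
    lines = [f"{base} [-> object]"]

    def walk(parent, flds):
        for f in flds:
            # Each field becomes an object set related to its parent.
            lines.append(f"{parent} [0:{f.max_values}] {f.name} [1:*]")
            walk(f.name, f.children)

    walk(base, fields)
    return lines


camera_form = [
    FormField("Brand"),
    FormField("Model"),
    FormField("CCDResolution"),
    FormField("ImageResolution"),
    FormField("Zoom", children=[FormField("DigitalZoom"), FormField("OpticalZoom")]),
]

for line in constraints("DigitalCamera", camera_form):
    print(line)
```

Run on the digital-camera form, the sketch prints the same eight lines listed on this slide.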
9
Ontology Generation: Extraction Patterns
  Data Frame Library
    Lexicons
    Synonym dictionaries or thesauri
    Regular expressions
  Matching extraction patterns:
    Only one (bingo!)
    More than one (use extraction pattern filters)
    No matching extraction pattern (create one)
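The three matching outcomes can be illustrated with a small sketch. The library contents, the entry names, and the functions matching_patterns and choose_pattern are all assumptions made for this example; the real data frame library and its filters are not shown in the slides.

```python
import re

# Hypothetical data frame library: pattern name -> regular expression.
DATA_FRAME_LIBRARY = {
    "Brand": r"\b(Nikon|Canon|Olympus|Minolta|Sony)\b",
    "DecimalNumber": r"\b\d(\.\d{1,2})?\b",
    "Resolution": r"\b\d{3,4} x \d{3,4}\b",
    "SmallInteger": r"\b\d\b",
}


def matching_patterns(sample_values, library=DATA_FRAME_LIBRARY):
    """Return the names of library patterns that fully match every sample value."""
    return [
        name
        for name, pattern in library.items()
        if all(re.fullmatch(pattern, value) for value in sample_values)
    ]


def choose_pattern(sample_values):
    """Classify the outcome: use the single match, filter several, or create one."""
    matches = matching_patterns(sample_values)
    if len(matches) == 1:
        return ("use", matches[0])
    if len(matches) > 1:
        return ("filter", matches)
    return ("create", None)


print(choose_pattern(["Canon", "Nikon"]))  # exactly one pattern matches
print(choose_pattern(["3"]))               # more than one pattern matches
print(choose_pattern(["f/2.8"]))           # no pattern matches
```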
10
Ontology Generation: Keywords
  Sample text:
    "Features a high-quality 4.0 Megapixel Resolution CCD"
    "The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD"
    "3 effective megapixel"
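One plausible way to propose keywords from samples like these is to look for words that follow the marked value in every sample. The slides do not say how the tool actually selects keywords, so the heuristic below (the keyword_candidates function and its window parameter) is purely an assumed illustration.

```python
import re
from collections import Counter


def keyword_candidates(samples, window=3):
    """Return lowercase words found within `window` words after the value in every sample.

    `samples` is a list of (text, value) pairs, where `value` is the string
    the user marked in `text`.
    """
    counts = Counter()
    for text, value in samples:
        pos = text.find(value)
        if pos < 0:
            continue  # value not present in this text
        following = text[pos + len(value):]
        words = re.findall(r"[A-Za-z]+", following)[:window]
        counts.update({w.lower() for w in words})  # count each word once per sample
    return [word for word, n in counts.most_common() if n == len(samples)]


samples = [
    ("Features a high-quality 4.0 Megapixel Resolution CCD", "4.0"),
    ("The new Nikon Coolpix 995 boasts of a 3.34 Megapixel CCD", "3.34"),
    ("3 effective megapixel", "3"),
]
print(keyword_candidates(samples))
```

For the three samples on this slide, the only word common to all of them is "megapixel".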
11
Ontology Generation: Context Expressions
  Sample text:
    "3.5x optical zoom (2.5x digital)"
    "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom"
    "optical 3X /digital 6X zoom"
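A context expression locates the surrounding text (here, a number followed by "x"), and the extract pattern then pulls the value out of that context. The sketch below applies this idea to the three samples. The case-insensitive flag, the optional decimal part in the extract pattern, and the zoom_values function are assumptions for illustration; the sketch also does not try to decide which value is optical and which is digital, which is the job of keywords.

```python
import re

# Context: a one-digit number, optional decimal, followed by "x" (assumed case-insensitive).
CONTEXT = re.compile(r"\b\d(\.\d)?x\b", re.IGNORECASE)
# Extract: just the numeric part inside the matched context.
EXTRACT = re.compile(r"\d(\.\d)?")


def zoom_values(text):
    """Return the numeric values found inside each context match, in order."""
    values = []
    for ctx in CONTEXT.finditer(text):
        m = EXTRACT.search(ctx.group(0))
        if m:
            values.append(m.group(0))
    return values


for sample in [
    "3.5x optical zoom (2.5x digital)",
    "a superior 4x Optical Zoom Nikkor lens, plus 4x stepless digital zoom",
    "optical 3X /digital 6X zoom",
]:
    print(zoom_values(sample))
```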
12
Extraction Ontology
  DigitalCamera [-> object];
  DigitalCamera [0:1] Brand [1:*];
  DigitalCamera [0:1] ImageResolution [1:*];
  DigitalCamera [0:1] Zoom [1:*];
  DigitalCamera [0:1] CCDResolution [1:*];
  Zoom [0:1] OpticalZoom [1:*];

  Brand matches [10]
    constant { extract "\bNikon\b"; },
             { extract "\bCanon\b"; },
             { extract "\bOlympus\b"; },
             { extract "\bMinolta\b"; },
             { extract "\bSony\b"; };
  end;

  CCDResolution matches [20]
    constant { extract "\b\d(\.\d{1,2})?\b"; };
    keyword "\bMegapixel\b", "\bCCD\b", "\bCCD Resolution\b";
  end;

  OpticalZoom matches [10]
    constant { extract "\b\d(\.\d)?"; context "\b\d(\.\d)?(x)\b"; };
    keyword "\boptical\b";
  end;
13
Measurements
  How much of the ontology was generated, relative to how much could have been generated?
  How many generated components should not have been generated?
  How do the precision and recall of extracted data compare between a system-generated ontology and an expert-generated ontology?
  How many sample pages are necessary for acceptable system performance?
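The first two questions can be read as recall and precision over ontology components. The sketch below computes both from two sets of component names. The precision_recall function and the example sets are hypothetical and are not results from the study.

```python
def precision_recall(generated, expected):
    """Return (precision, recall) of generated components against expected ones."""
    generated, expected = set(generated), set(expected)
    correct = generated & expected
    # Precision: fraction of generated components that should have been generated.
    precision = len(correct) / len(generated) if generated else 0.0
    # Recall: fraction of expected components that were actually generated.
    recall = len(correct) / len(expected) if expected else 0.0
    return precision, recall


# Made-up example: five expected object sets, four generated, one of them spurious.
expected = {"Brand", "Model", "CCDResolution", "ImageResolution", "Zoom"}
generated = {"Brand", "Model", "Zoom", "Price"}
print(precision_recall(generated, expected))
```

The same function applies to extracted data values when comparing a system-generated ontology with an expert-generated one.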
14
Contributions
  Proposes a by-example approach to semi-automatically generate data-extraction ontologies
  Constructs a Web-based tool to generate data-extraction ontologies