1
Filtering Multiple-Record Web Documents Based on Application Ontologies
Presenter: L. Xu
Advisor: D. W. Embley
2
Examples
D1: Car
D2: Item for Sale or Rent
3
Car Ontology
Car[->object];
Car[0..0.975..1] has Year;
Car[0..0.925..1] has Make;
Car[0..0.908..1] has Model;
Car[0..0.45..1] has Mileage;
Car[0..2.1..*] has Feature;
Car[0..0.8..1] has Price;
PhoneNr is for Car[1..1.15..*];
Year matches [4] constant {
  extract "\d{2}";
  context "([^\$\d]|^)[4-9]\d,[^\d]";
  substitute "^" -> "19";
},. End;
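The Year rule above can be read as: find a two-digit number in the given context, then prefix it with "19". The following is a minimal Python sketch of that reading. The function name `extract_years` and the way the three clauses are chained are assumptions made for illustration; the slide does not show how the ontology language evaluates them.

```python
import re

# Patterns copied from the Year rule on the slide.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d,[^\d]")
EXTRACT = re.compile(r"\d{2}")


def extract_years(text):
    """Return four-digit years found by the (assumed) rule semantics."""
    years = []
    for ctx in CONTEXT.finditer(text):
        # Within each context match, pull out the two-digit constant.
        two_digits = EXTRACT.search(ctx.group(0))
        if two_digits:
            # substitute "^" -> "19": prepend "19" to the extracted value.
            years.append("19" + two_digits.group(0))
    return years


if __name__ == "__main__":
    print(extract_years("Honda Civic 89, blue; $95,000"))
```

Under this reading, "89," is recognized as a year, while "$95,000" is rejected because it follows a dollar sign and its comma is followed by a digit.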
4
Filtering Heuristics
H1: Density
H2: Expected-values
H3: Grouping
5
H1: Density
Car
–Total Number of Characters: 2048
–Number of Matched Characters: 626
–Density: 0.306
Item for Rent or Sale
–Total Number of Characters: 2671
–Number of Matched Characters: 196
–Density: 0.073
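The density figures are consistent with a simple ratio of matched characters to total characters. A minimal sketch follows, with the hypothetical helper name `density`. For the second document it takes 196 matched characters out of 2671 in total, the reading that agrees with the stated 0.073.

```python
def density(matched_chars, total_chars):
    """Fraction of a document's characters matched by the ontology."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return matched_chars / total_chars


if __name__ == "__main__":
    print(round(density(626, 2048), 3))   # Car document
    print(round(density(196, 2671), 3))   # Item for Rent or Sale document
```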
6
H2: Expected-values
          OV     D1   D2
Year      0.98   16    6
Make      0.93   10    0
Model     0.91   12    0
Mileage   0.45    6    2
Price     0.80   11    8
Feature   2.10   29    0
PhoneNr   1.15   15   11
D1: 0.996
D2: 0.567
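The slide does not name the formula behind the two scores, but the cosine of the angle between the ontology vector OV and each document's count vector reproduces both 0.996 and 0.567. The sketch below assumes that measure and reads the PhoneNr counts as 15 for D1 and 11 for D2. The function name `cosine` is illustrative.

```python
import math

# Order: Year, Make, Model, Mileage, Price, Feature, PhoneNr
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]
D1 = [16, 10, 12, 6, 11, 29, 15]
D2 = [6, 0, 0, 2, 8, 0, 11]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(round(cosine(OV, D1), 3))
    print(round(cosine(OV, D2), 3))
```

A measure of this kind depends only on the proportions among the counts, so a long car listing and a short one with the same mix of fields would score alike.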
7
H3: Grouping
Year: 2000, Year: 1989, Make: Subaru, Model: SW
------ Nr of Distinct "One Max" Object: 3
Price: 1900, Year: 1998, Model: Elantra, Year: 1994
------ Nr of Distinct "One Max" Object: 3
...
Grouping Factor is: 0.865

Year: 1999, Year: 1998, Year: 1960, Mileage: 10000
------ Nr of Distinct "One Max" Object: 2
Mileage: 401000, Year: 1940, Price: 17500, Year: 10971
------ Nr of Distinct "One Max" Object: 3
...
Grouping Factor is: 0.5
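The per-group counts on this slide can be reproduced by splitting the extracted values into consecutive groups of four and counting the distinct object sets in each group. The sketch below does that. The group size of four is taken from the groups shown, and the normalization in `grouping_factor` (sum of distinct counts divided by groups times group size) is an assumption, since the slide does not state the formula. The slide's factors of 0.865 and 0.5 presumably cover the whole documents, so the values for these two-group excerpts will differ from them.

```python
def distinct_per_group(labels, group_size=4):
    """Count distinct object-set labels in each full consecutive group."""
    last_start = len(labels) - group_size
    groups = [labels[i:i + group_size]
              for i in range(0, last_start + 1, group_size)]
    return [len(set(group)) for group in groups]


def grouping_factor(labels, group_size=4):
    """Assumed normalization: average distinct count over group size."""
    counts = distinct_per_group(labels, group_size)
    if not counts:
        return 0.0
    return sum(counts) / (len(counts) * group_size)


# Labels of the extracted values shown on the slide, in order.
first_excerpt = ["Year", "Year", "Make", "Model",
                 "Price", "Year", "Model", "Year"]
second_excerpt = ["Year", "Year", "Year", "Mileage",
                  "Mileage", "Year", "Price", "Year"]

if __name__ == "__main__":
    print(distinct_per_group(first_excerpt))
    print(distinct_per_group(second_excerpt))
```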
8
Combining Heuristics
Decision tree learning algorithm C4.5
–Learning task: suitability
–Performance measure: accuracy
–Training experience: human classified documents
Training set
–20 positive examples (from 10 geographical regions of US States)
–30 negative examples
Test set
–10 positive examples
–20 negative examples
9
Generated Rules
Car application
–H2 <= 0.8767: NO
–H2 > 0.8767: YES
Obituary application
–H2 <= 0.6793: NO
–H2 > 0.6793
–| H1 <= 0.2171: NO
–| H1 > 0.2171: YES
Universal rule
–H3 <= 0.625
–| H1 <= 0.369: NO
–| H1 > 0.369
–| | H2 <= 0.6263: NO
–| | H2 > 0.6263: YES
–H3 > 0.625: YES
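The three decision trees translate directly into threshold checks. The sketch below encodes them as Python functions with illustrative names and applies them to the heuristic values shown earlier for the two example documents. Pairing the grouping factors 0.865 and 0.5 with D1 and D2 respectively is an assumption.

```python
def car_rule(h2):
    """Car application: accept when H2 exceeds 0.8767."""
    return h2 > 0.8767


def obituary_rule(h1, h2):
    """Obituary application: H2 is tested first, then H1."""
    return h2 > 0.6793 and h1 > 0.2171


def universal_rule(h1, h2, h3):
    """Universal rule: H3 alone can accept; otherwise H1 and H2 must both pass."""
    if h3 > 0.625:
        return True
    return h1 > 0.369 and h2 > 0.6263


# (H1 density, H2 expected-values, H3 grouping) from the earlier slides.
D1 = (0.306, 0.996, 0.865)
D2 = (0.073, 0.567, 0.5)

if __name__ == "__main__":
    for name, (h1, h2, h3) in (("D1", D1), ("D2", D2)):
        print(name, car_rule(h2), universal_rule(h1, h2, h3))
```

With these inputs both the car rule and the universal rule accept D1 and reject D2.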
10
Experiment Results
Car application
–accuracy: 96.7%
–precision: 100%
–recall: 91%
Obituary application
–accuracy: 96.7%
–precision: 91%
–recall: 100%
Universal rule
–accuracy: 93.4%
–precision: 84%
–recall: 100%
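For reference, the three measures can be computed from confusion-matrix counts as sketched below. The counts used here (10 true positives, 1 false positive, 19 true negatives, 0 false negatives) are not reported in the slides. They are a hypothetical breakdown chosen because it matches both the stated test set of 10 positive and 20 negative documents and the obituary figures.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all documents classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of accepted documents that are truly suitable."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of truly suitable documents that were accepted."""
    return tp / (tp + fn)


if __name__ == "__main__":
    # Hypothetical counts consistent with the obituary results.
    tp, fp, tn, fn = 10, 1, 19, 0
    print(f"accuracy:  {accuracy(tp, fp, tn, fn):.1%}")
    print(f"precision: {precision(tp, fp):.0%}")
    print(f"recall:    {recall(tp, fn):.0%}")
```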
11
False Drop Example
12
False Positive Example
13
Summary
Objective: Automatically filter multiple-record web documents.
Approach: Filtering heuristics
–Density
–Expected-values
–Grouping
Result: ~95% accuracy