"19"; },. End;"> "19"; },. End;">

Presentation is loading. Please wait.

Presentation is loading. Please wait.

Filtering Multiple-Record Web Documents Based on Application Ontologies Presenter: L. Xu Advisor: D.W.Embley.


1 Filtering Multiple-Record Web Documents Based on Application Ontologies
Presenter: L. Xu
Advisor: D. W. Embley

2 Examples
D1: Car
D2: Item for Sale or Rent

3 Car Ontology
Car[->object];
Car[0..0.975..1] has Year;
Car[0..0.925..1] has Make;
Car[0..0.908..1] has Model;
Car[0..0.45..1] has Mileage;
Car[0..2.1..*] has Feature;
Car[0..0.8..1] has Price;
PhoneNr is for Car[1..1.15..*];
Year matches [4] constant
{extract "\d{2}"; context "([^\$\d]|^)[4-9]\d,[^\d]"; substitute "^" -> "19"; },. End;
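The Year rule on this slide can be read as: find a two-digit number in the given context, extract the two digits, and prefix them with "19". A minimal Python sketch of that reading follows. The two patterns are copied from the slide; the function name `extract_years` and the way the `extract`, `context`, and `substitute` clauses are combined are assumptions, not the ontology system's actual implementation.

```python
import re

# Patterns copied from the slide's Year rule.
# How the three clauses interact is an assumption made for this sketch.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d,[^\d]")
EXTRACT = re.compile(r"\d{2}")


def extract_years(text):
    """Return a four-digit year for each two-digit year found in context."""
    years = []
    for ctx in CONTEXT.finditer(text):
        # Pull the two digits out of the matched context string.
        two_digits = EXTRACT.search(ctx.group(0)).group(0)
        # substitute "^" -> "19": insert "19" at the start of the value.
        years.append(re.sub(r"^", "19", two_digits))
    return years


if __name__ == "__main__":
    print(extract_years("95, Honda Accord, runs great"))
    print(extract_years("asking $95, firm"))
```

Under this reading, a number preceded by a dollar sign is rejected by the context pattern, which keeps prices from being mistaken for years.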

4 Filtering Heuristics H1: Density H2: Expected-values H3: Grouping

5 H1: Density
Car
Total Number of Characters: 2048
Number of Matched Characters: 626
Density: 0.306
Item for Rent or Sale
Total Number of Characters: 2671
Number of Matched Characters: 196
Density: 0.073
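The density values on this slide are consistent with dividing the number of matched characters by the total number of characters. The sketch below checks that arithmetic, taking 2671 as the total and 196 as the matched count for the second document, since that is the only assignment that yields the reported 0.073. The function name `density` is an illustrative choice.

```python
def density(matched_chars, total_chars):
    """Fraction of a document's characters matched by the ontology."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return matched_chars / total_chars


if __name__ == "__main__":
    print(round(density(626, 2048), 3))   # Car document
    print(round(density(196, 2671), 3))   # Item for Rent or Sale document
```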

6 H2: Expected-values
Object set   OV     D1   D2
Year         0.98   16    6
Make         0.93   10    0
Model        0.91   12    0
Mileage      0.45    6    2
Price        0.80   11    8
Feature      2.10   29    0
PhoneNr      1.15   15   11
D1: 0.996 D2: 0.567
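The slide does not name the measure behind the two scores, but both are reproduced by the cosine of the angle between the ontology vector OV and each document's count vector. The sketch below works that out from the slide's numbers; treating H2 as cosine similarity is an inference from the arithmetic, and the function name is illustrative.

```python
import math

# Order: Year, Make, Model, Mileage, Price, Feature, PhoneNr (from the slide).
OV = [0.98, 0.93, 0.91, 0.45, 0.80, 2.10, 1.15]
D1 = [16, 10, 12, 6, 11, 29, 15]
D2 = [6, 0, 0, 2, 8, 0, 11]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty vector has no similarity
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(round(cosine_similarity(OV, D1), 3))
    print(round(cosine_similarity(OV, D2), 3))
```

Because cosine similarity ignores vector length, a long document and a short one are compared only on the proportions of the object sets they contain.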

7 H3: Grouping
Year: 2000
Year: 1989
Make: Subaru
Model: SW
------
Nr of Distinct "One Max" Object: 3
Price: 1900
Year: 1998
Model: Elantra
Year: 1994
------
Nr of Distinct "One Max" Object: 3.
Grouping Factor is: 0.865

Year: 1999
Year: 1998
Year: 1960
Mileage: 10000
Nr of Distinct "One Max" Object: 2
Mileage: 401000
Year: 1940
Price: 17500
Year: 10971
Nr of Distinct "One Max" Object: 3.
Grouping Factor is: 0.5
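The per-group counts on this slide can be reproduced by splitting the extracted values into consecutive groups of four and counting the distinct object sets in each group. The sketch below does that. The group size of 4 is inferred from the slide's groups, and the formula used for the overall factor (sum of distinct counts divided by the number of groups times the group size) is an assumption. Only two groups per document are shown on the slide, so the sketch is not expected to reproduce the reported factors of 0.865 and 0.5.

```python
def distinct_per_group(labels, group_size):
    """Count distinct object-set names in each consecutive full group."""
    counts = []
    # Assumption: an incomplete trailing group is ignored.
    for start in range(0, len(labels) - group_size + 1, group_size):
        group = labels[start:start + group_size]
        counts.append(len(set(group)))
    return counts


def grouping_factor(labels, group_size):
    """Assumed formula: total distinct count over the maximum possible."""
    counts = distinct_per_group(labels, group_size)
    if not counts:
        return 0.0
    return sum(counts) / (len(counts) * group_size)


if __name__ == "__main__":
    # Object-set names of the values shown on the slide, in order.
    first = ["Year", "Year", "Make", "Model", "Price", "Year", "Model", "Year"]
    second = ["Year", "Year", "Year", "Mileage",
              "Mileage", "Year", "Price", "Year"]
    print(distinct_per_group(first, 4), distinct_per_group(second, 4))
```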

8 Combining Heuristics
Decision tree learning algorithm C4.5
–Learning task: suitability
–Performance measure: accuracy
–Training experience: human classified documents
Training set
–20 positive examples (from 10 geographical regions of US States)
–30 negative examples
Test set
–10 positive examples
–20 negative examples

9 Generated Rules
Car application
–H2 <= 0.8767: NO
–H2 > 0.8767: YES
Obituary application
–H2 <= 0.6793: NO
–H2 > 0.6793
–| H1 <= 0.2171: NO
–| H1 > 0.2171: YES
Universal rule
–H3 <= 0.625
–| H1 <= 0.369: NO
–| H1 > 0.369
–| | H2 <= 0.6263: NO
–| | H2 > 0.6263: YES
–H3 > 0.625: YES
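The three rule sets translate directly into nested conditionals. The sketch below encodes the thresholds from this slide and applies them to the H1, H2, and H3 values reported for D1 and D2 on the earlier slides. The function names are illustrative and are not part of the original system.

```python
def car_rule(h1, h2, h3):
    """Car application: a single test on the expected-values score."""
    return h2 > 0.8767


def obituary_rule(h1, h2, h3):
    """Obituary application: expected-values first, then density."""
    if h2 <= 0.6793:
        return False
    return h1 > 0.2171


def universal_rule(h1, h2, h3):
    """Universal rule: grouping first, then density, then expected-values."""
    if h3 > 0.625:
        return True
    if h1 <= 0.369:
        return False
    return h2 > 0.6263


if __name__ == "__main__":
    d1 = (0.306, 0.996, 0.865)  # (H1, H2, H3) for D1, from slides 5 to 7
    d2 = (0.073, 0.567, 0.5)    # (H1, H2, H3) for D2
    print(car_rule(*d1), car_rule(*d2))
    print(universal_rule(*d1), universal_rule(*d2))
```

With these values, both the car rule and the universal rule accept D1 and reject D2.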

10 Experiment Results
Car application
–accuracy 96.7%
–precision 100%
–recall 91%
Obituary application
–accuracy 96.7%
–precision 91%
–recall 100%
Universal rule
–accuracy 93.4%
–precision 84%
–recall 100%
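For reference, the three measures can be computed from a confusion matrix as sketched below. The counts in the example are not reported in the slides. They are one hypothetical outcome over the 10 positive and 20 negative test documents listed on slide 8 that happens to be consistent with the obituary figures: every positive accepted and one negative wrongly accepted.

```python
def metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    accuracy, precision, recall = metrics(tp=10, fp=1, tn=19, fn=0)
    print(f"accuracy {accuracy:.1%}, precision {precision:.0%}, recall {recall:.0%}")
```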

11 False Drop Example

12 False Positive Example

13 Summary
Objective: Automatically filter multiple-record web documents.
Approach: Filtering heuristics
–Density
–Expected-values
–Grouping
Result: ~95% accuracy

