Intelligent Database Systems Lab N.Y.U.S.T. I. M. Evaluation of novelty metrics for sentence-level novelty mining Presenter : Lin, Shu-Han Authors : Flora.

Presentation transcript:

1 Intelligent Database Systems Lab, N.Y.U.S.T. I. M. Evaluation of novelty metrics for sentence-level novelty mining. Presenter: Lin, Shu-Han. Authors: Flora S. Tsai, Wenyin Tang, Kap Luk Chan. Information Sciences, InS (2010)

2 Outline: Introduction, Motivation, Objective, Methodology, Comparative study, Experiments, Conclusion, Comments

3 Introduction. What is novelty? Novelty is the opposite of "similarity" or "redundancy". The task: given the set of relevant sentences in all documents, identify all novel sentences. How are novel sentences identified? Each sentence receives a novelty score, measured by a novelty metric.

4 Motivation. Sentence 1 (S1): "U.S. Stocks set for big sell-off". Sentence 2 (S2, the incoming sentence): "U.S. Stocks". S2 is fully covered by S1. With Novelty(S1, S2) = 1 - similarity(S1, S2), the similarity between S1 and S2 is low, so does that make S2 novel?

5 Objectives. How to choose a novelty metric? How to set a suitable threshold automatically?
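
The two objectives can be read as the two free parameters of a generic novelty-mining loop. The following is only a minimal sketch under stated assumptions, not the authors' procedure: `mine_novel`, `new_word_ratio`, the whitespace tokenizer, the rule of taking the lowest score against all earlier sentences, and the threshold value 0.5 are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Set

Metric = Callable[[Set[str], Set[str]], float]


def tokenize(sentence: str) -> Set[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return set(sentence.lower().split())


def new_word_ratio(incoming: Set[str], previous: Set[str]) -> float:
    """Placeholder metric: share of the incoming words absent from `previous`."""
    if not incoming:
        return 0.0
    return len(incoming - previous) / len(incoming)


def mine_novel(sentences: List[str], metric: Metric, threshold: float) -> List[str]:
    """Return the sentences whose novelty score reaches `threshold`."""
    history: List[Set[str]] = []
    novel: List[str] = []
    for sentence in sentences:
        words = tokenize(sentence)
        # Assumption: the score is the lowest novelty against any earlier sentence;
        # the first sentence has no history and is scored 1.0.
        score = min((metric(words, old) for old in history), default=1.0)
        if score >= threshold:
            novel.append(sentence)
        history.append(words)
    return novel


stream = ["U.S. Stocks set for big sell-off", "U.S. Stocks", "Oil prices rise sharply"]
result = mine_novel(stream, new_word_ratio, threshold=0.5)
print(result)  # ['U.S. Stocks set for big sell-off', 'Oil prices rise sharply']
```

Swapping the `metric` argument or changing `threshold` changes which sentences are kept, which is exactly the choice the two questions on this slide are about.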

6 Methodology – Novelty metrics. Symmetric (1 - similarity): S1 is novel with respect to S2, and S2 is novel with respect to S1. Asymmetric: S1 is not novel with respect to S2, but S2 is novel with respect to S1.

7 Methodology – Symmetric metrics. Cosine similarity; Jaccard similarity
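
The formulas on this slide did not survive in the transcript. As a hedged illustration, the sketch below uses the textbook definitions of cosine and Jaccard similarity over raw word counts (an assumption; the experiments mention tf.isf weighting, which is not reproduced here) and applies them to the two sentences from the Motivation slide. The function names are hypothetical.

```python
import math
from collections import Counter


def tokens(sentence):
    """Lowercase and split on whitespace (naive tokenizer, an assumption)."""
    return sentence.lower().split()


def cosine_similarity(a, b):
    """Cosine of the angle between raw term-count vectors."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_similarity(a, b):
    """Size of the word-set intersection divided by the size of the union."""
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


s1 = "U.S. Stocks set for big sell-off"
s2 = "U.S. Stocks"

cos_novelty = 1 - cosine_similarity(s1, s2)
jac_novelty = 1 - jaccard_similarity(s1, s2)
print(round(cos_novelty, 3))  # 0.423
print(round(jac_novelty, 3))  # 0.667
```

Both scores are identical when the arguments are swapped, so under these definitions S2 receives a fairly high novelty score even though every word in it already appears in S1.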

8 Methodology – Asymmetric metrics. Overlap metric; new word count metric
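
The definitions on this slide are missing from the transcript, so the following is a sketch of one common set-based reading of the two asymmetric metrics, not necessarily the exact form used by the authors. `overlap_novelty` and `new_word_count` are hypothetical names, and unweighted word sets are an assumption.

```python
def word_set(sentence):
    """Lowercase and split on whitespace (naive tokenizer, an assumption)."""
    return set(sentence.lower().split())


def overlap_novelty(incoming, previous):
    """1 minus the share of the incoming sentence's words already in `previous`."""
    new, old = word_set(incoming), word_set(previous)
    if not new:
        return 0.0
    return 1.0 - len(new & old) / len(new)


def new_word_count(incoming, previous):
    """Number of distinct words in `incoming` that do not occur in `previous`."""
    return len(word_set(incoming) - word_set(previous))


s1 = "U.S. Stocks set for big sell-off"
s2 = "U.S. Stocks"

# S2 arrives after S1: every word of S2 is already covered.
print(overlap_novelty(s2, s1), new_word_count(s2, s1))  # 0.0 0

# S1 arrives after S2: four of its six words are new.
print(round(overlap_novelty(s1, s2), 3), new_word_count(s1, s2))  # 0.667 4
```

The result depends on which sentence is treated as incoming, which is what makes these metrics asymmetric. Note that the new word count is an unbounded integer while the overlap score lies in [0, 1].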

9 Comparative study. Performance requirements (trade-off): high recall, high precision, or high F-score. Distribution: high, medium, or low novelty ratio
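
For reference, the three performance measures named here can be computed as in the sketch below, using the standard definitions of precision, recall, and F-score. The sentence identifiers and the helper name `precision_recall_f` are made up for illustration and are not taken from the paper's data.

```python
def precision_recall_f(predicted, actual, beta=1.0):
    """Precision, recall and F-beta for a set of sentences predicted as novel."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    denom = beta ** 2 * precision + recall
    f_score = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Hypothetical sentence ids: the system flags 1-4 as novel, the truth is 2, 3, 5.
p, r, f = precision_recall_f(predicted=[1, 2, 3, 4], actual=[2, 3, 5])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Lowering the novelty threshold flags more sentences, which tends to raise recall and lower precision; that is the trade-off this slide refers to.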

10 Comparative study – Performance requirements. Figure panels: F-score/precision; F-score/recall

11 Comparative study – Prior probability

12 Comparative study – Prior probability

13 Methodology – A new framework. Combine symmetric and asymmetric metrics. Two problems: the scaling problem (the metrics must be comparable and consistent); the combining strategy
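
The slide names the two problems but the transcript does not show how they are solved. The sketch below is one possible reading, under loud assumptions: the new word count is scaled into [0, 1] by dividing by the number of distinct words in the incoming sentence, and the two scores are combined by a weighted average with a hypothetical weight `alpha`. It pairs a Jaccard-based score with a new-word score, but it should not be taken as the authors' mixed metric.

```python
def word_set(sentence):
    """Lowercase and split on whitespace (naive tokenizer, an assumption)."""
    return set(sentence.lower().split())


def jaccard_novelty(incoming, previous):
    """Symmetric score: 1 minus the Jaccard similarity of the word sets."""
    a, b = word_set(incoming), word_set(previous)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def scaled_new_word_novelty(incoming, previous):
    """Asymmetric score: new-word count divided by the incoming word count."""
    a, b = word_set(incoming), word_set(previous)
    return len(a - b) / len(a) if a else 0.0


def mixed_novelty(incoming, previous, alpha=0.5):
    """Weighted average of the two scores; `alpha` is a hypothetical weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    symmetric = jaccard_novelty(incoming, previous)
    asymmetric = scaled_new_word_novelty(incoming, previous)
    return alpha * symmetric + (1.0 - alpha) * asymmetric


s1 = "U.S. Stocks set for big sell-off"
s2 = "U.S. Stocks"

print(round(mixed_novelty(s2, s1), 3))  # 0.333 (S2 incoming, fully covered by S1)
print(round(mixed_novelty(s1, s2), 3))  # 0.667 (S1 incoming, four new words)
```

Because both components lie in [0, 1] after scaling, the weighted average does too, so a single threshold can be applied to the mixed score.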

14 Experiments – Mixed metrics vs. individual metrics. M3 (jacc+new), tf.isf

15 Experiments – Mixed metric M3 vs. individual metrics for novelty ratio

16 Experiments – Mixed metric M3 vs. mixture of two symmetric metrics vs. mixture of two asymmetric metrics vs. mixture of all metrics for novelty ratio

17 Experiments – Weight

18 Conclusions. A comparative study of different types of novelty metrics: symmetric (cosine / Jaccard) and asymmetric (new word count / overlap). It observes their strengths. It introduces a mixture of the two types of novelty metrics, which is more stable than using an individual metric.

20 Comments. Advantage: a comparative study; mixture; intuitive. Drawback: … Application: novelty mining
