Intelligent Database Systems Lab N.Y.U.S.T. I. M. Evaluation of novelty metrics for sentence-level novelty mining Presenter : Lin, Shu-Han Authors : Flora.

Presentation transcript:

Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
Evaluation of novelty metrics for sentence-level novelty mining
Presenter: Lin, Shu-Han
Authors: Flora S. Tsai, Wenyin Tang, Kap Luk Chan
Information Sciences, InS (2010)

Outline
- Introduction
- Motivation
- Objectives
- Methodology
- Comparative study
- Experiments
- Conclusions
- Comments

Introduction
What is novelty?
- Novelty is the opposite of "similarity" or "redundancy".
Novelty mining:
- Given the set of relevant sentences in all documents, identify all novel sentences.
How are novel sentences identified?
- By a novelty score, which is measured by a novelty metric.
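The task described on this slide can be sketched as a loop over a sentence stream. This is a minimal illustration, not the authors' implementation: the names `mine_novel_sentences` and `toy_metric`, the whitespace tokenizer, the use of the minimum score over all earlier sentences, and the threshold value are all assumptions made for the example.

```python
from typing import Callable, List


def toy_metric(incoming: str, previous: str) -> float:
    """Hypothetical stand-in metric: 0.0 if every word of the incoming
    sentence already appears in the previous one, otherwise 1.0."""
    inc = set(incoming.lower().split())
    prev = set(previous.lower().split())
    return 0.0 if inc <= prev else 1.0


def mine_novel_sentences(
    sentences: List[str],
    metric: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Return the sentences whose novelty score reaches the threshold.

    Assumption: a sentence is scored against every earlier sentence and
    the lowest score (the most redundant comparison) decides.
    """
    history: List[str] = []
    novel: List[str] = []
    for sentence in sentences:
        # With an empty history there is nothing to be redundant with.
        score = min((metric(sentence, h) for h in history), default=1.0)
        if score >= threshold:
            novel.append(sentence)
        history.append(sentence)
    return novel


stream = [
    "U.S. Stocks set for big sell-off",
    "U.S. Stocks",
    "Oil prices rise",
]
print(mine_novel_sentences(stream, toy_metric, threshold=0.5))
```

Any function with the same two-argument signature can be passed as `metric`, so the scoring rule can be swapped without touching the loop.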

Motivation
Sentence 1 (S1): U.S. Stocks set for big sell-off
Sentence 2 (S2, incoming sentence): U.S. Stocks
* S2 is covered by S1.
Novelty(S1, S2) = 1 – similarity(S1, S2)
The similarity between S1 and S2 is low, so is S2 novel???

Objectives
- How to choose a novelty metric?
- How to set a suitable threshold automatically?

Methodology - Novelty metrics
Symmetric (1 – similarity)
- S1 is novel to S2
- S2 is novel to S1
Asymmetric
- S1 is not novel to S2
- S2 is novel to S1

Methodology - Symmetric metrics
- Cosine similarity
- Jaccard similarity
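The formulas on this slide did not survive in the transcript. As a sketch only, the textbook definitions of the two similarities can be applied to the sentences from the Motivation slide. The whitespace tokenizer and the raw term counts are assumptions (the experiments mention tf.isf weighting, which is not reproduced here), and the function names are illustrative.

```python
import math
from collections import Counter


def tokens(sentence: str) -> list:
    """Naive tokenizer (assumption): lowercase, split on whitespace."""
    return sentence.lower().split()


def cosine_similarity(s1: str, s2: str) -> float:
    """Cosine of the angle between the raw term-count vectors."""
    v1, v2 = Counter(tokens(s1)), Counter(tokens(s2))
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


def jaccard_similarity(s1: str, s2: str) -> float:
    """Shared distinct words divided by all distinct words."""
    a, b = set(tokens(s1)), set(tokens(s2))
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)


def symmetric_novelty(similarity, s1: str, s2: str) -> float:
    """Novelty as 1 - similarity; the argument order does not matter."""
    return 1.0 - similarity(s1, s2)


S1 = "U.S. Stocks set for big sell-off"
S2 = "U.S. Stocks"
print(symmetric_novelty(cosine_similarity, S1, S2))
print(symmetric_novelty(jaccard_similarity, S1, S2))
```

With these definitions S2 shares both of its words with S1, yet the Jaccard similarity is only 2/6, so a symmetric metric scores S2 as fairly novel. That is the problem the Motivation slide points at.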

Methodology - Asymmetric metrics
- Overlap metric
- New word count metric
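The exact definitions on this slide are not in the transcript, so the following is a hedged sketch under assumed definitions: the new word count is taken as the number of distinct words in the incoming sentence that the previous sentence lacks, and the overlap-based novelty as one minus the fraction of the incoming sentence's words that the previous sentence covers. The function names and the tokenizer are illustrative.

```python
def word_set(sentence: str) -> set:
    """Naive tokenizer (assumption): lowercase, split on whitespace."""
    return set(sentence.lower().split())


def new_word_count(incoming: str, previous: str) -> int:
    """Distinct words of the incoming sentence not seen in the previous one."""
    return len(word_set(incoming) - word_set(previous))


def overlap_novelty(incoming: str, previous: str) -> float:
    """1 - (covered words of the incoming sentence / its distinct words)."""
    inc = word_set(incoming)
    if not inc:
        return 0.0
    return 1.0 - len(inc & word_set(previous)) / len(inc)


S1 = "U.S. Stocks set for big sell-off"
S2 = "U.S. Stocks"

# S2 arriving after S1: fully covered, so nothing is new.
print(new_word_count(S2, S1), overlap_novelty(S2, S1))
# S1 arriving after S2: four words are new.
print(new_word_count(S1, S2), overlap_novelty(S1, S2))
```

Swapping the arguments changes the score, which is what makes these metrics asymmetric.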

Comparative study
- Performance requirements (trade-off): high recall / precision / F-score
- The distribution: high / medium / low novelty ratio
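For reference, the three performance measures named above can be computed from novel / not-novel decisions as follows. This is a generic sketch of the standard definitions; the function name, the `beta` parameter and the toy labels are assumptions, not data from the paper.

```python
from typing import List, Tuple


def precision_recall_f(
    predicted: List[bool], actual: List[bool], beta: float = 1.0
) -> Tuple[float, float, float]:
    """Precision, recall and F-score for the 'novel' class.

    predicted[i] is the system's decision for sentence i,
    actual[i] is the ground-truth label.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_score = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy labels (made up for illustration): 2 hits, 1 false alarm, 1 miss.
predicted = [True, True, False, True]
actual = [True, False, True, True]
print(precision_recall_f(predicted, actual))
```

Lowering the novelty threshold marks more sentences as novel, which tends to raise recall and lower precision. That is the trade-off the slide refers to.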

Comparative study – Performance requirements
[Figures: F-score / precision; F-score / recall]

Comparative study – Prior probability

Comparative study – Prior probability (continued)

Methodology – A new framework
Combine symmetric and asymmetric metrics.
Two problems:
- The scaling problem: the metrics must be comparable and consistent
- The combining strategy
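The transcript does not show how the paper solves either problem. One plausible reading, offered purely as an assumption, is to bring both metrics onto the [0, 1] range and combine them with a weighted sum. In the sketch below the new word count is scaled by the incoming sentence's length, the symmetric part is 1 minus Jaccard similarity, and `alpha` is a hypothetical weight; none of these choices is confirmed by the slides.

```python
def word_set(sentence: str) -> set:
    """Naive tokenizer (assumption): lowercase, split on whitespace."""
    return set(sentence.lower().split())


def mixed_novelty(incoming: str, previous: str, alpha: float = 0.5) -> float:
    """Weighted sum of a symmetric and an asymmetric novelty score.

    alpha weights the symmetric part, (1 - alpha) the asymmetric part.
    Both parts lie in [0, 1], so the result does too.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    inc, prev = word_set(incoming), word_set(previous)
    union = inc | prev
    # Symmetric part: 1 - Jaccard similarity.
    symmetric = 1.0 - len(inc & prev) / len(union) if union else 0.0
    # Asymmetric part: new word count scaled by the incoming sentence length.
    asymmetric = len(inc - prev) / len(inc) if inc else 0.0
    return alpha * symmetric + (1.0 - alpha) * asymmetric


S1 = "U.S. Stocks set for big sell-off"
S2 = "U.S. Stocks"
print(mixed_novelty(S2, S1))  # S2 arriving after S1
print(mixed_novelty(S1, S2))  # S1 arriving after S2
```

Under these assumptions, S2 arriving after S1 scores lower than with the symmetric part alone, because the asymmetric part contributes zero for a fully covered sentence.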

Experiments – Mixed metrics vs. individual metrics
[Figure labels: M3 (jacc+new), tf.isf]

Experiments – Mixed metric M3 vs. individual metrics for novelty ratio

Experiments – Mixed metric M3 vs. mixture of two symmetric metrics vs. mixture of two asymmetric metrics vs. mixture of all metrics for novelty ratio

Experiments – Weight

Conclusions
Comparative study
- Different types of novelty metrics
  - Symmetric: cosine / Jaccard
  - Asymmetric: new word count / overlap
- Observes their strengths
Introduces
- A mixture of the two types of novelty metrics
- More stable than using an individual metric

Comments
Advantage
- A comparative study
- Mixture
- Intuitive
Drawback
- …
Application
- Novelty mining