Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology A Taxonomy of Similarity Mechanisms for Case-Based Reasoning.

Presentation transcript:

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology
A Taxonomy of Similarity Mechanisms for Case-Based Reasoning
Pádraig Cunningham, TKDE, Vol. 21, 2009, pp. 1532–1543.
Presenter: Wei-Shen Tai, 2009/11/17

N.Y.U.S.T. I. M. Intelligent Database Systems Lab
Outline
Introduction
Representation
Similarity measures: direct similarity mechanisms, transformation-based measures, information-theoretic measures, emergent measures
Implications for CBR research
Conclusion
Comments

Motivation
Similarity is central to CBR.
More recently, a number of novel mechanisms have emerged that introduce interesting alternative perspectives on similarity.

Objective
Review novel similarity mechanisms.
Present a taxonomy of similarity mechanisms that places these new techniques in the context of established CBR techniques.

Feature value representation
Cases are described in terms of case attributes or instances.
Enhancement: discover word associations in a text corpus and then use these associations to add terms to the representation. For example, Bill Gates -> software, CEO, Microsoft. This allows texts to be represented by more features.

Structural representations
Hierarchical structure: feature values themselves reference nonatomic objects.
Network structure: typically a semantic network. The Semantic Web describes the relationships between things (for example, a tire is part of a car, and John Lennon was a member of the Beatles) and the properties of things (such as size, weight, age, and price).
Flow structure: shares many of the characteristics of hierarchical and network representations, for example a work or job flow.

String and sequence representations
The most straightforward representation for free text (unstructured data).
One strategy that supports similarity assessment on such text is the bag-of-words strategy from information retrieval.
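
To make the bag-of-words strategy concrete, here is a minimal sketch. The slide names only the representation; comparing the resulting term-count vectors with cosine similarity is an assumption made for illustration, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word occurrences (word order is discarded)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    # Counter returns 0 for terms missing from bag_b, so no key checks are needed.
    dot = sum(count * bag_b[term] for term, count in bag_a.items())
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("The cat sat on the mat")
doc2 = bag_of_words("The cat lay on the mat")
doc3 = bag_of_words("Stock prices fell sharply")
print(cosine_similarity(doc1, doc2))  # high: most words are shared
print(cosine_similarity(doc1, doc3))  # 0.0: no words in common
```

Because word order is thrown away, "cat sat on the mat" and "mat sat on the cat" would be treated as identical, which is the main trade-off of this representation.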

Direct similarity mechanisms
Similarity and distance metrics: k-NN.
Set-theoretic measures: Jaccard index, Dice similarity.
Kullback-Leibler divergence and the χ² statistic: compare two images described as histograms.
Symbolic attributes in taxonomies: the case representation is organized by feature values into a taxonomy of is-a relationships.
[Figure: example taxonomy with nodes root, tea, green tea, black tea, carbonated, Pepsi, Cola]
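
The set-theoretic and histogram measures listed above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, the handling of empty sets is a chosen convention, and the χ² form shown is one common symmetric variant among several.

```python
import math


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets; defined here as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dice(a, b):
    """2|A ∩ B| / (|A| + |B|) for two sets; defined here as 1.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def normalize(hist):
    """Scale a histogram of non-negative counts so its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def kl_divergence(p, q):
    """KL(p || q) in nats; asymmetric, and requires q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi_square(p, q):
    """A symmetric chi-square statistic between two histograms, skipping empty bins."""
    return sum((pi - qi) ** 2 / (pi + qi) for pi, qi in zip(p, q) if pi + qi > 0)


print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
p = normalize([10, 20, 30, 40])
q = normalize([40, 30, 20, 10])
print(kl_divergence(p, q), chi_square(p, q))
```

Note that Jaccard and Dice are similarities (larger means more alike), while the KL divergence and the χ² statistic are dissimilarities (0 means identical histograms).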

Transformation-based measures I
Edit distance: the number of edit operations needed to transform one string into another. From "cat" to "rat" is 1; from "cats" to "cat" is 1.
Alignment measures for biological sequences: a variety of sequence alignment measures are used in biology (e.g., for DNA).
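
The edit distance examples above can be checked with a short sketch. It assumes the Levenshtein variant, where insertion, deletion, and substitution each cost 1; the slide does not say which operation costs are used, and the function name is hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


print(edit_distance("cat", "rat"))   # 1
print(edit_distance("cats", "cat"))  # 1
```

Keeping only two rows of the dynamic-programming table keeps memory proportional to the length of the target string.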

Transformation-based measures II
Earth mover distance: a transformation-based distance for image data.
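
As a rough illustration of the idea behind the earth mover distance, the sketch below handles only a simplified special case: two one-dimensional histograms with equal total mass, where moving one unit of mass between bins i and j is assumed to cost |i - j|. Image applications generally need a more general transportation solver; the function name is hypothetical.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1-D histograms with equal total mass.

    Assumes the cost of moving one unit of mass between bins i and j is |i - j|.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(hist_a) - sum(hist_b)) > 1e-9:
        raise ValueError("histograms must have the same total mass")
    # The mass that must cross each bin boundary equals the running surplus.
    surplus = accumulate(a - b for a, b in zip(hist_a, hist_b))
    return sum(abs(s) for s in surplus)


print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2: one unit moved across two bins
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1
```

Unlike a bin-by-bin comparison, this distance grows with how far the mass has to travel, so shifting a histogram peak by two bins costs more than shifting it by one.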

Transformation-based measures III
Similarity for networks and graphs: the structure mapping engine (SME) identifies the appropriate mapping between the two domains.

Information-theoretic measures
These measures work directly on the raw case representation.
Compression-based similarity for text: if two documents are very similar, the compressed size of both together will not be much greater than the compressed size of one.
Information-based similarity for biological sequences: specialized algorithms are required to compress them.
Similarity in a taxonomy: distinguishes the weight of the is-a relationships between features. The information content of a node in the taxonomy can be quantified as the negative log likelihood. Similarity is given by the common parent node with the highest value.
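
The compression-based idea can be sketched with a general-purpose compressor. The slide gives no formula, so the normalized compression distance used here, and the choice of zlib as the compressor, are assumptions for illustration; the function names and sample documents are hypothetical.

```python
import zlib


def compressed_size(data):
    """Number of bytes after zlib compression at the highest compression level."""
    return len(zlib.compress(data, 9))


def ncd(x, y):
    """Normalized compression distance: small for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


doc_a = b"Case-based reasoning retrieves similar past cases. " * 20
doc_b = b"Case-based reasoning retrieves similar prior cases. " * 20
doc_c = bytes(range(256)) * 4  # unrelated byte pattern

print(ncd(doc_a, doc_b))  # expected to be comparatively small
print(ncd(doc_a, doc_c))  # expected to be comparatively large
```

No feature extraction is needed, which is why this family of measures works on the raw case representation; the price is that results depend on the compressor and can be unreliable for very short inputs.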

Emergent measures I
Random forests: an ensemble of decision trees.
For each ensemble member, build a decision tree on a resampled set of the training cases, using only a small random subset of the features (m << M).
Track the frequency with which cases are located at the same leaf node.
Two cases that share a leaf node more frequently are considered more similar.
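
The leaf-tracking step can be sketched independently of how the trees are trained. The input format below is a hypothetical one chosen for illustration: `leaf_ids[t][i]` is assumed to hold the identifier of the leaf that case i reaches in tree t of an already-built ensemble.

```python
def leaf_cooccurrence(leaf_ids):
    """Fraction of trees in which each pair of cases lands in the same leaf.

    leaf_ids[t][i] is the leaf reached by case i in tree t (hypothetical input,
    e.g. collected from an already-trained ensemble).
    """
    n_trees = len(leaf_ids)
    n_cases = len(leaf_ids[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for tree in leaf_ids:
        for i in range(n_cases):
            for j in range(n_cases):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    # Normalize the co-occurrence counts to a similarity in [0, 1].
    return [[c / n_trees for c in row] for row in counts]


# Three trees, three cases: cases 0 and 1 share a leaf in two of the three trees.
leaves = [
    [0, 0, 1],
    [2, 2, 2],
    [5, 6, 6],
]
similarity = leaf_cooccurrence(leaves)
print(similarity[0][1])
print(similarity[0][2])
```

The similarity is called emergent because it is not defined directly on the features; it falls out of how the trained ensemble partitions the cases.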

Emergent measures II
Cluster kernels: used in semi-supervised learning, where only some of the available data are labeled.
This relies on the assumption that class labels do not change in regions of high density.
Cluster kernels allow the unlabeled data to influence similarity.
[Equation not preserved in the transcript] Here K_orig(x_i, x_j) is a basic neighborhood kernel and K_bag(x_i, x_j) is a kernel derived from repeated clustering of all the data.

Emergent measures III
Web-based kernel: measures the similarity of text snippets using the documents returned in a Web search.

Implications for CBR research
Vocabulary knowledge container: in some circumstances (e.g., information-theoretic measures), the role of the similarity knowledge container is increased.
Speed-up techniques: because the new methodologies are typically computationally intensive, strategies for speeding up case retrieval become more important.

Conclusions
Similarity measurement taxonomy: organizes the broad range of strategies for similarity assessment in CBR into a coherent taxonomy.
Improved effectiveness of CBR: alternative metrics can simply offer better accuracy because they embody specific knowledge about the data.

Comments
Advantage: this paper introduces and discusses alternative metrics for similarity assessment in CBR.
Drawback: (none given).
Application: similarity measurement.