Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Automatic Document Indexing in Large Medical Collections.

Presentation transcript:

Slide 1: Automatic Document Indexing in Large Medical Collections
Advisor: Dr. Hsu
Presenter: Shu-Ya Li
Authors: Angelos Hliaoutakis, Kalliopi Zervanou, Euripides G.M. Petrakis, Evangelos E. Milios
Venue: HIKM

Slide 2: Outline
- Motivation
- Objective
- Current Approach: MMTx
- Method: AMTEx
  - C/NC-value method
  - Use of the MeSH Thesaurus as a lexical resource
- Experiments
- Conclusion
- Personal Opinions

Slide 3: Motivation
- MMTx, the approach of the U.S. National Library of Medicine (NLM), maps biomedical documents to UMLS term concepts.
- The limitations of MMTx in term extraction:
  1) term over-generation
  2) term concept diffusion
  3) unrelated terms added to the final candidate list
- MMTx focuses on UMLS rather than MeSH, but MEDLINE indexing is based on MeSH.
- Goal: improve the efficiency of automatic indexing of medical documents.

Slide 4: Objective
We propose a new method, AMTEx, which aims at:
1) improving the efficiency of automatic term extraction by using the C/NC-value method;
2) indexing and retrieving MEDLINE documents based on the extraction and mapping of document terms to the MeSH Thesaurus.

Slide 5: Current Approach: MMTx
MMTx maps arbitrary text to UMLS Metathesaurus concepts in four stages:
- Parsing (syntactic analysis with a linguistic filter)
- Variant Generation (uses the SPECIALIST Lexicon)
- Candidate Retrieval (mapping to Metathesaurus concepts)
- Candidate Evaluation (criteria: centrality, variation, coverage, cohesiveness)
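The slide names the four Candidate Evaluation criteria but not how MMTx combines them. The sketch below is a minimal illustration, assuming each criterion has already been normalised to [0, 1] and that the final score is a weighted average rescaled to the 0 to 1000 range seen in the MMTx example scores. The names `evaluate`, `rank` and `EQUAL_WEIGHTS` are hypothetical, and the equal weights are placeholders, not MMTx's actual weighting.

```python
CRITERIA = ("centrality", "variation", "coverage", "cohesiveness")
# Placeholder weights (assumption): MMTx's real weighting is not given on the slide.
EQUAL_WEIGHTS = {c: 1.0 for c in CRITERIA}


def evaluate(scores: dict[str, float], weights: dict[str, float] = EQUAL_WEIGHTS) -> int:
    """Weighted average of the four criteria (each in [0, 1]), rescaled to 0..1000."""
    for c in CRITERIA:
        if not 0.0 <= scores[c] <= 1.0:
            raise ValueError(f"{c} must be in [0, 1], got {scores[c]}")
    total = sum(weights[c] * scores[c] for c in CRITERIA) / sum(weights[c] for c in CRITERIA)
    return round(1000 * total)


def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, int]]:
    """Sort candidate concepts by descending evaluation score."""
    scored = [(name, evaluate(s)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A candidate that matches the phrase perfectly on all four criteria scores 1000. Weaker matches, such as a partial variant, score lower and fall down the ranked list.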

Slide 6: MMTx Example
- Parsing
  - Shallow syntactic analysis of the input text
  - Linguistic filtering isolates noun phrases; e.g. the term "ocular complications" is analysed as: (the analysis shown on the original slide is not preserved)
- Variant Generation
  - e.g. "obstructive sleep apnea" has the variants: obstructive sleep apnea, sleep apnea, sleep, apnea, osa, …
- Candidate Retrieval
  - Candidate Metathesaurus concepts for the variant "osa": osa [osa antigen], osa [osa gene product], osa [osa protein], osa [obstructive sleep apnea]
- Candidate Evaluation (candidate: score)
  - Obstructive Sleep Apnea: 1000
  - Sleep Apnea: 901
  - Apnea: 827
  - …
  - Sleeping: 793
  - Sleepy: 755
- The limitations of MMTx in term extraction:
  1. term over-generation
  2. term concept diffusion
  3. unrelated terms added to the final candidate list
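To show how variant generation feeds candidate retrieval, and how a short variant such as "osa" pulls in unrelated concepts (one source of the term over-generation listed above), here is a toy sketch. It is not MMTx's generator: MMTx draws on the SPECIALIST Lexicon, whereas this version only produces contiguous sub-phrases plus an initialism. `toy_variants`, `retrieve` and `TOY_INDEX` are hypothetical. The index is seeded with the "osa" concepts from the slide and is not real Metathesaurus data.

```python
def toy_variants(phrase: str) -> list[str]:
    """Contiguous sub-phrases (longest first) plus an initialism; a toy, not MMTx's generator."""
    words = phrase.lower().split()
    variants: list[str] = []
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            v = " ".join(words[start:start + size])
            if v not in variants:
                variants.append(v)
    if len(words) > 1:
        variants.append("".join(w[0] for w in words))  # e.g. "osa"
    return variants


# Hypothetical concept index; the "osa" entry mirrors the example on the slide.
TOY_INDEX = {
    "obstructive sleep apnea": ["obstructive sleep apnea"],
    "sleep apnea": ["sleep apnea"],
    "apnea": ["apnea"],
    "osa": ["osa antigen", "osa gene product", "osa protein", "obstructive sleep apnea"],
}


def retrieve(variants: list[str], index: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each variant found in the index to its candidate concepts."""
    return {v: index[v] for v in variants if v in index}
```

For example, `retrieve(toy_variants("obstructive sleep apnea"), TOY_INDEX)["osa"]` returns four candidates, and only one of them is the intended concept.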

Slide 7: Method: AMTEx
Input: document d, MeSH ontology
Resource: MeSH Thesaurus
Processing steps:
1-2. C/NC-value multi-word term extraction & term ranking
3. Term mapping
4. Single-word term extraction
5. Term variant generation
6. Term expansion
Output: MeSH term lists

Slide 8: Steps 1 & 2: C/NC-value Multi-word Term Extraction & Ranking
- Part-of-speech tagging
- Linguistic filtering
- Term extraction: C-value
- Term ranking: NC-value
- Keep terms up to threshold T1
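The slide names the two scores without giving their formulas. The sketch below follows the standard C/NC-value formulation of Frantzi, Ananiadou and Mima: C-value(a) = log2|a| · f(a), minus the mean frequency of the longer candidates that contain a when a is nested. NC-value(a) = 0.8 · C-value(a) + 0.2 · Σ f_a(b) · t(b)/n over the context words b of a. It is a minimal sketch, not AMTEx's exact code. It assumes POS tagging and linguistic filtering have already produced the candidate frequencies, and that the caller supplies each term's context-word counts. Reading T1 as "keep the top T1 terms" is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter

Term = tuple[str, ...]  # a candidate term as a sequence of words


def _is_nested(shorter: Term, longer: Term) -> bool:
    """True if `shorter` occurs as a contiguous word sequence inside a longer term."""
    n = len(shorter)
    return len(longer) > n and any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_values(freq: dict[Term, int]) -> dict[Term, float]:
    """C-value for multi-word candidates (each term must have at least 2 words)."""
    scores = {}
    for a, f_a in freq.items():
        containers = [b for b in freq if _is_nested(a, b)]  # T_a
        if containers:
            # Discount the average frequency of longer candidates containing `a`.
            f_a = f_a - sum(freq[b] for b in containers) / len(containers)
        scores[a] = math.log2(len(a)) * f_a
    return scores


def nc_values(c: dict[Term, float], contexts: dict[Term, Counter]) -> dict[Term, float]:
    """NC-value = 0.8 * C-value + 0.2 * sum of f_a(b) * t(b) / n over context words b."""
    n = len(c)
    # t(b): number of candidate terms that context word b appears with.
    t = Counter(b for a in c for b in contexts.get(a, Counter()))
    return {
        a: 0.8 * c_a + 0.2 * sum(f * t[b] / n for b, f in contexts.get(a, Counter()).items())
        for a, c_a in c.items()
    }


def keep_top(scores: dict[Term, float], t1: int) -> list[Term]:
    """Keep the t1 highest-scoring terms (reading T1 as a count is an assumption)."""
    return sorted(scores, key=scores.get, reverse=True)[:t1]
```

Because "contact lens" occurs inside "soft contact lens", its C-value is reduced by the frequency of the longer term. This is how the C-value avoids crediting a nested term for occurrences that really belong to a longer one.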

Slide 9: Step 3: Term Mapping
- Candidate terms are mapped to terms of the MeSH Thesaurus (simple string matching).
- Only candidate terms matching MeSH are retained.
- Multi-word candidates not matching MeSH may contain (shorter) MeSH terms.
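A minimal sketch of the mapping step, assuming "simple string matching" means an exact match after case-folding and whitespace normalisation (the normalisation is an assumption). Unmatched candidates are returned separately because they may still contain shorter MeSH terms. `normalize` and `map_to_mesh` are hypothetical names, and the MeSH list in the test is toy data.

```python
def normalize(term: str) -> str:
    """Case-fold and collapse whitespace (this normalisation is an assumption)."""
    return " ".join(term.lower().split())


def map_to_mesh(candidates: list[str], mesh_terms: list[str]) -> tuple[list[str], list[str]]:
    """Split ranked candidates into (MeSH matches, unmatched candidates)."""
    lookup = {normalize(m): m for m in mesh_terms}
    matched: list[str] = []
    unmatched: list[str] = []
    for cand in candidates:
        hit = lookup.get(normalize(cand))
        if hit is None:
            unmatched.append(cand)  # may still contain shorter MeSH terms
        elif hit not in matched:
            matched.append(hit)  # keep the canonical MeSH spelling
    return matched, unmatched
```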

Slide 10: Step 4: Single-word Term Extraction
For multi-word terms not matching MeSH:
- the multi-word terms are split into single-word terms;
- the single-word terms are validated against MeSH;
- the matched MeSH terms are added to the term list.
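The splitting step can be sketched as follows, assuming validation is a case-insensitive lookup against the single-word MeSH terms. `single_word_terms` is a hypothetical name and the vocabulary in the test is toy data.

```python
def single_word_terms(unmatched: list[str], mesh_terms: list[str], term_list: list[str]) -> list[str]:
    """Split unmatched multi-word candidates and add any single words found in MeSH."""
    # Only one-word MeSH terms can match a single word (case-insensitive lookup: assumption).
    lookup = {m.lower(): m for m in mesh_terms if len(m.split()) == 1}
    result = list(term_list)
    for phrase in unmatched:
        for word in phrase.lower().split():
            hit = lookup.get(word)
            if hit is not None and hit not in result:
                result.append(hit)
    return result
```

For example, the unmatched candidate "chronic sleep disorder" contributes only "Sleep" when the vocabulary lacks "chronic" and "disorder".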

Slide 11: Step 5: Term Variant Generation
- Inflectional variants of the extracted terms are identified during term extraction (C/NC-value).
- Stemmed term forms are also available in MeSH and are added to the list of terms.
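One way to picture this step is to group vocabulary terms by a stem and add every term that shares a stem with an extracted term. In the sketch, `naive_stem` is a deliberately crude, hypothetical plural-stripper standing in for a real stemmer or lemmatiser. It is not the method AMTEx uses. The vocabulary in the test is toy data.

```python
def naive_stem(phrase: str) -> str:
    """Crude plural stripping per word (toy stand-in for a real stemmer)."""
    words = []
    for w in phrase.lower().split():
        if w.endswith("ies") and len(w) > 4:
            w = w[:-3] + "y"          # arteries -> artery
        elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
            w = w[:-1]                # kidneys -> kidney
        words.append(w)
    return " ".join(words)


def add_variants(term_list: list[str], mesh_terms: list[str]) -> list[str]:
    """Add vocabulary terms that share a (naive) stem with an extracted term."""
    by_stem: dict[str, list[str]] = {}
    for m in mesh_terms:
        by_stem.setdefault(naive_stem(m), []).append(m)
    result = list(term_list)
    for t in term_list:
        for variant in by_stem.get(naive_stem(t), []):
            if variant not in result:
                result.append(variant)
    return result
```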

Slide 12: Step 6: Term Expansion
- Each term in the list is expanded with neighbouring terms in MeSH.
- Depending on T2, the expansion may include terms more than one level higher or lower than the original term.
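The slide does not define T2. The sketch below treats it as the maximum number of hierarchy levels to move up (towards broader terms) or down (towards narrower terms). This is an assumption: the original method may instead use, for example, a similarity threshold. Siblings are not included, because the walk goes strictly up or strictly down. The parent map allows several parents per term, since a MeSH descriptor can sit at more than one place in the tree. `expand` is a hypothetical name and the hierarchy in the test is toy data, not real MeSH tree structure.

```python
from collections import defaultdict


def expand(term: str, parents: dict[str, list[str]], t2: int) -> list[str]:
    """Ancestors and descendants of `term` up to t2 levels away (T2 as a hop limit: assumption)."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    related: list[str] = []
    for graph in (parents, children):  # walk upwards first, then downwards
        frontier = [term]
        for _ in range(t2):
            frontier = [n for node in frontier for n in graph.get(node, [])]
            for n in frontier:
                if n != term and n not in related:
                    related.append(n)
    return related
```

With t2 = 1 only the direct parents and children are added. Raising it to 2 also reaches grandparents and grandchildren.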

Slide 13: Experiments
- Evaluation measures: precision and recall
- Dataset: 61 full-text MEDLINE documents from NCBI's PubMed Central (PMC) database
  - Each MEDLINE document is paired with its MeSH index terms, manually assigned by experts
- Ground truth: the set of MeSH index terms of each document
- Benchmark: MMTx compared against AMTEx
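For reference, the two measures compare the terms a method extracts for a document with that document's expert-assigned MeSH terms. Precision is the fraction of extracted terms that are correct, and recall is the fraction of expert terms that were found. The sketch assumes exact set comparison and a simple per-document (macro) average. How the original experiment aggregated results over the 61 documents is not stated on the slide. Function names are hypothetical.

```python
def precision_recall(extracted: list[str], gold: list[str]) -> tuple[float, float]:
    """Precision and recall of extracted index terms against the expert-assigned terms."""
    extracted_set, gold_set = set(extracted), set(gold)
    tp = len(extracted_set & gold_set)  # correctly extracted terms
    precision = tp / len(extracted_set) if extracted_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    return precision, recall


def macro_average(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Average per-document (precision, recall) pairs over a collection (assumed aggregation)."""
    ps, rs = zip(*pairs)
    return sum(ps) / len(ps), sum(rs) / len(rs)
```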

Slide 14: Experiments (results figure not preserved in this transcript)

Slide 15: Conclusion
- AMTEx is designed for indexing and retrieval of MEDLINE documents.
- It focuses on multi-word term extraction using valid linguistic and statistical criteria.
- Like human indexing, it is based on MeSH.
- It selectively expands to term variants and synonyms.
- It outperforms the current benchmark method, MMTx, reaching better precision and recall.

Slide 16: Personal Opinions
- Advantage
- Drawback
  - …
- Application
  - …