Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.

Presentation transcript:

Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics Tzu Chi University Taiwan, R.O.C.

Outline
Research background
Problem definition
The proposed approach: PDFI
Empirical evaluation
Conclusion

Research Background

Diagnosis Knowledge Map: Fundamental of Diagnosis Support & Education
[Figure: diagnosis knowledge map with three node groups: Risk Factors (r1 to r5), Diseases (d1 to d3), and Symptoms & Signs, including examinations & tests (m1 to m5)]

Basic Properties
Diagnosis factors of a disease
– Risk factors, symptoms, and signs of the disease
A diagnosis knowledge map consists of many-to-many relationships between diseases and their diagnosis factors
– Factors may differ in their capability to discriminate diseases, and may evolve
Construction of a diagnosis knowledge map is essential but costly

Problem Definition

Goal
Explore how the identification of diagnosis factors may be supported by text mining
Develop a technique, PDFI (Proximity-based Diagnosis Factors Identifier), that
– Employs term proximity to improve diagnosis factor identifiers
– Serves as a supplement to existing identifiers

Related Work
Extract relationships by parsing or template matching
– Weakness: relationships between diseases and diagnosis factors are seldom expressed in individual sentences
Select key features by text classification
– Weakness: term proximity is NOT considered
Proximity-based retrieval
– Weakness: NOT applicable to diagnosis factor identification

The Proposed Approach: PDFI

Basic Observation
In a medical text discussing the diagnosis of a disease, the diagnosis factors often appear near one another in the text

The Approach
For a candidate diagnosis factor u, PDFI
– Measures how other candidate diagnosis factors appear in the areas near u in the medical texts, and then
– Encodes this term proximity information into the discriminating capability of u as measured by the underlying discriminative factor identifier

System Overview
[Figure: processing flow]
Texts about individual diseases
→ Underlying identifier: measure discriminating strengths of candidate factors
→ Discriminating strengths of candidate factors
→ PDFI: encode term proximity contexts to revise the strengths of candidate factors
→ Ranked factors for individual diseases

Scoring for a Candidate Factor
For a candidate diagnosis factor u for disease c
FinalScore(u, c) = ProximityScore(u, c) + IdentifierScore(u, c)
MinDist_{u,c} = minimum distance between u and n in the texts about disease c; α is set to 30
Rank(u, c) = rank of u w.r.t. c by the underlying identifier
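The minimum-distance quantity and the additive final score can be illustrated with a minimal Python sketch. It assumes distance is measured in token positions and that n is another candidate term; the slide specifies neither. The formulas for ProximityScore and IdentifierScore are not given in the transcript, so the sketch takes both scores as inputs rather than guessing them. The function names `min_distance` and `final_score` are hypothetical.

```python
from typing import List, Optional


def min_distance(pos_u: List[int], pos_n: List[int]) -> Optional[int]:
    """Smallest |i - j| over occurrences i of term u and j of term n.

    Positions are assumed to be token offsets within one text (an assumption).
    Returns None if either term does not occur.
    """
    if not pos_u or not pos_n:
        return None
    a, b = sorted(pos_u), sorted(pos_n)
    i = j = 0
    best = abs(a[0] - b[0])
    # Two-pointer sweep: advance whichever list currently holds the smaller position.
    while i < len(a) and j < len(b):
        best = min(best, abs(a[i] - b[j]))
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return best


def final_score(proximity_score: float, identifier_score: float) -> float:
    """FinalScore(u, c) = ProximityScore(u, c) + IdentifierScore(u, c)."""
    return proximity_score + identifier_score
```

For example, with u at token positions 2 and 40 and n at positions 10 and 37, the closest pair is (40, 37), so the minimum distance is 3.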

Empirical Evaluation

Experimental Data
Medical dictionary: from MeSH
– Each MeSH term and its retrieval equivalence terms, resulting in a dictionary of 164,354 medical terms
Medical texts for diseases: from MedlinePlus
– All the diseases for which MedlinePlus tags diagnosis/symptoms texts, resulting in a text database of 420 medical texts for 131 diseases
– Each medical text was manually read and cross-checked to extract target diagnosis factor terms, resulting in 2,797 target terms

Underlying Diagnosis Factor Identifier
The chi-square feature scoring technique
– Produces a discriminating strength for each feature (candidate factor) with respect to each disease
– For each disease, all positively correlated features are sent to PDFI for re-ranking
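As a rough illustration of chi-square feature scoring, the sketch below uses the standard 2x2 contingency formulation common in text classification. The slides do not state which variant was used, so the counting scheme (document counts a, b, c, d) and the function names are assumptions.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square strength of a term with respect to a disease.

    a: texts about the disease that contain the term
    b: texts about other diseases that contain the term
    c: texts about the disease that lack the term
    d: texts about other diseases that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or the disease has no contrast
    return n * (a * d - b * c) ** 2 / denom


def is_positively_correlated(a: int, b: int, c: int, d: int) -> bool:
    """True when the term occurs more often with the disease than expected by chance."""
    return a * d > b * c
```

Under this formulation, only terms for which `is_positively_correlated` holds would be passed on for re-ranking, with `chi_square` serving as the discriminating strength.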

Evaluation Criteria
Mean average precision (MAP)
– Measures how highly target diagnosis factors are ranked for the medical expert to check and validate
– Example
  Targets ranked 1st, 3rd, 5th → AP = (1/1 + 2/3 + 3/5)/3 = 0.76
  Targets ranked 1st, 2nd, 3rd → AP = (1/1 + 2/2 + 3/3)/3 = 1.00
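The AP arithmetic in the example can be checked with a short Python sketch. It assumes every target factor appears somewhere in the ranked list, so AP is averaged over the targets supplied; the function names are illustrative.

```python
from typing import Iterable, List


def average_precision(target_ranks: Iterable[int]) -> float:
    """AP for one disease, given the 1-based ranks at which its targets appear."""
    ranks = sorted(target_ranks)
    if not ranks:
        return 0.0
    # The i-th target found (1-based) at rank r contributes precision i / r.
    return sum(i / r for i, r in enumerate(ranks, start=1)) / len(ranks)


def mean_average_precision(per_disease_ranks: List[List[int]]) -> float:
    """MAP: the mean of the per-disease AP values."""
    aps = [average_precision(ranks) for ranks in per_disease_ranks]
    return sum(aps) / len(aps) if aps else 0.0


print(round(average_precision([1, 3, 5]), 2))  # 0.76
print(round(average_precision([1, 2, 3]), 2))  # 1.0
```

Averaging the two example diseases gives a MAP of about 0.88.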

Results MAP: chi-square: ; chi-square+PDFI:

An Example
Parasitic diseases
– AP: chi-square: 0.3003; chi-square+PDFI:
PDFI promotes the ranks of several target diagnosis factors (e.g., parasite, antigen, diarrhea, and MRI scan)
– They appear at places where more of the other candidate terms occur nearby
PDFI lowers the ranks of a few target diagnosis factors (e.g., serology)
– Serology appears at only one place, where the author used many words to explain serology

Conclusion

Diagnosis factors that discriminate diseases are the fundamental basis for
– Diagnosis decision support, diagnosis skill training, medical research, & health education
Text mining is a good way to identify and maintain the large number of diagnosis factors for diseases
By encoding term proximity information, PDFI may be a good supplement to existing techniques for identifying the diagnosis factors of individual diseases