Enhancing Text Classifiers to Identify Disease Aspect Information
Rey-Long Liu
Dept. of Medical Informatics, Tzu Chi University, Taiwan


Outline
– Research background
– Problem definition
– The proposed approach: IDAI
– Empirical evaluation
– Conclusion

Disease Aspect Classification

Research Background

Disease Aspect Information (DAI)
An example from MedlinePlus: several passages cover three aspects of kidney cancer (treatment, symptoms and signs, and etiology), alongside several passages not related to any aspect.
You have two kidneys... Kidney cancer forms in the …
Risk factors include smoking, having certain genetic conditions and ….
Often, kidney cancer doesn't have early symptoms. However, see your health care provider if you notice
– Blood in your urine
– A lump in your abdomen
– …
– Pain in your side
– …
Treatment depends on your age, …. It might include surgery, radiation, chemotherapy …

Disease Knowledge Map: An Application of DAI

Identification of DAI
[Figure: flow diagram. Labels: Healthcare professionals & consumers; Disease Info.; Query & Aspect; Medical texts for specific diseases; Disease Aspects Classifier; Disease aspect information (symptoms, diagnosis, treatment, etiology, prevention); Healthcare decision support system; Disease Info.; Cross-disease query; Medical information provider; Verified Info.; Aspect Info.]

Problem Definition

Goals
Modeling the identification of DAI as a text classification problem
– Disease aspects are predefined categories of interest, not brief descriptions of information needs
Developing a technique to enhance various kinds of text classifiers
– Given a medical text, the enhanced classifier should be better at identifying the disease aspects that the text discusses

Related Work
Text classification (TC)
– Weakness: multi-aspect information in a text introduces noise for text classifiers
Segment extraction for topic detection
– Weakness: designed for specific descriptions (not for categories)
Passage extraction for TC
– Weakness: determining the location and length of the passages relevant to a specific category becomes another TC problem

The Proposed Approach: IDAI

IDAI: Revising Term Frequency (TF) to Improve Classifiers
[Figure: system diagram. Labels: Categories (aspects); Classifier Development; Training; Testing; Underlying Text Classifier; IDAI; Classification; Training Texts; A text (d); Assessing Term Frequencies (TF); TF of terms w.r.t. each category; Identifying Term-Category Correlation type]
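The diagram names a step for identifying the term-category correlation type but does not say how it is computed. One common way to do this in text classification is a signed chi-square test on a term/category contingency table; the sketch below assumes that approach. The function name, the four count arguments, and the threshold are all hypothetical and are not taken from the slides.

```python
def correlation_type(a, b, c, d, threshold=3.84):
    """Classify a term's correlation to a category from a 2x2 contingency table.

    a: training texts in the category that contain the term
    b: training texts outside the category that contain the term
    c: training texts in the category that lack the term
    d: training texts outside the category that lack the term
    threshold: assumed cut-off; 3.84 is the chi-square critical value
               for p = 0.05 with one degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so no correlation can be measured.
        return "uncorrelated"
    chi2 = n * (a * d - b * c) ** 2 / denom
    if chi2 < threshold:
        return "uncorrelated"
    # The sign of (a*d - b*c) tells whether the term is over- or
    # under-represented in the category.
    return "positive" if a * d > b * c else "negative"
```

Under this assumption, terms labeled "positive" for a category would form its set P and terms labeled "negative" its set Q.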

Two Strategies for TF Revision
Underlying classifier G
– Accepting relevant texts
– Rejecting irrelevant texts
Enhanced classifier G+IDAI: feature sets
– P: Set of positively correlated features (used for accepting relevant texts)
– Q: Set of negatively correlated features (used for rejecting irrelevant texts)
Enhanced classifier G+IDAI: TF revision by IDAI
– (Strategy I) TF of a feature f is amplified (reduced) if neighbors of f have the same (different) correlation type to the category
– (Strategy II) TF of a feature f in Q is reduced if f appears in a text segment that mainly mentions features in P

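\[
\mathrm{RevisedTF}(t,d,c) =
\begin{cases}
\mathrm{WindowTF}(t,d,c), & \text{if } t \text{ is positively correlated to } c \text{ (Strategy I)} \\
\max_{c' \neq c} \{\mathrm{WindowTF}(t,d,c')\} - \mathrm{InconsistencyTF}(t,d,c), & \text{if } t \text{ is negatively correlated to } c \text{ (Strategy II)}
\end{cases}
\]
\[
\mathrm{WindowTF}(t,d,c) = \sum_{k} \left(0.5 + P_{\mathrm{window},k}\right), \quad \text{for each occurrence of } t \text{ at position } k
\]
where \(P_{\mathrm{window},k}\) is the distance-based sum of weights of other positively correlated terms in a window at \(k\).
\[
\mathrm{InconsistencyTF}(t,d,c) = \sum_{k} P_{\mathrm{inconsistency},k}, \quad \text{for each occurrence of } t \text{ at position } k
\]
where \(P_{\mathrm{inconsistency},k} = 0.5 \times\) (how much the text segment before \(k\) is dominated by the terms positively correlated to \(c\)).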
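The revised TF defined above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the slide does not give the window size, the distance weighting, the segment length, or the dominance measure, so `half_width`, the `1 / distance` weight, `segment_len`, and the "fraction of preceding tokens in P" measure are all hypothetical choices. Returning the raw count for terms that are neither positively nor negatively correlated is likewise an assumption.

```python
def window_tf(term, tokens, pos_terms, half_width=3):
    """WindowTF: sum over occurrences k of (0.5 + P_window,k)."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        p_window = 0.0
        lo = max(0, k - half_width)
        hi = min(len(tokens), k + half_width + 1)
        for j in range(lo, hi):
            # Only *other* positively correlated terms contribute;
            # the 1/distance weight is an assumed weighting scheme.
            if j != k and tokens[j] != term and tokens[j] in pos_terms:
                p_window += 1.0 / abs(j - k)
        total += 0.5 + p_window
    return total


def inconsistency_tf(term, tokens, pos_terms, segment_len=5):
    """InconsistencyTF: sum over occurrences k of P_inconsistency,k."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        segment = tokens[max(0, k - segment_len):k]
        if segment:
            # Assumed dominance measure: share of the preceding segment
            # made up of terms positively correlated to the category.
            dominance = sum(1 for s in segment if s in pos_terms) / len(segment)
            total += 0.5 * dominance
    return total


def revised_tf(term, tokens, category, positive, negative):
    """positive / negative: dict mapping category -> set of terms (P and Q)."""
    if term in positive[category]:  # Strategy I
        return window_tf(term, tokens, positive[category])
    if term in negative[category]:  # Strategy II
        others = [window_tf(term, tokens, positive[c])
                  for c in positive if c != category]
        best = max(others, default=0.0)
        return best - inconsistency_tf(term, tokens, positive[category])
    # Not covered by the slide; falling back to the raw count is an assumption.
    return float(tokens.count(term))


tokens = ["surgery", "radiation", "chemotherapy", "smoking"]
positive = {"treatment": {"surgery", "radiation", "chemotherapy"},
            "etiology": {"smoking"}}
negative = {"treatment": {"smoking"},
            "etiology": {"surgery", "radiation", "chemotherapy"}}
print(revised_tf("radiation", tokens, "treatment", positive, negative))
print(revised_tf("smoking", tokens, "treatment", positive, negative))
```

In this toy text, "radiation" sits between two other treatment terms, so its TF for the treatment category is amplified, while "smoking" follows a segment dominated by treatment terms, so its TF as negative evidence against treatment is reduced.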

Empirical Evaluation

Experimental Data
Top-10 fatal diseases and top-20 cancers in Taiwan
– Total # of diseases: 28
– Source: Web sites of hospitals, healthcare associations, and the Department of Health in Taiwan
– Disease aspects (categories): 5 aspects: etiology, diagnosis, treatment, prevention, and symptom
– Splitting the texts into aspects: 4669 texts about individual aspects
– Test data: Randomly sampling 10% of the 4669 texts and merging them into test texts of 1 to 5 aspects
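The test-data construction described above (sample 10% of the single-aspect texts, then merge them into test texts covering 1 to 5 aspects) might look roughly like the sketch below. The slide does not specify the merging procedure, so several details here are assumptions: texts are merged only within the same disease, each merged text holds at most one passage per aspect, and the number of aspects per merged text is drawn uniformly. The function name and the tuple layout are hypothetical.

```python
import random


def build_test_texts(texts, sample_rate=0.10, max_aspects=5, seed=0):
    """texts: list of (disease, aspect, body) tuples, one aspect per text."""
    rng = random.Random(seed)
    sample = rng.sample(texts, round(len(texts) * sample_rate))

    # Assumption: passages are only merged with others on the same disease.
    by_disease = {}
    for disease, aspect, body in sample:
        by_disease.setdefault(disease, []).append((aspect, body))

    merged = []
    for disease, items in by_disease.items():
        rng.shuffle(items)
        while items:
            target = rng.randint(1, max_aspects)  # aspects wanted in this text
            chosen, seen, rest = [], set(), []
            for aspect, body in items:
                if len(chosen) < target and aspect not in seen:
                    chosen.append((aspect, body))
                    seen.add(aspect)
                else:
                    rest.append((aspect, body))
            items = rest
            merged.append({
                "disease": disease,
                "aspects": sorted(seen),  # gold labels of the merged text
                "text": " ".join(body for _, body in chosen),
            })
    return sample, merged
```

Each merged text carries the set of aspects it contains, so a classifier's per-aspect decisions on the merged text can be scored against those labels.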

Underlying Classifiers & Experimental Baselines
Underlying classifier
– The Support Vector Machine (SVM) classifier
Baseline enhancer
– CTFA (Liu, 2010), which employs Strategy I for better TC
– CTFA does not consider Strategy II

Results

Conclusion

Disease knowledge map (Dmap)
– Supporting evidence-based medicine, health education, and healthcare decision support
A key step to build a Dmap: Automatic identification of disease aspect information (DAI)
Identification of DAI as a text classification problem
Term proximity as key information to enhance existing classifiers to classify DAI