Published by Geraldine Cole. Modified over 9 years ago.
1
Enhancing Text Classifiers to Identify Disease Aspect Information
Rey-Long Liu
Dept. of Medical Informatics, Tzu Chi University, Taiwan
2
Outline
– Research background
– Problem definition
– The proposed approach: IDAI
– Empirical evaluation
– Conclusion
Disease Aspect Classification
3
Research Background
4
Disease Aspect Information (DAI)
An example from MedlinePlus: several passages about three aspects of kidney cancer (treatment, symptom and sign, and etiology), together with several passages not related to any aspect.
You have two kidneys... Kidney cancer forms in the … Risk factors include smoking, having certain genetic conditions and …. Often, kidney cancer doesn't have early symptoms. However, see your health care provider if you notice
– Blood in your urine
– A lump in your abdomen
– …
– Pain in your side
– …
Treatment depends on your age, …. It might include surgery, radiation, chemotherapy …
5
Disease Knowledge Map: An Application of DAI
6
Identification of DAI
[Diagram labels: Healthcare professionals & consumers; Disease info. query & aspect; Medical texts for specific diseases; Disease aspects classifier; Disease aspect information (symptoms, diagnosis, treatment, etiology, prevention); Healthcare decision support system; Disease info.; Cross-disease query; Medical information provider; Verified info.; Aspect info.]
7
Problem Definition
8
Goals
Modeling the identification of DAI as a text classification problem
– Disease aspects are predefined categories of interest, not brief descriptions of information needs
Developing a technique to enhance various kinds of text classifiers
– Given medical texts, the enhanced classifier is better at identifying those texts that discuss aspects of diseases
9
Related Work
Text classification (TC)
– Weakness: multi-aspect information in a text introduces noise for text classifiers
Segment extraction for topic detection
– Weakness: designed for specific descriptions (not for categories)
Passage extraction for TC
– Weakness: determining the location and length of the passages relevant to a specific category becomes another TC problem
10
The Proposed Approach: IDAI
11
IDAI: Revising Term Frequency (TF) to Improve Classifiers
[Diagram labels: Categories (aspects); Training texts; A text (d); IDAI: identifying term-category correlation type, assessing term frequencies (TF), TF of terms w.r.t. each category; Underlying text classifier: classifier development (training), classification (testing)]
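The slide names a step that identifies the term-category correlation type but does not say how it is computed. The following is a minimal sketch under an assumed rule: a term is "positive" for a category when it appears in more of that category's training texts than independence would predict, and "negative" when it appears in fewer. The function name `term_category_correlation` and the input layout are hypothetical, not taken from the slides.

```python
from collections import Counter
from typing import Dict, List, Tuple


def term_category_correlation(
    docs: List[Tuple[List[str], str]],
) -> Dict[Tuple[str, str], str]:
    """Label each (term, category) pair as positive, negative, or none.

    docs: list of (tokens, category) training texts.
    Assumption (not from the slides): compare the observed number of
    texts of a category that contain the term with the number expected
    if term and category were independent.
    """
    n = len(docs)
    docs_with_term: Counter = Counter()
    docs_in_cat: Counter = Counter()
    joint: Counter = Counter()
    for tokens, cat in docs:
        docs_in_cat[cat] += 1
        for term in set(tokens):  # document frequency, not raw counts
            docs_with_term[term] += 1
            joint[(term, cat)] += 1

    result: Dict[Tuple[str, str], str] = {}
    for term, df in docs_with_term.items():
        for cat, n_cat in docs_in_cat.items():
            expected = df * n_cat / n
            observed = joint[(term, cat)]
            if observed > expected:
                result[(term, cat)] = "positive"
            elif observed < expected:
                result[(term, cat)] = "negative"
            else:
                result[(term, cat)] = "none"
    return result


# Toy training texts (invented for illustration only).
toy_docs = [
    (["surgery", "chemotherapy"], "treatment"),
    (["smoking", "genetic"], "etiology"),
    (["surgery", "radiation"], "treatment"),
]
toy_corr = term_category_correlation(toy_docs)
```

A statistical test such as chi-square could replace the plain observed-versus-expected comparison when a significance threshold is needed; the sign of the correlation would be decided the same way.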
12
Two Strategies for TF Revision
Underlying classifier: G. Enhanced classifier: G+IDAI.
Feature sets
– P: set of positively correlated features (for accepting relevant texts)
– Q: set of negatively correlated features (for rejecting irrelevant texts)
TF revision by IDAI
– Strategy I: the TF of a feature f is amplified (reduced) if the neighbors of f have the same (a different) correlation type to the category
– Strategy II: the TF of a feature f in Q is reduced if f appears in a text segment that mainly mentions features in P
13
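$$
\mathrm{RevisedTF}(t,d,c) =
\begin{cases}
\mathrm{WindowTF}(t,d,c), & \text{if } t \text{ is positively correlated to } c \text{ (Strategy I)} \\
\max_{c' \neq c}\{\mathrm{WindowTF}(t,d,c')\} - \mathrm{InconsistencyTF}(t,d,c), & \text{if } t \text{ is negatively correlated to } c \text{ (Strategy II)}
\end{cases}
$$

$$
\mathrm{WindowTF}(t,d,c) = \sum_{k} \left(0.5 + P_{\mathrm{window},k}\right), \quad \text{for each occurrence of } t \text{ at position } k
$$

where $P_{\mathrm{window},k}$ is the distance-based sum of the weights of other positively correlated terms in a window at $k$.

$$
\mathrm{InconsistencyTF}(t,d,c) = \sum_{k} P_{\mathrm{inconsistency},k}, \quad \text{for each occurrence of } t \text{ at position } k
$$

where $P_{\mathrm{inconsistency},k} = 0.5 \times$ (how much the text segment before $k$ is dominated by the terms positively correlated to $c$).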
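The Revised TF definitions above can be sketched in Python. This is an illustration under stated assumptions, not the author's implementation: the slides do not specify the distance weighting, the window size, or the dominance measure, so the sketch assumes a neighbor at distance d contributes 1/d, a window of 5 tokens on each side, and dominance equal to the share of the preceding 10 tokens that are positively correlated to c. It also treats every term that is not in a category's positive set as negatively correlated to that category. The names `window_tf`, `inconsistency_tf`, `revised_tf`, and `positive_by_cat` are hypothetical.

```python
from typing import Dict, List, Set


def window_tf(term: str, tokens: List[str], positive: Set[str],
              window: int = 5) -> float:
    """Sum (0.5 + P_window,k) over every position k where `term` occurs."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        lo, hi = max(0, k - window), min(len(tokens), k + window + 1)
        # Assumed weighting: another positive term at distance d adds 1/d.
        p_window = sum(
            1.0 / abs(j - k)
            for j in range(lo, hi)
            if j != k and tokens[j] != term and tokens[j] in positive
        )
        total += 0.5 + p_window
    return total


def inconsistency_tf(term: str, tokens: List[str], positive: Set[str],
                     segment: int = 10) -> float:
    """Sum P_inconsistency,k = 0.5 * dominance over occurrences of `term`."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        before = tokens[max(0, k - segment):k]
        # Assumed dominance: share of preceding tokens positive for c.
        dominance = (
            sum(t in positive for t in before) / len(before) if before else 0.0
        )
        total += 0.5 * dominance
    return total


def revised_tf(term: str, tokens: List[str], category: str,
               positive_by_cat: Dict[str, Set[str]]) -> float:
    """Strategy I for positive terms, Strategy II for all other terms."""
    positive = positive_by_cat[category]
    if term in positive:
        return window_tf(term, tokens, positive)
    best_other = max(
        (window_tf(term, tokens, pos)
         for cat, pos in positive_by_cat.items() if cat != category),
        default=0.0,
    )
    return best_other - inconsistency_tf(term, tokens, positive)


# Toy data (invented for illustration only).
toy_tokens = ["surgery", "radiation", "smoking", "chemotherapy"]
toy_positive = {
    "treatment": {"surgery", "radiation", "chemotherapy"},
    "etiology": {"smoking"},
}
tf_surgery = revised_tf("surgery", toy_tokens, "treatment", toy_positive)
tf_smoking = revised_tf("smoking", toy_tokens, "treatment", toy_positive)
```

In the toy text, "surgery" is surrounded by other treatment terms, so its TF for the treatment category is amplified above the base 0.5, while "smoking" sits in a segment dominated by treatment terms, so its TF for treatment is reduced.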
14
Empirical Evaluation
15
Experimental Data
Top-10 fatal diseases and top-20 cancers in Taiwan
– Total number of diseases: 28
– Source: Web sites of hospitals, healthcare associations, and the Department of Health in Taiwan
– Disease aspects (categories): 5 aspects (etiology, diagnosis, treatment, prevention, and symptom)
– Splitting the texts into aspects: 4669 texts about individual aspects
– Test data: randomly sampling 10% of the 4669 texts and merging them into test texts of 1 to 5 aspects
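The test-data step (sample 10% of the single-aspect texts, then merge them into test texts covering 1 to 5 aspects) can be sketched as follows. The slides do not describe the merging procedure, so this sketch makes its own assumptions: each merged text gets a random target of 1 to 5 distinct aspects, pieces are joined with a space, and a merged text may end up with fewer aspects than its target when the remaining pool runs short. The function name `build_test_texts` and the `(text, aspect)` layout are hypothetical.

```python
import random
from typing import List, Tuple

Labeled = Tuple[str, str]  # (text, aspect)


def build_test_texts(
    texts: List[Labeled], fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Labeled], List[Tuple[str, List[str]]]]:
    """Hold out a fraction of single-aspect texts and merge them.

    Returns (training texts, merged test texts); each merged test text
    carries the sorted list of aspects it covers.
    """
    rng = random.Random(seed)
    shuffled = texts[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * fraction)
    pool, training = shuffled[:n_test], shuffled[n_test:]

    merged: List[Tuple[str, List[str]]] = []
    while pool:
        target = rng.randint(1, 5)  # assumed: uniform number of aspects
        chosen: List[Labeled] = []
        seen = set()
        for item in pool[:]:  # iterate over a copy while removing
            if item[1] not in seen:
                chosen.append(item)
                seen.add(item[1])
                pool.remove(item)
            if len(chosen) == target:
                break
        merged.append((" ".join(text for text, _ in chosen), sorted(seen)))
    return training, merged


# Toy corpus (invented for illustration only).
ASPECTS = ["etiology", "diagnosis", "treatment", "prevention", "symptom"]
toy_texts = [(f"text {i}", ASPECTS[i % 5]) for i in range(50)]
toy_training, toy_merged = build_test_texts(toy_texts)
```

Because a merged test text can carry several aspects at once, evaluation on such data is naturally multi-label: the classifier should accept the text for each aspect it contains and reject it for the others.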
16
Underlying Classifiers & Experimental Baselines
Underlying classifier
– The Support Vector Machine (SVM) classifier
Baseline enhancer
– CTFA (Liu, 2010), which employs Strategy I for better TC
– CTFA does not consider Strategy II
17
Results
19
Conclusion
20
Disease knowledge map (Dmap)
– Supporting evidence-based medicine, health education, and healthcare decision support
A key step in building a Dmap: automatic identification of disease aspect information (DAI)
Identification of DAI as a text classification problem
Term proximity as key information to enhance existing classifiers to classify DAI