Published by Rosamund Gwendolyn Small. Modified over 9 years ago.
1 Text Classification for Healthcare Information Support
Rey-Long Liu (劉瑞瓏)
Dept. of Medical Informatics, Tzu Chi University, Taiwan
2 Background
Text categorization (TC) is a fundamental component of information processing
– Many TC techniques have been developed
Unfortunately, high-quality TC is often an unrealizable ideal
– Very high precision
– Very high recall
3 Background (Cont.)
An application scenario: healthcare information support
[Figure: data-flow diagram of the scenario. Actors: General Users (e.g. patients), Information Gathering Systems, Healthcare Professionals. Central components: High-Quality TC, Classified Information Base. Flow labels: Query, Classified Query, Inquiry, Classified Inquiry, Information Gathered, Classified Information, Classification Confirmation, Relevant Information, Consultancy.]
4 Outline
Interaction as an approach to high-quality TC
– Main consideration: reducing the amount of interaction
– Criteria & straightforward interaction strategies
An intelligent interaction strategy: COM (Content Overlapping Measurement)
Empirical evaluation
– Chinese cancer text classification
Conclusion
5 Interaction for High-Quality TC
Interaction with the user
– Possibly a “final” approach
– More application scenarios: information recommendation & archiving
– Definitely relevant vs. potentially relevant
Main consideration
– Reducing the number of interactions
6 Interaction for High-Quality TC (Cont.)
Evaluation criteria
– Confirmation Precision (CP): related to the cognitive load placed on users
– Confirmation Recall (CR): related to the quality of TC
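The slides name CP and CR without giving formulas. The sketch below assumes one plausible reading: CP is the share of requested confirmations that were actually needed (the classifier's own decision would have been wrong), and CR is the share of the classifier's wrong decisions that were routed to the user. The function name and the set-based inputs are hypothetical, not taken from the presentation.

```python
def confirmation_metrics(confirmed, misclassified):
    """Return (CP, CR) under the assumed definitions described above.

    confirmed:     set of document ids sent to the user for confirmation
    misclassified: set of document ids the classifier alone would get wrong
    """
    necessary = confirmed & misclassified  # confirmations that were really needed
    # Assumption: with nothing confirmed (or nothing wrong), score the ratio as 1.0.
    cp = len(necessary) / len(confirmed) if confirmed else 1.0
    cr = len(necessary) / len(misclassified) if misclassified else 1.0
    return cp, cr


# Four confirmations requested, two of them needed, one error missed.
cp, cr = confirmation_metrics({1, 2, 3, 4}, {2, 4, 5})
```

Under this reading, a low CP means users are asked about many documents the classifier already handled correctly, while a low CR means errors slip through unconfirmed.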
7 Interaction for High-Quality TC (Cont.)
Straightforward interaction strategies
Uniform Confirmation (UC): preferring CR
(A) Set two thresholds, the Rejection Threshold (RT) and the Acceptance Threshold (AT), to identify the DOA range for confirmation
[Figure: validation documents ordered from Min DOA to Max DOA (o: positive validation document; x: negative validation document), with RT and AT marking the confirmation range]
(B) Confirmation strategy:
Prob = 0 (when DOA(d, c) > AT)
Prob = 0 (when DOA(d, c) < RT)
Prob = 1.0 (when RT ≤ DOA(d, c) ≤ AT)
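The Uniform Confirmation rule can be written as a small function. This is only a sketch: the function and parameter names are illustrative, and the DOA value is assumed to be a plain number produced by some underlying classifier.

```python
def uniform_confirmation_prob(doa, rt, at):
    """Probability of asking the user to confirm, per the UC rule.

    doa: the classifier's DOA score for document d and category c
    rt:  rejection threshold
    at:  acceptance threshold (assumed rt <= at)
    """
    # Inside [rt, at] the document is always confirmed; outside, never.
    return 1.0 if rt <= doa <= at else 0.0
```

Because every document in the range is confirmed, this strategy catches more of the classifier's uncertain decisions at the price of more questions to the user.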
8 Interaction for High-Quality TC (Cont.)
Probabilistic Confirmation (PC): preferring CP
(A) Tune a threshold in the hope of optimizing F1
[Figure: validation documents ordered from Min DOA to Max DOA (o: positive validation document; x: negative validation document), with the classifier’s threshold (T) marked]
(B) Confirmation strategy:
Prob = 0 (when DOA(d, c) = Min)
Prob = 0 (when DOA(d, c) = Max)
Prob = 1.0 (when DOA(d, c) = threshold)
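The slide fixes the confirmation probability only at three points (Min, Max, and the threshold). The sketch below assumes linear interpolation between those points, which the slides do not state, and assumes Min < threshold < Max. All names are illustrative.

```python
import random


def probabilistic_confirmation_prob(doa, threshold, min_doa, max_doa):
    """Confirmation probability: 0 at min_doa and max_doa, 1.0 at threshold.

    Linear interpolation between the three anchor points is an assumption.
    """
    if doa <= min_doa or doa >= max_doa:
        return 0.0
    if doa <= threshold:
        return (doa - min_doa) / (threshold - min_doa)
    return (max_doa - doa) / (max_doa - threshold)


def should_confirm(doa, threshold, min_doa, max_doa, rng=random):
    """Draw a confirm / do-not-confirm decision with the probability above."""
    p = probabilistic_confirmation_prob(doa, threshold, min_doa, max_doa)
    return rng.random() < p
```

Documents far from the threshold are rarely confirmed, so fewer questions reach the user, but errors made with a confident DOA score can go unconfirmed.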
9 ICCOM: Interactive Confirmation by COM
[Figure: ICCOM architecture, split into a training phase and a testing phase. Labeled components: Training Documents for Classifier Building; Training Documents for Threshold Tuning (validation); Feature Selection; Classifier Building; Underlying Classifier; Threshold Tuning; (1) Content Overlap Measurement (COM); (2) Threshold Tuning based on Content Overlapping; Incoming Document; Classification; (3) Content Overlap Measurement (COM); Classified/Filtered Documents; Documents to be Confirmed.]
10 ICCOM: Interactive Confirmation by COM (content overlapping measurement)
Procedure COM(c, d), where
(1) c is a category,
(2) d is a document for thresholding or testing
Return: Degree of content overlap (DCO) between d and c
Begin
(1) DCO = 0;
(2) For each term t that is positively correlated with c but does not appear in d, do
(2.1) DCO = DCO - χ²(t,c);
(3) For each term t that is negatively correlated with c but appears in d, do
(3.1) DCO = DCO - (number of occurrences of t in d) × χ²(t,c);
(4) Return DCO;
End.
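A minimal Python sketch of the COM procedure follows. It assumes the per-term weight subtracted in steps (2.1) and (3.1) is a precomputed χ² score for the term and category, and that the document is already tokenized. The `term_stats` mapping and the function name are hypothetical.

```python
from collections import Counter


def com(doc_tokens, term_stats):
    """Degree of content overlap (DCO) between a document and a category.

    doc_tokens: list of terms in document d
    term_stats: term -> (weight, positive), where weight is the assumed
                chi-square score of the term for category c and positive is
                True if the term is positively correlated with c
    """
    counts = Counter(doc_tokens)  # missing terms count as 0
    dco = 0.0
    for term, (weight, positive) in term_stats.items():
        if positive and counts[term] == 0:
            # Step (2): an expected term is absent from d.
            dco -= weight
        elif not positive and counts[term] > 0:
            # Step (3): an unexpected term is present, once per occurrence.
            dco -= counts[term] * weight
    return dco


stats = {"tumor": (5.0, True), "liver": (3.0, True), "football": (2.0, False)}
dco = com(["liver", "football", "football"], stats)  # -5.0 - 2 * 2.0 = -9.0
```

DCO is never positive in this formulation: 0 means no penalty was found, and more negative values mean less content overlap with the category.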
11 ICCOM: Interactive Confirmation by COM (content overlapping measurement, cont.)
12 ICCOM: Interactive Confirmation by COM (content overlapping measurement, cont.)
N: total number of documents,
A: # documents that are in c and contain t,
B: # documents that are not in c but contain t,
C: # documents that are in c but do not contain t, and
D: # documents that are not in c and do not contain t.
A term t is “positively correlated” with c if AD > BC; otherwise it is “negatively correlated”.
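The formula that uses these counts did not survive in this transcript. The sketch below assumes it is the standard χ² statistic for a 2×2 term/category contingency table, which matches the variables N, A, B, C, and D defined here. The function names are illustrative.

```python
def chi_square(a, b, c, d):
    """Standard chi-square statistic for a 2x2 term/category table.

    a: docs in the category that contain the term
    b: docs not in the category that contain the term
    c: docs in the category that do not contain the term
    d: docs not in the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: treat the term as uninformative
    return n * (a * d - b * c) ** 2 / denom


def is_positively_correlated(a, b, c, d):
    """The slide's rule: positively correlated if AD > BC."""
    return a * d > b * c


score = chi_square(30, 10, 20, 40)          # 100 * 1000**2 / 6_000_000
positive = is_positively_correlated(30, 10, 20, 40)  # 1200 > 200
```

The χ² value measures how strongly the term and category are associated, while the AD > BC test gives the direction of that association.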
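13 ICCOM: Interactive Confirmation by COM (thresholding)
[Figure: validation documents ordered from Min DOA to Max DOA. Documents below the Rejection Threshold (RT) are rejected. For the remaining documents, COM is invoked to compute DCO: on the acceptance side of the classifier’s threshold (T), the Positive Confirmation Threshold (PCT) separates acceptance from confirmation; on the rejection side, the Negative Confirmation Threshold (NCT) separates rejection from confirmation.]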
14 ICCOM: Interactive Confirmation by COM (collaboration with the classifier)
Procedure InteractiveHighQualityTC(c, d, T, RT, PCT, NCT), where
(1) c is a category,
(2) d is the document to be processed,
(3) T is the classifier’s threshold for c,
(4) RT is the rejection threshold for c,
(5) PCT is the positive confirmation threshold for c, and
(6) NCT is the negative confirmation threshold for c.
Return: A decision (acceptance, rejection, or confirmation) for d with respect to c.
Begin
(1) DOA_d = Invoke the classifier to compute DOA of d with respect to c;
(2) If (DOA_d ≤ RT), Return “rejection”;
(3) Else
(3.1) DCO_d = Invoke COM to compute DCO of d with respect to c;
(3.2) If (DOA_d ≥ T)
(3.2.1) If (DCO_d ≥ PCT), Return “acceptance”;
(3.2.2) Return “confirmation”;
(3.3) Else
(3.3.1) If (DCO_d ≤ NCT), Return “rejection”;
(3.3.2) Return “confirmation”;
End.
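The decision procedure can be sketched in Python as below. The comparison operators are missing from this transcript, so the inclusive comparisons used here (≤ RT, ≥ T, ≥ PCT, ≤ NCT) are an assumption. The classifier and COM are represented by a precomputed DOA value and a hypothetical `compute_dco` callable.

```python
def interactive_high_quality_tc(doa, compute_dco, t, rt, pct, nct):
    """Return "acceptance", "rejection", or "confirmation" for one document.

    doa:         the classifier's DOA score for the document and category
    compute_dco: zero-argument callable returning the document's DCO;
                 only called when the document survives the rejection test
    t, rt:       the classifier's threshold and the rejection threshold
    pct, nct:    positive and negative confirmation thresholds
    """
    if doa <= rt:
        return "rejection"  # clearly irrelevant: COM is not invoked
    dco = compute_dco()
    if doa >= t:
        # The classifier would accept; accept outright only if COM agrees.
        return "acceptance" if dco >= pct else "confirmation"
    # The classifier would reject; reject outright only if COM agrees.
    return "rejection" if dco <= nct else "confirmation"


decision = interactive_high_quality_tc(
    0.7, lambda: -5.0, t=0.5, rt=0.2, pct=-3.0, nct=-10.0
)  # classifier accepts, but DCO is below PCT, so the user is asked
```

The user is consulted only when the classifier and the content-overlap measure disagree, which is how the procedure keeps the number of confirmations down.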
15 Empirical Evaluation
Chinese disease (cancer) texts
– 16 types of cancer (e.g. liver cancer, lung cancer, etc.) top-ranked by the Department of Health in Taiwan
– Collected by sending cancer names to “知識+” (Knowledge+) on Yahoo! Taiwan
– For each cancer, there are 5 subcategories: cause, symptom, curing, side-effect, and prevention
Therefore, we have 80 (16 × 5) categories with 2850 documents
– 90% for training; 10% for testing
– 2-fold cross validation (classifier building vs. thresholding)
16 Empirical Evaluation (cont.) Classification of cancer information
17 Empirical Evaluation (cont.)
Classification of 40 symptom descriptions without cancer names
Note: For the 40 test symptom documents, RO+ICCOM conducts 35 and 51 confirmations in the 1st and 2nd folds, respectively
18 Conclusion
High-quality TC is essential but often unrealizable
Interactive confirmation may be a final resort
– Information recommendation & archiving
– Healthcare information support
COM as a classifier-independent strategy for interaction
Thank you!