Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu * Dept. of Medical Informatics.


1 Identifying Disease Diagnosis Factors by Proximity-based Mining of Medical Texts
Rey-Long Liu *, Shu-Yu Tung, and Yun-Ling Lu
* Dept. of Medical Informatics, Tzu Chi University, Taiwan, R.O.C.

2 Outline
Research background
Problem definition
The proposed approach: PDFI
Empirical evaluation
Conclusion
PDFI@ACIIDS2011

3 Research Background

4 Diagnosis Knowledge Map: Fundamental of Diagnosis Support & Education
[Figure: a diagnosis knowledge map with three groups of nodes: Risk Factors (r1 to r5), Diseases (d1 to d3), and Symptoms & Signs, including examinations & tests (m1 to m5).]

5 Basic Properties
Diagnosis factors of a disease
–Risk factors, symptoms, and signs of the disease
A diagnosis knowledge map consists of many-to-many relationships between diseases and their diagnosis factors
–The factors may differ in their capability of discriminating the diseases, and may evolve
Construction of a diagnosis knowledge map is essential but costly
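The many-to-many structure described above can be sketched as a small data structure. This is only an illustration: the class name, method names, and the numeric strengths are hypothetical and do not come from the slides; the node labels d1, r1, m1 follow the figure on slide 4.

```python
from collections import defaultdict


class DiagnosisKnowledgeMap:
    """Illustrative many-to-many map between diseases and diagnosis factors."""

    def __init__(self):
        # disease -> {factor: discriminating strength (hypothetical number)}
        self.factors_of = defaultdict(dict)
        # factor -> set of diseases it is linked to
        self.diseases_of = defaultdict(set)

    def link(self, disease, factor, strength=1.0):
        """Add or update one disease-factor relationship."""
        self.factors_of[disease][factor] = strength
        self.diseases_of[factor].add(disease)

    def ranked_factors(self, disease):
        """Factors of a disease, strongest first (ties broken by name)."""
        items = self.factors_of.get(disease, {})
        return sorted(items, key=lambda f: (-items[f], f))

    def shared_factors(self, d1, d2):
        """Factors linked to both diseases, i.e. weak discriminators between them."""
        return set(self.factors_of.get(d1, {})) & set(self.factors_of.get(d2, {}))


kmap = DiagnosisKnowledgeMap()
kmap.link("d1", "r1", 0.9)
kmap.link("d1", "m1", 0.4)
kmap.link("d2", "m1", 0.7)
kmap.link("d2", "m2", 0.8)
```

Because strengths are stored per relationship, the same factor (m1 here) can discriminate one disease better than another, and a strength can be updated as knowledge evolves.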

6 Problem Definition

7 Goal
Explore how the identification of diagnosis factors may be supported by text mining
Develop a technique, PDFI (Proximity-based Diagnosis Factors Identifier), that
–Employs term proximity to improve diagnosis factor identifiers
–Serves as a supplement to improve existing identifiers

8 Related Work
Extract relationships by parsing or template matching
–Weakness: Relationships between diseases and diagnosis factors are seldom expressed in individual sentences
Select key features by text classification
–Weakness: Term proximity is NOT considered
Proximity-based retrieval
–Weakness: NOT applicable to diagnosis factor identification

9 The Proposed Approach: PDFI

10 Basic Observation
In a medical text about the diagnosis of a disease, the diagnosis factors often appear close to one another in the text

11 The Approach
For a candidate diagnosis factor u, PDFI
–Measures how other candidate diagnosis factors appear in the areas near u in the medical texts, and then
–Encodes the term proximity information into the discriminating capability of u measured by the underlying discriminative factor identifier
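The first step above, measuring how close other candidate factors are to u, can be sketched as follows. This is a minimal sketch under stated assumptions: distance is counted in whitespace-separated tokens between the start positions of two terms, and the function names and the sample sentence are hypothetical. The slides do not specify the tokenization or the distance unit.

```python
def term_positions(tokens, term):
    """Return the token indices at which a (possibly multi-word) term starts."""
    words = term.lower().split()
    width = len(words)
    return [i for i in range(len(tokens) - width + 1)
            if tokens[i:i + width] == words]


def min_distance(tokens, u, n):
    """Smallest token gap between any occurrence of u and any occurrence of n.

    Returns None when either term does not occur in the text.
    """
    pos_u = term_positions(tokens, u)
    pos_n = term_positions(tokens, n)
    if not pos_u or not pos_n:
        return None
    return min(abs(i - j) for i in pos_u for j in pos_n)


# Hypothetical sample text, tokenized naively on whitespace.
text = "fever and diarrhea are common ; a stool test may reveal the parasite"
tokens = text.lower().split()

print(min_distance(tokens, "diarrhea", "parasite"))
print(min_distance(tokens, "diarrhea", "stool test"))
```

A candidate surrounded by many other candidates at small distances would then receive a larger proximity contribution than one that appears in isolation.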

12 System Overview
[Figure: texts about individual diseases are fed to the underlying identifier, which measures the discriminating strengths of candidate factors; PDFI then encodes term proximity contexts to revise the strengths of the candidate factors and outputs ranked factors for individual diseases.]

13 Scoring for a Candidate Factor
For a candidate diagnosis factor u for disease c
–Finalscore(u,c) = ProximityScore(u,c) + IdentifierScore(u,c)
–MinDist u,c = Minimum distance between u and n in the texts about disease c, and α is set to 30
–Rank(u,c) = Rank of u w.r.t. c by the underlying identifier
[The equations defining ProximityScore(u,c) and IdentifierScore(u,c) are not preserved in this transcript.]
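The sum on this slide can be sketched in code, with an important caveat: the transcript does not preserve how ProximityScore and IdentifierScore are computed, so both forms below are assumptions chosen only for illustration (a linear decay that reaches zero at distance α, and a reciprocal rank). Only the final sum and α = 30 come from the slide.

```python
ALPHA = 30  # from the slide: alpha is set to 30


def proximity_score(min_dists, alpha=ALPHA):
    """ASSUMED form, not the authors' formula.

    Each other candidate term contributes a weight that is 1 at distance 0
    and decays linearly to 0 at distance alpha or beyond. None means the
    other term never co-occurs with u and contributes nothing.
    """
    return sum(max(0.0, 1.0 - d / alpha) for d in min_dists if d is not None)


def identifier_score(rank):
    """ASSUMED form, not the authors' formula: reciprocal of the 1-based rank."""
    return 1.0 / rank


def final_score(min_dists, rank, alpha=ALPHA):
    """From the slide: Finalscore = ProximityScore + IdentifierScore."""
    return proximity_score(min_dists, alpha) + identifier_score(rank)


# Hypothetical candidate: minimum distances to four other candidates,
# ranked 2nd by the underlying identifier.
print(final_score([0, 15, 30, 60], rank=2))
```

Under these assumed forms, a candidate ranked low by the underlying identifier can still move up if many other candidates occur within α tokens of it, which matches the re-ranking behavior the slides describe.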

14 Empirical Evaluation

15 Experimental Data
Medical dictionary: from MeSH
–Each MeSH term and its retrieval equivalence terms, resulting in a dictionary of 164,354 medical terms
Medical texts for diseases: from MedlinePlus
–All the diseases for which MedlinePlus tags diagnosis/symptoms texts, resulting in a text database of 420 medical texts for 131 diseases
–Each medical text is manually read and cross-checked to extract target diagnosis factor terms, resulting in 2,797 target terms

16 Underlying Diagnosis Factor Identifier
The chi-square feature scoring technique
–Produces a discriminating strength for each feature (candidate factor) with respect to each disease, and
–For each disease, all positively-correlated features are sent to PDFI for re-ranking
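The chi-square scoring mentioned above can be illustrated with the 2x2 contingency form commonly used for feature selection in text classification. The slides do not give the exact formulation used, so the counts a, b, c, d and the function names below are assumptions for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a term and a disease from a 2x2 table.

    a: texts about the disease that contain the term
    b: texts about other diseases that contain the term
    c: texts about the disease that lack the term
    d: texts about other diseases that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def is_positively_correlated(a, b, c, d):
    """True when the term occurs more often with the disease than expected."""
    return a * d > c * b


# Hypothetical counts: the term appears in 8 of 10 texts about the disease
# and in 2 of 10 texts about other diseases.
print(chi_square(8, 2, 2, 8))
print(is_positively_correlated(8, 2, 2, 8))
```

The statistic itself is symmetric, so a term that is strongly absent from a disease's texts scores as high as one that is strongly present; the sign check is what restricts re-ranking to positively-correlated features.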

17 Evaluation Criteria
Mean average precision (MAP)
–Measures how high the target diagnosis factors are ranked for the medical expert to check and validate
–Example
Targets ranked 1st, 3rd, 5th → AP = (1/1 + 2/3 + 3/5)/3 = 0.76
Targets ranked 1st, 2nd, 3rd → AP = (1/1 + 2/2 + 3/3)/3 = 1.00
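The two worked examples above can be reproduced with a short sketch. The function names are illustrative, and the sketch assumes every target factor appears somewhere in the ranked list, as in the slide's examples.

```python
def average_precision(target_ranks):
    """Average precision from the 1-based ranks at which targets appear."""
    ranks = sorted(target_ranks)
    if not ranks:
        return 0.0
    # The k-th target found (k = 1, 2, ...) at rank r gives precision k / r.
    return sum((k + 1) / r for k, r in enumerate(ranks)) / len(ranks)


def mean_average_precision(ranks_per_disease):
    """Mean of the per-disease average precision values."""
    aps = [average_precision(ranks) for ranks in ranks_per_disease]
    return sum(aps) / len(aps) if aps else 0.0


print(round(average_precision([1, 3, 5]), 2))  # 0.76
print(round(average_precision([1, 2, 3]), 2))  # 1.0
```

Averaging the per-disease values with mean_average_precision gives the MAP reported in the results.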

18 Results
MAP: chi-square: 0.2136; chi-square+PDFI: 0.2996

19 An Example
Parasitic diseases
–AP: chi-square: 0.3003; chi-square+PDFI: 0.3448
PDFI promotes the ranks of several target diagnosis factors (e.g., parasite, antigen, diarrhea, and MRI scan)
–They appear at place(s) where many other candidate terms occur nearby
PDFI lowers the ranks of a few target diagnosis factors (e.g., serology)
–Serology appears at only one place, where the author used many words to explain it, so few other candidate terms fall nearby

20 Conclusion

21 Diagnosis factors to discriminate diseases are the fundamental basis for
–Diagnosis decision support, diagnosis skill training, medical research, & health education
Text mining is a good way to identify and maintain the large number of diagnosis factors for diseases
By encoding term proximity information, PDFI may be a good supplement to existing techniques for identifying the diagnosis factors of individual diseases

