Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes Ted Pedersen Serguei V. Pakhomov

1 Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes Ted Pedersen Serguei V. Pakhomov bthomson@cs.umn.edu

2 Goal Present a simple but effective approach for identifying the syntactic structure of three-word terms.

3 Importance
Potentially improve the analysis of unrestricted medical text
- Mapping of medical text to standardized terminologies
- Unsupervised syntactic parsing

4 Syntactic Structure of Terms [Figure: the possible structures of a three-word term w1 w2 w3 (Monolithic, Non-branching, Right-branching, Left-branching); blue = independence, green = dependence]

5 Example small bowel obstruction

6 Syntactic Structure of Example [Figure: "small bowel obstruction" drawn under each structure (Monolithic, Non-branching, Right-branching, Left-branching)]

7 Method used to determine the structure of a term
The Log Likelihood Ratio is based on the ratio between the observed probability of a term and the probability with which it would be expected to occur:
(probability of the term occurring) / (expected probability of the term)

8 Log Likelihood Ratio
The expected probability of a term is often based on the Non-branching (Independence) Model:
P(small bowel obstruction) / [ P(small) P(bowel) P(obstruction) ]
The numerator is the observed probability; the denominator is the expected probability.
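The observed-over-expected ratio on this slide can be sketched in a few lines of Python. This is only an illustration under the independence model: the counts are hypothetical (not from the Mayo Clinic corpus), the function name `independence_ratio` is made up for this sketch, and the corpus size is used as the denominator for both word and term probabilities as a simplifying assumption.

```python
def independence_ratio(term_count, word_counts, corpus_size):
    """Observed probability of a term divided by the probability expected
    if its words occurred independently (the non-branching model)."""
    observed = term_count / corpus_size
    expected = 1.0
    for count in word_counts:
        expected *= count / corpus_size
    return observed / expected


# Hypothetical counts for "small bowel obstruction" in a 1,000,000-token corpus.
ratio = independence_ratio(
    term_count=100,
    word_counts=[2000, 500, 400],  # small, bowel, obstruction
    corpus_size=1_000_000,
)
print(f"observed / expected = {ratio:,.0f}")
```

A ratio far above 1 means the three words co-occur much more often than independence predicts, which is what the full Log Likelihood Ratio then quantifies over all cells of the contingency table.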

9 Extended Log Likelihood Ratio
The expected probabilities can also be calculated under two other hypotheses (models):
Non-branching: P(small) P(bowel) P(obstruction)
Left-branching: P(small bowel) P(obstruction)
Right-branching: P(small) P(bowel obstruction)

10 Three Log Likelihood Ratio Equations
Non-branching: P(small bowel obstruction) / [ P(small) P(bowel) P(obstruction) ]
Left-branching: P(small bowel obstruction) / [ P(small bowel) P(obstruction) ]
Right-branching: P(small bowel obstruction) / [ P(small) P(bowel obstruction) ]

11 Expected Probability
The expected probability of a term differs from model to model, and so does the Log Likelihood Ratio:
P(small) P(bowel) P(obstruction) (Non-branching): LL = 11,635.45
P(small bowel) P(obstruction) (Left-branching): LL = 5,169.81
P(small) P(bowel obstruction) (Right-branching): LL = 8,532.90

12 Model Fitting
The model with the lowest Log Likelihood Ratio best describes the underlying structure of the term:
P(small) P(bowel) P(obstruction) (Non-branching): LL = 11,635.45
P(small bowel) P(obstruction) (Left-branching): LL = 5,169.81
P(small) P(bowel obstruction) (Right-branching): LL = 8,532.90

13 Recap
The Log Likelihood Ratio is calculated for each possible model
- Non-branching
- Right-branching
- Left-branching
The probabilities for each model are obtained from a corpus
The term is assigned the structure whose model has the lowest Log Likelihood Ratio

14 Test Set
Contains 708 three-word terms from SNOMED-CT
[Figure: number of terms per structure. Monolithic: 73 terms. Labels in slide order: Non-branching, Right-branching, Left-branching; counts in slide order: 6 terms, 378 terms, 251 terms]

15 Test Set (cont)
The syntactic structure of each term was determined through the consensus of two medical text index experts (kappa = 0.704).
The probabilities were obtained from over 10,000 Mayo Clinic clinical notes.
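For readers unfamiliar with the kappa statistic quoted above, here is a minimal sketch of how agreement between two annotators can be corrected for chance. It assumes Cohen's kappa (the slide does not say which variant was used), and the labels are invented for illustration; they are not the experts' annotations.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of eight terms (L = left-branching, R = right-branching).
expert_1 = ["L", "L", "R", "R", "L", "R", "L", "L"]
expert_2 = ["L", "R", "R", "R", "L", "R", "L", "L"]
print(cohen_kappa(expert_1, expert_2))  # 7/8 raw agreement, 0.5 by chance -> 0.75
```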

16 Monolithic Results Technique Percentage agreement with human experts 35.5 53.4 74.8

17 Results without Monolithic Terms Technique Percentage agreement with human experts 39.5 59.5 83.5

18 Limitations
Monolithic structures
- possibly identify through collocation extraction or dictionary lookup
As the number of words in a term grows, so does the number of hypotheses (models) to be evaluated
- only consider adjacent models
- limit the length of the terms to 5 or 6 words
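The growth in the number of hypotheses can be illustrated with a small sketch. It assumes one particular reading of "models": every way of cutting a term into groups of adjacent words, which gives 2^(n-1) candidates for an n-word term and reproduces the four structures of a three-word term. Nested bracketings of longer terms are not covered, and the function name is hypothetical.

```python
from itertools import product


def contiguous_segmentations(words):
    """List every split of `words` into groups of adjacent words."""
    if not words:
        return []
    segmentations = []
    # One yes/no decision per gap between neighbouring words: cut there or not.
    for cuts in product([False, True], repeat=len(words) - 1):
        groups = [[words[0]]]
        for word, cut in zip(words[1:], cuts):
            if cut:
                groups.append([word])
            else:
                groups[-1].append(word)
        segmentations.append([" ".join(group) for group in groups])
    return segmentations


for segmentation in contiguous_segmentations(["small", "bowel", "obstruction"]):
    print(segmentation)

# The candidate count doubles with every additional word.
for n in range(3, 7):
    print(n, "words:", len(contiguous_segmentations(["w"] * n)), "candidates")
```

For three words the output lists the single-group (monolithic) case, the two two-group splits, and the fully separated (non-branching) case.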

19 Conclusions
Presented a simple but effective method to identify the structure of three-word terms
The method uses the Log Likelihood Ratio
Could be extended to identify the structure of four-, five- and six-word terms

20 Future Work
Improve accuracy of method
- explore other measures of association: Chi-squared, Phi, Dice coefficient...
- incorporate multiple measures together
Extend our method to four- and five-word terms
- difficulty: finding a test set

21 Thank you
Software:
- Ngram Statistic Package (NSP): www.d.umn.edu/~tpederse/nsp.html
- Log Likelihood Ratio Models: www.cs.umn.edu/~bthomson/mti.html

22 Log Likelihood Equation
$$LL = 2 \sum_{x,y,z} n_{xyz} \log \frac{n_{xyz}}{m_{xyz}}$$

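23 Expected Values
$$LL = 2 \sum_{x,y,z} n_{xyz} \log \frac{n_{xyz}}{m_{xyz}}$$
Here $n_{xyz}$ is an observed count, $m_{xyz}$ is the count expected under the model, and a $+$ subscript means the count is summed over that position.
Non-branching: $m_{xyz} = \dfrac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$
Left-branching: $m_{xyz} = \dfrac{n_{xy+}\, n_{++z}}{n_{+++}}$
Right-branching: $m_{xyz} = \dfrac{n_{x++}\, n_{+yz}}{n_{+++}}$
(The non-branching denominator is $n_{+++}^{2}$ so that the expected counts sum to $n_{+++}$.)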
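The equation and the three expected-value formulas can be put together in a short sketch. It is not the authors' implementation or the NSP code. It assumes a 2x2x2 table keyed by (x, y, z), where 1 means the word at that position matches the term's word and 0 means it does not; it divides the non-branching expectation by the squared total so that expected counts sum to the total; and it uses invented counts in which the first two words are strongly associated and the third is independent of them.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=3))


def marginal(table, x=None, y=None, z=None):
    """Sum the table over every position left as None (the '+' subscripts)."""
    return sum(
        table[(i, j, k)]
        for i, j, k in CELLS
        if (x is None or i == x) and (y is None or j == y) and (z is None or k == z)
    )


def log_likelihood(table, model):
    """2 * sum(n * log(n / m)) with m taken from the chosen model."""
    total = marginal(table)
    score = 0.0
    for x, y, z in CELLS:
        n = table[(x, y, z)]
        if n == 0:
            continue  # a zero cell contributes nothing to the sum
        if model == "non-branching":
            m = marginal(table, x=x) * marginal(table, y=y) * marginal(table, z=z) / total**2
        elif model == "left-branching":
            m = marginal(table, x=x, y=y) * marginal(table, z=z) / total
        elif model == "right-branching":
            m = marginal(table, x=x) * marginal(table, y=y, z=z) / total
        else:
            raise ValueError(f"unknown model: {model}")
        score += n * math.log(n / m)
    return 2.0 * score


# Hypothetical counts: w1 and w2 co-occur strongly, w3 is independent of both.
toy_table = {
    (1, 1, 1): 20, (1, 1, 0): 20,
    (1, 0, 1): 5,  (1, 0, 0): 5,
    (0, 1, 1): 5,  (0, 1, 0): 5,
    (0, 0, 1): 20, (0, 0, 0): 20,
}
models = ["non-branching", "left-branching", "right-branching"]
scores = {model: log_likelihood(toy_table, model) for model in models}
best = min(scores, key=scores.get)  # lowest score = best-fitting model
print(scores)
print("best fit:", best)
```

With these toy counts the left-branching model reproduces the observed table exactly, so its score is zero and it is selected, mirroring the "lowest Log Likelihood Ratio wins" rule from the model-fitting slide.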

