Determining the Syntactic Structure of Medical Terms in Clinical Notes
Bridget T. McInnes¹, Ted Pedersen² and Serguei V. Pakhomov¹
¹University of Minnesota
²University of Minnesota Duluth

Syntactic Structure of Terms
Figure: dependence diagrams over the words w1 w2 w3 for four structures: Monolithic, Non-branching, Right-branching, Left-branching (black = independence, green = dependence)
Example terms: difficulty finding words, serum dioxin level, urinary tract infection, low back pain

Goal
A simple but effective approach to identifying the syntactic structure of three-word medical terms

Motivation
- Potentially improve the analysis of unrestricted medical text
  - Unsupervised syntactic parsing
  - Mapping of medical terms to standardized terminologies

Related Work
- Previously
  - Resnik, 1993
  - Resnik and Hearst, 1993
  - Pustejovsky, Anick and Bergler, 1993
  - Lauer, 1995
- Currently
  - Lapata and Keller, 2004
  - Nakov and Hearst, 2005
- Medical Domain
  - Nakov and Hearst, 2005

Example
small bowel obstruction

Syntactic Structure
Candidate structures for "small bowel obstruction": Monolithic, Non-branching, Right-branching, Left-branching

Method used to determine the structure of a term
The Log Likelihood Ratio is the ratio between the observed probability of a term occurring and the probability with which it would be expected to occur.

Log Likelihood Ratio
The expected probability of a term is often based on the Non-branching (Independence) Model:
$\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$
Numerator: OBSERVED PROBABILITY
Denominator: EXPECTED PROBABILITY

Extended Log Likelihood Ratio
The expected probabilities can also be calculated using two other models:
Non-branching: $P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})$
Left-branching: $P(\text{small bowel})\,P(\text{obstruction})$
Right-branching: $P(\text{small})\,P(\text{bowel obstruction})$

Three Log Likelihood Ratio Equations
Non-branching: $\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})}$
Left-branching: $\frac{P(\text{small bowel obstruction})}{P(\text{small bowel})\,P(\text{obstruction})}$
Right-branching: $\frac{P(\text{small bowel obstruction})}{P(\text{small})\,P(\text{bowel obstruction})}$

Expected Probability
The expected probability of a term differs from model to model, and so does the Log Likelihood Ratio (LL):
Non-branching: $P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})$, LL = 11,635.45
Left-branching: $P(\text{small bowel})\,P(\text{obstruction})$, LL = 5,169.81
Right-branching: $P(\text{small})\,P(\text{bowel obstruction})$, LL = 8,532.90

Model Fitting
The model with the lowest Log Likelihood Ratio (LL) is the one that best describes the underlying structure of the term:
Non-branching: $P(\text{small})\,P(\text{bowel})\,P(\text{obstruction})$, LL = 11,635.45
Left-branching: $P(\text{small bowel})\,P(\text{obstruction})$, LL = 5,169.81
Right-branching: $P(\text{small})\,P(\text{bowel obstruction})$, LL = 8,532.90

Recap
- The Log Likelihood Ratio is calculated for each possible model
  - Non-branching
  - Left-branching
  - Right-branching
- The probabilities for each model are calculated using frequency counts from a corpus
- The term is assigned the structure whose model has the lowest Log Likelihood Ratio
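The recap above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation and not NSP code: it assumes a 2x2x2 contingency table of trigram counts, the standard G² form of the log likelihood ratio, and the expected-value formulas from the backup slides. The helper names (`contingency_table`, `marginal`, `g_squared`, `score_models`, `classify`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def contingency_table(trigrams, term):
    """Build a 2x2x2 table: index 1 means the word at that position
    matches the term's word, index 0 means it does not."""
    table = Counter()
    for trigram in trigrams:
        cell = tuple(int(word == target) for word, target in zip(trigram, term))
        table[cell] += 1
    return table


def marginal(table, pattern):
    """Sum the cells matching pattern; None plays the role of '+'."""
    return sum(
        count for cell, count in table.items()
        if all(p is None or p == c for p, c in zip(pattern, cell))
    )


def g_squared(table, expected):
    """G^2 = 2 * sum(n * ln(n / m)); empty cells contribute nothing."""
    total = 0.0
    for cell in product((0, 1), repeat=3):
        n = table[cell]
        if n > 0:
            total += n * math.log(n / expected(*cell))
    return 2.0 * total


def score_models(table):
    """Return the G^2 score of each of the three models."""
    n_total = sum(table.values())
    models = {
        # m = n_x++ * n_+y+ * n_++z / n_+++^2
        "non-branching": lambda i, j, k: (
            marginal(table, (i, None, None))
            * marginal(table, (None, j, None))
            * marginal(table, (None, None, k)) / n_total ** 2),
        # m = n_xy+ * n_++z / n_+++
        "left-branching": lambda i, j, k: (
            marginal(table, (i, j, None))
            * marginal(table, (None, None, k)) / n_total),
        # m = n_x++ * n_+yz / n_+++
        "right-branching": lambda i, j, k: (
            marginal(table, (i, None, None))
            * marginal(table, (None, j, k)) / n_total),
    }
    return {name: g_squared(table, m) for name, m in models.items()}


def classify(table):
    """Pick the model with the lowest G^2 score."""
    scores = score_models(table)
    return min(scores, key=scores.get), scores


# Toy corpus (invented for illustration): "small bowel" always occurs
# together, and the third word is independent of that pair.
corpus = (
    [("small", "bowel", "obstruction")] * 10
    + [("small", "bowel", "resection")] * 10
    + [("chronic", "airway", "obstruction")] * 40
    + [("patient", "denies", "pain")] * 40
)
table = contingency_table(corpus, ("small", "bowel", "obstruction"))
best, scores = classify(table)
print(best, scores)
```

In this toy corpus the left-branching model reproduces the observed counts exactly, so its score is zero and it is selected; real clinical-note counts would of course give non-zero scores for every model.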

Test Set
708 three-word terms from SNOMED-CT
Monolithic: 73 terms
Non-branching: 6 terms
Right-branching: 378 terms
Left-branching: 251 terms

Test Set  Syntactic structure determined by two medical text indexers Kappa = 0.704  Frequency counts obtained from over 10,000 clinical notes from the Mayo Clinic

Results with Monolithic Terms
Percentage agreement with human experts, one value per technique: 35.5, 53.4, 74.8

Results without Monolithic Terms
Percentage agreement with human experts, one value per technique: 39.5, 59.5, 83.5
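The technique names do not appear alongside the figures above. One reading, offered only as an assumption, is that the two lower values in each set are baselines that assign every term to a single branching class, because they coincide with the class proportions of the test set (251 and 378 terms out of 708, or out of 635 once the 73 monolithic terms are excluded). The short check below uses only numbers from the slides; the helper `share` is hypothetical.

```python
def share(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


TOTAL_TERMS = 708
MONOLITHIC_TERMS = 73
BRANCHING_CLASS_SIZES = (251, 378)  # sizes of the two branching classes

for size in BRANCHING_CLASS_SIZES:
    with_monolithic = share(size, TOTAL_TERMS)
    without_monolithic = share(size, TOTAL_TERMS - MONOLITHIC_TERMS)
    print(size, with_monolithic, without_monolithic)
# prints:
# 251 35.5 39.5
# 378 53.4 59.5
```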

Limitations
- Does not identify Monolithic Terms
  - Collocation extraction
  - Dictionary lookup
- As the number of words in a term grows, so does the number of models
  - Limit length of terms to 5 words
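To give a feel for how the number of models grows, the sketch below enumerates every way of splitting a term into contiguous word groups. This is an assumption about what counts as a model: the slides do not say how models are enumerated for longer terms, and nested bracketings would add further cases. For three words the enumeration yields four groupings, which line up with the four structures named earlier. The function name `segmentations` is hypothetical.

```python
from itertools import product


def segmentations(words):
    """Yield every split of a non-empty word list into contiguous groups."""
    # Each of the len(words) - 1 gaps between words is either cut or not.
    for cuts in product((False, True), repeat=len(words) - 1):
        groups, current = [], [words[0]]
        for word, cut in zip(words[1:], cuts):
            if cut:
                groups.append(tuple(current))
                current = [word]
            else:
                current.append(word)
        groups.append(tuple(current))
        yield groups


term = ["small", "bowel", "obstruction"]
for groups in segmentations(term):
    print(groups)

# Number of groupings for terms of one to five words: 2 ** (n - 1).
counts = [len(list(segmentations(["w"] * n))) for n in range(1, 6)]
print(counts)
```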

Conclusions
- Simple but effective method for identifying the syntactic structure of three-word terms
- Method uses the Log Likelihood Ratio
- Easily extended to four- and five-word terms

Future Work
- Improve accuracy
  - Explore other measures of association
    - Dice coefficient, phi...
  - Incorporate multiple measures
- Extend method to four- and five-word terms

Thank you
Software:
- Ngram Statistics Package (NSP): www.d.umn.edu/~tpederse/nsp.html
- Log Likelihood Ratio Models: www.cs.umn.edu/~bthomson/mti.html

Log Likelihood Equation
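Only the title of this slide is available as text. A standard form of the log likelihood ratio statistic (often written G²), assumed here rather than taken from the slide, sums over the cells of the trigram contingency table, with $n_{xyz}$ the observed count and $m_{xyz}$ the expected count under the chosen model:

```latex
G^2 = 2 \sum_{x,y,z} n_{xyz} \, \ln \frac{n_{xyz}}{m_{xyz}}
```

Cells with $n_{xyz} = 0$ are conventionally taken to contribute zero to the sum.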

Expected Values
Non-branching: $m_{xyz} = \frac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$
Left-branching: $m_{xyz} = \frac{n_{xy+}\, n_{++z}}{n_{+++}}$
Right-branching: $m_{xyz} = \frac{n_{x++}\, n_{+yz}}{n_{+++}}$
