Extracting claim sentences from biomedical documents: a pilot study focusing on drug-drug interaction claims

Linh Hoang (1), Richard D. Boyce (2), Jodi Schneider (1)
(1) School of Information Sciences, University of Illinois at Urbana-Champaign
(2) Department of Biomedical Informatics, School of Medicine, University of Pittsburgh

Motivation

In the biomedical domain, a large amount of new research is published every year, which makes it difficult to keep track of new research claims. Drug-drug interactions are a particularly important kind of claim. So far, only claims of interactions have been extracted with NLP; more work is needed to extract the supporting evidence for drug-drug interactions. Semantic models that connect claims with their supporting evidence could support better information retrieval of biomedical literature.

Given a dataset of drug-drug interaction claims and evidence, our goals are:
1) To understand what types of claims appear
2) To understand how data and methods are used as supporting evidence
3) To apply text mining and machine learning to automate the process of extracting claims and evidence
4) To apply a semantic model connecting claims and supporting evidence

Introduction of the Drug Interaction Knowledge Base dataset

- 875 drug-drug interaction claims with supporting evidence
- Human-annotated from 143 documents (journal articles, drug labels, clinical case reports, etc.)

Each record combines an annotated claim, a drug-drug interaction, and supporting evidence.

Columns: claimtext, relationship, drug1, drug2, aucvalue, auctype, aucdirection, cmaxvalue, cmaxtype, cmaxdirection

Example record:
claimtext: “Ticlopidine, Clopidogrel: In a study in healthy male volunteers, clopidogrel 75 mg once daily or ticlopidine 250 mg twice daily increased exposures (Cmax and AUC) of bupropion by 40% and 60% for clopidogrel, by 38% and 85% for ticlopidine, respectively. The exposures of hydroxybupropion were decreased.”
relationship: interact_with
drug1: clopidogrel
drug2: bupropion
aucvalue: 60
auctype: Percent
aucdirection: Increase
cmaxvalue: 40

Pilot Study Methods

1. Analyze existing dataset
Completed so far:
- Categorized annotated claims into groups.
- Identified the most common components of each group of claims.

2. Analyze the claims in detail
Ongoing:
- Get the full text of the original documents that contain claims.
- Identify the document sections in which the claims were found.
- Analyze the language used in the claims by comparing words in the annotated claims with the document titles and by identifying the most common words that appear in the claims.

3. Implement automatic extraction
Next steps:
- Propose features to extract claims and evidence.
- Build a machine learning model to detect claims and evidence automatically.
- Test and evaluate the models.

4. Apply a semantic model
Next steps:
- Represent claims and supporting evidence (methods & data) in a semantic model.

Initial Data Analysis Results

Category 1: Claim includes pharmacokinetics [1] (PK) measurements
“Combined administration of racemic citalopram (40 mg) and ketoconazole (200 mg), a potent CYP3A4 inhibitor, decreased the Cmax and AUC of ketoconazole by 21% and 10%, respectively, and did not significantly affect the pharmacokinetics of citalopram.”

Category 2: General claim that does not include PK measurements but specifically mentions participants, methods, and results
“A 77-year-old man with dyslipidemia treated with simvastatin was prescribed a course of clarithromycin by his general practitioner for an upper respiratory tract infection. He subsequently presented to the hospital with generalized muscle pain, weakness, and inability to walk.”

Category 3: General claim that does not include PK measurements, methods, or results
“The following report describes a case of rhabdomyolysis that occurred in an acutely ill patient who was taking atorvastatin in combination with gemfibrozil.”

In the above examples:
- Drug entities (mandatory component): the two drugs that interact (in RED)
- Interaction (mandatory component): the direction of the interaction (in PURPLE)
- Supporting evidence (optional component): information (e.g. pharmacokinetics measurements such as AUC or Cmax) that indicates an interaction, or lack of interaction, between the drugs (in GREEN)
- Context (optional component): the context in which the drug-drug interaction occurs (in BLUE)

[1] Pharmacokinetics is the study of drug absorption, distribution, metabolism, and excretion.

Partially supported by NIH R01LM011838.
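
The four claim components and the Category 1 criterion described in the initial results can be sketched in Python as follows. This is only an illustrative sketch, not the project's extraction method: the `Claim` class, its field names, and the `has_pk_measurement` heuristic (a mention of AUC or Cmax together with a percentage) are assumptions introduced here, and the heuristic does not distinguish Category 2 from Category 3.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed heuristic: a claim "includes PK measurements" if it mentions
# AUC or Cmax and also contains at least one percentage value.
PK_TERM = re.compile(r"\b(AUC|Cmax)\b", re.IGNORECASE)
PERCENT = re.compile(r"\d+(?:\.\d+)?\s*%")


@dataclass
class Claim:
    """Hypothetical container for the four claim components."""
    drug1: str                      # drug entity (mandatory)
    drug2: str                      # drug entity (mandatory)
    interaction: str                # interaction (mandatory)
    evidence: Optional[str] = None  # supporting evidence (optional)
    context: Optional[str] = None   # context (optional)


def has_pk_measurement(claim_text: str) -> bool:
    """Return True if the text mentions AUC/Cmax and a percentage."""
    return bool(PK_TERM.search(claim_text)) and bool(PERCENT.search(claim_text))


# Values taken from the example dataset record.
example = Claim(
    drug1="clopidogrel",
    drug2="bupropion",
    interaction="interact_with",
    evidence="AUC: 60 Percent Increase",
)

category_1_text = (
    "Combined administration of racemic citalopram (40 mg) and ketoconazole "
    "(200 mg), a potent CYP3A4 inhibitor, decreased the Cmax and AUC of "
    "ketoconazole by 21% and 10%, respectively."
)
category_3_text = (
    "The following report describes a case of rhabdomyolysis that occurred "
    "in an acutely ill patient who was taking atorvastatin in combination "
    "with gemfibrozil."
)

print(has_pk_measurement(category_1_text))
print(has_pk_measurement(category_3_text))
```

Keeping the optional components as `None` by default mirrors the mandatory/optional split in the component list; a keyword heuristic like this could at most serve as one candidate feature for the planned machine learning model.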