IDA2: Intelligent Discovery of Acronyms and Abbreviations
Adam Mallen, under the advisement of Dr. Craig Struble and Dr. Lenwood Heath



Example Medline Abstract

AIM: In the present 6-month multicentre trial, the outcome of 2 different approaches to non-surgical treatment of chronic periodontitis, both involving the use of a locally delivered controlled-release doxycycline, was evaluated.

MATERIAL AND METHODS: 105 adult patients with moderately advanced chronic periodontitis from 3 centres participated in the trial. Each patient had to present with at least 8 periodontal sites in 2 jaw quadrants with a probing pocket depth (PPD) of >=5 mm and bleeding following pocket probing (BoP), out of which at least 2 sites had to be >=7 mm and a further 2 sites >=6 mm. Following a baseline examination, including assessments of plaque, PPD, clinical attachment level (CAL) and BoP, careful instruction in oral hygiene was given. The patients were then randomly assigned to one of two treatment groups: scaling/root planing (SRP) with local analgesia or debridement (supra- and subgingival ultrasonic instrumentation without analgesia). …

System Outline

I. Build initial dictionary database using the Schwartz and Hearst abbreviation finding algorithm.
II. Use this dictionary as a labeled training set to build an abbreviation disambiguation classifier.
III. Use the classifier to predict the expanded forms of ambiguous abbreviations and add them to the dictionary.
IV. Implement a web-based front end interface for searching and interacting with the dictionary database.

Building the Dictionary

• Scan and find all abbreviations in all Medline baseline abstracts (almost 19 million Medline abstracts).
• Use the Schwartz and Hearst algorithm to find abbreviations defined in the abstract following either form:
    i. long form '(' short form ')', e.g. clinical attachment level (CAL)
    ii. short form '(' long form ')', e.g. CAL (clinical attachment level)
• Create a database of the dictionary and the abstracts in which each abbreviation/long form pair has been found.
• Create a front-end web interface for searching and interacting with the database.
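The first pattern above, long form '(' short form ')', can be illustrated with a minimal Python sketch. It is not the project's Java code: it is a simplified rendering of the right-to-left character matching used by the Schwartz and Hearst approach, it ignores the second pattern, and it omits the long-form sanity checks a real implementation would apply. The function names and the short-form validity thresholds are illustrative assumptions.

```python
import re

# Parenthesised spans with no nested parentheses are short-form candidates.
PAREN = re.compile(r"\(([^()]+)\)")


def is_valid_short_form(s):
    # Illustrative thresholds (assumed): 2-10 characters, at most two words,
    # starts with a letter or digit, contains at least one letter.
    return (2 <= len(s) <= 10 and len(s.split()) <= 2
            and s[0].isalnum() and any(c.isalpha() for c in s))


def best_long_form(short_form, candidate):
    """Match short-form characters right to left inside the candidate text.

    The first short-form character must match at the start of a word.
    Returns the matched long form, or None if no match exists.
    """
    s, l = len(short_form) - 1, len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        while ((l >= 0 and candidate[l].lower() != ch)
               or (s == 0 and l > 0 and candidate[l - 1].isalnum())):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Extend the match back to the beginning of the word it starts in.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def find_pairs(text):
    """Return (short form, long form) pairs for 'long form (short form)'."""
    pairs = []
    for m in PAREN.finditer(text):
        short = m.group(1).strip()
        if not is_valid_short_form(short):
            continue
        words = text[:m.start()].split()
        # Search window before the parenthesis: min(|A| + 5, 2 * |A|) words.
        window = min(len(short) + 5, len(short) * 2)
        long_form = best_long_form(short, " ".join(words[-window:]))
        if long_form:
            pairs.append((short, long_form))
    return pairs
```

Applied to a fragment of the example abstract, `find_pairs("sites with a probing pocket depth (PPD) and clinical attachment level (CAL)")` pairs PPD with "probing pocket depth" and CAL with "clinical attachment level".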

Training the Disambiguation Model

• Use abbreviation instances found in the building of the dictionary as labeled training data.
• Extract lexical features and MeSH headings from the abstract to use as training attributes for each abbreviation's long form.
• Use a machine learning algorithm (such as a Naïve Bayes classifier, Support Vector Machine, or Vector Space Model) to build a classifier for predicting the long forms of ambiguous abbreviations.
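As one possible reading of the Naïve Bayes option, the sketch below trains a multinomial Naïve Bayes model with add-one smoothing on bag-of-words context features. The slides leave the feature set and the algorithm undecided, so the class name, the tokenizer, and the toy training sentences are all invented for illustration; none of them come from Medline. MeSH headings could be appended to the context as extra tokens.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-cased alphabetic tokens as simple lexical features (assumed).
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDisambiguator:
    """Multinomial Naive Bayes over context words, one class per long form."""

    def __init__(self):
        self.class_counts = Counter()            # long form -> number of examples
        self.word_counts = defaultdict(Counter)  # long form -> token counts
        self.vocab = set()

    def train(self, examples):
        # examples: iterable of (context text, long form) pairs.
        for context, long_form in examples:
            self.class_counts[long_form] += 1
            for tok in tokenize(context):
                self.word_counts[long_form][tok] += 1
                self.vocab.add(tok)

    def predict(self, context):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(context):
                if tok in self.vocab:    # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


# Toy, hand-written training contexts for the abbreviation "CAL".
TOY_EXAMPLES = [
    ("periodontitis probing pocket depth and CAL were measured",
     "clinical attachment level"),
    ("CAL gain after scaling and root planing", "clinical attachment level"),
    ("Kawasaki disease patients with CAL on echocardiography",
     "coronary artery lesion"),
    ("CAL detected by coronary angiography", "coronary artery lesion"),
]

model = NaiveBayesDisambiguator()
model.train(TOY_EXAMPLES)
```

With this toy model, `model.predict("CAL improved after root planing in periodontitis patients")` selects "clinical attachment level". In the system described here, the training pairs would instead come from the abstracts in which the dictionary-building step found each abbreviation defined.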

Progress

• Wrote a Java program, built on LingPipe's Medline tutorial code and Dr. Struble's Java implementation of the Schwartz and Hearst algorithm, to parse Medline abstracts, find abbreviation/long form pairs, and add them to the dictionary database.
• Used Condor to run this program in parallel on the entire Medline baseline.
• Found 1,497,702 unique abbreviation/long form pairs in 4,126,655 abstracts.

Database Schema

Dictionary: Id, short_form, long_form
Abstracts: Id, Pmid
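The two tables can be sketched with Python's built-in sqlite3 module. The slide gives only the table and column names; the column types, the uniqueness constraints, and the reading of Abstracts.Id as a reference to Dictionary.Id are assumptions, as are the helper function names. The slides do not say which database engine the project uses.

```python
import sqlite3


def create_schema(conn):
    # Column types and constraints are assumptions; only the names are given.
    conn.executescript("""
        CREATE TABLE Dictionary (
            Id         INTEGER PRIMARY KEY,
            short_form TEXT NOT NULL,
            long_form  TEXT NOT NULL,
            UNIQUE (short_form, long_form)
        );
        CREATE TABLE Abstracts (
            Id   INTEGER NOT NULL REFERENCES Dictionary(Id),
            Pmid INTEGER NOT NULL,
            PRIMARY KEY (Id, Pmid)
        );
    """)


def add_occurrence(conn, short_form, long_form, pmid):
    """Record that the pair was found in the abstract with the given PMID."""
    conn.execute(
        "INSERT OR IGNORE INTO Dictionary (short_form, long_form) VALUES (?, ?)",
        (short_form, long_form))
    (pair_id,) = conn.execute(
        "SELECT Id FROM Dictionary WHERE short_form = ? AND long_form = ?",
        (short_form, long_form)).fetchone()
    conn.execute("INSERT OR IGNORE INTO Abstracts (Id, Pmid) VALUES (?, ?)",
                 (pair_id, pmid))
    return pair_id


def long_forms(conn, short_form):
    """Long forms for an abbreviation, with the number of abstracts for each."""
    return conn.execute("""
        SELECT d.long_form, COUNT(a.Pmid)
        FROM Dictionary d LEFT JOIN Abstracts a ON a.Id = d.Id
        WHERE d.short_form = ?
        GROUP BY d.Id
        ORDER BY COUNT(a.Pmid) DESC, d.long_form
    """, (short_form,)).fetchall()
```

Keeping the PMIDs in a separate table lets one abbreviation/long form pair link to every abstract in which it was found, which is also what makes the labeled training data retrievable later.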

Future Work

Current Work
• Exploring statistics and details of the dataset such as the number of associated long forms for each abbreviation and their frequency of appearing in the abstracts.
• Building the web-interface for interacting with the database.

Future Work
• Decide on features for model training and write tools for extracting these features and training the classifier.
• Find ambiguous abbreviations in Medline abstracts and predict their long forms using the classifier. Add these entries to the database.
• Create a pipeline for automatically doing this as additional Medline abstracts are released.
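The dataset statistics mentioned under current work (long forms per abbreviation and how often each appears) could be computed along these lines. This is a hedged sketch over an in-memory list of (short form, long form, PMID) tuples; the function names and the input shape are assumptions, and at Medline scale the same counts would more likely be computed inside the database.

```python
from collections import defaultdict


def ambiguity_stats(occurrences):
    """Map each short form to [(long form, abstract count), ...].

    occurrences: iterable of (short_form, long_form, pmid) tuples.
    Long forms are sorted by descending abstract count, then alphabetically.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for short, long_form, pmid in occurrences:
        seen[short][long_form].add(pmid)  # a set avoids double-counting a PMID
    return {
        short: sorted(((lf, len(pmids)) for lf, pmids in by_long.items()),
                      key=lambda item: (-item[1], item[0]))
        for short, by_long in seen.items()
    }


def ambiguous_short_forms(stats):
    """Short forms associated with more than one long form."""
    return sorted(s for s, long_forms in stats.items() if len(long_forms) > 1)
```

Abbreviations returned by `ambiguous_short_forms` are the ones for which the disambiguation classifier would be needed.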

Questions?