Text Mining of Medical Documents
Michael Elhadad - Raphael Cohen
Dept. of Computer Science

Natural Language Processing
Analyze free text to extract “information”
Key challenges:
  Ambiguity: heart, ברק (Hebrew: "lightning", or the name Barak)
  Variability: diabetes, dm, diab.
Applications:
  Search
  Text Mining: information extraction, relations
  Summarization

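The variability challenge can be illustrated with a minimal sketch that maps surface variants to one canonical term. The lookup table `VARIANTS` and the function `normalize` are hypothetical names introduced here for illustration. They are not taken from the authors' work, and a real system would draw variants from a curated terminology rather than a hand-written dictionary.

```python
import re

# Hypothetical variant table (illustrative only): surface form -> canonical term.
VARIANTS = {
    "diabetes": "diabetes",
    "dm": "diabetes",
    "diab": "diabetes",
}

# Simple tokenizer assumption: a token is a run of ASCII letters.
TOKEN = re.compile(r"[a-z]+")


def normalize(text):
    """Lowercase the text, split it into letter tokens, and map known variants."""
    tokens = TOKEN.findall(text.lower())
    return [VARIANTS.get(tok, tok) for tok in tokens]


print(normalize("Pt with DM, diab. controlled"))
```

A plain lookup like this handles variability only. It does nothing for ambiguity, where the same string has several meanings and the correct one depends on context.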
NLP for Medical Domain
Opportunity
  Availability of online textual documents
  EHR: mostly textual (discharge notes)
  Scientific literature (PubMed)
Challenge
  Methods developed on “regular language” fail on “medical language”

Specific Interest
EHR
  Exploit rich textual data in EHR. In Hebrew!
Hebrew NLP
  Complex morphology, no dictionaries, no UMLS
Domain Adaptation
  Machine learning methods to port NLP models from one domain to the medical domain.

Recent Work in Domain
  Raphael Cohen, Michael Elhadad and Ohad S Birk, Analysis of free online physician advice services, PLOS ONE, 2013.
  Raphael Cohen, Noemie Elhadad and Michael Elhadad, Redundancy in Electronic Health Record Corpora: Analysis, Impact on Text Mining Performance and Mitigation Strategies, BMC Bioinformatics, 2013.
  Raphael Cohen and Michael Elhadad, Syntactic Dependency Parsers for Biomedical-NLP, AMIA Proceedings 2012, pp. 121-128.
  Raphael Cohen, Yoav Goldberg and Michael Elhadad, Domain Adaptation of a Dependency Parser with a Class-Class Selectional Preference Model, ACL 2012, SRW.
  Raphael Cohen, Avitan Gefen, Michael Elhadad and Ohad S Birk, CSI-OMIM - Clinical Synopsis Search in OMIM, BMC Bioinformatics 2011, 12:65.