Tiran Software
RadeX: TURKUAZ Project
Tahir Bilal, Onur Deniz, Soner Kara, M. Mert Karadağlı
Assistant: Umut Eroğul
Instructor: Meltem T. Yöndem
Outline
Problem Definition; Important Aspects; Our Approach; General Structure (Analyzer Component, Searcher Component); Current Status; Prototype; Tools & Resources; Q/A
Problem Definition
There are billions of radiology reports. Unfortunately, they are stored in free-text format, which makes them hard to search and retrieve. Searchable information is needed.
Important Aspects
Text mining, natural language processing (NLP), and machine learning. Information extraction: morphological analysis, named entity recognition. Machine learning: neural networks, decision trees, ...
Our Approach
RadeX (Radiology Data Extractor) will provide a modular machine learning component, support for connecting to internal and external dictionaries, and a template-based approach for finalizing the extracted information.
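Supporting both internal and external dictionaries suggests a common lookup interface that each kind of source implements. The Python sketch below is an illustration under assumptions only: the `Lexicon`, `InternalLexicon`, `ExternalLexicon` and `ChainedLexicon` names, the tag strings and the first-hit-wins ordering are hypothetical, and `ExternalLexicon` takes a plain callable where a real WordNet or TDK / Zargan client would go.

```python
from typing import Callable, Dict, List, Optional, Protocol


class Lexicon(Protocol):
    """Hypothetical common interface: map a word to a semantic tag, or None."""

    def lookup(self, word: str) -> Optional[str]:
        ...


class InternalLexicon:
    """Project-maintained lexicon held in memory."""

    def __init__(self, entries: Dict[str, str]) -> None:
        # Note: str.lower() does not apply Turkish dotted/dotless i rules.
        self._entries = {k.lower(): v for k, v in entries.items()}

    def lookup(self, word: str) -> Optional[str]:
        return self._entries.get(word.lower())


class ExternalLexicon:
    """Adapter for an external source; `fetch` stands in for a real client (assumption)."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch

    def lookup(self, word: str) -> Optional[str]:
        return self._fetch(word)


class ChainedLexicon:
    """Asks each source in order and returns the first tag found."""

    def __init__(self, sources: List[Lexicon]) -> None:
        self._sources = sources

    def lookup(self, word: str) -> Optional[str]:
        for source in self._sources:
            tag = source.lookup(word)
            if tag is not None:
                return tag
        return None


# Usage with toy data: the internal lexicon is consulted before the external one.
internal = InternalLexicon({"böbrek": "ORGAN"})
external = ExternalLexicon(lambda w: "FINDING" if w == "kist" else None)
lexicon = ChainedLexicon([internal, external])
print(lexicon.lookup("Böbrek"))  # ORGAN
print(lexicon.lookup("kist"))    # FINDING
```

Keeping the chain order explicit lets project-curated entries override whatever an external dictionary returns.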
General Structure
General Structure (cont.)
Analyzer Component: preprocesses free text, looks up internal and external lexicons, assigns semantics to words, and extracts searchable data.
Searcher Component: sends query strings to the database and retrieves the corresponding information.
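The division of work between the two components can be illustrated with a minimal sketch: the analyzer side stores extracted (organ, finding) pairs, and the searcher side sends a parameterized query string and returns the matching report ids. The `findings` table and its columns are hypothetical, and Python's built-in `sqlite3` (in memory) stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3
from typing import List, Tuple


def create_store() -> sqlite3.Connection:
    """In-memory database standing in for PostgreSQL (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE findings (report_id INTEGER, organ TEXT, finding TEXT)")
    return conn


def store_findings(conn: sqlite3.Connection, report_id: int,
                   pairs: List[Tuple[str, str]]) -> None:
    """Analyzer side: persist the (organ, finding) pairs extracted from one report."""
    conn.executemany(
        "INSERT INTO findings (report_id, organ, finding) VALUES (?, ?, ?)",
        [(report_id, organ, finding) for organ, finding in pairs],
    )
    conn.commit()


def search(conn: sqlite3.Connection, organ: str, finding: str) -> List[int]:
    """Searcher side: send a parameterized query and return matching report ids."""
    rows = conn.execute(
        "SELECT DISTINCT report_id FROM findings "
        "WHERE organ = ? AND finding = ? ORDER BY report_id",
        (organ, finding),
    ).fetchall()
    return [report_id for (report_id,) in rows]


conn = create_store()
store_findings(conn, 1, [("kidney", "cyst")])
store_findings(conn, 2, [("liver", "cyst"), ("kidney", "stone")])
store_findings(conn, 3, [("kidney", "cyst")])
print(search(conn, "kidney", "cyst"))  # [1, 3]
```

Passing values as query parameters rather than formatting them into the SQL string keeps user-supplied search terms from being interpreted as SQL.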
Current Status
Preprocessing. Connecting to and using external sources. Database implementation. Applying SVM to an unrelated but tagged corpus.
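Applying SVM-Light to a tagged corpus means writing each example in its sparse text format, one example per line: a target (`+1` or `-1` for classification) followed by `feature:value` pairs with feature ids in increasing order. This is a minimal sketch that assumes bag-of-words features with raw token counts as values; the corpus and its labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[List[str]]) -> Dict[str, int]:
    """Give each distinct token a feature id; SVM-Light feature ids start at 1."""
    vocab: Dict[str, int] = {}
    for tokens in docs:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def to_svmlight_line(label: int, tokens: List[str], vocab: Dict[str, int]) -> str:
    """Encode one example as '<target> <id>:<count> ...' with ids in increasing order."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    parts = [f"{label:+d}"] + [f"{fid}:{n}" for fid, n in sorted(counts.items())]
    return " ".join(parts)


# Toy tagged corpus (hypothetical labels: +1 = mentions a cyst, -1 = does not).
corpus = [(1, ["kist", "böbrek", "kist"]), (-1, ["taş", "böbrek"])]
vocab = build_vocabulary([tokens for _, tokens in corpus])
lines = [to_svmlight_line(label, tokens, vocab) for label, tokens in corpus]
print("\n".join(lines))
# +1 1:2 2:1
# -1 2:1 3:1
```

Written to a file, these lines could be passed to `svm_learn example_file model_file` to train a model.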
Current Status (cont.)
Mapping Turkish terms to their English translations. Finding the stems of unknown words. Constructing lexicons (features of verbs, adjectives, nouns, ...).
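Finding the stem of an unknown word and then mapping it to English can be sketched as naive suffix stripping against a small Turkish-to-English lexicon. This is not how Zemberek works: the suffix list, the minimum stem length and the lexicon entries are hypothetical, and real Turkish morphology (vowel harmony, consonant alternations such as böbrek → böbreği) is ignored.

```python
from typing import Dict, Optional

# Hypothetical, deliberately tiny suffix list: plural and a few case endings.
# Longer suffixes are tried first.
SUFFIXES = sorted(["lar", "ler", "da", "de", "ta", "te", "dan", "den", "tan", "ten"],
                  key=len, reverse=True)


def naive_stem(word: str, lexicon: Dict[str, str], min_stem: int = 3) -> str:
    """Strip suffixes from the right until the remainder is a known lexicon entry."""
    current = word.lower()
    while current not in lexicon:
        for suffix in SUFFIXES:
            if current.endswith(suffix) and len(current) - len(suffix) >= min_stem:
                current = current[: -len(suffix)]
                break
        else:
            break  # no suffix applies: return what is left
    return current


def translate(word: str, tr_to_en: Dict[str, str]) -> Optional[str]:
    """Map a Turkish surface form to English via its stem, or None if unknown."""
    return tr_to_en.get(naive_stem(word, tr_to_en))


TR_TO_EN = {"böbrek": "kidney", "kist": "cyst"}  # hypothetical lexicon entries
print(translate("böbreklerde", TR_TO_EN))  # kidney
print(translate("kistte", TR_TO_EN))       # cyst
print(translate("karaciğer", TR_TO_EN))    # None
```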
Prototype
In the prototype, we will be able to decompose reports into sub-parts, sentences, and words; analyze words using Zemberek and a stemmer; assign semantics to words via internal and external lexicons; and extract simple information using predefined templates.
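The prototype steps can be strung together in a short sketch: split a report into sub-parts by header, split each part into sentences and words, then fill one predefined template. Everything here is an assumption for illustration: the `BULGULAR:` / `SONUÇ:` (findings / conclusion) header convention, the regular-expression tokenizer standing in for Zemberek and the stemmer, and the single size template that pairs a finding term with a measurement in mm or cm.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical report layout: sub-parts start with an upper-case header such as "BULGULAR:".
SECTION_RE = re.compile(r"^([A-ZÇĞİÖŞÜ]+):", re.MULTILINE)
SENTENCE_RE = re.compile(r"(?<=[.!?])\s+")
WORD_RE = re.compile(r"\w+")
# Predefined template (hypothetical): a number followed by a unit, e.g. "12 mm".
SIZE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b")


def split_sections(report: str) -> Dict[str, str]:
    """Decompose a report into sub-parts keyed by header."""
    matches = list(SECTION_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1)] = report[m.end():end].strip()
    return sections


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-final punctuation."""
    return [s for s in SENTENCE_RE.split(text) if s]


def tokenize(sentence: str) -> List[str]:
    """Plain word tokens; Zemberek-based analysis would replace this step."""
    return WORD_RE.findall(sentence.lower())


def apply_size_template(sentence: str, finding_terms: Set[str]) -> Optional[Dict[str, str]]:
    """Fill the template only if the sentence has both a finding term and a size."""
    findings = [w for w in tokenize(sentence) if w in finding_terms]
    size = SIZE_RE.search(sentence)
    if not findings or size is None:
        return None
    return {"finding": findings[0], "size": size.group(1), "unit": size.group(2)}


def extract(report: str, finding_terms: Set[str]) -> List[Dict[str, str]]:
    """Run decomposition and template filling over a whole report."""
    results = []
    for section, body in split_sections(report).items():
        for sentence in split_sentences(body):
            filled = apply_size_template(sentence, finding_terms)
            if filled is not None:
                filled["section"] = section
                results.append(filled)
    return results


report = "BULGULAR: Sağ böbrekte 12 mm kist izlendi. Karaciğer normaldir.\nSONUÇ: Böbrek kisti."
print(extract(report, {"kist"}))
# [{'finding': 'kist', 'size': '12', 'unit': 'mm', 'section': 'BULGULAR'}]
```

Note that the inflected form "kisti" in the conclusion is not matched; connecting the tokenizer to a stemmer is what would close that gap.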
Tools & Resources
SVM-Light, WordNet, JWNL, TDK / Zargan, Zemberek, PostgreSQL
Any Questions?