
Medical Data Classifier: Undergraduate Project
By: Avikam Agur and Maayan Zehavi
Advisors: Prof. Michael Elhadad and Mr. Tal Baumel



2 Motivation
word2vec: an algorithm that learns vector representations of words, placing closely related words near each other. Combined with the output of our project, this algorithm will help create a medical text summarizer.

3 Project Goals
● Create a fast, scalable, highly accurate, machine-learning-based classifier that predicts whether a given document is medical or not.
● Run this classifier in a distributed fashion over a large amount of web content and extract the medical documents.

4 Building a Labeled Dataset
Problem: Manually collecting medical and non-medical documents is almost impossible.
Solution: Using Wikipedia's archive files, we tagged wiki pages based on their categories and titles.
Result: A decent amount of labeled medical and non-medical data.
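The slide does not list the categories or title patterns that were used. A minimal sketch of such a tagging rule, assuming each page has already been parsed out of the Wikipedia dump into a title and a list of category names; the keyword set `MEDICAL_KEYWORDS` and the function `tag_page` are hypothetical illustrations, not the project's actual rules.

```python
from typing import Iterable

# Hypothetical keywords; the project's real category/title rules are not given in the slides.
MEDICAL_KEYWORDS = {"medicine", "medical", "disease", "anatomy", "pharmacology", "surgery"}


def tag_page(title: str, categories: Iterable[str]) -> bool:
    """Return True if the page title or any category contains a medical keyword."""
    fields = [title] + list(categories)
    # Substring matching also catches simple plurals such as "diseases".
    return any(keyword in field.lower() for field in fields for keyword in MEDICAL_KEYWORDS)


pages = [
    ("Myocardial infarction", ["Cardiovascular diseases", "Medical emergencies"]),
    ("Football", ["Ball games", "Team sports"]),
]
labeled = [(title, tag_page(title, cats)) for title, cats in pages]
print(labeled)  # [('Myocardial infarction', True), ('Football', False)]
```

Keyword rules like this are cheap but noisy; pages matching neither list could also be left out rather than labeled non-medical.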

5 Training Phase
Data transformation flow: Documents → Boilerpipe → Tokenizing → TF-IDF → Feature selection
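The slide gives only the order of the stages. A minimal stdlib sketch of the Tokenizing → TF-IDF → Feature selection stages, assuming the text has already been extracted from HTML (Boilerpipe, a Java library, handles that step and is not reproduced here). The TF-IDF variant (raw count × log IDF) and the selection criterion (highest total TF-IDF weight) are assumptions; the slides do not specify them, and a supervised criterion such as chi-squared is a common alternative for a classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters; no stemming in this sketch.
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document, plus the IDF table."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


def select_features(vectors, k):
    """Keep the k terms with the highest total TF-IDF weight across the corpus."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds each document's weights to the running totals
    return [term for term, _ in totals.most_common(k)]


docs = [
    "The patient received insulin for diabetes.",
    "The team won the football match.",
    "Diabetes patients need insulin.",
]
tokens = [tokenize(d) for d in docs]
vectors, idf = tf_idf(tokens)
features = select_features(vectors, 3)
```

On this toy corpus "the" ends up as the top feature, which is why a real pipeline would normally drop stop words before weighting.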

6 Training Phase

7 Classifier Evaluation: Measures
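The measures listed on this slide are not preserved in the transcript. Since the results report an average F-measure, a minimal sketch of the standard precision, recall and F1 computation for one class; treating "medical" (True) as the positive class and macro-averaging F1 over both classes are assumptions about what "average" means here.

```python
def precision_recall_f1(y_true, y_pred, positive=True):
    """Precision, recall and F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_f1(y_true, y_pred):
    # Macro average: mean F1 over the medical (True) and non-medical (False) classes.
    f_medical = precision_recall_f1(y_true, y_pred, positive=True)[2]
    f_other = precision_recall_f1(y_true, y_pred, positive=False)[2]
    return (f_medical + f_other) / 2


y_true = [True, True, True, False, False]
y_pred = [True, True, False, True, False]
# tp=2, fp=1, fn=1, so precision = recall = F1 = 2/3 (up to float rounding)
print(precision_recall_f1(y_true, y_pred))
```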

8 Evaluation Phase
Configuration parameters:
● Classification algorithm
● Number of features
● Stemming or no stemming
Each configuration was trained and then tested on a randomly held-out 5% of the tagged dataset.
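The evaluation enumerates every combination of the three parameters and tests each on a random 5% of the data. A minimal sketch of that loop; the algorithm names and feature counts are hypothetical placeholders (the slides name the parameters but not the values tried), and the actual train-and-score step is left out, so each run only records its configuration and split sizes.

```python
import itertools
import random

# Hypothetical parameter values; not taken from the project.
ALGORITHMS = ["naive_bayes", "logistic_regression", "linear_svm"]
FEATURE_COUNTS = [1000, 5000, 10000]
STEMMING = [True, False]


def holdout_split(dataset, test_fraction=0.05, seed=0):
    """Shuffle a copy of the dataset and return (train, test) with test_fraction held out."""
    items = list(dataset)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


configurations = list(itertools.product(ALGORITHMS, FEATURE_COUNTS, STEMMING))

# Toy labeled data: (document id, is_medical).
dataset = [(f"doc{i}", i % 2 == 0) for i in range(200)]

runs = []
for run_id, (algorithm, n_features, stem) in enumerate(configurations):
    train, test = holdout_split(dataset, seed=run_id)  # fresh random 5% per configuration
    runs.append({"algorithm": algorithm, "n_features": n_features, "stemming": stem,
                 "n_train": len(train), "n_test": len(test)})
```

With a single 5% split per configuration the scores can vary noticeably between runs; averaging over several random splits is one way to smooth that out.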

9 Results
Graph: average F-measure by feature count

10 Distributed Programming Phase
● Use the Apache Spark framework.
● Iterate over the ClueWeb web archives (~14 TB) in a master-slave architecture.
● Use the same training pipeline to convert each web document to a vector.
● Tag each document and export the IDs of medical-tagged documents.
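The slides do not show the Spark code. As a framework-free illustration of the same master/worker pattern, the sketch below uses Python's `concurrent.futures`: the master splits the documents into partitions, workers tag each document, and the master merges the medical IDs. `classify` is a hypothetical keyword stand-in for vectorization plus the trained model, and the document IDs are made up; in PySpark the tagging step would roughly be an `rdd.filter(...)` followed by `rdd.map(...)` over (id, text) pairs.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(text):
    # Hypothetical stand-in for "vectorize with the training pipeline, then predict".
    lowered = text.lower()
    return "medical" in lowered or "patient" in lowered


def tag_partition(partition):
    """Worker task: tag each (doc_id, text) pair and keep the IDs tagged medical."""
    return [doc_id for doc_id, text in partition if classify(text)]


def find_medical_ids(documents, n_workers=4):
    """Master: split into round-robin partitions, fan out to workers, merge the results."""
    partitions = [documents[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        ids = [doc_id for part in pool.map(tag_partition, partitions) for doc_id in part]
    return sorted(ids)


docs = [
    ("doc-001", "Patient care after surgery"),
    ("doc-002", "Football scores"),
    ("doc-003", "Medical imaging basics"),
]
print(find_medical_ids(docs))  # ['doc-001', 'doc-003']
```

Threads keep the sketch simple; at ClueWeb scale the partitions would live on separate machines, which is what Spark's cluster manager provides.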

11 Questions?

