Medical Data Classifier undergraduate project By: Avikam Agur and Maayan Zehavi Advisors: Prof. Michael Elhadad and Mr. Tal Baumel.

Motivation
word2vec: An algorithm that associates closely related words. Combined with the outcome of our project, this algorithm will help create a medical text summarizer.

Project Goals
● Create a fast, scalable, highly accurate, machine-learning-based classifier that predicts whether a given document is medical or not.
● Run this classifier in a distributed fashion over a large amount of web content and extract the medical documents.

Building a labeled dataset
Problem: Manually collecting medical and non-medical documents is almost impossible.
Solution: Using Wikipedia's archive files, we tagged wiki pages based on their category and title.
Result: A decent amount of medical and non-medical data.
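The tagging step could be sketched roughly as follows. This is only an illustration: the slides do not say which categories or title patterns were used, so the `MEDICAL_HINTS` keywords and the `label_page` function are hypothetical.

```python
# Hypothetical keyword list; the slides do not state the actual tagging rules.
MEDICAL_HINTS = ("disease", "medicine", "medical", "syndrome", "anatomy")


def label_page(title, categories):
    """Label a wiki page from its title and category names.

    Returns "medical" if any hint keyword occurs in the title or in one
    of the categories (case-insensitive), otherwise "non-medical".
    """
    text = " ".join([title] + list(categories)).lower()
    if any(hint in text for hint in MEDICAL_HINTS):
        return "medical"
    return "non-medical"


flu = label_page("Influenza", ["Infectious diseases"])
tower = label_page("Eiffel Tower", ["Towers in Paris"])
```

A substring rule like this is cheap enough to run over a full Wikipedia dump, at the cost of some label noise.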

Training Phase
Data transformation flow: Documents → Boilerpipe → Tokenizing → TF-IDF → Feature selection
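A minimal pure-Python sketch of the tokenizing, TF-IDF, and feature-selection stages is given below. It assumes the boilerplate-removal step has already produced plain text. The function names are illustrative, and the selection criterion (keep the terms with the largest total TF-IDF weight) is an assumption, since the slides do not name the method that was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero on empty documents
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def select_features(vectors, k):
    """Keep the k terms with the largest summed tf-idf weight (assumed criterion)."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds the weights term by term
    return [term for term, _ in totals.most_common(k)]


docs = ["the patient has fever", "the stock market fell"]
vectors = tfidf_vectors(docs)
features = select_features(vectors, 3)
```

Terms that occur in every document, such as "the" here, get an IDF of zero and therefore drop out of the selected features.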

Training Phase

Classifier Evaluation - Measures
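The content of this slide did not survive in the transcript. Since the results are reported as an average F-measure, the measures were presumably precision, recall, and F-measure; the sketch below computes them under that assumption, with a hypothetical function name and label values.

```python
def precision_recall_f1(y_true, y_pred, positive="medical"):
    """Compute precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


y_true = ["medical", "medical", "other", "other"]
y_pred = ["medical", "other", "medical", "other"]
scores = precision_recall_f1(y_true, y_pred)
```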

Evaluation Phase
Configuration parameters:
● Classification algorithm
● Number of features
● Stemming or not
Each configuration was trained and then tested on a random 5% of the tagged dataset.
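One way to set up such an experiment is sketched below: a random 5% hold-out split plus an enumeration of every parameter combination. The concrete algorithm names and feature counts are placeholders, not the values used in the project.

```python
import itertools
import random


def holdout_split(items, test_fraction=0.05, seed=0):
    """Shuffle the items and hold out a random fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder values; the slides list the parameters but not their settings.
ALGORITHMS = ("naive_bayes", "svm")
FEATURE_COUNTS = (1000, 5000)
STEMMING = (True, False)

# Every (algorithm, feature count, stemming) combination is one configuration.
configurations = list(itertools.product(ALGORITHMS, FEATURE_COUNTS, STEMMING))

train, test = holdout_split(range(200))
```

Fixing the seed keeps the split identical across configurations, so their scores are comparable.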

Results - Graph
[Figure: Average F-Measure plotted against Features Count]

Distributed Programming Phase
● Use the Apache Spark framework.
● Iterate over the ClueWeb web archives (~14 TB) in a master-slave architecture.
● Use the same training pipeline to convert each web document to a vector.
● Tag each document and export the IDs of the documents tagged as medical.
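The per-worker logic of the tagging step might look like the sketch below. It is written as a plain Python generator so it can be read without a cluster. The names `medical_ids`, `to_vector`, and `is_medical` are hypothetical stand-ins for the project's pipeline and trained classifier, which the slides do not show.

```python
def medical_ids(partition, to_vector, is_medical):
    """Yield the IDs of documents in one partition that are tagged medical.

    partition  -- iterable of (doc_id, text) pairs
    to_vector  -- function turning text into a feature vector
    is_medical -- function returning True for a medical feature vector
    """
    for doc_id, text in partition:
        if is_medical(to_vector(text)):
            yield doc_id


# Toy stand-ins for the real pipeline and classifier (assumptions).
def toy_vector(text):
    return set(text.lower().split())


def toy_classifier(vector):
    return "patient" in vector


docs = [("d1", "The patient was treated"), ("d2", "Stock prices fell")]
tagged = list(medical_ids(docs, toy_vector, toy_classifier))
```

With PySpark, a function of this shape could be applied to each partition of an RDD of (ID, text) pairs through `RDD.mapPartitions`, so that only document IDs travel back from the workers.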

Questions?
