Deema Abdal Hafeth, MSc student by research, School of Computer Science, University of Lincoln. Supervisors: Dr Amr Ahmed and Dr David Cobham.



Introduction
Clinical reports include valuable medical information in free-form text, which can be extremely useful in providing better patient care. Text analysis techniques have demonstrated the potential to unlock such information from text. I2B2* designed a smoking challenge requiring the automatic classification of patients by smoking status, based on clinical reports. The motivation was that such classification, and similar extractions, can be useful in further studies and research, e.g. asthma studies.
*Informatics for Integrating Biology and the Bedside

Aim and Objectives
Investigate the potential of text analysis techniques for predicting smoking status from user-generated content such as forums, by analogy with the I2B2 challenge on clinical reports.
Investigate appropriate compact feature sets that facilitate further levels of study (e.g. psycholinguistics, as explained later), with the hypothesis that forum posts have different linguistic features and are rich in personal stories, fresh opinions, and thoughts.

Methodology
Data was collected from web forums, systematically and with set criteria.
Some properties of the text were extracted and compared for the forum data and the clinical reports.
A machine learning classifier model (Support Vector Machine) was built from the collected data, using a baseline feature set (as per the I2B2 challenge), for each data set (clinical and forum).
Another model was built using a new feature set, LIWC (Linguistic Inquiry and Word Count) + POS (Part of Speech), for each data set (clinical and forum).
Smoking status classification accuracy was calculated for each of the above models on each data set.
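The final methodology step, one accuracy figure per model and data set, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the `(dataset, feature_set)` keys, and the label strings in the usage note are hypothetical, and the slides do not say how the data was split for testing.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of posts/reports whose predicted smoking status matches the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def accuracy_table(results):
    """Map each (dataset, feature_set) key to its accuracy.

    `results` maps a (dataset, feature_set) tuple to a
    (true_labels, predicted_labels) pair. The keys are hypothetical names
    for the four model/data combinations described in the methodology.
    """
    return {key: accuracy(true, pred) for key, (true, pred) in results.items()}
```

For example, `accuracy_table({("forum", "LIWC+POS"): (["smoker", "non-smoker"], ["smoker", "smoker"])})` would report 0.5 for that combination; the labels here are placeholders, not the challenge's actual class names.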

Pronoun categories
Self: I, I'D, I'LL, I'M, I'VE, LET'S, LETS, ME, MINE, MY, MYSELF, OUR, OURS, OURSELVES, US, WE, WE'D, WE'LL, WE'RE, WE'VE
I: I, I'D, I'LL, I'M, I'VE, ME, MINE, MY, MYSELF
Other: HE, HE'D, HE'LL, HE'S, HER, HERS, HERSELF, HIM, HIMSELF, HIS, SHE, SHE'D, SHE'LL, SHE'S, THEIR, THEM, THEMSELVES, THEY, THEY'D, THEY'LL, THEY'RE, THEY'VE
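As a rough illustration of how word categories like these become features, the sketch below counts, for each category, the share of a post's tokens that belong to it. It uses only the three pronoun categories listed on this slide; the tokenizer, the function names, and the choice to express each value as a percentage of tokens are assumptions made for this example, and the full LIWC dictionary and the POS features are not reproduced here.

```python
import re

# Pronoun categories transcribed from the slide (lowercased, straight apostrophes).
CATEGORIES = {
    "self": {"i", "i'd", "i'll", "i'm", "i've", "let's", "lets", "me", "mine",
             "my", "myself", "our", "ours", "ourselves", "us", "we", "we'd",
             "we'll", "we're", "we've"},
    "i": {"i", "i'd", "i'll", "i'm", "i've", "me", "mine", "my", "myself"},
    "other": {"he", "he'd", "he'll", "he's", "her", "hers", "herself", "him",
              "himself", "his", "she", "she'd", "she'll", "she's", "their",
              "them", "themselves", "they", "they'd", "they'll", "they're",
              "they've"},
}

# Simple tokenizer (an assumption): words with an optional apostrophe suffix.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text, normalise curly apostrophes, and split into word tokens."""
    return TOKEN_RE.findall(text.lower().replace("\u2019", "'"))


def category_features(text):
    """Return one value per category: the percentage of tokens in that category.

    The vector length equals the number of categories, so it does not
    depend on the vocabulary of the data set being processed.
    """
    tokens = tokenize(text)
    total = len(tokens)
    features = []
    for words in CATEGORIES.values():  # dicts keep insertion order
        hits = sum(1 for token in tokens if token in words)
        features.append(100.0 * hits / total if total else 0.0)
    return features
```

Every post, whatever its length or wording, maps to a vector with one entry per category, which is what makes a category-based feature set fixed in length, in contrast to a bag-of-words vocabulary that grows with the data.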

Results & Evaluation
In general, the classification accuracy on forum posts was found to be in line with the baseline results on clinical records (figure 1). Using the LIWC+POS features (125 features) did not improve the accuracy compared to the baseline features (>20K features). However, the LIWC+POS feature set is compact, fixed-length, and independent of the dataset, and it facilitates further levels of study (psycholinguistics).

Results & Evaluation
The forum classification accuracy with LIWC+POS improved with:
o long posts
o large data set size
o removing some of the features

Conclusion
User-generated content, such as forums, can be as useful as clinical reports.
The proposed LIWC+POS feature set, while achieving comparable results, is highly compact and facilitates further levels of study (e.g. psycholinguistics).
We expect our work to be useful not only in medical studies but also in statistical and linguistic studies, access to patients' real-time information, and the health business (industry)/advertising.
Future work
Improve the classification accuracy with LIWC+POS, and use this feature set as a tool to explore further psychological status and studies.
Build a visualisation tool for smokers, people in-journey to stop smoking, and past smokers, to study the process and the various factors affecting it, including timings and periods.
Similarly, the tool could be used to identify specific audiences (e.g. smokers, in-journey) in forums, to target for specific products or studies.

Thank you