Identifying Drug Related Events from Social Media

Presentation transcript:

Identifying Drug Related Events from Social Media
Jeongho Noh, Jisu You, Yoonju Lee, Woo Jin Kye, Sungho Kim
CS4624 Multimedia, Hypertext, and Information Access
Professor: Edward A. Fox
Client: Weiguo Fan, Long Xia
May 2, 2017
Virginia Tech, Blacksburg, VA 24061
Hi, we are the Identifying Drug Related Events from Social Media team. My name is

Innovative information system and processing steps
(Crawl social network reviews on drugs that are used to treat diabetes - done by client)
1. Label the crawled data manually.
2. Generate a side effect dictionary to recognize side effect entities.
3. Visualize the resulting information for doctors and patients.
4. Create a confusion matrix to validate the result.
I will introduce the processing steps for our innovative information system. First, we received from our client crawled social network reviews of drugs that are used to treat diabetes. We labeled the data manually, and then we generated a side effect dictionary to recognize side effect entities. After that, we created a pie chart and a confusion matrix to visualize and check the result.

Data Labeling
Manual labeling is necessary to build a problem-specific dictionary. We labeled about 235,000 words for named entity recognition. The table on the right shows a sequence of words from reviews retrieved using the crawler.

Data Labeling - Named Entity Recognition
Four different labels for different entities:
D – drug entity
S – side effect entity
M – miscellaneous medical terms that are not a drug entity or a side effect entity
O – others
This labeling process is very important. As you can see from the table on the left, the words "blood" and "sugar" are labeled M (medical term) instead of O (others): they appear in sequence, and "blood sugar" is a medical term in this problem-specific dictionary for diabetic drug reviews. Also, in the tables on the right, the words "barely", "feel", "my", and "toes" are labeled as side effect entities because swelling is mentioned in the same review.
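A minimal sketch of how such word-level labels could be stored and counted. The token list is illustrative: only the labels for "blood", "sugar", "barely", "feel", "my", and "toes" come from the slide, and the function names are hypothetical, not the team's code.

```python
from collections import Counter

# The four label codes used on the slide.
LABELS = {"D": "drug", "S": "side effect", "M": "misc. medical", "O": "other"}

# Illustrative (token, label) pairs; the surrounding words are made up.
labeled_tokens = [
    ("my", "O"), ("blood", "M"), ("sugar", "M"), ("dropped", "O"),
    ("barely", "S"), ("feel", "S"), ("my", "S"), ("toes", "S"),
]

def label_counts(tokens):
    """Count how many tokens carry each label."""
    return Counter(label for _, label in tokens)

def unique_entities(tokens, label):
    """Return the set of distinct (lowercased) words with the given label."""
    return {word.lower() for word, lab in tokens if lab == label}

counts = label_counts(labeled_tokens)
medical_terms = unique_entities(labeled_tokens, "M")
```

Note that the same word ("my" here) can receive different labels depending on its context in the review, which is why the labeling is done per occurrence rather than per word.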

Data Labeling - Named Entity Recognition Cont.
This pie chart shows the number of entities after the labeling process. There are a total of 2242 unique side effect entities and 412 unique drug entities out of 228881 named entities. This problem-specific dictionary was used to create the smoke list, which Woo Jin is going to talk about next.

2. Generating Side Effect Dictionary - Smoke List
From the manually labeled list of words, we created a side effect dictionary. The first step was to create a smoke list that contains a prevalence score for each word. The score indicates how likely each word is to be associated with a side effect.
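The slides do not give the prevalence formula, so the following is only a sketch under one assumption: a word's score is the fraction of its occurrences that were labeled S. The function name and the sample data are hypothetical.

```python
from collections import defaultdict

def smoke_list(labeled_tokens, target="S"):
    """Assumed score: share of a word's occurrences that carry the target label."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for token, label in labeled_tokens:
        word = token.lower()
        total[word] += 1
        if label == target:
            hits[word] += 1
    return {word: hits.get(word, 0) / total[word] for word in total}

# Made-up labeled tokens for illustration only.
sample = [
    ("my", "S"), ("my", "O"), ("my", "O"), ("my", "O"),
    ("swelling", "S"), ("swelling", "S"),
]
scores = smoke_list(sample)
```

Under this assumption a word such as "swelling" that is almost always labeled S scores near 1, while a word such as "my" that is only occasionally part of a side effect phrase scores low.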

2. Generating Side Effect Dictionary - Filtering
From the side effect entities, we filtered out neutral words like ‘my’.
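One way such filtering could work is sketched below. The neutral word list and the score threshold are assumptions for illustration; the slides only say that neutral words like 'my' were removed.

```python
# Hypothetical list of neutral words; only "my" is named on the slide.
NEUTRAL_WORDS = {"my", "i", "the", "a", "and"}

def filter_dictionary(scores, neutral_words=NEUTRAL_WORDS, min_score=0.5):
    """Keep words that are not neutral and whose score reaches the threshold."""
    return {
        word
        for word, score in scores.items()
        if word not in neutral_words and score >= min_score
    }

# Made-up scores for illustration only.
example_scores = {"my": 0.25, "swelling": 1.0, "nausea": 0.9, "the": 0.6}
dictionary = filter_dictionary(example_scores)
```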

2. Generating Side Effect Dictionary - Result
The resulting dictionary contains a total of 2076 unique words.

3. Visualization
Using the dictionary, we created this pie chart, which shows the top 20 symptoms of the drugs that treat diabetes. (Values are percentages.)
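The percentages behind a chart like this could be computed as sketched below. This assumes each share is taken relative to the top-n total (so the slices sum to 100), which the slides do not state; the tokenizer, function name, and sample reviews are hypothetical.

```python
import re
from collections import Counter

def top_symptom_shares(reviews, dictionary, n=20):
    """Return (word, percentage) pairs for the n most frequent dictionary words."""
    counts = Counter()
    for review in reviews:
        for word in re.findall(r"[a-z']+", review.lower()):
            if word in dictionary:
                counts[word] += 1
    top = counts.most_common(n)
    total = sum(count for _, count in top)
    if total == 0:
        return []
    return [(word, 100.0 * count / total) for word, count in top]

# Made-up reviews and dictionary for illustration only.
sample_reviews = ["Nausea and swelling", "nausea again", "no issues"]
shares = top_symptom_shares(sample_reviews, {"nausea", "swelling"})
```

The resulting list can be passed to any plotting library to draw the pie chart.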

4. Validation
A confusion matrix summarizes the result. We selected 200 reviews from the list of 5585 reviews labeled by PamTAT: 100 from the top and 100 from the bottom of the list. Hypothesis: all the reviews from the top of the list will contain a mention of side effects, and the reviews from the bottom of the list will not.
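The check described above can be sketched as a standard two-class confusion matrix, treating "from the top of the list" as a positive prediction and a manual reading of each review as the ground truth. The data below is a tiny made-up example, not the team's results, and the function names are hypothetical.

```python
def confusion_matrix(predicted, actual):
    """Count TP/FP/FN/TN for paired boolean predictions and ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1
        elif p and not a:
            counts["FP"] += 1
        elif a:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts

def accuracy(counts):
    """Fraction of reviews where the prediction matched the ground truth."""
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total if total else 0.0

# Made-up example: True = review mentions a side effect.
predicted = [True, True, False, False]   # top of list -> True, bottom -> False
actual = [True, False, False, True]      # from manually reading each review
matrix = confusion_matrix(predicted, actual)
```

If the hypothesis held perfectly, FP and FN would both be zero.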

Demo https://www.youtube.com/watch?v=QYgBhBQtGsQ