Mining Officially Unrecognized Side effects of drugs by combining Web Search and Machine learning Carlo Carino, Yuanyuan Jia, Bruce Lambert, Patricia West.

Slides:



Advertisements
Similar presentations
CWS: A Comparative Web Search System Jian-Tao Sun, Xuanhui Wang, § Dou Shen Hua-Jun Zeng, Zheng Chen Microsoft Research Asia University of Illinois at.
Advertisements

PEBL: Web Page Classification without Negative Examples Hwanjo Yu, Jiawei Han, Kevin Chen- Chuan Chang IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING,
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.
Searchable Web sites Recommendation Date : 2012/2/20 Source : WSDM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh Jia-ling 1.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
A Quality Focused Crawler for Health Information Tim Tang.
Transportation mode detection using mobile phones and GIS information Leon Stenneth, Ouri Wolfson, Philip Yu, Bo Xu 1University of Illinois, Chicago.
Distributed Search over the Hidden Web Hierarchical Database Sampling and Selection Panagiotis G. Ipeirotis Luis Gravano Computer Science Department Columbia.
Presented by Li-Tal Mashiach Learning to Rank: A Machine Learning Approach to Static Ranking Algorithms for Large Data Sets Student Symposium.
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
Automatic Discovery and Classification of search interface to the Hidden Web Dean Lee and Richard Sia Dec 2 nd 2003.
Statistical Relational Learning for Link Prediction Alexandrin Popescul and Lyle H. Unger Presented by Ron Bjarnason 11 November 2003.
Recall: Query Reformulation Approaches 1. Relevance feedback based vector model (Rocchio …) probabilistic model (Robertson & Sparck Jones, Croft…) 2. Cluster.
Text Classification: An Implementation Project Prerak Sanghvi Computer Science and Engineering Department State University of New York at Buffalo.
Presented by Zeehasham Rasheed
J. Chen, O. R. Zaiane and R. Goebel An Unsupervised Approach to Cluster Web Search Results based on Word Sense Communities.
Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web Rayid Ghani Accenture Technology Labs, USA Rosie Jones.
Mining the Medical Literature Chirag Bhatt October 14 th, 2004.
SIEVE—Search Images Effectively through Visual Elimination Ying Liu, Dengsheng Zhang and Guojun Lu Gippsland School of Info Tech,
Improving web image search results using query-relative classifiers Josip Krapacy Moray Allanyy Jakob Verbeeky Fr´ed´eric Jurieyy.
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification on Reviews Peter D. Turney Institute for Information Technology National.
Learning with Positive and Unlabeled Examples using Weighted Logistic Regression Wee Sun Lee National University of Singapore Bing Liu University of Illinois,
Processing of large document collections Part 2 (Text categorization) Helena Ahonen-Myka Spring 2006.
Distributed Networks & Systems Lab. Introduction Collaborative filtering Characteristics and challenges Memory-based CF Model-based CF Hybrid CF Recent.
Automatically Identifying Localizable Queries Center for E-Business Technology Seoul National University Seoul, Korea Nam, Kwang-hyun Intelligent Database.
Basic Web Applications 2. Search Engine Why we need search ensigns? Why we need search ensigns? –because there are hundreds of millions of pages available.
Searching the Web Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
CHEE DRUG PRODUCT DEVELOPMENT u Drug ä agent intended for use in the diagnosis, mitigation, treatment, cure, or prevention of disease in man or animals.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
1 Mining User Behavior Mining User Behavior Eugene Agichtein Mathematics & Computer Science Emory University.
CZ3253: Computer Aided Drug design Lecture 3: Drug and Cheminformatics Databases Prof. Chen Yu Zong Tel:
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Chien-Shing Chen Author : Satoshi Oyama Takashi Kokubo Toru lshida 國立雲林科技大學 National Yunlin.
CROSSMARC Web Pages Collection: Crawling and Spidering Components Vangelis Karkaletsis Institute of Informatics & Telecommunications NCSR “Demokritos”
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
25/03/2003CSCI 6405 Zheyuan Yu1 Finding Unexpected Information Taken from the paper : “Discovering Unexpected Information from your Competitor’s Web Sites”
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
Word Sense Disambiguation in Queries Shaung Liu, Clement Yu, Weiyi Meng.
Special topics on text mining [ Part I: text classification ] Hugo Jair Escalante, Aurelio Lopez, Manuel Montes and Luis Villaseñor.
Presenter: Lung-Hao Lee ( 李龍豪 ) January 7, 309.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Improving Cloaking Detection Using Search Query Popularity and Monetizability Kumar Chellapilla and David M Chickering Live Labs, Microsoft.
Detecting Dominant Locations from Search Queries Lee Wang, Chuang Wang, Xing Xie, Josh Forman, Yansheng Lu, Wei-Ying Ma, Ying Li SIGIR 2005.
Wikipedia as Sense Inventory to Improve Diversity in Web Search Results Celina SantamariaJulio GonzaloJavier Artiles nlp.uned.es UNED,c/Juan del Rosal,
1 Opinion Retrieval from Blogs Wei Zhang, Clement Yu, and Weiyi Meng (2007 CIKM)
Understanding User’s Query Intent with Wikipedia G 여 승 후.
CIKM Opinion Retrieval from Blogs Wei Zhang 1 Clement Yu 1 Weiyi Meng 2 1 Department of.
Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.
LOGO 1 Mining Templates from Search Result Records of Search Engines Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hongkun Zhao, Weiyi.
1 A Web Search Engine-Based Approach to Measure Semantic Similarity between Words Presenter: Guan-Yu Chen IEEE Trans. on Knowledge & Data Engineering,
Medical Information Retrieval: eEvidence System By Zhao Jin Mar
GRID ENABLED SYSTEM FOR MEDICAL IMAGE GATHERING, ANALYZING, RETRIEVAL AND PROCESSING Gorgi Kakasevski†, Aneta Buckovska*, Suzana Loskovska*, Ivica Dimitrovski*
Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.
Learning to Rank From Pairwise Approach to Listwise Approach.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Cs Future Direction : Collaborative Filtering Motivating Observations:  Relevance Feedback is useful, but expensive a)Humans don’t often have time.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Computational Linguistics Courses Experiment Test.
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
UIC at TREC 2006: Blog Track Wei Zhang Clement Yu Department of Computer Science University of Illinois at Chicago.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
PEBL: Web Page Classification without Negative Examples
Data Mining Chapter 6 Search Engines
Panagiotis G. Ipeirotis Luis Gravano
A Neural Passage Model for Ad-hoc Document Retrieval
Presentation transcript:

Mining Officially Unrecognized Side effects of drugs by combining Web Search and Machine learning Carlo Carino, Yuanyuan Jia, Bruce Lambert, Patricia West and Clement Yu University of Illinois at Chicago

Motivation Drugs have side effects Food and Drug Administration (FDA) requires drug companies to do extensive clinical trials before a drug enters the market place Not all side effects of a given drug are officially recognized by the FDA Recent case: Vioxx

Objective Find unrecognized side effects of drugs Approach Submit a query ( drug name, side effect) to a Web search engine Extract from the retrieved pages all side- effects which are found in those pages

Problems associated with this approach (1)May not retrieve enough relevant documents (2)May retrieve a lot of irrelevant documents (3)May not want to have entire documents

Want to have more relevant documents Modify the query to become Active ingredients = the chemical compounds forming the drug

Want to reduce the number of irrelevant documents Web retrieval | Classification ( machine learning algorithm)

Machine Learning algorithm Neural network Many features: words such as actual side-effects; “side-effects”. “adverse effects”; “safe”, “not” etc Hundreds of such features

Training phase, then test phase Train on 7 drugs: identify the relevant and irrelevant pages retrieved for these 7 drugs; Test on 20 other drugs

Results Each drug retrieves 100 pages from Google; Using the classification algorithm only 16.4 pages/drug are retained by our system; Average precision-accept: 90.3% Average precision-reject: 87.5% Average precision of top 17 pages for Google: 61.2%

Reduce the amount of manual efforts for collecting training data Generate “positive examples” and “negative examples” automatically with high probabilities

Automatic generation of training data Find a drug, d, such that the diseases it treats, T, are disjoint from its known side-effects, S. A retrieved page in response to, if does not contain any known side effect S is “negative; A retrieved page in response to, if does not contain “safe” or “not” in the vicinity of s is “positive”.

Accuracy of generating training examples automatically Accuracy of generating 50 positive examples: 98% Accuracy of generating 50 negative examples: 96%

Validation of unrecognized side-effects Validated by licensed pharmacist and drug information specialist Prilosec: pneumonia Accutane: Watery eye Uroxatral: Yellowing of skin or eyes

Retrieve passages instead of pages For each passage of certain number of words, compute the degree of “relevance”. Output the passage with highest relevance, if it exceeds a threshold. Also output frequencies of side- effects

Summary Proposed system can retrieve unrecognized side-effects of drugs Improve accuracy of retrieve; Need a good medical dictionary to recognize highly related side effects for example, nausea and vomiting

Extensions Our approach, retrieval followed by classification, can be applied to other retrieval problems: Examples: (1)Complications of medical procedures/ operations; (2)Processing of queries of specialized types