Improving Health Question Classification by Word Location Weights

Improving Health Question Classification by Word Location Weights
  Rey-Long Liu
  Dept. of Medical Informatics, Tzu Chi University, Taiwan

Outline
  Background
  Problem definition
  The proposed approach: WLW
  Empirical evaluation
  Conclusion

Background

Categories of Health Questions

Classification of Health Questions
  Why health questions?
    Health questions provide both reliable and readable health information
  Why classification of health questions?
    Given a health question q, retrieve related questions (and their answers)

Problem Definition

Goal & Motivation
  Goal
    Target: Chinese Health Questions (CHQs)
    Contribution: developing a technique, WLW (Word Location Weight), that estimates the weights of words in a CHQ based on their locations
  Motivation
    Location weights can be used by classifiers (e.g., SVM) to improve the classification
      Classifying in-space CHQs (cause, diagnosis, process)
      Filtering out-space CHQs (questions of any other kind, labeled "whatever")

Basic Idea
  Words that are more related to the category of a CHQ tend to appear at the beginning and the end of the CHQ
  Examples:
    如何 (how to) 克服 (overcome) 緊張 (nervous) 的情緒 (mood)? → process
    嬰兒 (infant) 體溫 (body temperature) 太低 (too low) 怎麼辦 (what to do)? → process

Related Work
  Recognition of question types (e.g., when, where)
    Weakness: Types ≠ intended categories of CHQs
  Classification by parsing
    Weakness I: Parsing Chinese is still challenging
    Weakness II: CHQs are NOT always well-formed
  Classification by pattern matching
    Weakness: Difficult to construct the string patterns

The Proposed Approach: WLW

Main Challenges (1) Defining the two weights of a location p in a CHQ q

Main Challenges (cont.) (2) Encoding the location weights of a word w into two features for the underlying classifier

Interesting Behaviors of WLW
  A word w in a question q has two features: Fvalue_front and Fvalue_rear
    Applicable to different categories and languages (e.g., English)
  When w is far from both the front and the rear
    Both features reduce to the term frequency (TF) of w
    WLW reduces to the traditional feature-encoding approach (using TF as the features)
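The transcript does not preserve the actual WLW weight formulas, so the following is only a minimal sketch of the behavior described on this slide, not the author's method. It assumes a hypothetical linear decay: a word occurrence counts as 1 (plain TF) plus a bonus that shrinks to zero once the word is `window` positions away from the front (or the rear). The function name `location_features` and the parameters `window` and `boost` are illustrative assumptions.

```python
from collections import defaultdict


def location_features(words, window=2, boost=1.0):
    """Return {word: (fvalue_front, fvalue_rear)} for one tokenized question.

    ASSUMPTION (not from the slides): each occurrence contributes
    1 + boost * max(0, 1 - distance / window), where distance is measured
    from the first word (front feature) or the last word (rear feature).
    """
    n = len(words)
    front = defaultdict(float)
    rear = defaultdict(float)
    for position, word in enumerate(words):
        dist_front = position          # 0 for the first word
        dist_rear = n - 1 - position   # 0 for the last word
        front[word] += 1.0 + boost * max(0.0, 1.0 - dist_front / window)
        rear[word] += 1.0 + boost * max(0.0, 1.0 - dist_rear / window)
    return {word: (front[word], rear[word]) for word in front}


# Tokens of the second example question from the "Basic Idea" slide.
features = location_features(["嬰兒", "體溫", "太低", "怎麼辦"])
for word, (f_front, f_rear) in features.items():
    print(word, f_front, f_rear)
```

Under these assumptions the question-final 怎麼辦 gets a rear feature above its term frequency, while a word that is at least `window` positions from both ends gets exactly its TF in both features, which mirrors the reduction to TF-based encoding stated on the slide.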

Empirical Evaluation

Experimental Design
  CHQs were downloaded from a health information provider
    864 in-space CHQs
      cause (category 1): 313
      diagnosis (category 2): 92
      process (category 3): 459
    100 out-space CHQs
      whatever (general description)
  Five-fold cross validation

Underlying Classifiers The Support Vector Machine (SVM) classifier

Results: Classification of In-Space CHQs
  Evaluation criteria
    Micro-averaged F1 (MicroF1)
    Macro-averaged F1 (MacroF1)
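The two criteria differ in how they pool per-category results: MicroF1 sums true positives, false positives, and false negatives over all categories before computing F1, while MacroF1 averages the per-category F1 values, so a small category such as diagnosis weighs as much as a large one. The sketch below assumes one predicted label per question (or `None` when every category rejects it); the slides do not state the exact evaluation protocol, and the sample labels are made up for illustration.

```python
def f1_scores(gold, predicted, categories):
    """Return (micro_f1, macro_f1) for single-label predictions.

    ASSUMPTION: each question has one gold category and at most one
    predicted category (None means no category accepted the question).
    """
    tp = {c: 0 for c in categories}
    fp = {c: 0 for c in categories}
    fn = {c: 0 for c in categories}
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1
            if p in fp:  # a None prediction adds no false positive
                fp[p] += 1

    def f1(t, f_pos, f_neg):
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in categories) / len(categories)
    return micro, macro


# Made-up labels, for illustration only.
gold = ["cause", "cause", "process", "process", "diagnosis"]
pred = ["cause", "process", "process", "process", "cause"]
micro_f1, macro_f1 = f1_scores(gold, pred, ["cause", "diagnosis", "process"])
print(round(micro_f1, 3), round(macro_f1, 3))
```

In this toy run the missed diagnosis question pulls MacroF1 well below MicroF1, which is why reporting both is informative when category sizes are as uneven as 313, 92, and 459.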

SVM+WLW is significantly better than SVM

Results: Filtering of Out-Space CHQs
  Evaluation criteria
    Filtering ratio (FR) = (# out-space CHQs successfully rejected by all categories) / (# out-space CHQs)
    Average number of misclassifications (AM) = (# misclassifications for the out-space CHQs) / (# out-space CHQs)
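Both ratios can be computed from a record of which categories accepted each out-space question. The sketch below assumes that every category that accepts an out-space CHQ counts as one misclassification, which is one plausible reading of the AM definition above; the function name and the sample data are hypothetical.

```python
def filtering_metrics(accepted_by):
    """Return (FR, AM) for a list of out-space questions.

    accepted_by: one set per out-space question, holding the categories
    that (wrongly) accepted it. An empty set means every category
    rejected the question.
    ASSUMPTION: each accepting category is one misclassification.
    """
    total = len(accepted_by)
    if total == 0:
        raise ValueError("need at least one out-space question")
    rejected_by_all = sum(1 for cats in accepted_by if not cats)
    misclassifications = sum(len(cats) for cats in accepted_by)
    return rejected_by_all / total, misclassifications / total


# Made-up outcomes for four out-space questions.
outcomes = [set(), {"cause"}, set(), {"cause", "process"}]
fr, am = filtering_metrics(outcomes)
print(fr, am)
```

A higher FR and a lower AM both indicate better filtering; the two can diverge because a single question accepted by several categories lowers FR once but raises AM several times.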

SVM+WLW achieves higher FR and lower AM

Conclusion

  Healthcare consumers often read health information on the Internet
  Health questions are a valuable resource for healthcare consumers
    They provide both reliable and readable health information
  Classification of health questions is the basis for the retrieval of related questions
    Categories: cause, diagnosis, process, whatever
  WLW can help SVM to improve the classification of CHQs