Published by Hannele Hänninen. Modified over 5 years ago.
1
Low-Rank Sparse Feature Selection for Patient Similarity Learning
2016 IEEE 16th International Conference on Data Mining
Mengting Zhan et al.
IBM T. J. Watson Research, Yorktown Heights, NY 10598, USA
Piao Liying
2
Introduction
Challenges
EMR (electronic medical record) data carries a variety of data types (such as clinical diagnoses, medical treatments and lab results).
The list of possible medical events is huge.
The data is high dimensional, heterogeneous, sparse and biased.
Collecting patient labels is expensive and time consuming in medical domains.
Background
Patient information is represented in a high dimensional space with noise and redundancy.
Patient similarity depends on the particular clinical setting, which implies that a supervised learning scheme is more useful in medical domains. However, supervised information is limited but critically important in patient similarity learning.
3
Introduction
Proposal
Supervised information is given as pairwise constraints, which are much easier to obtain than absolute labels.
Feature selection and patient similarity learning are performed at the same time.
Contribution
We propose an algorithm that performs feature selection and patient similarity learning at the same time.
The low-rank property makes it scale to large problems.
Our method learns from both patient records (unsupervised) and pairwise constraints (supervised), which are easier to obtain than label-based supervision.
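The scaling claim can be illustrated with a small sketch. It assumes, purely for illustration, that the similarity is a bilinear form whose d x d parameter matrix factors as L L^T with a d x r factor L and r much smaller than d; the slides do not state the exact parameterization, and the names `project` and `low_rank_similarity` are hypothetical. Under that assumption each similarity costs O(d r) instead of O(d^2), and a feature whose row in L is all zeros has no influence on the score.

```python
from typing import List, Sequence


def project(x: Sequence[float], L: Sequence[Sequence[float]]) -> List[float]:
    """Compute L^T x, where L has one row per feature (d rows, r columns)."""
    r = len(L[0])
    return [sum(x[a] * L[a][c] for a in range(len(x))) for c in range(r)]


def low_rank_similarity(
    x_i: Sequence[float], x_j: Sequence[float], L: Sequence[Sequence[float]]
) -> float:
    """Assumed form: x_i^T (L L^T) x_j, evaluated without building the d x d matrix."""
    p_i = project(x_i, L)
    p_j = project(x_j, L)
    return sum(u * v for u, v in zip(p_i, p_j))


# Hypothetical factor with d = 3 features and rank r = 1.
# The second row is all zeros, so the second feature is effectively unselected.
L_factor = [[1.0], [0.0], [2.0]]
score = low_rank_similarity([1.0, 5.0, 1.0], [2.0, 7.0, 0.0], L_factor)
```

Changing the second entry of either patient vector leaves `score` unchanged here, which is one way a sparse factor could double as a feature selector.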
4
Method
The goal is to learn a similarity function:
Two sets of pairwise similarity constraints are given as:
Construct a binary label for each pair of patients.
If a pair of patients is considered similar (e.g. they have the same disease path or symptoms), the similarity measure between them should reflect this by giving a larger value than it gives to dissimilar pairs.
A fixed threshold controls the scale of the learned similarities.
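The labeling step can be sketched as follows. This is a minimal illustration: the +1 / -1 encoding, the function names and the strict comparison against the threshold are assumptions, since the slide's formulas are not preserved in this transcript.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # indices of two patients


def build_pair_labels(similar: Set[Pair], dissimilar: Set[Pair]) -> Dict[Pair, int]:
    """Assign +1 to similar pairs and -1 to dissimilar pairs (assumed encoding)."""
    overlap = similar & dissimilar
    if overlap:
        raise ValueError(f"pairs marked both similar and dissimilar: {sorted(overlap)}")
    labels = {pair: 1 for pair in similar}
    labels.update({pair: -1 for pair in dissimilar})
    return labels


def is_consistent(score: float, label: int, threshold: float) -> bool:
    """True when a similar pair scores above the threshold or a dissimilar pair below it."""
    return label * (score - threshold) > 0


# Hypothetical constraints over three patients.
labels = build_pair_labels(similar={(0, 1)}, dissimilar={(0, 2)})
```

With a threshold of 0.5, a score of 0.9 is consistent with a similar pair and inconsistent with a dissimilar one.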
5
Method
Objective function
Similarity learning is cast as a classification problem through a loss function.
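One way to read this classification view is sketched below. The transcript does not preserve the actual objective, so the hinge loss, the row-wise l2,1 regularizer (a common choice for sparse feature selection) and every name here are assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def hinge_pair_loss(score: float, label: int, threshold: float, margin: float = 1.0) -> float:
    """Penalize a pair whose score is on the wrong side of the threshold (assumed hinge loss)."""
    return max(0.0, margin - label * (score - threshold))


def l21_norm(M: Sequence[Sequence[float]]) -> float:
    """Sum of the Euclidean norms of the rows; rows driven to zero drop their feature."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def objective(
    scores: Sequence[float],
    labels: Sequence[int],
    threshold: float,
    M: Sequence[Sequence[float]],
    reg_weight: float,
) -> float:
    """Mean pairwise loss plus a weighted sparsity penalty (hypothetical objective)."""
    data_term = sum(
        hinge_pair_loss(s, y, threshold) for s, y in zip(scores, labels)
    ) / len(scores)
    return data_term + reg_weight * l21_norm(M)
```

A similar pair scoring well above the threshold contributes no loss, while a dissimilar pair with the same score is penalized in proportion to how far it overshoots.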
6
Experiment on real-world dataset
Data description and preprocessing:
218,680 patients with records spanning over four years, including demographics, medications, lab results and other clinical indicators.
Each patient is represented by vectors ordered according to the sequence of medical events.
Vector lengths are made uniform via Med2vec.
Result
Patient classification
7
Experiment on real-world dataset
Result
Clustering
Retrieval (KNN under different k)
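The retrieval evaluation can be sketched as below, assuming each query patient is scored against all candidates by the learned similarity and that quality is measured as the fraction of the k nearest candidates sharing the query's label. The metric and the function names are assumptions; the slides only name KNN retrieval under different k.

```python
from typing import List, Sequence


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k candidates with the highest similarity to the query."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def precision_at_k(
    scores: Sequence[float],
    candidate_labels: Sequence[str],
    query_label: str,
    k: int,
) -> float:
    """Fraction of the k most similar candidates that share the query's label."""
    hits = top_k(scores, k)
    if not hits:
        return 0.0
    return sum(candidate_labels[i] == query_label for i in hits) / len(hits)


# Hypothetical similarities of four candidates to one query patient with label "a".
scores = [0.9, 0.1, 0.8, 0.3]
candidate_labels = ["a", "b", "b", "a"]
results = {k: precision_at_k(scores, candidate_labels, "a", k) for k in (1, 2, 4)}
```

Sweeping k in this way gives one precision value per neighborhood size, which is the kind of curve a "KNN under different k" comparison would plot.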