Low-Rank Sparse Feature Selection for Patient Similarity Learning

2016 IEEE 16th International Conference on Data Mining
Mengting Zhan et al., IBM T. J. Watson Research, Yorktown Heights, NY 10598, USA
Piao Liying, 2019-02-21

Introduction
Challenges
- Electronic medical records (EMR) carry many types of data (such as clinical diagnoses, medical treatments, and lab results).
- The list of possible medical events is huge.
- The data are high-dimensional, heterogeneous, sparse, and biased.
- Collecting patient labels is expensive and time-consuming in medical domains.
Background
- Patient information is represented in a high-dimensional space with noise and redundancy.
- Patient similarity depends on the particular clinical setting, which implies that a supervised learning scheme is more useful in medical domains.
- However, supervised information is limited but critically important in patient similarity learning.

Introduction
Proposal
- Supervised information is given as pairwise constraints, which are much easier to obtain than absolute labels.
- Feature selection and patient similarity learning are performed at the same time.
Contribution
- An algorithm that performs feature selection and patient similarity learning at the same time.
- The low-rank property lets it scale to large problems.
- The method learns from both patient records (unsupervised) and pairwise constraints (supervised), which are easier to obtain than label-based supervision.

Method
- The goal is to learn a similarity function.
- Two sets of pairwise similarity constraints are given (similar pairs and dissimilar pairs).
- A binary label is constructed for each pair of patients.
- If a pair of patients is considered similar (e.g. they have the same disease path or symptoms), the similarity measure between them should reflect this by giving a larger value than it gives to dissimilar pairs.
- A fixed threshold controls the scale of the learned similarities.
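The slide's formulas did not survive in the transcript, so the following is only a minimal sketch of the setup described above, under stated assumptions: a bilinear similarity s(x_i, x_j) = x_i^T W x_j, labels of +1 for similar and -1 for dissimilar pairs, and a hinge loss around the fixed threshold. The names `bilinear_similarity`, `build_pair_labels`, and `pairwise_hinge_loss` are hypothetical and are not taken from the paper.

```python
import numpy as np

def bilinear_similarity(W, xi, xj):
    """Assumed similarity form: s(xi, xj) = xi^T W xj."""
    return float(xi @ W @ xj)

def build_pair_labels(similar_pairs, dissimilar_pairs):
    """Return (pairs, labels) with +1 for similar and -1 for dissimilar pairs."""
    pairs = list(similar_pairs) + list(dissimilar_pairs)
    labels = np.array([1] * len(similar_pairs) + [-1] * len(dissimilar_pairs))
    return pairs, labels

def pairwise_hinge_loss(W, X, pairs, labels, threshold=1.0):
    """Mean hinge loss: similar pairs should score above the threshold,
    dissimilar pairs below it (assumed loss, not the paper's)."""
    losses = []
    for (i, j), y in zip(pairs, labels):
        s = bilinear_similarity(W, X[i], X[j])
        losses.append(max(0.0, 1.0 - y * (s - threshold)))
    return float(np.mean(losses))

# Toy data: patients 0 and 1 are identical, patient 2 is orthogonal to both.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pairs, labels = build_pair_labels([(0, 1)], [(0, 2)])

loss_identity = pairwise_hinge_loss(np.eye(2), X, pairs, labels)      # similar pair sits exactly at the threshold
loss_scaled = pairwise_hinge_loss(3.0 * np.eye(2), X, pairs, labels)  # similar pair clears the margin
```

With the identity matrix the similar pair scores exactly the threshold and is still penalized; scaling W pushes its score past the margin, which is the role the threshold plays in fixing the scale of the learned similarities.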

Method
Objective function
- Similarity learning is cast as a classification problem through a loss function.
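The objective itself is not preserved in the transcript. As a hedged illustration of how a low-rank, row-sparse similarity matrix can select features, the sketch below assumes a factorization W = U U^T with U of shape (features, rank), a hinge loss on labeled pairs, and an L2,1 penalty on U; a feature whose row in U is zero never influences the similarity. The factorization, the penalty weight `lam`, and all function names are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def l21_norm(U):
    """L2,1 norm: sum of the Euclidean norms of the rows of U."""
    return float(np.sum(np.linalg.norm(U, axis=1)))

def regularized_objective(U, X, pairs, labels, threshold=1.0, lam=0.1):
    """Mean hinge loss on pairs plus an L2,1 penalty, with W = U U^T kept implicit."""
    Z = X @ U  # project patients to `rank` dimensions; s(i, j) = z_i . z_j
    loss = 0.0
    for (i, j), y in zip(pairs, labels):
        s = float(Z[i] @ Z[j])
        loss += max(0.0, 1.0 - y * (s - threshold))
    return loss / len(pairs) + lam * l21_norm(U)

def selected_features(U, tol=1e-8):
    """Indices of features whose row in U is not numerically zero."""
    return np.flatnonzero(np.linalg.norm(U, axis=1) > tol)

# Three features, rank 2; the middle row of U is zero, so feature 1 is dropped.
U = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0]])
X = np.array([[2.0, 0.0, 0.0], [2.0, 9.0, 0.0], [0.0, 4.0, 0.0]])
pairs = [(0, 1), (0, 2)]
labels = [1, -1]

objective = regularized_objective(U, X, pairs, labels)
kept = selected_features(U)
```

Working with U rather than the full matrix W keeps the cost per pair proportional to the rank instead of the squared number of features, which is one way a low-rank structure helps with scale.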

Experiment on real-world dataset
Data description and preprocessing
- 218,680 patients over more than four years, including demographics, medications, lab results, and other clinically related indicators.
- Each patient is represented by a sequence of vectors ordered by medical events.
- Vector lengths are made uniform via Med2vec.
Result
- Patient classification

Experiment on real-world dataset
Result
- Clustering
- Retrieval (KNN under different k)
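The retrieval figures are not part of the transcript. As a rough sketch of how KNN retrieval under different k might be scored, the code below computes the mean fraction of each patient's k most similar patients that share its label. The metric choice, the function name `precision_at_k`, and the toy similarity matrix are assumptions for illustration, not the evaluation protocol or data from the paper.

```python
import numpy as np

def precision_at_k(S, labels, k):
    """Mean fraction of each patient's k nearest neighbours (by similarity)
    that share the patient's label."""
    S = np.array(S, dtype=float)      # copy so the caller's matrix is untouched
    np.fill_diagonal(S, -np.inf)      # a patient should not retrieve itself
    labels = np.asarray(labels)
    top_k = np.argsort(-S, axis=1)[:, :k]   # indices of the k largest similarities per row
    hits = labels[top_k] == labels[:, None]
    return float(hits.mean())

# Toy similarity matrix for four patients in two groups (0, 0, 1, 1).
S = [[1.0, 0.9, 0.1, 0.2],
     [0.9, 1.0, 0.3, 0.1],
     [0.1, 0.3, 1.0, 0.8],
     [0.2, 0.1, 0.8, 1.0]]
labels = [0, 0, 1, 1]

p_at_1 = precision_at_k(S, labels, k=1)
p_at_2 = precision_at_k(S, labels, k=2)
```

In this toy case every patient's nearest neighbour is in its own group, while the second neighbour necessarily comes from the other group, so the score falls as k grows past the group size.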