1 Yang Yang, Jie Tang, Juanzi Li Tsinghua University Walter Luyten, Marie-Francine Moens Katholieke Universiteit Leuven Lu Liu Northwestern University.

Slides:



Advertisements
Similar presentations
1 Topic Distributions over Links on Web Jie Tang 1, Jing Zhang 1, Jeffrey Xu Yu 2, Zi Yang 1, Keke Cai 3, Rui Ma 3, Li Zhang 3, and Zhong Su 3 1 Tsinghua.
Advertisements

January 23 rd, Document classification task We are interested to solve a task of Text Classification, i.e. to automatically assign a given document.
Active Appearance Models
AIME03, Oct 21, 2003 Classification of Ovarian Tumors Using Bayesian Least Squares Support Vector Machines C. Lu 1, T. Van Gestel 1, J. A. K. Suykens.
Forecasting Skewed Biased Stochastic Ozone Days: Analyses and Solutions Forecasting Skewed Biased Stochastic Ozone Days: Analyses and Solutions Presentor:
Towards Twitter Context Summarization with User Influence Models Yi Chang et al. WSDM 2013 Hyewon Lim 21 June 2013.
Modelling Paying Behavior in Game Social Networks Zhanpeng Fang +, Xinyu Zhou +, Jie Tang +, Wei Shao #, A.C.M. Fong *, Longjun Sun #, Ying Ding -, Ling.
Middle Term Exam 03/01 (Thursday), take home, turn in at noon time of 03/02 (Friday)
Learning on Probabilistic Labels Peng Peng, Raymond Chi-wing Wong, Philip S. Yu CSE, HKUST 1.
Jun Zhu Dept. of Comp. Sci. & Tech., Tsinghua University This work was done when I was a visiting researcher at CMU. Joint.
EE-148 Expectation Maximization Markus Weber 5/11/99.
1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.
x – independent variable (input)
Logistic Regression Rong Jin. Logistic Regression Model  In Gaussian generative model:  Generalize the ratio to a linear model Parameters: w and c.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Speaker Adaptation for Vowel Classification
Efficient and Numerically Stable Sparse Learning Sihong Xie 1, Wei Fan 2, Olivier Verscheure 2, and Jiangtao Ren 3 1 University of Illinois at Chicago,
Automatic Detection of Excessive Glycemic Variability for Diabetes Management Matthew Wiley, Razvan Bunescu, Cindy Marling, Jay Shubrook and Frank Schwartz.
Sufficient Dimensionality Reduction with Irrelevance Statistics Amir Globerson 1 Gal Chechik 2 Naftali Tishby 1 1 Center for Neural Computation and School.
Scalable Text Mining with Sparse Generative Models
Learning Transportation Mode from Raw GPS Data for Geographic Applications on the Web Yu Zheng, Like Liu, Xing Xie Microsoft Research.
Dan Simon Cleveland State University
Diabetes Mellitus Type 2
1 1 Chenhao Tan, 1 Jie Tang, 2 Jimeng Sun, 3 Quan Lin, 4 Fengjiao Wang 1 Department of Computer Science and Technology, Tsinghua University, China 2 IBM.
Active Learning for Networked Data Based on Non-progressive Diffusion Model Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing Dept. of Computer Science and.
1 A Discriminative Approach to Topic- Based Citation Recommendation Jie Tang and Jing Zhang Presented by Pei Li Knowledge Engineering Group, Dept. of Computer.
IBM Research – China, 2013 Mining Information Dependency in Outpatient Encounters for Chronic Disease Care Wen Sun, Weijia Shen, Xiang Li, Feng Cao, Yuan.
1 Logistic Regression Adapted from: Tom Mitchell’s Machine Learning Book Evan Wei Xiang and Qiang Yang.
1 Active learning based survival regression for censored data Bhanukiran Vinzamuri Yan Li Chandan K.
INTENSITY OF HEALTH SERVICES AND COSTS OF CARE FOR PREVIOUSLY UNINSURED MEDICARE BENEFICIARIES J. Michael McWilliams, M.D. Division of General Medicine.
Bug Localization with Machine Learning Techniques Wujie Zheng
Classification and Ranking Approaches to Discriminative Language Modeling for ASR Erinç Dikici, Murat Semerci, Murat Saraçlar, Ethem Alpaydın 報告者:郝柏翰 2013/01/28.
American Diabetes association(ADA) statistics for 2008.
Chengjie Sun,Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology 1 19/11/ :09 PM.
TEMPLATE DESIGN © Zhiyao Duan 1,2, Lie Lu 1, and Changshui Zhang 2 1. Microsoft Research Asia (MSRA), Beijing, China.2.
1 Yang Yang *, Yizhou Sun +, Jie Tang *, Bo Ma #, and Juanzi Li * Entity Matching across Heterogeneous Sources *Tsinghua University + Northeastern University.
A hybrid SOFM-SVR with a filter-based feature selection for stock market forecasting Huang, C. L. & Tsai, C. Y. Expert Systems with Applications 2008.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
ECE 8443 – Pattern Recognition LECTURE 10: HETEROSCEDASTIC LINEAR DISCRIMINANT ANALYSIS AND INDEPENDENT COMPONENT ANALYSIS Objectives: Generalization of.
Multi-Speaker Modeling with Shared Prior Distributions and Model Structures for Bayesian Speech Synthesis Kei Hashimoto, Yoshihiko Nankaku, and Keiichi.
A DYNAMIC APPROACH TO THE SELECTION OF HIGH ORDER N-GRAMS IN PHONOTACTIC LANGUAGE RECOGNITION Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-
IEEE N SubmissionLiang Li VinnoSlide 1 Project: IEEE P Working Group for Wireless Personal Area Networks (WPANs) Submission.
1 Yang Yang *, Yizhou Sun +, Jie Tang *, Bo Ma #, and Juanzi Li * Entity Matching across Heterogeneous Sources *Tsinghua University + Northeastern University.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 12: Advanced Discriminant Analysis Objectives:
Bayesian Speech Synthesis Framework Integrating Training and Synthesis Processes Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda Nagoya Institute.
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
Unsupervised Streaming Feature Selection in Social Media
Learning to Rank: From Pairwise Approach to Listwise Approach Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li Presenter: Davidson Date:
Dynamic Background Learning through Deep Auto-encoder Networks Pei Xu 1, Mao Ye 1, Xue Li 2, Qihe Liu 1, Yi Yang 2 and Jian Ding 3 1.University of Electronic.
IEEE N SubmissionLiang Li VinnoSlide 1 Project: IEEE P Working Group for Wireless Personal Area Networks (WPANs) Submission.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. An integrated scheme for feature selection and parameter setting in the support vector machine modeling.
12/1/2009 Liu, Cheng Yang, Hsiu-Han Han, Seung Yeob 1 CS 590 Project Indoor/Outdoor Classification December 1, 2009 Liu, Cheng Yang, Hsiu-Han Han, Seung.
Journal Club February 7, 2014 Sadie T. Velásquez, MD.
Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.
Deep Feedforward Networks
Zeyu You, Raviv Raich, Yonghong Huang (presenter)
LECTURE 11: Advanced Discriminant Analysis
Correlative Multi-Label Multi-Instance Image Annotation
Alan Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani
AMIA Joint Summits 2017 San Francisco
Machine Learning Basics
Predicting Primary Myocardial Infarction from Electronic Health Records -Jitong Lou.
Weakly Learning to Match Experts in Online Community
Introduction. A Reality Check on BP Goals and Optimizing Patient Adherence to Treatment.
Principal Component Analysis
WHICH PSM METHOD TO USE? THE ASSOCIATION BETWEEN CHOSEN PROPENSITY SCORE METHOD AND OUTCOMES OF RETROSPECTIVE REAL-WORLD TREATMENT COMPARISIONS: EVALUATION.
Actively Learning Ontology Matching via User Interaction
Effects of Pay-for-Performance on the Quality of Primary Care in England Mean Scores for Clinical Quality at the Practice Level for Aspects of Care for.
Low-Rank Sparse Feature Selection for Patient Similarity Learning
Jifan Yu, Chenyu Wang, Gan Luo,
Presentation transcript:

1 Yang Yang, Jie Tang, Juanzi Li Tsinghua University Walter Luyten, Marie-Francine Moens Katholieke Universiteit Leuven Lu Liu Northwestern University Forecasting Potential Diabetes Complications

2 Diabetes Complications Life-Threatening –Over 4.8 million people died in 2012 due to diabetes [1]. –Over 68% of diabetes-related mortality is caused by diabetes complications [2]. – 471 billion USD, while 185 million patients remain undiagnosed [1]. Need to be diagnosed in time [1] [2] [1] [2] coronary heart disease diabetic retinopathy

3 Forecasting Diabetes Complication Routine urine analysis Bilirubin example coronary heart disease diabetic retinopathy Output: diabetes complications Input: a patient’s lab test results

4 Data Set A collection of real clinical records from a hospital in Beijing, China over one year. Clinical record Challenge: feature sparseness Each clinical record only contains different lab tests 65.5% of lab tests exist in < 10 clinical records ( %). ItemStatistics Clinical records181,933 Patient35,525 Lab tests1,945

5 Our Approach

6 Baseline Model I Learning task: Limitations: 1.Cannot model correlations between y 2.Cannot handle sparse features

7 Baseline Model II Objective function: Still cannot handle sparse features!

8 Proposed Model dimensional reduction classification Objective function:

9 Learning Algorithm 1 2

10 Learning Algorithm (cont.) Update the dimensional reduction parameters –The remaining part of SparseFGM could be regarded as a mixture generative model, with the log- likelihood –Jensen’s inequality tells us that –Derivate with respect to each parameters, set them to zero, and get the update equations

11 Learning Algorithm (cont.) Update the classification parameters –New log-likelihood –Adopt a gradient descent method to optimize the new log-likelihood

12 Theoretical Analysis 、 、 indicate

13 Experiments

14 Setting Experiments  Is our model effective?  How do different diabetes complications associate with each lab test?  Can we forecast all diabetes complications well? Comparison Methods SVM (model I) FGM (model II) FGM+PCA (an alternative method to handle feature sparseness) SparseFGM (our approach)

15 Experimental Results HTN: hypertension, CHD: coronary heart disease, HPL: hyperlipidemia SVM and FGM suffer from feature sparseness % in recall. FGM vs. FGM + PCA (increase +40.3% in recall) PGM+PCA vs. SparseFGM (increase +13.5% in F1)

16 Association Pattern Illustration WBC in the urine causes frequent voiding -> no good sleep at night Association score: c: complication, e: lab test insomnia

17 Can We Forecast All Diabetes Complications? HPL can be forecasted precisely based on lab test results.

18 Conclusion We study the problem of forecasting diabetes complications. We propose a graphical model which integrates dimensional reduction and classification into a uniform framework. We further study the underlying associations between different diabetes complications and lab test types.

19 Thanks! Yang Yang