StudentLife Predictive Modeling
Hongyu Chen, Jing Li, Mubing Li
CS69/169 Mobile Health, March 2015


Motivation
Let's go further than StudentLife 1.0!
- A standardized, normalized data set
- Proof of concept
- A scientific finding for our question: can we predict depression from a two-week window of StudentLife data collection?

Study design:
- Data cleaning/parsing
- Feature selection
- Class determination
- Predictive classifiers through supervised machine learning methods
- Validation
- Case study

Project Design/Workflow
- Input: the StudentLife dataset, comprising EMA data (sleep, mood, stress, social, exercise, etc.) and sensor data (audio, conversation, activity, dark, etc.).
- Data preprocessing and interpolation: linear interpolation, nearest-neighbour interpolation, concatenation, and separation of the data by week.
- Feature selection: PCA.
- Class determination: a PHQ-9 threshold splits participants into non-depressed the whole time, depressed the whole time, or depression status changed.
- Prediction: SVM and other classifiers, validated with N-fold cross-validation.
- Result analysis: accuracy, F statistics, precision/recall, sensitivity/specificity.
- Case study.

Class Determination via Thresholding
Why thresholding?
- It keeps this a classification problem, not a regression problem.
- Depression presents in many different ways.
- The sample size is small.

Class Determination via Thresholding PHQ-9 scores of students before and after StudentLife

Class Determination via Thresholding
Threshold determined by visual inspection of the strip plot.

Class Determination via Thresholding
Is our threshold consistent with the medical literature?

PHQ-9 score | Diagnosis
0-4         | No depression
5-9         | Mild depression
10-14       | Moderate depression
15-19       | Moderately severe depression
20-27       | Severe depression

The class question becomes: do you have at least moderate depression?
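The thresholding rule can be sketched as a one-line function. The cutoff of 10 (at least moderate depression) follows the PHQ-9 table above; note the slides determine their exact threshold by visual inspection, so treat this cutoff as illustrative:

```python
# Binarize PHQ-9 scores into not-depressed (-1) / depressed (+1) classes.
# A cutoff of 10 corresponds to "moderate depression or worse" in the
# standard PHQ-9 table; the actual threshold in the project was chosen
# by visual inspection of a strip plot.

def phq9_to_class(score, threshold=10):
    """Return +1 (depressed) if score >= threshold, else -1."""
    return 1 if score >= threshold else -1

scores = [3, 7, 10, 18, 25]          # hypothetical PHQ-9 scores
labels = [phq9_to_class(s) for s in scores]
print(labels)  # -> [-1, -1, 1, 1, 1]
```

Turning the score into a binary label this way is what keeps the task a classification problem rather than a regression problem.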

Linear Interpolation for EMA Data
EMA data is very sparse; linear interpolation increases the number of points.
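A minimal sketch of this upsampling step, using `numpy.interp` on hypothetical EMA stress responses (the timestamps and values below are invented for illustration):

```python
import numpy as np

# Sparse EMA responses: timestamps (in days) and self-reported stress.
# Values are hypothetical.
ema_days = np.array([0.0, 3.0, 7.0, 10.0])
ema_stress = np.array([2.0, 4.0, 3.0, 5.0])

# Resample to one value per day via linear interpolation.
daily_grid = np.arange(0, 11)
daily_stress = np.interp(daily_grid, ema_days, ema_stress)
print(daily_stress.shape)  # -> (11,)
```

Four sparse responses become eleven daily values, filling the gaps between EMA prompts with straight-line estimates.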

Nearest-Neighbor Interpolation for Sensor Data
Sensor data is too dense; nearest-neighbor interpolation decreases the number of points.
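The downsampling step can be sketched in plain NumPy: for each target time, take the source sample whose timestamp is closest. The half-hourly activity signal below is hypothetical:

```python
import numpy as np

def nearest_neighbor_resample(t_src, y_src, t_target):
    """For each target time, take the source sample with the closest timestamp."""
    t_src, y_src, t_target = map(np.asarray, (t_src, y_src, t_target))
    # Distance from every target time to every source time; pick the argmin.
    idx = np.abs(t_src[None, :] - t_target[:, None]).argmin(axis=1)
    return y_src[idx]

# Hypothetical dense activity readings (every 30 minutes for 3 days),
# reduced to one value per day.
hours = np.arange(0, 72, 0.5)
activity = np.sin(hours / 24 * 2 * np.pi)
days = np.array([12.0, 36.0, 60.0])      # midpoint of each 24-hour period
daily_activity = nearest_neighbor_resample(hours, activity, days)
print(daily_activity.shape)  # -> (3,)
```

144 dense readings collapse to 3 daily values, the opposite direction from the EMA upsampling.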

Standardized Data Set?
In the first iteration of StudentLife, every data collection modality had:
- different scaling
- different periodicity
- different quality

Now, all 15 depression-related modalities have:
- one value per 24-hour period
- comparable scaling
- a guarantee of good quality (279 samples removed)

Feature Selection
Step 1: Decide the sliding-window time frame.
- Two weeks: long enough to make a diagnosis, but short enough to leave enough time points for testing.
Step 2: Feature aggregation.
Step 3: Dimensionality reduction.
- We cannot use 105 dimensions to classify only a couple hundred cases!

Principal Component Analysis (PCA)

Top Features from PCA
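The dimensionality-reduction step can be sketched with PCA via an SVD of the centered feature matrix. The data shape (and the choice of 10 retained components) is hypothetical; only the 105-feature count comes from the slides:

```python
import numpy as np

# Hypothetical: 120 two-week windows, each with 105 aggregated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 105))

# PCA via SVD of the centered data matrix; principal axes are rows of Vt.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 10                                    # keep the top k components
X_reduced = Xc @ Vt[:k].T                 # project onto the top components
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(X_reduced.shape)  # -> (120, 10)
```

The top features can then be read off from the loadings in `Vt`: the original features with the largest absolute weights in the leading components are the ones PCA considers most informative.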

Random Forest Decision Trees
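A minimal random-forest sketch with scikit-learn, on synthetic data where only the first two features carry signal. Everything here (data shape, forest size) is illustrative, not the project's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: labels driven by the first two features only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# An ensemble of decision trees, each fit on a bootstrap sample.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Feature importances reveal which inputs the trees actually split on.
ranked = np.argsort(clf.feature_importances_)[::-1]
print(ranked[:3])
```

Besides classifying, the forest's feature importances give a second, tree-based view of which features matter, complementing the PCA-based selection.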

Predictive Classifier
Classes: {not depressed, depressed}
Features: top features from PCA
Training set: all depressed samples (50%) plus selected non-depressed samples (50%)
Model: SVM
Cross-validation accuracy = %
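The SVM-plus-cross-validation pipeline can be sketched with scikit-learn. The data below is synthetic and the kernel and fold count are assumptions for illustration, not the project's actual settings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical balanced data: label depends on the first feature's sign.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = (X[:, 0] > 0).astype(int)

# Scale features, then fit an RBF-kernel SVM; score with 5-fold CV.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)
print(scores.shape)  # -> (5,)
```

Wrapping the scaler and classifier in a pipeline matters: it refits the scaler inside each fold, so no information from the held-out fold leaks into training.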

Case Study
Participant No. 16:
- Beginning of the term: not depressed (-1)
- End of the term: depressed (+1)

Future Directions
Why is this important?
1. It contributes (marginally) to the existing medical literature about depression.
2. It is a proof of concept for possible interventions:
   - imagine an app that tells you when you could be depressed and connects you with resources to help.
3. A standardized data set is now available, opening the door to future analyses, not only on depression.
A small taste of the beginnings of... StudentLife 2.0?