Playing with Features for Learning and Prediction
Jongmin Kim, Seoul National University
Problem statement: predicting the outcome of surgery
Ideal approach? Training data → predicted outcome of surgery, learned directly.
Predicting outcome of surgery. Initial approach: predicting partial features. Which features should we predict?
Predicting outcome of surgery. Surgery: DHL+RFT+TAL+FDO
- flexion of the knee (min / max)
- dorsiflexion of the ankle (min)
- rotation of the foot (min / max)
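As a concrete illustration, here is a small sketch (not from the slides) of turning gait curves into the outcome features listed above; the curve names and the one-gait-cycle representation are hypothetical assumptions:

```python
# Hypothetical sketch: extract the min/max outcome features from
# joint-angle trajectories over one gait cycle (angles in degrees).
import numpy as np

def outcome_features(knee_flexion, ankle_dorsiflexion, foot_rotation):
    """Each argument: 1-D array of joint angles over one gait cycle."""
    return np.array([
        knee_flexion.min(), knee_flexion.max(),   # flexion of the knee
        ankle_dorsiflexion.min(),                 # dorsiflexion of the ankle
        foot_rotation.min(), foot_rotation.max(), # rotation of the foot
    ])
```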
Predicting outcome of surgery. Are these good features? Number of training samples:
- DHL+RFT+TAL: 35
- FDO+DHL+TAL+RFT: 33
Machine learning and features: Data → Feature representation → Learning algorithm
Features in motion data (a small sketch of computing these follows):
- joint position / angle
- velocity / acceleration
- distance between body parts
- contact status
- …
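A minimal numpy sketch of how such motion features might be computed from raw joint positions; the array layout, frame rate, up-axis, and the joint index / ground threshold used for contact detection are all assumptions, not from the slides:

```python
# Sketch: per-frame motion features from joint positions.
# positions: (T, J, 3) array, T frames, J joints, y-axis assumed up.
import numpy as np

def motion_features(positions, dt=1.0 / 120.0, foot=0, ground_y=0.02):
    velocity = np.gradient(positions, dt, axis=0)       # (T, J, 3)
    acceleration = np.gradient(velocity, dt, axis=0)    # (T, J, 3)
    # Pairwise distances between all body parts per frame: (T, J, J)
    diffs = positions[:, :, None, :] - positions[:, None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    # Crude contact status: the foot joint is close to the ground plane.
    contact = positions[:, foot, 1] < ground_y          # (T,)
    return velocity, acceleration, distances, contact
```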
Features in computer vision: SIFT, spin image, HoG, RIFT, textons, GLOH
Outline
- Feature selection
  - Feature ranking
  - Subset selection: wrapper, filter, embedded
  - Recursive Feature Elimination
  - Combination of weak learners (boosting): AdaBoost (classification) / joint boosting (classification) / gradient boosting (regression)
- Prediction results with feature selection
- Feature learning?
Feature selection
- alleviates the effect of the curse of dimensionality
- improves prediction performance
- faster and more cost-effective
- provides a better understanding of the data
Subset selection: wrapper, filter, embedded (a sketch of the first two follows)
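A hedged scikit-learn sketch of the filter vs. wrapper distinction (the embedded case, e.g. Lasso, folds selection into model training itself); the data here is synthetic and the sample/feature counts are only illustrative:

```python
# Filter vs. wrapper subset selection with scikit-learn on mock data.
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(35, 20))   # e.g. 35 cases, 20 candidate features
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=35)

# Filter: rank each feature by a univariate score, independent of the model.
filt = SelectKBest(score_func=f_regression, k=5).fit(X, y)
print("filter keeps:", np.flatnonzero(filt.get_support()))

# Wrapper: Recursive Feature Elimination refits the model repeatedly,
# dropping the weakest feature each round.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=5).fit(X, y)
print("wrapper keeps:", np.flatnonzero(rfe.support_))
```

The trade-off: a filter scores each feature once and is cheap; the wrapper refits the model many times, which is costly but accounts for feature interactions.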
Feature learning? Can we automatically learn a good feature representation? Known as: unsupervised feature learning, feature learning, deep learning, representation learning, etc. Hand-designed features (by humans): 1. need expert knowledge, 2. require time-consuming hand-tuning. When it is unclear how to hand-design features: automatically learned features (by machine).
Learning Feature Representations. Key idea:
- learn the statistical structure or correlations of the data from unlabeled data
- the learned representations can be used as features in supervised and semi-supervised settings
Learning Feature Representations
e.g. an encoder/decoder pair: input (image / features) → encoder → output features; the decoder maps the features back toward the input. The encoder is the feed-forward / bottom-up path; the decoder is the feed-back / generative / top-down path.
Learning Feature Representations
e.g. Predictive Sparse Decomposition [Kavukcuoglu et al., '09]: an input patch x is encoded as sparse features z = σ(Wx), and the decoder reconstructs the patch as Dz. Components: encoder filters W, sigmoid function σ(·), decoder filters D, and an L1 sparsity penalty on z.
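A minimal numpy sketch of the PSD objective as described above; the dimensions, λ, and the (omitted) optimization loop are assumptions:

```python
# Sketch of the Predictive Sparse Decomposition loss for one patch.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def psd_loss(x, z, W, D, lam=0.1):
    """x: input patch (d,); z: sparse code (k,); W: (k, d); D: (d, k)."""
    recon = np.sum((x - D @ z) ** 2)          # decoder reconstructs x from Dz
    sparsity = lam * np.sum(np.abs(z))        # L1 sparsity on the code
    pred = np.sum((z - sigmoid(W @ x)) ** 2)  # encoder predicts the code
    return recon + sparsity + pred

# Training alternates: minimize over z (sparse coding), then take gradient
# steps on W and D. At test time the code is just the cheap sigmoid(W @ x).
```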
Stacked Auto-Encoders [Hinton & Salakhutdinov, Science '06]
Encoder/decoder pairs are stacked: input image → features → features → … → class label, with each pair trained to reconstruct its own input.
At Test Time [Hinton & Salakhutdinov, Science '06]
- remove the decoders
- use only the feed-forward path: input image → encoder → features → … → class label
- this gives a standard (convolutional) neural network
- can fine-tune the whole network with backprop
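A rough PyTorch sketch of that recipe: greedy layer-wise pretraining of encoder/decoder pairs, then dropping the decoders and fine-tuning the stacked encoders with a classifier head. Layer sizes, epochs, and the batch iterator are placeholder assumptions, not details from the paper:

```python
# Sketch: greedy layer-wise autoencoder pretraining, then a feed-forward net.
import torch
import torch.nn as nn

sizes = [784, 256, 64]          # input -> hidden1 -> hidden2 (assumed)
encoders, decoders = [], []
for d_in, d_out in zip(sizes, sizes[1:]):
    encoders.append(nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid()))
    decoders.append(nn.Sequential(nn.Linear(d_out, d_in), nn.Sigmoid()))

def pretrain_layer(i, batches, epochs=10, lr=1e-3):
    """Train encoder/decoder pair i to reconstruct its own input."""
    params = list(encoders[i].parameters()) + list(decoders[i].parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x in batches:                 # x: raw input, (batch, sizes[0])
            with torch.no_grad():         # fixed features from layers below
                for enc in encoders[:i]:
                    x = enc(x)
            loss = nn.functional.mse_loss(decoders[i](encoders[i](x)), x)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Test time: drop the decoders, stack the encoders with a classifier head,
# and fine-tune the whole feed-forward network with backprop.
model = nn.Sequential(*encoders, nn.Linear(sizes[-1], 10))
```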
Status & plan Data 파악 / learning technique survey… Plan : 11 월 실험 끝 12 월 논문 writing 1 월 시그랩 submit 8 월에 미국에서 발표 But before all of that….
Deep neural nets vs. boosting
Deep nets:
- a single, highly non-linear system
- a "deep" stack of simpler modules
- all parameters are subject to learning
Boosting & forests:
- a sequence of "weak" (simple) classifiers that are linearly combined into a powerful classifier
- subsequent classifiers do not exploit the representations of earlier ones; the result is a "shallow" linear mixture
- typically, the features are not learned
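For the boosting side, a small scikit-learn sketch of gradient boosting for regression (the variant relevant to outcome prediction); the data is mocked up and the hyperparameters are only illustrative:

```python
# Sketch: gradient boosting regression on synthetic stand-in data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(68, 20))   # ~35 + 33 cases, illustrative only
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=68)

# Each tree fits the residual of the current ensemble: a linear
# combination of weak learners, with no learned feature hierarchy.
gbr = GradientBoostingRegressor(n_estimators=200, max_depth=2,
                                learning_rate=0.05)
print(cross_val_score(gbr, X, y, cv=5, scoring="r2").mean())
```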
Feature learning for motion data
- learning representations of temporal data: model complex, nonlinear dynamics such as style
- Restricted Boltzmann machine: I have not fully understood the concept yet, and the results so far are not impressive
Restricted Boltzmann machine
- models complex, nonlinear dynamics
- the latent binary state can be inferred easily and exactly given the observations
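A minimal numpy sketch of a binary RBM, showing the exact factorized inference of the hidden units and one contrastive-divergence (CD-1) update; the sizes and learning rate are illustrative assumptions:

```python
# Sketch: binary RBM inference and one CD-1 training step.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def infer_hidden(v, W, c):
    """Exact p(h_j = 1 | v): the hiddens are conditionally independent."""
    return sigmoid(v @ W + c)

def cd1_step(v0, W, b, c, lr=0.05):
    """One CD-1 update from a visible batch v0 (n, n_vis); mutates W, b, c."""
    ph0 = infer_hidden(v0, W, c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b)                       # reconstruct visibles
    ph1 = infer_hidden(pv1, W, c)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)    # positive - negative phase
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
```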