Predicting Post-Operative Gait of Cerebral Palsy Patients

Presentation transcript:

Predicting Post-Operative Gait of Cerebral Palsy Patients
Movement Research Lab., Seoul National University

Motivation We want to predict the gait of post-operative patients to complement doctors' experience. Because the effect of orthopedic surgery does not appear immediately, it is difficult to predict what a patient's gait will be like after surgery, so decisions about surgery are made relying on the surgeon's experience and intuition. We conducted this study to complement those decisions by predicting the patient's post-operative motion: predicting how the gait will change can indicate whether surgery will help or only add pain and hardship. Decisions about which surgery to perform, on muscle or bone, used to be made from experience alone; presenting quantified values can support them.

Our goal Predicting post-operative gait by learning from pre- and post-operative gait data.

Motion predictor Learn a motion predictor from a training data set $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is a pre-operative gait (input) and $y_i$ is the corresponding post-operative gait (output). Given new input data, we generate a new motion using the learned predictor. When a new pre-operative gait x enters the system, the new post-operative gait is produced by the prediction model, and the prediction model is built from the training data through a regression process. [Diagram: new pre-operative gait → predictor → new post-operative gait; training data → regression process → predictor]

Regression process Let me look at the regression process in more detail. One patient's pre-operative gait cycle consists of a sequence of poses, and the same patient's post-operative gait cycle likewise consists of a sequence of poses. We perform regression using the pre- and post-operative gaits of many patients. To apply regression, the pre- and post-operative data must first be vectorized: from each pose we extract the orientations of all joints and stack them into a vector. We then regress from the i-th patient's whole pre-operative gait motion $X_i$ to the i-th patient's whole post-operative gait motion $Y_i$, based on a motion-to-motion mapping.
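As a concrete toy illustration of this vectorization step, here is a minimal Python sketch; the per-joint quaternion representation and frame count are assumptions for illustration, not the paper's exact parameterization:

```python
import numpy as np

def motion_to_vector(frames):
    """Stack the per-joint orientations of every frame of a gait cycle
    into one long vector (an x_i or y_i used for regression)."""
    # frames: list of (num_joints, 4) arrays, one quaternion per joint
    return np.concatenate([f.ravel() for f in frames])

# Toy example: 30 frames, 7 joints, one quaternion each -> 30 * 7 * 4 = 840 dims
frames = [np.random.randn(7, 4) for _ in range(30)]
x_i = motion_to_vector(frames)
print(x_i.shape)  # (840,)
```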

Canonical Correlation Analysis (CCA) Find pairs of bases that maximize the correlation between two variables in the reduced space. With such training data in hand, we consider CCA a good prediction model because it explains the dependency between input and output well, and can therefore minimize the prediction error. CCA finds pairs of bases that maximize the correlation between the two variables x and y in the subspace; when we perform regression in the reduced space, the fitting errors are minimized because the two variables are highly correlated there. $\bar{x}$ and $\bar{y}$ denote the reduced variables, obtained by projecting x and y onto the bases $w_x$ and $w_y$. After some substitution we arrive at the CCA equation, whose solution (the pair of bases) is obtained by singular value decomposition; please see our paper for the details of the derivation. Correlation:

$$\rho = \max_{w_x, w_y} \frac{w_x^\top C_{xy}\, w_y}{\sqrt{w_x^\top C_{xx}\, w_x}\,\sqrt{w_y^\top C_{yy}\, w_y}}$$
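A hedged illustration of this reduction using scikit-learn's CCA (not the authors' implementation); the toy X and Y below stand in for stacked pre- and post-operative gait vectors:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X = rng.standard_normal((68, 40))   # toy pre-operative gait vectors
Y = X @ rng.standard_normal((40, 30)) + 0.1 * rng.standard_normal((68, 30))

cca = CCA(n_components=5)
X_r, Y_r = cca.fit_transform(X, Y)  # reduced X and reduced Y

# In the reduced space, each pair of components is maximally correlated
for k in range(5):
    r = np.corrcoef(X_r[:, k], Y_r[:, k])[0, 1]
    print(f"component {k}: correlation = {r:.3f}")
```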

Sparse CCA Reformulation

CCA-based regression Linear regression between the pair of reduced data; reconstruction from the subspace back to the original space; combining these matrices produces the predictor. First, we project the training input and output data onto the bases computed by kernel CCA. We obtain a matrix A that links the reduced input to the reduced output by linear regression, and a matrix B that recovers the original output from the reduced output. By combining these matrices, we determine the predictor.
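A minimal sketch of this pipeline, assuming scikit-learn's linear CCA in place of the kernel variant and plain least squares for both A and B; the function names are mine, not the paper's:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def fit_cca_predictor(X, Y, n_components=5):
    """Follow the slide's recipe: regress in the CCA-reduced space
    (matrix A), then map back to the original output space (matrix B)."""
    cca = CCA(n_components=n_components).fit(X, Y)
    Xr, Yr = cca.transform(X, Y)
    A, *_ = np.linalg.lstsq(Xr, Yr, rcond=None)                  # reduced in -> reduced out
    B, *_ = np.linalg.lstsq(Yr, Y - Y.mean(axis=0), rcond=None)  # reduced out -> original out
    return cca, A, B, Y.mean(axis=0)

def predict_post_operative(model, X_new):
    cca, A, B, y_mean = model
    Xr_new = cca.transform(X_new)    # project onto the learned input basis
    return Xr_new @ A @ B + y_mean   # the combined matrices form the predictor
```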

Motion synthesis Orientations of all joints; projection onto the acquired basis; prediction matrix; pre-operative gait → post-operative gait. Here, I want to emphasize how we synthesize realistic motion. We take the input data from the new pre-operative gait, project it onto the acquired basis, and multiply it by the prediction matrix. Finally, we obtain the orientations of all joints of the body, and the final pose is generated by joint mapping.
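Usage of the sketch above, reusing the toy X and Y from the CCA example; x_new stands in for a new patient's pre-operative gait vector:

```python
# Train on the toy pre/post pairs, then predict for a new patient
model = fit_cca_predictor(X, Y, n_components=5)
x_new = X[:1]                                  # stand-in for a new pre-operative gait
y_pred = predict_post_operative(model, x_new)
print(y_pred.shape)                            # (1, 30): predicted post-operative gait vector
```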

Result - method comparison

GCD Data normalization

Design X & Y [Diagram: pre-operative pipeline C3D → GCD; post-operative pipeline C3D → GCD]

Result - feature graph

Thank You Q & A Thank you for listening to my presentation.

Data & Feature Many datasets have hundreds of variables, with many irrelevant and redundant ones. Features are variables obtained by removing the redundant and noisy variables from the data.

Advantages of feature selection Alleviating the effect of the curse of dimensionality; improving a learning algorithm's prediction performance; faster and more cost-effective; providing a better understanding of the data. Features extracted from the data have a lower dimension than the original data, which mitigates the curse of dimensionality and enables better prediction; it also speeds up computation and deepens our understanding of the data, as a later slide will explain.

L1 regularization An effective feature selection method. L1 norm: the sum of the absolute values of the components,

$$\|w\|_1 = \sum_i |w_i|.$$

L1 regularization The L1 term drives the solution toward sparsity. A new post-operative gait can then be estimated as a matrix-vector multiplication, e.g., $y = Wx$, where $W$ is learned by minimizing the fitting error plus an L1 sparsity term (in the standard Lasso form):

$$\min_W \; \|Y - WX\|_2^2 + \lambda \|W\|_1.$$
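A minimal sketch of this objective using scikit-learn's Lasso (a standard L1-regularized regressor, assumed here as a stand-in for the paper's solver); the planted sparse weights are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.standard_normal((68, 40))      # toy pre-operative features
w_true = np.zeros(40)
w_true[[3, 17]] = [0.4, 0.6]           # planted sparse ground truth
y = X @ w_true + 0.01 * rng.standard_normal(68)

# alpha weights the L1 sparsity term in ||y - Xw||^2 + alpha * ||w||_1
lasso = Lasso(alpha=0.05).fit(X, y)
print(np.flatnonzero(lasso.coef_))     # only a few coefficients survive
```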

Result

L1 regularization With the learned model $W$, we can fully explain the features for each body joint: features can be read off as the combination of the joint information corresponding to the non-zero terms in each row vector of the learned model, e.g., left knee position = 0.4 × left ankle position + 0.6 × pelvis position.
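Continuing the sketch above, the non-zero coefficients of a row can be read off directly; the joint labels here are hypothetical placeholders:

```python
# Hypothetical joint labels, for illustration only
names = [f"joint_{i}" for i in range(X.shape[1])]
for i in np.flatnonzero(lasso.coef_):
    print(f"{names[i]}: weight {lasso.coef_[i]:+.2f}")
```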

The problems The model cannot explain the nonlinear relationship between training input and output. [Figure: correspondence between a pre-operative patient's motion and a post-operative patient's motion]

Regression process [Figure: pose-to-pose regression, mapping each pose of the pre-operative gait to the corresponding pose of the post-operative gait]

Result – naïve, pose to pose

Minimizing prediction error Motion to motion: considering the relations between remote poses in the temporal domain.

Gait of cerebral palsy patient In addition, the legs may turn inward or outward. Source: http://www.youtube.com/watch?v=q7AokhnifG0

Treatments Rehabilitation of cerebral palsy: use of orthoses; injection therapy; neurosurgery; orthopedic surgery; medication.

What is Cerebral palsy? Cerebral palsy (뇌성마비): a permanent disorder affecting movement and posture, with limitation of activity, caused by non-progressive disturbances occurring in the fetal brain.

Orthopedic surgery Distal Hamstring Lengthening (DHL); Rectus Femoris Transfer (RFT); Tendo Achilles Lengthening (TAL); Femoral Derotation Osteotomy (FDO). [Figure: poses of cerebral palsy patients]

Related work “Predicting outcomes of rectus femoris transfer surgery” [Reinbolt et al. 2009]: selected a set of preoperative gait features that distinguished good from poor outcomes. “Evaluation of conventional selection criteria for lengthening for individuals with cerebral palsy” [Truong et al. 2011].

Related work [Chai and Hodgins 2005] [Slyper and Hodgins 2008] [Kim et al. 2012] [Seol et al. 2013] As technically related work, these studies generate character animation through learning: they learn from pre-recorded user inputs and the corresponding human motions, and then synthesize motion for new user inputs, using various input devices such as markers. Building on these learning techniques from the animation field, we aim to predict post-operative patient gait.

Motion data Number of patients: DHL+RFT+TAL: 35; FDO+DHL+TAL+RFT: 33. Seven joints in total: left foot, right foot, left femur, right femur, pelvis, left knee, right knee. For training we used the motion data of 68 patients in total, each of whom received three or four of the surgeries, captured before surgery and one to two years after surgery.

Naïve linear regression Direct regression analysis between pre- and post-operative gait: minimize the fitting error to obtain the predictor $A$,

$$\min_A \sum_i \|y_i - A x_i\|^2.$$

Before I jump into the details of our prediction model, I want to talk about naïve linear regression. The equation above minimizes the fitting errors between training input and output, yielding the linear predictor, matrix A. Although this formulation is very simple and easy to implement, the results can contain artifacts because of large estimation errors: the equation does not consider the data dependency between x and y. So the solution is to reduce the dimension of the training examples before entering the regression step. Problem: large prediction error.
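A minimal numpy sketch of this naive least-squares predictor, with toy dimensions; note the row-vector convention (X @ A ≈ Y), the transpose of the slide's A x_i:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((68, 40))   # rows are pre-operative gait vectors x_i
Y = rng.standard_normal((68, 30))   # rows are post-operative gait vectors y_i

A, *_ = np.linalg.lstsq(X, Y, rcond=None)   # minimizes sum_i ||y_i - x_i A||^2
Y_pred = X @ A
print(np.linalg.norm(Y - Y_pred))   # fitting error; prediction error on new data is larger
```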

Result: motion to motion + naïve

Minimizing prediction error Dimension reduction that fully explains the nonlinear relationship between training input and output: a nonlinear dimension reduction method.

PCA PCA: a maximum-variance projection method. Does it preserve data dependency? When we talk about dimension reduction, there are many popular statistical methods, and PCA is the most familiar choice. In terms of approximating the original data, PCA works well: it finds bases that maximize the variance of the data set. However, PCA might not be a good choice for regression. We have two variables x and y, the training input and output. When we apply PCA to each variable separately, we reduce the data by projecting onto each basis and can plot the relationship of the reduced x and y. But there is no guarantee about the data dependency of the two variables in the reduced space, so the prediction errors cannot be minimized. We therefore consider canonical correlation analysis, CCA, which finds two different bases, one for each variable, such that the correlation between the two data sets is maximized in the reduced space; this reduces the estimation errors associated with regression.
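A small numpy/scikit-learn demonstration of this point, under synthetic-data assumptions: reducing X and Y separately with PCA gives no guarantee that the leading components correlate, while CCA maximizes that correlation by construction:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 10))
Y = X[:, :3] @ rng.standard_normal((3, 8)) + 0.5 * rng.standard_normal((200, 8))

# PCA reduces each variable on its own: the leading components of X and Y
# need not be related, so their correlation is typically weak
px = PCA(n_components=2).fit_transform(X)
py = PCA(n_components=2).fit_transform(Y)
print("PCA:", np.corrcoef(px[:, 0], py[:, 0])[0, 1])

# CCA reduces both jointly and maximizes the cross-correlation
cx, cy = CCA(n_components=2).fit_transform(X, Y)
print("CCA:", np.corrcoef(cx[:, 0], cy[:, 0])[0, 1])
```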

Kernel CCA CCA may not fully explain the non-linear relationship between pre-operative and post-operative motion. Non-linear CCA using the kernel trick: transform the data into a high-dimensional space and substitute the non-linear mapping into CCA. One limitation of linear CCA is that it may not explain the non-linear relationship between input and output very well. Our key idea is to map the training data into a high-dimensional feature space: by applying a function φ to the training data, we transform it into the feature space, and by using the kernel trick we reformulate CCA as kernel CCA.
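Kernel CCA is not part of scikit-learn; below is a minimal regularized kernel CCA sketch in the spirit of Hardoon et al. [2004], assuming RBF kernels and a ridge-style regularizer (the paper's exact kernel and regularization are not specified here):

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    sq = (X ** 2).sum(axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def center(K):
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return J @ K @ J

def kernel_cca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized kernel CCA via the eigenproblem
    (Kx + kappa I)^-1 Ky (Ky + kappa I)^-1 Kx a = rho^2 a."""
    n = Kx.shape[0]
    M = (np.linalg.solve(Kx + kappa * np.eye(n), Ky)
         @ np.linalg.solve(Ky + kappa * np.eye(n), Kx))
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:n_components]
    alpha = vecs[:, order].real                    # dual coefficients for X
    corr = np.sqrt(np.clip(vals[order].real, 0.0, 1.0))
    return alpha, corr

rng = np.random.default_rng(4)
X, Y = rng.standard_normal((68, 40)), rng.standard_normal((68, 30))
Kx, Ky = center(rbf_kernel(X)), center(rbf_kernel(Y))
alpha, corr = kernel_cca(Kx, Ky)
X_reduced = Kx @ alpha   # training data projected onto the kernel basis
print(corr)              # kappa controls how much these correlations overfit
```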

Future work Design the training input & output with respect to the clinical context. Feature selection: alleviating the effect of the curse of dimensionality; improving prediction performance; faster and more cost-effective; providing a better understanding of the data. To do: summarize feature selection methods.