Rank-Based Approach to Optimal Score via Dimension Reduction
Shao-Hsuan Wang
National Taiwan University, Taiwan
Nov 2015

Rank-based measures
Kendall's $\tau$
Concordance index
Rank correlation
Widely used in medical statistics, epidemiology, economics, sociology, and other fields.

Rank-based measures
Regression model
$Y$: a univariate response
$Z = (Z_1, \ldots, Z_p)$: multiple covariates

Rank-based measures YRYR Response Y TZTZ TZRTZR Composite score 4

Rank-based measures
For a pair of observations $(Y_1, \beta^T Z_1)$ and $(Y_2, \beta^T Z_2)$:
concordant: $Y_1 > Y_2$ and $\beta^T Z_1 > \beta^T Z_2$, or $Y_1 < Y_2$ and $\beta^T Z_1 < \beta^T Z_2$
discordant: $Y_1 > Y_2$ and $\beta^T Z_1 < \beta^T Z_2$, or $Y_1 < Y_2$ and $\beta^T Z_1 > \beta^T Z_2$
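As a minimal sketch of the definition above, a pair can be classified from the signs of the two differences. The function names `score` and `classify_pair` and the coefficient vector `beta` are illustrative assumptions, not part of the talk.

```python
def score(beta, z):
    """Linear composite score beta^T z for one covariate vector."""
    return sum(b * x for b, x in zip(beta, z))


def classify_pair(y1, z1, y2, z2, beta):
    """Label a pair of observations as concordant, discordant, or tied."""
    product = (y1 - y2) * (score(beta, z1) - score(beta, z2))
    if product > 0:
        return "concordant"
    if product < 0:
        return "discordant"
    return "tied"  # ties fall under neither definition


beta = [1.0, -0.5]  # hypothetical coefficients, chosen only for the example
print(classify_pair(3.0, [2.0, 1.0], 1.0, [0.5, 1.0], beta))
print(classify_pair(3.0, [0.0, 1.0], 1.0, [2.0, 1.0], beta))
```

The product of the two differences is positive exactly when both orderings agree, which is why a single sign check covers both concordant cases.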

Rank-based measures
Kendall's $\tau$: $\tau = P(\text{concordant}) - P(\text{discordant})$ for the pair $(Y_1, \beta^T Z_1)$, $(Y_2, \beta^T Z_2)$
Rank correlation: $rc = P(Y_1 > Y_2,\ \beta^T Z_1 > \beta^T Z_2)$
Concordance index: $CI = P(\beta^T Z_1 > \beta^T Z_2 \mid Y_1 > Y_2)$
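The three measures have simple sample analogues obtained by counting pairs. The sketch below is one plausible way to compute them; the function name `rank_measures` and the toy data are assumptions, and Kendall's $\tau$ is computed in its tau-a form (no tie correction).

```python
from itertools import permutations


def rank_measures(y, s):
    """Empirical Kendall's tau, rank correlation, and concordance index."""
    n = len(y)
    concordant = discordant = ordered = 0
    # Visit ordered pairs (i, j) and keep those with y[i] > y[j], so each
    # unordered pair with distinct responses is counted exactly once.
    for i, j in permutations(range(n), 2):
        if y[i] > y[j]:
            ordered += 1
            if s[i] > s[j]:
                concordant += 1
            elif s[i] < s[j]:
                discordant += 1
    n_pairs = n * (n - 1) / 2
    tau = (concordant - discordant) / n_pairs  # Kendall's tau (tau-a)
    rc = concordant / (n * (n - 1))            # estimates P(Y1 > Y2, s1 > s2)
    ci = concordant / ordered                  # estimates P(s1 > s2 | Y1 > Y2)
    return tau, rc, ci


y = [1.0, 2.0, 3.0, 4.0]
s = [1.0, 3.0, 2.0, 4.0]  # composite scores, e.g. beta^T z (toy values)
print(rank_measures(y, s))
```

The function assumes at least one pair with distinct responses; otherwise the concordance index is undefined.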

Rank-based measures YRYR Response Y TZTZ TZRTZR Composite score 7

Rank-based measures
A monotonic association between $Y$ and $\beta^T Z$ may not exist!

Motivation

Composite score TZTZ g (Z) measurable functions 10

C-max YRYR Response Concordance-index function : C ( g )  P ( g ( Z g(Z)Rg(Z)R Composite score )  g ( Z )| Y  Y ) C 12121212 (g)(g) C-max : max  sup gFgF c Optimal score : m ( Z ) such that m  sup C ( g ) g  F 11 c

Intrinsic model behind Rank-based measures
M1: Distributional assumption: Generalized Regression Model (Han 1987)
M2: Structural assumption: Dimension Reduction (Li 1991, Cook 1991)

Intrinsic model behind Rank-based measures
M1: $Y = D \circ G(m_{d_0}(Z), \varepsilon)$
$D$: a non-degenerate monotonic function on $\mathbb{R}$
$G$: an unspecified bivariate function, strictly increasing in each component when the other one is fixed

Intrinsic model behind Rank-based measures
M2: $Y = D \circ G(m_{d_0}(Z), \varepsilon)$
$m_{d_0}$: a multivariate polynomial of unknown degree $d_0$

Intrinsic model behind Rank-based measures
M2: Dimension reduction
$m_{d_0}(Z) = m_{d_0 k_0}(B_0^T Z)$
(1) $d_0$ is the smallest degree such that $Y \perp\!\!\!\perp Z \mid m_{d_0}(Z)$
(2) $B_0 = \{\beta_{01}, \ldots, \beta_{0 k_0}\}$ is a basis of the central subspace (CS)

Model Flexibility
Linear regression model: $Y = \beta_0^T Z + \varepsilon$
Binary choice model: $Y = I(\beta_0^T Z + \varepsilon > 0)$
Accelerated failure time model: $\log(Y) = \beta_0^T Z + \varepsilon$
Generalized linear regression model (GLM)
Non-monotonic regression model: $Y = (\beta_0^T Z)^2 + \varepsilon$
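A small simulation can make the listed models concrete. The sketch below generates responses from four of them using a shared linear index; the standard normal error, the coefficient values, and all names (`MODELS`, `simulate`, `beta0`) are assumptions for illustration, and the GLM case is left out because it needs a choice of family and link.

```python
import math
import random


def linear_index(beta, z):
    """Linear index beta^T z."""
    return sum(b * x for b, x in zip(beta, z))


# Each entry maps the index u = beta^T z and an error e to a response.
MODELS = {
    "linear": lambda u, e: u + e,
    "binary_choice": lambda u, e: 1 if u + e > 0 else 0,
    "aft": lambda u, e: math.exp(u + e),  # equivalent to log(Y) = u + e
    "non_monotonic": lambda u, e: u ** 2 + e,
}


def simulate(model, beta, covariates, rng):
    """Draw one response per covariate vector with N(0, 1) errors (assumed)."""
    link = MODELS[model]
    return [link(linear_index(beta, z), rng.gauss(0.0, 1.0)) for z in covariates]


rng = random.Random(0)
beta0 = [1.0, -2.0]  # hypothetical true coefficients
covariates = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(5)]
for name in MODELS:
    print(name, simulate(name, beta0, covariates, rng))
```

The first three models are monotone in the index, so a linear score can rank responses well; the last one is not, which is where a polynomial score becomes useful.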

Types of covariates
All discrete but continuous covariates
Covariates whose moments may not exist

Theories
Propositions:
(1) Existence: $m_{d_0}(Z) = \arg\max_g C(g)$
(2) Uniqueness: $f_{d_0}(Z) = \arg\max_g C(g) \Rightarrow f_{d_0}(Z) = c_1 m_{d_0}(Z) + c_2$, for a polynomial $f_{d_0}(z)$ of degree $d_0$
(3) Optimality: $g(Z) = \arg\max_g C(g) \Rightarrow g(Z) = T(m_{d_0}(Z))$ for some monotonic function $T$

Summary TZTZ could not be the best composite score Model flexibility Various types of covariates Optimal score : existence, uniqueness, and optimality 20

How to estimate
$d_0$: structural degree
$k_0$: structural dimension
$\mathcal{S}(B_0)$: the central subspace
$m_{d_0 k_0}(B_0^T Z)$: the optimal score
$C_{\max}$: the C-max

Estimation Procedure

Step 1: Derive $m_d(Z)$ by maximizing the concordance index function via the generalized single-index form of the polynomial.
Tips:
(1) $m_d(Z) = \sum_{r=0}^{d} \sum_{r_1 + \cdots + r_p = r} c_{r_1 \cdots r_p} \prod_{j=1}^{p} Z_j^{r_j} = \theta^T Z_d$
(2) $C_n(m_d(Z)) = C_n(\theta) = \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(\theta^T Z_{d,i} > \theta^T Z_{d,j},\ Y_i > Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(Y_i > Y_j)}$
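One way to read the single-index form is that a polynomial in the covariates is linear in the vector of their monomials, so the empirical concordance can be evaluated for any coefficient vector. The sketch below builds that monomial vector and evaluates the criterion; the names `monomials` and `empirical_concordance`, the ordering of the monomials, and the toy data are assumptions. The constant term is omitted because it does not change any ranking.

```python
from itertools import combinations_with_replacement
from math import prod


def monomials(z, degree):
    """All monomials of z with total degree 1..degree (constant omitted)."""
    feats = []
    for r in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(z)), r):
            feats.append(prod(z[j] for j in idx))
    return feats


def empirical_concordance(theta, features, y):
    """Empirical concordance index of the score theta^T (monomial vector)."""
    scores = [sum(t * f for t, f in zip(theta, row)) for row in features]
    num = den = 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                den += 1
                num += scores[i] > scores[j]
    return num / den


# Toy data: the response is the product of two covariates.
z_values = [[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [2.0, 2.0], [3.0, 3.0], [0.5, 1.0]]
y = [z[0] * z[1] for z in z_values]
features = [monomials(z, 2) for z in z_values]  # z1, z2, z1^2, z1*z2, z2^2

theta_cross = [0.0, 0.0, 0.0, 1.0, 0.0]  # puts all weight on z1*z2
print(empirical_concordance(theta_cross, features, y))
```

Because the criterion is a step function of the coefficients, maximizing it calls for a search method that does not rely on gradients; this block only evaluates it.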

Estimation Procedure
Step 2: Apply the outer gradient approach to obtain $\hat{B}_k$.
Tips:
(1) $m_{d_0}(u) = m_{d_0 k_0}(B_0^T u)$
(2) $\mathcal{S}(B_0) = \mathrm{col}\left( \int_{u \in \mathbb{R}^p} \nabla m_{d_0}(u) \, (\nabla m_{d_0}(u))^T \, dW(u) \right)$
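The idea behind the outer gradient matrix can be checked numerically: if a function depends on $u$ only through $B^T u$, every gradient lies in the column space of $B$. The sketch below is a simplified illustration for a single direction ($k = 1$), with a known function standing in for the fitted polynomial, finite-difference gradients, equal weights on a small grid in place of $dW(u)$, and power iteration for the leading eigenvector. All of these choices and names are assumptions.

```python
import math


def gradient(f, u, h=1e-5):
    """Central-difference gradient of f at the point u."""
    grad = []
    for j in range(len(u)):
        up = list(u)
        dn = list(u)
        up[j] += h
        dn[j] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def outer_gradient_matrix(f, points):
    """Average of grad f(u) grad f(u)^T over the evaluation points."""
    p = len(points[0])
    m = [[0.0] * p for _ in range(p)]
    for u in points:
        g = gradient(f, u)
        for a in range(p):
            for b in range(p):
                m[a][b] += g[a] * g[b] / len(points)
    return m


def leading_direction(m, iters=200):
    """Unit leading eigenvector of a symmetric PSD matrix by power iteration."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = [sum(row[b] * v[b] for b in range(len(v))) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


b_true = [2.0, 1.0, 0.0]  # hypothetical direction


def m_poly(u):
    index = sum(b * x for b, x in zip(b_true, u))
    return index ** 2 + 3 * index  # polynomial in the single index b^T u


grid = [[x, y, w] for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0) for w in (-1.0, 0.0, 1.0)]
direction = leading_direction(outer_gradient_matrix(m_poly, grid))
print(direction)
```

For $k > 1$ a full eigendecomposition would be needed to recover the leading $k$ directions; power iteration is used here only to keep the example dependency-free.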

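Estimation Procedure
Step 3: Derive the estimator of $m_{dk}(B_k^T Z)$.
Tips:
(1) $\tilde{Z} = \hat{B}_k^T Z$
(2) $\hat{\theta} = \arg\max_{\theta} \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(\theta^T \tilde{Z}_{d,i} > \theta^T \tilde{Z}_{d,j},\ Y_i > Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(Y_i > Y_j)}$
(3) $\hat{m}_{dk}(\hat{B}_k^T Z) = \hat{\theta}^T \tilde{Z}_d$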
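The step above can be sketched for the simplest case: project the covariates onto a given basis, then search for the coefficient direction that maximizes the empirical concordance. The version below assumes $k = 2$ and a degree-one score, and replaces the optimizer with a plain grid search over angles, which is a stand-in chosen for transparency rather than the method used in the talk. The basis `b_hat`, the data, and all function names are hypothetical.

```python
import math


def project(b, z):
    """Reduced covariates B^T z, with B stored as p rows of length k."""
    k = len(b[0])
    return [sum(b[r][c] * z[r] for r in range(len(z))) for c in range(k)]


def concordance(y, scores):
    """Empirical concordance index of the scores with respect to y."""
    num = den = 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                den += 1
                num += scores[i] > scores[j]
    return num / den


def grid_search_direction(y, reduced, steps=360):
    """Best unit vector (cos a, sin a) on an angular grid, first maximizer kept."""
    best_angle, best_c = None, -1.0
    for t in range(steps):
        a = 2 * math.pi * t / steps
        scores = [math.cos(a) * u[0] + math.sin(a) * u[1] for u in reduced]
        c = concordance(y, scores)
        if c > best_c:
            best_angle, best_c = a, c
    return best_angle, best_c


b_hat = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical estimated basis (p=3, k=2)
z_data = [[0, 0, 5], [1, 0, -1], [0, 1, 2], [1, 1, 0], [2, 0, 3], [0, 3, 1]]
reduced = [project(b_hat, z) for z in z_data]
y = [2 * u[0] + u[1] for u in reduced]  # toy response, monotone in 2*u1 + u2

angle, c_best = grid_search_direction(y, reduced)
print(math.degrees(angle), c_best)
```

Since the concordance index depends only on the ordering of the scores, a whole range of directions attains the maximum on a finite sample; the grid search simply returns the first one it meets.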

Estimation Procedure
Step 4: Adopt the concordance-based generalized BIC to estimate $d_0$, $k_0$, $\mathcal{S}(B_0)$, $m_{d_0 k_0}(B_0^T Z)$, and $C_{\max}$.
Tips:
(1) $IC(d, k)$ is the fitted concordance index $C_n(\hat{m}_{dk}(\hat{B}_k^T Z))$ minus a penalty that depends on $\log n$, $n$, $d$, and $k$, with $IC(0, k) = 1/2$
(2) $(\hat{d}, \hat{k}) = \arg\max_{d \ge 0,\ 1 \le k \le p} IC(d, k)$
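The selection rule can be sketched as a search over candidate pairs $(d, k)$ that trades fitted concordance against model size. The penalty used below, $\log(n)/(2n)$ times the number of non-constant monomials of degree at most $d$ in $k$ variables, is an assumed placeholder and not necessarily the penalty from the talk; the fitted concordance values in `c_table` are made-up numbers for illustration only.

```python
from math import comb, log


def n_terms(d, k):
    """Number of non-constant monomials of degree <= d in k variables."""
    return comb(k + d, d) - 1


def information_criterion(c_hat, n, d, k):
    """Concordance minus an assumed BIC-type penalty; IC(0, k) is fixed at 1/2."""
    if d == 0:
        return 0.5
    return c_hat - (log(n) / (2 * n)) * n_terms(d, k)  # placeholder penalty


def select_model(c_table, n):
    """Pick the (d, k) with the largest criterion value."""
    return max(c_table, key=lambda dk: information_criterion(c_table[dk], n, dk[0], dk[1]))


# Hypothetical fitted concordance indices for a sample of size n = 1000.
c_table = {(0, 1): 0.5, (1, 1): 0.70, (2, 1): 0.80, (2, 2): 0.805, (3, 1): 0.801}
print(select_model(c_table, n=1000))
```

With these made-up values the small gains from moving beyond $(d, k) = (2, 1)$ do not cover the extra penalty, so the more parsimonious model is chosen.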

Asymptotic results
Consistent model selection: the parsimonious model $(d_0, k_0)$ among the class of correct models
$\sqrt{n}$-consistency of estimators of $\mathcal{S}(B_0)$ and $m_{d_0 k_0}(B_0^T Z)$
Asymptotic normality of estimators of $C_{\max}$

Wine Data
Vinho verde wine: red wine and white wine (from the Minho region of northern Portugal)
Collected from May 2004 to February 2007
Red wine: sample size $n = 1599$
White wine: $n = 4898$
Physicochemical and sensory tests

Wine data
Response (Y): preference, from 0 (bad) to 10 (excellent)
11 covariates (Z): fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol

Thank You !
