Preference Analysis
Joachim Giesen and Eva Schuberth
May 24, 2006
Outline
- Motivation
- Approximate sorting
  - Lower bound
  - Upper bound
- Aggregation
  - Algorithm
  - Experimental results
- Conclusion
Motivation
- Find the preference structure of a consumer w.r.t. a set of products.
- Common: assign a value function to the products; the value function determines a ranking of the products.
- Elicitation: pairwise comparisons.
- Problem: deriving a metric value function from non-metric information.
- We restrict ourselves to finding the ranking.
Motivation
- Find a ranking for every respondent individually.
- Efficiency measure: number of comparisons.
- Any comparison-based sorting algorithm needs Ω(n log n) comparisons in the worst case.
- As the set of products can be large, this is too much.
Motivation
Possible solutions:
- Approximation
- Aggregation
- Modeling and distribution assumptions
Approximation
(joint work with J. Giesen and M. Stojaković)
1. Lower bound (proof)
2. Algorithm
Approximation
- The consumer's true ranking of n products corresponds, after relabeling, to the identity (increasing) permutation id on {1, …, n}.
- Wanted: an approximation of the ranking, i.e. a permutation π on {1, …, n} such that the distance D(π, id) is small.
Metric on S_n
- Needed: a metric on S_n that is meaningful in the market research context.
- Spearman's footrule metric D: D(σ, τ) = Σ_{i=1..n} |σ(i) − τ(i)|.
- Note: the maximal footrule distance is ⌊n²/2⌋, so errors live on the scale n².
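In code the footrule metric is a one-liner; a small Python illustration (mine, not from the slides):

```python
def footrule(sigma, tau):
    """Spearman's footrule distance between two permutations,
    given as equal-length sequences of positions."""
    assert sorted(sigma) == sorted(tau)
    return sum(abs(s - t) for s, t in zip(sigma, tau))

# The reversal is the farthest permutation from the identity,
# at distance floor(n^2 / 2).
n = 6
identity = list(range(n))
print(footrule(identity, identity))        # 0
print(footrule(identity, identity[::-1]))  # 18 == floor(36 / 2)
```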
We show (writing the allowed error as r = n²/ν(n)):
- To approximate a ranking within expected distance r, at least n log₂ ν(n) − O(n) comparisons are necessary.
- O(n log ν(n)) comparisons are always sufficient.
Lower bound
Let A be a randomized approximate sorting algorithm. If for every input permutation the expected distance of the output to id is at most r, then A performs at least n log₂ ν(n) − O(n) comparisons in the worst case (again with ν(n) = n²/r).
Lower bound: proof
- Assume A performs fewer than n log₂ ν(n) − O(n) comparisons for every input.
- Fix a deterministic algorithm. Then for at least half of all input permutations the output is at distance more than 2r (shown on the following slides).
- Hence the expected distance on a uniformly random input is larger than r.
- By Yao's Minimax Principle there is an input permutation on which A's expected distance is larger than r. Contradiction.
Lower bound: Lemma
For r > 0 let B(id, r) = {π ∈ S_n : D(π, id) ≤ r} be the ball centered at id with radius r.
Lemma: |B(id, r)| ≤ 2^n · C(n+r, n), where C(·,·) denotes a binomial coefficient.
Lower bound: proof of Lemma
- If D(π, id) ≤ r, then π is uniquely determined, up to signs, by the sequence (|π(1) − 1|, …, |π(n) − n|) of non-negative integers, whose sum is at most r.
- The number of sequences of n non-negative integers whose sum is at most r is C(n+r, n).
- For a fixed sequence of non-negative integers (a_1, …, a_n), at most 2^n permutations satisfy |π(i) − i| = a_i for all i.
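Putting the three counting facts together yields the lemma; the following is my rendering of the computation, not verbatim slide content:

```latex
% A permutation \pi with D(\pi,\mathrm{id}) \le r determines the
% sequence a_i = |\pi(i)-i| with \sum_i a_i \le r.  By stars and bars
% (with one slack variable) there are \binom{n+r}{n} such sequences,
% and each is realized by at most 2^n permutations (sign choices):
\lvert B(\mathrm{id}, r)\rvert \;\le\; 2^n \binom{n+r}{n}.
```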
Lower bound: deterministic case
To show: for a fixed deterministic algorithm making k comparisons, the number of input permutations whose output is at distance more than 2r is more than n! − 2^k · |B(id, 2r)|.
- k comparisons partition the inputs into at most 2^k classes with the same outcome sequence.
- Inputs in the same class are indistinguishable to the algorithm, so they all receive the same output σ.
- Within one class, an input π whose output is within distance 2r satisfies D(σ, π) ≤ 2r, i.e. π lies in the ball B(σ, 2r), which has at most |B(id, 2r)| elements (all balls of equal radius have the same size, since D is right-invariant).
- Hence at most 2^k · |B(id, 2r)| input permutations have their output within distance 2r.
- Therefore at least n! − 2^k · |B(id, 2r)| input permutations have their output at distance more than 2r; for k below n log₂ ν(n) − O(n) this is more than half of all inputs.
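Combining this count with the Lemma shows where the bound on k comes from; the arithmetic below is my reconstruction (with r = n²/ν(n)), not verbatim slide content:

```latex
% If fewer than half of the n! inputs may be answered at distance
% more than 2r, then necessarily
2^k \cdot 2^n \binom{n+2r}{n} \;\ge\; \frac{n!}{2}.
% With r = n^2/\nu(n) one has \binom{n+2r}{n} \le (e(1+2n/\nu(n)))^n,
% and n! \ge (n/e)^n; taking logarithms gives
k \;\ge\; n \log_2 \nu(n) \;-\; O(n).
```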
Upper bound
An algorithm (suggested by Chazelle) approximates any ranking within distance n²/ν(n) with fewer than 6n log₂ ν(n) comparisons.
Algorithm
- Partition the elements into equal-sized bins: every element in a bin is smaller than any element in subsequent bins.
- No ordering of the elements within a bin.
- Output: a permutation consistent with the sequence of bins.
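A minimal Python sketch of the binning scheme (my illustration, not the authors' code; the linear-time median selection of Blum et al. is replaced by plain sorting for brevity):

```python
import random

def approx_sort(items, rounds):
    """After `rounds` rounds of splitting every bin at its median
    there are 2**rounds bins; every element in a bin is smaller than
    any element in later bins, and no order is guaranteed inside a bin."""
    bins = [list(items)]
    for _ in range(rounds):
        new_bins = []
        for b in bins:
            if len(b) <= 1:
                new_bins.append(b)
                continue
            b = sorted(b)             # stand-in for linear-time median selection
            mid = len(b) // 2
            new_bins.append(b[:mid])  # elements up to the median
            new_bins.append(b[mid:])  # elements above the median
        bins = new_bins
    return [x for b in bins for x in b]  # any bin-consistent order

xs = list(range(16))
random.shuffle(xs)
print(approx_sort(xs, 3))  # 8 bins of size 2; distance at most 16**2 / 2**3
```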
Algorithm
[Figure: the bins after rounds 0, 1 and 2.]
Analysis of algorithm
- m rounds yield 2^m bins. Output: a ranking consistent with the ordering of the bins.
- Running time: median search and partitioning of n elements take fewer than 6n comparisons (selection algorithm by Blum et al.); hence m rounds take fewer than 6nm comparisons.
- Distance: see the following theorem.
Algorithm: Theorem
Any ranking consistent with the bins computed in m rounds, i.e. with fewer than 6nm comparisons, has distance at most n²/2^m.
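The distance bound follows from a short computation (my reconstruction): an element can be displaced only within its own bin, so with 2^m bins of size n/2^m,

```latex
D(\pi, \mathrm{id}) \;\le\; \sum_{\text{bins } b} \lvert b \rvert^2
  \;=\; 2^m \left(\frac{n}{2^m}\right)^2 \;=\; \frac{n^2}{2^m}.
% Choosing m = \log_2 \nu(n) rounds, i.e. fewer than 6n\log_2\nu(n)
% comparisons, gives distance at most n^2/\nu(n).
```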
Approximation: summary
For sufficiently large error, fewer comparisons than for exact sorting:
- error n²/c for a constant c: O(n) comparisons
- error n²/ν(n) in general: O(n log ν(n)) comparisons
For real applications this is still too much: individual elicitation of the value function is not possible.
Second approach: aggregation.
Aggregation
(joint work with J. Giesen and D. Mitsche)
Motivation:
- We think that the population splits into preference/customer types.
- Respondents answer according to their type (but deviations are possible).
- Instead of individual preference analysis or aggregation over the whole population: aggregate within customer types.
Aggregation
Idea:
- Ask only a constant number of questions (pairwise comparisons) per respondent.
- Ask many respondents.
- Cluster the respondents according to their answers into types.
- Aggregate the information within a cluster to get type rankings.
Philosophy: first segment, then aggregate.
Algorithm
The algorithm works in 3 phases:
(1) Estimate the number k of customer types.
(2) Segment the respondents into the k customer types.
(3) Compute a ranking for each customer type.
Algorithm
Every respondent performs a fixed number of pairwise comparisons.
Basic data structure: matrix A = [a_ij], where entry a_ij ∈ {−1, 1, 0} refers to respondent i and the j-th product pair (x, y):
- a_ij = 1 if respondent i prefers x over y
- a_ij = −1 if respondent i prefers y over x
- a_ij = 0 if respondent i has not compared x and y
Algorithm
Define B = AAᵀ. Then B_ij = (number of product pairs on which respondents i and j agree) − (number of pairs on which they disagree), not counting 0's.
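As a toy illustration (data made up: six respondents answering six product pairs, with two planted types), the matrices A and B in numpy:

```python
import numpy as np

# Rows: respondents; columns: product pairs (x, y).
# +1 = prefers x over y, -1 = prefers y over x, 0 = pair not compared.
# Respondents 0-2 answer like one type, respondents 3-5 like another.
A = np.array([
    [ 1, -1,  1, -1,  1, -1],
    [ 1, -1,  1, -1,  1,  0],
    [ 1, -1,  1,  0,  1, -1],
    [ 1,  1, -1, -1,  1,  1],
    [ 1,  1, -1, -1,  0,  1],
    [ 0,  1, -1, -1,  1,  1],
])

# B[i, j] = #pairs where respondents i and j agree minus #pairs where
# they disagree; 0-entries drop out of the inner product automatically.
B = A @ A.T
print(B)
```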
Algorithm: phase 1
Phase 1: estimate the number k of customer types.
- Use the matrix B and analyze its spectrum.
- We expect the k largest eigenvalues of B to be substantially larger than the other eigenvalues.
- Search for a gap in the eigenvalues.
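A sketch of the gap search, continuing the toy example above; the largest-consecutive-gap heuristic is my choice of implementation:

```python
# B = A A^T is symmetric positive semidefinite, so its eigenvalues
# are real and non-negative; sort them in descending order.
eigvals = np.sort(np.linalg.eigvalsh(B))[::-1]

# Heuristic: estimate k by the largest gap between consecutive
# eigenvalues.
k = int(np.argmax(eigvals[:-1] - eigvals[1:])) + 1
print(eigvals.round(2), "estimated k =", k)   # two dominant eigenvalues
```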
Algorithm: phase 2
Phase 2: cluster the respondents into customer types.
- Use again the matrix B.
- Compute the projector P onto the space spanned by the eigenvectors of the k largest eigenvalues of B.
- Every respondent corresponds to a column of P.
- Cluster the columns of P.
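Continuing the toy example; using k-means to cluster the columns is one reasonable instantiation of the clustering step, not necessarily the authors':

```python
from scipy.cluster.vq import kmeans2

# Eigenvectors of the k largest eigenvalues of B
# (np.linalg.eigh returns eigenvalues in ascending order).
_, V = np.linalg.eigh(B)
Vk = V[:, -k:]

# Orthogonal projector onto their span; respondent i is column i of P.
P = Vk @ Vk.T

# Cluster the columns of P into k customer types.
_, labels = kmeans2(P.T, k, minit="++", seed=0)
print(labels)   # e.g. [0 0 0 1 1 1]
```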
Algorithm: phase 2
Intuition for using the projector – an example on graphs:
Algorithm: phase 2
Ad =
01100000
10010000
10010000
01100010
00000101
00001001
00010001
00001110
Algorithm: phase 2
P =
 2.0  2.1  2.1  2.4 -0.6 -0.6  0.9 -0.3
 2.1  2.2  2.2  2.5 -0.4 -0.4  1.0 -0.1
 2.1  2.2  2.2  2.5 -0.4 -0.4  1.0 -0.1
 2.4  2.5  2.5  3.0  0.0  0.0  1.5  0.5
-0.6 -0.4 -0.4  0.0  2.7  2.7  1.4  3.1
-0.6 -0.4 -0.4  0.0  2.7  2.7  1.4  3.1
 0.9  1.0  1.0  1.5  1.4  1.4  1.8  1.8
-0.3 -0.1 -0.1  0.5  3.1  3.1  1.8  3.7
Algorithm: phase 2
P’ =
11110000
11110000
11110000
11110010
00001101
00001111
00011111
00001111
Algorithm: phase 2
[Figure: embedding of the columns of P.]
Algorithm: phase 3
Phase 3: compute the ranking for each type.
- For each type t compute the characteristic vector c_t: the i-th entry of c_t is 1 if respondent i belongs to that type, and 0 otherwise.
- For each type t compute Aᵀc_t. If the entry for product pair (x, y) is
  positive: x is preferred over y by type t,
  negative: y is preferred over x by type t,
  zero: type t is indifferent.
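Continuing the toy example (A, k and labels from above):

```python
# Aggregate the answers of each type and read off its preferences.
for t in range(k):
    c_t = (labels == t).astype(float)   # characteristic vector of type t
    agg = A.T @ c_t                     # aggregated answer per product pair
    # sign: +1 -> x over y, -1 -> y over x, 0 -> type t is indifferent
    print(f"type {t}:", np.sign(agg).astype(int))
```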
Experimental study
On real-world data: 21 data sets from Sawtooth Software, Inc. (conjoint data sets).
Questions:
- Do real populations decompose into different customer types?
- Comparison of our algorithm to Sawtooth's algorithm.
Conjoint structures
- Attributes: sets A_1, …, A_n with |A_i| = m_i.
- An element of A_i is called a level of the i-th attribute.
- A product is an element of A_1 × … × A_n.
Example from a practical conjoint study: Car
- Number of seats = {5, 7}
- Cargo area = {small, medium, large}
- Horsepower = {240hp, 185hp}
- Price = {$29000, $33000, $37000}
- …
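The product space is just the Cartesian product of the level sets; for the car fragment above (my illustration):

```python
from itertools import product

attributes = {
    "seats": [5, 7],
    "cargo area": ["small", "medium", "large"],
    "horsepower": ["240hp", "185hp"],
    "price": ["$29000", "$33000", "$37000"],
}

# Each product is one combination of levels, an element of A_1 x ... x A_n.
products = list(product(*attributes.values()))
print(len(products))   # 2 * 3 * 2 * 3 = 36 products
print(products[0])     # (5, 'small', '240hp', '$29000')
```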
Quality measures
Difficulty: we do not know the real type rankings, so we cannot directly measure the quality of the result. Other quality measures:
- Number of inverted pairs: for types i and j, the average number of inversions in the partial rankings of the respondents in type i with respect to the j-th type ranking.
- Deviation probability p.
- Hit rate (leave-one-out experiments).
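As an illustration (mine, not from the slides), counting the inversions of a respondent's partial ranking against a type ranking:

```python
from itertools import combinations

def inversions(partial, type_ranking):
    """Count pairs ordered one way in `partial` but the other way in
    `type_ranking`; both list products best-first, and `partial` may
    contain only a subset of the products."""
    pos = {prod: i for i, prod in enumerate(type_ranking)}
    return sum(
        1
        for a, b in combinations(partial, 2)  # a ranked above b
        if pos[a] > pos[b]                    # ...but below b in the type ranking
    )

print(inversions(["x", "z", "y"], ["x", "y", "z", "w"]))  # 1 (the pair z, y)
```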
Study 1
# respondents = 270; size of study: 8 x 3 x 4 = 96; # questions = 20
[Figure: largest eigenvalues of matrix B.]
Study 1
# respondents = 270; size of study: 8 x 3 x 4 = 96; # questions = 20
Two types; size of clusters: 179 – 91
Number of inversions and deviation probability:
        Ranking for type 1   Ranking for type 2   1-p
Type 1  0.19                 3.33                 0.95%
Type 2  2.28                 0.75                 3.75%
Study 1
# respondents = 270; size of study: 8 x 3 x 4 = 96; # questions = 20
Hit rates: Sawtooth: ?; our algorithm: 69%
Study 2
# respondents = 539; size of study: 4 x 3 x 3 x 5 = 180; # questions = 30
[Figure: largest eigenvalues of matrix B.]
Study 2
# respondents = 539; size of study: 4 x 3 x 3 x 5 = 180; # questions = 30
Four types; size of clusters: 81 – 119 – 130 – 209
Number of inversions and deviation probability:
        Ranking type 1   Ranking type 2   Ranking type 3   Ranking type 4   1-p
Type 1  0.44             6.77             5.11             6.53             1.5%
Type 2  5.58             0.92             6.92             7.98             3.1%
Type 3  3.56             6.1              0.84             5.67             2.8%
Type 4  3.56             5.08             4.25             1.16             3.9%
Study 2
# respondents = 539; size of study: 4 x 3 x 3 x 5 = 180; # questions = 30
Hit rates: Sawtooth: 87%; our algorithm: 65%
Study 3
# respondents = 1184; size of study: 9 x 6 x 5 = 270; # questions = 48
Size of clusters: 6 – 3 – 1164 – 8 – 3
Size of clusters: 3 – 1175 – 6; 1-p = 12%
[Figure: largest eigenvalues of matrix B.]
Study 3
# respondents = 1184; size of study: 9 x 6 x 5 = 270; # questions = 48
Hit rates: Sawtooth: 78%; our algorithm: 62%
Study 4
# respondents = 300; size of study: 6 x 4 x 6 x 3 x 2 = 3456; # questions = 40
[Figure: largest eigenvalues of matrix B.]
Study 4
# respondents = 300; size of study: 6 x 4 x 6 x 3 x 2 = 3456; # questions = 40
Hit rates: Sawtooth: 85%; our algorithm: 51%
Aggregation - Conclusion
- Segmentation seems to work well in practice.
- Hit rates are not good; reason: the information is too sparse.
- Additional assumptions are necessary: exploit the conjoint structure, or make distribution assumptions.
Thank you!
Yao's Minimax Principle
Let I be a finite set of input instances, A a finite set of deterministic algorithms, and C(i, a) the cost of algorithm a on input i, where i ∈ I and a ∈ A. Then for all distributions p over I and q over A:
min_{a ∈ A} E[C(i_p, a)] ≤ max_{i ∈ I} E[C(i, a_q)],
i.e. the expected cost of the best deterministic algorithm against the input distribution p lower-bounds the worst-case expected cost of the randomized algorithm given by q.