1 Quantifying Opinion about a Logistic Regression using Interactive Graphics
Paul Garthwaite, The Open University
Joint work with Shafeeqah Al-Awadhi
2 Introduction/Plan This work arose from a practical problem in logistic regression. The theory extends easily to eliciting opinion about the link function of any GLM, and I will outline the method for GLMs in general. The motivating problem has some additional (commonly occurring) structure that the elicitation method exploits. Interactive computing is used to elicit opinion. Prior models can be formed that aim to allow a small amount of data to correct some potential systematic biases in the assessments. Results for the practical problem will be given.
3 Motivating Example The task is to model the habitat distribution of fauna in south-east Queensland (bats, birds, mammals, etc.). Available information: environmental attributes in a GIS database; sample information on presence/absence at 300 to 400 sites; background knowledge of ecologists. The ecologists have seen the bat (say) in various locations, but this information is difficult to use in a traditional statistical analysis because it was not obtained from any sampling scheme. Prob(presence) = f(environmental attributes)
4 Continuous variables: elevation; quarterly rainfall and temperatures; canopy cover; slope; aspect. Factors: land type; vegetation; forest structure; logging; grazing; etc. A workshop with 15 ecologists indicated unimodal or monotonic relationships between each attribute and the probability of presence, and independence between attributes in their effects on that probability.
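5 Generalised Linear Model (GLM) The model has the form $g[\mu] = \eta$, where $\mu = E(Y \mid \mathbf{x})$, $\eta$ is the linear predictor and g[.] is the link function. For logistic regression, $g[\mu] = \log\{\mu/(1-\mu)\}$ and $\mu$ is the probability of presence. $\mathbf{x} = (x_1, \ldots, x_m)'$ is the vector of predictor variables. From the ith predictor variable, $x_i$, a vector of explanatory variables $\mathbf{z}_i$ is constructed such that we have the linear equation $\eta = \beta_0 + \sum_{i=1}^{m} \boldsymbol{\beta}_i' \mathbf{z}_i$.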
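A minimal Python sketch of the logit link and a linear predictor of this kind, assuming each predictor's explanatory variables and coefficients are supplied as plain lists. The function names (logit, inv_logit, prob_presence) and the example numbers are hypothetical and are not taken from the authors' software.

```python
import math
from typing import Sequence


def logit(mu: float) -> float:
    """Logit link g[mu] = log(mu / (1 - mu)), defined for 0 < mu < 1."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    return math.log(mu / (1.0 - mu))


def inv_logit(eta: float) -> float:
    """Inverse link: maps a linear predictor back to a probability."""
    # Two branches avoid overflow in math.exp for large |eta|.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def prob_presence(beta0: float,
                  betas: Sequence[Sequence[float]],
                  zs: Sequence[Sequence[float]]) -> float:
    """Probability of presence from g[mu] = beta0 + sum_i beta_i' z_i.

    betas[i] and zs[i] are the coefficient vector and the explanatory-variable
    vector built from the i-th predictor (hypothetical layout).
    """
    if len(betas) != len(zs):
        raise ValueError("need one coefficient vector per predictor")
    eta = beta0
    for b, z in zip(betas, zs):
        if len(b) != len(z):
            raise ValueError("coefficient and explanatory vectors differ in length")
        eta += sum(bj * zj for bj, zj in zip(b, z))
    return inv_logit(eta)


# Hypothetical example: two predictors, the second with two explanatory variables.
p = prob_presence(-1.0, [[0.5], [0.2, -0.3]], [[2.0], [1.0, 1.0]])
print(round(p, 4))
```

Here the linear predictor is -1 + 0.5(2) + 0.2(1) - 0.3(1) = -0.1, so the probability of presence is just under one half.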
6 Define: and then is a linear function of
7 Factors: One factor level (the best one, say) is chosen as the reference level. Each other level is given a dummy 0/1 variable that equals 1 for that level and 0 for all other levels.
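The reference-level coding can be sketched as follows. This is a minimal illustration, assuming a factor value is stored as a string label; the levels of the "logging" factor used below are hypothetical examples, not the levels recorded in the study.

```python
from typing import Dict, Sequence


def dummy_code(level: str, levels: Sequence[str], reference: str) -> Dict[str, int]:
    """0/1 dummy variables for one observation of a factor.

    Every level except the reference gets its own indicator, equal to 1 for
    that level and 0 otherwise, so the reference level is coded as all zeros.
    """
    if reference not in levels:
        raise ValueError("reference level must be one of the factor levels")
    if level not in levels:
        raise ValueError(f"unknown level {level!r}")
    return {lv: int(level == lv) for lv in levels if lv != reference}


# Hypothetical levels for the factor "logging", with "none" as the reference.
levels = ["none", "selective", "clearfell"]
print(dummy_code("selective", levels, reference="none"))
print(dummy_code("none", levels, reference="none"))
```

Because the reference level has all dummies equal to 0, each dummy's coefficient measures the change on the link scale relative to that level.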
8 The sampling model is Let For the prior distribution we put The values of the parameters in red must be chosen by the expert to represent his or her opinions.
9 Assessing medians and quartiles. These are the fundamental assessment tasks that the expert performs. Example: how far is it from Aberdeen to Southampton? [Figure: a distance scale marked at the lower quartile (470 miles), median (525 miles) and upper quartile (600 miles), dividing it into four intervals of probability 25% each.] The median (blue) is assessed first, and then the lower and upper quartiles (red). The ecologists practised these tasks during preparatory training.
10 Eliciting and and. Also, at the reference point. The expert assesses, the median of at this point. (For logistic regression is the probability of presence.) We put. The expert also assesses the lower and upper quartiles and. We put
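Because the link function is monotone, quantiles of the probability map directly to quantiles on the link scale. If, as an assumption here, opinion on the logit scale is represented by a normal distribution, the logit of the assessed median gives its mean, and the logit-scale interquartile range divided by 2(0.6745), about 1.349, gives its standard deviation. The slide's exact formulas did not survive extraction, so treat this Python sketch as one natural choice rather than the authors' definition; the function name normal_on_logit_scale is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple

# Upper quartile of a standard normal (about 0.6745).
Z75 = NormalDist().inv_cdf(0.75)


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def normal_on_logit_scale(q1: float, median: float, q3: float) -> Tuple[float, float]:
    """Mean and SD of a normal on the logit scale from assessed quartiles of a probability."""
    if not 0.0 < q1 < median < q3 < 1.0:
        raise ValueError("need 0 < q1 < median < q3 < 1")
    # A monotone transform maps quantiles to quantiles, so logit(median) is the
    # median, and hence the mean, of the assumed normal on the logit scale.
    mean = logit(median)
    # Under normality the interquartile range equals 2 * Z75 * sd.
    sd = (logit(q3) - logit(q1)) / (2.0 * Z75)
    return mean, sd


# Hypothetical assessments of the probability of presence at the reference point.
mean, sd = normal_on_logit_scale(0.2, 0.3, 0.45)
print(f"mean={mean:.3f}, sd={sd:.3f}")
```

Dividing the full logit-scale interquartile range by 1.349 averages the two half-ranges, so assessments that are asymmetric on the logit scale are smoothed into a symmetric normal.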
11 Eliciting and is determined from the unconditional assessments. is determined from assessments conditional on. equalling.
12 Eliciting and for factors. Put. Then enabling to be estimated.
13 Assessments to obtain Conditional on the first three line segments being correct, the dashed lines are quartiles of where the line might continue.
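The slide describes each continuous predictor's effect as a chain of joined line segments. One standard way to obtain such a curve that is linear in its coefficients, assumed here because the definitions on slide 6 were lost in extraction, is a hinge basis with increasing knots c_1 < ... < c_k: z = (x, (x - c_1)+, ..., (x - c_k)+), so each coefficient after the first changes the slope at one knot. A Python sketch with hypothetical function names and an invented elevation example:

```python
from typing import List, Sequence


def hinge_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Explanatory variables z = (x, (x - c1)+, ..., (x - ck)+) for one predictor value."""
    if any(a >= b for a, b in zip(knots, knots[1:])):
        raise ValueError("knots must be strictly increasing")
    return [x] + [max(0.0, x - c) for c in knots]


def piecewise_linear(x: float, knots: Sequence[float], beta: Sequence[float]) -> float:
    """Contribution beta' z of one predictor: continuous, and linear between knots."""
    z = hinge_basis(x, knots)
    if len(beta) != len(z):
        raise ValueError("beta needs one entry per basis function")
    return sum(b * zj for b, zj in zip(beta, z))


# Hypothetical elevation effect: slope 0.002 up to 500 m, then 0.002 - 0.005 = -0.003.
knots = [500.0]
beta = [0.002, -0.005]
for elevation in (0.0, 250.0, 500.0, 750.0):
    print(elevation, piecewise_linear(elevation, knots, beta))
```

With this basis, fixing the first few segments fixes the leading coefficients, and each later coefficient only alters the curve beyond its own knot.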
14 Conditional Assessments for Factors The circles indicate conditions. Dotted horizontal bars are previous assessments. Solid bars are current assessments and must be within the dotted bars if is positive-definite.
15 Calculating Iterative calculations determine. Start by estimating the lower-right scalar element of, and call it. Then estimate the lower-right of and call it, etc. If and is positive-definite, then so is provided.
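The slide's conditions were lost in extraction, but growing the matrix from its lower-right corner matches a standard bordering result: if B is positive-definite, then [[a, c'], [c, B]] is positive-definite exactly when the Schur complement a - c'B^{-1}c is positive. For a multivariate normal, that Schur complement is the conditional variance of the new element given those already fixed, so it can never exceed the unconditional variance a. The pure-Python sketch below checks the condition one row and column at a time; it illustrates the general result rather than the authors' algorithm, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (a assumed nonsingular)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def grow_from_lower_right(sigma: Matrix) -> List[float]:
    """Check positive-definiteness by bordering, starting at the lower-right element.

    At each step the block sigma[k:, k:] is [[a, c'], [c, B]], where B is the block
    already accepted. It is positive-definite exactly when a - c' B^{-1} c > 0.
    Returns these Schur complements; raises ValueError at the first failure.
    """
    n = len(sigma)
    schur: List[float] = []
    for k in range(n - 1, -1, -1):
        a = sigma[k][k]
        c = [sigma[i][k] for i in range(k + 1, n)]
        block = [row[k + 1:] for row in sigma[k + 1:]]
        s = a - (sum(ci * xi for ci, xi in zip(c, solve(block, c))) if c else 0.0)
        if s <= 0.0:
            raise ValueError(f"not positive-definite once row/column {k} is added")
        schur.append(s)
    return schur


sigma = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
print(grow_from_lower_right(sigma))
```

In an interactive setting, the positivity condition at each step translates into a permissible range for the next assessment, which is why a checker of this kind can reject an assessment before the full matrix is built.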
16 Alternative Prior Models Individuals can show systematic bias in their subjective assessments. The aim is to form prior models that allow a small amount of data to largely correct some potential biases. Prior 2 The marginal distribution of is diffuse, rather than. The conditional distribution of is assumed to be unchanged: This allows for error in specifying the origin of the Y-axis.
17 Prior 3 Prior 3 replaces the scale for Y with some other linear scale. is again given a diffuse distribution and the conditional distribution of is taken to be is also given a diffuse distribution. Prior 4 This is the same as Prior 3, except it allows for systematic bias in quartile assessments by putting are given diffuse distributions.
18 Cross-validation and scoring The usefulness of a prior distribution can be examined objectively by using cross-validation and a scoring rule. For the cross-validation, the data for a species were divided into four sets. Each set in turn was omitted and the remaining sets were used to form prediction equations. The prediction equations were applied to the omitted set and the squared error loss was determined: $L = \sum_j (y_j - p_j)^2$, where the summation is over all sites $j$ in the omitted (validation) set, $p_j$ is the probability of presence given by the prediction equation, and $y_j$ is a 0/1 dummy variable indicating absence/presence. This defines a proper scoring rule.
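The cross-validation scheme can be sketched in Python as below. Two assumptions: the four sets are formed by a simple interleaved split (the slide does not say how the data were divided), and the prediction equation is replaced by a hypothetical baseline, fit_prevalence, that predicts the training-set proportion of presences at every site, purely so the loop runs. In the study the prediction equations come from the fitted model under each prior.

```python
from typing import Callable, List, Sequence, Tuple

Site = Tuple[Sequence[float], int]  # (explanatory variables, 0/1 absence/presence)
Predictor = Callable[[Sequence[float]], float]


def squared_error_loss(probs: Sequence[float], ys: Sequence[int]) -> float:
    """Sum over validation sites of (y - p)^2, with y the 0/1 presence indicator."""
    return sum((y - p) ** 2 for p, y in zip(probs, ys))


def cross_validate(sites: List[Site],
                   fit: Callable[[List[Site]], Predictor],
                   n_folds: int = 4) -> List[float]:
    """Omit each fold in turn, fit on the rest, and score the omitted fold."""
    folds = [sites[i::n_folds] for i in range(n_folds)]  # interleaved split (assumed)
    losses = []
    for k, validation in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != k for s in fold]
        predict = fit(training)
        probs = [predict(x) for x, _ in validation]
        losses.append(squared_error_loss(probs, [y for _, y in validation]))
    return losses


def fit_prevalence(training: List[Site]) -> Predictor:
    """Hypothetical baseline: predict the training-set proportion of presences everywhere."""
    p = sum(y for _, y in training) / len(training)
    return lambda x: p


# Invented toy data: eight sites with one explanatory variable each.
sites = [([float(i)], y) for i, y in enumerate([1, 0, 0, 1, 0, 0, 1, 0])]
losses = cross_validate(sites, fit_prevalence)
print(losses, sum(losses))
```

Squared error loss (the Brier score) is strictly proper: for a site whose true probability of presence is q, the expected loss of a forecast p is (p - q)^2 + q(1 - q), which is minimised only at p = q, so misreporting a probability cannot lower the expected score.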
19 Results for little bent-wing bat
22 Concluding Comments The elicitation method described here can handle large problems by (a) using interactive graphics and (b) suggesting to the expert values that might represent his or her opinions. It is believed that the use of graphs can improve the quality of the assessed distributions. Cross-validation can clearly demonstrate the gain from using prior knowledge, when there is such a gain. Additional parameters in the prior model can allow limited data to be used more effectively.