Modeling Preferences
Farrokh Alemi, Ph.D. Sunday, February 06, 2005
This is Farrokh Alemi from George Mason University.
Objectives: model decision makers' preferences; learn how to interact with a policymaker in order to model their values; create a mathematical model that assigns higher numbers to more preferred options; test the accuracy/validity of the model; focus on one decision maker's values. This presentation introduces a flexible method for modeling decision makers' values and several means of validating such models. Our examples assume that a model is developed jointly by one analyst and one decision maker. If the development process involves a group, we suggest using the integrative group process.
Historical Basis: Bernoulli, Von Neumann, Edwards, Von Winterfeldt & Edwards. Value models are based on Bernoulli's (1738) recognition that money's value does not always equal its amount. He postulated that increasing amounts of income had decreasing value to the wage earner. Value models have received considerable attention from economists. Von Neumann used multi-attribute utility models to explain the behavior of firms. Psychologists such as Edwards (1974) used them to explain the behavior of individuals. A comprehensive and rather mathematical introduction to constructing value models is found in von Winterfeldt and Edwards (1986). Here we focus on detailed instructions for making the models and ignore their mathematical foundations.
What is a model of values?
Value models help us quantify a person's preferences: they assign numbers to options so that higher numbers reflect more preferred options. Assumptions: decision makers must select from several options; the selection requires grading the preferences for the options; preferences are quantified by examining the various attributes (characteristics, dimensions, or features) of the options; overall preference is a function of the decision maker's preferences on each attribute. Example: an Electronic Medical Record (EMR) system. Value models help us quantify a person's preferences. By this we mean that value models assign numbers to options so that higher numbers reflect more preferred options. These models assume that the decision makers must select from several options and that the selection should depend on grading the preferences for the options. These preferences are quantified by examining the various attributes (characteristics, dimensions, or features) of the options. For example, if decision makers were choosing among different Electronic Medical Record (EMR) systems, the value of the different EMRs could be scored by examining such attributes as "compatibility with legacy systems," "potential impact on practice patterns," and "cost." First, the impact of each EMR on each attribute would be scored; this is often called a single-attribute value function. Second, the scores would be weighted by the relative importance of each attribute. Third, the scores for all attributes would be aggregated, often by using a weighted sum. Fourth, the EMR with the highest weighted score would be chosen.
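The four-step procedure just described can be sketched in a few lines of code. The attribute names, single-attribute values, and weights below are hypothetical, chosen only to illustrate the weighted-sum calculation; they are not taken from the text.

```python
# A minimal sketch of the four-step scoring procedure described above.
# Attribute names, single-attribute values (0-100), and weights are illustrative.

emr_options = {
    "EMR A": {"compatibility": 80, "practice_impact": 60, "cost": 40},
    "EMR B": {"compatibility": 60, "practice_impact": 90, "cost": 70},
}
weights = {"compatibility": 0.5, "practice_impact": 0.3, "cost": 0.2}  # sum to 1.0

def overall_value(scores: dict, weights: dict) -> float:
    """Step 3: aggregate single-attribute values with a weighted sum."""
    return sum(weights[attr] * scores[attr] for attr in weights)

# Step 4: choose the option with the highest weighted score.
best = max(emr_options, key=lambda emr: overall_value(emr_options[emr], weights))
print(best, overall_value(emr_options[best], weights))
```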
Mathematical Model: Overall Value = V(A1) + V(A2) + ... + V(An)
An Additive Multi-Attribute Value Model. Overall Value = V(A1) + V(A2) + ... + V(An), where V(A1) is the value on attribute 1, V(A2) the value on attribute 2, and V(An) the value on attribute n. If each option were described in terms of n attributes A1, A2, ..., An, the option would be assigned a score on each attribute: V(A1), V(A2), ..., V(An). The overall value of an option equals: Value = Function[ V(A1), V(A2), ..., V(An) ]. In words, the overall value of an option is a function of the value of the option on each attribute.
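Restating the slide's formulas in standard notation (the weighted version used in Step 6 appears later):

```latex
\text{Overall value} = f\bigl(V(A_1), V(A_2), \ldots, V(A_n)\bigr),
\qquad \text{additive special case: } \text{Overall value} = \sum_{i=1}^{n} V(A_i).
```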
Why Model Preferences? To clarify and communicate. To aid decision making in complex situations. To replace the decision maker in repetitive tasks. To measure hard-to-measure concepts. One reason to model values is to clarify and communicate how decision makers feel about complex issues. Modeling values helps managers communicate their positions by explicitly showing the importance of each priority. Value models divide multi-attribute problems into components that are systematically analyzed. A specific formula then quantifies the outcome of the analysis. These models clarify the basis of decisions so others can see the logic behind the decision and, ideally, agree with it. For example, we constructed a value model to help the Wisconsin Division of Health determine the eligibility of nursing home residents for a higher level of reimbursement (the "super-skilled" level of nursing care). This model showed which attributes of an applicant affected eligibility and how much weight each attribute deserved. As a consequence of this effort, the regulator, the industry, and the patients became more aware of how eligibility decisions were made (Cline et al. 1982). Another reason to model values is to aid decision making in complex situations. In complex decisions, decision makers face uncertain events as well as ill-expressed values. In these circumstances, modeling the values adds to the decision maker's understanding of the underlying problem. Still another reason to model the values of the decision maker is to use the mathematical model repeatedly instead of the decision maker. Consider screening a large number of applicants. If we model the decision maker's values, we can go through thousands of applicants and select the few that the manager needs to interview. Because the model reflects the manager's values, we are reassured that we have not erroneously screened out applicants that the manager would have liked to interview. Finally, the last reason for modeling values is to quantify hard-to-measure concepts such as the severity of trauma (Detmer et al. 1977), the degree to which an area is medically underserved (Health Services Research Group 1975), or the quality of the remaining years of life (Pliskin et al. 1980). These hard-to-measure concepts are similar to preferences because they are subjective and open to disagreement. Like complex preferences, hard-to-measure concepts must be described in terms of several attributes that may capture different dimensions of the concept. Measuring a subjective construct may seem a contradiction in terms; after all, rational people can look at the same construct and evaluate it very differently. For some broad distinctions, such as whether there is more of one thing than another, the task is easier. Thus, we might say that one patient is more ill than another, or that an area is more medically underserved than another.
Misleading Numbers. Model scores are rough, approximate measures of preference. Nominal scale. Ordinal scale. Interval scale. Though value models allow us to quantify subjective concepts, the resulting numbers are rough estimates that should not be mistaken for precise measurements. It is important that managers and policymakers using these models do not read more into the numbers than they mean. Analysts must stress that the numbers in value models are intended to offer a consistent method of tracking, comparing, and communicating rough, subjective concepts. An important distinction is whether the model is to be used for rank ordering (ordinal) or for rating the extent of presence of a hard-to-measure concept (interval). Some value models produce numbers that are only useful for rank ordering options. Thus, some severity indexes indicate whether one patient is sicker than another, not how much sicker. In these circumstances, a patient with a severity score of 4 may not be twice as ill as a patient with a severity of 2. Averaging such ordinal scores is meaningless. In contrast, value models that score on an interval scale show how much more preferable one option is than another. For example, a severity index can be created to show how much more severe one patient's condition is than another's. A patient scoring 4 can be considered twice as ill as one scoring 2. Further, averaging interval scores is meaningful. Interval scales are more difficult to construct than ordinal scales. In the remainder of this chapter, we show how both ordinal and interval value models can be constructed. First, we introduce an example for reference throughout the chapter.
Step 1. Would It Help? Who decides? What must be done? What judgments must be made? How can the model of the judgment be used? The first, and most obvious, question is whether constructing a value model would help resolve the problem faced by the manager. Defining the problem is the most significant step of the analysis, yet surprisingly little literature is available for guidance. To define a problem, we must answer several related questions: Who is the decision maker? What objectives does this person wish to achieve? What role do subjective judgments play in these goals? Would it help to model the judgment? Should a value model be used? Finally, once a model is developed, how might it be used?
Step 1. Would it help? Who decides? What must be done? What judgments must be made? How can the model of the judgment be used? Administrator. In organizations, there are often many decision makers. No single person's viewpoint is sufficient, and we need a multidisciplinary consensus instead. If, as is often the case, the input comes from several constituencies with different value systems, the analyst should think through the needs of these diverse decision makers. A later section discusses how the values of a group of people can be modeled. For simplicity, the following discussion assumes that only one person is involved in the decision-making process. The core of the problem here was that AIDS patients need different amounts of resources depending on the severity of their illness. The federal administrators of the Medicaid program wanted to measure the severity of AIDS patients because the federal government paid for part of their care. The state administrators were likewise interested because state funds paid for another portion of their care. Individual hospital administrators were interested in order to analyze clinicians' practice patterns and recruit those who are more efficient. For the study, we assembled six experts known for clinical work with AIDS patients or for research on the survival of AIDS patients. Physicians came from several states, including New York and California (which at the time had the highest numbers of AIDS patients in the United States). California had more homosexual AIDS patients, while intravenous drug users were more prevalent in New York. We used the integrative group process to construct the severity index.
Step 1. Would it help? Who decides? What must be done? What judgments must be made? How can the model of the judgment be used? Administrator. Judge efficiency of providers' practice. What must be done? Problem solving starts by recognizing a gap between the present situation and the desired outcome. Typically, at least one decision maker has noticed a difference between what is and what should be and begins to share this awareness with the relevant levels of the organization. Gradually a motivation to change is created, informal social ties are established to promote the change, and an individual or group receives a mandate to find a solution. Often, a perceived problem must be reformulated to address the real issues. For example, a stated problem of developing a model of AIDS severity may indicate that the real issue is insufficient funds to provide for all AIDS treatment programs. When solutions are proposed prematurely, it is important to sit back and gain greater perspective on the problem. Occasionally, the decision makers have a solution in mind before fully understanding the problem, which shows the need for examining the decision makers' circumstances in greater depth. In these situations, it is the analyst's responsibility to redefine the problem to make it relevant to the real issues.
Step 1. Would it help? Who decides? What must be done? What judgments must be made? How can the model of the judgment be used? Administrator. Judge efficiency of providers' practice. Severity of illness. After the problem has been defined, we must examine the role subjective judgments can play in its resolution. In the example, the administrators needed to budget for the coming years, and they knew judgments of severity would help them anticipate utilization rates and overall costs. Programs caring for low-severity patients would receive a smaller allocation than programs caring for high-severity patients. But there were no objective measures of severity available, so we opted to use clinician judgments instead.
Step 1. Would it help? Who decides? What must be done? What judgments must be made? How can the model of the judgment be used? Administrator. Judge efficiency of providers' practice. Severity of illness. One can examine the role of subjective judgments by asking several "what if" questions: What plans would change if the judgment were different? What is being done now? If no one makes a judgment about the underlying concept, would it really matter, and who would complain? Would it be useful to tell how the judgment was made, or is it better to leave matters rather ambiguous? Must we choose among options, or should we let things unfold on their own? Is a subjective component critical to the judgment, or can it be based on objective standards?
Step 1. Would It Help? Who decides? What must be done? What judgments must be made? How can the model of the judgment be used? Administrator. Judge efficiency of providers' practice. Severity of illness. Model can be used repeatedly to analyze medical records. In understanding what judgments must be made, it was crucial to attend to the limitations of the circumstances in which these judgments were going to be made. The use of existing AIDS severity indexes was limited because they relied on physiological variables that were unavailable in automated databases (Redfield and Burke 1988). Administrators asked us to predict prognoses from existing data. The only information widely available on AIDS patients was diagnoses, which are routinely collected after every encounter. Because these data did not include any known physiological predictors of survival (such as the number of T4 cells), we had to find alternative ways to predict survival. Experts seem to know the prognosis of a patient intuitively. They easily recognize a patient who is very sick. Would it make sense to model how experts make these judgments? Although it was theoretically possible to have an expert panel review each case and estimate severity, from the outset it was clear that a model was needed because case-by-case review was extremely expensive. Moreover, the large number of cases would require the use of several expert panels, each judging a subset of cases, and the panels might disagree. Further, judgments within a panel can be quite inconsistent over time. In contrast, the model provided a quick way of rating the severity of patients. It also explained the rationale behind the ratings, which allowed skeptics to examine the fairness of judgments, thus increasing the acceptance of those judgments.
Step 2. Select Attributes
Introduce yourself and your purpose. Be judicious about pausing. Ask the experts to introduce themselves. Start with tangible examples. Ask directly for additional attributes. Arrange the attributes in a hierarchy. Always use the expert's terminology. Use prompts that feel most natural. Take notes and do not interrupt. After defining the problem, the second step is to identify the attributes needed for making the judgment. We solicited from the invited experts a list of the attributes that may be needed in making a judgment of severity. There are several steps to interviewing an expert. First, introduce yourself and your purpose. Briefly say who you are, why you are talking to the expert, the model's purpose, and how it will be developed. Be as brief as possible. An interview is going well if the analyst is listening and the expert is talking. If it takes you five minutes just to describe your purpose, then something is amiss: probably you haven't understood the problem well, or possibly the expert is not familiar with the problem.
Step 2. Select Attributes
Be assertive in setting the interview's pace and agenda. Because you are likely to receive a comment whenever you pause, be judicious about pausing. Thus, if you stop after saying, "Our purpose is to construct a severity index to work with existing databases," your expert will likely discuss your purpose. But if you immediately follow the previous sentence with a question about the expert's experience in assessing severity, the expert is more likely to begin describing his or her background. The point is that, as the analyst, you set the agenda, and you should construct your questions to resolve your uncertainties, not to suit the expert's agenda.
Step 2. Select Attributes
Ask the experts to introduce themselves. It is important to establish that you are a record keeper and that the expert will provide the content. A good way of doing this is to ask the expert to introduce himself or herself by describing relevant experiences. This approach also stresses your respect for his or her expertise.
Step 2. Select Attributes
Start with tangible examples. Concrete examples help you understand which patient attributes should be used to predict severity and how they can be measured. Ask the expert to recall an actual situation and contrast it with other occasions to discern the key discriminators. For example, you might ask the expert to describe a severely ill patient in detail (to ensure that the expert is referring to a particular patient). Then ask for a description of a patient who was not severely ill and elicit the key differences between the two patients. These differences are attributes you can use to judge severity. Continue asking the expert to think about specific patients until you have identified a number of attributes. Here is a sample dialogue.
Analyst: Can you recall a specific patient with a very poor prognosis?
Expert: I work in a referral center, and we see a lot of severely ill patients. They seem to have many illnesses and are unable to recover completely, so they continue to worsen.
Analyst: Tell me about a recent patient who was severely ill.
Expert: A 28‑year‑old homosexual male patient deteriorated rapidly. He kept fighting recurrent influenza and died from gastrointestinal cancer. The real problem was that he couldn't tolerate AZT, so we couldn't help him much. Once a person has cancer, we can do little to maintain him.
Analyst: Tell me about a patient with a good prognosis, say close to five years.
Expert: Well, let me think. A year ago we had a 32‑year‑old male patient diagnosed with AIDS who has not had serious disease since‑‑a few skin infections but nothing serious. His spirit is up, he continues working, and we have every reason to expect he will survive four or five years.
Analyst: What key difference between the two patients made you realize that the first patient had a poorer prognosis than the second?
Expert: That's a difficult question; patients are so different from each other that it's tough to point at one characteristic. But if you really push me, I would say two characteristics: the history of illness and the ability to tolerate AZT.
Analyst: What about the history is relevant?
Expert: If I must predict a prognosis, I want to know whether he has had serious illness in vital organs.
Analyst: Which organs?
Expert: Brain, heart, and lungs are more important than, say, skin.
Step 2. Select Attributes
As you may have noticed in this dialogue, we started with tangible examples and used the terminology and words introduced by the expert to become more concrete. There are several advantages. First, it helps the expert recall the details; second, soliciting attributes by contrasting patients helps single out those attributes that truly affect prognosis. Thus it does not produce a wish list of information that is loosely tied to survival, an extravagance we cannot afford in model building.
Step 2. Select Attributes
After you have identified some attributes, you can ask directly for additional attributes that indicate prognosis. You might ask if there are other markers of prognosis, if the expert has used the word "marker." If you have to, you might say: "In our terminology, we refer to the kinds of things you have mentioned as markers of prognosis. Are there other markers?" In the dialogue, you might even express your own ideas without pushing them on the expert. In general, analysts are not there to express their own ideas. They are there to listen. But they can ask questions to clarify things or even to mention things overlooked by the expert, as long as it does not change the nature of the relationship between the analyst and the expert.
Step 2. Select Attributes
Arrange the attributes in a hierarchy, from broad to specific. Some analysts suggest using a hierarchy to solicit and structure the attributes. For example, an expert may suggest that a patient's prognosis depends on medical history and demographics. Demographics include age and sex. Medical history involves the nature of the illness, comorbidities, and tolerance of AZT. The nature of illness breaks down into the body systems involved (skin, nerves, blood, etc.). Within each body system, some diagnoses are minor and other diagnoses are more threatening. The expert then lists, within each system, a range of diseases. The hierarchical structure promotes completeness and simplifies tracking many attributes.
Step 2. Select Attributes
Be careful about terminology. Always use the expert's terminology, even if you think a reformulation would help. Thus, if the expert refers to "sex," do not substitute "gender." Such new terminology may confuse the conversation and create an environment where the analyst acts more like an expert, which can undermine the expert's confidence that he or she is being heard. It is reasonable, however, to ask for clarification: "sex" could refer to gender or to sex practices, and you must understand which meaning is intended.
Step 2. Select Attributes
In general, less esoteric prompts are more likely to produce the best responses, so formulate a few prompts and use the ones that feel most natural for your task. Avoid jargon, including terminology from decision analysis (e.g., attribute, value function, aggregation rules).
Step 2. Select Attributes
Take notes, and do not interrupt. Have paper and pencil available, and write down the important points. Not only does this help the expert's recall, but it also helps you review matters while the expert is still available. Experts tend to list a few attributes, then focus attention on one or two. Actively listen to these areas of focus. When the expert is finished, review your notes for items that need elaboration. If you don't understand certain points, ask for examples, which are an excellent means of clarification. For instance, after the expert has described attributes of vital organ involvement, you may ask the expert to elaborate on something mentioned earlier, such as "acceptance of AZT." If the expert mentions other topics in the process, return to them after completing the discussion of AZT acceptance. This ensures that no loose ends are left when the interview is finished and reassures the expert that you are indeed listening.
Step 2. Select Attributes
To review: start with tangible examples to get the expert familiar with the process, and then query the expert directly for the attributes. Other, more statistical approaches to soliciting attributes are available, such as multidimensional scaling and factor analysis. However, we prefer the behavioral approach to soliciting attributes because it involves the expert more in the process and leads to greater acceptance of the model.
Step 3. Do It Again. Use two different perspectives: survival versus mortality prompts. Check assumptions: attributes are exhaustive; attributes are not redundant; attributes are important to the decision maker; attributes are not preferentially dependent. After soliciting a set of attributes, it is important to examine and, if necessary, revise them. Psychological research suggests that changing the framing of a question alters the response. Consider these two questions: "What are the markers for survival?" and "What are the markers for poor prognosis?" One question emphasizes survival, the other mortality. We would expect that patient attributes indicating survival would also indicate mortality, but researchers have found this to be untrue (for a review, see Nisbett and Ross 1980). Experts may identify entirely different attributes for survival and mortality. This research suggests that value-laden prompts tap different parts of memory and can evoke recall of different pieces of information. Evidence about the impact of questions on recall and judgment is substantial (Hogarth 1975; Ericsson and Simon 1980). For example, in one study subjects used surprisingly different sets of attributes to judge whether a person was an introvert or an extrovert (Snyder and Cantor 1979). Studies like this suggest that analysts should ask their questions in two ways, once in positive terms and again in negative terms. Several tests should be conducted to ensure that the solicitation process succeeded. The first test ensures that the listed attributes are exhaustive: use them to describe several hypothetical patients and ask the expert to rate their prognosis. If the expert needs additional information for a judgment, solicit new attributes until you have sufficient information to judge severity. A second test checks that the attributes are not redundant by examining whether knowledge of one attribute implies knowledge of another. For example, the expert may consider "inability to administer AZT" and "cancer of the GI tract" redundant if no patient with GI cancer can accept AZT. In such cases, either the two attributes should be collapsed into one, or one must be dropped from the analysis. A third test ensures that each attribute is important to the decision maker's judgment. You can test this by asking the decision makers to judge two hypothetical situations: one with the attribute at its lowest level and another with the attribute at its peak level. If the judgments are similar, the attribute may be ignored. For example, gender may be unimportant if male and female AIDS patients with the same history of illness have identical prognoses. Fourth, a series of tests examines whether the attributes are related or dependent (Keeney and Raiffa 1976; Keeney 1977). These words are much abused and variously defined. By independence we mean that in judging two different patients, the feature shared by these patients does not affect how other features are judged. We distinguish this from other senses of independence by calling it preferential independence. There are many situations in which preferential independence does not hold. In predicting three-year risks of hospitalization, age and lifestyle may be dependent (Alemi et al. 1987). Among young adults, drinking may be a more important concern than cholesterol risks, while among older adults, cholesterol is the more important risk factor. Thus, the relative importance of cholesterol and drinking risks depends on the age of the patients being compared.
In many circumstances, preferential independence holds. It often holds even when experts complain that the attributes are dependent in other senses. When preferential independence holds, it is reasonable to break a complex judgment into components. Or, to say it differently, with preferential independence it is possible to find a formula that translates scores on several attributes into an overall severity score. When preferential independence does not hold, it is often a sign that some underlying issue is poorly understood. In these circumstances, the analyst should query the expert further and revise the attributes to eliminate dependencies (Keeney 1980).
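To make the preferential-independence check concrete, here is a small sketch. The attribute names, levels, and toy scoring rule are hypothetical, loosely following the age/drinking/cholesterol example above; they are not the scoring used in the AIDS study.

```python
from itertools import product

# Hypothetical attributes and levels for the age/lifestyle example in the text.
levels = {
    "age":         ["young adult", "older adult"],
    "drinking":    ["low", "high"],
    "cholesterol": ["low", "high"],
}

def risk_score(option):
    """Toy 3-year hospitalization risk score (illustrative only)."""
    drinking = 1 if option["drinking"] == "high" else 0
    cholesterol = 1 if option["cholesterol"] == "high" else 0
    if option["age"] == "young adult":
        return 3 * drinking + 1 * cholesterol   # drinking matters more when young
    return 1 * drinking + 3 * cholesterol       # cholesterol matters more when older

def preferentially_independent(pair, shared):
    """True if the rank order over combinations of `pair` is the same
    for every level of the shared attribute."""
    orderings = set()
    for s in levels[shared]:
        combos = list(product(levels[pair[0]], levels[pair[1]]))
        ranked = tuple(sorted(combos, key=lambda c: risk_score(
            {pair[0]: c[0], pair[1]: c[1], shared: s})))
        orderings.add(ranked)
    return len(orderings) == 1

print(preferentially_independent(("drinking", "cholesterol"), "age"))  # False here
```

Because the relative importance of drinking and cholesterol flips with age, the check returns False; an additive model would satisfy it by construction, which is why failures usually prompt the analyst to revise the attributes.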
Attributes Judged Important in Measuring Severity of AIDS
Age, Race, Transmission mode, Defining diagnosis, Time since defining diagnosis, Diseases of nervous system, Disseminated diseases, Gastrointestinal diseases, Skin diseases, Lung diseases, Heart diseases, Recurrence of a disease, Functioning of the organs, Co-morbidity, Psychiatric co-morbidity, Nutritional status, Drug markers, Functional impairment. In the AIDS severity study, discussions with the expert and later revisions led to this set of 18 patient attributes for judging the severity of AIDS.
Step 4. Set Attribute Levels
Decide on the best and worst. Select a target population and ask the expert to describe the possible range of the attribute. Avoid adjectives; describe the levels. Decide on levels in between. Fill in the gaps to describe all possible situations. Now it is time to identify the possible levels of each attribute. We start by deciding whether the attributes are discrete or continuous. Attributes such as age are continuous; attributes such as diseases of the nervous system are discrete. However, continuous attributes may be expressed in terms of a few discrete levels, so that age can be described in decades, not individual years. The four steps in identifying the levels of an attribute are to define the range, define the best and worst levels, define some intermediate levels, and fill in the other possible levels so that the listing of the levels is exhaustive (capable of covering all possible options or situations). To define the range, we must select a target population and ask the expert to describe the possible range of the variable in it. Thus, for the AIDS severity index, we asked our expert to focus on adult AIDS patients and, for each attribute, suggest the possible ranges. To assess the range of nervous system diseases, we asked: "In adult AIDS patients, what is a disease that suggests the most extensive involvement of the nervous system?" Next we asked the expert to specify the best and the worst possible level of each attribute. In the AIDS index, we could easily identify the level with the best possible prognosis: the normal finding within each attribute, in common language the healthy condition. We accomplished the more difficult task of identifying the level with the worst possible prognosis by asking the expert: "What would be the gravest disease of the central nervous system, in terms of prognosis?" A typical error in obtaining the best and the worst levels is failing to describe these levels in detail. For example, in assessing the value of nutritional status, it is not helpful to define the levels as "best nutritional status" and "worst nutritional status." Nor does it help to define the worst level as "severely nutritionally deficient," because the adjective "severe" is not defined. It is best to avoid using adjectives in describing levels, as experts perceive words like "severely" or "best" in different ways. The levels must be defined in terms of the underlying physical process measured in each attribute, and the descriptions must be connected to the nature of the attribute. Thus, a good level for the worst nutritional status might be "patients on total parenteral treatment," and the best status might be "nutritional treatment not needed." Next, ask the expert to define intermediate levels. These levels are often defined by asking for a level between the best and worst levels.
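One way such elicited levels might be recorded is as an ordered list per attribute. The sketch below uses the nutritional-status levels mentioned in the text and in the dialogue that follows; the exact ordering of the intermediate levels is illustrative, not asserted by the source.

```python
# Attribute levels recorded from best to worst prognosis (ordering of the
# intermediate levels is illustrative).
attribute_levels = {
    "nutritional status": [
        "nutritional treatment not needed",   # best level
        "on nutritional supplements",
        "on antiemetics",
        "receiving Lomotil or Imodium",
        "on total parenteral treatment",      # worst level
    ],
}
```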
Analyst: I understand that patients on total parenteral treatment have the worst prognosis. Can you think of other relatively common conditions with a slightly better prognosis?
Expert: Well, a host of things can happen. Pick up any book on nutritional diseases and you find all kinds of things.
Analyst: Right, but can you give me three or four examples?
Expert: Sure. The patient may be on antiemetics or nutritional supplements.
Analyst: Do these levels include a level with a moderately poor prognosis and one with a relatively good prognosis?
Expert: Not really. If you want a level indicative of moderately poor prognosis, then you should include whether the patient is receiving Lomotil or Imodium.
Step 4. Set Attribute Levels
It is not always possible to solicit all possible levels of an attribute from the expert interviews. In these circumstances, we can fill in the gaps afterward by reading the literature or interviewing other experts. The levels specified by the first expert are used as markers for placing the remaining levels, so that the levels range from best to worst.
Levels for Skin Infection
No skin disorder, Kaposi's sarcoma, Shingles, Herpes complex, Candida or mucus, Thrush. In the example, a clinician on the project team reviewed the expert's suggestions and filled in a long list of intermediate levels. Here you see a listing of the levels for the attribute skin infection.
Step 5. Assign Values to Single Attributes
Double anchored assessment method: 100 to one anchor, 0 to the other; rate the remaining levels; check against different anchors. Now we will evaluate a patient on a single attribute as a preparation for aggregating the values of all attributes to get the overall score. We prefer using the double-anchored estimation method for evaluating a single attribute (Kneppreth et al. 1974). This approach gets its name from the practice of selecting the best and worst levels first and rating the remaining levels against these two "anchors." In this method, the attribute levels are first ranked or, if the attribute is continuous, the most and least preferred levels are specified. Then the best and the worst levels are used as anchors for assessing the other levels.
Analyst: Which among the skin disorders has the worst prognosis?
Expert: None is really that serious.
Analyst: Yes, I understand that, but which is the most serious?
Expert: Patients with thrush perhaps have a worse prognosis than patients with other skin infections.
Analyst: Let's rate the severity of thrush at 100 and place the severity of no skin disorder at 0. How would you rate shingles?
Expert: Shingles is almost as serious as thrush.
Analyst: This tells me that you might rate the severity of shingles nearer 100 than 0. Where exactly would you rate it?
Expert: Maybe 90.
Analyst: Can you now rate the remaining levels for skin disorders?
Step 5. Assign Values to Single Attributes
Recently, several psychologists have questioned whether experts are systematically biased in assessing value. Yates and Jagacinski (1979) showed that using different anchors produced different value functions. For example, in assessing the value of money, Kahneman and Tversky (1979) showed that values associated with gains or losses differ from values related to the amount of monetary return. They argued that the value of money is judged relative to the decision maker's current assets. Because value may depend on the anchors used, it is important to use anchors other than just the best and worst levels. Thus, if the value of skin infections is assessed by anchoring on "shingles" and "no skin infections," then it is important to verify the ratings relative to other levels. We might ask: "You have rated herpes complex halfway between shingles and candida. Is this OK?" It is occasionally useful to change not only the anchors but also the assessment method. Elsewhere we describe several alternative methods of assessing single-attribute value functions. When a value is measured by two different methods, you can expect inadvertent discrepancies, which you must ask the expert to resolve.
Assigned Values to “Skin Infection” Attribute
No skin disorder: 0
Kaposi's sarcoma: 10
Shingles: 90
Herpes complex: 95
Candida or mucus: 100
Thrush: 100
Here are examples of values assigned by the experts to the various levels of skin infection.
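For use in the aggregation step that follows, a single-attribute value function like this can be stored as a simple lookup table. The sketch below uses the values shown above; the dictionary name and function are illustrative only.

```python
# Single-attribute value function for "skin infection" as a lookup table
# (values as listed above; 0, 90, and 100 are the anchors from the dialogue).
skin_infection_value = {
    "no skin disorder": 0,
    "Kaposi's sarcoma": 10,
    "shingles": 90,
    "herpes complex": 95,
    "candida or mucus": 100,
    "thrush": 100,
}

def value_on_skin_infection(level: str) -> int:
    return skin_infection_value[level]

print(value_on_skin_infection("shingles"))  # 90
```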
Step 6. Choose an Aggregation Rule
Additive multi-attribute utility model: used often, relatively robust, compensatory. Multiplicative multi-attribute utility model: used less often. Now we must find a way to aggregate the scores across all attributes. Note that our conventions have produced a situation in which the value of each attribute is somewhere between 0 and 100. Thus, the prognosis of patients with a skin infection and the prognosis of patients with various GI diseases have the same range. Simply adding these scores would be misleading because skin infections are less serious than GI problems, so we must find an aggregation rule that differentially weights the various attributes. The most obvious rule is the additive model. Assume that S represents the severity of AIDS. If a patient is described by a series of n attributes {A1, A2, ..., Ai, ..., An}, then under the additive rule the overall severity equals the weighted sum of the value scores on each attribute. To have a consistent convention and scoring system, we make sure that all attributes range from 0 to 100 and that the sum of the attribute weights is one. Several other models are possible in addition to the additive model. The multiplicative model is used less often and is described later.
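With the conventions just stated (each single-attribute value between 0 and 100, weights summing to one), the additive rule can be written as:

```latex
S \;=\; \sum_{i=1}^{n} w_i \, V(A_i),
\qquad \sum_{i=1}^{n} w_i = 1,
\qquad 0 \le V(A_i) \le 100 .
```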
Step 7. Estimate Weights. Assessing the ratio of the importance of two attributes: the attributes are rank ordered; the least important is assigned 10 points; the decision maker is asked to rate each more important attribute by saying how many times more important it is. We can estimate the weights for an additive value model in a number of ways. In this section, we present the method of rating the ratio of the importance of the attributes. Appendix B presents other methods. As we mention below, we often find it useful to mix several approaches. Some analysts estimate weights by assessing the ratio of the importance of two attributes (Edwards 1977). The attributes are rank ordered, and the least important is assigned 10 points. Then the expert is asked to estimate the relative importance of the other attributes. There is no upper limit to the number of points other attributes can be assigned. For example, in estimating the weights for three attributes (skin infections, lung infections, and GI diseases), the analyst and the expert might have the following discussion:
Analyst: Which of the three attributes is most important?
Expert: Well, they are all important, but patients with either lung infections or GI diseases have worse prognoses than patients with skin infections.
Analyst: Do lung infections have a worse prognosis than GI diseases?
Expert: That's more difficult to answer. No. I would say that for all practical purposes, they have the same prognosis. Well, now that I think about it, perhaps patients with GI diseases have a slightly worse prognosis.
Step 7. Estimate Weights. Having obtained the rank ordering of the attributes, we must estimate the importance weights.
Analyst: Let's say that we arbitrarily rate the importance of skin infection in determining prognosis at 10 points. GI diseases are how many times more important than skin infections?
Expert: Quite a bit. Maybe three times.
Analyst: That is, if we assign 10 points to skin infections, we should assign 30 points to the importance of GI diseases?
Expert: Yes, that sounds right.
Analyst: How about lung infections? How many more times important are they than GI diseases?
Expert: I would say about the same.
Analyst (checking for consistency in the subjective judgments): Would you consider lung infections three times more serious than skin infections?
Expert: Yes, I think that should be about right.
Step 7. Estimate Weights. In the preceding dialogue, the analyst first found the rank order of the attributes, then asked for the ratios of the attributes' weights. Knowing these ratios allows us to estimate the attribute weights: we set up an algebraic equation for each expressed ratio, add the convention that the weights sum to one, and solve the resulting equations. One characteristic of this estimation method is that its emphasis on the ratio of the importance of the attributes leads to relatively extreme weighting compared to other approaches. Thus, some attributes may be judged critical, and others rather trivial. Other approaches, especially the direct magnitude process, may judge all attributes as almost equally important. In choosing a method to estimate weights, you should consider several trade-offs. You can introduce errors by asking experts awkward and partially understood questions, but you can also cause error with an easier, but formally less justified, method. Our preference is to estimate weights in several ways and use the resulting differences to help experts think more carefully about their real beliefs. In doing so, we usually start with a rank-order technique, then move on to assess ratios, obtain a direct magnitude estimate, identify discrepancies, and finally ask the expert to resolve them. One note of caution: some scientists have questioned whether experts can describe how they weight attributes. Nisbett and Wilson (1977) argued that directly assessed weights may not reflect an expert's true beliefs, but other investigators have found the contrary (John and Edwards 1978). The debate continues among scientists. The only way to decide whether directly assessed weights reflect the expert's opinions is to look at how the resulting models perform. In a number of applications, value models based on directly assessed weights correlated quite well with the subjects' judgments (Fischer 1979). The typical correlation is in the upper 80s, which is high in comparison to most social science correlations. This success confirms our belief in the accuracy (perhaps we should say adequacy) of subjective assessment techniques for constructing value models.
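The dialogue above gives skin infections 10 points, GI diseases three times as many (30), and lung infections about the same as GI diseases (30). Normalizing these points so the weights sum to one is a one-line computation, sketched below.

```python
# Converting the ratio judgments from the dialogue into normalized weights.
points = {"skin infections": 10, "GI diseases": 30, "lung infections": 30}

total = sum(points.values())
weights = {attr: pts / total for attr, pts in points.items()}

print(weights)
# skin infections ~0.14, GI diseases ~0.43, lung infections ~0.43
```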
Based on the expert interviews, we constructed a severity-of-AIDS model and solicited the necessary attributes, attribute levels, and scores.
Step 8. Evaluate the Model
Context dependence. Face validity. Based on available data. Simple to use. Inter-rater reliability. Construct validity. Simulates decision maker's judgments. Predicts behavior. While researchers know the importance of carefully evaluating value models, analysts often lack the time and resources to do this. Because of the importance of having confidence in the models and being able to defend our analytical methodology, we present several ways of testing the adequacy of value models (Gustafson et al. 1980). Most value models are devised to apply to a particular context, and they are not portable to other settings or uses. We view such context dependence, in general, as a liability, but this is not always the case. For example, the AIDS severity index may be intended for evaluating clinicians' practice patterns; its use for evaluating treatment programs or the prognosis of groups may be inappropriate and possibly misleading.
Step 8. Evaluate the Model
The value model should also seem reasonable to experts, something we term face validity. Thus, in our example, the severity index should seem reasonable to clinicians and managers. Otherwise, even if it is accurate, one may experience problems with its acceptance. Clinicians who are unfamiliar with statistics will likely rely on their experience to judge the index, meaning that the variables, weights, and value scores must seem reasonable and practical to them. Face validity is tested by showing the model to a new set of experts and asking if they understand it and whether it is conceptually reasonable.
Step 8. Evaluate the Model
The value model should require only available data for input. Relying on obscure data may increase the model's accuracy at the expense of practicality. Thus, the severity index should rely on reasonable sources of data, usually existing databases. A physiologically based index, for instance, might predict the prognosis of AIDS patients quite accurately. However, such an index would be useless if physiological information is generally unavailable and routine collection of this information would take considerable time and money. While the issue of data availability may seem obvious, it is a very common error in the development of value models. Experts used to working in organizations with superlative data systems may want data that are unavailable at average institutions, and they may produce a value model with limited usefulness. If we do not care about comparing scores across organizations, we can tailor indexes to each institution's capabilities and allow each institution to decide whether the cost of collecting new data is justified by the expected increase in accuracy. However, if scores will be used to compare institutions or allocate resources among institutions, then we need a single value model based on data available to all organizations.
Step 8. Evaluate the Model
Context dependence Face validity Based on available data Simple to use Inter-rater reliability Construct validity Simulates decision maker’s judgments Predicts behavior The index should be simple to use. The index of medical underservice (Health Services Research Group 1975) is a good example of the importance of simplicity. This index, developed to help the federal government set priorities for funding HMOs, community health centers, and health facility development programs, originally had nine variables, but the director of the sponsoring federal agency rejected it because of the number of variables. Because he wanted to be able to "calculate the score on the back of an envelope," the index was reduced to four variables. The simplified version performed as well as the one with nine variables; it was used for eight years to help set nationwide funding priorities. This example shows that simplicity need not come at the expense of accuracy, and it nearly always makes an index easier to understand and use. Alemi at George Mason University
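As a rough illustration of how one might check that a simplified index performs as well as the full one, the sketch below scores a set of areas with a hypothetical nine-variable index and with a four-variable subset, then correlates the two sets of scores. The variable names, weights, and randomly generated data are assumptions made purely for illustration.

```python
# A rough check that a simplified index sets much the same priorities as the
# full one: score the same areas with both and correlate the scores.
import random
from statistics import correlation  # Python 3.10+

random.seed(0)

full_weights = dict(zip(
    [f"v{i}" for i in range(1, 10)],
    [0.20, 0.18, 0.15, 0.12, 0.10, 0.09, 0.07, 0.05, 0.04],
))
short_weights = {"v1": 0.35, "v2": 0.30, "v3": 0.20, "v4": 0.15}  # top four only

# Fifty hypothetical areas, each described on all nine variables (0-100 scale).
areas = [{v: random.uniform(0, 100) for v in full_weights} for _ in range(50)]

full_scores = [sum(w * a[v] for v, w in full_weights.items()) for a in areas]
short_scores = [sum(w * a[v] for v, w in short_weights.items()) for a in areas]

# A high correlation suggests the four-variable index ranks areas much like
# the nine-variable index does.
print(round(correlation(full_scores, short_scores), 2))
```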
83
Step 8. Evaluate the Model
Context dependence Face validity Based on available data Simple to use Inter-rater reliability Construct validity Simulates decision maker’s judgments Predicts behavior When different people apply the value model to the same situation, they must arrive at the same scores, which is referred to as inter‑rater reliability. In our example, different registered record abstractors who use the index to rate the severity of a patient should produce the same score. If a model relies on hard‑to‑observe patient attributes, the abstractors will disagree about the condition of patients. If reasonable people using a value model reach different conclusions, then we lose confidence in the model's usefulness as a systematic method of evaluation. Inter-rater reliability is tested by having different abstractors rate the severity of randomly selected patients. Alemi at George Mason University
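A minimal sketch of how such an inter-rater reliability check might be run: two abstractors independently score the same randomly selected patients with the index, and their scores are correlated. The scores below are invented for illustration, and correlation is only one of several possible agreement statistics.

```python
# Inter-rater reliability check: correlate two abstractors' scores for the
# same patients. The scores are invented for illustration.
from statistics import correlation  # Python 3.10+

abstractor_a = [76, 42, 88, 15, 60, 53, 91, 33, 70, 25]
abstractor_b = [74, 45, 85, 20, 58, 50, 89, 38, 72, 22]

r = correlation(abstractor_a, abstractor_b)
print(f"Inter-rater correlation: {r:.2f}")  # values near 1.0 indicate agreement
```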
84
Step 8. Evaluate the Model
Context dependence Face validity Based on available data Simple to use Inter-rater reliability Construct validity Simulates decision maker’s judgments Predicts behavior A model is considered valid if several different ways of measuring it lead to the same finding. This method of establishing validity is referred to as construct validity. For example, the AIDS severity model should be correlated with other measures of AIDS severity. If the analyst has access to other severity indexes (such as physiologically based indexes), the predictions of the different approaches can be compared on a sample of patients. One such study was done for the index described in this section. Alemi, Walker, Carey, and Leggett report that the proposed index was more accurate than physiological markers typically used for measuring severity of AIDS. We can also explore construct validity by comparing the model with a surrogate measure of severity. Because severely ill patients stay longer in the hospital, we could correlate length of stay to severity scores. The point is that convergence among several measures of severity increases our confidence in the model. Alemi at George Mason University
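One way such a construct-validity check could be carried out is sketched below: correlate the index's severity scores with a surrogate measure, here length of stay. The data are invented, and length of stay is only one plausible surrogate among several.

```python
# Construct-validity sketch: correlate index severity scores with a surrogate
# measure of severity (length of stay). Data are invented for illustration.
from statistics import correlation  # Python 3.10+

severity_scores = [76, 42, 88, 15, 60, 53, 91, 33]
length_of_stay = [12, 5, 15, 2, 8, 7, 18, 4]  # days in hospital

print(round(correlation(severity_scores, length_of_stay), 2))
# Convergence among several measures of severity increases confidence in the model.
```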
85
Step 8. Evaluate the Model
Context dependence Face validity Based on available data Simple to use Inter-rater reliability Construct validity Simulates decision maker’s judgments Predicts behavior One way to establish the validity of a model is to show that it simulates the judgment of the experts; if we believe in the experts' acumen, then we should also consider the model valid (Fryback 1976). In this approach the expert is asked to score several (perhaps 100) hypothetical case profiles described only by the variables included in the model. If the model accurately predicts the expert's judgments, our confidence in the model increases, but this measure has the drawback of producing optimistic results. After all, if the expert who developed the model cannot get the model to predict his or her judgments, who can? It is far better to ask a separate panel of experts to rate the patient profiles. In the AIDS severity project, we collected the expert's estimate of survival time for 97 hypothetical patients and examined whether the value model could predict these ratings. The correlation between the additive model and the rating of survival was -0.53. (The negative sign means that high severity scores indicate shorter survival; correlations closer to 1.0 or -1.0 imply greater agreement.) The -0.53 correlation suggests low to moderate agreement between the model and the expert's intuitions. We can judge the adequacy of such correlations by comparing them with the agreement among the experts themselves. The correlation between several pairs of experts rating the same 97 hypothetical patients was similar, so the value model agreed with our expert about as much as the experts agreed with each other. A value model constructed with multiplication instead of addition had a higher correlation with the experts' judgments of severity; we discuss this alternative model in a later section. Alemi at George Mason University
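A sketch of how this comparison could be computed: score the hypothetical case profiles with the model and correlate the scores with the expert's estimated survival times. The numbers below are illustrative only and are not the 97 profiles used in the project.

```python
# Does the model simulate the expert's judgments? Correlate model severity
# scores for hypothetical profiles with the expert's survival estimates.
from statistics import correlation  # Python 3.10+

model_severity = [80, 35, 60, 90, 20, 55, 70, 45]   # higher = more severe
expert_survival = [6, 30, 14, 4, 40, 18, 9, 24]     # expert's estimate, months

r = correlation(model_severity, expert_survival)
print(f"Model vs. expert correlation: {r:.2f}")
# A negative correlation is expected: higher severity implies shorter survival.
```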
86
Step 8. Evaluate the Model
Context dependence Face validity Based on available data Simple to use Inter-rater reliability Construct validity Simulates decision maker’s judgments Predicts behavior In some situations, one can validate a value model by comparing the model's predictions against observable behavior. This method of establishing validity is referred to as predictive validity. If a model is used to measure a subjective concept, its accuracy can be evaluated by comparing predictions to an observed and objective standard, often called the gold standard to emphasize its status as being beyond debate. In practice, gold standards are rarely available for judging the accuracy of subjective concepts (otherwise, we would not need the models in the first place). For example, the accuracy of a severity index can be examined by comparing it to the observed outcomes of patients' care. If the severity index accurately predicts outcomes, we have evidence favoring the model. For the index developed in this section, predictions were compared to survival rates: the medical histories of 91 patients were analyzed with the index, and the ability of the severity score to predict patients' prognosis was examined. Alemi at George Mason University
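A hedged sketch of such a predictive-validity check appears below: index scores are compared against an observed outcome (months of survival), both by correlation and by a simple classification of patients with less than six months to live. The data and the score cutoff are invented for illustration.

```python
# Predictive-validity sketch: compare index scores against observed survival.
from statistics import correlation  # Python 3.10+

severity = [85, 30, 70, 92, 25, 60, 78, 40]
months = [4, 36, 10, 3, 48, 12, 7, 28]  # observed survival after scoring

# Correlation with the observed outcome...
print(round(correlation(severity, months), 2))

# ...and a simple classification check: does a high score flag patients who
# survived fewer than 6 months? (a cutoff of 80 is an arbitrary illustration)
predicted_short = [s >= 80 for s in severity]
actual_short = [m < 6 for m in months]
accuracy = sum(p == a for p, a in zip(predicted_short, actual_short)) / len(months)
print(f"Classification agreement: {accuracy:.2f}")
```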
87
Testing Accuracy of the Severity Index for AIDS
Simulating decision maker’s judgments Generated scenarios of patients Asked experts to estimate prognosis of the patients Calculated model scores for each patient Compared model scores to average of experts’ estimates Test of predictive validity Followed patients over time Compared model scores to observed survival of patients Alemi at George Mason University
88
Alemi at George Mason University
Results In 81 cases, the index developed from the judgment of experts was more predictive of months of survival than the Composite Laboratory Index. In 26 hospice cases, the severity index developed from the judgment of experts correctly classified patients who had less than 6 months to live. Alemi at George Mason University
89
Take Home Lesson Spend most of your time on getting the right attributes. A rough scoring is usually sufficient.