1
Validity & Reliability Measurement Scales
Week 7: Validity & Reliability – Measurement Scales
2
Conceptualization & Measurement Examples
The Mediating Role of "Psychological Ownership toward the Job" in the Relationship between Internal Marketing Practices and Organizational Citizenship Behavior in Palestinian Academic Institutions
3
Conceptualization: The Concept of Psychological Ownership
Psychological ownership toward the job (a variable with a single dimension). The feeling of ownership is considered part of the human condition, and these feelings of ownership emerge in the individual from the early stages of life (Furby, 1976; Rochberg, 1984). Pierce et al. (1991) posit that ownership is a multidimensional phenomenon, and that feelings of ownership may be either objective or psychological. Psychological ownership is a phenomenon in which a person develops feelings of possession toward things, whether tangible or intangible, and subsequently feels that they belong to him or her (Dittmar, 1992; Pierce et al., 2001; Kaur et al., 2013). Finally, the feeling of psychological ownership is considered an integral part of emotional attachment to the organization, in its cognitive, behavioral, and affective aspects, and this feeling may be directed toward the organization, the job, the work itself, or the tools of work (Dirks et al., 1996).
4
Operationalization: The Operational Definition of Psychological Ownership and the Measurement Scales Used
Psychological ownership toward the job: "the feeling that employees develop toward the job they perform, through which the employee comes to feel ownership of part of the job he or she carries out, such that it may become part of his or her psychological identity and self-awareness" (Pierce et al., 2001; Pierce et al., 2003).
Study / Scale used:
- Peng & Pierce, 2015; Jian Li et al., 2015; Alok, 2014; Fu Pan et al., 2014; Tsung Hou et al., 2009; Mayhew et al., 2007 / Van Dyne & Pierce, 2004
- Mustafa et al., 2015 / Pierce et al., 1992
- Olckers & Van Zyle, 2015; Ghafoor et al., 2011 / Avey et al., 2009
5
The Van Dyne & Pierce (2004) Psychological Ownership Scale
# Item
1. It is my job (I feel I am the owner of the job and responsible for it).
2. I feel this job is our job (mine and my colleagues' who share the same work).
3. I feel a high degree of personal ownership of the work I perform.
4. I feel this is my job.
5. This is our job (mine and my colleagues' who share the same work).
6. Most employees in the organization feel they are owners of the job.
7. It is difficult for me to think of this job as mine.
6
Types of Scales – Review
7
Measure Development Only after a rigorous literature review shows that no existing quantitative scale suits your needs should you develop your own measurement scale. Some considerations: develop your operational definition first for each variable and construct; use simple language and wording for each question, so that the questions in a group, taken together, refer to one variable or construct; ensure there are no double- or multi-barreled questions, i.e., a question that asks more than one thing, leaving respondents confused about which thing the researcher is asking and leaving the researcher unsure which thing the respondents answered.
8
Measure Development Ensure you use formative or reflective questions as appropriate to represent a variable or construct. Formative questions are several questions, each with its own unique attribute or characteristic, that together form or represent the variable; reflective questions are several questions, each of which reflects the same variable from a different angle. The distinction matters because formative versus reflective measurement affects which data-analysis modeling you need to use, e.g., Partial Least Squares Structural Equation Modeling (PLS-SEM) versus covariance-based SEM.
9
Measure Development Since it is a newly developed measure, you need to run a pilot test to evaluate its reliability and related properties. Perform Exploratory Factor Analysis (EFA) on the variable or construct so that the factors generated map to your operational definition; e.g., if your operational definition for a construct consists of 3 attributes, 3 factors should surface after the EFA. Each question within a group should focus on a single variable and should not link two variables together, i.e., questions should be "decoupled," grouping easily and representing only one variable; that is the purpose of EFA.
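As a sketch of this check, the following Python snippet (synthetic data and illustrative assumptions throughout; scikit-learn's FactorAnalysis is one of several EFA tools) simulates 9 questionnaire items driven by 3 latent attributes and verifies that 3 rotated factors recover the intended item groupings:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Simulate 9 questionnaire items driven by 3 latent attributes (3 items each),
# mirroring an operational definition with 3 attributes.
latent = rng.normal(size=(n, 3))
true_loadings = np.zeros((3, 9))
for f in range(3):
    true_loadings[f, 3 * f:3 * f + 3] = 1.0
X = latent @ true_loadings + 0.3 * rng.normal(size=(n, 9))

# EFA with varimax rotation: the 3-factor structure should surface.
fa = FactorAnalysis(n_components=3, rotation="varimax").fit(X)
loadings = fa.components_                    # shape: (3 factors, 9 items)
dominant = np.abs(loadings).argmax(axis=0)   # dominant factor per item
print(dominant)  # items written for the same attribute share a factor
```

If an item's dominant loading fell on the "wrong" factor, that would flag a question that is not decoupled from the other attributes.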
10
Evaluating Measurement Tools
Criteria: Validity, Reliability, Practicality
11
Evaluating Measurement Tools
What are the characteristics of a good measurement tool? A tool should be an accurate indicator of what one needs to measure. It should be easy and efficient to use. There are three major criteria for evaluating a measurement tool. Validity is the extent to which a test measures what we actually wish to measure. Reliability refers to the accuracy and precision of a measurement procedure. Practicality is concerned with a wide range of factors of economy, convenience, and interpretability.
12
Validity Determinants
Content – Criterion – Construct. Validity expresses the ability of a scale or measurement instrument to measure what it is intended to measure.
13
Validity Determinants
14
Validity Determinants
There are three major forms of validity. Content validity refers to the extent to which measurement scales provide adequate coverage of the investigative questions, i.e., the degree to which the instrument's items represent the variable being measured. If the instrument contains a representative sample of the universe of subject matter of interest, then content validity is good. To evaluate content validity, one must first agree on what elements constitute adequate coverage; to determine it, one may use one's own judgment and the judgment of a panel of experts.
15
Increasing Content Validity
Methods include literature searches, expert interviews, group interviews, question databases, etc.
16
Validity Determinants
Criterion-related validity reflects the success of measures used for prediction or estimation. There are two types of criterion-related validity, concurrent and predictive, which differ only in time perspective. An attitude scale that correctly forecasts the outcome of a purchase decision has predictive validity. An observational method that correctly categorizes families by current income class has concurrent validity.
17
Validity Determinants
A measurement scale has construct validity when it demonstrates both convergent validity and discriminant validity; construct validity refers to how successfully a test measures a given hypothetical construct. In attempting to evaluate construct validity, one considers both the theory and the measurement instrument being used. For instance, suppose we wanted to measure the effect of trust in relationship marketing. We would begin by correlating results obtained from our measure with those obtained from an established measure of trust; to the extent that the results were correlated, we would have an indication of convergent validity. We could then correlate our results with those of known measures of similar but distinct constructs, such as empathy and reciprocity; to the extent that the results are not correlated, we can say we have shown discriminant validity.
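A minimal sketch of this convergent/discriminant check with synthetic data (the variable names and noise levels are illustrative assumptions, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical scores: an established trust scale, a new trust scale built
# to track the same construct, and a distinct construct (reciprocity).
established_trust = rng.normal(size=n)
new_trust = established_trust + 0.4 * rng.normal(size=n)  # should converge
reciprocity = rng.normal(size=n)                          # should diverge

r_convergent = np.corrcoef(new_trust, established_trust)[0, 1]
r_discriminant = np.corrcoef(new_trust, reciprocity)[0, 1]
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```

A high convergent correlation together with a near-zero discriminant correlation is the pattern that supports construct validity.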
18
Reliability Estimates
Stability Internal Consistency Equivalence
19
Reliability Estimates
A measure is reliable to the degree that it supplies consistent results. Reliability is a necessary contributor to validity but is not a sufficient condition for it. It is concerned with estimates of the degree to which a measurement is free of random or unstable error. Reliable instruments are robust; they work well at different times under different conditions. This distinction of time and condition is the basis for three perspectives on reliability: stability, equivalence, and internal consistency. In other words, reliability is the degree to which measurements are free of error and can therefore yield consistent results over repeated trials across time.
20
Reliability Estimates
A measure is said to possess stability if one can secure consistent results with repeated measurements of the same person with the same instrument. Test-retest comparisons (administering the same test twice) can be used to assess stability; the correlation between the two administrations indicates the degree of stability.
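As an illustration, this Python sketch (synthetic scores; the noise level is an assumption) estimates test-retest stability as the correlation between two administrations of the same instrument:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# The same stable attitude measured twice with the same instrument;
# each administration adds some unstable (random) error.
true_attitude = rng.normal(size=n)
test_1 = true_attitude + 0.3 * rng.normal(size=n)
test_2 = true_attitude + 0.3 * rng.normal(size=n)

# Test-retest reliability: correlation between the two administrations.
stability = np.corrcoef(test_1, test_2)[0, 1]
print(f"test-retest r = {stability:.2f}")
```

The larger the random error relative to the stable attitude, the lower this correlation falls.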
21
Reliability Estimates
Internal consistency is a characteristic of an instrument whose items are homogeneous. The split-half technique and Cronbach's alpha can be used to assess it.
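Cronbach's alpha has a simple closed form: alpha = k/(k-1) x (1 - sum of item variances / variance of the total score). The sketch below (synthetic data; `cronbach_alpha` is a hypothetical helper written here, not a library function) computes it for five items that all reflect one latent attitude:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_vars / total_var)

rng = np.random.default_rng(3)
n = 300
latent = rng.normal(size=(n, 1))
# Five hypothetical items that all reflect the same latent attitude.
scores = latent + 0.5 * rng.normal(size=(n, 5))
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
```

Homogeneous items produce an alpha near 1; adding an item unrelated to the others would pull it down.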
23
Reliability Estimates
Equivalence is concerned with variations at one point in time among observers and samples of items. A good way to test for the equivalence of measurements by different observers is to compare their scoring of the same event. One tests for item sample equivalence by using alternate or parallel forms of the same test administered to the same persons simultaneously. The results of the two tests are then correlated. When a time interval exists between the two tests, the approach is called delayed equivalent forms.
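The slide describes correlating two observers' scorings; for categorical codings, a chance-corrected agreement index such as Cohen's kappa (not mentioned in the slide, but a standard alternative for this purpose) is often used. A sketch with hypothetical codings:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical: two observers independently assign the same ten events
# to three behavior codes (0, 1, 2).
observer_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
observer_b = [0, 0, 1, 1, 2, 0, 0, 1, 2, 0]

# Raw agreement overstates equivalence because some matches occur by chance;
# Cohen's kappa corrects for chance agreement.
agreement = sum(a == b for a, b in zip(observer_a, observer_b)) / len(observer_a)
kappa = cohen_kappa_score(observer_a, observer_b)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Here the observers agree on 9 of 10 events, but kappa is somewhat lower because part of that agreement would be expected by chance.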
25
Understanding Validity and Reliability
Exhibit 11-6 illustrates reliability and validity by using an archer's bow and target as an analogy. High reliability means that repeated arrows shot from the same bow would hit the target in essentially the same place. If we had a bow with high validity as well, then every arrow would hit the bull's-eye. If reliability is low, arrows would be more scattered. High validity means that the bow shoots true every time; it would not pull right or send an arrow careening into the woods. Arrows shot from a high-validity bow will be clustered around a central point even when they are dispersed by reduced reliability.
26
Practicality Economy Convenience Interpretability
The scientific requirements of a project call for the measurement process to be reliable and valid, while the operational requirements call for it to be practical. Practicality has been defined as economy, convenience, and interpretability. There is generally a trade-off between the ideal research project and the budget. A measuring device passes the convenience test if it is easy to administer. The interpretability aspect of practicality is relevant when persons other than the test designers must interpret the results. In such cases, the designer of the data collection instrument provides several key pieces of information to make interpretation possible. A statement of the functions the instrument was designed to measure and the procedures by which it was developed; Detailed instructions for administration; Scoring keys and instructions; Norms for appropriate reference groups; Evidence of reliability; Evidence regarding the intercorrelations of subscores; Evidence regarding the relationship of the test to other measures; and Guides for test use.
28
Sensitivity Sensitivity is the ability of a measurement instrument to accurately measure variability in stimuli or responses. For example, a scale offering the choices very strongly agree, strongly agree, agree, and don't agree offers more choices than a scale with just two (agree and don't agree) and is thus more sensitive.
29
Sources of Error Respondent Situation Measurer Instrument
The ideal study should be designed and controlled for precise and unambiguous measurement of the variables. Since complete control is unattainable, error does occur. Much error is systematic (results from bias), while the remainder is random (occurs erratically). Four major error sources may contaminate results and these are listed in the slide. Opinion differences that affect measurement come from relatively stable characteristics of the respondent such as employee status, ethnic group membership, social class, and gender. Respondents may also suffer from temporary factors like fatigue and boredom. Any condition that places a strain on the interview or measurement session can have serious effects on the interviewer-respondent rapport. The interviewer can distort responses by rewording, paraphrasing, or reordering questions. Stereotypes in appearance and action also introduce bias. Careless mechanical processing will distort findings and can also introduce problems in the data analysis stage through incorrect coding, careless tabulation, and faulty statistical calculation. A defective instrument can cause distortion in two ways. First, it can be too confusing and ambiguous. Second, it may not explore all the potentially important issues.
32
Nature of Attitudes
Cognitive: I think oatmeal is healthier than corn flakes for breakfast.
Affective: I hate corn flakes.
Behavioral: I intend to eat more oatmeal for breakfast.
33
Attitude Measuring attitude is a frequent undertaking in business research. An attitude may be defined as an enduring disposition to consistently respond in a given manner to various aspects of the world; it is a learned, stable predisposition to respond to oneself, other persons, objects, or issues in a consistently favorable or unfavorable way. Attitudes can be expressed or based cognitively, affectively, and behaviorally. 29 August 2005
34
Components of Attitude
Affective Component – Reflective of a person's general feelings or emotions towards an object or subject (like, dislike, love, hate)
Cognitive Component – Reflective of a person's awareness of and knowledge about an object or subject (know, believe)
Behavioral Component – Reflective of a person's intentions and behavioral expectations, and predisposition to action
35
Measuring Attitude It can be difficult to measure attitude; therefore, indicators such as verbal expression, physiological measurement techniques, and overt behavior are used for this purpose. The three components of attitude may require different measuring techniques. Common techniques used in business research to determine attitude include rating, ranking, sorting, and the choice technique.
36
Improving Predictability of Measurement
Factors improving predictability: specific attitudes, strong attitudes, direct experience, cognitive vs. affective basis, multiple measures, reference groups.
37
Applicability of Attitudinal Research
Several factors affect the applicability of attitudinal research for business. Specific attitudes are better predictors of behavior than general ones. Strong attitudes are better predictors of behavior than weak attitudes of little intensity or topic interest. Direct experiences with the attitude object produce behavior more reliably. Cognitive-based attitudes influence behaviors better than affective-based attitudes, although affective-based attitudes are often better predictors of consumption behaviors. Using multiple measurements of attitude, or several behavioral assessments across time and environments, improves prediction. The influence of reference groups, and the individual's inclination to conform to these influences, improves the attitude-behavior linkage.
38
Selecting a Measurement Scale
Considerations: research objectives, response types, data properties, number of dimensions, balanced or unbalanced, forced or unforced choices, number of scale points, rater errors. Attitude scaling is the process of assessing an attitudinal disposition using a number that represents a person's score on an attitudinal continuum ranging from an extremely favorable disposition to an extremely unfavorable one. Scaling is the procedure for assigning numbers to a property of objects in order to impart some of the characteristics of numbers to the property in question. Selecting and constructing a measurement scale requires consideration of the several factors listed above, which influence the reliability, validity, and practicality of the scale. Researchers face two types of scaling objectives: 1) to measure characteristics of the participants in the study, and 2) to use participants as judges of the objects or indicants presented to them. Measurement scales fall into one of four general response types: rating, ranking, categorization, and sorting. Decisions about the choice of measurement scale are often made with regard to the data properties generated by each scale: nominal, ordinal, interval, and ratio. Measurement scales are either unidimensional or multidimensional, balanced or unbalanced, forced or unforced; these characteristics, the number of scale points, and rater errors are discussed on the following slides.
41
Response Types Rating scale Ranking scale Categorization Sorting
A rating scale is used when participants score an object or indicant without making a direct comparison to another object or attitude. For example, they may be asked to evaluate the styling of a new car on a 7-point rating scale. Ranking scales constrain the study participant to making comparisons and determining order among two or more properties or objects; participants may be asked to choose which one of a pair of cars has more attractive styling. A choice scale requires that participants choose one alternative over another; they could also be asked to rank-order the importance of comfort, ergonomics, performance, and price for the target vehicle. Categorization asks participants to put themselves or property indicants in groups or categories. Sorting requires that participants sort cards into piles using criteria established by the researcher; the cards might contain photos, images, or verbal statements of product features, such as various descriptors of the car's performance.
44
Number of Dimensions Unidimensional Multi-dimensional
45
Number of Dimensions With a unidimensional scale, one seeks to measure only one attribute of the participant or object. One measure of an actor's star power is his or her ability to "carry" a movie. It is a single dimension. A multidimensional scale recognizes that an object might be better described with several dimensions. The actor's star power variable might be better expressed by three distinct dimensions - ticket sales for the last three movies, speed of attracting financial resources, and column-inch/amount of TV coverage of the last three movies.
46
Balanced or Unbalanced
A balanced rating scale has an equal number of categories above and below the midpoint; scales can be balanced with or without a midpoint option. An unbalanced rating scale has an unequal number of favorable and unfavorable response choices.
How good an actress is Angelina Jolie?
Balanced: Very bad / Bad / Neither good nor bad / Good / Very good
Unbalanced: Poor / Fair / Good / Very good / Excellent
47
Forced or Unforced Choices
An unforced-choice rating scale provides participants with an opportunity to express no opinion when they are unable to make a choice among the alternatives offered, e.g., Very bad / Bad / Neither good nor bad / Good / Very good / No opinion / Don't know. A forced-choice scale requires that participants select one of the offered alternatives, e.g., Very bad / Bad / Neither good nor bad / Good / Very good.
48
Number of Scale Points What is the ideal number of points for a rating scale? A scale should be appropriate for its purpose: it should match the stimulus presented and extract information proportionate to the complexity of the attitude object, concept, or construct. For example, a product that requires little effort or thought to purchase can be measured with a simple scale (perhaps 3 points); when the product is complex, a scale with 5 to 11 points should be considered. As the number of scale points increases, the reliability of the measure increases, and in some studies 11-point scales may produce more valid results than 3-, 5-, or 7-point scales. Some constructs require greater measurement sensitivity and the opportunity to extract more variance, which additional scale points provide. A larger number of scale points is needed to produce accuracy when using single-dimension rather than multiple-dimension scales.
49
Rater Errors Error of central tendency Error of leniency
Some raters are reluctant to give extreme judgments, which accounts for the error of central tendency; participants may also be "easy raters" or "hard raters," making what is called the error of leniency. Suggestions for addressing these tendencies: adjust the strength of the descriptive adjectives; space intermediate descriptive phrases farther apart; provide smaller differences in meaning between terms near the ends of the scale; use more scale points.
50
Rater Errors Primacy Effect Recency Effect
A primacy effect occurs when respondents tend to choose the answer they saw first; a recency effect occurs when respondents choose the answer seen most recently. These problems can be avoided by reversing the order of alternatives periodically or by randomizing the order in which responses are presented.
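A minimal sketch of the randomization remedy (hypothetical option labels; seeding per respondent is an assumption that keeps each questionnaire reproducible):

```python
import random

options = ["Very bad", "Bad", "Neither good nor bad", "Good", "Very good"]

def presented_order(options: list, respondent_seed: int) -> list:
    """Return a per-respondent shuffled copy of the response options."""
    rng = random.Random(respondent_seed)  # reproducible per questionnaire
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled

# Each respondent sees the alternatives in a different order, spreading
# primacy and recency effects evenly across the options.
for respondent_id in range(3):
    print(respondent_id, presented_order(options, respondent_id))
```

Because the shuffle is seeded by respondent, the order each person saw can be reconstructed later when decoding their answers.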
51
Rater Errors Rate one trait at a time Halo Effect
The halo effect is the systematic bias that the rater introduces by carrying over a generalized impression of the subject from one rating to another. For instance, a teacher may expect a student who did well on the first exam to do well on the second. Ways of counteracting the halo effect: rate one trait at a time; reveal one trait per page; reverse anchors periodically.
52
Rating Techniques to Measure Attitude
Rating scales are frequently employed in business research for measuring attitude, and many scales have been developed for this purpose, including: simple attitude scales, category scales, the Likert scale, the semantic differential, numerical scales, the constant-sum scale, the Stapel scale, and graphic scales.
53
Simple Attitude Scales
In attitude scaling, individuals are typically asked whether they agree or disagree with a question (or questions) put to them. Simple attitude scales have the properties of a nominal scale and the disadvantages that go with it; they do not permit fine distinctions in respondents' answers because the choice of answers is limited. They can nevertheless be useful in instances where the respondents' education level is low or questionnaires are lengthy.
54
Category Scales A category scale consists of several response categories to provide the respondent with alternative ratings. Category scales are more sensitive than scales that allow only two answer categories (because of the larger number of choices), and thus provide more data and information.
55
Simple Category Scale I plan to purchase a MindWriter laptop in the next 12 months. Yes / No. This scale is also called a dichotomous scale; it offers two mutually exclusive response choices. There could be other response choices too, such as agree and disagree.
56
Multiple-Choice, Single-Response Scale
When there are multiple options for the rater but only one answer is sought, the multiple-choice, single-response scale is appropriate. The "other" response may be omitted when exhaustiveness of categories is not critical or when an "other" response is not possible. This scale produces nominal data. What newspaper do you read most often for financial news? East City Gazette / West City Tribune / Regional newspaper / National newspaper / Other (specify:_____________)
57
Multiple-Choice, Multiple-Response Scale
What sources did you use when designing your new home? Please check all that apply. Online planning services Magazines Independent contractor/builder Designer Architect Other (specify:_____________) This scale is a variation of the last and is called a checklist. It allows the rater to select one or several alternatives. The cumulative feature of this scale can be beneficial when a complete picture of the participant’s choice is desired, but it may also present a problem for reporting when research sponsors expect the responses to sum to 100 percent. This scale generates nominal data.
58
The Likert Scale A Likert scale is a measure of attitudes designed to allow respondents to indicate how strongly they agree or disagree with carefully constructed statements that range from very positive to very negative toward an object or subject. The number of alternatives on a Likert scale can vary; often five alternatives are offered (see the textbook examples). A Likert scale may include a number of question items, each covering some aspect of the respondent's attitude, and these items collectively form an index.
59
Likert Scale The Internet is superior to traditional libraries for
comprehensive searches. Strongly disagree Disagree Neither agree nor disagree Agree Strongly agree The Likert scale was developed by Rensis Likert and is the most frequently used variation of the summated rating scale. Summated rating scales consist of statements that express either a favorable or unfavorable attitude toward the object of interest. The participant is asked to agree or disagree with each statement. Each response is given a numerical score to reflect its degree of attitudinal favorableness and the scores may be summed to measure the participant’s overall attitude. Likert-like scales may use 7 or 9 scale points. They are quick and easy to construct. The scale produces interval data. Originally, creating a Likert scale involved a procedure known as item analysis. Item analysis assesses each item based on how well it discriminates between those people whose total score is high and those whose total score is low. It involves calculating the mean scores for each scale item among the low scorers and the high scorers. The mean scores for the high-score and low-score groups are then tested for statistical significance by computing t values. After finding the t values for each statement, the statements are rank-ordered, and those statements with the highest t values are selected. Researchers have found that a larger number of items for each attitude object improves the reliability of the scale.
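The item-analysis procedure described above can be sketched as follows (synthetic Likert data; the deliberately noisy sixth item and the quartile cut-offs are illustrative assumptions). Each item's discrimination is the t value comparing high and low total scorers:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(4)
n = 400

# Six hypothetical 5-point Likert items; items 0-4 reflect one attitude,
# while item 5 is pure noise and should discriminate poorly.
latent = rng.normal(size=n)
items = np.clip(np.round(3 + latent[:, None] + 0.8 * rng.normal(size=(n, 6))), 1, 5)
items[:, 5] = rng.integers(1, 6, size=n)

total = items.sum(axis=1)
low = total <= np.quantile(total, 0.25)   # low scorers (bottom quartile)
high = total >= np.quantile(total, 0.75)  # high scorers (top quartile)

# Item analysis: t value of high-scorer vs. low-scorer means for each item;
# items with the highest t values discriminate best and would be retained.
t_values = [ttest_ind(items[high, i], items[low, i]).statistic for i in range(6)]
print([round(t, 1) for t in t_values])
```

Rank-ordering the statements by these t values and keeping the highest is exactly the selection step the slide describes.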
61
The Semantic Differential
The semantic differential is an attitude-measuring technique consisting of a series of seven-point bipolar rating scales that allow a response to a concept (e.g., an organization, product, service, or job). An advantage of the semantic differential is its versatility; on the other hand, it uses extremes, which may influence respondents' answers.
62
Semantic Differential
63
The Semantic Differential
The semantic differential scale measures the psychological meanings of an attitude object using bipolar adjectives. Researchers use this scale for studies of brand and institutional image, employee morale, safety, financial soundness, trust, etc. The method consists of a set of bipolar rating scales, usually with 7 points, by which one or more participants rate one or more concepts on each scale item. The scale is based on the proposition that an object can have several dimensions of connotative meaning. The meanings are located in multidimensional property space, called semantic space. The semantic differential scale is efficient and easy for securing attitudes from a large sample. Attitudes may be measured in both direction and intensity. The total set of responses provides a comprehensive picture of the meaning of an object and a measure of the person doing the rating. It is standardized and produces interval data.
64
Adapting SD Scales
Convenience of Reaching the Store from Your Location
Nearby ___:___:___:___:___:___:___ Distant
Short time required to reach store ___:___:___:___:___:___:___ Long time required to reach store
Difficult drive ___:___:___:___:___:___:___ Easy drive
Difficult to find parking place ___:___:___:___:___:___:___ Easy to find parking place
Convenient to other stores I shop ___:___:___:___:___:___:___ Inconvenient to other stores I shop
Products Offered
Wide selection of different kinds of products ___:___:___:___:___:___:___ Limited selection of different kinds of products
Fully stocked ___:___:___:___:___:___:___ Understocked
Undependable products ___:___:___:___:___:___:___ Dependable products
High quality ___:___:___:___:___:___:___ Low quality
Numerous brands ___:___:___:___:___:___:___ Few brands
Unknown brands ___:___:___:___:___:___:___ Well-known brands
The steps in constructing a semantic differential scale are provided in Exhibit 12-7.
65
SD Scale for Analyzing Actor Candidates
A scale used by a consulting firm to help a movie production company evaluate actors for the leading role in a risky film venture. The selection of concepts is driven by the characteristics the firm believes the actor must possess to meet box-office financial targets.
66
Graphic of SD Analysis
In Exhibit 12-9, the data are plotted on a snake diagram. Here the adjective pairs are reordered so that evaluation, potency, and activity descriptors are grouped together, with the ideal factor reflected by the left side of the scale. Profiles of the three actor candidates may be compared to each other and to the ideal.
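The candidate profiles plotted on a snake diagram are just the mean rating per bipolar adjective pair. A minimal sketch, with hypothetical candidate names and ratings (not the exhibit's data):

```python
# candidate -> list of bipolar items, each item holding the raters' 1-7 scores
ratings = {
    "Actor A": [[6, 5, 7], [5, 6, 6], [4, 5, 5]],
    "Actor B": [[3, 4, 2], [5, 4, 4], [6, 6, 7]],
}

def profile(scores):
    """Mean rating per bipolar adjective pair (one plot point per pair)."""
    return [sum(item) / len(item) for item in scores]

for name, scores in ratings.items():
    print(name, [round(x, 2) for x in profile(scores)])
```

Plotting each candidate's profile as a connected line over the reordered adjective pairs yields the snake diagram.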
67
Numerical Scale
Numerical scales have equal intervals separating their numeric scale points. The verbal anchors serve as labels for the extreme points. Numerical scales are often 5-point scales but may have 7 or 10 points. The participants write a number from the scale next to each item. It produces either ordinal or interval data.
68
Multiple Rating List Scales
A multiple rating list scale is similar to the numerical scale but differs in two ways: it accepts a circled response from the rater, and the layout facilitates visualization of the results. The advantage is that a mental map of the participant's evaluations is evident to both the rater and the researcher. This scale produces interval data.
"Please indicate how important or unimportant each service characteristic is:"
IMPORTANT … UNIMPORTANT
Fast, reliable repair
Service at my location
Maintenance by manufacturer
Knowledgeable technicians
Notification of upgrades
Service contract after warranty
69
Stapel Scales From Exhibit 12-3:
The Stapel scale is used as an alternative to the semantic differential, especially when it is difficult to find bipolar adjectives that match the investigative question. In the example, there are three attributes of corporate image. The scale is composed of the word identifying the image dimension and a set of 10 response categories for each of the three attributes. Stapel scales produce interval data.
70
Constant-Sum Scales From Exhibit 12-3:
The constant-sum scale helps researchers discover proportions. The participant allocates points across two or more attributes or property indicants so that they total a constant sum, usually 100 or 10. Participant precision and patience suffer when too many stimuli are proportioned and summed, and a participant's ability to add may also be taxed. Its advantages are its compatibility with percentages and the fact that alternatives perceived to be equal can be scored as equal. This scale produces interval data.
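A constant-sum response must be checked against the required total before the points can be read as proportions. A minimal sketch, assuming the attribute names and function name (both hypothetical):

```python
def check_constant_sum(allocation, total=100):
    """Validate a constant-sum response and convert it to proportions.

    allocation: dict mapping attribute -> points assigned by the participant.
    Raises ValueError if the points do not sum to the required total,
    which is the usual reason these responses are discarded or re-asked.
    """
    points = sum(allocation.values())
    if points != total:
        raise ValueError(f"allocation sums to {points}, expected {total}")
    return {attr: pts / total for attr, pts in allocation.items()}

# e.g. a participant splitting 100 points across three attributes
props = check_constant_sum({"price": 60, "quality": 30, "service": 10})
```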
71
Graphic Rating Scales From Exhibit 12-3:
The graphic rating scale was originally created to enable researchers to discern fine differences. Theoretically, an infinite number of ratings is possible if participants are sophisticated enough to differentiate and record them. They are instructed to mark their response at any point along a continuum. Usually, the score is a measure of length from either endpoint. The results are treated as interval data. The difficulty is in coding and analysis. Graphic rating scales use pictures, icons, or other visuals to communicate with the rater and represent a variety of data types. Graphic scales are often used with children.
72
Ranking Scales
From Exhibit 12-3: In ranking scales, the participant directly compares two or more objects and makes choices among them. The participant may be asked to select one as the best or most preferred. Three variations are common:
Paired-comparison scale
Forced ranking scale
Comparative scale
73
Paired-Comparison Scale
From Exhibit 12-10: Using the paired-comparison scale, the participant can express attitudes unambiguously by choosing between two objects. The number of judgments required in a paired comparison is [(n)(n-1)/2], where n is the number of stimuli or objects to be judged. Paired comparisons run the risk that participants will tire to the point that they give ill-considered answers or refuse to continue. Paired comparisons provide ordinal data.
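The judgment count [(n)(n-1)/2] and the pairs themselves can be generated directly. The stimuli below are hypothetical:

```python
from itertools import combinations

def paired_comparisons(objects):
    """All judgments a paired-comparison task requires: n(n-1)/2 pairs."""
    return list(combinations(objects, 2))

brands = ["A", "B", "C", "D", "E"]       # hypothetical stimuli
pairs = paired_comparisons(brands)
n = len(brands)
assert len(pairs) == n * (n - 1) // 2    # 5 objects -> 10 judgments
```

The quadratic growth of this count is why participant fatigue becomes a risk as stimuli are added: 10 objects already require 45 judgments.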
74
Forced Ranking Scale From Exhibit 12-10:
The forced ranking scale lists attributes that are ranked relative to each other. This method is faster than paired comparisons and is usually easier and more motivating for the participant. With five items, it takes ten paired comparisons to complete the task, but a simple forced ranking of five items is easier. A drawback of this scale is the limited number of stimuli (usually no more than 7) that can be handled by the participant. This scale produces ordinal data.
75
Comparative Scale From Exhibit 12-10:
When using a comparative scale, the participant compares an object against a standard. The comparative scale is ideal for such comparisons if the participants are familiar with the standard. Some researchers treat the data produced by comparative scales as interval data since the scoring reflects an interval between the standard and what is being compared, but the text recommends treating the data as ordinal unless the linearity of the variables in question can be supported.
76
MindWriter Scaling (Exhibit 12-12)
Likert Scale: "The problem that prompted service/repair was resolved."
Strongly Disagree (1) / Disagree (2) / Neither Agree Nor Disagree (3) / Agree (4) / Strongly Agree (5)
Numerical Scale (MindWriter's favorite): "To what extent are you satisfied that the problem that prompted service/repair was resolved?"
Anchors: Very Dissatisfied to Very Satisfied
Hybrid Expectation Scale: "Resolution of the problem that prompted service/repair."
Met Few Expectations / Met Some Expectations / Met Most Expectations / Met All Expectations / Exceeded Expectations
There is never just one correct way to ask a question. The MindWriter Close-Up offers an opportunity to discuss why MindWriter chose the scales that it did. In the Close-Up, Jason and Myra converse with MindWriter's general manager about the necessity of testing their measurement questions. T Henry & Associates developed the three scales shown above and also debated the wording of the anchors. This is a good place to discuss the MindWriter scale exercise from the vignette and the Close-Up.
77
Ideal Scalogram Pattern
Item:  2   4   1   3    Participant score
       X   X   X   X    4
       __  X   X   X    3
       __  __  X   X    2
       __  __  __  X    1
       __  __  __  __   0
* X = agree; __ = disagree.
Exhibit 12-14. With a cumulative scale, a participant's agreement with one extreme scale item endorses all other items that take a less extreme position. A pioneering scale of this type was the scalogram. Scalogram analysis is a procedure for determining whether a set of items forms a unidimensional scale: a scale is unidimensional if the responses fall into a pattern in which endorsement of the item reflecting the extreme position results in endorsing all items that are less extreme. The scalogram and similar procedures for discovering underlying structure are useful for assessing attitudes and behaviors that are highly structured, such as social distance, organizational hierarchies, and evolutionary product stages.
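Whether a response set approximates the ideal cumulative pattern can be sketched as a Guttman-style reproducibility check. This assumes a 0/1 response matrix with item columns ordered from least to most extreme; the function name and data are illustrative, not from the text:

```python
def reproducibility(matrix):
    """Coefficient of reproducibility for a 0/1 scalogram response matrix.

    Rows are participants; columns are items ordered from least to most
    extreme. Each row is compared with the ideal cumulative pattern for
    its total score (a run of 1s, then 0s); deviations count as errors.
    """
    errors, cells = 0, 0
    for row in matrix:
        k = sum(row)                              # participant's scale score
        ideal = [1] * k + [0] * (len(row) - k)    # ideal scalogram row
        errors += sum(a != b for a, b in zip(row, ideal))
        cells += len(row)
    return 1 - errors / cells

perfect = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(reproducibility(perfect))   # a perfectly unidimensional pattern
```

A coefficient near 1 indicates the items form a unidimensional (cumulative) scale; deviating rows pull it down.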