Outline
- Test bias – definitions
- The basic issue: group differences
- What causes group differences?
- Arguments that tests are not biased
- Differential item functioning analysis
- Criterion-related sources of bias

Outline
- Other approaches to testing minority groups: Chitling Test, BITCH Test, SOMPA
- Models of test bias: Regression, Constant Ratio, Cole/Darlington, Quota

Test bias – definition
A test is biased if it gives a systematically wrong result when used to predict something. For example, an intelligence test would be biased if it underestimated one group’s probability of success in a given endeavor.

Test bias – the basic issue
Various groups within society differ in their average scores on some psychological tests:
- African-Americans score about 1 standard deviation lower than Whites, on average
- Asian-Americans score slightly higher than Whites
- Ashkenazi Jews score highest of all

What causes group differences?
We don’t know. Candidate accounts include:
- Genetics
- Socioeconomic factors
- Caste
- Culture
- Stereotype threat

Genetics
- The highest average IQ scores are those of Ashkenazi Jews
- Cochran et al. (2006): the medieval social environment of European Jews selected for verbal and mathematical intelligence (but not spatial ability)
- A possible relation to disease genes?

Socioeconomic factors
- A much higher proportion of African-Americans than of Whites are poor, with consequences for nutrition, health care, and resources such as books in the home
- But the African-American/White difference is not eliminated when the groups are equated on SES

Caste
- “Involuntary” minorities all over the world do less well in school and drop out earlier than majority children
- Ogbu: African-American children lack “effort optimism”, the sense that hard work will be rewarded

Culture
A. Wade Boykin: African-American culture has a “deep structure” that conflicts with the demands made by typical American schools

“When children are ordered to do their own work, arrive at their own individual answers, work only with their own materials, they are being sent cultural messages. When children come to believe that getting up and moving about the classroom is inappropriate, they are being sent powerful cultural messages. When children come to confine their ‘learning’ to consistently bracketed time periods, when they are consistently prompted to tell what they know and not how they feel, when they are led to believe that they are completely responsible for their own success and failure, when they are required to consistently put forth considerable effort for effort’s sake on tedious and personally irrelevant tasks..., then they are pervasively having cultural lessons imposed on them” (Boykin, 1994, p. 125).

Racial identity & test scores
Awad (2007):
- 313 African-American university students at a historically Black university
- Measures: GRE (Verbal) and several psychological tests

Awad (2007)
- Cross Racial Identity Scale (Cross, 1991)
- Rosenberg Self-Esteem Scale (Rosenberg, 1965)
- Academic Self-Concept Scale (Reynolds, 1988)

Racial identity & test scores
- Academic self-concept predicted GPA but not GRE scores
- Racial identity predicted neither GPA nor GRE scores
- Self-esteem predicted neither

Stereotype threat
Steele & Aronson (1995)
- A social-psychological threat produced in a situation in which a negative stereotype about your group is made salient
- You fear you will confirm the stereotype
- This affects highly able, school-identified African-Americans because they feel the most pressure to do well

Arguments that tests are not biased
- Major tests have been subjected to impressive scrutiny for decades
- Enormous resources are devoted to this purpose
- Criterion validity has been established very securely for the major intelligence tests: they do predict college and job performance

Arguments that tests are not biased
- It is not appropriate to focus on individual test items, as some critics of testing do
- Items should be drawn from a variety of domains, not all of which will be familiar to any one test-taker

Arguments that tests are not biased
- Test developers evaluate tests on the basis of their overall predictive utility
- They are future-oriented, not past-oriented: they ask “How will you do in college or in a job?”, not “Have you had the opportunity to learn?”

Arguments that tests are not biased
- Do you think of test results as “outcomes” or as “information” (predictors)?
- Test developers say results are the beginning, not the end: they are information that will guide us
- Opponents see test results as outcomes

Arguments that tests are not biased
- Systematic studies have asked whether biased items produce group differences on tests such as the Stanford-Binet and the Wechsler tests
- These studies found no evidence that group differences disappeared when allegedly biased items were removed

Arguments that tests are not biased
- Group differences are just as large on what is considered the most culture-fair test, Raven’s Progressive Matrices, as on the WAIS
- IQ scores have the same predictive utility regardless of race or socioeconomic status

Differential item functioning analysis
- In this approach to testing for bias, you first form comparison groups that are equated on overall test score
- Implication: the groups are equivalent in overall ability
- Then you look for differences between the groups on individual items
- Where a difference is found, you conclude that the item is biased (since the groups do not differ in ability)

Differential item functioning analysis
- But removing such items does not eliminate group differences
- E.g., the people depicted in test items may typically be White and male, but changing this has little effect (McCarty, Noble, & Huntley, 1989)
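The matching logic of differential item functioning analysis can be sketched in Python. This is a minimal illustration under stated assumptions, not a standard DIF statistic: each test-taker is a hypothetical `(group, total_score, item_correct)` record, people are matched by grouping them on identical total scores, and the per-score differences in proportion correct are averaged, weighted by the number of focal-group members at each score. The function name `dif_index` and the group labels are invented for this example; operational DIF work typically uses established procedures such as Mantel-Haenszel.

```python
from collections import defaultdict


def dif_index(records, focal="focal", reference="reference"):
    """Matched-group difference in proportion correct on one item.

    records: iterable of (group, total_score, item_correct) tuples, where
    item_correct is 0 or 1 (a hypothetical data layout).
    Returns the focal-minus-reference difference in proportion correct,
    averaged over total-score strata and weighted by focal-group size.
    Values near 0 suggest the item behaves the same for matched groups;
    negative values mean focal-group members do worse on the item than
    reference-group members with the same total score.
    """
    # strata[total_score][group] -> [number correct, number of people]
    strata = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, total, correct in records:
        cell = strata[total][group]
        cell[0] += correct
        cell[1] += 1

    weighted_sum = 0.0
    focal_count = 0
    for groups in strata.values():
        f_correct, f_n = groups[focal]
        r_correct, r_n = groups[reference]
        # A matched comparison needs members of both groups at this score.
        if f_n == 0 or r_n == 0:
            continue
        weighted_sum += f_n * (f_correct / f_n - r_correct / r_n)
        focal_count += f_n
    return weighted_sum / focal_count if focal_count else 0.0


records = [
    ("focal", 10, 1), ("focal", 10, 0),
    ("reference", 10, 1), ("reference", 10, 1),
    ("focal", 20, 1), ("reference", 20, 1),
]
print(round(dif_index(records), 3))  # -0.333
```

The toy records are invented only to exercise the code. Matching on the total score is what lets a difference on the item be read as a property of the item rather than of overall ability, which is the inference the slide describes.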

Criterion-related sources of bias
- We evaluate criterion validity by looking at the correlation between test scores and criterion scores
- E.g., SAT scores vs. GPA after 4 years at university

Criterion-related sources of bias
- If the correlation is high, we use test scores (e.g., SAT) to predict the criterion and to make selection decisions
- What do we do if the correlation is different for different groups?
- This would imply that test scores mean different things for different groups
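Checking whether the correlation differs across groups amounts to computing the validity coefficient (the Pearson correlation between test score and criterion) separately within each group. The sketch below assumes hypothetical `(group, test_score, criterion_score)` records and invented function names; a real analysis would also test whether any difference between the groups’ correlations is statistically reliable.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validity_by_group(records):
    """Criterion validity coefficient computed separately for each group.

    records: iterable of (group, test_score, criterion_score) tuples
    (a hypothetical layout, e.g. admission-test score and later GPA).
    """
    scores = defaultdict(lambda: ([], []))
    for group, test_score, criterion in records:
        xs, ys = scores[group]
        xs.append(test_score)
        ys.append(criterion)
    return {group: pearson_r(xs, ys) for group, (xs, ys) in scores.items()}


# Invented numbers, chosen only to show two different coefficients.
records = [
    ("A", 1, 2), ("A", 2, 4), ("A", 3, 6),
    ("B", 1, 1), ("B", 2, 3), ("B", 3, 2),
]
print(validity_by_group(records))  # {'A': 1.0, 'B': 0.5}
```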

Criterion-related sources of bias
In this graph, Group B performs better than Group A, but the correlation is the same for both.
[Figure: criterion plotted against test score, with separate lines for Group A and Group B]

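Criterion-related sources of bias
In this graph, the slopes of the lines are the same but the intercepts are different. Equal slopes mean that a given difference in test score predicts the same difference in the criterion in both groups; if the two groups also have similar spreads of test and criterion scores, the correlations are equal too, that is, the predictions are equally good.
[Figure: criterion plotted against test score, with parallel lines for Group A and Group B]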

Criterion-related sources of bias
Here, both the intercepts and the slopes differ, so predictions for Groups A and B would not be equally good. Such cases are rare.
[Figure: lines for Group A and Group B with different slopes and intercepts; test scores X₁ and X₂ are marked]
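The three graphs compare groups by the slope and intercept of the line that predicts the criterion from the test score. A minimal sketch, assuming hypothetical `(group, test_score, criterion_score)` records: fit an ordinary least-squares line separately for each group and compare the coefficients. The function names are invented, and a full analysis would test the slope and intercept differences formally rather than just inspecting them.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lines_by_group(records):
    """Fit a separate criterion-on-test-score line for each group.

    records: iterable of (group, test_score, criterion_score) tuples
    (hypothetical layout). Returns {group: (slope, intercept)}.
    """
    data = defaultdict(lambda: ([], []))
    for group, test_score, criterion in records:
        data[group][0].append(test_score)
        data[group][1].append(criterion)
    return {group: fit_line(xs, ys) for group, (xs, ys) in data.items()}


# Invented data shaped like the second graph: equal slopes, different intercepts.
records = [
    ("A", 0, 1), ("A", 2, 2), ("A", 4, 3),
    ("B", 0, 2), ("B", 2, 3), ("B", 4, 4),
]
print(lines_by_group(records))  # {'A': (0.5, 1.0), 'B': (0.5, 2.0)}
```

With equal slopes but different intercepts, using one common line for everyone would tend to under-predict one group and over-predict the other, which is a systematically wrong prediction in the sense of the definition at the start of the deck.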

Criterion-related sources of bias
- Major tests, such as the SAT and the WISC-R, have equal criterion validity for various ethnic groups (e.g., African-American, White, Latino/Latina)
- Similar results have been found in other multi-ethnic countries, such as Israel

Other approaches to testing minority groups
- The Chitling Test
- The BITCH Test
- SOMPA

The Chitling Test (Dove, 1968)
- Developed to make a point about testing for information a group is unlikely to have acquired
- Questions require a particular form of “street smarts” to answer correctly
- No validity data exist for this test
- If you want to predict college performance for minority students, this test won’t help

The BITCH test (Williams, 1974)
- Black Intelligence Test of Cultural Homogeneity
- Task: define 100 words drawn from the Afro-American Slang Dictionary and Williams’ personal experience
- African-Americans score higher than Whites
- Williams argues that this test is analogous to standard IQ tests, which are also culture-bound

The BITCH test (Williams, 1974)
Problem: there is no reason to accept the claim that this is an intelligence test.
- There is no validity evidence: it predicts no performance
- It does not test reasoning skills
- It may have some value for testing familiarity with African-American culture

SOMPA (Mercer, 1979)
- System of Multicultural Pluralistic Assessment
- Based on the idea that what constitutes knowledge is socially constructed
- Mercer also suggested that IQ tests are a tool Whites use to keep minority groups “in their place”

SOMPA (Mercer, 1979)
- Inspired originally, in part, by the over-representation of minority-group children in EMR (“educable mentally retarded”) classes in US schools
- Mercer: this over-representation resulted from both
  - more medical problems
  - unfamiliar cultural references on tests

SOMPA (Mercer, 1979)
- Fundamental assumption: all cultural groups have the same potential, on average
- On this view, if one cultural group does more poorly than another on a test, that is a fact about the test, not about the groups

SOMPA (Mercer, 1979)
Combines three kinds of evaluation:
- Medical: health, vision, hearing, etc.
- Social: the entire WISC-R
- Pluralistic: WISC-R scores compared with those of children from the same community

SOMPA (Mercer, 1979)
- Estimated Learning Potentials (ELPs): WISC-R scores adjusted for socioeconomic background
- But ELPs don’t predict school performance as well as the original WISC-R scores
- Mercer: ELPs are intended to assess who should be in EMR classes

SOMPA (Mercer, 1979)
- A major problem, in my view, is that we don’t know what consequences arise for children who are removed from EMR classes on the basis of ELPs
- Does what we call these children matter? It does if the label has an effect, but the data do not show such an effect
- SOMPA is used much less today than it used to be

Models of test bias
- Regression
- Constant Ratio
- Cole/Darlington
- Quota

Regression
- Basis: unqualified individualism; treat each person as an individual, not as a member of a group
- Select the people with the highest scores for a job or college place
- Ignores sex, race, and other group characteristics
- Leads to the highest average performance on the criterion
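Under the regression model, selection is plain top-down ranking on score. Ranking on a single predicted criterion score gives the same order, as long as one common regression line with a positive slope is used for everyone. A minimal sketch, assuming a hypothetical `(name, group, score)` layout and an invented function name:

```python
def select_top_down(candidates, n_places):
    """Regression model (unqualified individualism): rank on score alone.

    candidates: list of (name, group, score) tuples (hypothetical layout).
    The group field is deliberately ignored.
    """
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    return [name for name, _group, _score in ranked[:n_places]]


candidates = [
    ("a", "majority", 90), ("b", "minority", 85),
    ("c", "majority", 80), ("d", "minority", 95),
]
print(select_top_down(candidates, 2))  # ['d', 'a']
```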

Constant Ratio
- Basis: choose so that each group’s selection ratio equals its success ratio
- Select the best candidates, but boost minority group members’ scores so that selection probability equals success probability

Constant Ratio
- Adjust minority groups’ test scores upward by half the mean difference between groups
- Leads to somewhat lower average performance on the criterion
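The adjustment rule on this slide can be sketched directly: compute the gap between the majority and minority mean scores, add half of it to every minority score, then select top-down. The half-difference rule is taken from the slide; the `(name, group, score)` layout, the group labels, and the function name are hypothetical, and the example scores are invented.

```python
def constant_ratio_select(candidates, n_places, minority="minority"):
    """Boost minority scores by half the majority-minority mean difference,
    then select top-down.

    candidates: list of (name, group, score) tuples (hypothetical layout);
    any group other than `minority` is treated as the majority.
    """
    majority_scores = [s for _, g, s in candidates if g != minority]
    minority_scores = [s for _, g, s in candidates if g == minority]
    mean_gap = (sum(majority_scores) / len(majority_scores)
                - sum(minority_scores) / len(minority_scores))
    # The slide describes an upward adjustment, so never lower anyone's score.
    boost = max(0.0, mean_gap / 2)
    adjusted = [
        (name, score + boost if group == minority else score)
        for name, group, score in candidates
    ]
    adjusted.sort(key=lambda c: c[1], reverse=True)
    return [name for name, _ in adjusted[:n_places]]


candidates = [
    ("a", "majority", 100), ("b", "majority", 86),
    ("c", "minority", 84), ("d", "minority", 70),
]
print(constant_ratio_select(candidates, 2))  # ['a', 'c']
```

Here the group means are 93 and 77, so each minority score rises by 8; candidate c (84 + 8 = 92) now outranks b (86), whereas plain top-down selection would have chosen a and b.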

Cole/Darlington
- Basis: if there is special value in selecting minority group members, then a minority score of Y on the criterion is equivalent to a majority score of Y + k
- Separate regression equations are used for the different groups, and the adjustment is made
- Leads to lower average performance on the criterion

Cole/Darlington
If value is placed on selecting minority group members, and the intercept is lower for that group, then minority test score X₁ and majority test score X₂ are considered equal.
[Figure: regression lines for the two groups, with test scores X₁ and X₂ and the criterion offset k marked]
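One way to read the rule on these two slides in code: predict each applicant’s criterion score from their own group’s regression line, credit minority applicants with an extra k criterion units, then rank. This is a simplified illustration of the slides’ description, not Cole’s or Darlington’s full formulation; the line coefficients, the value of k, the record layout, and the function name are all hypothetical.

```python
def cole_darlington_select(candidates, lines, k, n_places, minority="minority"):
    """Rank on group-specific predicted criterion, crediting minority
    candidates with an extra k criterion units.

    candidates: list of (name, group, test_score) tuples;
    lines: {group: (slope, intercept)} fitted separately per group;
    k: the value placed on selecting minority candidates, in criterion units.
    All three are hypothetical inputs for illustration.
    """
    scored = []
    for name, group, test_score in candidates:
        slope, intercept = lines[group]
        predicted = slope * test_score + intercept  # group's own equation
        if group == minority:
            predicted += k  # minority Y counts like majority Y + k
        scored.append((name, predicted))
    scored.sort(key=lambda c: c[1], reverse=True)
    return [name for name, _ in scored[:n_places]]


lines = {"majority": (0.5, 1.0), "minority": (0.5, 0.5)}
candidates = [("a", "majority", 4), ("b", "minority", 5), ("c", "majority", 3)]
print(cole_darlington_select(candidates, lines, k=0.5, n_places=2))  # ['b', 'a']
```

In this toy case the minority line sits 0.5 below the majority line, so with k = 0 candidates a and b tie at a predicted 3.0; setting k = 0.5 moves b ahead.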

Quota
- Basis: the idea that all groups should have equal outcomes
- Selection based on a different regression equation for each group
- Produces lower average performance on the criterion

Quota
- If 10% of the population is Asian, then 10% of the student body should be Asian
- Another way to look at this: if 10% of the population is Jewish, then no more than 10% of professors should be Jewish
- This puts the quota idea in a different light
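A minimal sketch of quota selection: each group receives places in proportion to its population share, and candidates compete only within their own group. Within one group, ranking by that group’s own regression prediction (with a positive slope) gives the same order as ranking by raw score, so the sketch ranks on score directly. The `(name, group, score)` layout, the `shares` mapping, and the function name are hypothetical, and the scores are invented.

```python
def quota_select(candidates, shares, n_places):
    """Give each group places in proportion to its population share, then
    select top-down within each group.

    candidates: list of (name, group, score) tuples; shares: {group: share}.
    Both layouts are hypothetical. round() can make the quotas sum to more or
    less than n_places; a real scheme would need an explicit rule for that.
    """
    selected = []
    for group, share in shares.items():
        quota = round(share * n_places)
        members = sorted(
            (c for c in candidates if c[1] == group),
            key=lambda c: c[2],
            reverse=True,
        )
        selected.extend(name for name, _group, _score in members[:quota])
    return selected


candidates = [
    ("a1", "A", 90), ("a2", "A", 80),
    ("b1", "B", 70), ("b2", "B", 60),
]
print(quota_select(candidates, {"A": 0.5, "B": 0.5}, 2))  # ['a1', 'b1']
```

Plain top-down selection would admit a1 and a2; the quota admits the best candidate from each group instead.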