Assessing Intelligence
The Origins of Intelligence Testing
Western attempts to assess differences in intelligence started over a century ago. Francis Galton (1822-1911), cousin to Charles Darwin, wanted to measure the “natural ability” of people in order to encourage those of high ability to mate with each other (I guess survival of the fittest runs in the family). He assessed more than 10,000 people at the 1884 London Exposition, but his results were inconclusive: “highly regarded” individuals performed no better than their less exceptional peers. Even so, Galton gave us some statistical techniques we still use today, as well as the terms “nature” and “nurture”.
Binet and Mental Age
At the turn of the last century, France began to make public education mandatory for children. However, officials soon realized that some children couldn’t benefit from the regular system and needed special education. Because they were unsure whether a teacher could tell if a child’s needs were based on mental ability or some other social reason, they hired Alfred Binet (1857-1911) to study the problem. Binet and his collaborator, Theodore Simon, began by assuming that all children follow the same course of intellectual development, but at different speeds. A “dull” child may have a mind more similar to a younger child’s, and a “bright” child may have a mind more like an older child’s. Thus their goal became measuring each child’s mental age, the chronological age that most typically corresponds to a given level of performance. For example, if a child performs as well on a task as the average 8-year-old, that child has a mental age of 8. Binet did not think his test measured “inborn” intelligence; he thought it only showed who needed additional help, perhaps to improve things like attention and discipline.
The Innate IQ
Binet feared that his test might be used to label children and limit their opportunities, and was proven right shortly after his death. Lewis Terman (1877-1956), a professor at Stanford University, adapted Binet’s test to work better with California children. His revised version became known as the Stanford-Binet, which it is still called today. For Terman, intelligence tests revealed the intelligence with which a person was born.
Intelligence Quotient
From the Binet tests, German psychologist William Stern was able to derive what’s called the Intelligence Quotient, or IQ. Originally, IQ was just mental age divided by chronological age, times 100. That works for children, but when 50-year-olds who “thought” like 25-year-olds got scores of 50, the system was changed. Today, an IQ score expresses a test-taker’s performance relative to the average performance of others the same age, with that average set at 100. Terman was a proponent of eugenics, a movement that proposed measuring human traits and using the results to encourage the “breeding” of only the smart and fit. Using tests of this kind, the U.S. Government studied the “intelligence” of World War I recruits and incoming immigrants, and the results were used to justify discrimination against many immigrants from Southern and Eastern Europe. Even Terman eventually realized the tests were faulty.
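The original quotient (mental age divided by chronological age, times 100) is easy to check by hand. Here is a minimal sketch in Python; the function name `ratio_iq` is made up for these notes and is not part of any real testing software.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Original ratio IQ: mental age / chronological age * 100.

    Hypothetical helper for illustration only.
    """
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# An 8-year-old who performs like the average 8-year-old
print(ratio_iq(8, 8))    # 100.0
# An 8-year-old who performs like the average 10-year-old
print(ratio_iq(10, 8))   # 125.0
# The adult problem from the notes: a 50-year-old with a "mental age" of 25
print(ratio_iq(25, 50))  # 50.0
```

The last line shows why the ratio breaks down for adults: the score is cut in half even though nothing is wrong with the person.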
Modern Tests of Mental Abilities
As a student and adolescent, you’ve taken tons of tests: school tests, course exams, driving tests, intelligence tests, etc. Psychologists classify these as either achievement tests, which assess what a person has learned, or aptitude tests, which predict a person’s future performance (aptitude is the capacity to learn). The tests you take in this class (including the future AP Exam) are achievement tests. The SAT and ACT are aptitude tests, as they are used to predict how successful you will be in college. In fact, many consider the SAT to be a “thinly veiled intelligence test”.
Wechsler Adult Intelligence Scale (WAIS)
Psychologist David Wechsler (1896-1981) created what is now the most widely used individual intelligence test, the Wechsler Adult Intelligence Scale (WAIS). He also created versions for school-age and preschool-age children. The most recent version contains 15 subtests, including sections like:
Similarities: “In what way are cotton and wool similar?”
Vocabulary: Naming pictured objects or defining words (“What is a guitar?”)
Block design: Using colored blocks to make abstract pictures
and others.
This test yields not only an overall intelligence score, but also separate scores for areas like verbal comprehension and working memory.
Principles of Test Construction
To be widely accepted, psychological tests must be standardized, reliable, and valid. The Stanford-Binet and Wechsler tests meet these criteria. The number of questions you answer correctly on an intelligence test tells us very little on its own. To evaluate performance, we must compare a score to a baseline of pretested people. We then determine your position relative to others. This process of defining meaningful scores relative to the pretested group is called standardization.
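To make standardization concrete, here is a rough sketch of turning a raw score into a score that only has meaning relative to a pretested group. The `norm_group` numbers are invented, the function name `standardize` is hypothetical, and the reporting scale (mean 100, standard deviation 15) is an assumption that these notes don't state. Real test publishers use much larger samples and more careful procedures.

```python
from statistics import mean, pstdev


def standardize(raw_score, norm_scores, scale_mean=100.0, scale_sd=15.0):
    """Place a raw score relative to a pretested (norm) group.

    Hypothetical helper. scale_mean and scale_sd are assumed
    reporting conventions, not values taken from the notes.
    """
    mu = mean(norm_scores)
    sigma = pstdev(norm_scores)  # spread of the pretested group
    if sigma == 0:
        raise ValueError("norm group has no spread; cannot standardize")
    z = (raw_score - mu) / sigma  # distance from the group average, in SDs
    return scale_mean + scale_sd * z


# Invented raw scores (number of correct answers) from a tiny pretested group
norm_group = [30, 35, 40, 45, 50]

print(standardize(40, norm_group))            # 100.0 (exactly average)
print(round(standardize(50, norm_group), 1))  # 121.2 (well above average)
```

The raw score of 40 means nothing by itself; it becomes "average" only because the pretested group also averaged 40.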
The Normal Curve
Group members’ scores are typically distributed in a bell-shaped pattern that forms a normal curve, the symmetrical bell-shaped curve that describes the distribution of many physical and psychological attributes. Most scores are near the average (middle), with fewer and fewer scores as you approach either extreme. On an intelligence test, we call the midpoint, the average score, 100. For both the IQ tests we’ve discussed, a person’s score shows whether that test-taker’s performance fell below or above the average. To keep the average at 100, the tests are occasionally restandardized. The trend shows that from the early 20th century to now, average performance on intelligence tests has gone up. An “average” test-taker in the 1920s would score only about 76 by today’s standard. This rise in intelligence test performance over time is known as the Flynn effect.
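If you want to see what "most scores are near the average" means in numbers, Python's standard library can do the normal-curve math. This sketch assumes a mean of 100 (from the notes) and a standard deviation of 15, which is an assumption added here and not stated in the notes. The helper names `percentile` and `share_between` are made up.

```python
from statistics import NormalDist

# Assumed scale: mean 100 (from the notes), SD 15 (assumption, not in the notes)
iq_curve = NormalDist(mu=100, sigma=15)


def percentile(score):
    """Fraction of test-takers expected to score at or below `score`."""
    return iq_curve.cdf(score)


def share_between(low, high):
    """Fraction of test-takers expected to score between low and high."""
    return iq_curve.cdf(high) - iq_curve.cdf(low)


print(percentile(100))                   # 0.5   (half score below the midpoint)
print(round(share_between(85, 115), 3))  # 0.683 (most scores sit near the middle)
print(round(share_between(70, 130), 3))  # 0.954
print(round(percentile(76), 3))          # 0.055 (where a score of 76 falls)
```

Under these assumptions, the Flynn effect example of 76 would land in roughly the bottom 5 to 6 percent of today's scores.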
Reliability & Validity
Reliability: the extent to which a test yields consistent results, as assessed by the consistency of scores on two halves of the test, on alternate forms of the test, or on retesting. The tests we’ve talked about (Stanford-Binet and WAIS) have reliability coefficients of about +.9, which is quite high.
Validity: the extent to which a test measures or predicts what it is supposed to.
Content Validity: the extent to which a test samples the behavior that is of interest.
Predictive Validity: the success with which the test predicts the behavior it is designed to predict.
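A reliability figure like +.9 is a correlation. As a rough sketch of the "two halves of the test" approach, the code below correlates each person's score on the odd-numbered items with their score on the even-numbered items. The scores are invented and `pearson_r` is a hand-written helper for illustration; this is not how any particular test publisher computes its figures.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two sets of scores vary together
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each set of scores varies on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return covariation / sqrt(spread_x * spread_y)


# Invented data: five people's scores on each half of the same test
odd_item_scores = [10, 12, 15, 18, 20]
even_item_scores = [11, 12, 14, 19, 20]

split_half_r = pearson_r(odd_item_scores, even_item_scores)
print(round(split_half_r, 2))  # close to +1, so the two halves agree
```

The same function works for retesting: pass in each person's first score and second score in place of the two halves.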