1
Psychologists use statistics for two things:
Summarize the information from a study or experiment: measures of central tendency (mean, median, mode)
Make judgments and decisions about the data: see if groups differ from each other, and see if two variables are related to each other
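A minimal Python sketch of the three measures of central tendency, using the standard library’s `statistics` module. The scores are hypothetical, chosen so that the mean, median, and mode all come out different.

```python
from statistics import mean, median, mode

# Hypothetical test scores (not from the slides)
scores = [60, 75, 75, 80, 90, 100]

print("Mean:", mean(scores))      # sum of scores / number of scores -> 80
print("Median:", median(scores))  # middle value; average of 75 and 80 here -> 77.5
print("Mode:", mode(scores))      # most frequent score -> 75
```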
2
Measures of Variation: Range and Standard Deviation
Range: the difference between the highest score and the lowest score. Extreme scores can make the range deceptive.
Standard deviation: a measure of how much scores vary around the mean score. Are the scores packed together or dispersed?
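A small sketch of why extreme scores can make the range deceptive. Both score lists are hypothetical; the second swaps one score for a single very low score.

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical test scores; the second list contains one extreme low score
typical_class = [78, 80, 81, 82, 84]
class_with_outlier = [40, 80, 81, 82, 84]

print("Range without outlier:", score_range(typical_class))       # 6
print("Range with one outlier:", score_range(class_with_outlier))  # 44
```

Four of the five scores are identical in both lists, yet the range grows from 6 to 44 because it depends only on the two most extreme values.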
3
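Standard Deviation: Test Scores in Class A and Class B
Score (Class A) | Deviation from the Mean | Squared Deviation | Score (Class B) | Deviation from the Mean | Squared Deviation
72 | -8 | 64 | 60 | -20 | 400
74 | -6 | 36 | 60 | -20 | 400
77 | -3 | 9 | 70 | -10 | 100
79 | -1 | 1 | 70 | -10 | 100
82 | +2 | 4 | 90 | +10 | 100
84 | +4 | 16 | 90 | +10 | 100
85 | +5 | 25 | 100 | +20 | 400
87 | +7 | 49 | 100 | +20 | 400
Sum of all scores: 640 (Class A), 640 (Class B)
Sum of squared deviations: 204 (Class A), 2000 (Class B)
Mean = 640 ÷ 8 = 80 for both classes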
4
Standard Deviation
$$\text{Standard deviation} = \sqrt{\frac{\text{sum of } (\text{deviations})^2}{\text{number of scores}}}$$
Class A: $\sqrt{204 \div 8} = \sqrt{25.5} \approx 5.0$   Class B: $\sqrt{2000 \div 8} = \sqrt{250} \approx 15.8$
So, what does this tell us about each class’s scores on the test? Remember that the mean for both classes is 80.
Goal of standard deviation: calculate how far your data points are from the average. A normal distribution can be defined with just two pieces of information: the mean and the SD.
Standard deviation is a number that tells how the measurements for a group are spread out from the average (mean), or expected value. A low standard deviation means that most of the numbers are very close to the average (the scores are more consistent). A high standard deviation means that the numbers are spread out (the scores are less consistent).
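The slide’s formula can be checked with a short Python sketch. Class A’s scores come straight from the table; Class B’s are read as 60, 60, 70, 70, 90, 90, 100, 100, which is the arrangement consistent with the table’s sums (640 and 2000).

```python
import math

def standard_deviation(scores):
    """SD as defined on the slide: square root of (sum of squared deviations / number of scores)."""
    mean = sum(scores) / len(scores)
    squared_deviations = [(score - mean) ** 2 for score in scores]
    return math.sqrt(sum(squared_deviations) / len(scores))

class_a = [72, 74, 77, 79, 82, 84, 85, 87]
class_b = [60, 60, 70, 70, 90, 90, 100, 100]

for name, scores in [("Class A", class_a), ("Class B", class_b)]:
    mean = sum(scores) / len(scores)
    print(f"{name}: mean = {mean:.0f}, SD = {standard_deviation(scores):.1f}")
# Class A: mean = 80, SD = 5.0
# Class B: mean = 80, SD = 15.8
```

Dividing by the number of scores (N) matches the slide’s formula and Python’s `statistics.pstdev`; `statistics.stdev` divides by N - 1 instead, which gives the sample standard deviation and slightly larger values.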
5
Well, what does all this look like on a graph?
6
Inferential Statistics: using data from a sample to estimate what is true of the larger population it was drawn from, so that decisions can be made about that population’s characteristics (based on probability theory). Basically, inferential statistics allow us to say: “If it worked for this sample, we can estimate it will work for the rest of the population.” For example, in drug testing, if the medication worked for the sample, we estimate it will have the same effect on the rest of the population. There is always a chance of error in any finding, so the hypothesis and results must be tested for significance.
7
Inferential statistics
How do we know whether an observed difference can be generalized to other populations? Inferential statistics.
Representative samples are better than biased samples.
Less-variable observations are more reliable than more-variable ones.
More cases are better than fewer.
Statistical significance: indicates how likely it is that a result occurred by chance (psychologists generally accept no more than a 5% probability that a result is due to chance).
Common tests: t-test, F-test (analysis of variance), chi-square (p. 42 in text)
8
Inferential Statistics
Statistical significance: the difference observed between two groups is probably NOT due to chance; instead, it likely reflects a real difference between the samples. Results are “significant” when the likelihood of a difference being due to chance is less than 5 times out of 100. In other words, if the independent variable had no real effect, a difference this large would appear by chance less than 5% of the time. This is shown numerically as p < .05. Significance matters because a statistically significant result is probably not a fluke or due to chance.
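The “could this be chance?” logic behind p < .05 can be made concrete with an exact permutation test. Note that this is a different technique from the t-test, F-test, and chi-square named earlier; it is used here only because it shows the reasoning directly in plain Python. The treatment and control scores are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def mean_diff(group1, group2):
    """Difference in group means, computed exactly with fractions."""
    return Fraction(sum(group1), len(group1)) - Fraction(sum(group2), len(group2))

def permutation_p_value(group1, group2):
    """Two-sided exact permutation test on the difference in means.

    If the independent variable had no effect, the group labels would be arbitrary,
    so every split of the pooled scores into groups of the same sizes is equally likely.
    The p-value is the share of splits whose difference is at least as large as the observed one.
    """
    pooled = group1 + group2
    observed = abs(mean_diff(group1, group2))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group1)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean_diff(g1, g2)) >= observed:
            extreme += 1
    return extreme / total

# Hypothetical scores for a treatment group and a control group
treatment = [12, 14, 15, 16, 18]
control = [8, 9, 11, 12, 13]

p = permutation_p_value(treatment, control)
print(f"p = {p:.3f}")  # 6 of the 252 possible splits are as extreme -> p = 0.024
print("significant at .05" if p < 0.05 else "not significant at .05")
```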
9
Practice Questions
Descriptive statistics ______, while inferential statistics _______.
indicate the significance of the data; summarize the data
describe data from experiments; describe data from surveys and case studies
are measures of central tendency; are measures of variance
determine if data can be generalized to other populations; summarize data
summarize data; determine if data can be generalized to other populations
10
Practice Questions
In a normal distribution, what percentage of the scores in the distribution falls within one standard deviation on either side of the mean?
34%
40%
50%
68%
95%
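For reference, the share of a normal distribution within k standard deviations of the mean can be computed from the error function, since for a standard normal variable Z, P(-k < Z < k) = erf(k / √2). A minimal sketch using only Python’s `math` module:

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {proportion_within(k):.1%}")
# Within 1 SD of the mean: 68.3%
# Within 2 SD of the mean: 95.4%
# Within 3 SD of the mean: 99.7%
```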
11
Practice Questions
When a distribution of scores is skewed, which of the following is the most representative measure of central tendency?
Inference
Standard deviation
Mean
Median
Correlation coefficient
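A quick sketch of how skew affects the mean but barely touches the median. The incomes are hypothetical, with one very large value creating a positive skew.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands of dollars; one extreme value skews the distribution
incomes = [30, 32, 35, 38, 40, 45, 500]

print(f"Mean:   {mean(incomes):.1f}")  # pulled far upward by the single extreme value -> 102.9
print(f"Median: {median(incomes)}")    # middle value, unaffected by how extreme 500 is -> 38
```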
12
IQ Mini-Lesson: What is intelligence?
13
Intelligence: the ability to learn from experience, solve problems, and use knowledge to adapt to new situations. It is socially constructed and thus can be culturally specific. According to this definition, are both Einstein and Ruth intelligent? Human beings are uniquely intelligent, and the form of that intelligence is unique in each of us.
14
Intelligence Wars
Intelligence: a mental quality consisting of the ability to learn from experience, solve problems, and use knowledge to adapt to new situations
Socially constructed by a culture
Usually referred to as “school smarts”
Several theories of intelligence
Do we have an inborn mental capacity? Can it be quantified with a number?
15
Intelligence Wars
Spearman’s General Intelligence, or g
Gardner’s Multiple Intelligences
Sternberg’s Triarchic Theory of Intelligence
Create a trifold foldable that compares and contrasts historic and contemporary theories of intelligence.
16
Assessing Intelligence
Intelligence is whatever intelligence tests measure.
Intelligence test: a method for assessing an individual’s mental aptitudes and comparing them with those of others, using numerical scores
17
Assessing Intelligence (not in your book)
Francis Galton: 19th-century British mathematician and naturalist who founded the eugenics movement
Eugenics (“well-born”): the practice of encouraging supposedly superior people to reproduce, while discouraging or even preventing those judged inferior from doing so
Galton believed that intelligence is a product of how quickly and accurately people respond to stimuli, so he measured sensory abilities, reaction times, head size, and muscular strength. His tests did not correlate with each other, and they had almost no relation to accepted criteria of intellectual functioning.
18
Assessing Intelligence
Alfred Binet, Lewis Terman, David Wechsler
Create a trifold foldable to identify key contributors to intelligence research and testing, and list characteristics of how psychologists measure intelligence.
Binet: French; “father of modern intelligence tests”; used mental age, a measure of intelligence defined as the chronological age that most typically corresponds to a given level of performance; created the Binet-Simon test (an inexpensive, easily administered, objective measure of intelligence that could identify lower-performing children in need of special education); contended that cognitive development follows the same course in all children, but some learn faster and more easily than others; first measure to be proved valid. What does that mean? It accurately identified lower-performing students.
Terman: “Terman’s Termites”; Stanford; adopted William Stern’s method of comparing mental age and chronological age (IQ = MA ÷ CA × 100), which is a ratio IQ; revised Binet’s test and renamed it the Stanford-Binet; the test is used to place students in specialized programs and is usually given to 7- to 8-year-olds; scores are no longer represented as a ratio (age-group norms, or deviation IQ, are used instead)
Wechsler: responsible for developing the deviation IQ score; today the most widely used set of intelligence tests is the Wechsler Intelligence Scales (e.g., WAIS, WPPSI), which contain both verbal and performance subscales
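Stern’s ratio IQ formula (IQ = MA ÷ CA × 100), as adopted by Terman, can be expressed as a one-line Python function. The ages below are hypothetical examples.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ = mental age / chronological age x 100."""
    return mental_age * 100 / chronological_age

print(ratio_iq(10, 8))  # an 8-year-old performing like a typical 10-year-old -> 125.0
print(ratio_iq(8, 10))  # a 10-year-old performing like a typical 8-year-old -> 80.0
print(ratio_iq(9, 9))   # mental age equals chronological age -> 100.0
```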
19
Modern Tests
Aptitude tests: designed to predict a person’s future performance (aptitude is the capacity to learn). Example: the ACT (American College Test) seeks to predict your ability to do well in college.
Achievement tests: designed to assess what a person has learned. Example: an EOC (End of Course exam) seeks to assess what you learned in the course.
20
Principles of Test Construction: Standardization
Standardization: defining meaningful scores by comparison with the performance of a pretested standardization group, while also using uniform instructions for administering the test
Tests need to be restandardized regularly to properly assess different generations, because of the Flynn effect (intelligence scores have been rising over time).
3 characteristics: standardized administration, standardized scoring, and comparison against norms
To keep the average score near 100, the Stanford-Binet and Wechsler scales are periodically restandardized. If you recently took the WAIS fourth edition, your scores would be compared with the standardization sample of 2007, not the original sample from the 1930s.
21
Principles of Test Construction: Standardization
When a test is standardized, the results when graphed typically form a normal curve: a symmetrical, bell-shaped curve that describes the distribution of many physical and psychological attributes. Most scores fall near the average, and fewer and fewer scores lie near the extremes.
On an intelligence test, the average score is 100.
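A minimal sketch of how a deviation IQ places a raw score on the normal curve by comparing it with the standardization group. Two assumptions are labeled: an IQ standard deviation of 15 (the convention used by the Wechsler scales), and a hypothetical norm group whose raw scores have a mean of 50 and an SD of 8.

```python
from statistics import NormalDist

IQ_MEAN = 100
IQ_SD = 15  # assumption: the SD convention used by the Wechsler scales

def deviation_iq(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a deviation IQ relative to the standardization (norm) group."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm-group average, in SDs
    return IQ_MEAN + IQ_SD * z

def percentile(iq):
    """Proportion of the normal curve of IQ scores that falls below this IQ."""
    return NormalDist(IQ_MEAN, IQ_SD).cdf(iq)

# Hypothetical norm group: mean raw score 50, SD 8
iq = deviation_iq(62, norm_mean=50, norm_sd=8)
print(f"IQ = {iq}, higher than about {percentile(iq):.0%} of the norm group")
# IQ = 122.5, higher than about 93% of the norm group
```

Unlike the ratio IQ, this score never uses chronological age directly; it only says where a person falls relative to the norms for their age group, which is why restandardization shifts what a given raw score is worth.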
22
Principles of Test Construction: Reliability
The extent to which a test yields consistent results. Measured by comparing two test halves, alternate forms, or retesting. People should generally score the same when the test is taken multiple times.
Internal consistency: whether similar questions about the same learning goal produce similar results, showing that those items assess the same construct.
Test-retest reliability: a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated to evaluate the test for stability over time. For example, a test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient would indicate the stability of the scores. The Stanford-Binet, WAIS, and WISC all have reliabilities of about +.9, which is very high: when retested, people’s scores generally match their first scores closely.
Split-half reliability: a subtype of internal consistency reliability. To obtain it, all items of a test that are intended to probe the same area of knowledge (e.g., WWII) are “split in half” to form two “sets” of items. The entire test is administered to a group of individuals, the total score for each “set” is computed, and the split-half reliability is the correlation between the two total “set” scores.
Parallel forms reliability: a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated to evaluate the consistency of results across alternate versions.
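Test-retest, split-half, and parallel forms reliability all come down to a correlation coefficient between two sets of scores. A minimal sketch of the Pearson correlation applied to hypothetical test-retest data for six students:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical scores for six students who took the same test a week apart
time_1 = [85, 92, 78, 88, 95, 70]
time_2 = [83, 94, 80, 86, 96, 72]

r = pearson_r(time_1, time_2)
print(f"Test-retest reliability: r = {r:.2f}")
```

The same function works for split-half reliability (correlate the two “set” totals) and parallel forms reliability (correlate the two versions). On Python 3.10 or later, `statistics.correlation` computes the same Pearson coefficient.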
Internal consistency: for example, if a respondent expressed agreement with the statements "I like to ride bicycles" and "I've enjoyed riding bicycles in the past", and disagreement with the statement "I hate bicycles", this would be indicative of good internal consistency of the test.
23
Principles of Test Construction: Validity
The extent to which a test measures or predicts what it is supposed to
Content validity: the extent to which a test samples the behavior that is of interest. Example: a driving test assesses driving tasks.
Predictive validity: the success with which a test predicts the behavior it is designed to predict
Criterion: the behavior a test is designed to predict. Example: the ACT is designed to predict future college performance, which is the criterion.
24
Practice Questions Which type of correlation would show the best reliability in a test-retest situation?
25
Practice Questions
A test developer defines uniform testing procedures and meaningful scores by comparison with the performance of a pretested group. Which of the following best describes this process?
Reliability testing
Validation
Content validation
Standardization
Predictive validity
26
Practice Questions The Flynn effect refers to the
Superiority of certain racial and ethnic groups on intelligence tests
Extreme scores (very high and very low) that are more common for males than females on math tests
Stereotype threat that may cause some African-American students to underperform on standardized tests
Predictive ability of intelligence tests
Gradual improvement in intelligence test scores over the last several decades
27
Practice Questions
Students who do well on college entrance exams generally do well in their first year of college. This helps establish that these exams have
Predictive validity
Split-half reliability
Content validity
Test-retest reliability
Standard validity