Measurement, Data Collection, Validity & Reliability Data is your friend.

1 Measurement, Data Collection, Validity & Reliability Data is your friend

2 Agenda: Measurement; Measures (aka ways to collect data); Validity/reliability, up close and personal

3 Educational Measurement Measurement: the assignment of numbers to differentiate the values of a variable. GOOD RESEARCH MUST HAVE SOUND MEASUREMENT!!

4 Thought Question Consider the following scores on a test: Marco 90, Adriane 85, Linda 75, Christy 99, Chantelle 88, Jay 45, Remi 68, Marcus 97, Chi Bo 92, Donnie 85. Which measure of central tendency would Adriane use when telling her parents about her performance?
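One way to work the thought question is to compute all three measures of central tendency and see where Adriane's 85 falls. This is a minimal Python sketch added for illustration (not part of the original slides); it uses only the standard-library statistics module.

```python
from statistics import mean, median, mode

# Test scores from the thought question above
scores = {
    "Marco": 90, "Adriane": 85, "Linda": 75, "Christy": 99, "Chantelle": 88,
    "Jay": 45, "Remi": 68, "Marcus": 97, "Chi Bo": 92, "Donnie": 85,
}
values = list(scores.values())

summary = {"mean": mean(values), "median": median(values), "mode": mode(values)}
print(summary)  # {'mean': 82.4, 'median': 86.5, 'mode': 85}

# Where does Adriane's score sit relative to each measure?
adriane = scores["Adriane"]
for name, value in summary.items():
    if adriane > value:
        position = "above"
    elif adriane < value:
        position = "below"
    else:
        position = "equal to"
    print(f"Adriane's {adriane} is {position} the {name} ({value})")
```

Her 85 is above the mean, below the median, and equal to the mode, so the mean is the measure that makes her performance look best.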

5 Descriptive Statistics Statistics: procedures that summarize and analyze quantitative data. Descriptive statistics: statistical procedures that summarize a set of numbers in terms of central tendency or variation. Important for understanding what the data tells the researcher.

6 Descriptive Statistics: A Caution Statistics can provide us with useful information, but they can be interpreted in different ways to say different things

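7 Thought Question If Jay scored an 85 instead of a 45, what changes? Highly deviant scores (called "outliers") have no more effect on the median than scores very close to the middle: the median depends only on which side of the middle a score falls, not on how far away it is. However, outliers can greatly affect the mean, because every score's actual value enters the sum.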
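The effect can be checked directly by recomputing both measures with Jay's score changed. A minimal Python sketch for illustration only, using the scores from the earlier thought question:

```python
from statistics import mean, median

original = [90, 85, 75, 99, 88, 45, 68, 97, 92, 85]
revised = original.copy()
revised[5] = 85  # Jay's score changed from 45 to 85

print("mean:  ", mean(original), "->", mean(revised))      # 82.4 -> 86.4
print("median:", median(original), "->", median(revised))  # 86.5 -> 86.5
```

The mean rises by 4 points because Jay's 40-point change is spread across 10 scores, while the median stays at 86.5 because Jay's score is below the middle either way.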

8 Descriptive Statistics Frequency distributions (see Figure 6.2): Normal - scores distributed symmetrically around the middle; Positively skewed - a large number of low scores and a small number of high scores, with the mean pulled toward the high (positive) end, above the median; Negatively skewed - a large number of high scores and a small number of low scores, with the mean pulled toward the low (negative) end, below the median
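The direction of skew can also be expressed as a number. The sketch below uses the moment coefficient of skewness (the average cubed z-score), which is a standard measure but is not named on the slide; the 0.5 cutoff for calling a distribution "roughly symmetric" is an assumed rule of thumb, and the scores are hypothetical.

```python
from statistics import mean, median, pstdev

def skewness(data):
    """Moment coefficient of skewness: the average cubed z-score (population form)."""
    m, sd = mean(data), pstdev(data)
    if sd == 0:
        return 0.0  # all scores identical: no spread, so no skew
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

def describe_shape(data, cutoff=0.5):
    """Label the shape; the 0.5 cutoff is an assumed rule of thumb, not from the slides."""
    g = skewness(data)
    if g > cutoff:
        return "positively skewed"
    if g < -cutoff:
        return "negatively skewed"
    return "roughly symmetric"

# Hypothetical scores: many low scores and a couple of very high ones
low_heavy = [60, 62, 63, 65, 65, 66, 68, 70, 95, 99]
print(describe_shape(low_heavy), mean(low_heavy), median(low_heavy))
# positively skewed 71.3 65.5  (mean pulled above the median)
```

Cubing the z-scores keeps their sign, so a long tail of high scores produces a positive value and a long tail of low scores a negative one.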

9 Normal Distribution

10 An Extreme Example Consider the salaries of 10 people. Group A – All are teachers. Salaries: $45,000, $45,000, $45,000, $50,000, $50,000, $50,000, $50,000, $55,000, $55,000, $55,000

11 An Extreme Example Consider the salaries of 10 people. Group B – Nine are teachers; 1 is Donovan McNabb. Salaries: $45,000, $45,000, $45,000, $50,000, $50,000, $50,000, $50,000, $55,000, $55,000, $6,300,000

12 An Extreme Example What happens to the mean and median in these 2 examples? Does either change? What happens to the shape of the distribution (is it still normal)?
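Computing both measures for the two salary groups makes the contrast concrete. This is an illustrative Python sketch using the salaries listed on the previous two slides:

```python
from statistics import mean, median

group_a = [45_000] * 3 + [50_000] * 4 + [55_000] * 3                 # ten teachers
group_b = [45_000] * 3 + [50_000] * 4 + [55_000] * 2 + [6_300_000]   # nine teachers plus one outlier

for label, salaries in [("Group A", group_a), ("Group B", group_b)]:
    print(f"{label}: mean = ${mean(salaries):,.0f}, median = ${median(salaries):,.0f}")
# Group A: mean = $50,000, median = $50,000
# Group B: mean = $674,500, median = $50,000
```

A single extreme salary multiplies the mean more than thirteenfold while leaving the median untouched, and the long right tail makes Group B positively skewed.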

13 Positive Skew

14 Negative Skew

15 Case in Point: Teacher Salary Compare Radnor to Philadelphia. Is the salary distribution for Philadelphia going to be positively or negatively skewed? (Hint: Look at the number of years of experience.)

16 Descriptive Statistics Variability: How different are the scores? Types: Range - the difference between the highest and lowest scores; Standard deviation - roughly, the typical distance of the scores from the mean (the square root of the average squared distance from the mean). Relationship to the normal distribution: about 68% of all scores fall within ±1 SD of the mean, and about 95% fall within ±2 SD.
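Both measures of variability, and the 68%/95% figures, can be computed with Python's standard library. This sketch reuses the test scores from the earlier thought question; it shows both the population SD (divide by N) and the sample SD (divide by N - 1), since textbooks differ on which one they report.

```python
from statistics import NormalDist, pstdev, stdev

scores = [90, 85, 75, 99, 88, 45, 68, 97, 92, 85]

score_range = max(scores) - min(scores)  # 99 - 45
sd_population = pstdev(scores)           # divides by N
sd_sample = stdev(scores)                # divides by N - 1

# Share of a normal distribution lying within ±1 and ±2 SD of the mean
z = NormalDist()  # standard normal: mean 0, SD 1
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)

print(score_range, round(sd_population, 2), round(sd_sample, 2))  # 54 15.31 16.14
print(f"±1 SD: {within_1:.1%}, ±2 SD: {within_2:.1%}")            # ±1 SD: 68.3%, ±2 SD: 95.4%
```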

17 Variability

18 Standard Deviation

19 Variability Why does variability matter?

20 Descriptive Statistics Relationship: how two sets of scores relate to one another. Correlation (shown for positive values): Low = .10 to .39; Moderate = .40 to .69; High = .70 and above
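The sketch below computes a Pearson correlation coefficient and labels it with the cutoffs from this slide. Two assumptions are added for illustration: the same cutoffs are applied to the absolute value of r (so negative correlations are labeled by strength too), and anything below .10 is called "negligible". The study-hours data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2 pairs)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def strength(r):
    """Label |r| with the slide's cutoffs; 'negligible' below .10 is an assumption."""
    size = abs(r)
    if size >= 0.70:
        return "high"
    if size >= 0.40:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "negligible"

# Hypothetical data: hours studied vs. test score for five students
hours = [1, 2, 3, 4, 5]
score = [62, 70, 71, 80, 88]
r = pearson_r(hours, score)
print(round(r, 2), strength(r))  # 0.98 high
```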

21 Example of Correlation

22 Measures of Data Collection: Tests, Questionnaires, Observations, Interviews

23 Measures (Means of Data Collection) You must match the instrument to the research question!

24 Questionnaires http://www.authentichappiness.sas.upenn.edu/ Thoughts on those you responded to: Approaches to Happiness, Optimism, Grit

25 Examples to critique Measures: Questionnaire – Psychological School Membership Survey used with middle school students; Interview protocol – for teachers & counselors regarding professional development issues; Observation instrument – PDE 430 for student teachers. What are 2 benefits and 2 limitations of this measure?

26 Questionnaires Used to obtain a subject’s perceptions, attitudes, beliefs, values, opinions, or other non-cognitive traits. Scales - a continuum that describes a subject’s responses to a statement: Likert, Checklists, Ranked items

27 Questionnaires Likert scales: response options require subjects to indicate the extent to which they agree with a statement. Debate over an odd vs. even number of response options. Statements must reflect extreme positive or extreme negative positions. Example – CATS evaluations

28 Questionnaires Checklists: choose options. Ranked items: put options in sequential order; avoids marking everything high or low.

29 Questionnaires Problems with measuring non-cognitive traits: Difficulty clearly defining what is being measured (e.g., self-concept or self-esteem); Response set - responding the same way to every item (ex - all 4’s on CATS); Social desirability - a “PC filter”; Faking - agreeing with statements because of the negative consequences associated with disagreeing

30 Questionnaires Controlling problems: Equal numbers of positively and negatively worded statements; Alternating positive and negative statements; Providing confidentiality or anonymity to respondents
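Mixing positively and negatively worded statements only helps if the negative items are reverse-coded before scoring, and it makes a response set easy to spot. The sketch below is a hypothetical example: it assumes a 1 to 5 Likert scale and that items 2 and 4 (indexes 1 and 3) are the negatively worded ones.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point scale: 1 = strongly disagree, 5 = strongly agree

# Assumption for this example: zero-based indexes of the negatively worded statements
NEGATIVE_ITEMS = {1, 3}

def reverse_code(response):
    """Flip a response so that a high score always means a more positive attitude."""
    return SCALE_MIN + SCALE_MAX - response

def score_respondent(responses):
    """Total score after reverse-coding the negatively worded items."""
    return sum(
        reverse_code(r) if i in NEGATIVE_ITEMS else r
        for i, r in enumerate(responses)
    )

def looks_like_response_set(responses):
    """Flag identical answers to every item (e.g., all 4's), a possible response set."""
    return len(set(responses)) == 1

engaged = [5, 1, 4, 2]        # agrees with positive items, disagrees with negative ones
straight_line = [4, 4, 4, 4]  # same answer everywhere

print(score_respondent(engaged), looks_like_response_set(engaged))              # 18 False
print(score_respondent(straight_line), looks_like_response_set(straight_line))  # 12 True
```

Because half the items are reverse-coded, answering "4" to everything no longer inflates the total, and the uniform pattern can be flagged for follow-up.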

31 Designing Questionnaires Online resources: http://pareonline.net/getvn.asp?v=5&n=3 http://www.peecworks.org/PEEC/PEEC_Inst/I0004E536 http://www.statpac.com/surveys/

32 Observations Observations - direct observations of behaviors. Provide a first-hand account (ameliorates issues of self-reporting in questionnaires). Natural or controlled settings (ex – classroom vs. lab, as in child attachment studies). Structured or unstructured observations (ex – frequency counts vs. narrative record). Detached or involved observers.

33 Observations Inference: Low inference - involves little if any inference on the observer’s part (ex – on-task/off-task behavior instrument); High inference - involves high levels of inference on the observer’s part (ex – teacher effectiveness, PDE form 430)

34 Observations Controlling observer effects: Observer bias - controlled through training, inter-rater reliability checks (e.g., percent agreement or Cohen’s kappa), and multiple observers; Contamination (knowledge of the study influences the observation) - controlled through training, targeting specific behaviors, keeping observers unaware of the expected outcomes, and keeping observers “blind” to which group is which

35 Observations Observer effects: Halo effect - initial ratings influence subsequent ratings; Hawthorne effect - increased performance results from awareness of being part of a study; Leniency - rating generously because of wanting everyone to do well; Central tendency - rating everyone near the middle of the scale; Observer drift - gradually departing from the defined observation criteria over time, so pertinent information goes unrecorded

36 Interviews What are some challenges to doing this kind of interviewing? http://www.youtube.com/watch?v=d6bXH2k9MKE

37 Interviews Advantages: Establish rapport & enhance motivation; Clarify responses through additional questioning; Capture the depth and richness of responses; Allow for flexibility; Reduce “no response” and/or “neutral” responses

38 Interviews Disadvantages: Time consuming; Expensive; Small samples; Subjective – interviewer characteristics, contamination, bias

39 Validity and Reliability What’s all the fuss about?

40 Validity/Reliability and Trustworthiness Why do we need validity and reliability in quantitative studies and “trustworthiness” in qualitative studies? We can’t trust the results if we can’t trust the methods!

41 Reader’s Digest version… Reliability: the extent to which scores are free from error; error is measured by consistency. Validity: the extent to which inferences are appropriate, meaningful, and useful. “Does the instrument measure what it is supposed to measure?”

42 Thought Question On the ACT and SAT assessments, there is a definitive script that test administrators are required to follow exactly. What measurement issue are the test makers addressing?

43 Reliability of Measurement Reliability - the extent to which measures are free from error. Error is measured by consistency.

44 Reliability of Measurement Reliability coefficients: 0.00 indicates no reliability or consistency; 1.00 indicates total reliability or consistency; below .60 = weak reliability; above .80 = sufficient reliability

45 Reliability of Measurement Types of reliability evidence: Stability (i.e., test-retest) - testing the same subject using the same test on two occasions; limitation - carryover effects from the first to the second administration of the test. Equivalence (i.e., parallel forms) - testing the same subject with two parallel (i.e., equal) forms of the same test taken at the same time; limitation - difficulty in creating parallel forms.

46 Reliability of Measurement Equivalence and stability - testing the same subject with two forms of the same test taken at different times; limitation - difficulty in creating parallel forms.

47 Reliability of Measurement Internal consistency - testing the same subject with one test and “artificially” splitting the test into two halves; limitation - must have a minimum of ten (10) questions. Often see “Cronbach’s alpha” reported as the reliability coefficient (ex – Learning styles).
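Both internal-consistency estimates can be computed from a respondents-by-items score table. The sketch below is illustrative, with hypothetical data: it splits the test into odd and even items, correlates the half scores, and applies the Spearman-Brown correction (a standard step for estimating full-length reliability that the slide does not mention), then computes Cronbach's alpha. Only six items are used to keep the arithmetic visible; the slide recommends at least ten for split-half.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def split_half_reliability(item_scores):
    """Odd-even split, then Spearman-Brown correction up to the full test length."""
    odd = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores[0])
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    total_var = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows are respondents, columns are six items scored 1-5
responses = [
    [4, 5, 4, 5, 4, 4],
    [2, 3, 2, 2, 3, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 2, 1, 2, 1],
]
print(round(split_half_reliability(responses), 2), round(cronbach_alpha(responses), 2))  # 0.97 0.97
```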

48 Reliability of Measurement Agreement / Inter-rater reliability: used with observational measures; multiple observers coding similarly

49 Reliability of Measurement Enhancing reliability: Standardized administration procedures (e.g., directions, conditions); Appropriate reading level; Reasonable length of the testing period; Counterbalancing the order of testing if several tests are being given

50 Validity of Measurement Validity: the extent to which inferences are appropriate, meaningful, and useful Current example – content tests and teacher licensure

51 Validity of Measurement For research results to have any value, the measurement of a variable must be valid. Use of established and “new” instruments and the implications for establishing validity. Importance of establishing validity prior to data collection (e.g., pilot tests).

52 Validity Types of validity evidence: Content; Predictive (criterion-related); Concurrent (criterion-related); Construct

53 Thought Question Criticisms of standardized tests like the SAT claim that they discriminate against particular groups of students (especially minorities) and do not represent a broad enough domain of knowledge to adequately assess a student’s academic potential. What issue of validity is operating in these arguments?

54 Thought Question Other arguments against the SAT state that the tests do not adequately estimate an individual’s ability to succeed in college. What issue of validity is operating here?

55 Reliability & Validity of Measurement What is the relationship of reliability to validity? If a watch consistently gives the time as 1:10 when it is actually 1:00, it is ____ but not ____. ______ is a necessary but not sufficient condition for _______. To be _____, an instrument must be ______, but a ____ instrument is not necessarily _____.

56 Reliability & Validity of Measurement What is the relationship of reliability to validity? If a watch consistently gives the time as 1:10 when it is actually 1:00, it is reliable but not valid. Reliability is a necessary but not sufficient condition for validity. To be valid, an instrument must be reliable, but a reliable instrument is not necessarily valid.
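The watch analogy can be simulated with repeated readings taken when the true time is 1:00. In this hypothetical sketch, "reliable" means the readings barely vary (standard deviation within a tolerance) and "valid" means every reading is within that tolerance of the true time; the 1-minute tolerance and the readings themselves are assumptions.

```python
from statistics import pstdev

TRUE_TIME = 60   # 1:00, written as minutes past noon
TOLERANCE = 1.0  # assumed: within 1 minute counts as "close enough"

def check(readings):
    """Return (reliable, valid) for repeated readings taken when the true time is 1:00."""
    reliable = pstdev(readings) <= TOLERANCE                        # consistent with itself
    valid = max(abs(r - TRUE_TIME) for r in readings) <= TOLERANCE  # every reading is correct
    return reliable, valid

watches = {
    "always shows 1:10": [70, 70, 70, 70, 70],
    "accurate": [60, 60, 60, 60, 60],
    "erratic": [52, 66, 58, 63, 61],
}
for name, readings in watches.items():
    reliable, valid = check(readings)
    print(f"{name}: reliable={reliable}, valid={valid}")
# always shows 1:10: reliable=True, valid=False
# accurate: reliable=True, valid=True
# erratic: reliable=False, valid=False
```

With these definitions, readings that all lie within 1 minute of the true time cannot have a standard deviation above 1 minute, so a valid watch is always reliable here, while the 1:10 watch shows that the reverse does not hold.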

57 Midterm Multiple Choice: 50 pts; Short Answer: 25 pts; Article Critique: 25 pts. Bring the article with you to class. It’s OK to have notes on it.

