1 Statistical Tests (Class 5)
http://faculty.vassar.edu/lowry/VassarStats.html

2 For Thursday
Revise Chapter 3 (Method)
Prepare final presentation
For next Monday – final version of proposal by 9 a.m.

3 Parametric Assumptions
Interval data
Normality: scores are normally distributed in each group
Homogeneity of variance: the amount of variability in scores is similar between groups (Levene's test)

4 Independent Samples t-test [see data set]
Used to determine whether differences between two independent group means are statistically significant
Traditionally used with small samples (n ≤ 30 per group), though many researchers have applied the t test to larger groups
Groups do not have to be equal in size
Concerned only with overall group differences, without considering pairs
[A robust statistical technique is one that performs well even if its assumptions are somewhat violated by the true model from which the data were generated. With unequal variances, use an alternative t test (e.g., Welch's) or, better, the Mann-Whitney U]
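The tests named on this slide can be sketched in a few lines of Python. The scores below are made-up illustrations, not the class data set, and SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

# Hypothetical achievement scores for two independent groups (made-up data)
song_group = np.array([78, 85, 90, 72, 88, 95, 70, 84, 91, 77])
control_group = np.array([70, 75, 82, 68, 74, 80, 65, 79, 73, 71])

# Levene's test checks the homogeneity-of-variance assumption
levene_stat, levene_p = stats.levene(song_group, control_group)

# Standard independent-samples t test (assumes equal variances)
t_stat, p_val = stats.ttest_ind(song_group, control_group)

# Welch's t test drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(song_group, control_group, equal_var=False)

# Mann-Whitney U: the nonparametric equivalent
u_stat, p_u = stats.mannwhitneyu(song_group, control_group, alternative="two-sided")

print(f"t = {t_stat:.2f}, p = {p_val:.4f}")
```

The two groups here happen to be the same size, but `ttest_ind` accepts unequal group sizes as well, matching the slide's point that groups do not have to be even.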

5 Correlated (Paired, Dependent) Samples t-test [see data set]
Used to determine whether differences between two means taken from the same group, or from two groups with matched pairs, are statistically significant
e.g., pre-test achievement scores for the whole song group vs. post-test achievement scores for the same group
Group sizes must be equal (scores are paired)
n ≤ 30 for each group
Nonparametric equivalent = Wilcoxon signed-ranks test
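A minimal sketch of the paired test and its nonparametric equivalent, using invented pre/post scores rather than any real class data:

```python
import numpy as np
from scipy import stats

# Hypothetical pre- and post-test scores for the same ten students (made-up data)
pre = np.array([62, 70, 55, 68, 74, 60, 66, 72, 58, 65])
post = np.array([70, 75, 60, 72, 80, 68, 70, 78, 63, 71])

# Paired-samples t test: each pre score is matched to a post score,
# so the two arrays must be the same length
t_stat, p_val = stats.ttest_rel(pre, post)

# Wilcoxon signed-ranks test: the nonparametric equivalent
w_stat, p_w = stats.wilcoxon(pre, post)

print(f"t = {t_stat:.2f}, p = {p_val:.4f}")
```

Because `ttest_rel(pre, post)` tests the mean of pre minus post, an improvement from pre to post shows up as a negative t statistic.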

6 ANOVA (use ACT Explore test data from Day 4)
Analyzes means of two or more groups
Assumes homogeneity of variance
Works with independent or correlated (paired) groups
More rigorous than the t test (considers both between-group and within-group variance); often used today instead of the t test
Produces an F statistic
One-way = 1 independent variable
Two-way/three-way = 2-3 independent variables (one active and one or two attributes)

7 One-Way ANOVA
Calculate a one-way ANOVA for the data set
Difference b/w subject tests for 2007 non-band students? (correlated) Implications?
Difference b/w reading scores for non-band students 2007/2008/2009?
Post hoc tests
Used to find differences b/w groups after a significant overall test. You could compare all pairs w/ individual t tests, but this leads to problems w/ multiple comparisons on the same data
Bonferroni correction: reduce alpha to compensate for multiple comparisons (.05 / number of comparisons)
Tukey HSD [honestly significant difference]: equal sample sizes (though it can be used for unequal sample sizes as well)
Scheffé: unequal sample sizes (though it can be used for equal sample sizes as well)
Nonparametric equivalents = Kruskal-Wallis ANOVA (independent) and Friedman ANOVA (dependent)
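The one-way ANOVA, its nonparametric equivalent, and the Bonferroni follow-up can be sketched as below. The three years of reading scores are invented for illustration, not the ACT Explore data set.

```python
from scipy import stats

# Hypothetical reading scores for non-band students in three years (made-up data)
y2007 = [14, 16, 15, 17, 13, 15, 16]
y2008 = [16, 18, 17, 19, 15, 17, 18]
y2009 = [15, 17, 16, 18, 14, 16, 17]

# One-way ANOVA: the F statistic compares between-group to within-group variance
f_stat, p_val = stats.f_oneway(y2007, y2008, y2009)

# Kruskal-Wallis: nonparametric equivalent for independent groups
h_stat, p_kw = stats.kruskal(y2007, y2008, y2009)

# If the overall test is significant, follow up with pairwise t tests
# under a Bonferroni correction: alpha divided by the number of comparisons
alpha_bonf = 0.05 / 3
for a, b, label in [(y2007, y2008, "2007 vs 2008"),
                    (y2007, y2009, "2007 vs 2009"),
                    (y2008, y2009, "2008 vs 2009")]:
    t, p = stats.ttest_ind(a, b)
    print(label, "significant" if p < alpha_bonf else "not significant")
```

SciPy itself does not bundle Tukey-style post hoc machinery for grouped data beyond these basics, which is why the Bonferroni-corrected pairwise t tests are shown by hand here.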

8 Two-Way ANOVA (2x2) [see data set day 5]

9 Interpreting Results of a 2x2 ANOVA
(Columns) CAI was more effective than traditional methods for both high- and low-achieving students
(Rows) High-achieving students scored significantly higher than low-achieving students, regardless of teaching method
There was no significant interaction between rows and columns
If there were a significant interaction, we would need a post hoc Tukey or Scheffé test to determine where the differences lie

10 ANCOVA – Analysis of Covariance [data set day 5]
Statistical control for unequal groups: adjusts posttest means based on pretest means
[Example – see data set] You are testing a sight-singing method w/ a treatment and a control group. In the pretest, you find that the control group is already better at sight-singing than the treatment group (the groups are not equal). So you use the pretest as a covariate to equalize the groups on the posttest.
[The homogeneity-of-regression assumption (similar pretest-posttest slopes in each group) is met if, within each group, there is a correlation between the posttest and the covariate (pretest), and the correlations are similar b/w groups] (Judgment call)

11 Correlational Research

12 Correlational Research Basics
Relationships among two or more variables are investigated
The researcher does not manipulate the variables
Examines the direction (positive [+] or negative [-]) and degree (how strong) in which two or more variables are related
Uses:
Clarifying and understanding important phenomena (relationships b/w variables, e.g., height and voice range in MS boys)
Explaining human behaviors (class periods per week correlated to practice time)
Predicting likely outcomes (one test predicts another)

13 Correlational Research Basics (cont.)
Particularly beneficial when experimental studies are difficult or impossible to design
Allows for examination of relationships among variables measured in different units (decibels and pitch; retention numbers and test scores; etc.)
DOES NOT indicate causation
Reciprocal effect: causality could run either way (a change in weight may affect body image, or body image may drive a change in weight)
Third (other) variable actually responsible for the relationship (the tendency of smart kids to persist in music, rather than music study itself, may cause the higher SATs of HS music students)

14 Interpreting Correlations
Correlation coefficient (Pearson = r; Spearman = rs)
Can range from -1.00 to +1.00
Direction:
Positive: as X increases, so does Y, and vice versa
Negative: as X decreases, Y increases, and vice versa
Degree or strength (rough verbal indicators): weak, moderate, strong, very strong; 1.0 = perfect
r2 (% of shared variance): the overlap b/w two variables, i.e., the percent of the variation in one variable that is related to the variation in the other
Example: the correlation b/w musical achievement and minutes of instruction is r = .86. What is the % of shared variance (r2)?
It is easy to obtain significant results w/ correlation; strength is most important
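The r2 question above can be checked in Python. The instruction/achievement numbers below are made up for illustration, and SciPy is assumed.

```python
import numpy as np
from scipy import stats

# Hypothetical data: minutes of weekly instruction vs. achievement (made up)
minutes = np.array([30, 45, 60, 20, 90, 75, 50, 40])
achievement = np.array([55, 62, 70, 48, 88, 80, 65, 60])

r, p = stats.pearsonr(minutes, achievement)      # interval data, normal curve
rs, p_s = stats.spearmanr(minutes, achievement)  # ranked / non-normal data

# r squared = percent of shared variance. For the slide's example,
# r = .86 gives r^2 = .7396, i.e., about 74% of the variation in
# achievement is related to variation in instruction time.
r2 = 0.86 ** 2

print(f"r = {r:.2f}, rs = {rs:.2f}, r^2 for r = .86: {r2:.4f}")
```

Note that Spearman's rs only asks whether the relationship is monotonic, which is why it is the fallback when the normality or linearity assumptions fail.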

15 Interpreting Correlations (cont.)
Words typically used to describe correlations:
Direct (large values w/ large values, or small values w/ small values; moving parallel; 0 to +1)
Indirect or inverse (large values w/ small values; moving in opposite directions; 0 to -1)
Perfect (exactly 1 or -1)
Strong, weak
High, moderate, low
Positive, negative
Correlations vs. mean differences
Groups of scores that are correlated will not necessarily have similar means
Correlation also works w/ different units of measurement

16 Statistical Assumptions
The mathematical equations used to determine various correlation coefficients carry with them certain assumptions about the nature of the data:
Level of data (different types of correlation for different levels)
Normal curve (Pearson; if not normal, Spearman)
Linearity (relationships move parallel or inverse)
Example of a nonlinear relationship: young students initially have a low level of performance anxiety, but it rises with each performance as they realize the pressure and potential rewards that come with performing. Once they have several performances under their belts, the anxiety subsides (nonlinear relationship of number of performances and anxiety scores)
Presence of outliers (affects all coefficients)
Homoscedasticity: the relationship is consistent throughout the range of both variables
e.g., if performance anxiety levels off after several performances and remains static, the relationship lacks homoscedasticity
Subjects have only one score for each variable
Minimum sample size needed for significance

17 Correlational Approaches for Assessing Measurement Reliability
Consistency over time: test-retest (Pearson, Spearman)
Consistency within the measure: internal consistency (split-half, KR-20, Cronbach's alpha)
Consistency among judges: interjudge (Cronbach's alpha)
Consistency b/w one measure and another (Pearson, Spearman)

18 Reliability of a Survey
What broad single dimension is being studied?
e.g., attitudes towards elementary music; preference for Western art music
"People who answered a on #3 answered c on #5"
Use Cronbach's alpha
A measure of internal consistency: the extent to which responses on individual items correspond to each other
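Cronbach's alpha can be computed directly from the standard item-variance formula. The Likert responses below are invented for illustration, not survey data from the class.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an item matrix (rows = respondents, cols = items).

    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 6 respondents x 4 items (made up)
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Because these invented respondents answer the four items very consistently, alpha comes out high; items that did not "move together" would drag it down.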

19 Examples
Calculate the Pearson correlation between each subject test and the combined score on the ACT Explore (each take one subject)
Calculate a Spearman correlation for contest ratings: each judge vs. the final rating
Calculate the internal consistency (reliability) of all three judges using Cronbach's alpha

20 Confidence Limits/Interval http://graphpad.com/quickcalcs/CImean1.cfm
Attempts to define the range of the true population mean based on the standard error estimate
Confidence level: 95% chance vs. 99% chance
Confidence limits: the two numbers that define the top and bottom of the range
Confidence interval (margin of error): expressed in units or %
The confidence interval (also called the margin of error) is the plus-or-minus figure usually reported in newspaper or television opinion-poll results. For example, with a confidence interval of 4, if 47% of your sample picks an answer, you can be "sure" that if you had asked the question of the entire relevant population, between 43% (47 - 4) and 51% (47 + 4) would have picked that answer.
OR = the range b/w the confidence limits
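The poll example above can be verified, and the sample size it implies recovered, in a few lines. The 95% level and the normal approximation for a proportion are assumptions here; the slide does not state them.

```python
import math

# The slide's poll example: 47% of the sample picked an answer,
# with a margin of error (confidence interval) of 4 percentage points
p_hat, margin = 0.47, 0.04

# Confidence limits: the two numbers that bound the range (43% and 51%)
lower, upper = p_hat - margin, p_hat + margin

# At the 95% level, margin ~ 1.96 * sqrt(p(1-p)/n), so a 4-point
# margin implies a sample size of roughly:
n = (1.96 ** 2) * p_hat * (1 - p_hat) / margin ** 2

print(f"limits: {lower:.0%} to {upper:.0%}, implied n: {n:.0f}")
```

This also shows why pollsters report the margin of error alongside the headline figure: the margin shrinks only with the square root of the sample size.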

21 Standard Error of the Mean
An estimate of the average SD for any number of samples of the same size taken from the population
Example: I test 30 students on music theory (test scored 0-100); mean = 75, SD = 10
The standard error of the mean (SEM) estimates the average SD among any number of same-size samples taken from the population
SEM = SD / square root of N
95% confidence interval: 95% of the area under a normal curve lies within roughly 1.96 SD units above or below the mean (often rounded to +/-2)
95% CI = M +/- (SEM x 1.96)
99% CI = M +/- (SEM x 2.576)
For the example: square root of N = 5.477; SEM = 10 / 5.477 = 1.826
SEM x 1.96 = 3.58; SEM x 2.576 = 4.70
95% CI = 71.42 - 78.58
99% CI = 70.30 - 79.70
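The worked example (30 students, mean 75, SD 10) can be reproduced in Python, using 1.96 and 2.576 as the conventional 95% and 99% normal-curve multipliers:

```python
import math

# The slide's worked example: 30 students, mean = 75, SD = 10
n, mean, sd = 30, 75, 10

sem = sd / math.sqrt(n)   # standard error of the mean

# 95% of a normal curve lies within +/-1.96 SD of the mean;
# 99% lies within +/-2.576 SD
half95 = 1.96 * sem
half99 = 2.576 * sem

print(f"SEM = {sem:.3f}")
print(f"95% CI = {mean - half95:.2f} to {mean + half95:.2f}")
print(f"99% CI = {mean - half99:.2f} to {mean + half99:.2f}")
```

The 99% interval is wider than the 95% interval: more confidence that the range captures the true population mean costs precision.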

22 Normal Curve/Distribution (+/-1 SD = 68.26%, +/-2 SD = 95.44%, +/-3 SD = 99.73%)

23 Confidence
Calculate at a 95% confidence level (by hand):
ACT Explore scores
Confidence limits and interval
How closely do your scores represent the true population?

24 Development and Validation of a Music Self-Concept Inventory for College Students
Phillip M. Hash, Calvin College, Grand Rapids, Michigan, pmh3@calvin.edu

25 Abstract The purpose of this study was to develop a music self-concept inventory (MSCI) for college students that is easy to administer and reflects the global nature of this construct. Students (N = 237) at a private college in the Midwest United States completed the initial survey, which contained 15 items rated on a five-point Likert scale. Three subscales determined by previous research included (a) support or recognition from others, (b) personal interest or desire, and (c) self-perception of music ability. Factor analysis indicated that two items did not fit the model and, therefore, were deleted. The final version of the MSCI contains 13 items and demonstrates strong internal consistency for the total scale (α = .94) and subscales (α = ). A second factor analysis supported the model and explained 63.6% of the variance. Validity was demonstrated through correlation between the MSCI and another measure of music self-perception (r = .94), MSCI scores and years of participation in music activities (r = .64), and interfactor correlations (r = ). This instrument will be useful to researchers examining music self-concept and to college instructors who want to measure this construct among their students.

26 Definitions Self-concept: Beliefs, hypotheses, and assumptions that the individual has about himself. It is the person's view of himself as conceived and organized from his inner vantage…[and] includes the person's ideas of the kind of person he is, the characteristics that he possesses, and his most important and striking traits. (Coopersmith & Feldman, 1974, p. 199) [Global – “Am I musical?”] Self-efficacy: One’s confidence in their ability to accomplish a task or succeed at a particular activity (Pajares & Schunk, 2001). [Specific – “Can I play the trumpet?”] Self-esteem: Positive or negative feelings a person holds in relation to their self-concept (Pajares & Schunk, 2001). [Evaluative - “Am I a good musician compared to my peers or a professional in a major symphony orchestra?”]

27 Self-Concept Self-concept is hierarchical. Individuals hold both global self-perceptions and more specific perceptions about different areas or domains in their lives. The hierarchy can progressively narrow from beliefs regarding academic, social, emotional, or physical facets into more discrete types of self-concept, such as those related to specific academic areas (e.g., language arts, music), social relationships (e.g., family, peers), or physical traits (e.g., height, complexion) (Pajares & Schunk, 2001). Self-concept can influence achievement, motivation, self-regulation, perseverance, and participation (Reynolds, 1992).

28 Three-Factor Model of Music Self-Concept (Schmitt, 1979; Austin, 1990)
MUSIC SELF-CONCEPT draws on three factors: support/recognition from others, personal interest/desire, and perception of music ability

29 Purpose The purpose of this study was to develop a brief music self-concept inventory for college students that is easy to administer, tests the three-factor model defined by Austin (1990), and demonstrates acceptable internal reliability of α ≥ .80 (e.g., Carmines & Zeller, 1979; Krippendorff, 2004). The inventory will be useful to researchers and college professors who want to measure music self-concept among their students.

30 Initial Draft Fifteen statements in three equal subscales related to the three-factor model.
Five-point Likert scale (1 = strongly disagree, 5 = strongly agree).
Reliability – Total scale: α = .95; subscales: α =
Factor analysis explained 63.1% of the variance.
Bartlett's test (χ2 = , p < .001) and the KMO measure (.94) indicated adequate sample size. Subject-to-variable ratio = 15.8:1.
Items intended to constitute each factor loaded highest under their respective columns.
"I am a capable singer or instrumentalist" and "I can be creative with music" were deleted due to cross-loadings.

31 Final Version Thirteen items.
Reliability – Total scale: α = .94; subscales: α =
Factor analysis explained 63.6% of the variance: influence of others (54.1%), interest (5.7%), and ability (3.7%).
All but one of the rotated factor loadings exceeded .50.
Eigenvalues = 7.37 (others), 1.17 (interest), and 0.84 (abilities).
Bartlett's test (χ2 = , p < .001) and the KMO measure (.94) indicated adequate sample size. Subject-to-variable ratio = 18.2:1.

32 Pattern Matrix for Principal Factor Analysis with Promax Rotation of the MSCI (final version)

Item                                                             Primary   Secondary
Factor I (Others)
  My family encouraged me to participate in music.                 .95
  I have received praise or recognition for my musical abilities.  .92
  Teachers have told me I have musical potential.                  .85
  My friends think I have musical talent.                          .60       .32
  Other people like to make music with me.                         .52       .39
Factor II (Interest)
  I like to sing or play music for my own enjoyment.
  Music is an important part of my life.                           .74
  I want to improve my musical skills.                             .71
  I enjoy singing or playing music in a group.                     .62
  I like to sing or play music for other people.                   .56
Factor III (Abilities)
  I can hear subtle differences or changes in musical sounds.      .84
  I have a good sense of rhythm.                                   .76
  Learning new musical skills would be easy for me.                .45

33 Conclusions The MSCI is an effective measure of music self-concept as described by (a) support or recognition from others, (b) personal interest or desire, and (c) perception of music ability (Austin, 1990). Use the MSCI to (a) assess change or development in music self-concept, (b) identify differences in various aspects of music self-concept, (c) compare music self-concept among different populations, or (d) examine relationships between music self-concept and other variables. Instructors of music courses for elementary classroom teachers or the general college population might use MSCI scores to (a) assess students’ attitudes towards their musical potential and accomplishment, (b) select leaders for small group work, or (c) identify students who might require extra support.

34 Conclusions The MSCI might be useful for students below the college level. The structure of music self-concept likely does not change between junior high school and college (Vispoel, 2003). Flesch-Kincaid Reading Ease score of 67.3. Average grade level reading score of 7.0 (Readability-Score.com). MSCI might be effective with students as young as middle school. Future studies should test the MSCI among people of varying ages and backgrounds, and examine additional variables, models, and theories that might explain this construct (e.g., Schnare, MacIntyre, & Doucett, 2012).

35 The Ratings and Reliability of the 2010 VBODA Concert Festivals
Phillip M. Hash Calvin College Grand Rapids, Michigan

36 Abstract The purpose of this study was to analyze the ratings and interrater reliability of concert festivals sponsored by the Virginia Band and Orchestra Directors Association (VBODA) in 2010. Research questions examined the distribution, reliability, and group differences of ratings by ensemble type (band vs. orchestra), age level (middle school vs. high school), and classification (1-6). The average final rating was 1.58 (SD = .66), and 91.5% (n = 901) of ensembles (N = 985) earned either a I/Superior or II/Excellent out of five possible ratings. Data indicated a high level of interrater reliability regardless of contest site, ensemble type, age level, or classification. Although final ratings differed significantly by age (middle school bands vs. high school bands), ensemble type (middle school bands vs. middle school orchestras), and classification (lower vs. higher), these results were probably due to performance quality rather than adjudicator bias, since interrater reliability remained consistent regardless of these variables. Findings from this study suggested a number of opportunities for increasing participation and revising contest procedures in festivals sponsored by the VBODA and other organizations.

37 Factors Influencing Contest Ratings/Interrater Reliability
Adjudicator experience (e.g., Brakel, 2006)
Familiarity w/ repertoire (e.g., Kinney, 2009)
Adjudication form (e.g., Norris & Borst, 2007)
Length of contest day (e.g., Barnes & McCashin, 2005)
Performance time (e.g., Bergee & McWhirter, 2007)
Size of judging panel (e.g., Bergee, 2007)
Difficulty of repertoire (e.g., Baker, 2004)
Size of ensemble (e.g., Killian, 1998, 1999, 2000)
Adjudicator bias:
Special circumstances (Cassidy & Sims, 1991)
Conductor race (VanWeelden & McGee, 2007)
Conductor expressivity (Morrison, Price, Geiger, & Cornacchio, 2009)
Ensemble label (Silvey, 2009)
Grade inflation (Boeckman, 2002)
Event type: concert performance vs. sight-reading (Hash, in press)
Limitations of prior research: conducted in laboratory settings rather than actual large-group contest adjudication; used taped excerpts; used undergraduate evaluators rather than contest judges

38 Method – Participants
985 ensembles: middle school (n = 498) and high school (n = 487); bands (n = 596) and orchestras (n = 389)
13 VBODA districts / 36 contest sites
144 judges: 108 concert-performance judges in 36 panels; 36 sight-reading judges

39 Method – Analysis
M, SD, frequency counts by:
Contest site
Ensemble type (band vs. orchestra)
Classification (I-V)
Age group (MS vs. HS)
Interrater reliability (.80 = good reliability):
% agreement (pairwise and combined)
Cronbach's alpha (α)
Group differences (Kruskal-Wallis ANOVA and post hoc Mann-Whitney U tests) by:
Age level (MS vs. HS)
Classification (I-V)
Ensemble type (band vs. orchestra)
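Pairwise percent agreement, one of the reliability indices listed above, can be sketched as follows. The three-judge ratings below are invented for illustration, not the VBODA results, and "agreement" here means identical ratings for an ensemble.

```python
import numpy as np
from itertools import combinations

# Hypothetical final ratings (1-5) from a three-judge panel
# for eight ensembles (made-up data)
ratings = np.array([
    [1, 1, 2],
    [2, 2, 2],
    [1, 2, 1],
    [3, 3, 3],
    [2, 2, 3],
    [1, 1, 1],
    [2, 3, 2],
    [1, 1, 1],
])

# Pairwise percent agreement: for each pair of judges, the proportion
# of ensembles they rated identically, averaged over all pairs
pair_scores = [float(np.mean(ratings[:, a] == ratings[:, b]))
               for a, b in combinations(range(ratings.shape[1]), 2)]
pairwise = sum(pair_scores) / len(pair_scores)

print(f"pairwise agreement = {pairwise:.2f}")
```

Percent agreement is a blunter index than Cronbach's alpha, since near-misses (a I vs. a II) count the same as large disagreements, which is one reason both are reported.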

40 Results
Distribution & Reliability:
Average final rating = 1.58 (SD = .66)
Final ratings distribution:
I/Superior = 50.6%, n = 498
II/Excellent = 40.9%, n = 403
III/Good = 8.0%, n = 79
IV/Fair = 0.5%, n = 5
V/Poor = 0.0%, n = 0
Interrater reliability:
Interrater agreement (pairwise) = .76
Interrater agreement (combined) = .87
Cronbach's alpha = .87
Significant group differences:
Age & ensemble type: MS orchs > MS bands; HS bands > MS bands
Classification: Class 6 bands > classes 5, 4, 3, 2, 1; Class 5 bands > classes 3, 2, 1; Class 4 bands > class 2; Class 6 orchs > classes 5, 4, 2, 1
Sight-reading ratings: HS sight-reading > MS sight-reading; sight-reading > concert performance
Weak correlation b/w grade level & rating (rs = -.16 to -.32)

43 Discussion
The preponderance of I and II ratings may not accurately differentiate b/w ensemble quality
Consider a new rating system:
e.g., spread current I and II ratings over three levels such as gold, silver, and bronze (Brakel, 2006)
Only award I, II, and III in order to increase the validity of ratings
Use a rubric w/ descriptors (e.g., Norris & Borst, 2007)
Consider adding a second sight-reading judge to compensate for possible grade inflation

44 Discussion (cont.)
Did only the best bands participate in contests?
Consider promoting a "comments only" option
Perhaps some directors/administrators do not find large-group festival participation affordable and/or worthwhile
Create support for novice directors or those working in challenging situations
Overall, the 2010 VBODA contests demonstrated good statistical reliability, regardless of contest site, ensemble type, age level, or classification

45 Future Research
Additional studies should examine:
Correlation of grades given in individual categories with final ratings
Effect of various evaluation forms on reliability
Quality and consistency of written comments
Attitudes of directors, students, and parents toward festivals
Extent to which contests are supporting instrumental programs in light of current education reform and changing attitudes towards the arts
Effectiveness of current rating and adjudication systems, to determine whether revisions in contest procedures are necessary
All organizations sponsoring contests should analyze ratings and reliability to ensure consistent, fair results
Music organizations and researchers should work together throughout this process to ensure that school music festivals serve teachers and students as much as possible

46 APA Format
Headers: chapter title = Level 1; others = Level 2 (flush left)
Remember the title page and page #s
Running head ≤ 50 characters total; goes in the header, flush left
Research question comes after the purpose statement and before the need for the study (header?)
Commas: …apples, oranges, and grapes.
Citation goes before the period and after the closing quotation mark; no page number needed unless you are citing a quote
Two spaces b/w sentences; 1 space b/w reference elements
Spacing: double space, paragraph spacing set to 0, no additional space added

47 Chapter 3 – Data Analysis
Open your Chapter 3 data analysis section
What are you analyzing? (responses, test scores)
Write procedures in bullet-point, step-by-step form

