 Internal Validity  Construct Validity  External Validity * In the context of a research study, i.e., not measurement validity.

 Generally relevant only to studies with causal relationships. ◦ Temporal precedence ◦ Correlation ◦ No plausible alternative  Key question: can the outcome be attributed to causes other than the designed interventions ◦ If so, it is likely that internal validity needs to be tightened up

 Threats to Internal Validity ◦ Single Group Threats ◦ Multiple Group Threats ◦ Social threats to internal validity

Imagine an educational program where two different testing regimens are used. In one, an intervention and then a post-test are used. In the second, a pre-test, an intervention, and a post-test are used. What are the single group threats for this design?

 Single Group Threats ◦ History (something else happened at the same time) ◦ Maturation (the change would have happened anyway over the same period) ◦ Testing (testing itself induced an effect) ◦ Instrumentation (changes in the testing) ◦ Mortality (attrition in study participants) ◦ Regression (regression to the mean)
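The regression threat can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it assumes each test score is a stable ability plus independent test-day noise, and all names and parameter values are hypothetical. No intervention is applied, yet the group selected for low pre-test scores still improves on the post-test.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, bottom_fraction=0.1, seed=0):
    """Return (mean pre-test, mean post-test) for the lowest pre-test scorers.

    Assumption: score = stable ability + independent noise on each test.
    No intervention happens between the two tests.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n)]
    pre = [a + rng.gauss(0, 10) for a in ability]
    post = [a + rng.gauss(0, 10) for a in ability]

    # Select participants *because* they scored poorly on the pre-test.
    cutoff = sorted(pre)[int(n * bottom_fraction)]
    selected = [i for i in range(n) if pre[i] <= cutoff]

    pre_mean = statistics.mean(pre[i] for i in selected)
    post_mean = statistics.mean(post[i] for i in selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"pre-test mean of selected group:  {pre_mean:.1f}")
print(f"post-test mean of selected group: {post_mean:.1f}")
```

The selected group's post-test mean moves back toward the population mean of 50 even though nothing was done to it, which is why a pre-test/post-test gain in a group chosen for extreme scores cannot be attributed to the intervention alone.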

 Suppose the previous study had multiple groups instead of a single group.  Multiple Group Threats are variations on the Single Group Threats with selection bias added. If the added second group is a control, for instance, it must be selected in a way that makes it fully comparable to the first group (random assignment).  If participants cannot be randomly assigned, the result is a quasi-experimental design.
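Random assignment can be sketched in a few lines. This is an illustrative example, not taken from the slides; the function name and participant IDs are hypothetical. Shuffling before splitting means that no characteristic of the participants can systematically determine which group they land in.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


participants = [f"P{i:02d}" for i in range(1, 21)]  # hypothetical IDs
treatment, control = randomly_assign(participants, seed=42)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Fixing the seed makes the assignment reproducible for auditing; in a real study the seed would be chosen and recorded before recruitment.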

 Applicable to social sciences (because people do not react simply to stimuli) ◦ Diffusion (people in treatment groups talk to one another) ◦ Compensatory rivalry (groups know what is happening and develop a rivalry) ◦ Resentful demoralization (same as above, but with the opposite effect: the group becomes discouraged) ◦ Compensatory equalization (researchers or others equalize groups).

 Are the results valid for other persons in other places and at other times? ◦ Do they generalize?  Types of generalization  Threats to external validity

 Generalizations ◦ Sampling Model: try to make certain that your study groups are a random sample of the population you wish your generalization to extend to. ◦ “Proximal Similarity”: measure or stratify the sample on the things you cannot randomize.
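One way to stratify a sample on a characteristic that cannot be randomized is proportionate stratified sampling. The sketch below is an assumption-laden illustration, not the slides' method: the `site` attribute, the population, and the 10% sampling fraction are all made up.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 urban and 20 rural participants.
population = [{"id": i, "site": "urban" if i < 80 else "rural"} for i in range(100)]
sample = stratified_sample(population, lambda p: p["site"], fraction=0.1, seed=1)
print(Counter(p["site"] for p in sample))
```

The sample keeps the population's 80/20 urban/rural split, so a generalization to that population does not hinge on a lucky draw for the stratifying variable.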

 Threats to external validity ◦ People ◦ Places ◦ Times

 An assessment of how well ideas or theories are translated into actual programs.  Mapping of concrete activities into theoretical constructs.

 Formal articulations: ◦ Nomological network (Cronbach and Meehl, 1955): researchers were to establish a theoretical framework of what to measure, an empirical framework of how to measure it, and the linkages between the two. ◦ Multitrait-Multimethod Matrix (Campbell and Fiske, 1959): Convergent concepts should show higher correlations, and divergent concepts lower correlations. ◦ Pattern matching (Trochim, 1985): Linking a theoretical pattern with an operational pattern.
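The multitrait-multimethod idea can be sketched with plain Pearson correlations. The scores below are made up purely for illustration, and the trait and method names are hypothetical. The check is that the same trait measured by two methods (convergent) correlates more strongly than two different traits measured by the same method (discriminant).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for six participants (illustration only).
trait_a_survey = [1, 2, 3, 4, 5, 6]
trait_a_observation = [2, 1, 4, 3, 6, 5]
trait_b_survey = [3, 6, 1, 5, 2, 4]

convergent = pearson(trait_a_survey, trait_a_observation)  # same trait, different methods
discriminant = pearson(trait_a_survey, trait_b_survey)     # different traits, same method

print(f"convergent:   {convergent:.2f}")
print(f"discriminant: {discriminant:.2f}")
```

A full matrix would repeat this for every trait and method pair; the two cells here are enough to show the pattern the matrix is meant to reveal.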

 Threats to Construct Validity ◦ Poorly defined constructs ◦ Mono-operation bias: The construct is larger than the single program / treatment you devised. ◦ Mono-method bias: the construct is larger than the limited set of measurements you devised. ◦ Test and treatment interaction: measurement changes the treatment group ◦ Other threats generally fall under “labeling” threats: a construct is essentially a metaphor, and if it is not precisely articulated, different persons can hold differing meanings.

 Social Threats to Construct Validity ◦ Hypothesis guessing: participants guess at the purpose of your study and attempt to game it. ◦ Evaluation apprehension: if apprehension causes participants to do poorly (or to pose as doing well) then the apprehension becomes a confounding factor. ◦ Researcher expectancies: Researcher expectancies confound the outcome.  Hawthorne effect: people change behavior when observed  Rosenthal effect: researcher expectations can change outcomes even when subjects are uninformed.

 Authors see methodology as intellectual infrastructure.  Believe that rapid change in CS produces outdated methodology.  Three key claims: ◦ Workloads used need to be appropriate ◦ Experimental design needs to be appropriate ◦ Analysis needs to be rigorous

 For this paper, the authors focus on Java ◦ Java includes modern language features (type safety, memory management, secure execution) ◦ Authors believe that these features make previous benchmarks untenable:  Tradeoffs due to garbage collection where heap size is a control variable  Non-determinism due to adaptive optimization and sampling technologies  System warm-up from dynamic class loading and just-in-time compilation

 Authors created a suite (DaCapo) of benchmark tools suitable for research. The suite consists of open source applications.  DaCapo validates diversity by applying a variety of tests and then applying PCA (principal component analysis).  Authors point to “cherry picking” research by Perez, showing that dropping diversity of measures increases ambiguous and incorrect conclusions.

 The authors in their results show four ways to evaluate garbage collection. Any specific measure can be “gamed” to produce a desired result.  Classic comparison of Fortran / C / C++: control for host platform and language runtime.  New comparisons: control for host platform, language runtime, heap size, nondeterminism and warm-up.

 To obtain meaningful data from noisy estimates, data must be collected and aggregated.  Current practices sometimes lack statistical rigor.  Presenting all the results from the suite (as opposed to one number) will reduce “cherry picking”.
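Aggregating noisy measurements usually means reporting a mean together with a confidence interval instead of a single run. The sketch below is one common approach and not necessarily the one the authors use: it relies on a normal approximation (for small samples a Student's t interval would be wider), and the run times are hypothetical.

```python
import math
import statistics


def mean_with_confidence_interval(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * std_err


# Hypothetical run times in seconds from ten repetitions of one benchmark.
run_times = [10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.4, 9.7, 10.1]
mean, half_width = mean_with_confidence_interval(run_times)
print(f"{mean:.2f} s +/- {half_width:.2f} s (95% CI, normal approximation)")
```

Two configurations whose intervals overlap heavily should not be reported as different on the strength of a single faster run.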
