Experimental Design
More threats to internal validity: instrumentation (measurement procedure)
If the way you measure something changes over time, that change can alter your outcome scores.
--E.g. if the way I ask the question “did you have sex?” varies, responses may vary over time for that reason alone.
--E.g. if I am grading papers and my view of what counts as a “good answer” shifts as I read, I may grade later responses differently from earlier ones.
Avoiding instrumentation effects
Pilot your measure first, then stick with it.
If you must change your measure, collect new data with the new version.
Keep a record of any changes to your questions.
Testing
A testing effect occurs when testing or surveying people at baseline influences their later responses.
--E.g. giving someone a survey about sex can also teach them about sex.
--E.g. giving someone a math test also gives them a chance to practice their math skills.
Dealing with testing effects
Include a control group that takes the same tests, so that if there is a testing effect it shows up in both groups and you can at least control for it.
Consider carefully the ways a survey or test may itself be a learning experience for respondents.
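To see why a control group helps, here is a minimal simulation sketch. The numbers (a 5-point practice gain from retaking the test, a 2-point true treatment effect, the noise level) are made-up assumptions, and `simulate_gains` is a hypothetical helper. Both groups get the practice boost, so subtracting the control group's average gain from the treatment group's average gain removes it.

```python
import random
import statistics

def simulate_gains(n, true_effect, testing_effect, noise_sd, seed=0):
    """Hypothetical simulation of pretest-to-posttest gains for two groups.

    Everyone takes the same pretest, which adds `testing_effect` to their
    posttest (practice). Only the treatment group also gets `true_effect`.
    """
    rng = random.Random(seed)

    def gains(effect):
        out = []
        for _ in range(n):
            pre = rng.gauss(50, 10)
            post = pre + testing_effect + effect + rng.gauss(0, noise_sd)
            out.append(post - pre)
        return out

    return gains(true_effect), gains(0.0)

treat, control = simulate_gains(n=2000, true_effect=2.0, testing_effect=5.0, noise_sd=3.0)
naive = statistics.mean(treat)                                # treatment + testing effect mixed together
adjusted = statistics.mean(treat) - statistics.mean(control)  # testing effect cancels out
print(f"treatment group gain alone:   {naive:.1f}")
print(f"treatment gain - control gain: {adjusted:.1f}")
```

The first number is inflated by practice; the second recovers something close to the assumed true effect.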
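Regression to the mean
Extreme scores tend to be less extreme when measured again: very low scores will be less low, and very high scores will be less high.
Why? Any single score reflects both a person's typical level and some chance variation (a good or bad day, measurement error). An extreme score usually includes chance variation in the extreme direction, and that chance variation is unlikely to repeat at the next measurement.
If one condition starts with more extreme scores, their drift toward the mean at the second measurement can make it look like your intervention works (or doesn't work) when the change is really just regression to the mean.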
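A small simulation can make regression to the mean concrete. This sketch assumes each observed score is a stable “true” score plus random error (hypothetical values: true scores centered at 100, error SD of 10), and that nothing at all happens between the two measurements. People selected for extreme time-1 scores still drift back toward the mean at time 2.

```python
import random
import statistics

rng = random.Random(1)
true_scores = [rng.gauss(100, 10) for _ in range(10_000)]
time1 = [t + rng.gauss(0, 10) for t in true_scores]  # observed = true + error
time2 = [t + rng.gauss(0, 10) for t in true_scores]  # fresh error, no intervention

# Select the people who scored highest at time 1 (roughly the top 10%).
cutoff = sorted(time1)[int(0.9 * len(time1))]
extreme = [i for i, s in enumerate(time1) if s >= cutoff]

mean_t1 = statistics.mean(time1[i] for i in extreme)
mean_t2 = statistics.mean(time2[i] for i in extreme)
print(f"extreme group at time 1: {mean_t1:.1f}")
print(f"same people at time 2:   {mean_t2:.1f}")
```

The time-2 mean is still above 100 (these people really are somewhat high on average) but noticeably lower than at time 1, even though nobody was treated.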
Avoiding regression to the mean
If you have extreme scores, use stratification or block randomization to make sure groups are balanced on those scores.
Remember that simple randomization won't always achieve this balance, especially with a small sample.
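One way to do stratified block randomization is sketched below. The helper `stratified_block_randomize`, the cutoff values, and the simulated baseline scores are all hypothetical. The idea is to sort participants into strata by baseline score and then randomize in blocks of two within each stratum, so each stratum ends up split evenly (to within one person) between the groups.

```python
import random
from collections import defaultdict

def stratified_block_randomize(baselines, cutoffs, seed=None):
    """Assign participants to 'treatment' or 'control' within baseline strata.

    baselines: dict of participant id -> baseline score
    cutoffs: score boundaries defining the strata (hypothetical choice)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pid, score in baselines.items():
        stratum = sum(score >= c for c in cutoffs)  # 0 = lowest stratum
        strata[stratum].append(pid)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        # Blocks of two: one treatment and one control, in random order.
        for start in range(0, len(members), 2):
            block = ["treatment", "control"]
            rng.shuffle(block)
            for pid, arm in zip(members[start:start + 2], block):
                assignment[pid] = arm
    return assignment

rng = random.Random(42)
baselines = {f"p{i:02d}": rng.gauss(20, 8) for i in range(30)}  # simulated scores
groups = stratified_block_randomize(baselines, cutoffs=[14, 26], seed=7)
for arm in ("treatment", "control"):
    members = [p for p, a in groups.items() if a == arm]
    print(arm, len(members), round(sum(baselines[p] for p in members) / len(members), 1))
```

Because assignment happens within strata, the extreme scorers can't all land in one condition by chance, which is the small-sample risk described above.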
Placebo and demand characteristics
When participants think they are getting better, they feel better!
We control for this using blinding: participants do not know which group they are in.
But what about the researcher? Researchers may influence results by communicating their expectations, so where possible we use a double-blind design.
Double blind
Neither the participant nor the researcher knows which treatment the participant receives.
This can be very difficult to achieve in psychology. Why?
Confounding
A confound is a third variable that, rather than your IV, accounts for the apparent effect of your IV on your DV.
Many types of threats to internal validity end up functioning as confounds: placebo, history, selection, etc.
The point is that a confound is an unmeasured influence that is actually responsible for the effect.
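Here is a minimal simulation of a confound, under made-up assumptions: “motivation” is the third variable, motivated people are more likely to get the treatment and also improve more, and the treatment itself has no effect at all. Because this is a simulation we can measure the confound; in a real study it would usually be unmeasured, which is exactly the problem.

```python
import random
import statistics

rng = random.Random(0)
people = []  # (motivated, treated, outcome)
for _ in range(20_000):
    motivated = rng.random() < 0.5  # the third variable
    # Motivated people are more likely to end up in treatment...
    treated = rng.random() < (0.8 if motivated else 0.2)
    # ...and improve more, but treatment itself adds nothing here.
    outcome = (10 if motivated else 0) + rng.gauss(0, 5)
    people.append((motivated, treated, outcome))

def mean_outcome(rows):
    return statistics.mean(r[2] for r in rows)

naive = (mean_outcome([p for p in people if p[1]])
         - mean_outcome([p for p in people if not p[1]]))
print(f"naive treated - untreated: {naive:.1f}")

# Compare treated vs untreated within each level of the confound.
within = {}
for level in (True, False):
    stratum = [p for p in people if p[0] == level]
    within[level] = (mean_outcome([p for p in stratum if p[1]])
                     - mean_outcome([p for p in stratum if not p[1]]))
    print(f"motivated={level}: treated - untreated = {within[level]:.1f}")
```

The naive comparison shows a sizable “treatment effect” that disappears once motivation is held constant.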
External validity?
So for whom, and under what circumstances, is this treatment actually effective?
External validity
To whom, and under what conditions, can results be generalized?
This question has great practical and theoretical significance: if your intervention only works under very specific conditions, is it really useful?
External validity: an example
A university clinic uses an intervention to treat depressed patients:
--only patients diagnosed with depression alone
--treated by graduate students who each see only a few patients per week
--each patient gets a 3-hour battery of tests plus an in-depth diagnostic interview
--graduate students get weekly supervision to make sure they are maintaining the treatment approach
--treatment is free
--it takes place in a quiet clinic on an attractive university campus
Selection bias
Here the whole sample is biased: not in a way that makes the intervention group different from the control group, but in a way that makes everyone in the study different from the likely population.
E.g. most people with depression don't ONLY have depression.
Testing
Testing won't make the intervention and control groups different from each other.
It may, however, make participants' experiences different from those of people who receive the treatment later, outside the study.
E.g. a 3-hour battery of tests and interviews may itself be therapeutic under some circumstances.
Reactive effects of experimental arrangements
--Most therapists see more than a few patients a week.
--Most therapists don't get weekly supervision to make sure they are maintaining the protocol.
--Most patients don't get therapy in lovely quiet offices on pretty university campuses.
How to avoid problems with external validity?
This is difficult. The more tightly you control all the factors that could muddy or influence your outcome, the higher your internal validity, and the lower your external validity tends to be.
Building external validity
This takes time. You may have to “redo” the intervention several times, systematically varying and measuring the circumstances.
Start neat and tidy, then slowly add in (and measure) real-world messiness.
This can be costly and time-consuming.
“Pre-experimental” designs
Also called pilot studies.
Generally low in internal validity, but a good place to start: cheaper and quicker.
You want to know you have something before you go to the trouble and expense of a full-blown randomized experiment.
Quasi-experimental designs
Use experimental and control groups, but do not use random assignment. Why?
May use “matching”: pairing participants across groups on qualities of interest.
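A simple form of matching is greedy nearest-neighbor matching on a single baseline score, sketched below. The function name `greedy_match`, the `caliper` (the largest score difference we will accept for a pair), and the example scores are hypothetical. Real studies often match on several qualities at once, and greedy matching is only one of several matching strategies.

```python
def greedy_match(treated, candidates, caliper):
    """Pair each treated participant with the unused comparison participant
    whose baseline score is closest, skipping pairs further apart than caliper.

    treated, candidates: dicts of participant id -> baseline score
    Returns a list of (treated_id, comparison_id) pairs.
    """
    available = dict(candidates)  # copy so the caller's dict is untouched
    pairs = []
    for tid, score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[cid] - score) <= caliper:
            pairs.append((tid, cid))
            del available[cid]  # match without replacement
    return pairs

# Hypothetical baseline scores for illustration only.
treated = {"t1": 12, "t2": 20, "t3": 31}
comparison = {"c1": 11, "c2": 19, "c3": 25, "c4": 40}
pairs = greedy_match(treated, comparison, caliper=3)
print(pairs)  # [('t1', 'c1'), ('t2', 'c2')]
```

Here t3 stays unmatched because no remaining comparison participant is within 3 points. Note that greedy matching depends on the order in which treated participants are processed.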
Pilot studies: pretest-posttest
No control group: just measure whether scores in the intervention group change from baseline to posttest.
E.g. I could treat my depressed patients with my intervention and simply measure their improvement.
Why would I do this?
Pilot studies: posttest only
OK for measuring an outcome. E.g. the SAT could be considered a posttest-only design.
It gives you achievement scores, but no sense at all of what your achievement might have been before, how it changed, or what caused the change.
Pilot studies: static group design
E.g. I simply compare the outcomes of two different treatments.
I don't control selection, and I don't measure baseline.
Again, this can be a useful first step.
Equivalent time sample design
More succinctly known as single-subject design. It may have only one participant.
The design is:
--Baseline (no treatment)
--Treatment
--No treatment
--Treatment
Single-subject designs
Quite common in behavioral research. E.g. treatment for OCD:
--Baseline: measure picking behavior
--Treatment: withhold “reinforcement”; the behavior goes away
--Withdraw treatment (return to reinforcement): the behavior returns
--Treatment again: withhold reinforcement; the behavior goes away
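Single-subject data are often summarized (and graphed) phase by phase. The sketch below computes the mean of each phase in an ABAB sequence; the helper `phase_means` and the counts are hypothetical, illustrative numbers, not real data. The causal argument rests on the pattern: the behavior drops when treatment is introduced, returns when it is withdrawn, and drops again when it is reintroduced.

```python
import statistics

def phase_means(phases):
    """phases: list of (label, [observations]) in the order they occurred."""
    return [(label, statistics.mean(obs)) for label, obs in phases]

# Hypothetical daily counts of picking episodes (illustrative numbers only).
abab = [
    ("A1 baseline", [9, 11, 10, 12]),
    ("B1 treatment", [6, 4, 3, 2]),
    ("A2 withdrawal", [8, 10, 11, 11]),
    ("B2 treatment", [4, 3, 2, 1]),
]
results = phase_means(abab)
for label, m in results:
    print(f"{label:<14} mean = {m:.2f}")
```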
Single-subject designs
These are actually a very powerful experimental technique, commonly used in behavior analysis and treatment, and a good way to establish causality.