Exam 2 topics (3/17/16)
This exam is worth 10% of your grade. Items will be very similar to exam #1.
- Basics of Experimental design (Notes Week 5)
- Sampling (Notes Week 6)
- Quasi-experimental designs (Quasi-experimental notes)
- Descriptive Research (Descriptive Research notes)
- Surveys (Surveys notes)
- Some basic statistics
Dr. David J. McKirnan, University of Illinois at Chicago, Psychology; mckirnanuic@gmail.com
Experimental Design: the structure of a study (you all know this, but just to remind you…)
- Phenomenon: the larger question the research addresses. What needs explaining? Why is it important? What assumptions are you making; what does not need to be explained?
- Theory: explanatory processes & how they are related. How / why does it work? What Hypothetical Constructs or basic processes are involved? How are they related? NOT just a statement of the hypothesis.
- Hypothesis: concrete variables; a specific prediction. Must clearly relate to / express the theory.
- Methods / Data: concrete evidence or data. Operational definitions, study procedures, sample (the population you will generalize to).
- Results: hypothesis-wise analysis of outcomes. What was the outcome? Was the hypothesis supported? How do you know?
- Discussion & Conclusion: relate results back to theory; study limitations & future studies. What do the results mean for the theory or question? What is still unanswered?
Basics of Design: Internal Validity
Internal validity, core concept: can we validly determine what is causing the results of the experiment? Be able to easily articulate and use this concept; it should be obvious to you by now.
- Default hypothesis: the experimental outcome (the value of the Dependent Variable) is caused only by the Independent Variable.
- Confound: a "3rd variable" (an unmeasured variable other than the Independent Variable) actually led to the results.
Design requirements:
- An appropriate control group
- Equivalent experimental & control groups
Key distinction: simple random error (such as a biased sample, poorly thought-through measurement, incompetent procedures…) may threaten external validity (the data may not generalize), but not internal validity, because it affects all participants / groups equally. Non-random error that affects one group only (or one group more than the other) is a confound, since it, rather than the Independent Variable, may lead to group differences.
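Random assignment is what makes the experimental & control groups equivalent at baseline. A minimal simulation sketch (all data hypothetical; `age` just stands in for any pre-existing participant characteristic):

```python
import random
import statistics

# Hypothetical participant pool; "age" stands in for any
# pre-existing characteristic that could differ between groups.
random.seed(42)
participants = [{"id": i, "age": random.randint(18, 65)} for i in range(200)]

# Random assignment: shuffle, then split. On average the groups are
# equivalent at baseline, so baseline differences cannot confound the IV.
random.shuffle(participants)
experimental = participants[:100]
control = participants[100:]

mean_exp = statistics.mean(p["age"] for p in experimental)
mean_ctl = statistics.mean(p["age"] for p in control)
print(f"Experimental mean age: {mean_exp:.1f}")
print(f"Control mean age:      {mean_ctl:.1f}")
# The two means should be close; any remaining difference is random
# error, not a systematic (non-random) confound.
```

The residual difference between the group means illustrates the slide's key distinction: random error spread evenly across groups, not a confound.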
Key threats to internal validity
I may ask you to define one or two of these (slide 12 of the lecture). They arise from the lack of a control group (Group: Observe1 → Intervention or event → Observe2, with a possible confound):
- Maturation: participants may be older / wiser by the post-test.
- History: cultural or historical events may occur between pre- and post-test that change the participants.
- Mortality: participants may non-randomly drop out of the study.
- Regression to baseline: participants who are more extreme at baseline look less extreme over time, as a statistical confound.
- Reactive measurement: participants may change their scores due to being measured twice, not due to the experimental manipulation.
Generalizability
External validity: can we validly generalize from this experiment to the larger world? What might be an example of each "arm" of external validity? How well can we generalize to:
- The larger population: how well does your research sample represent the larger population?
- Other outcomes (dependent variables): does your measure of the Dep. Var. reflect how the process works outside of the lab? If you are studying, e.g., happiness, are you really (validly) assessing it?
- Other settings: how representative (or realistic) is the physical and cultural setting of the research study?
- Other conditions (independent variables): how well does your experimental manipulation represent how that variable works in nature? Is it artificial? Too extreme? Have you administered the correct "dose"…?
Generalizability (continued)
External validity: can we validly generalize from this experiment to the larger world? How well can we generalize to:
- The larger population: how well does your research sample represent it? Understand the issues in probability v. non-probability sampling; be able to give an example of each.
- Other outcomes (dependent variables): does your measure of the Dep. Var. reflect how the process works outside of the lab? If you are studying, e.g., happiness, are you really (validly) assessing it?
Be able to define: Face Validity, Content Validity, Predictive Validity, Construct Validity.
Some key terms (you might have to define some of these…):
- Replicate: repeat the study exactly.
- "Converging" study: address the same hypothesis with a different form of study.
- Blind: the participant does not know what group s/he is in.
- Double blind: neither the participant nor the researcher knows what group the participant is in.
- Social desirability responding: responding in a way that makes you look good; particularly common with face-valid survey items.
- Reactive measures: people change their behavior if they know they are being assessed.
- Operational definition: the specific, "hands-on" operations you use to measure or manipulate a variable.
- Hypothetical construct: an abstract psychological / social process that cannot be observed directly, underlies behaviors that can be observed, and is a key element of a theory.
Sampling: who do you want to generalize to?
Mammals → Humans → All Western people → All Americans → Young Americans → College students → UIC Students → This class
Moving down the list: increasing specificity (and ease) of the sampling frame, which generally increases internal validity.
Moving up the list: increasing breadth of the population to sample from (i.e., size of the sampling frame), representing increasing external validity.
Sampling: probability v. non-probability
Understand this basic distinction (especially in terms of external validity), and be able to talk about basic sampling questions in YOUR course research project.
Probability sampling (simple, multi-stage, stratified):
- Most externally valid.
- Involves some systematic and random form of selection.
- Assumes a clear sampling frame and an available population.
- Less externally valid for hidden groups.
Non-probability sampling (targeted / multi-frame, snowball, quota, etc.):
- Less externally valid, but high "convenience."
- Best when there is no clear sampling frame, or the population is hidden / avoidant.
Sampling methods: I may ask you to choose one from each category and describe them…!
Probability sampling:
- Simple
- Multi-stage
- Cluster / stratified
Non-probability sampling:
- Haphazard
- Modal instance
- Venue (time / space)
- Multi-frame
- Snowball / respondent-driven
- Heterogeneity
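To make the probability-sampling side concrete, here is a minimal sketch contrasting a simple random sample with a stratified sample. The sampling frame, the `year` variable, and the 60-person sample size are all invented for illustration:

```python
import random
from collections import Counter

random.seed(1)

# Hypothetical sampling frame: 1,000 students, each with a known class
# standing. "year" is the stratification variable (an assumption made
# for this example, not part of the lecture).
frame = [{"id": i, "year": random.choice(["freshman"] * 5 + ["senior"])}
         for i in range(1000)]

# Simple random sample: every member of the frame has an equal chance.
simple = random.sample(frame, 60)

# Stratified sample: sample within each stratum in proportion to its
# size, guaranteeing the sample matches the frame on that variable.
stratified = []
strata = Counter(p["year"] for p in frame)
for year, n in strata.items():
    members = [p for p in frame if p["year"] == year]
    k = round(60 * n / len(frame))
    stratified.extend(random.sample(members, k))

print("Frame proportions:     ", {y: n / len(frame) for y, n in strata.items()})
print("Stratified composition:", Counter(p["year"] for p in stratified))
```

Both are probability methods; stratification simply removes the sampling error in the stratifying variable, which matters most when a stratum is small.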
Sampling: what is the criterion for 'membership' in your sample?
What are your inclusion criteria? Exclusion criteria? Be able to use these terms in a sentence.
- Demographic: gender, zip code, ethnicity…
- Behavioral: voting history, spending patterns…
- Self-identification: identification as, e.g., 'conservative' vs. 'liberal.'
The criteria used to define the group (a key part of your sampling frame) will determine who specifically gets sampled.
Quasi-experiments
Understand these two basic forms of quasi-experiments. Be able to provide an example of each, and to discuss the internal v. external validity trade-off for each.
1. Study naturally occurring events that could not be brought into a lab or a true experiment: measurement studies of, e.g., disasters or historical events; retrospective designs using archival data.
2. Evaluate existing groups or programs, or issues where groups cannot be equivalent at baseline.
In quasi-experiments the researcher has less control over all aspects of the research, since it occurs in the field / in nature. This lessens internal validity (less control) and increases external validity (the results generalize better to the "real world"…).
Four general types of quasi-experiments
1. Group → naturally occurring event or social change → Observe. Study the aftermath or consequences of an unanticipated, naturally occurring event. No control or comparison group is possible; e.g., a natural disaster.
2. Group → Observe1 → intervention or event → Observe2. Study the effect of an event where measurement is already taking place. No comparison group is possible; e.g., introduction of a new school-wide curriculum.
3. Group1 → intervention or event → Observe1; Group2 (contrast group, no baseline) → Observe1. Study a naturally occurring, unanticipated event. A comparison group is possible, but the groups cannot be equivalent (…compare two cities after a major cultural event).
4. Group → Observe1 → intervention or event → Observe2, with a contrast group observed in parallel. The equivalent of a true experiment or randomized controlled trial, except that the groups cannot be made equivalent (…e.g., intervention-group participants cannot be blind…).
Naturally occurring events (Group → naturally occurring event or social change → Observe)
- The event is not controlled / manipulated, so it is not a true Independent Variable. There is no control over who is exposed to the event, though possibly some control over who is selected for the research sample; this may compromise both internal and external validity.
- There is often no control group: a significant threat to internal validity.
- The researcher may or may not have control over measures. Archival measures (medical records, climate data, crime reports) may not assess exactly what the study needs; survey or other post-hoc measures can address hypotheses.
- Heuristic value of field studies: generating hypotheses for later experimental study, and confirming controlled / experimental data in a "real world" (field) setting.
One-group pre- / post-test design (Observe1 → intervention or event → Observe2)
- Sample: selected or convenience sample.
- Baseline assessment: may or may not be under the researcher's control (e.g., surveys v. archival measures).
- Event or intervention: may or may not be controllable by the researcher, e.g., a policy change.
- Outcome assessment: typically controllable, but may be archival.
Uses:
- Educational & social environments
- Political or health policy change
- Situations where a control group is not feasible
- System-wide interventions / social change (a school, a public health campaign…)
One-group pre- / post-test (Group: Observe1 → intervention or event → Observe2). Key design feature: no control group.
Threats to internal validity (confounds):
- History: historical / cultural events occur between baseline & follow-up.
- Maturation: individual maturation or growth occurs between baseline & follow-up.
- Reactive measures: people respond to being measured, or to being measured a second time.
- Statistical regression: extreme scores at baseline "regress" to a more moderate level over time.
- Mortality / drop-out: people leave the experiment non-randomly (i.e., for reasons that may affect the results…).
(Same one-group pre- / post-test threats as above.) I will ask you about these confounds; be able to define History, Maturation, Reactive measures, Statistical regression, and Mortality / drop-out.
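Statistical regression is easy to see in simulated data: if an observed score is a stable true score plus measurement noise, a group selected for extreme baseline scores will look more moderate at follow-up even when no intervention occurs at all. A sketch under those assumptions (all numbers hypothetical):

```python
import random
import statistics

random.seed(7)

# Each person has a stable "true score"; each observation adds
# independent measurement noise. No intervention happens at all.
true_scores = [random.gauss(50, 10) for _ in range(5000)]
baseline  = [t + random.gauss(0, 10) for t in true_scores]
follow_up = [t + random.gauss(0, 10) for t in true_scores]

# Select the people who looked most extreme at baseline (top ~10%).
cutoff = sorted(baseline)[-500]
extreme = [i for i, b in enumerate(baseline) if b >= cutoff]

m_base = statistics.mean(baseline[i] for i in extreme)
m_follow = statistics.mean(follow_up[i] for i in extreme)
print(f"Extreme group at baseline:  {m_base:.1f}")
print(f"Same group at follow-up:    {m_follow:.1f}")
# The follow-up mean moves back toward 50 purely because the group was
# selected on a noisy measure: regression to the mean, not real change.
```

In a one-group pre- / post-test with an extreme-scoring sample, this purely statistical shift is indistinguishable from a treatment effect, which is why it is a confound.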
Quasi-experiments: study interventions or programs where groups cannot be equivalent.
Design: Group → Observe1 → intervention or event → Observe2; contrast group → Observe1 → Observe2.
This is the same as a true experimental design, except for non-equivalent groups. Groups are not equivalent at baseline due to:
- Self-selection
- Non-random assignment
- Use of existing groups
- Participants not being blind
The intervention & assessments are typically controlled or designed by the researcher, e.g., research on psychotherapy or behavioral interventions, or health-related research. The baseline observation lets you measure change and compare the groups prior to the intervention.
I will ask you about these four forms of non-equivalence.
True v. quasi-experimental designs (take a good look at this slide)
True experiments:
- Emphasize internal validity; assess cause & effect (in a relatively artificial environment); test clear, a priori hypotheses.
- Participants are assigned to experimental v. control groups (random assignment or matching); participants & experimenter are blind to assignment.
- High control over study procedures: create / manipulate the independent variable; control procedures & measures.
Quasi-experiments:
- Emphasize external validity; describe "real" / naturally occurring events; may have clear or only exploratory hypotheses.
- Existing or non-equivalent groups: self-selection, non-random assignment, use of existing groups, participants not blind.
- Control often not possible: may not be able to manipulate the independent variable; only partial control of procedures & measures; a control group may not be possible.
Time series designs
Understand time series data (example shown in the lecture slides). Be able to identify when / where we may use this design, and understand its key elements.
Also understand: this design uses a Blocking Variable to examine more than one group before & after some event. Understand clearly what a blocking variable is.
Descriptive research
Simple but important: "descriptive" or survey research does not just count stuff; it can test hypotheses.
- Basic overview of a behavior: "who, what, where & when"…
- In-depth (qualitative) portrayal of the behavior.
- Generate hypotheses: use qualitative or quantitative descriptions to begin asking "why?" or "how?" a behavior occurs; develop hypotheses about how to change a behavior…
Forms of descriptive research: what are these approaches? What would be an example? When might you use them?
Quantitative: describe an issue via valid & reliable numerical measures.
- Simple: frequency counts of a key behavior; "blocking" by other variables.
- Correlational research: "what relates to what"; complex modeling.
Qualitative or observational: study behavior "in nature" (high ecological validity).
- Qualitative: in-depth interviews; focus (or other) groups; textual analysis (qualitative or quantitative).
- Observational: direct or unobtrusive.
Existing data: use existing data for new quantitative (or qualitative) analyses.
- Accretion: study "remnants" of behavior; wholly non-reactive.
- Archival: use existing data to test a new hypothesis; typically non-reactive.
Descriptive research: correlational designs
Testing hypotheses with simple correlations. Procedures:
- Careful selection of the sample to reflect the target population.
- Systematic development of measurements: reliability & validity.
Core virtues:
- A "natural" look at how variables relate.
- Less control = less reactivity than experimental designs.
- Can model very complex phenomena.
Descriptive research: correlational designs and causality
Understand both of these problems in inferring causality from a simple correlation. I will ask you about them!!
1. Causality: a simple correlation may confuse cause & effect. Which causes which? (Alcohol consumption ↔ depression?)
2. Confounds: the unmeasured "3rd variable" problem. Is something else causing both of our measured variables? (E.g., a genetic disposition may cause both depression and alcohol use.)
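The 3rd-variable problem can be demonstrated directly: if a shared disposition raises both depression and alcohol use, the two will correlate even though neither causes the other. A sketch with simulated, entirely hypothetical data:

```python
import random

random.seed(3)

def corr(x, y):
    """Pearson correlation, computed from scratch."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

# Hypothetical "3rd variable": a disposition that independently raises
# both depression and alcohol use. Neither causes the other here.
disposition = [random.gauss(0, 1) for _ in range(2000)]
depression  = [d + random.gauss(0, 1) for d in disposition]
alcohol_use = [d + random.gauss(0, 1) for d in disposition]

r = corr(depression, alcohol_use)
print(f"r(depression, alcohol use) = {r:.2f}")
# r is substantial even though, by construction, depression and alcohol
# use have no causal link: the shared disposition is the confound.
```

The correlation alone cannot distinguish this scenario from either causal direction, which is exactly the slide's point.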
Descriptive research: observational studies
Assess behavior directly, rather than via participants' self-reports or recall. Typical data collection is highly reactive: participants know they are being studied and react to that, often by making socially desirable responses (faking to look good). Observational methods are often less (or non-) reactive: directly observe the social & physical settings or environments of behavior.
- Direct observation: visual observation & note-taking or recording; e.g., sitting in on a classroom discussion, therapy session, etc.
- Unobtrusive observation: participants are unaware of data collection. Major advantage: eliminates reactive effects of data collection; e.g., a 1-way mirror in therapy research, a "stake-out" of a drug scene.
- Participant observation: become part of the social phenomenon in order to describe it; e.g., joining a political organization or cult, posing as a prostitute (c.f. Hunter S. Thompson, Hell's Angels; the NY Times "Down Low" article).
Descriptive research: reactive measurement
Participants react to the knowledge that they are being measured.
- Threatens external validity if all participants are giving skewed answers due to the measures.
- Threatens internal validity (is a confound) if participants in one group have more reactive measures than the other group.
Reactive bias increases with the clarity (face validity) of the measures and with face-to-face interview methods; it is often lessened with computer interviews.
Understand how something like reactive measurement can just add error (junk variance) to the data, or can represent a confound. Also know: error variance only = a threat to external validity; a confound = a threat to internal validity.
Descriptive research: existing data
Generally understand what these methods are and why we use them; look at the lecture notes for examples.
- Accretion: study remnants of behavior. Data are wholly unobtrusive / non-reactive, but indirect; they may only partially map onto the phenomenon.
- Archival: data collected for other purposes, often in highly reliable, large & rich data sets. They provide unbiased correlations, but must be adapted to the new purpose or hypothesis (and may not "map on" fully…).
Surveys
All this info is in the Survey focus module. These are the key concepts in surveys; be able to give an example of them or say why they are important.
- Core topic areas: knowledge; attitudes & beliefs; behavior.
- Uses of surveys: descriptive data; hypothesis testing; pragmatic (planning & evaluation).
- Survey formats: closed v. open-ended; face-to-face (interview, telephone); questionnaire (pencil & paper, internet).
- Issues in survey design: access to the target population; social desirability; time frame of the question; question order; bias & political uses of surveys; participant sophistication.
What do surveys measure? What might be some examples of these?
Knowledge:
- Information re: current events, political or consumer choices.
- Awareness of public health resources, health practices, etc.
Attitudes and beliefs:
- Preferences or evaluations: e.g., attitudes toward gays, ethnic groups, etc.; consumer preferences.
- Beliefs about political or social events: "which party provides the strongest security for the U.S.…?"
- Feelings or moods: quality of life, depression / anxiety, marital satisfaction, etc.
Behavior:
- Behavioral intentions: intent to vote, financial plans, etc.
- Self-reports of previous or on-going behavior: topics range from voting to alcohol and drug use.
Closed-ended items
Understand closed v. open-ended items, examples of each, and the advantages & disadvantages of each.
Chief virtue: clear operationalization.
- Very specific & concrete; you know exactly what the participant is responding to.
- Easy to quantify & use statistically.
- Can be tested for reliability.
Chief liability: potential insensitivity.
- Often brief and simply worded; potentially superficial.
- "Top down": issues or constructs are imposed on the participant.
Open-ended items
Virtues and drawbacks of qualitative or "open-ended" items? What might be an example?
General textual / qualitative response; more sensitive to the respondent:
- "How have you enjoyed your methods class so far?"
- "Please list the three things that first come to mind when you think of Psychology 242."
More difficult to interpret:
- Can be analyzed as qualitative data (see the discussion in Descriptive data).
- Can be quantified: frequency counts of citations or statements; "linkages" analysis of co-occurring statements.
- Often presented as a textual portrayal plus minor quantitative analysis.
Example of mixed survey formats
Example of mixed question formats from a survey of women's sexual practices. Be able to describe the main advantages & disadvantages of each item format.
- Closed-ended attitude scale: "Personal Safer Sex Guidelines: how strict are your personal guidelines or rules for safer sex (e.g., condom use, 'safe relationships,' etc.)?" Rated 1-7, anchored from "Not at all strict" through "Somewhat strict" and "Very strict" to "Extremely strict."
- Open-ended description: "What are your rules for safer sex?"
- Simple behavioral index: "Have you ever refused to have sex with someone in order to stay safe?" 0 = never, 1 = once or twice, 2 = a few times, 3 = many times.
Social Desirability Responding (understand the basic issues here)
Clear, face-valid items addressing embarrassing topics yield less valid responses:
- "How often are you dishonest with your friends?"
- "Have you ever cheated on an exam…?"
High social-desirability wording elicits inaccurate responses:
- "Do you support protecting our Nation's forests for future generations?" (Does "yes" make you an "environmentalist"?)
- "Do you feel there are ways your husband could be closer…?" (Does "yes" mean you are unhappy in your marriage?)
Populations differ in social desirability responding, a confound in analyses of population effects:
- Women report more suicidal thoughts, but may be more willing to disclose those thoughts, creating a possible confound (are women more suicidal, or more open…?).
Desirability can be minimized by: anonymous surveys; assurances of confidentiality; computer administration (no personal interaction); careful wording / pilot testing of items.
Surveys: summary (understand the highlighted terms)
Survey administration:
- The internet is increasingly important as a self-report method.
- Face-to-face interviews are more common in clinical research.
- Time frames & question order can influence responses.
Population access & sophistication:
- Some groups are difficult to reach; this creates a threat to external validity.
- The assumption that participants understand survey materials is often questionable.
Social desirability responding:
- Inhibited responding threatens internal validity; it may represent a confound if groups differ in desirability set.
Evaluating our measures: reliability and validity
Be familiar enough with the different variants on reliability and validity that you can define or give an example of them.
- Reliability: if we are assessing a stable characteristic (IQ, personality, temperament, core values…), a good measure will give about the same result each time we administer it, and for different sections of the measure.
- Validity: our survey or scale must actually measure what we designed it to. There are several ways we think about validity, each getting at a different element…
Reliability
- Test-retest: similar responses over time? Assumes a stable attribute, e.g., a "personality" disposition. If the measure is reliable, it should show similar scores across time, e.g., at baseline and after a year.
- Split-half: similar responses across item sets? Assumes redundant / converging items or scales. If the scale is reliable, each half should yield similar scores.
- Cronbach's alpha: overall internal reliability. Converging items should inter-correlate.
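Cronbach's alpha can be computed directly from the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of totals), where k is the number of items. A minimal sketch on simulated data (the 4-item "attitude" scale is hypothetical):

```python
import random

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))

# Hypothetical 4-item scale: each item = a shared "true" attitude plus
# item-specific noise, so the items inter-correlate strongly.
random.seed(11)
true_attitude = [random.gauss(0, 1) for _ in range(300)]
items = [[t + random.gauss(0, 0.7) for t in true_attitude] for _ in range(4)]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

If the items measured unrelated things (little shared "true" component), the item variances would dominate the total-score variance and alpha would fall toward zero, which is why alpha serves as an index of internal consistency.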
Descriptive research: validity (know / understand this)
- Face validity: the scale appears to measure what it is designed to; e.g., the interview item "How dependent are you on heroin?", or a simple skill index (assess computer skills by having someone write a program). Intuitively valid; clearly addresses the topic; may yield socially desirable responses.
- Content validity: assesses all key components of a topic or construct; e.g., the various components of complex political attitudes, or a mid-term that tests all core skills for research design.
- Predictive validity: validly predicts a hypothesized outcome; e.g., I.Q. is a moderately good predictor of college success, criminality, etc. A measure may be predictively valid without being face or content valid: the MMPI.
Descriptive research: validity (2)
Construct validity:
- Test whether the hypothetical construct itself is valid (differs from other constructs; corresponds to the measures or outcomes it should…). E.g., "anxiety," "depression," and "anger" may not be separate constructs, but may all be part of "negative affectivity."
- Test whether the measure addresses the construct it was designed for; e.g., measures of social support ("do you have people who care for you?") are often strongly influenced by depression, a separate construct…
"Ecological" validity:
- The measure corresponds to how the construct "works" in the real world; the external validity of an assessment device.