- Apply the criteria of causality to experimental design
- Define internal validity and external validity
- Identify threats to validity
As we discussed at the beginning of this chapter, experimental design is commonly understood and informally implemented in everyday life. We often say that we are conducting an experiment when we try a new restaurant or date a new person. As you’ve learned from the last two sections, you must rigorously apply the various components of experimental design for something to be a true experiment, or even a quasi- or pre-experiment. If you wanted trying a new restaurant to be a true experiment, you would need to recruit a large sample, randomly assign participants to control and experimental groups, administer a pretest and a posttest, and use clearly and objectively defined measures of restaurant satisfaction.
Social scientists use this level of rigor and control because they try to maximize the internal validity of their experiment. Internal validity is the confidence researchers have about whether their intervention produced variation in their dependent variable. Thus, experiments are attempts to establish causality between two variables—your treatment and its intended outcome. As we talked about in Chapter 7, nomothetic causal relationships must establish four criteria: covariation, plausibility, temporality, and nonspuriousness.
The logic and rigor of experimental designs allow causal relationships to be established. Experimenters can assess covariation on the dependent variable through pre- and posttests. The use of experimental and control conditions ensures that some people receive the intervention and others do not, providing variation in the independent variable. Moreover, since the researcher controls when the intervention is administered, they can be assured that changes in the independent variable (the treatment) happened before changes in the dependent variable (the outcome). In this way, experiments assure temporality. In our restaurant experiment, the assignment to experimental and control groups would show us that people varied in the restaurant they attended. The use of pre- and posttest measures would allow us to know whether their level of satisfaction changed, and our design would assure us that the changes in our diners’ satisfaction occurred after they left the restaurant.
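To make the pretest and posttest logic concrete, here is a minimal Python sketch of how covariation might be checked in the restaurant example. The satisfaction scores and the `mean_change` helper are hypothetical and invented purely for illustration. The comparison shown (average change in the experimental group minus average change in the control group) is only one simple way to summarize such data, not a full statistical analysis.

```python
from statistics import mean


def mean_change(pretest, posttest):
    """Average posttest-minus-pretest change for one group of participants."""
    return mean(post - pre for pre, post in zip(pretest, posttest))


# Hypothetical satisfaction scores on a 1-10 scale (made up for illustration).
experimental_pre = [5, 6, 4, 5, 6]
experimental_post = [8, 7, 7, 8, 9]
control_pre = [5, 5, 6, 4, 6]
control_post = [5, 6, 6, 5, 6]

experimental_change = mean_change(experimental_pre, experimental_post)
control_change = mean_change(control_pre, control_post)

# If the intervention covaries with the outcome, the experimental group
# should change more than the control group between pretest and posttest.
estimated_effect = experimental_change - control_change

print(f"Experimental group change: {experimental_change:.1f}")
print(f"Control group change: {control_change:.1f}")
print(f"Difference between groups: {estimated_effect:.1f}")
```

Because every pretest score is collected before the intervention and every posttest score after it, the same data layout also reflects temporality. A real study would follow this descriptive comparison with an appropriate significance test.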
Additionally, experimenters will have a plausible reason why their intervention would cause changes in the dependent variable. Either theory or previous empirical evidence should indicate the potential for a causal relationship. Perhaps we discover a national poll that found pizza, the type of food our experimental restaurant served, is the most popular food in America. Perhaps this restaurant has good reviews on Yelp or Google. This evidence would give us a plausible reason to establish our restaurant as causing satisfaction.
One of the most important features of experiments is that they allow researchers to eliminate spurious variables. True experiments are usually conducted under strictly controlled laboratory conditions. The intervention must be given to each person in the same way, with a minimal number of other variables that might cause their posttest scores to change. In our restaurant example, this level of control might prove difficult. We cannot control how many people are waiting for a table, whether participants saw someone famous there, or whether there is bad weather. Any of these factors might cause a diner to be less satisfied with their meal. These spurious variables may cause changes in satisfaction that have nothing to do with the restaurant itself, which is an important problem in real-world research. For this reason, experimenters use the laboratory environment to try to control as many aspects of the research process as possible. Researchers in large experiments often employ clinicians or other research staff to help them. Researchers train their staff members exhaustively, provide pre-scripted responses to common questions, and control the physical environment of the lab so each person who participates receives the exact same treatment.
Experimental researchers also document their procedures so others can review how well they controlled for spurious variables. My favorite example of this concept is Bruce Alexander’s Rat Park (1981) experiments because they spoke to my practice as a substance abuse and mental health social worker. Much of the early research on addictive drugs like heroin and cocaine was conducted on non-human animals, usually mice or rats. While this may seem strange, the systems of our mammalian relatives are similar enough to those of humans that causal inferences can be made from animal studies to human studies. It is certainly unethical to deliberately cause humans to become addicted to cocaine and measure them for weeks in a laboratory, but it is currently more ethically acceptable to do so with animals. There are specific ethical processes for animal research, similar to an IRB review.
Before Alexander’s experiments, the scientific consensus was that cocaine and heroin were so addictive that rats would repeatedly consume the drug until they perished. Researchers claimed that this behavior in rats explained how addiction worked in humans, but Alexander was not so sure. He knew rats were social animals and that the procedure from previous experiments did not allow them to socialize. Instead, rats were isolated in small cages with only food, water, and metal walls. To Alexander, social isolation was a spurious variable that was causing changes in addictive behavior that were not due to the drug itself. Alexander created an experiment of his own, in which rats were allowed to run freely in an interesting environment, socialize and mate with other rats, and, of course, drink from a solution that contained an addictive drug. In this environment, rats did not become hopelessly addicted to drugs. In fact, they had little interest in the substance.
The results of Alexander’s experiment demonstrated to him that social isolation was more of a causal factor for addiction than the drug itself. This makes intuitive sense to me. If I were in solitary confinement for most of my life, the escape of an addictive drug would seem more tempting than if I were in my natural environment with friends, family, and activities. One challenge with Alexander’s findings is that subsequent researchers have had mixed success replicating them (e.g., Petrie, 1996; Solinas, Thiriet, El Rawas, Lardeux, & Jaber, 2009). Replication involves conducting another researcher’s experiment in the same manner and seeing if it produces the same results. If the causal relationship is real, it should occur in all (or at least most) replications of the experiment.
One of the defining features of experiments is that researchers diligently report their procedures, which allows for easier replication. Recently, researchers at the Reproducibility Project have caused a significant controversy in social science fields like psychology (Open Science Collaboration, 2015). In one study, researchers attempted to reproduce the results of 100 studies published in major psychology journals in 2008. Their findings were shocking: Only 36% of the studies had reproducible results. Despite close coordination with the original researchers, the Reproducibility Project found that nearly two-thirds of the psychology studies it examined, all published in respected journals, were not reproducible. The implications of the Reproducibility Project are staggering, and social scientists are developing new ways to ensure researchers do not cherry-pick data or change their hypotheses simply to get published.
Returning to our discussion of Alexander’s Rat Park study: Consider the implications of his experiment for a substance abuse professional like me. The conclusions he drew from experimenting on rats were meant to generalize to the population of people with substance use disorders with whom I worked. Experiments seek to establish external validity, or the degree to which their conclusions generalize to larger populations and different situations. Alexander contends that his conclusions about addiction and social isolation help us understand why people living in deprived, isolated environments are more likely to become addicted to drugs when compared to people living in more enriching environments. Similarly, earlier rat researchers contended that their results showed these drugs to be instantly addictive, often to the point of death.
Neither study will match up perfectly with real life. In my practice, I met many individuals who may have fit into Alexander’s social isolation model, but social isolation is complex for humans. My clients lived in environments with other sociable humans, worked jobs, and had romantic relationships, so it may be difficult to consider them “isolated.” On the other hand, many faced structural racism, poverty, trauma, and other challenges that may contribute to social isolation. Alexander’s work helped me understand part of my clients’ experiences, but the explanation was incomplete. The real world was much more complicated than the experimental conditions in Rat Park, just as humans are more complex than rats.
Social workers are especially attentive to how social context shapes social life. We are likely to point out that experiments are rather artificial. How often do real-world social interactions occur in the lab? Experiments that are conducted in community settings may be less subject to artificiality, though their conditions are less easily controlled. This relationship demonstrates the tension between internal and external validity. The more tightly researchers control the environment to ensure internal validity, the less they can claim external validity, that is, that their results are applicable to different populations and circumstances. Correspondingly, researchers whose settings are just like the real world will be less able to ensure internal validity, as there are many factors that could pollute the research process. This is not to suggest that experimental research cannot have external validity, but experimental researchers must always be aware that external validity problems can occur and be forthcoming about this potential weakness when they report their findings.
Threats to validity
Internal validity and external validity are conceptually linked. Internal validity refers to the degree to which the intervention causes its intended outcomes, and external validity refers to how well that relationship applies to different groups and circumstances. There are a number of factors that may influence a study’s validity. You might consider these threats to all be spurious variables, as we discussed at the beginning of this section. Each threat proposes another factor that is changing the relationship between intervention and outcome. The threats introduce error and bias into the experiment.
Throughout this chapter, we reviewed the importance of experimental and control groups. These groups must be comparable for the experimental design to work. Comparable groups are groups that are similar across factors important for the study. Researchers can help establish comparable groups by using probability sampling, random assignment, or matching techniques. Control or comparison groups pose a counterfactual consideration: what would have happened to my experimental group had I not given them my intervention? Two very different groups would not allow you to answer that question. Intuitively, we know that no two people are the same, so groups are never perfectly comparable. What matters is that groups are comparable along the variables that are relevant to our research project.
If one of the groups in our restaurant example had numerous vegetarians or gluten-free individuals, their satisfaction with the restaurant might be influenced by their dietary needs. In that case, our groups would not be comparable. Researchers also account for these effects by measuring other variables like dietary preference, and by statistically controlling for their effects after the data are collected. We discussed control variables like these in Chapter 7. Similarly, if we were to pick people that we thought would “really like” our restaurant and assign them to the experimental group, we would be introducing selection bias into our sample. Experimenters use random assignment so that conscious and unconscious bias do not influence the group to which a participant is assigned.
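Random assignment can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the participant labels and the `randomly_assign` helper are hypothetical, and a simple shuffle-and-split is only one of several ways researchers carry out random assignment. The key point is that chance, rather than the researcher's judgment, decides which group each person joins.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size.

    The optional seed makes the assignment repeatable so the procedure
    can be documented and checked by others.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original roster is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {
        "experimental": shuffled[:midpoint],
        "control": shuffled[midpoint:],
    }


# Hypothetical roster of 20 recruited diners.
diners = [f"diner_{number:02d}" for number in range(1, 21)]
groups = randomly_assign(diners, seed=42)

print("Experimental group:", groups["experimental"])
print("Control group:", groups["control"])
```

Because the researcher never chooses who goes where, people who would "really like" the restaurant are as likely to land in the control group as in the experimental group. Recording the seed also supports the kind of procedural documentation that makes replication easier.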
Experimenters themselves are often the source of threats to validity. They may choose measures that do not accurately measure participants or implement the measure in a way that biases participant responses in one direction or another. The act of simply conducting an experiment may cause researchers to influence participants to perform differently. Experiments are different from participants’ normal routines, so the novelty of a research environment or experimental treatment may cause them to expect to feel differently, independent of the actual intervention. You have likely heard of the placebo effect, in which a participant feels better, despite having received no intervention at all.
Researchers may also introduce error by expecting participants in each group to behave differently. They may expect the experimental group to feel better, and they may give off conscious or unconscious cues to participants that influence their outcomes. Control groups will be expected to fare worse, and research staff could cue participants that they should feel worse than they otherwise would. For this reason, researchers often use double-blind designs, in which research staff who interact with participants are unaware of who is in the control group and who is in the experimental group. Proper training and supervision are also necessary to account for these and other threats to validity. If proper supervision is not applied, research staff administering the control group may try to equalize treatment or engage in a rivalry with research staff administering the experimental group (Engel & Schutt, 2016).
No matter how tightly the researcher controls the experiment, participants are humans and are therefore curious, problem-solving creatures. Participants who learn they are in the control group may react by trying to outperform the experimental group or by becoming demoralized. In either case, their outcomes in the study would be different had they been unaware of their group assignment. Participants in the experimental group may begin to behave differently or share insights from the intervention with individuals in the control group. Whether through social learning or conversation, participants in the control group may receive parts of the intervention of which they were supposed to be unaware. As a result, experimenters try to keep experimental and control groups as separate as possible. This is significantly easier inside a laboratory study, as the researchers control access and timing at the facility. This problem is more complicated in agency-based research. If your intervention is effective, then your experimental group participants may impact the control group by behaving differently and sharing the insights they’ve learned with their peers. Agency-based researchers may locate experimental and control conditions at separate offices with separate treatment staff to minimize the interaction between their participants.
- Experimental design provides researchers with the ability to best establish causality between their variables.
- Experiments provide strong internal validity but may have trouble achieving external validity.
- Experimental designs should be reproducible by future researchers.
- Threats to validity come from both experimenter and participant reactivity.
Comparable groups– groups that are similar across factors important for the study
Double-blind– when the researchers who interact with participants are unaware of who is in the control group or the experimental group
External validity– the degree to which experimental conclusions generalize to larger populations and different situations
Internal validity– the confidence researchers have about whether their intervention produced variation in their dependent variable
Placebo effect– when a participant feels better, despite having received no intervention at all
Replication– conducting another researcher’s experiment in the same manner and seeing if it produces the same results
Selection bias– when a researcher consciously or unconsciously influences assignment into experimental and control groups
- Alexander, B. (2010). Addiction: The view from rat park. Retrieved from: http://www.brucekalexander.com/articles-speeches/rat-park/148-addiction-the-view-from-rat-park
- Petrie, B. F. (1996). Environment is not the most important variable in determining oral morphine consumption in Wistar rats. Psychological Reports, 78(2), 391-400.
- Solinas, M., Thiriet, N., El Rawas, R., Lardeux, V., & Jaber, M. (2009). Environmental enrichment during early stages of life reduces the behavioral, neurochemical, and molecular effects of cocaine. Neuropsychopharmacology, 34(5), 1102.
- Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
- Engel, R. J. & Schutt, R. K. (2016). The practice of research in social work (4th ed.). Washington, DC: SAGE Publishing.