Beyond the Limits of Survey Experiments: How Conjoint Designs Advance Causal Inference in Political Communication Research

This paper calls attention to what is arguably the most notable advancement in survey experiments over the last decade: conjoint designs. The benefit of conjoint designs is their capacity to study and compare the causal effects of several dimensions simultaneously. Although survey experiments have long been a preferred method for assessing causal effects, the method falls short when studying multidimensional causal relations: researchers face a trade-off between statistical power and the number of experimental conditions. Conjoint designs solve this problem by letting the researcher vary a large number of factors in one experiment. This method is quickly gaining ground in social and political science but has yet to be widely practiced in political communication research. This article argues that conjoint designs are ideal for studying political communication effects and highlights the possible benefits of using and innovating conjoint designs in political communication research. We make sample scripts available and demonstrate the value of this methodological technique through empirical examples of trust in news media and selective exposure to political news.

A prime example is survey experiments (Sniderman, 2011), now a preferred method for testing causal effects (Arceneaux, 2010). Survey experiments elegantly combine the internal validity of experiments with the external validity of representative surveys. Standard survey experiments, however, can vary only a small number of factors.
This limitation is pertinent for political communication research, as the study of political communication is to a large degree a study of multidimensional causal relations. Understanding when and how messages affect the preferences and choices of audiences, voters, political actors, and government officials means navigating a jungle of conditioning and countervailing effects. With standard survey experiments, the only way to cover ground is with persistence and perseverance, testing one isolated factor at a time. The problem with this is the high fixed cost of conducting even one survey, let alone tens or hundreds.
Our aim is to call attention to an alternative approach to this problem: conjoint designs. Conjoint designs let the researcher vary a large number of factors in one experiment, so that researchers can include more factors and easily study multidimensional choices. The method solves a key problem researchers face when studying multidimensional preferences with survey experiments: the trade-off between statistical power and the desire to employ many experimental conditions.
Although conjoint designs have become increasingly popular in the social sciences in the past several years, the method is (with notable exceptions: Helfer, 2016; Mummolo, 2016) yet to be widely practiced in political communication research. This is surprising, as this methodological advancement can help answer foundational questions in political communication that hinge on the opportunity to study multidimensional causal relations. As the benefits for social science research and the general assumptions of causal inference with conjoint experiments are thoroughly and formally described elsewhere (Hainmueller, Hopkins, & Yamamoto, 2014), we focus on the application to political communication and demonstrate how the conjoint technique can be innovated and tailored to study phenomena that are specific to political communication research. We show that political communication requires different considerations than other fields, and that such considerations need to be addressed when designing conjoint analyses. We contribute empirical demonstrations of the method and offer sample scripts researchers can use to analyze and innovate conjoint designs in their own surveys. We proceed by detailing the technique, demonstrating some of its potential benefits for political communication research, and suggesting how the method can be applied in future political communication studies.

The Renaissance of Conjoint Design
Conjoint designs, also called vignette analysis or factorial surveys, were introduced in the 1970s in the fields of marketing research (Green & Rao, 1971) and sociology (Jasso & Rossi, 1977) but did not become popular in fields such as political science until recently. Owing to the meticulous and imaginative work of Hainmueller and his colleagues, conjoint designs are now experiencing a renaissance. These scholars established a solid methodological footing, including causal inference in conjoint analysis under the Neyman-Rubin model (Hainmueller et al., 2014; Neyman, 1923; Rubin, 1974), validating conjoint experiments by thoroughly comparing them with actual decisions made by voters in the real world (Hainmueller, Hangartner, & Yamamoto, 2015), and even systematically testing the effect of different design choices that are central to conjoint

The Added Benefits of the Conjoint Design
Compared to the traditional survey experiment, the conjoint design's strengths lie in its capacity to include more factors and to study multidimensional choices. As an example, consider a study that identifies how certain attributes of a newspaper affect its credibility. One highly important factor could be the newspaper's distribution mode, that is, whether people trust offline newspapers more than their online counterparts. We then have two issues to overcome. First, the effect of the distribution mode is ambiguous. If we, following this example, experimentally manipulate the distribution mode, we cannot know whether we have identified the effect of the newspaper format itself or whether this effect is masking the effects of other factors, such as the newspaper's age or its use of entertainment news. After all, newspapers with a traditional paper format were probably founded a long time ago, and online newspapers (at least in the Norwegian context) might be more oriented toward entertainment news than printed newspapers. Thus, an apparent distribution mode effect may mask that people trust a source because of its legacy and content rather than because of its format as such. To isolate the effect of distribution mode, we need to account for the other factors that might mask the actual effect. One approach is to include these other factors but hold them constant at one value (e.g., include only old newspapers with no entertainment news). This we could do with a conventional survey experiment. We would then know the effect of the distribution mode, but only for one particular case. Conjoint designs add the possibility of identifying the effect of the distribution mode more generally (i.e., averaged over all possible combinations of the related factors).
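The difference between an effect "for one particular case" and an effect averaged over all combinations can be made concrete with a toy calculation. The sketch below uses an entirely hypothetical choice-probability function (the levels and coefficients are invented for illustration, not estimates from any study) in which the format effect is weaker for entertainment-heavy outlets. It contrasts the effect of distribution mode at one fixed profile, as a conventional experiment holding the other factors constant would identify, with the effect averaged uniformly over all combinations of the other factors, which is the quantity a conjoint design targets.

```python
from itertools import product

# Hypothetical levels of the "other" factors (illustrative only).
AGES = ["founded 1900", "founded 1990", "founded 2015"]
ENTERTAINMENT = ["little", "some", "a lot"]

def trust_prob(mode: str, age: str, ent: str) -> float:
    """Invented probability that a newspaper profile is judged trustworthy."""
    p = 0.50
    p += 0.10 if mode == "offline and online" else 0.0
    p += {"founded 1900": 0.08, "founded 1990": 0.04, "founded 2015": 0.0}[age]
    p -= {"little": 0.0, "some": 0.05, "a lot": 0.12}[ent]
    # Hypothetical interaction: format matters less for entertainment-heavy outlets.
    if mode == "offline and online" and ent == "a lot":
        p -= 0.06
    return p

def mode_effect(age: str, ent: str) -> float:
    """Effect of distribution mode with age and entertainment held fixed."""
    return trust_prob("offline and online", age, ent) - trust_prob("online only", age, ent)

# Conventional survey experiment: other factors held constant at one value.
fixed = mode_effect("founded 1900", "little")

# Conjoint-style quantity: average over all equally likely combinations.
combos = list(product(AGES, ENTERTAINMENT))
averaged = sum(mode_effect(a, e) for a, e in combos) / len(combos)

print(f"effect at one fixed profile:       {fixed:.3f}")
print(f"effect averaged over combinations: {averaged:.3f}")
```

With an interaction present, the two numbers differ: the fixed-profile effect describes only old, non-entertainment newspapers, whereas the averaged effect summarizes the format effect across the whole (here uniform) distribution of the other attributes.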
Second, respondents' judgment of credibility, like most other judgments and decisions, is conceptually multidimensional. To evaluate a newspaper's credibility, respondents would ideally need information about other relevant factors as well. Even if we isolate the average effect of the distribution mode on credibility, we cannot know how important this factor is compared to other relevant factors, such as the newspaper's amount of entertainment news, party affiliation, and ethical violations. Whatever distribution mode effect we find may seem substantive in isolation but may in reality be negligible compared to other factors. Conjoint designs solve this issue by enabling the researcher to identify the effect of the distribution mode and many other factors at the same time. Thus, we can assess the effect of one factor and compare this effect to the effects of various other factors.

Empirical Examples of How Conjoint Experiments Can Be Applied in Political Communication Research
We have explained the added benefits of conjoint designs in general. In this section, we present two very different applications of conjoint analysis that are specific to political communication research. We do not go into detailed analysis of the results here but use these examples to walk the reader through the methodological choices that need to be made and how the results are analyzed. The code for replicating these exemplary analyses can be retrieved from the online supplemental material.

The first study is an example of the traditional conjoint design and illustrates how the technique detailed by Hainmueller and colleagues (2014) can be beneficial for studying political communication phenomena. The second study illustrates how the logic of conjoint design can be innovated, extended to, and tailored for studying phenomena specific to political communication.

Data
We collected the data for both examples through the Norwegian Citizen Panel (NCP), a probability-based online survey panel in Norway. We fielded the experiments in the eighth (March 6 to April 21, 2017) and ninth (May 11 to June 6, 2017) waves of the NCP. The respondents were invited through a postal recruitment of 25,000 Norwegian individuals, randomly sampled from Norway's National Registry, an official list of all residents of Norway (for details about response rates or other methodological matters, see Skjervheim and Høgestøl [2017a, 2017b]).

Example 1: The Traditional Choice-Based Conjoint Experiment
The first example illustrates the traditional choice-based conjoint design (Hainmueller et al., 2014). In these designs, respondents choose between two profiles. Each profile lists a range of attributes in a table, and the level of each attribute in each profile is randomly assigned. In this case, the table with the two profiles contains information about two news publications (see Figure 1 for a screenshot of the design), and the task is to choose which news source is more trustworthy.
Choosing the Attributes. When designing conjoint experiments, one must choose which attributes to include, and how many. Bansak and colleagues (2017) test how far researchers can push the number of attributes included in such profiles and show that treatment effects are robust to a large number of tasks and attributes.
In the present design, we include eight theoretically relevant attributes that we expect to affect people's trust in a news source. The full list of attributes and attribute levels is shown on the Y-axis in Figure 2. This design enables an analysis of the effects of a publication's distribution mode (Kiousis, 2001), reveals possible masking effects, and compares the distribution mode effects to the effects of other relevant attributes, such as the amount of entertainment news (Ladd, 2012) and the age of the publication.
The design is a 3 × 2 × 3 × 3 × 3 × 3 × 3 × 10 factorial design, equaling 14,580 possible profiles. This means that only a fraction of the possible profile combinations is ever observed. However, as Hainmueller and colleagues (2014) show, we do not need to observe all possible combinations to identify the average marginal effect of each component. These effects are identifiable under a set of assumptions that is likely to hold in a typical conjoint experiment: (a) that the respondent would make the same choice if presented with exactly the same profiles again, (b) that the ordering of profiles within a choice task does not affect the response, and (c) that the randomization of each attribute is either conditionally or completely independent of the other attributes (see Hainmueller et al., 2014, pp. 8-9, 13, 16).
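Assumption (c) is easiest to satisfy with completely independent randomization, in which every attribute level of every profile is drawn separately and uniformly. The sketch below shows that procedure for the level counts of the design above. The attribute names are placeholders (the mapping of level counts to the study's actual attributes is not reproduced here), and the sketch does not exclude implausible combinations; a restricted design would need to reject or redraw such profiles and adjust the analysis accordingly.

```python
import random

# Level counts follow the 3 x 2 x 3 x 3 x 3 x 3 x 3 x 10 design; the names are
# hypothetical placeholders, not the attribute labels used in the study.
LEVEL_COUNTS = {f"attribute_{i + 1}": k for i, k in enumerate([3, 2, 3, 3, 3, 3, 3, 10])}

def draw_profile(rng: random.Random) -> dict:
    """Draw one profile: each attribute level is sampled independently and uniformly."""
    return {name: rng.randrange(k) for name, k in LEVEL_COUNTS.items()}

def draw_task(rng: random.Random) -> tuple:
    """A choice task is a pair of independently drawn profiles."""
    return draw_profile(rng), draw_profile(rng)

rng = random.Random(2017)
tasks = [draw_task(rng) for _ in range(1955)]  # one task per respondent

# Sanity check: the ten levels of the last attribute should appear roughly equally often.
counts = [0] * LEVEL_COUNTS["attribute_8"]
for left, right in tasks:
    counts[left["attribute_8"]] += 1
    counts[right["attribute_8"]] += 1
print(counts)
```

Because every profile is drawn independently, the two profiles in a pair can share levels on some attributes; that is expected under this scheme and does not bias the component effects.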
The inclusion of several attributes can result in absurd or impossible combinations (e.g., a 20-year-old medical doctor with 30 years of work experience; Hainmueller et al., 2014, p. 9). We may keep these highly unlikely combinations, remove and replace them, or strive for a design that does not produce them. In the present design, we chose the latter option. If specific combinations are removed, certain measures must be taken in the analysis (see Hainmueller et al., 2014, p. 20).
Erik Knudsen and Mikael Poul Johannesson
Choosing the Procedure. To achieve sufficient statistical power, researchers should aim for a large number of observations. When designing conjoint experiments, researchers must typically choose between fielding a study with a large sample (e.g., a representative survey) and few choice tasks, or a study with a small sample (e.g., a lab experiment) and many choice tasks. Bansak and colleagues (2018) test how many choice tasks respondents can complete in a row before survey satisficing degrades response quality and show that treatment effects are robust to a large number of tasks. Thus, the choice of procedure is often guided by cost: in large time-sharing surveys with many respondents (e.g., TESS), it might be cheaper to run one choice task with many respondents, whereas in surveys where sample size is more expensive than survey space (e.g., Amazon Mechanical Turk), it might be cheaper to run many choice tasks with fewer respondents.
As we fielded this experiment in a large time-sharing survey, the respondents evaluated one comparison between a pair of news publications, as shown in Figure 1. The study puts a random sample of 1,955 participants in the NCP in the position of news consumers. We ask them to choose between two hypothetical online news publications.
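The trade-off between sample size and number of tasks can be explored with a back-of-the-envelope calculation. The sketch below approximates the standard error of the contrast between two levels of one attribute, assuming uniform randomization, two profiles per task, a choice probability near 0.5, and independent profile-level observations. That last assumption ignores within-respondent clustering, so the result is an optimistic approximation rather than a formal power analysis.

```python
import math

def approx_amce_se(respondents: int, tasks: int, levels: int, p: float = 0.5) -> float:
    """Rough standard error for the difference between two levels of one attribute.

    Assumes uniform randomization, two profiles per task, and independent
    profile-level observations (i.e., it ignores within-respondent clustering).
    """
    n_profiles = respondents * tasks * 2
    n_per_level = n_profiles / levels
    # Difference of two proportions with equal group sizes: p(1-p)/n1 + p(1-p)/n2.
    return math.sqrt(2 * p * (1 - p) / n_per_level)

# Same total number of profiles, different mixes of respondents and tasks.
for respondents, tasks in [(2000, 1), (400, 5), (200, 10)]:
    se = approx_amce_se(respondents, tasks, levels=3)
    print(f"{respondents:>5} respondents x {tasks:>2} tasks: "
          f"SE ~ {se:.3f}, 95% CI half-width ~ {1.96 * se:.3f}")
```

Under this simplification, designs with the same total number of profiles give the same precision; in practice, clustering within respondents and declining response quality over many tasks make the single-task, large-sample design somewhat more efficient per profile, which is why cost per respondent versus cost per task drives the choice.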
We show respondents a screen with profiles of the two news publications (see Figure 1) with the following introduction: "We are interested in examining what makes people trust different sources of news. Below, we have created two hypothetical news sources. Please read the descriptions of both sources carefully and answer the question below." We then ask respondents: "Which of these two do you think would be the most reliable source to report the news in a fully accurate and fair manner?"
Analyzing the Data. Analyzing a typical conjoint design is straightforward. Following Hainmueller and colleagues (2014), we wish to estimate the average marginal component effects (AMCEs): the marginal effect of one attribute averaged over the joint distribution of the other attributes. For example, the AMCE of readership (few versus many readers) represents the average effect of readership on the probability that a news publication will be chosen as the more reliable one, that is, the effect of readership averaged across all possible combinations of the remaining attributes, weighted by the probability of each combination (in this case, all combinations are equally likely). Each attribute level is compared with a reference level of the same attribute, which the researcher chooses. Under assumptions (a), (b), and (c), the AMCEs can conveniently be estimated without bias with a linear regression model in which each individual profile is an observation and the dependent variable (whether the profile was selected) is regressed on indicators for all levels of each attribute except the reference level (Hainmueller et al., 2014, pp. 14-15). Because respondents are given two profiles in each task, and often perform several choice tasks, the observations are not independent; to obtain unbiased estimates of the variance, the standard errors must be clustered by respondent (e.g., using the "cluster" option in Stata; see Hainmueller et al., 2014, pp. 16-17). Available statistical software, such as the cjoint package in R (Strezhnev, Berwick, Hainmueller, Hopkins, & Yamamoto, 2017) or Stata, makes estimating and plotting the AMCEs straightforward, as displayed in Figure 2. In Figure 2, dots indicate point estimates, bars illustrate 95% confidence intervals, and dots without bars are reference categories. Here we see that an "offline and online newspaper" is more trustworthy than an "online newspaper." Because this effect is averaged over the other attributes, it demonstrates that the distribution mode effect holds in its own right rather than masking the effects of age and entertainment news.
We can also compare this effect to the effects of the other attributes and observe that the effect of "online newspapers" is statistically indistinguishable from the effect of primarily focusing on entertainment news.
Although previous studies isolated the effects of factors such as use of advertising and comment fields on people's trust evaluations, this example illustrates that conjoint experiments can provide insights into the relative effects of such factors and reveal the explanatory strength of different hypotheses identified by previous research. This means that we can study whether one or more factors are more important than others and overcome issues of masking effects.
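The estimation procedure described in this example (a profile-level linear probability model with reference categories dropped and respondent-clustered standard errors) can be sketched without any conjoint-specific package. The example below is a minimal sketch on simulated data, assuming two hypothetical attributes (a binary distribution mode and a three-level entertainment attribute) and invented utility weights; it is neither the study's data nor the cjoint implementation. The cluster-robust variance is the basic CR0 sandwich estimator without small-sample correction.

```python
import numpy as np

rng = np.random.default_rng(42)
n_resp = 2000  # one choice task with two profiles per respondent

# Hypothetical attributes, independently randomized per profile:
# mode: 0 = "online only" (reference), 1 = "offline and online"
# entertainment: 0 = "little" (reference), 1 = "some", 2 = "a lot"
mode = rng.integers(0, 2, size=(n_resp, 2))
ent = rng.integers(0, 3, size=(n_resp, 2))

# Invented utilities, used only to simulate forced choices (Gumbel noise gives logit choice).
utility = 0.8 * mode - np.array([0.0, 0.3, 0.9])[ent] + rng.gumbel(size=(n_resp, 2))
chosen = (utility == utility.max(axis=1, keepdims=True)).astype(float)

# Profile-level data: one row per profile, respondent id kept for clustering.
y = chosen.ravel()
resp_id = np.repeat(np.arange(n_resp), 2)
m, e = mode.ravel(), ent.ravel()
X = np.column_stack([np.ones_like(y), m == 1, e == 1, e == 2]).astype(float)
names = ["intercept", "mode: offline and online", "entertainment: some", "entertainment: a lot"]

# OLS point estimates; the non-intercept coefficients are the AMCEs.
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Cluster-robust (CR0) sandwich variance: sum score contributions within each respondent.
scores = np.zeros((n_resp, X.shape[1]))
np.add.at(scores, resp_id, X * resid[:, None])
vcov = XtX_inv @ (scores.T @ scores) @ XtX_inv
se = np.sqrt(np.diag(vcov))

for name, b, s in zip(names, beta, se):
    print(f"{name:<26} {b:+.3f}  95% CI [{b - 1.96 * s:+.3f}, {b + 1.96 * s:+.3f}]")
```

Because exactly one profile per task is chosen, the two outcomes within a task are perfectly negatively correlated, which is one concrete reason the clustering correction matters. In applied work, an established implementation such as the cjoint package mentioned above is preferable to hand-rolled code.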

Example 2: The Automated Sentence Generator in a Choice-Based Design
The second example uses a conjoint experimental design to study selective exposure: citizens preferring to encounter information that is consistent, rather than at odds, with their existing political attitudes (Knobloch-Westerwick, Mothes, & Polavin, 2017; Mummolo, 2016).
In experimental approaches to selective exposure, respondents are often required to make a choice and select one or more news stories over others. However, selective exposure in the real world involves multidimensional choices with many factors, such as the partisan reputation of the source (source cues; Mummolo, 2016), the pro or con message frame of an issue (message cues; Knobloch-Westerwick et al., 2017), the valence of the headline (negativity bias; Knobloch-Westerwick et al., 2017), and the political actors (e.g., a political candidate) mentioned in the headline (party cues; Iyengar, Hahn, Krosnick, & Walker, 2008). Although researchers have used the traditional choice-based conjoint experiment to examine selective exposure (Mummolo, 2016), they have yet to study all of these cues simultaneously. To account for all these factors simultaneously, we introduce a new conjoint experiment template that is tailored for political communication research.
Research Design. In contrast to the profile tables shown in the first example, the stimuli in this example are presented as a list of headlines. Instead of randomly assigning attributes for profiles in a table, we randomly assign attributes for profiles in a headline. Using the logic of conjoint design, one can randomly vary a variety of information in a headline and subsequently analyze the relative importance of each component.
We use this approach to produce a script that constructs 756 headlines varying on four attributes. Each headline has a partisan actor who signals a preference about a topic and is mentioned with a neutral, negative, or positive valence. We randomize the party of the actor (nine parties), the message topic (seven topics), the message direction (two preferences), and the valence of the mentioned actor (three valence categories). Each of the seven message topics has two unique "recipes" specifying where and how the remaining information is inserted, so the 9 × 7 × 2 × 3 = 378 attribute combinations, each rendered with either of its topic's two recipes, yield 756 headlines (the attributes and attribute levels, except valence, are shown on the Y-axis in Figure 4). For instance, one of the two recipes for headlines about privatization of public services looks like this:
[party] politician receives [valence] for a new proposal: wants to [preference] the Norwegian Railroad Service
The following is an example of the resulting headlines:
Labor Party politician receives criticism for a new proposal: wants to privatize the Norwegian Railroad Service
Procedure. We asked 2,071 respondents in the NCP to read a selection of four randomly generated news headlines closely and decide which two headlines they would most likely choose to spend their time on, as displayed in Figure 4.
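The recipe logic of the automated sentence generator can be sketched as template filling over the full factorial of attribute levels. The sketch below is illustrative only: the nine party labels, the three valence words, the second railroad recipe, and the six remaining topics are hypothetical placeholders (only the first railroad recipe mirrors the example in the text), and drawing the four headlines of a task without replacement from the whole pool is an assumption about the sampling, not a description of the original script.

```python
import itertools
import random

# Illustrative party labels and valence words; not the study's exact wording.
PARTIES = ["Labor Party", "Conservative Party", "Progress Party", "Centre Party",
           "Christian Democratic Party", "Liberal Party", "Socialist Left Party",
           "Green Party", "Red Party"]
VALENCES = ["criticism", "praise", "attention"]  # negative, positive, neutral (hypothetical)

# Each topic has two opposing preferences and two recipes (templates with slots).
TOPICS = {
    "railroad": {
        "preferences": ["privatize", "keep public ownership of"],
        "recipes": [
            "{party} politician receives {valence} for a new proposal: wants to {preference} the Norwegian Railroad Service",
            "{party} politician wants to {preference} the Norwegian Railroad Service and receives {valence}",
        ],
    },
}
for i in range(2, 8):  # six hypothetical placeholder topics
    TOPICS[f"topic_{i}"] = {
        "preferences": [f"introduce policy {i}", f"abolish policy {i}"],
        "recipes": [
            "{party} politician receives {valence} for a new proposal: wants to {preference}",
            "{party} politician wants to {preference} and receives {valence}",
        ],
    }

def generate_headlines():
    """Fill every recipe with every party x preference x valence combination."""
    headlines = []
    for topic, spec in TOPICS.items():
        for recipe_id, recipe in enumerate(spec["recipes"]):
            for party, pref_id, valence in itertools.product(PARTIES, (0, 1), VALENCES):
                headlines.append({
                    "text": recipe.format(party=party, valence=valence,
                                          preference=spec["preferences"][pref_id]),
                    "party": party, "topic": topic, "recipe": recipe_id,
                    "preference": pref_id, "valence": valence,
                })
    return headlines

HEADLINES = generate_headlines()
print(len(HEADLINES))  # 7 topics x 2 recipes x 9 parties x 2 preferences x 3 valences = 756

# One choice task: four headlines drawn at random, without replacement, from the pool.
rng = random.Random(2017)
task = rng.sample(HEADLINES, 4)
for headline in task:
    print(headline["text"])
```

Keeping the attribute levels alongside each rendered text is what makes the later analysis possible: a respondent's selections can be regressed on the party, topic, preference, and valence of each displayed headline, exactly as with profile tables.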
The headlines were introduced with the following vignette: "We wish to study people's news habits. Below you will find some hypothetical headlines, which we have constructed, similar to those you may find in Norwegian online newspapers. Please read all of the headlines carefully and imagine that the headlines are real." We followed this with, "You would perhaps not read any of these articles on a normal day, but let's say that you had to read two of these articles. Which articles would you prefer to spend your time on?"
Analysis. The data include 8,284 observations of selection decisions (2,071 respondents × 4 headlines). Because we force respondents to make a choice, we have information about which headlines, and thus which attribute levels, respondents selected and which they did not. As with the first example, the analysis of the headline selections is straightforward. Given the assumptions mentioned earlier, we can estimate the AMCEs of the components in the headlines.
In the analysis of these headline selections, we focus on two so-called cues that can guide people's headline selection: message cues (i.e., people's preference for political messages in line with their attitudes) and party cues (i.e., people's preference for news stories that feature a party or candidate they prefer). Figure 3a displays the AMCEs of all the headline attributes on the probability of selecting a headline for all respondents. However, these results teach us little about selective exposure until the attributes are matched with the respondents' attitudes and political preferences. To match the headlines' message cues with prior attitudes, we used seven statements that correspond to the statements in the headlines, measured on a scale from 1 (strongly disagree) to 7 (strongly agree). For instance: "How much do you agree or disagree with the following statements?": "Commercial private schools should be allowed." Each headline's message was then coded as "attitude consistent" or "attitude inconsistent" relative to the respondent's answer. The party cues were matched with respondents' evaluations of each party, measured by asking respondents, "We would like to ask you to consider how much you like or dislike the various political parties in Norway" on a scale from 1 (intensely dislike) to 7 (intensely like). These evaluations were then matched with the party attribute in each headline and coded as "likes party" or "dislikes party." Figure 3b shows the conditional AMCEs when the attributes of the headlines are matched with the attitudes of the respondents. Party cues yield a clear effect, whereas message cues do not yield a statistically significant effect, suggesting that the effects of party cues are stronger than those of message cues. The fact that the matched AMCEs (Figure 3b) are smaller than the AMCEs for message topics (Figure 3a) supports Mummolo's (2016) argument about the importance of topic relevance.
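The matching step described above can be sketched as recoding each respondent-headline pair and then estimating a difference in selection probabilities with respondent-clustered standard errors. The sketch below is a minimal illustration on simulated data. Its coding rule is a hypothetical choice, not the study's: ratings above the scale midpoint of 4 count as agreeing (or liking), ratings below it as disagreeing (or disliking), and the midpoint is left uncoded. The same `code_match` helper would serve for message cues by passing the headline's direction (+1 if the proposal agrees with the rated statement, -1 otherwise).

```python
import numpy as np

def code_match(rating, direction):
    """Code 1-7 ratings against headline cues: 1 = match, 0 = mismatch, NaN = midpoint.

    direction is +1 when the headline cue points the same way as the rated statement
    (for party cues, always +1) and -1 otherwise. Leaving the midpoint 4 uncoded is
    a hypothetical coding choice.
    """
    stance = np.sign(np.asarray(rating, dtype=float) - 4)  # +1 agree/like, -1 disagree/dislike, 0 neutral
    coded = (stance == np.asarray(direction, dtype=float)).astype(float)
    coded[stance == 0] = np.nan
    return coded

def diff_in_means_clustered(y, treat, cluster):
    """Selection-probability difference (treat=1 minus treat=0) via OLS with CR0 clustered SE."""
    keep = ~np.isnan(treat)
    y, treat, cluster = y[keep], treat[keep], cluster[keep]
    X = np.column_stack([np.ones_like(y), treat])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    _, groups = np.unique(cluster, return_inverse=True)
    scores = np.zeros((groups.max() + 1, 2))
    np.add.at(scores, groups, X * resid[:, None])
    vcov = XtX_inv @ (scores.T @ scores) @ XtX_inv
    return beta[1], np.sqrt(vcov[1, 1])

# Simulated illustration: four headlines per respondent, two selected.
rng = np.random.default_rng(7)
n_resp, k = 2071, 4
party_rating = rng.integers(1, 8, size=(n_resp, k))  # rating of the party in each headline
likes = code_match(party_rating, np.ones((n_resp, k)))
utility = 0.7 * np.nan_to_num(likes) + rng.gumbel(size=(n_resp, k))  # invented effect size
top_two = np.argsort(-utility, axis=1)[:, :2]
selected = np.zeros((n_resp, k))
np.put_along_axis(selected, top_two, 1.0, axis=1)

est, se = diff_in_means_clustered(selected.ravel(), likes.ravel(),
                                  np.repeat(np.arange(n_resp), k))
print(f"likes vs. dislikes party: {est:+.3f} (SE {se:.3f})")
```

Estimating the matched contrast this way yields a conditional AMCE: the effect of a headline featuring a liked rather than a disliked party, averaged over the other randomized headline attributes.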
This example highlights the need and opportunity for modifications of conjoint designs to study issues that are specific to political communication research. Previous studies in social science optimized the conjoint technique to study people's political preferences (Hainmueller et al., 2014); however, this automated sentence generator displays headlines that are closer to what readers actually encounter in their day-to-day media exposure and is easier to comprehend than the traditional choice-based design when the objective is to compare people's selection of news headlines. Although conjoint experiments are often limited to a choice between two profiles, this approach can also more easily include three or more profiles (i.e., headlines) in a choice task.
This selective exposure design illustrates how political communication requires different considerations than other fields, and that such considerations should be addressed when applying the method. For instance, Knobloch-Westerwick and colleagues (2017) show through a lab experiment why we should study the effects of different subtypes of selective exposure (i.e., confirmation bias, in-group bias, and negativity bias) simultaneously. They cannot separate the effects of each subtype because they do not use a conjoint experiment. Their argument would be strengthened by a tailored conjoint design that could enable a comparison of the effect of each cue. Crucially, our headline template indicates that party cues have a larger effect than message cues on people's propensity to engage in selective exposure.
We also argue that this example illustrates that political communication research is an ideal field for further innovating applications of the method. In the following section, we suggest a research agenda for how conjoint analysis can improve political communication research and how political communication research can improve the method.
There are several research areas where conjoint experiments can further our understanding of multidimensional political communication effects. As we illustrated with two empirical examples, this method can be used to study whether one attribute is noticeably stronger than another and to resolve issues of possible masking effects in causal inference. For that reason, conjoint experiments can help clarify ongoing debates in the political communication literature. In addition, we demonstrated that conjoint designs can be tailored and innovated to address issues that are specific to political communication, such as selective exposure. We suggest three possible future applications of the method.
First, as illustrated with the first example, traditional conjoint designs can improve causal inference in research on how a range of characteristics of an object, such as a politician or a news source, affects people's probability of trusting, selecting, or using it. For instance, a study of how politicians' characteristics, such as the way they communicate, shape people's trust in politicians could randomly vary communication styles or rhetorical techniques between two hypothetical politicians and ask respondents to compare them in terms of whom they trust more. Future research should seek to use conjoint experiments in such instances.
Second, as illustrated by the second example, researchers have the opportunity to innovate conjoint designs tailored to political communication research. Scholars interested in the effects of different attributes of sentences or headlines can use the logic of conjoint experiments to learn how different parts of a sentence affect people's choices or attitudes. For instance, one could employ a similar approach to study the effects of a range of variations in message framing, such as decomposing the multidimensional components of a frame to test whether framing operates as an information effect rather than an emphasis effect (Leeper & Slothuus, 2017).
Third, although we have not illustrated it here, conjoint designs are well suited to studying mediation effects and to investigating whether the effects of the attributes in a conjoint design are conditional on specific other attributes, and whether the results are conditional on which attributes are included in the conjoint (e.g., Dafoe, Zhang, & Caughey, in press). Acharya, Blackwell, and Sen (2016) detail an approach for dealing with mediation in conjoint experiments that tests how randomly including or excluding specific attributes changes the effects of the other attributes. For instance, we could test whether selective exposure effects in social media environments are contingent on the attributes of the person who shares a story with you.
Political communication scholars also have the opportunity to engage in methodological discussions and extend our knowledge of the limitations and external validity of the method. Hainmueller and colleagues (2015) validated conjoint designs in one particular case, but we have yet to learn what the results from conjoint designs on political communication truly teach us about phenomena in the real world (e.g., Barabas & Jerit, 2010).

Concluding Remarks
This article has highlighted how conjoint experiments can be used as a fruitful addition to political communication scholars' arsenal of research approaches. The traditional survey experiment has well-known restrictions regarding the number of factors we can study at any one time. Conjoint analysis addresses these issues by separately identifying several component-specific causal effects.
We believe that conjoint experiments can be employed considerably more often in political communication research than they have been so far. In studies where researchers aim to study multidimensional causal relations and pit two or more hypotheses against one another, or where answers to scholarly debates hinge on the opportunity to overcome the survey experiment's constraints on the number of experimental conditions, the conjoint experiment is a superior choice. Political communication scholars also have an opportunity to continue to innovate, enhance, and tune the conjoint design to better understand how political communication shapes modern political reality.