Idea representation and elaboration in design inspiration and fixation experiments

Abstract
Design fixation experiments often report that participants exposed to an example solution generate fewer ideas than those who were not. This reduced ‘idea fluency’ is generally explained as participants’ creativity being constrained by the example they have seen. However, the inclusion of an example also introduces other factors that might affect idea fluency in the experiments. Here we offer an additional explanation for these results: participants not exposed to the example tend to generate ideas with little elaboration, while the level of detail in the example encourages a similar level of elaboration among stimulated participants. Because idea elaboration is time-consuming, non-stimulated participants record more ideas overall. We investigated this hypothesis by reanalyzing data from three different studies; in two of them we found that non-stimulated participants generated more ideas and more ideas containing only text, whilst stimulated participants generated more elaborated ideas. Based on the creativity literature, we provide several explanations for the differences in results found across studies. Our findings and explanations have implications for the interpretation of creativity experiments reported to date and for the design of future studies.


Introduction
For over twenty-five years, fixation has been a crucial research topic for those interested in design creativity and innovation. In design, fixation is known to limit the creative output of individuals and to reduce the chances of developing innovative products. Design fixation was first observed experimentally when students were asked to generate ideas in response to a design problem while being exposed to an example solution. The exposure to the example was found to constrain the students' idea generation and to block the successful completion of the problem (Jansson & Smith, 1991). Since then, design fixation has been the subject of many studies and has been characterized in many different ways.
Fixation studies typically require participants to generate multiple solutions to a given problem in a controlled environment and over a short time period. The participants are divided into experimental groups: those in a stimulated group are exposed to external stimuli, while those in a baseline group are not. The inspiration effects in these groups are then compared through metrics similar to those used in creativity studies, such as the quantity, quality, novelty, and variety of ideas (Shah, Smith, & Vargas-Hernandez, 2003), although several other metrics exist (Moreno, Blessing, Yang, Hernández, & Wood, 2016). The number of ideas generated (i.e., idea fluency) is the most used metric across studies, and also the one with the least variation in how it is operationalized: a simple count of ideas (although more elaborate guidelines exist, e.g., Linsey et al., 2011). Whereas exposure to external stimuli can benefit idea generation (Dugosh & Paulus, 2005; Gonçalves, Cardoso, & Badke-Schaub, 2013; Liikkanen & Perttula, 2008; Nijstad, Stroebe, & Lodewijkx, 2002), some studies have found that examples might reduce idea fluency (Linsey et al., 2010; Sio, Kotovsky, & Cagan, 2015; Vasconcelos, Chen, Taysom, & Crilly, 2017; Viswanathan & Linsey, 2012), a result that is generally attributed to design fixation. Whilst fixation effects may explain a decrease in the number of ideas generated by stimulated groups, other factors may also contribute to this reduction in fluency, and they might be as important as fixation in explaining it, or even more so. In fixation experiments, one way to introduce the stimuli is to present them as examples of how participants should or could present their ideas.
Whilst stimulated groups are offered these examples, baseline groups might lack any reference for how they should represent and elaborate their ideas (apart from being asked to generate annotated sketches, it is not clear from the literature how much instruction participants are given regarding the level of elaboration required of them). Subsequently, the number of ideas generated is counted without much attention to the way in which ideas are represented (i.e., using only text, only sketches, or both) or to how well elaborated those representations are. However, describing ideas at a conceptual level using only a few words is less time-consuming than generating more detailed representations that combine sketches and written explanations. This raises the possibility that the increased fluency previously observed in non-stimulated participants reflects their use of less elaborated representations to convey ideas than those used by stimulated participants. Such differences could concern both the format of the representation (e.g., sketch or text) and its level of detail (e.g., number of drawings or words).
To investigate this hypothesis, we reanalyzed existing data from three design inspiration and fixation experiments (Vasconcelos, Cardoso, Sääksjärvi, Chen, & Crilly, 2017). We assessed the quantity of ideas that participants generated, as well as both the format and the elaboration (or level of detail) with which those ideas were represented. Were these assessments to support our hypothesis, the findings from earlier fixation experiments would need to be reconsidered, and future experiments would need to be designed and executed differently. Such support would also strengthen the claim that there is a 'normative representation effect' (Vasconcelos, Neroni, & Crilly, 2016), in which participants exposed to an example solution tend to conform to the representation format of the example.

Metrics applied
Design fixation can be assessed in different ways. In Jansson and Smith's original study (1991), fixation was identified according to the number of ideas generated, the repetition of features from the example solution in these ideas, the number of categories of ideas (i.e., flexibility), and the originality of these ideas. Other metrics were later incorporated into the fixation framework, depending on the hypotheses and manipulations of the experiments. These include the ease of use of the solution, its perceived value and overall quality, whether the solution met the requirements in the brief, the number of analogies drawn, the participants' resistance to change, and others. These metrics can be relatively objective (e.g., the number of different categories of solutions) or more subjective (e.g., the perceived value of the solutions). The more objective metrics tend to be counted directly, whilst the more subjective ones tend to require more judgment on the part of the assessors. Fixation metrics can also be oriented more towards the design process (e.g., explicit references to previous ideas during a verbal protocol) or towards the design output (e.g., rarity scores associated with the final ideas) (Vasconcelos, Cardoso et al., 2017). Other categorizations exist, such as the division of fixation metrics into those that are direct or explicitly measured (e.g., the repetition of features from an example) and those that are indirect or estimated through other indicators (e.g., the participants' perception of their own fixation) (Moreno et al., 2016). Beyond these categorizations, any given metric may also be calculated and interpreted differently from one study to another.
Of all the metrics used in output-oriented creativity experiments, idea fluency is the most common, and it is a well-established measure of creative design (Shah et al., 2003). Whilst previous research suggests that producing more ideas can increase the chances of generating better ideas (Kudrowitz, Te, & Wallace, 2012), there is also evidence that high fluency does not necessarily correlate with enhanced quality (Reinig & Briggs, 2008), or even that producing fewer solutions is the key to achieving better solutions (Heylighen, Deisz, & Verstijnen, 2007). In design fixation experiments, fluency is calculated in one of two ways: the total number of ideas or the total number of non-redundant ideas. The second approach distinguishes each participant's ideas not only from the example solution but also from that participant's other ideas. It is calculated by counting only the unique ideas represented by the participants, thus removing any ideas that the experimenters consider to have been repeated. Although the fluency metric is widely used to diagnose design fixation and can be measured fairly objectively, it may still be affected by extraneous factors that are not accounted for in the experimental manipulation.
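To make the distinction between the two fluency counts concrete, the following Python sketch computes both for a single participant's idea sheet. It assumes, hypothetically, that assessors have already coded each idea with a concept label (the `Idea` class and the label strings are illustrative, not part of any study's materials), and that an idea counts as redundant if it repeats the example's concept or a concept the participant has already produced. In practice, these judgments are made by the experimenters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    # Hypothetical concept label assigned by the assessors,
    # e.g. "telescoping-frame" or "rental-scheme".
    concept: str


def fluency(ideas: list[Idea], example_concept: str) -> tuple[int, int]:
    """Return (total, non_redundant) idea counts for one participant.

    An idea is treated as redundant if it repeats the example's concept or
    a concept that the same participant has already produced.
    """
    total = len(ideas)
    seen: set[str] = set()
    non_redundant = 0
    for idea in ideas:
        if idea.concept == example_concept or idea.concept in seen:
            continue  # skipped: judged a repeat
        seen.add(idea.concept)
        non_redundant += 1
    return total, non_redundant


# Hypothetical idea sheet for one participant.
sheet = [
    Idea("modular-frame"),      # repeats the example's concept
    Idea("telescoping-frame"),
    Idea("telescoping-frame"),  # repeats the participant's own idea
    Idea("rental-scheme"),
]
print(fluency(sheet, example_concept="modular-frame"))  # (4, 2)
```

The same sheet yields a total fluency of 4 but a non-redundant fluency of 2, which illustrates how the two operationalizations can lead to different conclusions about the same participant.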

Extraneous factors identified
Whereas the properties of the external stimuli used may be responsible for many of the effects observed in design fixation studies, several other factors can influence participants' performance. These factors are, at least partially, independent of the participants' ability to generate multiple ideas while exposed to external stimuli. For instance, both the quality and quantity of ideas that participants generate can be affected by their level of interest in or commitment to the task (Youmans & Arciszewski, 2014). The quantity of ideas generated might also be influenced by the number of requirements contained in a design problem statement (Kumar & Mocko, 2016), and the content of these ideas may depend on how participants interpret the instructions given to them.
Moreover, diminished fluency can be further explained by external stimuli containing solutions that are naturally very likely to appear during idea generation, irrespective of exposure (Perttula & Liikkanen, 2006). Participants' performance is also likely to be influenced by their level of knowledge and expertise in the problem domain (Viswanathan & Linsey, 2012), and even by their ability to communicate effectively (Wang & Kudrowitz, 2016). Furthermore, recent studies suggest that some psychological traits, such as individual differences in attention (Edl, Benedek, Papousek, Weiss, & Fink, 2014) and memory-related processes (Bellows, Higgins, & Youmans, 2013), can also affect participants' ability to generate multiple creative ideas. These are only a few of the many variables known to interact with the manipulation of external stimuli in design fixation studies. If our assumption is correct, and stimulated and non-stimulated participants tend to elaborate their ideas differently, this will interfere with creativity results, in terms of both the quantity and the quality of the ideas that participants generate, ultimately inviting reinterpretation of some prior work on design inspiration and fixation.

Research methodology
Our reanalysis of existing data is based on three separate studies, in which different but comparable experimental setups were adopted. The studies and results are discussed as follows: the experimental setup of Study 1 is described in full, whereas Studies 2 and 3 are described more briefly, highlighting only the main differences from the previous study. Whilst participants, demographics, procedures, and materials varied across the three experiments, the analysis reported here is fairly consistent and tests the same research hypothesis: non-stimulated participants tend to represent more ideas containing only text, and with less elaboration, than participants in stimulated groups. In doing so, we also investigate whether the normative representation effect (the tendency of participants exposed to an example solution to conform to the representation format of the example) can be observed in other design contexts.

Participants
Fifty-five first-year undergraduate engineering students at the University of Cambridge participated in the study. Participants voluntarily took part in this short design exercise during their first lecture, and the analyzed data were later used to introduce them to the concept of design fixation. No demographic data were collected from the participants, but as first-year undergraduate students they were broadly similar in age and design experience, drawn from a cohort with a male-female ratio of approximately 3:1.

Procedure overview
The experiment took place in a large lecture theatre, with all the participants present and seated next to each other. Participants were set a design problem to work on individually and were verbally asked to be creative and to design as many ideas as possible. They were also informed, through both spoken and written instructions, that they should represent their ideas using pen and paper, with both sketches and writing. The instruction to sketch was therefore clear, but it was not considered mandatory (because some ideas would be difficult to sketch). Participants were randomly allocated to two experimental groups: the baseline group (n = 28) received only the design problem, and the stimulated group (n = 27) received the design problem and a representation of a potential solution to illustrate how ideas should be presented. Participants were given ten minutes to generate as many creative ideas as possible. Finally, the participants' ideas were anonymized, randomized, and assessed in terms of their quantity and elaboration to test the influence of exposure to an example on the ideas generated.

Materials and design task
All participants received the design problem (typed on an A4 sheet), as well as blank A4 sheets to record their own ideas. The design problem was presented as follows: Bicycles are a popular mode of transportation and recreation for many people. While growing up, a person might go through a series of ever-larger bikes, sometimes having several models, one after the other. However, having several bikes can be a problem for many reasons. Your task is to generate as many ideas as possible to eliminate the need to have multiple bikes as people grow up.
The example solution given to the stimulated participants was an annotated sketch of a bike (Figure 1). It was preceded by the introduction 'Below is an example of how you should present your ideas (i.e., annotated sketches)', and followed by the description: A modular bike with parts of various sizes that can be connected and swapped to fit people with very different heights. Apart from the socketing parts and expansible or contractible wheels, the angles between tubes can also be modified in specific joints.
It should be noted that the design task was intentionally open, so that expandable bike ideas were only one way to solve the problem. In fact, participants often generated ideas that were not bikes at all, such as somehow controlling the height of the population or using an entirely new mode of transportation instead.

Data analysis
The evaluation of the participants' ideas was conducted by the first author and two research colleagues. The three evaluators had backgrounds in design research, mechanical engineering, and complexity science, as well as experience in evaluating design projects. Initially, the three evaluators collectively analyzed the participants' design work in order to measure the quantity of ideas. 'One idea' was considered to be any understandable way to solve the problem, whether represented with a sketch, a written description, or both. Participants often generated more than one solution, but if the solutions had the same underlying mechanism and could all be incorporated within the same overarching idea without interference (e.g., within a single bike design), then they were considered as a single idea. Conversely, if there were two or more solutions for a given part of an idea (e.g., the frame or wheel of a bike), then they were considered to be distinct ideas. This first assessment, the idea count, was conducted by the three evaluators as a group, which reduced the chance of mistakes when identifying ideas. However, this also increased the chance of mutual influence among evaluators and meant that no inter-rater agreement could be calculated.
Next, two evaluators (with backgrounds in design research and experimental psychology) individually marked all ideas according to the elaboration, or level of detail, of their representation, a metric that is also used in creativity tests (Guilford, 1956; Torrance, 1962). This was done on a discrete scale from 1 (lowest) to 5 (highest), and took into account the amount of text and drawings in the participants' idea sheets. For textual elaboration, the evaluators were instructed to consider the number of words, complete sentences, and scattered pieces of information in the sheets (completely unrelated information, such as jokes, was not considered). For pictorial elaboration, the evaluators were instructed to consider the number of idea elements that were drawn, different views of the same elements, and the perceived care taken when drawing (completely unrelated content, such as elements that could not be associated with the idea, was not considered). The evaluators were provided with reference ideas drawn from the participants' design work. These reference ideas were used to illustrate each point on the scale, and the evaluators thus agreed on a marking scheme that compared all participants' ideas to the example shown to the participants as a stimulus (Figure 1 and the accompanying textual description), as follows: (1) ideas with either text or sketches, with lower elaboration than the stimulus; (2) ideas with either text or sketches, with equivalent (or higher) elaboration; (3) ideas with sketches and text, both with lower elaboration; (4) ideas with sketches and text, one with lower elaboration and one with equivalent (or higher) elaboration; (5) ideas with sketches and text, both with equivalent (or higher) elaboration.
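The five-point marking scheme above is essentially a lookup on two judgments per idea: which formats are present, and whether each is less elaborated than the stimulus or equivalent to it. The Python sketch below encodes only that final mapping, as an illustration; the judgments themselves were made by the evaluators, and the function name and the 'lower'/'equivalent' labels are hypothetical.

```python
from typing import Optional

# Hypothetical rating of one representation format relative to the stimulus:
# None if the format is absent, "lower" if it is less elaborated than the
# example, "equivalent" if it is as elaborated or more.
Level = Optional[str]


def elaboration_score(text: Level, sketch: Level) -> int:
    """Map an idea's text and sketch ratings to the 1-5 elaboration scale."""
    present = [lvl for lvl in (text, sketch) if lvl is not None]
    if not present:
        raise ValueError("an idea needs text, a sketch, or both")
    if any(lvl not in ("lower", "equivalent") for lvl in present):
        raise ValueError("levels must be 'lower' or 'equivalent'")
    n_equivalent = present.count("equivalent")
    if len(present) == 1:
        # One format only: 1 (lower) or 2 (equivalent or higher).
        return 1 + n_equivalent
    # Both formats: 3 (both lower), 4 (one equivalent), 5 (both equivalent).
    return 3 + n_equivalent


print(elaboration_score("equivalent", None))     # 2
print(elaboration_score("lower", "equivalent"))  # 4
```

One consequence of this scheme, discussed for Study 2, is that single-format ideas cannot score above 2, so elaboration scores partly track representation format.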
The assessment of elaboration had an agreement of 81% and a linearly weighted Cohen's Kappa coefficient of .856, indicating almost perfect agreement between evaluators (Landis & Koch, 1977). In summary, the metrics used in this experiment were idea fluency (the number of ideas generated), representation format (the proportion of ideas represented only in text), and idea elaboration (the level of detail of representations).
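For readers who wish to reproduce agreement statistics of this kind, the Python sketch below computes percentage agreement and a linearly weighted Cohen's kappa for two evaluators' scores on the 1 to 5 scale. It uses the disagreement-weight form, kappa = 1 − (observed weighted disagreement / expected weighted disagreement), with weights |i − j|. This is our illustrative implementation, not the authors' analysis script, and the ratings in the usage lines are hypothetical. scikit-learn offers the same statistic through `cohen_kappa_score` with `weights="linear"`.

```python
def percent_agreement(a: list[int], b: list[int]) -> float:
    """Proportion of ideas given exactly the same score by both evaluators."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def linear_weighted_kappa(a: list[int], b: list[int], categories=range(1, 6)) -> float:
    """Cohen's kappa with linear disagreement weights |i - j|."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(a)
    cats = list(categories)
    # Marginal distribution of scores for each evaluator.
    pa = {c: a.count(c) / n for c in cats}
    pb = {c: b.count(c) / n for c in cats}
    # Mean distance between the two evaluators' scores for the same idea.
    observed = sum(abs(x - y) for x, y in zip(a, b)) / n
    # Mean distance expected if the evaluators scored independently.
    expected = sum(pa[i] * pb[j] * abs(i - j) for i in cats for j in cats)
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - observed / expected


rater_1 = [1, 2, 3, 4, 5]  # hypothetical scores for five ideas
rater_2 = [2, 2, 3, 4, 5]
print(percent_agreement(rater_1, rater_2))                # 0.8
print(round(linear_weighted_kappa(rater_1, rater_2), 3))  # 0.865
```

The toy example also shows why a weighted kappa can exceed raw percentage agreement, as with the .856 versus 81% reported above: a one-point disagreement counts as a partial agreement rather than a full miss.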

Results and discussion
This section presents complementary analyses of experiments reported by , in which fixation was observed in terms of both idea fluency and idea repetition. The following results offer an additional or alternative interpretation of the fluency results previously obtained.

Idea fluency
A Welch's t-test with the total number of ideas (per participant) as the dependent variable revealed a significant difference in the quantity of ideas generated between the two groups, t(37.4) = 3.35, p = .002, d = .895, with participants in the baseline group generating more ideas on average (M = 2.39, SD = 1.32) than those in the stimulated group (M = 1.48, SD = .580).
These results reveal that idea fluency was influenced by the presence of an example solution, and that designing without exposure to the stimulus resulted in more ideas being generated, which can be interpreted as a beneficial isolation from examples (Vasconcelos, Cardoso et al., 2017). This finding is consistent with similar studies in which exposure to an example caused a reduction in idea fluency (Linsey et al., 2010; Sio et al., 2015; Viswanathan & Linsey, 2012). However, other studies have reported that exposure to an example solution had no effect on the number of ideas generated (Lujun, 2011; Moreno et al., 2014; Smith, Ward, & Schumacher, 1993), or even that it increased fluency (Nijstad et al., 2002; Purcell & Gero, 1992). This suggests that many factors can affect idea generation under the influence of external stimuli.
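The fractional degrees of freedom reported for several t-tests in this section, such as t(37.4), are characteristic of Welch's unequal-variances t-test, whose degrees of freedom come from the Welch-Satterthwaite approximation. As a check, the Python sketch below recomputes the Study 1 fluency test from the published group summaries alone. The function name is ours, and the standardizer used for d (the root mean square of the two SDs) is an assumption that happens to match the reported value closely; other conventions, such as the pooled SD, give slightly different values.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t, Welch-Satterthwaite df, and Cohen's d from group summaries."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Assumed standardizer: root mean square of the two SDs.
    d = (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return t, df, d


# Study 1 idea fluency: baseline (n = 28) versus stimulated (n = 27).
t, df, d = welch_from_summary(2.39, 1.32, 28, 1.48, 0.580, 27)
print(f"t({df:.1f}) = {t:.2f}, d = {d:.3f}")
```

With the rounded summary statistics, this prints approximately t(37.3) = 3.33 and d = 0.893, close to the reported t(37.4) = 3.35 and d = .895; the small gaps are consistent with rounding of the published means and SDs.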

Representation format
A Chi-squared test with the frequencies of text ideas (per group) as the dependent variable revealed a significant difference in the proportions of such ideas, χ²(1, N = 107) = 26.5, p < .001, φ = −.498, with participants in the baseline group generating proportionally more ideas containing only text. To test whether this result could simply be attributed to baseline participants generating ideas that could be represented using only text (e.g., services or policies), we also performed a similar analysis considering only those ideas that were bikes, which are more likely to be represented with sketches. A similar effect was found, χ²(1, N = 76) = 12.4, p < .001, φ = −.404, with participants in the baseline group generating proportionally more ideas containing only text. To avoid exaggerating group differences driven by a few individuals, we also ran a Welch's t-test with the proportion of ideas represented only in text (per participant) as the dependent variable, which revealed a significant difference between the two groups, t(29.6) = 4.79, p < .001, d = 1.28, with participants in the baseline group generating, on average, a higher proportion of ideas containing only text (M = .429, SD = .443) than the stimulated group (M = .019, SD = .096). Ideas presented only in sketches (with no accompanying text) were extremely rare and were not included in our analysis. Table 1 shows summary statistics for these results.
These results reveal that ideas containing only text were generated more frequently when participants designed without exposure to the example, even though all participants were instructed (in writing and verbally) to produce annotated sketches. Conversely, those exposed to the example solution represented nearly all their ideas using both representation formats, which suggests that the example, combining sketches and text, acted as a prompt that motivated stimulated participants to follow a similar representation format. This supports our hypothesis that non-stimulated participants would represent more ideas containing only text than those exposed to an example solution. This finding is also consistent with the engineering education literature, which reports that engineering students are reluctant to sketch (Booth, Taborda, Ramani, & Reid, 2016) and might not understand the role of sketching in the design process (Schmidt, Hernandez, & Ruocco, 2012). This could explain why baseline participants sketched only about half of their ideas: text may have been their default representation format. Interestingly, whereas the baseline group generated more ideas overall, both groups had a similar number of ideas containing both sketches and text, t(39.0) = −1.17, p = .242, d = .317. Finally, only one idea was represented without text, which indicates how important and natural verbal descriptions are when communicating ideas.
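The chi-squared statistics and φ coefficients above are computed from 2 × 2 tables of idea counts (group by text-only versus other ideas). A minimal Python sketch of that computation follows, assuming Pearson's test without continuity correction; the counts in the usage line are hypothetical, not the study data summarized in Table 1. The sign of φ depends only on how rows and columns are ordered. SciPy's `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic.

```python
import math


def chi2_phi_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Pearson chi-squared (1 df, no continuity correction), p-value, and phi
    for the table [[a, b], [c, d]]: rows are groups, columns are
    text-only ideas and all other ideas."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-squared variable with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    phi = (a * d - b * c) / math.sqrt(rows[0] * rows[1] * cols[0] * cols[1])
    return chi2, p, phi


chi2, p, phi = chi2_phi_2x2(10, 20, 20, 10)  # hypothetical counts
print(f"chi2(1, N = 60) = {chi2:.2f}, p = {p:.3f}, phi = {phi:.3f}")
```

As a consistency check, φ² × N should reproduce χ² for a 2 × 2 table; with the values reported above, .498² × 107 ≈ 26.5 and .404² × 76 ≈ 12.4.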

Idea elaboration
A Welch's t-test with the average elaboration score (per participant) as the dependent variable revealed a significant difference in the elaboration of ideas generated between the two groups, t(41.8) = −3.39, p = .002, d = .910, with participants in the stimulated group receiving higher scores on average (M = 3.67, SD = .720) than those in the baseline group (M = 2.70, SD = 1.33).
These results reveal that idea elaboration was lower when participants designed without exposure to the example, and that exposure to such an example resulted in stimulated participants receiving higher elaboration scores. This finding is relevant to how design work is assessed in design creativity studies, in which ideas are judged with respect to their quality and originality. Ideas are more likely to be perceived as creative when represented in high-quality sketches than in sketches of lower quality (Kudrowitz et al., 2012), and producing sketches while designing can increase the quality of the outcomes (Schütze, Sachse, & Römer, 2003). As such, participants exposed to a detailed mixed-format example might receive higher creativity scores because they sketch and describe ideas at a good level of detail. On the other hand, more elaborated ideas have also been found to correlate with lower originality scores (Dippo & Kudrowitz, 2015), and other metrics that depend on the total number of ideas generated (such as the repetition of features from the examples) are also likely to be negatively affected by increased idea elaboration. Figure 2 illustrates a selection of participants' ideas that were represented in different formats (sketch and text) and with different levels of elaboration (high and low). The examples provided in Figure 2 (as well as in Figures 4 and 8) are not representative of the design work of any particular experimental group, and a more qualitative analysis and comparison of the drawings from each group is outside the scope of this paper.

Summary and limitations (Study 1)
The results from Study 1 support our hypothesis, which predicted that non-stimulated participants would generate more ideas but with a lower level of elaboration when compared to those exposed to an example solution. However, this finding must be discussed with respect to a few limitations of the study.
Firstly, the idea generation session in the experiment was ten minutes long, which can be considered very short when compared to similar studies in which idea generation sessions lasted for thirty or sixty minutes. It is possible that longer sessions reduce the normative representation effect, especially if the end of the session is defined by participants exhausting their idea pool rather than reaching a time limit. Secondly, the participants in this study were first-year engineering students, and it is possible that different educational backgrounds will produce different results. For instance, it has been argued that engineering education does not sufficiently encourage sketching and that this can be harmful to the design process (Schmidt et al., 2012; Ullman, Wood, & Craig, 1990). It has also been argued that differences in design education might be responsible for industrial design students generating more ideas than mechanical engineering students when participating in fixation studies (Purcell & Gero, 1996). Thirdly, participants in this study were seated next to each other, and this might have interfered with their creative process. It has been shown that group interference may hinder idea generation (Diehl & Stroebe, 1987), and the environment in this study made it easier for participants to interfere with each other's work, even though they were asked to work individually. Lastly, in introducing the stimulus, the participants were explicitly instructed to conform to a given representation format, i.e., '… an example of how you should present your ideas'. Even though the baseline group was also told to sketch and write, the stimulated group received additional directions to follow the representation format, and explicit instructions can affect idea generation (Smith et al., 1993). As such, a stimulus exposure that does not tell participants how they should represent ideas seems more appropriate for further experimentation.
To address the issues above, we have analyzed a second data-set from another experiment in which participants were given more time to generate ideas, they had different educational backgrounds, they were seated far from each other, and the example solution was introduced in a subtler way.

Participants
Thirty master's students in industrial design engineering at Delft University of Technology took part in the study. Participation was voluntary, and the students received a monetary reward in exchange for their time. They were broadly similar in age (M = 23.5 years) and design experience (M = 4.71 years), drawn from a cohort with a female-male ratio of approximately 3:2.

Procedure overview
Experimental sessions were conducted in a classroom with participants seated in every other seat to prevent interference with each other's work. Participants were randomly allocated to two experimental groups (baseline and stimulated), each the same size (n = 15). They were asked to represent their ideas with both sketches and writing and were given thirty minutes to generate as many ideas as possible.

Materials and design task
Studies 1 and 2 used the same design problem. However, both the introduction of the example and the example itself were slightly different (i.e., the examples were similar in elaboration but differed in their underlying concepts: modularity in Study 1 and adjustability in Study 2). The example (Figure 3) was preceded by 'Here is a concept that illustrates one way to solve this problem', and followed by the description: 'A telescoping bike with parts that can be extended or shortened to fit people with very different heights. Apart from the adjustable tubes and wheels, the angles between tubes can also be modified in specific joints'.

Data analysis
The ideas were assessed by the first two authors of this work, who have backgrounds in design research and experimental psychology. Study 2 used the same metrics and assessment methods as Study 1, including the calculation of idea elaboration. The elaboration assessment had an agreement of 87% and a linearly weighted Cohen's Kappa coefficient of .825, indicating almost perfect agreement between evaluators (Landis & Koch, 1977).

Results and discussion
This section presents complementary analyses of experiments reported by Vasconcelos, Cardoso et al. (2017), in which fixation was observed in terms of idea diversity and, to some extent, idea fluency.
Figure 3. The example solution provided to stimulated participants in Study 2. The sketch used is a modification of the Zee-K ergonomic bike (Floss, 2010).

Idea fluency
A Student's t-test with the total number of ideas (per participant) as the dependent variable showed a non-significant difference in the quantity of ideas generated between the two groups, t(28) = 1.27, p = .216, d = .462. However, this non-significant result may reflect the small sample size, given that the difference between the group means corresponds to a moderate effect size (d = .462).
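To make the sample-size argument concrete, the Python sketch below estimates statistical power for the observed effect size. It uses the normal approximation to the two-sided, two-sample t-test, which is our simplification; exact t-based calculations give slightly lower power at n = 15. With d = .462 and 15 participants per group, it gives a power of about .24, and it suggests that roughly 74 participants per group would be needed for 80% power. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-sided, two-sample t-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected z of the group difference
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def approx_n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Participants per group needed to reach the target power."""
    z = NormalDist()
    n = 2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2
    return math.ceil(n)


print(round(approx_power(0.462, 15), 2))  # power with 15 per group
print(approx_n_per_group(0.462))          # per-group n for 80% power
```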
Whilst the difference in idea fluency between baseline and stimulated participants did not reach statistical significance here (contrary to Study 1), participants in the baseline group again generated more ideas on average (M = 10.33, SD = 3.96) than those in the stimulated group (M = 8.47, SD = 4.12). Interestingly, participants in Study 2 generated almost five times as many ideas as those in Study 1 on average, even though the time available for generation was only three times longer. Having more time for idea generation may have contributed to this result; however, research has shown that the idea generation rate decreases asymptotically towards a steady flow over time (Tsenn, Atilola, McAdams, & Linsey, 2014) and that this decline is more evident in the first forty minutes (Liikkanen, Björklund, Hämäläinen, & Koskinen, 2009), so tripling the session length would be expected to less than triple the number of ideas. As such, we believe that the main factor that led to this increase was the change in the participants' academic background, as it has been argued that industrial designers might be more resistant to fixation (Agogué, Poirel, Pineau, Houdé, & Cassotti, 2014). A less likely factor contributing to the lower fluency in Study 1 compared with Study 2 is the difference in room conditions and in the distance between participants.
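The comparison between studies can be checked with simple arithmetic on the reported group means. The Python sketch below pools each study's groups (weighting by group size) and converts the result to ideas per minute, assuming session lengths of exactly ten and thirty minutes, as described. Because the per-minute rate in Study 2 (about 0.31) exceeds that in Study 1 (about 0.19), even though generation rates are reported to decline over time, the longer session alone seems unlikely to explain the difference.

```python
def pooled_mean(groups: list[tuple[float, int]]) -> float:
    """Mean ideas per participant across groups, from (mean, n) pairs."""
    total_ideas = sum(mean * n for mean, n in groups)
    return total_ideas / sum(n for _, n in groups)


study1 = pooled_mean([(2.39, 28), (1.48, 27)])   # 10-minute session
study2 = pooled_mean([(10.33, 15), (8.47, 15)])  # 30-minute session
print(f"ratio of mean fluency: {study2 / study1:.1f}")
print(f"ideas per minute: {study1 / 10:.2f} (Study 1) vs {study2 / 30:.2f} (Study 2)")
```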

Representation format
A Chi-squared test with the frequencies of text ideas (per group) as the dependent variable revealed a significant difference in the proportions of such ideas, χ²(1, N = 282) = 11.2, p = .001, φ = −.199, with participants in the baseline group generating proportionally more ideas containing only text. Similarly, a Welch's t-test with the proportion of ideas represented only in text (per participant) as the dependent variable revealed a significant difference in the proportion of such ideas between the two groups, t(14.0) = 2.36, p = .033, d = .862, with participants in the baseline group generating, on average, a higher proportion of ideas containing only text (M = .061, SD = .100) than the stimulated group (M = .000). Ideas presented only in sketches were extremely rare and were not included in our analysis. Table 2 shows summary statistics for these results.
As in Study 1, these results show that ideas containing only text were generated more frequently when participants designed without exposure to the example (again despite the instructions to produce annotated sketches). Conversely, nearly all ideas generated by stimulated participants used both representation formats. These results therefore provide further support for our hypothesis. However, here very few ideas from baseline participants were represented only in text (8%), whereas this proportion was much higher in Study 1 (51%). This may reflect the increase in time available for idea generation, the change in participants' educational background, or both, but not the change in how the example solution was introduced. As previous research has suggested that experimental instructions can shape idea generation (Smith et al., 1993;, we expected that a subtler introduction of the stimulus (without any directions for how participants should represent ideas) would introduce more variability in how stimulated participants represented their ideas. However, in both studies these participants represented almost all ideas in both sketches and text.

Idea elaboration
A Student's t-test with the average elaboration score (per participant) as the dependent variable showed a marginally significant difference in the elaboration of ideas generated by the two groups, t(28) = −1.79, p = .084, d = .654, with participants in the stimulated group receiving higher elaboration scores on average (M = 3.43, SD = .391) than the baseline group (M = 3.18, SD = .388). As in Study 1, these results show that idea elaboration was (to some extent) lower when participants designed without exposure to the example, whereas stimulated participants tended to have higher elaboration scores. These findings are consistent with those of Study 1, but the statistical evidence here is weaker. Whilst this can be partially explained by the small sample size, it also indicates that the difference in elaboration scores between groups was larger in Study 1 than in Study 2. As with the representation format, we believe that this is a result of participants having an academic background that fosters elaborate sketching (Yang, You, & Chen, 2005) or being more experienced sketchers (Verstijnen, van Leeuwen, Goldschmidt, Hamel, & Hennessey, 1998a, 1988), while also having extra time to develop such sketches. Additionally, as elaboration scores partially depended on the modality of representation of the ideas (i.e., ideas represented in only one format tended to have lower scores), this result may be attributed to the small difference between groups in the number of ideas containing only one representation format. Still, the marginal effect found here is enough to influence the potential calculation of other inspiration metrics in this data-set, such as originality and feature repetition. Figure 4 illustrates a selection of participants' ideas that were represented in different formats (sketch and text) and with different levels of elaboration (low and high).

Summary and limitations (Study 2)
The results from Study 2 replicate most of what we found in Study 1, but to a lesser degree. Non-stimulated participants tended to generate more ideas, and a higher proportion of their ideas were represented only in text. Conversely, stimulated participants tended to generate more elaborated ideas (although the statistical evidence was generally weaker here). Again, these findings must be discussed with respect to a few limitations of the study.
Firstly, whilst the idea generation session in this study is typical of creativity experiments, it is nevertheless short compared with many real design projects (Crilly, 2015), and it is still possible that longer sessions would reduce the normative representation effect (as seems to have happened from Study 1 to Study 2). Secondly, although participants in Study 2 had a different academic background and the stimulus was introduced in a different way, the design problem (and, to some extent, the stimulus) was the same as in Study 1. As other design problems and examples are likely to inspire and fixate designers in different ways (Kumar & Mocko, 2016), our findings may depend on a specific problem-example pairing, and another pairing might produce different results. Thirdly, previous research has already emphasized that the assessment of ideas is a key component of inspiration and fixation research, capable of compromising the interpretation of existing studies. Here, it is possible that the assessment of idea elaboration was influenced by the representation format, i.e., ideas presented in both sketches and text tended to score higher than those presented only in text or only in sketches. A more suitable assessment would therefore consider the elaboration of each representation format individually. Lastly, with respect to experimental incentives, it should be noted that participants in Study 1 did not receive any monetary reward. Although the effect of rewards on creativity is debated, with evidence supporting either a negative (Amabile, 1983; Deci & Ryan, 1985) or a positive effect (Eisenberger & Rhoades, 2001; Groves, Sawyers, & Moran, 1987), it is possible that the reward given to participants in Study 2 affected how diligently they adhered to the experimental instructions.
To address the issues above, we analyzed a third data-set from another experiment in which participants were given even more time to generate ideas, a different design problem-example pairing was used, idea elaboration was assessed for sketches and text separately, and no compensation was offered for students' participation.

Participants
Fifty-eight master's students in industrial design engineering at Delft University of Technology took part in the study. Participation was voluntary and did not involve any monetary reward. Participants' mean age was 23.5 years, and they were drawn from a cohort with a female-to-male ratio of approximately 3:2.

Procedure overview
Experimental sessions were conducted in classrooms with participants seated in every other seat to prevent interference with each other's work. Participants were randomly allocated to three groups: a baseline group (n = 18); a pictorially stimulated group (n = 20), which received a pictorial representation of an example solution; and a textually stimulated group (n = 20), which received a textual representation of the same solution. Participants in all groups were asked to sketch and use text if they thought it helped clarify their designs. Participants were given forty-five minutes to generate as many creative ideas as possible.

Materials and design task
The design problem was presented as follows: Since ancient times, transportation of people and goods has always been an essential human activity (…) Despite the rapid technological developments in the field of human transportation, it is still uncertain how this area will unfold in the future. Your task is to think about how human transportation will be like in 2050.
There were four additional requirements: the transport should be public (for at least ten people), appropriate for short urban journeys, comfortable, and safe. The pictorially stimulated participants (picture group) received a picture of a straddling bus (Figure 5(a)), whereas the textually stimulated participants (text group) received an equivalent textual description of the same example (Figure 5(b)). The example solution was preceded by the introduction: 'You can choose whether you would consider (or not) this image/text when generating ideas'.

Data analysis
The ideas were assessed by the first two authors of this work, who have backgrounds in design research and experimental psychology. Unlike in Studies 1 and 2, idea elaboration was assessed here for each representation format separately (i.e., sketches and text), on a discrete scale from 1 (lowest) to 3 (highest). Again, the assessment consisted of comparing all participants' ideas with the example used, as follows: (1) ideas with a much lower elaboration; (2) ideas with a lower elaboration; (3) ideas with an equivalent or higher elaboration. Whenever an idea lacked one representation format, it automatically received a nil score (0) for that format. The evaluators agreed on 75% of sketch scores (linearly weighted Cohen's Kappa = .674) and on 78% of text scores (Kappa = .699), indicating substantial agreement (Landis & Koch, 1977).
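The agreement statistics above can be illustrated with a minimal Python sketch of percent agreement and a linearly weighted Cohen's Kappa. It assumes integer-coded scores on an equally spaced scale, with the nil score (0) treated as a fourth ordinal category below 1 to 3; the ratings in the example are hypothetical, not the study's data, and the function names are illustrative. The sketch uses the disagreement form of the statistic: Kappa is one minus the ratio of the observed mean absolute score difference to the difference expected by chance from each rater's marginal distribution.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave exactly the same score."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def linear_weighted_kappa(rater_a, rater_b):
    """Cohen's Kappa with linear weights for integer-coded ordinal scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    # Observed disagreement: mean absolute difference between paired scores.
    observed = sum(abs(a - b) for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance disagreement: pair every score of rater A with every score of rater B.
    expected = sum(
        (ca / n) * (cb / n) * abs(i - j)
        for i, ca in freq_a.items()
        for j, cb in freq_b.items()
    )
    if expected == 0:
        raise ValueError("Kappa is undefined when both raters use a single score")
    return 1.0 - observed / expected


# Hypothetical sketch-elaboration scores (0 = format absent, 1 to 3 = elaboration).
rater_1 = [3, 2, 2, 1, 0, 3, 2, 1, 3, 2]
rater_2 = [3, 2, 1, 1, 0, 3, 3, 1, 3, 2]
print(percent_agreement(rater_1, rater_2), round(linear_weighted_kappa(rater_1, rater_2), 3))
# prints: 0.8 0.818
```

With linear weights, a one-point disagreement (e.g., 2 versus 3) is penalized less than a two-point one, which is why a weighted Kappa suits ordinal elaboration scores better than the unweighted statistic.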
It is worth noting that the choice of this last data-set allowed us to do more than just address the issues and limitations of the previous two studies. More critically, this data-set holds information from groups stimulated with a single representation format: if the results obtained so far are explained by a normative representation effect, then the picture group should account for a higher proportion of the ideas containing only sketches, whereas the text group should account for a higher proportion of ideas containing only text. Likewise, when comparing elaboration scores between the two groups, scores for sketches should be higher in the picture group and scores for text should be higher in the text group.

Results and discussion
This section presents complementary analyses of experiments reported by Cardoso et al. (2012), in which fixation was observed in terms of idea flexibility and repetition. To deal with non-normality in the data, we used Analysis of Variance (ANOVA) with significance values estimated using 1000 bootstrap resamples (a non-parametric alternative to the regular ANOVA test), followed by planned contrasts. The first contrast compared the baseline group to the stimulated groups (picture and text). The second contrast compared the picture group to the text group.

Figure 5. The example solution provided to stimulated participants in Study 3. On the left (a), the picture provided to the pictorially stimulated participants (ArchDaily, 2010). On the right (b), the text provided to the textually stimulated participants: "Imagine a new concept for future public transportation where an electric-powered vehicle drives over traffic jams. Its design resembles a modern tram with a wide stretched cabin covering a two-lane motorway. This vehicle is a little wider than two contemporary motorcars placed side by side, and its length is about six cars in a row. Supported by extended 'legs' which run on rail tracks on both sides of the road, the vehicle's cabin is elevated above the cars on the motorway. Cars can drive under the vehicle when it is stopped on designated (elevated) passenger stations."
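The paper does not specify the resampling scheme behind the bootstrap ANOVA, so the following Python sketch shows one common approach under that assumption: each group is shifted to the grand mean so that the null hypothesis of equal means holds, each group is then resampled with replacement, and the p-value is the share of resampled F statistics at least as large as the observed one. The contrast weights (2, −1, −1) for baseline versus both stimulated groups and (0, 1, −1) for picture versus text are illustrative choices matching the two comparisons described; the helper names and data are hypothetical.

```python
import random


def f_statistic(groups):
    """Classical one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0:
        # Degenerate resample with no within-group spread.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / df_between) / (ss_within / df_within)


def bootstrap_anova_p(groups, n_resamples=1000, seed=0):
    """Observed F and a bootstrap p-value under the null of equal group means."""
    rng = random.Random(seed)
    f_observed = f_statistic(groups)
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Centre every group on the grand mean so the null hypothesis is true.
    centred = [[x - sum(g) / len(g) + grand_mean for x in g] for g in groups]
    exceed = 0
    for _ in range(n_resamples):
        resampled = [rng.choices(g, k=len(g)) for g in centred]
        if f_statistic(resampled) >= f_observed:
            exceed += 1
    return f_observed, exceed / n_resamples


def contrast(groups, weights):
    """Planned contrast estimate: weighted sum of group means (weights sum to 0)."""
    return sum(w * sum(g) / len(g) for g, w in zip(groups, weights))


# Hypothetical idea counts for three small groups (not the study's data).
baseline = [4, 7, 5, 9, 3, 6]
picture = [6, 5, 7, 6, 5, 6]
text = [5, 4, 8, 6, 3, 7]
groups = [baseline, picture, text]
f_obs, p_value = bootstrap_anova_p(groups, n_resamples=1000, seed=42)
print(f"F = {f_obs:.3f}, bootstrap p = {p_value:.3f}")
print("baseline vs stimulated:", contrast(groups, [2, -1, -1]))
print("picture vs text:", contrast(groups, [0, 1, -1]))
```

Testing each contrast for significance would require bootstrapping the contrast estimate in the same way; the sketch only computes the point estimates.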

Idea fluency
A one-way ANOVA with the total number of ideas (per participant) as the dependent variable showed no significant differences across the groups, F(2, 55) = .155, p = .857, η² = .006. Similarly, planned contrasts did not reveal any significant differences between groups. These results are consistent with those found by Cardoso et al. (2012), but differ from those obtained in Studies 1 and 2, as no general trend can be observed here (baseline M = 5.94, SD = 3.21; picture M = 5.95, SD = 1.43; text M = 5.55, SD = 2.84). Interestingly, participants in Study 3 generated fewer ideas on average than those in Study 2, despite the increase in the time available for idea generation. We believe this can be attributed to the change in the design problem along with the incorporation of several problem requirements, as previous research has shown that the number of requirements is negatively correlated with idea fluency (Kumar & Mocko, 2016). It is also intriguing, and perhaps concerning, to see how even a supposedly more objective metric such as fluency can vary according to how we interpret 'one idea'. The total number of ideas per group reported here differs from that reported in a previous analysis, even though the same data-set is under consideration. This is a result of slightly different interpretations of what 'one idea' is, either because seemingly different ideas can be counted as one (and thus merged into a single concept), or because some ideas could not be clearly understood (and were thus removed from the analysis).
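Because the group sizes, means, and standard deviations are all reported, the classical F ratio and η² can be approximately recovered from them. The Python sketch below does this with between-group and within-group sums of squares; it reproduces the reported degrees of freedom exactly, while F differs slightly (.153 versus .155) because the reported means are rounded. It cannot reproduce the bootstrap p-value, which requires the raw data. The function name is illustrative.

```python
def anova_from_summary(groups):
    """One-way ANOVA F and eta squared from (n, mean, sd) summaries per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares rebuilt from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_squared


# Study 3 idea fluency: (n, M, SD) for baseline, picture, and text groups.
study3_fluency = [(18, 5.94, 3.21), (20, 5.95, 1.43), (20, 5.55, 2.84)]
f, df1, df2, eta2 = anova_from_summary(study3_fluency)
print(f"F({df1}, {df2}) = {f:.3f}, eta^2 = {eta2:.3f}")  # F(2, 55) = 0.153, eta^2 = 0.006
```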

Representation format
A Chi-squared test with the frequencies of text ideas (per group) as the dependent variable showed no significant difference in the proportions of such ideas, χ²(2, N = 337) = 2.16, p = .340, φ = .080. Similarly, a one-way ANOVA with the average proportion of ideas represented only in text (per participant) as the dependent variable showed no significant differences across groups, F(2, 55) = .186, p = .831, η² = .007, with all participants representing almost all ideas with both sketches and text. Finally, planned contrasts did not reveal any significant differences between groups. Ideas presented only in sketches were extremely rare and were not included in our analysis. Table 3 shows summary statistics for these results.
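The φ values reported for the Chi-squared tests in Studies 2 and 3 follow from the test statistic and sample size as √(χ²/N); for tables larger than 2 × 2 this generalizes to Cramér's V, which reduces to the same expression when one dimension has only two levels. In the sketch below, the table shapes in the comments are inferred from the degrees of freedom (1 = (2 − 1)(2 − 1) and 2 = (3 − 1)(2 − 1)), and the sign of φ depends on how the categories are coded, so only its magnitude is recovered. The function name is illustrative.

```python
import math


def phi_from_chi2(chi2, n, min_dim=1):
    """Effect size for a chi-squared test of independence.

    min_dim = min(rows, cols) - 1 gives Cramer's V; for any table with two
    rows or two columns it equals 1, and V reduces to |phi| = sqrt(chi2 / n).
    """
    return math.sqrt(chi2 / (n * min_dim))


print(round(phi_from_chi2(11.2, 282), 3))  # Study 2, 2 groups x 2 formats: 0.199
print(round(phi_from_chi2(2.16, 337), 3))  # Study 3, 3 groups x 2 formats: 0.08
```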
These results show that the generation of ideas containing only text was uncommon and did not differ across experimental groups. This was true whether participants designed with or without exposure to the example, and whether the example was a picture or text. In fact, ideas represented in only one format were extremely rare in this study. Such results differ from those obtained in Studies 1 and 2, and do not support our hypothesis. Also contrary to our expectations, the picture group did not generate a higher proportion of ideas containing only sketches than the other groups, nor did the text group with respect to ideas containing only text. Interestingly, whereas in the previous two studies participants were explicitly asked to represent their ideas with both sketches and writing, here participants were instructed to sketch, and to add writing only if they thought it was necessary. Yet fewer than 2% of all ideas in Study 3 had a single representation format (either text only or sketch only). It therefore seems more likely that these results reflect participants having even more time to generate ideas, and possibly a characteristic of the design problem. Whilst the design problems used in Study 3 and in the previous two studies both relate to the general issue of commuting in the future, 'public human transportation systems' and 'ways to eliminate the need to have multiple bikes' are problems that differ in many other respects.

Table 3. Summary results for idea fluency and modality of representation in Study 3 (proportions in brackets).

Idea elaboration
A one-way ANOVA with the average elaboration scores (per participant) as the dependent variable showed no significant differences across the groups, either for pictorial elaboration, F(2, 55) = .154, p = .857, η² = .006, or for textual elaboration, F(2, 55) = .186, p = .831, η² = .007. Similarly, planned contrasts did not reveal any significant differences between groups. Figure 6 shows summary statistics for these results. Consistent with the representation format results, these results show that the level of detail in the participants' ideas did not differ according to whether participants designed with or without exposure to an example. Again, such results diverge from those obtained in Studies 1 and 2, and do not support our hypothesis. Also contrary to our expectations, the picture group did not score higher in pictorial elaboration than the other groups, nor did the text group with respect to textual elaboration. As discussed above, these results might be attributed to an individual or combined effect of participants having more time to generate ideas, the instructions given to participants, and characteristics of the design problem used. Additionally, the results can be partially attributed to the very low number of ideas represented in only one format across groups, as the elaboration assessment penalized ideas with a single format, pulling elaboration scores down for groups in which participants produced such ideas more frequently. Finally, the present data do not support the idea that the lack of experimental incentives influenced the results in Study 1: here, participants were also not compensated for their time, yet they seem to have adhered to the experimental instructions we provided.
Whilst the manipulation had no influence on the elaboration scores, in line with previous research (Dippo & Kudrowitz, 2015) we observed a negative Pearson product-moment correlation between idea fluency and elaboration (i.e., the more ideas one generates, the lower the level of detail in those ideas). This was true for both textual elaboration (r = −.634, p < .001) and pictorial elaboration (r = −.698, p < .001). Additionally, we found that the pictorial and textual elaboration scores were positively correlated (i.e., those who add more textual details to ideas also tend to add more pictorial details, and vice versa) (r = .602, p < .001). These are strong correlations (Evans, 1996), and they include all participants in Study 3, irrespective of experimental group. Figure 7 shows scatter plots for these results. Figure 8 illustrates a selection of participants' ideas that were represented in different formats (sketch and text) and with different levels of elaboration (high and low).
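A minimal Python sketch of the Pearson product-moment correlation, together with the t statistic conventionally used to test it against zero (df = n − 2), may help interpret these values. A positive r means that high values of one variable accompany high values of the other; a negative r means the opposite. For example, with all 58 Study 3 participants and r = −.634, t(56) ≈ −6.1, far beyond the critical value for p = .001, which is why such p-values are best reported as p < .001 rather than .000. The function names and the five-participant data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def t_for_r(r, n):
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical fluency and elaboration values for five participants.
fluency = [3, 5, 6, 8, 10]
elaboration = [2.9, 2.6, 2.7, 2.0, 1.5]
print(round(pearson_r(fluency, elaboration), 3))  # negative: more ideas, less detail
print(round(t_for_r(-0.634, 58), 2))  # Study 3 textual elaboration versus fluency
```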

Summary and limitations (Study 3)
Whilst Study 1 provided strong evidence to support our hypothesis, Study 2 produced weaker evidence and Study 3 provided no evidence at all. We suggested possible explanations for this, all of which drew on discussions in the design creativity literature. These explanations include a number of factors that regulate idea generation to some extent: the time available for idea generation (Liikkanen et al., 2009; Tsenn et al., 2014), the academic background of participants (Agogué et al., 2014; Purcell & Gero, 1996), the instructions provided (Smith et al., 1993;, the design problem used (Kumar & Mocko, 2016), group interference (Diehl & Stroebe, 1987), and the assessment of the ideas.
Another potential influencing factor related to the participants' background is their language. Language proficiency shapes individuals' capacity to communicate, and it can regulate their performance in academic contexts (Papadopoulos, 2014). It has also been found that native English speakers are more likely to score higher on text-based creativity tests (in English) than non-native speakers (Wang & Kudrowitz, 2016). It is therefore conceivable that language proficiency played a role when participants had to choose how to represent and elaborate their ideas. All three studies reported here required participants to describe their ideas in English, and this might have encouraged the generation of ideas containing only text in Study 1 (conducted in the United Kingdom) while exerting the opposite effect in Studies 2 and 3 (conducted in the Netherlands). Similarly, this might have boosted overall textual elaboration in Study 1 compared with the other two studies.
Generally, we believe that the increased fluency observed in non-stimulated participants does reflect how those participants often only minimally elaborate their ideas when representing them, whereas stimulated participants clearly tended to produce more elaborated idea representations. Nevertheless, as observed, there are several reasonable explanations for the different results obtained in the three studies, and we suggest that two factors contributed the most to them. More precisely, differences between stimulated and non-stimulated participants tend to be more evident when participants are not used to sketching and when the time available for idea generation is highly constrained. Table 4 summarizes these and other experimental variables that collectively characterize the three studies.

Conclusions
Researching creativity in real-world design contexts is very difficult because the phenomena of interest might remain hidden from view, and many of the relevant variables cannot be easily manipulated. Laboratory settings, by contrast, help reveal such phenomena and allow more control over extraneous variables. However, as laboratory studies move towards increased ecological validity, it becomes increasingly difficult to identify, manipulate, and block all variables involved in the design task. Whilst some of these extraneous factors are now known to design researchers, many others are yet to be discovered and may be compromising what we learn about creativity. In this paper we have investigated one such possible extraneous factor. We have analyzed how exposure to an example solution can influence the representation of the ideas generated in response to a design problem, ultimately affecting the number of solutions. We have found that, under certain circumstances, non-stimulated participants tend to represent ideas with little elaboration and that many of these ideas contain only text. Conversely, exposure to an annotated sketch increased both the number and the proportion of annotated sketches among the ideas generated, as well as the elaboration of the idea representations. These increases in the modalities used and the details depicted came at the expense of the number of ideas generated overall, likely because representing detailed ideas with both sketches and text is more time-consuming than describing ideas in only a few words. These results can also be explained by the normative representation effect. However, they were not consistent across studies, and more research is needed to identify the specific variables and conditions that supported our hypothesis in Studies 1 and 2 but not in Study 3.
In particular, future studies should use larger sample sizes, incorporate other academic backgrounds, and experiment with longer time-scales.
In fixation experiments, the introduction of the example needs to be justified to the participants in some way; otherwise, they may wonder why they are being exposed to that extra material. One possible approach is to introduce the stimuli by explaining that they illustrate how participants should present their ideas. If this approach is taken, researchers should be mindful of the normative effects that such a representation could have. Additionally, although participants can be told how to represent their ideas, whether loosely (e.g., Bleuzé, Ciocci, Detand, & De Baets, 2014) or more precisely (e.g., Fu, Cagan, & Kotovsky, 2010), they still sometimes represent ideas in ways that do not comply with the instructions. Researchers should therefore make sure that they are analyzing ideas provided in a similar format, whether by regularly reminding participants to represent their ideas in the required manner, by providing a blank template to be completed, or by later removing ideas that do not fit the requirements.
Our results are relevant not only to how we currently interpret the reduced idea fluency of stimulated participants in fixation experiments, but also to how we judge idea quality more broadly. For instance, previous research has found that, by reducing the number of ideas generated, too much textual elaboration can also affect participants' originality scores (Dippo & Kudrowitz, 2015). Moreover, whereas sketching may increase the quality of the design output (Schütze et al., 2003), which would benefit stimulated participants, producing many ideas can also improve the overall quality of those ideas (Linsey et al., 2011), which would benefit non-stimulated participants. Idea representation and elaboration can therefore affect not only idea fluency but many other ways in which the creative design process might be measured.
The fact that increased idea elaboration leads participants to generate fewer ideas overall is also important for the development and implementation of idea generation methods and tools. These methods, such as brainstorming, often focus on generating a great number of ideas in a short time, with sessions lasting no more than one hour. Idea elaboration at this stage should therefore be minimized to avoid reducing idea fluency. In conclusion, the creativity research community should always be aware of the complexity of the design process and of the opportunities for known and unknown extraneous factors to interfere with the experimental manipulation of that process and the results obtained. This awareness will help us to design better-controlled experiments that yield more reliable data, from which we can more confidently develop tools and methods to mitigate the effects of fixation and thus support creative design.

Disclosure statement
No potential conflict of interest was reported by the authors.