Activity preferences in psychotherapy: what do patients want and how does this relate to outcomes and alliance?

ABSTRACT This study aimed to investigate (a) what clients’ within-treatment activity preferences were; (b) whether a match between preferences and psychotherapy approach predicted outcomes and alliance; (c) whether scores on preference dimensions, per se, predicted outcomes and alliance. Participants were 470 clients engaging in one of five approaches with trainee psychotherapists. We used the Cooper–Norcross Inventory of Preferences to identify clients’ within-treatment activity preferences; and multilevel modelling to examine the relationship between these preferences – and a match on these preferences – to outcomes and alliance. Clients had an overall preference for therapist directiveness and emotional intensity. We found no evidence of a preference matching effect. Clients who expressed a desire for focused challenge over warm support showed greater progress. Client preferences for focused challenge may be indicative of their readiness to change and indicate a positive prognosis. Further research should directly observe therapeutic practices and assess a range of client variables.

the macro-level intervention approaches that clients might have preferences for, such as psychoanalysis or cognitive behavior therapy.
Whereas most research on activity preferences has focused on preferences regarding the format of intervention (e.g., individual versus group therapy; Renjilian et al., 2001), our research concerned the styles or methods that clients would like their psychotherapists to use. Measures developed to support the investigation of such within-treatment activity preferences include the Psychotherapy Preferences and Experiences Questionnaire (Sandell et al., 2011), the Preference for College Counselling Inventory (Hatchett, 2015), and the Cooper-Norcross Inventory of Preferences (C-NIP) (Cooper & Norcross, 2016; Cooper et al., 2019).
Surveys of the general public using the C-NIP (an 18-item inventory with four preference subscales: Therapist Directiveness vs. Client Directiveness [TD-CD], Emotional Intensity vs. Emotional Reserve [EI-ER], Past Orientation vs. Present Orientation [PaO-PrO], and Warm Support vs. Focused Challenge [WS-FC]) have suggested wide variations in within-treatment activity preferences (Cooper & Norcross, 2016; Cooper et al., 2019). However, on average, respondents indicated a preference for Therapist Directiveness over Client Directiveness, with a focus on goals and the acquisition of practical skills. They also indicated, on average, a desire for Emotional Intensity over Emotional Reserve, and no strong preferences on the two other C-NIP subscales. As this research was conducted with members of the general public, however, it is not clear whether these results would generalize to a clinical context. Hence, the first aim of our study (Question 1) was to identify the within-treatment activity preferences of clients at the commencement of therapy. We hypothesized that, as with members of the general public, there would be mean preferences for Therapist Directiveness and Emotional Intensity, but no mean preferences on the two other C-NIP scales: PaO-PrO and WS-FC. At a more exploratory level, we were also interested in whether clients' activity preferences would vary by demographic characteristics: age, gender, ethnicity, disability status, and sexuality.
Question 2 for this study, central to the practice of personalization, was whether matching to clients' within-treatment activity preferences would be associated with improved outcomes and alliance. Matching has been hypothesized to have a positive effect on the grounds that clients, at least to some extent, "have a fairly good sense of what they like and what works and does not work for them" (McLeod, 2012, p. 26). That is, an intervention that matches a client's preference should provide them with more of what is helpful and less of what is unhelpful (Norcross & Cooper, 2021). However, previous results have been mixed, and the question remains unsettled, suggesting it may need refinement. In support of this hypothesis, three high-quality meta-analyses have indicated that preference assessment and accommodation is associated with improved outcomes (Lindhiem et al., 2014; Swift et al., 2019), a stronger therapeutic alliance (Windle et al., 2019), and reduced dropout (Lindhiem et al., 2014; Swift et al., 2019; Windle et al., 2019). Effects did not vary significantly across the type of preference (Swift et al., 2019). However, the majority of studies in these meta-analyses have focused on between-treatment preferences, such as psychotherapy versus medication, rather than activity preferences at the within-treatment level. In the few studies that have examined these more microlevel preferences, no significant effects were found (e.g., Kerns et al., 2014; Kludt & Perlmuter, 1999). In addition, the meta-analyses did not distinguish the effects of treatment matching (matching effects) from the effects, on clients, of feeling that they had a choice (choice effects), and/or that the psychotherapist was striving to accommodate their preferences (alliance effects) (McLeod, 2012; Norcross & Cooper, 2021). Hence, even if an overall positive effect exists for preference assessment and accommodation, it may not mean that matching, per se, is associated with improved outcomes.
For this study, the more refined specific hypothesis that we aimed to test was that matching on clients' within-treatment activity preferences would be associated with improved outcomes and alliance.
At an exploratory level, we were also interested in whether clients' within-treatment activity preferences, themselves, might be associated with outcomes (Question 3). For instance, do clients who express a preference for warm support, as opposed to focused challenge, show improved progress in psychotherapy, regardless of what therapy they receive? There is no prior literature on this question, and we did not have specific hypotheses here. However, as client factors have been considered the single strongest contributor to outcomes (Bohart & Wade, 2013; Wampold & Imel, 2015), we did wonder if clients' pre-treatment preferences might contribute to outcomes in some way.

Clients
Clients were recruited at a low-cost counselling and psychotherapy clinic in an urban region of the UK. The service was hosted by a psychotherapy training institution and served as a placement for the institution's trainees. The clients were self-referred, generally following a recommendation by a friend, relative, doctor, or other community provider. Exclusion criteria for the service were severe and enduring mental health problems (such as psychotic disorders or personality disorders), dependent drug or alcohol use as the primary problem, and learning difficulties.
In total, 712 episodes of psychotherapy were initiated during our specified time period, May 1, 2016 to August 31, 2018 (an "episode" was defined as a period of ongoing therapy for a client, with no break of longer than three months). As shown in Figure 1, 74 episodes were excluded because individuals did not consent to take part in the research at assessment or subsequently withdrew their consent. We then excluded 31 episodes in which individuals were attending psychotherapy for a second or third time within our inclusion period. Of the remaining 607 individuals assessed for psychotherapy, 21 were excluded because they were not referred for psychotherapy at assessment, and 34 were excluded because they did not start psychotherapy following a referral. We then excluded 62 clients because C-NIP data were not available. A further 12 clients were excluded because of technical problems with their data processing, and 8 because of data entry mistakes (duplicates). Where clients were reallocated to a second psychotherapist, we used only data from the first episode of treatment.
Our final dataset, therefore, consisted of 470 clients who had complete scores on at least one of the C-NIP subscales, with outcome scores on at least one measure at a minimum of two time points. This sample was predominantly female (67%), White (61.1%), and heterosexual (59.1%), with a mean age of 38.0 years old (Table 1). Overall, 8.3% of participants identified themselves as disabled. Where this was specified, it was most commonly chronic pain, sometimes resulting in limited mobility. Approximately two-thirds of the clients were in the clinical range for depression.

Trainee psychotherapists
The interventions were delivered by 179 trainee psychotherapists, the majority of whom were female (n = 140, 78.2%), with a mean age of 42.3 years old (SD = 9.1). The median number of clients per therapist was two, with a range of 1-8 clients per therapist.
The trainees were enrolled in Master's or doctoral level programs in one of five different psychotherapy approaches: Gestalt (n = 7), humanistic (n = 50), integrative (n = 31), person-centered (n = 44), and transactional analysis (n = 47). Prior to commencing their placement, all trainees received one year's didactic training in their approach, and were required to be familiar with their approach's basic concepts and application to practice. Trainees received regular clinical supervision from an experienced psychotherapist at a rate of one hour for every four hours of psychotherapy practice, with supervision meetings at least once every two weeks.

Supervisors
Twenty supervisors were involved in the rating of the therapeutic approaches (TSQ, see below), 11 females and 9 males. The majority (n = 15) supervised more than one approach. The minimum qualification for supervisors was a Master's degree in one of the psychotherapies that they supervised. Five of the respondents held a professional doctorate, and one was an academic professor. All respondents had a minimum of five years' postqualification experience.

Measures
We chose a range of outcome measures to capture broad-scale psychological distress and to match the routine outcomes evaluated in England's Improving Access to Psychological Therapies (IAPT) program. In addition, a trauma outcome measure was used as previous clients at the service where the research was conducted have reported high levels of posttraumatic stress symptoms.

Cooper-Norcross Inventory of Preferences (C-NIP, V.1.0)
The C-NIP assesses respondents' preferences for psychotherapist style (Cooper & Norcross, 2016). This tool, now translated into six languages, was primarily developed for routine assessment of clients' preferences in clinical practice but has also been used for research. The inventory was constructed through principal component analysis of data from a convenience sample of US and UK laypeople and clinicians, responding as prospective clients (Cooper & Norcross, 2016). It is an 18-item instrument that yields scores on four dimensions: TD-CD (five items), EI-ER (five items), PaO-PrO (three items), and WS-FC (five items) (see Introduction). The C-NIP invites participants to respond on seven-point, Likert-type items, scaled from −3 to 3 and labeled with opposing preferences (for instance, "Focus on specific goals" vs. "Not focus on specific goals"). Scale scores equal the unweighted sum of the item scores, and hence range from −15 to 15 on the TD-CD, EI-ER, and WS-FC scales, and −9 to 9 on the PaO-PrO scale. In each case, a higher score indicates a preference for the first term in the scale label. The last item on each of the first three subscales is in a reverse direction, as are the first and fourth items on the WS-FC subscale.
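The scoring rule described above can be illustrated with a minimal sketch. This is not the official C-NIP scoring procedure: the function name, the within-scale item ordering, and the assumption that reverse-direction items are sign-flipped before summing are all ours, with the reversed positions taken from the description above.

```python
# Hypothetical sketch of C-NIP scale scoring (names and reverse-scoring rule are assumptions).
# Positions are 0-based indices within each subscale.
SCALES = {
    "TD-CD": {"n_items": 5, "reversed": {4}},      # last item reversed
    "EI-ER": {"n_items": 5, "reversed": {4}},      # last item reversed
    "PaO-PrO": {"n_items": 3, "reversed": {2}},    # last item reversed
    "WS-FC": {"n_items": 5, "reversed": {0, 3}},   # first and fourth items reversed
}


def score_scale(scale, responses):
    """Return the unweighted sum of item scores, flipping the sign of reversed items."""
    spec = SCALES[scale]
    if len(responses) != spec["n_items"]:
        raise ValueError(f"{scale} expects {spec['n_items']} responses")
    if any(r not in range(-3, 4) for r in responses):
        raise ValueError("item scores must be integers from -3 to 3")
    return sum(-r if i in spec["reversed"] else r for i, r in enumerate(responses))


# Strongest possible preference for the first term of each scale label
print(score_scale("TD-CD", [3, 3, 3, 3, -3]))   # 15
print(score_scale("PaO-PrO", [-3, -3, 3]))      # -9
```

Under these assumptions the extreme scores reproduce the stated ranges of −15 to 15 for the five-item scales and −9 to 9 for the three-item scale.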
In the original measure development study, internal reliabilities were TD-CD Cronbach's α = .84, EI-ER α = .67, PaO-PrO α = .73, and WS-FC α = .60 (Cooper & Norcross, 2016). A subsequent online survey with a representative sample of US and UK members of the general public, again responding as prospective clients, found scale reliabilities of .79, .66, .77, and .55, respectively. In the present sample, the scale reliabilities were .67, .54, .61, and .63, respectively. However, on the first three of these scales, these increased to .74, .71, and .80 when the final reversed item was removed. These shortened, internally consistent scales were used in a sensitivity analysis of our data, and the implications of these alpha coefficients are considered in the Discussion.
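For readers unfamiliar with the statistic, the following sketch shows how Cronbach's α, and α after dropping one item, can be computed. It uses the standard formula with sample variances; the function names and the toy responses are hypothetical and are not study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all the same length)."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_without_item(rows, index):
    """Alpha recomputed after removing the item at position `index`."""
    return cronbach_alpha([row[:index] + row[index + 1:] for row in rows])


# Toy data (hypothetical): three respondents, three items; the last item runs against the others
toy = [[1, 1, 3], [2, 2, 2], [3, 3, 1]]
print(alpha_without_item(toy, 2))  # 1.0, since the two remaining items agree perfectly
```

The second function mirrors the check reported above, in which reliability was re-estimated after the final reversed item was removed.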

Patient Health Questionnaire Depression 9-item scale (PHQ-9)
The PHQ-9 is a brief, nine-item self-report measure for detecting severity of depression symptoms in a general population (Kroenke, Spitzer, & Williams, 2001). Respondents are asked to rate a range of problems over the last two weeks, on a 0 ("not at all") to 3 ("nearly every day") scale, indicating the frequency of depressive symptoms. The total scores range from 0 to 27, with higher scores indicating a greater severity of depression, and with a clinical sample cut-off over 9. The PHQ-9 has high internal consistency (α = .89), good test-retest reliability (r = .84), and good convergent validity when correlated with the 20 Item Short Form Survey (SF-20) mental health subscale (r = .73) (Kroenke et al., 2001).

Generalized Anxiety Disorder 7-item scale (GAD-7)
The GAD-7 is a brief self-report measure to assess symptom severity of generalized anxiety disorder (Spitzer et al., 2006). Respondents are asked to rate a range of problems over the last two weeks, on a 0 ("not at all") to 3 ("nearly every day") scale, indicating the frequency of anxiety symptoms. The total scores range from 0 to 21, with higher scores indicating a greater severity of anxiety, and with a clinical sample cut-off over 7. The scale has high internal consistency (α = .92), high test-retest reliability (r = .83), and good convergent validity against the Beck Anxiety Inventory (r = .72) (Spitzer et al., 2006).

Clinical Outcomes in Routine Evaluation-10 item version (CORE-10)
The CORE-10 General Distress Measure (CORE Information Management Systems Ltd., 2007) is a short version of the CORE-OM (Connell et al., 2007). It includes items on six factors: anxiety, depression, functioning, trauma, physical symptoms, and risk. Items refer to the previous week and are scored on a 0 ("not at all") to 4 ("all of the time") scale. Total scores range from 0 to 40, and scores of 11 or over are in the "clinical" range. The internal reliability of the CORE-10 is high, with α = .90; the score for the CORE-10 correlated with the CORE-OM at .94 in a clinical sample and .92 in a non-clinical sample (Barkham et al., 2013).

Work and Social Adjustment Scale (WSAS)
The WSAS (Mundt et al., 2002) is a brief, 5-item self-report scale that measures the respondent's perceived functional impairment, in relation to an identified problem. Each item rates a dimension of impairment (work, home management, social leisure activities, private leisure activities, and relationships) on a 0 ("not at all") to 8 ("very severely") scale. Total scores range from 0 to 40, with higher scores indicating more severe impairment, and scores above 10 associated with clinical populations. The measure shows good internal consistency (α = .82-.93), test-retest reliability, and clinical predictive validity (Mundt et al., 2002; Thandi et al., 2017; Zahra et al., 2014).

Impact of Events Scale-Revised (IES-R)
The IES-R is a 22-item self-report measure (for DSM-IV) that assesses subjective distress caused by traumatic events. It is a revised version of the 15-item IES (Horowitz et al., 1979). Items correspond directly to 14 of the 17 DSM-IV symptoms of PTSD. Respondents are asked to refer to a specific stressful event and then to rate, for each item, how distressed they have been in the past seven days, using a 0 ("not at all") to 4 ("extremely") scale. Total scores range from 0 to 88. A clinical concern is identified in scores of 24 or more, and a probable diagnosis of PTSD in scores of 33 or above (Weiss & Marmar, 1995). The IES-R shows good internal consistency (α = .95) and good convergent validity against measures of PTSD, anxiety, and depression (Beck et al., 2008).

IAPT Phobia scale (PHO)
The IAPT Phobia Scale is a measure developed as part of the Improving Access to Psychological Therapies (IAPT) program within National Health Service (NHS) settings in the UK, based on the Marks and Mathews (1979) Fear Questionnaire (IAPT, 2011). It consists of three questions, each rating situations that might invoke social anxiety, panic disorder, and specific phobia using a 0 ("would not avoid it") to 8 ("always avoid it") scale. A score of four or greater on any question is indicative of possible clinical disorder (IAPT, 2011). Data on internal reliability have not been published, but in our sample internal reliability was α = .75 at assessment, with convergence against other measures of distress in the range of r = .37 (IES-R) to r = .49 (WSAS).

Agnew Relationship Measure-5 item version (ARM-5)
The ARM-5 (Cahill et al., 2012) is a 5-item version of the ARM-28, a self-report measure that assesses three dimensions of the therapeutic alliance: bond, partnership, and confidence in the treatment (Agnew-Davies et al., 1998). Respondents are asked to rate statements that refer to the therapist-client relationship on a 1 ("strongly disagree") to 7 ("strongly agree") scale. Summed scores can range from 5 to 35. The internal consistency of the ARM-5 is α = .79 (Cahill et al., 2012). For the purposes of our analysis, we used ARM-5 scores at session 10.

Treatment Styles Questionnaire (TSQ)
We designed the TSQ for this study to assess the within-treatment activities that characterized a therapeutic approach from the perspective of experienced professionals/supervisors. The TSQ has four sets of items, corresponding to the four scales on the C-NIP. For the first set of items, respondents were asked to rate on scales from 1 (strongly disagree) to 5 (strongly agree) how much they thought that clients with a strong preference for "therapist directiveness (focus on goals, structure or techniques)" were "well suited" to each of the five psychotherapeutic approaches used in this study. The three subsequent sets of items asked for similar ratings of suitedness with respect to clients who had strong preferences for emotional intensity, past orientation, and warm support. An initial question asked the respondents to indicate which of the five approaches they supervised.

Client recruitment, assessment, and selection
Prospective clients contacted the service personally and arranged an assessment appointment. In the majority of cases, clients were informed about the service, or recommended to attend, by their doctors, or by other statutory services such as IAPT. Assessees were asked to complete the C-NIP, PHQ-9, GAD-7, CORE-10, WSAS, IES-R, and PHO 24 hours prior to assessment, and bring them to their appointment with an assessor, along with their personal demographic form, contact details, and information about their primary physician. Completion of these questionnaires took approximately 15-20 minutes. Five assessors from the clinic participated in assessing prospective clients for eligibility for treatment. The assessment followed established protocol, focusing on clients' presenting issues, personal histories, treatment histories, health, levels of risk, usage of substances, aims for psychotherapy, preferences about the therapist (gender and ethnicity), and availability.
Where prospective clients met eligibility criteria, the assessor then allocated the client, non-randomly, to a trainee. The trainees worked once a week, with a three-hour slot, throughout their one-year-long placement, and could have a maximum of three clients at any given time. Allocation to trainees was based primarily on availability: that is, to a trainee who was available for the specific day and time when the client indicated they could attend psychotherapy. Where more than one trainee was available, the assessor prioritized allocation to therapists who had fewer clients or fewer completed training hours. In addition, where possible, allocation decisions took clients' gender and ethnicity preferences into account. The clients' C-NIP responses were not used in any way to inform allocation, but information from the assessment, including all C-NIP and other questionnaire data, was available to the trainees. Trainees were not instructed to change their ways of working in response to the questionnaires.

Ethical approval
Assessors described the clinic's research procedures to clients, responded to questions, and requested consent for participation in research. Clients who declined consent for the research were still offered psychotherapy.
Ethical approval for routine outcomes evaluation was granted on 21 April 2016, with subsequent reviews at each iteration of the protocol. Ethical approval for the dataset of 2018-2019 was granted on 10 September 2018 by the respective university ethics committee.

Treatment and schedule of measures
Eighteen of the participants were allocated to Gestalt psychotherapy, 107 to humanistic psychotherapy, 86 to integrative psychotherapy, 133 to person-centered psychotherapy, and 134 to transactional analysis.
The clients attended between one and 47 sessions, with a mean of 16.3 sessions (SD = 11.6). In 12 cases, where clients, prior to session 5, reported difficulties establishing a therapeutic relationship with their psychotherapist, they were reallocated to a new psychotherapist by their assessor (in these instances, only data from the first allocation were used for our analysis).
Prior to each appointment, clients were asked to complete all outcome measures (GAD-7, PHQ-9, CORE-10, WSAS, IES-R, PHO) and the ARM-5 (from session 2 onwards) and bring the forms to the session. If clients scored below 24 on the IES-R at assessment, the trainees were advised not to administer the measure, although some continued to do so.
Data on dropout (i.e., client withdrawal from treatment before considered advisable by the therapist) were not available.

Therapeutic approaches
All five programs shared principles and methods consistent with the institution's humanistic, relational philosophy. All trainees were taught to establish a therapeutic alliance based on respect, empathy, and acceptance; listen carefully to clients; facilitate emotional expression; and reflect on their own psychological processes.
Gestalt psychotherapy. Gestalt therapy is a "process-focused" humanistic psychotherapy which originated with Perls et al. (1951). In their first-year training, Gestalt psychotherapy trainees were taught to practice in both non-directive and challenging ways, with a focus on the immediate present, the client's current context ("the field"), and the relationship between psychotherapist and client. They were also taught to explore clients' embodied experiencing.
Person-centered psychotherapy. Person-centered psychotherapy, based on the work of Rogers (1951), assumes that distressed individuals have the capacity to address their difficulties if they can explore them with an empathic, supportive, and trustworthy psychotherapist. In their first-year training, person-centered trainees were taught to practice non-directively and to work with clients' current problems and circumstances.
Transactional analysis. Transactional analysis (TA), developed in the 1950s by Eric Berne (1958, 1961), combines humanistic and psychoanalytic ideas, conceptualized through the model of "Parent," "Adult," and "Child" ego states. In their first year of training, TA trainees were taught to develop a range of therapeutic agreements with clients, and to work with "ego states" and historical patterns of relating and beliefs ("life scripts"). TA trainees were expected to offer therapeutic direction to clients, to focus on past relationships, and to work on challenging outdated beliefs and experiences.
Humanistic psychotherapy. Trainees in this approach received combined input on the basic principles of person-centered psychotherapy, Gestalt therapy, and TA. They were encouraged to vary their degree of directiveness, emotional intensity, and focus on past or present depending on the client's presentation and their own emerging style. As with the TA trainees, they were taught to build overt contractual agreements with clients but also to engage in warmly supportive ways.
Integrative psychotherapy. Integrative training at the institute centered on a model of five different types of therapeutic relating: working alliance, real relationship, transferential relationship, reparative relationship, and the transpersonal relationship (Clarkson, 1990). The approach integrated humanistic principles with a psychodynamic, intersubjective emphasis. This psychodynamic emphasis meant that trainees were taught to practice in non-directive ways, but with a focus on past relationships through transference work and reflections on unconscious processes.

Question 1: the within-treatment activity preferences of clients embarking on psychotherapy
Our first question, the within-treatment activity preferences of clients, was examined primarily through descriptive statistics. Differences in C-NIP scale scores across demographic groups were tested using analysis of variance, with Pearson correlations to test associations with age. We adopted an alpha level of p < .05 for these tests.

Questions 2 and 3: the effects of matching treatment to preferences, and of preferences themselves
The principal dependent variables that we used to analyze Questions 2 and 3 were changes on our outcome measures over the course of psychotherapy. These changes were analyzed using multilevel modelling, with session-by-session outcome scores (level-1) nested within clients (level-2). A three-level model, with clients nested within therapists, was considered. However, this was rejected as variance across therapists on all outcomes and on changes in outcomes over time did not add significantly to model fit. We conducted a single-level regression analysis for therapeutic alliance (ARM-5) scores at session 10. Again, a more complex model was considered -with clients nested within therapists -but was rejected as it did not add significantly to model fit.
As our analysis of the effects of matching (Question 2) aimed to test a specific hypothesis across multiple measures, we used a Bonferroni-corrected alpha of p < .0071 (.05/7 dependent variables). However, our examination of the effects of preferences, per se (Question 3), was exploratory, and therefore we retained an alpha level of p < .05. We also used an alpha of p < .05 in our preliminary model development when deciding whether or not to retain demographic and therapeutic approach variables in our models.
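The corrected threshold above is simple arithmetic, shown here as a minimal sketch. The function name is hypothetical; the inputs (.05 and seven dependent variables) come from the text.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha under a Bonferroni correction."""
    return family_alpha / n_tests


# Question 2: one hypothesis tested across seven dependent variables
threshold = bonferroni_alpha(0.05, 7)
print(f"{threshold:.4f}")  # 0.0071
```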

Misfit index.
To assess the effects of matching treatment to preferences, we developed a misfit index for each client. This was the degree of discrepancy between (a) the client's preferences, and (b) their assigned treatment approach, as rated by supervisors on the TSQ. To calculate each client's misfit index, we first standardized clients' scores (across clients) on each C-NIP scale. Using data from the TSQ, we then also calculated standardized scores for each therapeutic approach on each scale. A raw misfit score was then calculated for each client on each scale, based on the difference between their (standardized) preference, and the (standardized) mean rating of the therapeutic approach that they were allocated to. Next, we squared all our raw misfit scores and took their square roots, so that a misfit in either direction would have a positive value. Finally, we summed the misfit scores across the four scales to yield our overall misfit index.
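The steps above can be sketched in code as follows. This is an illustration under stated assumptions rather than the analysis script used in the study: the function and argument names are hypothetical, approach ratings are assumed to be standardized across the approaches, sample standard deviations are assumed, and the toy values are not study data.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize a list of values (sample SD is an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def misfit_indices(client_prefs, client_approach, approach_ratings):
    """
    client_prefs:     {scale: [preference score per client]}
    client_approach:  [allocated approach per client]
    approach_ratings: {scale: {approach: mean supervisor rating}}
    Returns one overall misfit index per client.
    """
    totals = [0.0] * len(client_approach)
    for scale, scores in client_prefs.items():
        z_clients = zscores(scores)
        names = list(approach_ratings[scale])
        z_approach = dict(zip(names, zscores([approach_ratings[scale][a] for a in names])))
        for i, approach in enumerate(client_approach):
            raw = z_clients[i] - z_approach[approach]
            totals[i] += (raw ** 2) ** 0.5  # square, then square root: unsigned misfit
    return totals


# Toy example with a single scale and three hypothetical approaches
prefs = {"TD-CD": [-1, 0, 1]}
ratings = {"TD-CD": {"A": 1.0, "B": 2.0, "C": 3.0}}
print(misfit_indices(prefs, ["A", "B", "A"], ratings))  # [0.0, 0.0, 2.0]
```

Squaring and then taking the square root is equivalent to taking the absolute value, so a client who prefers much more or much less of a style than their allocated approach offers receives the same positive misfit.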

Dummy coding of categorical variables.
Categorical variables were dummy coded as follows: ethnicity (Black and Minority ethnic [BME]/Mixed/other, White; reference category = not stated); disability (disabled, not disabled; reference category = not stated); sexual orientation (gay/lesbian, bisexual; reference category = not stated); and psychotherapeutic approach (Gestalt, TA, humanistic, integrative; reference category = person-centered).
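As a minimal sketch of this coding scheme, each non-reference category becomes a 0/1 indicator and the reference category is represented by zeros on all indicators. The function name is hypothetical; the category labels follow the text.

```python
def dummy_code(value, levels, reference):
    """Return a 0/1 indicator for each non-reference level."""
    if value != reference and value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    return {level: int(value == level) for level in levels}


APPROACH_LEVELS = ["Gestalt", "TA", "humanistic", "integrative"]

print(dummy_code("TA", APPROACH_LEVELS, reference="person-centered"))
# {'Gestalt': 0, 'TA': 1, 'humanistic': 0, 'integrative': 0}
print(dummy_code("person-centered", APPROACH_LEVELS, reference="person-centered"))
# {'Gestalt': 0, 'TA': 0, 'humanistic': 0, 'integrative': 0}
```

With this coding, each coefficient for an approach indicator is interpreted relative to the person-centered reference group.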

Multilevel analysis.
Procedures for the multilevel analyses followed guidelines proposed by Hox (2010; Hox & Maas, 2005) and Singer and Willett (2003), and were conducted using MLwiN (version 3.02) software with the default iterative-generalized least-squares (IGLS) method of estimation. All linear predictor variables, aside from our TIME variable, were centered around their grand mean, to ensure interpretability of interaction effects. To examine whether assumptions of normality and linearity had been met, graphs of level-1 and level-2 residuals by rank, and by fixed part predictions, were inspected for the final models (Hox, 2010).
SPSS curve estimation indicated a logarithmic relationship between session number and changes in session scores. Hence, we used a natural log transformation of our session numbers as our TIME variable.
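The transformation can be sketched as below. The function name is hypothetical, and the sketch assumes sessions are numbered from 1; how an assessment or "session 0" time point was handled is not stated in the text.

```python
import math


def time_variable(session_number):
    """Natural log of the session number (assumes sessions are numbered from 1)."""
    if session_number < 1:
        raise ValueError("log transform requires session numbers of 1 or more")
    return math.log(session_number)


for session in (1, 2, 5, 10, 20):
    print(session, round(time_variable(session), 2))
```

On this scale, the step from session 1 to session 2 equals the step from session 10 to session 20, which is how the model represents change that is steeper early in therapy and flattens later.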
To establish predictors for each of our outcome scores, we first established unconditional means models. Next, for each outcome, an unconditional growth model was established, which introduced the TIME predictor into the model.
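In conventional two-level growth-model notation (the symbols here are ours and are not taken from the study's output), these two baseline models can be written as follows, where $Y_{ti}$ is the outcome score for client $i$ at occasion $t$, $u_{0i}$ and $u_{1i}$ are client-level random effects, and $e_{ti}$ is the occasion-level residual. The random slope $u_{1i}$ is shown on the assumption that change over time is allowed to vary across clients.

```latex
% Unconditional means model: scores vary around a grand mean
Y_{ti} = \beta_{00} + u_{0i} + e_{ti}

% Unconditional growth model: TIME is the natural log of the session number
Y_{ti} = \beta_{00} + \beta_{10}\,\mathrm{TIME}_{ti} + u_{0i} + u_{1i}\,\mathrm{TIME}_{ti} + e_{ti}

% Distributional assumptions
e_{ti} \sim N(0, \sigma^2_e), \qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim N\!\left(\mathbf{0}, \Omega_u\right)
```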
To control for individual characteristics, we then entered into the model, as a block, our client-level demographic variables: gender, age, ethnicity, disability status, and sexuality. Each of these independent variables (IVs), along with the IV × TIME interaction, was tested individually; all significant predictors within this block were then tested together, with predictors removed if they were no longer significant. With significant predictors retained, we then went on to test a second block of IVs, therapeutic approach, using a similar strategy. Our third block of IVs was client preferences on the four C-NIP scales, tested in a similar way to our two previous blocks (Question 3). Finally, we tested whether misfit indices and, most importantly, the misfit indices × TIME interactions, would add significantly to model fit (Question 2). To complete the modelling process, a composite model was established, the contribution of each individual predictor was re-examined, and predictors were removed if they no longer attained significance. Where interactions with TIME were significant, direct effects were always retained in the model (Hox, 2010), so that the interactions could be meaningfully interpreted.
Modeling for our therapeutic alliance variable was similar except that interactions with TIME were not entered as we focused specifically on ARM-5 scores at session 10.
"Significance" in this modelling process was assessed in two ways. The first was inspection of the predictor's parameter value against the standard error for this value (the "single parameter test," Singer & Willett, 2003). The second was the likelihood ratio test, which compares the deviance statistic (an indicator of model fit) between a model and a version of that model with additional parameters, based on a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the models (Hox, 2010).

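The two significance checks described above can be sketched as follows. This is an illustration, not MLwiN's implementation: the function names are hypothetical, and the single parameter test is assumed here to use a standard normal reference distribution for the ratio of the estimate to its standard error.

```python
import math


def single_parameter_p(estimate, std_error):
    """Two-sided p-value for estimate / SE against a standard normal (an assumption)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution with integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    y = x / 2
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case)
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


def likelihood_ratio_test(deviance_simple, deviance_complex, extra_params):
    """Return the drop in deviance and its p-value on extra_params degrees of freedom."""
    diff = deviance_simple - deviance_complex
    return diff, chi2_sf(diff, extra_params)


# Hypothetical deviances: adding one parameter lowers the deviance by 3.841
diff, p = likelihood_ratio_test(1003.841, 1000.0, 1)
print(round(diff, 3), round(p, 3))  # 3.841 0.05
```

A predictor would be judged to improve model fit when this p-value falls below the alpha level in force for the question being tested.
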
Sensitivity analyses.
We conducted several sensitivity analyses (using the same alpha cutpoints) to assess the robustness of our findings for Questions 2 and 3. First, we used single-level linear regression modelling on change from baseline to endpoint to see whether a focus on final outcomes, as opposed to session-by-session change, would produce similar results. Second, as matching to strong preferences has been hypothesized to be the key determinant of outcomes (Cooper & Norcross, 2016; Cooper et al., 2019), we conducted 2-way univariate analyses on each of our change and alliance dependent variables to see whether there was an interaction between (a) strong preferences on the C-NIP dimensions, and (b) therapeutic approaches most strongly suited to those preferences (as rated on the TSQ). Two further sensitivity analyses were conducted on our multilevel models to address potential limitations in our design. First, as we were concerned that supervisors who did not practice, or supervise, particular psychotherapeutic approaches might be inaccurate raters of those approaches (on the TSQ), we re-calculated our misfit indices using only the ratings of supervisors of the targeted approaches. Second, as the internal consistency of our C-NIP dimensions was less than optimal, we re-conducted our analyses on the first three C-NIP subscales with re-calculated scores, based only on items that, together, showed internal consistency of α > .70.

Preliminary analysis: supervisors' ratings of psychotherapeutic approaches
The supervisors' ratings of the five psychotherapeutic approaches on the C-NIP dimensions are presented in Table 2 (with ratings for supervisors of the targeted approaches, only, available online).
On the TD-CD dimension, ratings across all supervisors ranged from 1.6 (person-centered) to 3.6 (TA); on the EI-ER dimension, from 3.3 (TA) to 4.1 (Gestalt); on the PaO-PrO dimension, from 2.5 (person-centered) to 3.7 (integrative); and on the WS-FC dimension, from 3.0 (TA) to 4.1 (person-centered).

Model development
Final models for our seven outcome variables are presented in Table 3. Visual inspection of residuals suggested that the criteria for normality and linearity had been met.

Preliminary model development
All outcome measure scores showed significant improvements over time, with both intercepts and improvements over time showing the greatest contributions to model fit when allowed to vary randomly at the client level.
With respect to demographic characteristics, our most consistent finding was that disabled clients had higher overall scores on all measures of distress than non-disabled clients. This ranged from 14.42 points higher on the IES-R (for clients coded as "disabled" versus those not coded as "disabled") to 2.23 points higher on the WSAS (for clients not coded as "not disabled" versus those coded as "not disabled"). In addition, on five of the six outcome measures, White clients had lower levels of overall distress than BME clients. This ranged from 4.60 points lower on the IES-R to 1.24 points lower on the GAD-7 (for clients coded as "White" versus those not coded as "White"). However, there was no evidence that disability status or ethnicity was related to changes in outcomes over time. Clients coded as "heterosexual" showed greater reductions in depression (PHQ-9) and anxiety (GAD-7) over time, as compared with those not coded as "heterosexual"; and improvements over time were less for clients coded as "bisexual" on the IES-R. On three of the outcome measures, clients coded as "heterosexual" also showed higher overall scores. Males showed lower levels of anxiety than females on the GAD-7 (b = −1.36), but also poorer alliances at session 10 on the ARM-5 (b = −1.36). Older age was associated with lower overall scores on the WSAS (b = −0.08), but also with less reduction over time (b = 0.02).
Therapeutic approach did not contribute significantly to fit on any of our seven models.

Association between preferences and outcomes (Question 3)
Higher scores on the WS-FC dimension were significantly associated with lower improvements over time on four of the six outcome measures, ranging from b = 0.04 (PHO) to b = 0.08 (CORE-10). This means that clients who indicated a preference for warm support at assessment tended to improve less than those who indicated a preference for focused challenge. Suppose, for instance, that two clients both scored 13 on the PHQ-9 at assessment (the sample mean), but one scored 10 on the WS-FC dimension, and the other scored −10. Holding all other variables constant, our model would suggest that, by session 20, the client preferring warm support would have improved by 4.3 points, while the client preferring focused challenge would have improved by 7.9 points. Two further sets of findings suggested that clients who wanted more active input from their psychotherapists tended to improve more over time. First, on the GAD-7 (b = −0.05) and the PHO (b = −0.04), a greater desire for therapist directiveness (TD-CD) was associated with more improvements. Second, on the WSAS, a greater desire for emotional intensity (EI-ER) was associated with more improvement over time (b = −0.09).
Note to Table 3. u0j = random variance for intercept, u1j = random variance for TIME. Square brackets indicate non-significant direct effects entered to make significant interactions interpretable. Higher numbers on all outcome scores indicate greater distress. Higher numbers on ARM-5 indicate greater alliance.
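To make the two-client PHQ-9 comparison above concrete, the following minimal sketch shows how a preference-by-time interaction produces different predicted improvements for clients with the same baseline score. It assumes, for simplicity, that TIME is linear in session units and that no other covariates are involved. The two coefficients (B_TIME and B_INTERACTION) are illustrative values back-solved so that the sketch reproduces the reported 4.3 and 7.9 point figures; they are not the Table 3 estimates, and the published model's coding of TIME and its covariates may differ.

```python
# Illustrative values only (assumption): back-solved from the two reported
# improvements under a linear-in-sessions simplification, not Table 3 estimates.
B_TIME = -0.305          # change in PHQ-9 per session at a WS-FC score of 0
B_INTERACTION = 0.009    # extra change per session for each WS-FC point


def predicted_score(baseline, session, preference, b_time, b_interaction):
    """Predicted outcome score at a given session.

    The per-session slope is b_time + b_interaction * preference, so a
    positive interaction means higher preference scores (towards warm
    support) flatten the downward trajectory.
    """
    return baseline + (b_time + b_interaction * preference) * session


def predicted_improvement(session, preference, b_time, b_interaction):
    """Drop in score from session 0 to `session` (positive = improvement)."""
    return -(b_time + b_interaction * preference) * session


if __name__ == "__main__":
    for label, wsfc in (("warm support", 10), ("focused challenge", -10)):
        gain = predicted_improvement(20, wsfc, B_TIME, B_INTERACTION)
        end = predicted_score(13, 20, wsfc, B_TIME, B_INTERACTION)
        print(f"{label}: improvement {gain:.1f}, PHQ-9 at session 20 {end:.1f}")
```

Under these assumptions, the 20-point difference in WS-FC scores accounts for the whole 3.6-point gap in predicted improvement (7.9 − 4.3), since both clients share the same baseline and the same underlying time slope.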
A greater desire for therapist directiveness (TD-CD) was associated with higher levels of overall distress on five of our outcome measures, ranging from b = 0.15 (PHQ-9) to b = 0.57 (IES-R).

Association between misfit and outcomes (Question 2)
In the analysis of our main hypothesis (Question 2), total misfit between preferences and allocated therapeutic approach was not associated with change on any of our outcome measures over time. Single parameter scores (b/SE) ranged from 0.42 to −0.68. Total misfit was also not significantly associated with therapeutic alliance on the ARM-5, b = .21, SE = 0.11.

Sensitivity analyses
Results from our separate linear regression models were substantially the same as from our multilevel analyses (see online material). Misfit was not associated with change on any outcomes; and clients who preferred focused challenge over warm support showed significantly greater change on the PHQ-9 (b = −0.11), CORE-10 (b = −0.19), and PHO (b = −0.11). In our categorical analyses, we did not find any evidence that clients with strong preferences on the C-NIP dimensions did better in the therapeutic approaches matched to those preferences than clients with strong preferences in the opposite directions. No significant matching effects were found when we used only the TSQ ratings from supervisors of the targeted therapeutic approaches. We also found no significant matching effects on the outcome measures when we used the shortened C-NIP scales with internal consistency > .7. However, when only these scores were used, we found a significant positive association between misfit and alliance on the ARM-5, b = .40, SE = .14, p = .002 (indicating that more misfit was associated with a greater alliance at session 10). The latter sensitivity analyses also showed no substantial deviations from the relations between preferences and outcomes shown in Table 3.

Discussion
Perhaps the most novel finding from our study (in answer to Question 3) was that clients who expressed a preference for focused challenge (and other, more active therapist activities) tended to do better in psychotherapy than those who expressed a preference for warm support. One possible explanation for these findings is that clients' preferences for more active therapist inputs are indications of their motivation, engagement, or readiness to change: client factors that are known to be amongst the most important determinants of improvement (Bohart & Wade, 2013). Clients who express a preference for challenge and confrontation, for instance, may be more likely to be in the "preparation" or "action" "stages of change" (Krebs et al., 2018;Prochaska & DiClemente, 1986), while those wanting warm support may be still in the "contemplation" or "precontemplation" stages. A related interpretation is that a client's desire for warm support could indicate a hope, or expectation, of therapist sympathy, rather than a drive towards personal transformation. It might also reflect a greater dependency on the therapist.
Another possible explanation is that clients who desire focused challenge may tend to enter psychotherapy with problems that are specific and solvable, or with change that is measurable on standard outcome tools; whereas clients who desire warm support may tend to enter psychotherapy with more amorphous and difficult-to-measure problems. Such differences may reflect clients' problems being at different stages of assimilation (e.g., Stiles, 2011). Unassimilated problems are experienced as amorphous, and the process of bringing them into clearer awareness may not immediately reduce distress, even though it is therapeutically necessary, according to the assimilation model (Basto et al., 2017). Supportive, rather than challenging, therapeutic approaches may be a better fit for facilitating this uncovering process (Stiles et al., 1992;Stiles, Shankland, Wright, & Field, 1997).
Our findings here, however, are limited by the low internal consistency of the C-NIP subscales (in particular the WS-FC dimension), and by the relatively narrow range of therapies (humanistic/relational) within which these effects were identified. Future research would benefit from studying the relationship between preferences and change in other therapeutic approaches such as CBT. Development of more reliable means of assessing within-treatment activity preference is also essential for further studies.
In answer to our Question 2, and in contrast to the recent meta-analyses (Lindhiem et al., 2014; Swift et al., 2019; Windle et al., 2019), we found no evidence of a preference matching effect. (Indeed, the one nominally significant finding, from our sensitivity analyses, indicated that greater misfit was associated with a stronger therapeutic alliance.) Our null findings concur, however, with the more closely similar previous studies (e.g., Kerns et al., 2014; Kludt & Perlmuter, 1999) in suggesting that matching on within-treatment activity preferences does not lead to differentially positive benefits. As one way to understand this, psychotherapists and clients may have been involved in an ongoing, interactive, and implicit process of adjustment and responsiveness to the client's preferences within the framework of the therapist's approach, with both parties playing an active and agentic role. To the extent that this responsive accommodation was successful, each client may have received an individually optimized treatment, overcoming any effects of mismatching. In effect, the responsiveness may have tended to minimize the actual levels of misfit. In addition to their clinical sensitivity, the trainees had access to the clients' C-NIP scores (though it is unclear how much they examined them), and they may have tailored their work to fit the clients' preferences.
There are also several important limitations to our study design that could explain our null result, even in the absence of responsiveness effects. First, our assessment was based on ratings of named therapeutic approaches rather than direct observations of practices, and our TSQ measure was novel and has not been separately validated. Second, all of the treatments were delivered by early-stage trainees, so their results may not be representative of these treatments as usually practiced. Third, our range of therapeutic approaches was relatively narrow: all were based on relational and humanistic principles, and some major approaches were unrepresented (for instance, there was no CBT). Fourth, as noted earlier, the internal consistency of our preference scales was limited. Fifth, we did not assess dropout, which has been found to have the clearest association with preference accommodation to date (Swift et al., 2018). And sixth, allocation to therapeutic approach was not random.
Non-significant findings may also have come about because we looked at preference matching effects across all clients, rather than specifying the particular clients for whom matching effects may have been most impactful. Norcross and Cooper (2021), for instance, hypothesized that matching effects are likely to be greatest when individuals have high levels of motive congruence: where their explicitly recognized wants and needs match their implicit wants and needs (Thrash et al., 2012). Such motive congruence has been found to be higher in self-determining individuals, and in people who are more sensitive to their bodily states and less monitoring of others' expectations (Thrash et al., 2012). In addition, motive congruence is closely associated with domain familiarity: how knowledgeable people are about a particular context (Lichtenstein & Slovic, 2006). Hence, matching effects might be hypothesized to be greatest in clients who have previous knowledge, and experience, of psychotherapy; while those new to the domain might be predicted to have less understanding of what will be most suited to them.
Subsequent studies of within-treatment activity preference matching effects, therefore, might profitably (a) directly assess therapeutic practices, using (b) validated instruments, with interventions delivered by (c) experienced psychotherapists, who are (d) randomly allocated and (e) providing a wide diversity of therapeutic approaches. In addition, there should be assessment of (f) dropout and (g) client variables, in particular previous experience of psychotherapy.
In answer to our Question 1, the distributions of activity preferences of clients in our UK clinical sample showed only modest differences from those of the representative sample of the UK and US general public. On the TD-CD subscale, around 40% in each sample showed a strong preference for therapist directiveness, with less than 5% showing a strong preference for client directiveness. On the EI-ER subscale, we found that a higher percentage of our clinical sample showed a strong preference for emotional intensity (35.9%) as compared with the general public sample (20.0%). On the PaO-PrO dimension, the present sample showed a more even split between strong preferences for past and present orientation (22.7% and 24.8%, respectively) as compared with the general public sample (19.7% and 38.4%, respectively). On the WS-FC dimension, we found a higher percentage of participants showing strong preferences for focused challenge (32.6% vs 9.4% for warm support), as compared with the general public sample (23.2% vs 25.4% for warm support). Thus, these results converge with findings from previous studies to provide robust evidence that people, whether members of the general public or clients about to embark on psychotherapy, tend to want direction from a psychotherapist in the form of skills, goals, and structure; and encouragement to express their emotions.
Two of our significant demographic predictors of C-NIP scores were consistent with previous findings (Cooper et al., 2017). First, BME clients showed a higher preference for emotional intensity than non-BME clients. It is not clear why this is the case. Indeed, some BME, collectivist cultures place greater emphasis and value on low emotional arousal as compared with more individualist, Western cultures (Lim, 2016). Second, females showed a greater preference for warm support over focused challenge, as compared with males. This finding would seem to be consistent with social role theories of sex differences (e.g., Eagly & Wood, 2016), by which men are expected to act in more dominant, agentic, and independent ways (Koenig, 2018).
As with all our analyses, these findings are limited by the low internal consistency of the C-NIP scales. A priority for further research, therefore, is to ensure that activity preferences are measured using indicators of proven reliability. The exclusion of certain clients from our sample also means that these findings cannot be generalized to the full population of clients, in particular those with more severe and enduring mental health problems. Nevertheless, given the importance of tailoring treatment to the specific needs and wants of particular cultures, genders, and other demographic groupings, preference differences across demographic characteristics remain an essential area for further inquiry.
Among this study's implications for practice, psychotherapists should be aware that most – though not all – clients come to treatment wanting direction from their psychotherapists and encouragement to express strong feelings. They should also be aware that female clients may want more warm support than males, and that BME clients may have a particular preference for emotional intensity. Our study did not provide evidence that outcomes and alliance are improved if psychotherapists accommodate such preferences; but other findings (Lindhiem et al., 2014; Swift et al., 2019; Windle et al., 2019), as well as ethical considerations such as the desire to support client autonomy, suggest that psychotherapists should be cognizant of such tendencies. On the other hand, we also found considerable diversity across clients' preferences, supporting a view that psychotherapists should never assume what an individual client's preferences are, but engage in a process of preference assessment and dialogue (Norcross & Cooper, 2021; Schmid, 2002). In assessing client preferences, either through measures or through dialogue, it may be important to take note when clients are expressing strong preferences for support over challenge or are in other ways indicating a desire for a less active psychotherapist stance. Our findings suggest that such clients may be at particular risk of poor outcomes. Talking to clients about the meaning of such preferences may be important, particularly to ascertain whether they are linked to poor motivation for treatment or a precontemplation stage of change.

Conclusion
Our study is the first to systematically explore the activity preferences of clients at the commencement of psychotherapy. Consistent with previous findings from general public samples, we found that clients generally prefer more directive and emotionally intense activities; with females showing a greater preference for warm support than males, and BME clients showing a greater preference for emotional intensity than non-BME clients. We did not find evidence of a preference matching effect, though methodological limitations in our study make any conclusions here premature. Of greatest importance, perhaps, we found an association between a preference for focused challenge and improved psychotherapy outcomes. This result is consistent with research on client factors in psychotherapy and, whilst requiring further empirical support and exploration, may have important implications for the assessment of readiness for – and likelihood of – change.