A meta-analysis of the reliability of the Sexual Self-Esteem Inventory in Women (SSEI-W) measure

ABSTRACT In 1996, Zeanah & Schwarz proposed a new instrument for measuring sexual self-esteem in women (SSEI-W). This 81-item measure is multidimensional, allowing the calculation of both an overall score and scores for five subscales. Although the measure’s reliability was originally established in a student sample, it has since been used broadly with general population and clinical samples, and also in other cultures. Therefore, we examine the reliability of the SSEI-W, based on Cronbach’s alpha, via random effects meta-analyses and explore which aspects could impact the reliability of the scale. Our results showed that while there is substantial heterogeneity, the overall measure shows very good reliability. There was little evidence that sample characteristics impacted the overall reliability of the SSEI-W, though, as expected, shortened versions produced lower reliabilities. Good to very good reliabilities were also found for all the subscales. We discuss directions for further research with the SSEI-W.


Introduction
Sexuality is an important part of human experience. Early psychological research into sex tended to focus on attitudes towards sex and sexual behaviours (e.g. Kinsey et al., 1948, 1953; Robinson, 1976). However, as with many social phenomena, an individual's view of their own sexuality and sexual practices can influence these behaviours. Thus, Zeanah and Schwarz (1996) developed the Sexual Self-Esteem Inventory (SSEI) (review in Zeanah & Schwarz, 2019). Their scale was intended to help clinicians and researchers understand how sexual self-esteem could influence individuals' sexual behaviours and well-being. In the past 24 years since the SSEI was developed, it has been used by researchers not only in a variety of contexts but also in diverse populations. Thus, the goal of the current study was to conduct a meta-analysis on the reliability of the SSEI and its subscales using reliability measures reported for the different populations in these studies.

Sexual self-esteem inventory
The creators of the Sexual Self-Esteem Inventory highlighted the need for such a scale because findings from research on global self-esteem and sexuality were mixed, and a general measure of self-esteem may not be sensitive enough to capture differences in sexual self-esteem. In the original paper, they focused on women's sexual self-esteem because there are societal norms about sex that could influence men's and women's responses to the measure. In a later review, the authors of the original paper do report an unpublished paper, arguing that the measure can also be used with samples of men (Zeanah & Schwarz, 2019). Based on theory about the factors that influence an individual's view of their own sexuality, the authors proposed five separate domains of sexual self-esteem and created subscales to measure each. The skill and experience subscale measures individuals' ability to please or be pleased by a partner and their opportunities for sexual interactions. The attractiveness subscale refers to an individual's feelings about their own body and their sexual appeal. It is important to note that this subscale refers to one's satisfaction with the body as a whole rather than specific body parts and thus is gender neutral. The control subscale measures how much control individuals feel over their sexual thoughts, feelings, and behaviour. The moral judgment subscale refers to whether a person's sexual activities are morally acceptable in their own eyes. Finally, the adaptiveness subscale measures to what extent individuals are satisfied with their sexual relationships because these relationships meet their goals and needs. This five factor model was supported by a principal component factor analysis (Zeanah & Schwarz, 2019). The authors of the original paper found that the SSEI had good convergent validity.
They found that the attractiveness, skill/experience, control, and adaptiveness subscales positively correlated with frequency of dating, sexual experience, and relationship commitment. They also found that sexual guilt was positively correlated with the moral judgment subscale and that the number of sexual partners participants reported was negatively correlated with the control subscale. There was also some evidence for divergent validity. When they examined the Rosenberg Self-Esteem Scale, which measures general self-esteem, they found that it was only weakly correlated with the outcome variables, whereas the subscales of the SSEI were moderately correlated with the outcome variables.

SSEI's use in research
Since its development the SSEI has proved useful in many different research contexts. The SSEI has been particularly helpful in studying the antecedents and consequences of sexual behaviour. For example, it has been used to study university age women who engage in hook-up culture in the United States and how their sexual self-esteem relates to their sexual practices (Dave, 2011; Evans, 2013; McLeese, 2015). It has also been used to study the consequences of childhood or adolescent sexual assault on adult views on sexuality (Faulkner, 2011; Kelley & Gidycz, 2015; Krahé & Berger, 2017a). The link between SSEI and sexual communication has equally been a topic of interest (Oattes & Offman, 2007; Rosenfeld, 2004). The SSEI has also been used by media researchers, examining why certain people engage with different types of media, such as romance novels (Reese-Weber & McBride, 2015) or dating apps (Tomaszewska & Schuster, 2019). Finally, it has been used in research not directly connected to sex or romantic relationships, for example, in understanding how weight loss (Barghi et al., 2017) or the desire for cosmetic surgery (Toussi & Shareh, 2018) influence sexual self-esteem.
Not only has the SSEI been used to answer varied research questions, it has also been used in diverse populations. The SSEI has been translated into multiple languages and used in several different countries including Iran, Germany, Poland, Chile, and Turkey. Furthermore, the scale has been used in both clinical samples and nonclinical samples. Clinical samples include teens in treatment for mental health issues (Swenson et al., 2012), women in treatment for sexual violence induced PTSD (Bornefeld-Ettmann et al., 2018), and women in treatment for depression (Krahé & Berger, 2017b). The scale has also been used with diverse non-clinical samples, such as sex-workers (Shareh, 2016), men who sleep with men who are HIV positive (Pando, 2015), and women who struggle with weight issues (Barghi et al., 2017;Jafari et al., 2016). Additionally, it continues to be used with university student samples: the population on which it was originally tested and validated. The original authors of the scale state that gender, age, and other sociodemographic variables could potentially influence how participants interpret the items and view each factor of sexual self-esteem included in the SSEI. For example, researchers who used a population of men who sleep with men found that they had higher scores on perceived attractiveness than heterosexual female college students, but lower scores than heterosexual male college students (Pando, 2015). Thus, it would seem pertinent to re-examine the reliability of the scale based on diverse samples from around the world to examine if the reliability systematically varies according to socio-demographic attributes of the sample.

Commonly reported measures of reliability
The reliability of a scale can be defined as how consistently a scale measures a specific construct, either over time or across all items in the scale (Cronbach, 1951). In the seminal paper describing Cronbach's alpha (α), Cronbach points out that reliability over time and reliability across items are useful for different purposes. Reliability over time is more concerned with stable constructs that we do not expect to change over time within individuals, while reliability across items is about measuring a core construct. Therefore, the use of one form of reliability over another depends on one's research question. Sexual self-esteem is posited to change over time as individuals receive positive or negative feedback (Zeanah & Schwarz, 1996); thus a measure of internal reliability is more appropriate than test-retest reliability. Cronbach's α measures internal reliability by calculating the mean of all possible split-half correlations. This means that the items are split in half in all possible combinations and correlated, and thus Cronbach's α can be interpreted similarly to a correlation, even though the mathematical derivation is different. Scores closer to 1 indicate higher internal consistency. Perhaps due to the ease of interpretation and simplicity of calculation, Cronbach's α is the most frequently reported measure of reliability for scales in psychology (Dunn et al., 2014), even though it is not without strong limitations (e.g. Schmitt, 1996; Sijtsma, 2009). Because it is so commonly reported, we have decided to use Cronbach's α as our measure of reliability in the current meta-analysis, in the hope that studies using the SSEI will at minimum have reported Cronbach's α.
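As an illustration of how α is obtained from raw item responses, the following minimal sketch uses the usual variance-based computational formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the toy responses are hypothetical and are not taken from any of the included studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total (sum) score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents x 4 items (illustrative data only)
toy_scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(toy_scores)
```

In this toy example the items rise and fall together across respondents, so the variance of the total score is large relative to the summed item variances and α is close to 1.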
When the scale was originally developed, the researchers calculated Cronbach's α for each subscale rather than for the total scale. For the skill and experience subscale, made up of 18 items, Cronbach's α was reported as .93. For the 17-item attractiveness subscale, Cronbach's α was .94. The 16-item control subscale was slightly less consistent with a Cronbach's α of .88. The moral judgment subscale, consisting of 19 items, had a Cronbach's α of .85. Finally, the 15-item adaptiveness subscale had a Cronbach's α of .90. Thus, the items in each subscale are strongly interrelated and the individual subscales demonstrate good internal consistency.
In our investigation, we hope to see similarly high values for Cronbach's α. However, there are several factors that can influence α. The most important is the strength of correlations between items, which is the measure of internal consistency that is of interest. The second is the dimensionality of a scale. Essentially, Cronbach's α treats variability due to items correlating with uncorrelated subscales as error; thus scales with subscales that are weakly correlated or uncorrelated tend to have lower α's. This may be why the authors of the original SSEI only reported α for each individual subscale. Finally, α can be influenced by the number of items in the scale (up to 19 items) (Cortina, 1993). This becomes evident if we consider that the effect of one bad item (weakly correlated with other items) is watered down when it is combined with more items that are strongly correlated. Thus, the more items in the scale, the higher our standard for a good value of α should be. Conversely, in studies using a short form of the SSEI, we expect slightly lower α values.
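The dependence of α on scale length can be sketched with the standardised form of α, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation. The value r̄ = .30 and the item counts below are purely illustrative assumptions, chosen only to show the direction of the effect.

```python
def standardized_alpha(k, mean_r):
    """Standardised alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical mean inter-item correlation, held constant across scale lengths
MEAN_R = 0.30
item_counts = [5, 15, 35, 81]
alphas_by_length = [standardized_alpha(k, MEAN_R) for k in item_counts]
```

With the inter-item correlation held fixed, α rises steadily with the number of items, which is why a shortened form is expected to show a somewhat lower α even when its items are equally well interrelated.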

Current study
Our aim was to verify the reliability of the Sexual Self-Esteem Inventory and examine its reliability in diverse populations from around the world. We followed the PRISMA guidelines to gather studies that had used the SSEI, based on the criteria that they had used at least one of the SSEI subscales and reported Cronbach's α. We used Cronbach's α as a measure of reliability and conducted a random effects meta-analysis (Vacha-Haase, 1998) to estimate an overall reliability value for the measure.

Data collection
The study was registered on the Open Science Framework (OSF, https://osf.io/54q6w/) and follows the PRISMA guidelines where applicable. PRISMA is a set of evidence-based guidelines/items which aids in the reporting of meta-analyses and systematic reviews (Moher et al., 2010). We deviate in some cases from this form as the PRISMA guidelines are designed for randomised controlled trials, rather than the study of reliability. The PRISMA Flow Chart used to select studies can be seen in Figure 1.
A sample of 213 studies was identified through various databases including Google Scholar (N = 99), Scopus (N = 50), Sage Publications (N = 36) and through inter-library loans (N = 28). These papers were identified by searching for articles that cited the original reference (Zeanah & Schwarz, 1996), in any language, regardless of any item modification (though it appears that none of the articles explicitly report modifying individual items) and regardless of whether the whole scale or a subscale was used (e.g., Walsh et al., 2013). Peer-reviewed articles, PhD dissertations, and Masters theses were included if they met the selection criteria. Of the 213 records identified, 114 (53.52%) were discarded as duplicates, which left 99 studies to filter through. One study was excluded because the paper was not accessible, having been removed from the database. This left 98 studies which were assessed for eligibility; 52 (53%) were excluded as they only referenced the original paper but did not use the scale. Ten of the eligible 46 studies utilised the scale but did not report the required Cronbach's α. We contacted these authors where possible, but were unable to include these studies as we could not calculate an α for our analyses. This left 36 studies in the sample. Two papers were derived from the same sample (Krahé & Berger, 2017a, 2017b); we therefore included the one with the largest final sample size (N = 2,425 vs. N = 2,251) in our further analysis, but note that these two samples yielded identical estimates for α. These 35 samples represented 13,960 participants. Ten of these 35 studies did not report age (in years); for the remainder, the estimated weighted average age is M = 25.90 years (SD = 8.45).
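The screening steps described above can be tallied as a simple arithmetic check. The counts are those reported in the text; the variable names are ours and purely illustrative.

```python
# Records identified per source, as reported in the text
identified = {
    "Google Scholar": 99,
    "Scopus": 50,
    "Sage Publications": 36,
    "inter-library loans": 28,
}

total_identified = sum(identified.values())   # all records found
after_duplicates = total_identified - 114     # duplicates removed
assessed = after_duplicates - 1               # one inaccessible paper
used_scale = assessed - 52                    # cited but did not use the scale
reported_alpha = used_scale - 10              # no Cronbach's alpha available
independent_samples = reported_alpha - 1      # two papers shared one sample

# Share of records that were duplicates, in percent
duplicate_share = round(100 * 114 / total_identified, 2)
```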

Coding of sample characteristics
The sample characteristics were coded for each study in which they were present. They included: (a) sample size; (b) mean age; (c) percent female; (d) type of sample: general population, student or clinical; (e) geographical location, i.e. where the study was conducted, which was coded via ISO codes (three letter codes documenting the country where the data were collected); (f) percent heterosexual; (g) percent in a romantic relationship. These were chosen for exploratory purposes and description of the samples. The choice of these sample characteristics is similar to other meta-analyses of reliability (e.g. Graham & Christiansen, 2009; Rouse, 2007).

Analytical strategy
As Cronbach's α can be straightforwardly interpreted as a correlation coefficient (Bland & Altman, 1997), we apply Fisher's r to z transform for the analyses (e.g. Caruso, 2000; O'Rourke, 2004), but we transform the values back to r when reporting in text. Reliabilities were summarised via random effects meta-analyses with a Sidik-Jonkman estimator for τ². We also report other common measures for heterogeneity, i.e. estimates for the between study variation in α, including I²; as a crude rule of thumb, I² > 75% is deemed to be an indicator of substantial heterogeneity (Higgins et al., 2003). There are alternative methods to transform α (Bonett, 2002; Hakstian & Whalen, 1976; Rodriguez & Maeda, 2006), or one could also use the raw alpha. We opted for the Fisher's r to z transform as it is more widely employed in meta-analysis and allows us to further examine the consequences of shortening (alternative methods use the number of items in the meta-analyses). Our supplementary analyses showed little difference between any of the transformations on the fundamental conclusions (changes were largely limited to the second decimal of estimates). More generally, simulation studies suggest that different ways of constructing confidence intervals for α tend to yield negligible differences (Romano et al., 2010). We report the forest plot with 95% confidence intervals for the overall scale, which allows testing whether the estimates of α fall within Nunnally's (1978) acceptable range (.7). For the subscales the forest plots can be found on the Open Science Framework (OSF). We examined publication bias for the overall scale based on a visual check of the funnel plot and Egger's test (Borenstein et al., 2009; Egger et al., 1997). It is important to note that publication bias is but one cause for funnel plot asymmetry (Egger et al., 1997, p. 632). For the subscales, these checks for publication bias are reported in full on the OSF. These are not reported here fully in text, in part as the number of studies is problematic (Sterne et al., 2011). Similarly, we report estimates following a trim-and-fill procedure (Duval & Tweedie, 2000; Mavridis & Salanti, 2014). This non-parametric procedure (1) trims, i.e. removes, the smaller studies causing the funnel plot asymmetry, (2) uses the trimmed funnel plot to estimate the true centre of the funnel, and (3) imputes any omitted studies around the centre (filling). However, one should note the limitations of this procedure (e.g. Peters et al., 2007; Simonsohn et al., 2018). Finally, while caution must be used when interpreting fail-safe N's (e.g. Becker, 2005), we also report how many studies would need to be added for the estimated reliability to fall below .5 (Orwin, 1983).
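To make the general workflow concrete, the sketch below pools a handful of α values after Fisher's r to z transformation and back-transforms the result. It is an illustration under stated assumptions and not a reproduction of our analysis script: it uses the simpler DerSimonian-Laird estimator of τ² in place of the Sidik-Jonkman estimator, it assumes the usual sampling variance 1/(n − 3) for a Fisher-transformed coefficient, and the α values and sample sizes are hypothetical.

```python
import math


def pool_alphas(alphas, ns):
    """Random effects pooling of alphas on the Fisher z scale.

    Uses the DerSimonian-Laird tau^2 estimator (a stand-in for
    Sidik-Jonkman) and assumes variance 1 / (n - 3) for each z.
    """
    k = len(alphas)
    y = [math.atanh(a) for a in alphas]      # Fisher r to z
    v = [1.0 / (n - 3) for n in ns]          # assumed sampling variances
    w = [1.0 / vi for vi in v]               # fixed effect weights

    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)       # between study variance
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0

    w_re = [1.0 / (vi + tau2) for vi in v]   # random effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    lower, upper = mu - 1.96 * se, mu + 1.96 * se

    # Back-transform z to the alpha (r) metric for reporting
    return {
        "alpha": math.tanh(mu),
        "ci": (math.tanh(lower), math.tanh(upper)),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical alphas and sample sizes (illustrative only)
result = pool_alphas([0.92, 0.88, 0.85, 0.94], [120, 300, 64, 500])
```

Note that τ² is expressed on the z scale, whereas the pooled estimate and its confidence interval are reported after back-transformation.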
We performed a series of exploratory univariate meta-regressions to explore whether the type of sampling (Clinical/Student/General population sample), translation, shortening of the scale (No/Yes), publication year, proportion of female participants, proportion of heterosexual participants and proportion of participants in a relationship could be related to reliability. We used a permutation method with 1,000 permutations to assess the robustness of these meta-regressions (Good, 2013; Viechtbauer, 2010). In our supplementary analyses on the OSF, we report similar analyses for the subscales. These are not reported in text, as the number of studies for each of these meta-regressions was small and we, therefore, caution against attaching inferences to these. In addition, it is important to bear in mind that meta-regressions are especially likely to yield false positive results when the number of studies is low, when there are a large number of candidate predictors, and when heterogeneity is present (Higgins & Thompson, 2004). This applies to all our meta-regressions.
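The logic of a permutation test for a single moderator can be sketched as follows. This is a deliberately simplified illustration and not the procedure implemented in metafor: it uses the absolute weighted least squares slope as the test statistic, keeps the weights fixed at n − 3, and adds one to the numerator and denominator of the p-value. All data values are hypothetical.

```python
import math
import random


def wls_slope(x, y, w):
    """Weighted least squares slope of y on a single moderator x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx


def permutation_p(x, y, w, n_perm=1000, seed=1):
    """Two-sided permutation p-value for the moderator slope."""
    rng = random.Random(seed)
    observed = abs(wls_slope(x, y, w))
    hits = 0
    for _ in range(n_perm):
        shuffled = list(x)
        rng.shuffle(shuffled)  # break any link between moderator and effect
        if abs(wls_slope(shuffled, y, w)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical studies: 1 = shortened version used, 0 = full scale
shortened = [0, 0, 0, 0, 1, 1, 1, 1]
alphas = [0.93, 0.91, 0.94, 0.90, 0.86, 0.88, 0.84, 0.89]
ns = [120, 300, 64, 500, 150, 90, 210, 75]

z_values = [math.atanh(a) for a in alphas]
weights = [n - 3 for n in ns]
slope = wls_slope(shortened, z_values, weights)
p_value = permutation_p(shortened, z_values, weights)
```

Because the reference distribution is built from the data themselves, such a test does not rely on the distributional assumptions of the standard moderator test, which is useful when the number of studies is small.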
All analyses followed the PRISMA guidelines where possible (Moher et al., 2010). The PRISMA guidelines were designed with randomised controlled trials in mind, whereas our focus is on reliability; therefore not all guidelines apply. Our design and core analyses were preregistered on the OSF. On the OSF, we also present sample descriptions for subscales, additional exploratory analyses, and robustness checks (e.g. leave-one-out analysis, changing the estimator of τ², using different transformations for α (Bonett, 2002; Hakstian & Whalen, 1976)).
The core analyses were conducted in R 4.0.2 (R Development Core Team, 2008), with the packages meta and metafor (Schwarzer et al., 2015;Viechtbauer, 2010). Our data and script are available from the OSF.

Qualitative synthesis and sample description
Studies were published between 2002 and 2019. There was some geographical spread among the 35 samples but the majority of samples were from the United States (k = 18), followed by Iran (k = 5), Germany (k = 4) and Canada (k = 3). All other countries only contributed a single sample to the dataset (Chile, France, Poland, Portugal, Turkey, UK; Figure 2). Notably, there were no samples from Africa, Australasia, and East Asia. The majority of the samples relied on the original rather than a translated version (k = 30, 4 translated samples, 1 was a mixture of translated and original). Around half of the samples shortened the original scale (k = 18), shortening it to either 35 items (k = 13) or fewer items (k = 5). Three samples indicated validation of the shortened version used (Bornefeld-Ettmann et al., 2018; Farokhi & Shareh, 2014; Hannier et al., 2018). The majority of samples were classified as student samples (k = 22), followed by general population samples (k = 8), and the remainder were classified as clinical samples (k = 5). Unsurprisingly, the sample was predominantly female (82.86%, weighted average). Six samples used the SSEI-W in a sample that also contained men and one used an exclusively male sample (Pando, 2015). Of the 35 samples, 21 provided some information on sexual orientation and 16 provided some information on relationship status. The majority of participants were heterosexual (87.03%) and roughly half of them were in a relationship (50.96%).

Overall scale
Of the 35 eligible studies, 27 reported a Cronbach's α for the overall scale, totalling 11,223 participants (range: N = 64 to N = 2,425). The estimate from the random effects meta-analysis for α is .90, 95% CI [.88; .92]. Figure 3 shows the forest plot summarising the meta-analysis. Figure 3 also shows that only a single study (Santos, 2013) had a confidence interval overlapping α = .7, a value which would be considered a low level of reliability. There was, however, substantial heterogeneity, Q(26) = 804.24, p < .0001, I² = 96.8%, τ² = .08.
A visual check suggested asymmetry in the funnel plot, and this was corroborated by Egger's test (t(25) = 3.47, p = .002). Using Orwin's fail-safe N procedure (Orwin, 1983), 46 studies are necessary to reduce the reliability to .5. A trim-and-fill procedure would add 11 studies to the left of the plot (Figure 4). The revised random-effects estimate of α is .85, 95% CI [.80; .88] (Q(37) = 1610.48, p < .0001, I² = 97.7%, τ² = .20). While adjustment for potential publication bias reduces the estimated reliability, the scale is still estimated to have good reliability, because over 40 additional studies with poor reliability on the SSEI would need to be conducted to reduce the reliability to an unacceptable level. Thus, we can be relatively confident in the high estimations of reliability observed in the original studies. A univariate meta-regression relying on permutation testing (1,000 permutations) suggested that publication year was significantly related to reliability (Q(1) = 5.87, p = .013). More recent studies tended to have lower reliability (B = −.03, 95% CI [−.05; −.01]). Meta-regression also suggested that shortened versions were associated with lower reliabilities (Q(1) = 6.16, p = .014; B = −.27, 95% CI [−.48; −.06]); however, this is to be expected since longer scales tend to have higher α's (Cortina, 1993). There was no indication that the type of sample (Clinical/General/Student), proportion of women, proportion of heterosexual participants, proportion of participants in a relationship, or translation had a notable effect on the observed heterogeneity of α (Q tests for moderators: all p's > .125).
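Orwin's fail-safe N has a simple closed form, N = k × (mean effect − criterion) / (criterion − mean effect of the added studies). The sketch below applies it on the Fisher z scale and assumes that the added studies have a mean reliability of zero; both choices are assumptions made for illustration, and the inputs are the rounded values reported in the text rather than the exact values in our script.

```python
import math


def orwin_failsafe_n(k, mean_effect, criterion, added_effect=0.0):
    """Number of added studies needed to pull the mean effect down to criterion."""
    if not (added_effect < criterion < mean_effect):
        raise ValueError("expected added_effect < criterion < mean_effect")
    return k * (mean_effect - criterion) / (criterion - added_effect)


# Rounded overall-scale values from the text, transformed to Fisher z;
# the added studies are assumed to have a mean reliability of 0
n_failsafe = orwin_failsafe_n(27, math.atanh(0.90), math.atanh(0.50))
n_failsafe_rounded_up = math.ceil(n_failsafe)
```

Under these assumptions the formula gives a value of about 45, in the same range as the 46 studies reported above; small differences are to be expected from rounding of the inputs.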

Subscales
The 95% confidence intervals for the reliabilities of the subscales largely overlap. It therefore seems unlikely that the overall effect is driven by a single subscale or that certain subscales have a much greater reliability than others.
Skill/Experience. Fourteen studies comprising 3,693 participants were meta-analysed and yielded an estimate of α = .85, 95% CI [.81; .87]. There was substantial heterogeneity, Q(13) = 180.66, p < .0001, I² = 92.8%, τ² = .03. A visual check suggested no indication of funnel plot asymmetry. The fail-safe N procedure suggested that 18 studies would be needed to reduce the reliability to .5.

Discussion
In the current meta-analysis of the Sexual Self-Esteem Inventory for Women (SSEI-W), we analysed 35 studies conducted in 10 different countries with varied populations. The α for the overall scale showed good reliability or interrelatedness of items, even after accounting for potential publication bias. Each subscale also showed good reliability in terms of α, which suggests that the inventory can be used with confidence in whole or in part. Interestingly, apart from the expected lower α's for shortened forms, there is little evidence that sample characteristics or translations of the SSEI-W have a substantial impact on estimated reliability. Thus, while the scale has not been validated in these populations, researchers can expect the α for this measure to be similar across diverse populations. One should bear in mind, however, that α captures only one aspect of the reliability of the scale, as measured by the interrelatedness of the items, and not its validity in measurement across groups. As we elaborate below, future work would benefit from validating the SSEI-W in different cultures and establishing measurement equivalence. This will then also open a path to examining the role of cultural variables (e.g. Hofstede, 2001) in explaining variation in reliability (see OSF).
Interestingly, we did observe that there was substantial heterogeneity in our meta-analyses of α's. Apart from the length of the inventory used (complete versus short form), none of the other sample characteristics robustly explained the heterogeneity in reliability between the studies. However, such heterogeneity in reliability is to be expected as measurement error or variation in methods can cause such variability (Higgins et al., 2003).
Although our analyses do not provide direct evidence of the validity of the scale, the articles on which our reliability analyses are based do provide evidence for some aspects of validity of the scale, specifically criterion validity. For example, when a patient group of women who had experienced sexual or relationship violence was compared to a healthy control group, researchers found that women who had experienced sexual violence had lower sexual self-esteem and indeed scored lower on all five subscales than the control group (Bornefeld-Ettmann et al., 2018). In a similar study, women who had experienced childhood sexual abuse had lower scores on the SSEI than a control group and sexual self-esteem, as measured with the SSEI, partially mediated the relationship between past abuse and revictimization (Van Bruggen et al., 2006). Higher scores on the SSEI have also been linked to better sexual communication in intimate relationships (Oattes & Offman, 2007). These studies thus provide evidence for the criterion validity of the SSEI, also in clinical samples, specifically of women who have experienced abuse. One possible valuable use of the SSEI could be to help clinicians better understand what areas of sexual self-esteem they can target to help patients improve their sexual experiences and relationship quality.
More evidence of the usefulness of the scale can be seen in research looking at changes over time in scores on the SSEI. In one study on sexual self-esteem and cosmetic surgery in which women completed the SSEI before and after undergoing cosmetic surgery, their scores were higher post-surgery, suggesting that sexual self-esteem can change over time and that certain interventions can be efficacious at improving sexual self-esteem (Esmalian Khamseh & Nodargahfard, 2020). In another study looking at adolescent sexual self-esteem and sexual experiences over a 9 month period, researchers found that compared to their baseline scores, adolescents who had engaged in their first sexual experience during the study period had increased scores on the subscales of skills/experience and moral judgment after their first sexual experience (Swenson et al., 2012). Thus, we can see further evidence for the criterion validity of the SSEI when it has been used longitudinally to examine how life events can influence levels of sexual self-esteem. Therefore, when combined with the reliability analyses presented in the current study, the findings in previous studies which utilise the SSEI provide preliminary evidence for the reliability and validity of the scale and its use as a multidimensional measure of sexual self-esteem. It should be noted, however, that further tests of validity are necessary (Finch & French, 2018; Hussey & Hughes, 2020), as, for example, there has been no follow-up work on test-retest reliability and measurement invariance. Most papers reported just the internal consistency of the scale, and while a five factor structure was supported in the initial validation (Zeanah & Schwarz, 2019), subsequent work has not thoroughly examined support for its five factor structure (factorial validity). In sum, a truly valid measure should do much more than exhibit a good Cronbach's α (e.g. Borsboom, 2005; Finch & French, 2018; Hussey & Hughes, 2020; Markus & Borsboom, 2013) and we call for more research on measurement of the SSEI.

Limitations
There are several limitations to the current meta-analysis. First, we were unable to retrieve the reliabilities for ten studies that had used the SSEI, even after contacting the corresponding authors, but we attempted to adjust for this via use of a fail-safe N analysis. For all of the analyses (on the entire scale and the subscales), the fail-safe N analysis suggested that between 13 and 46 studies would need to be added to reduce the Cronbach's α to an unacceptable level, but note the limitation of these techniques (e.g. Becker, 2005). A second shortcoming is that we only examined one aspect of measurement: reliability with Cronbach's α, a measure which in itself is limited in capturing reliability (e.g. Dunn et al., 2014; Sijtsma, 2009). A good measure should do more than just exhibit a high α (e.g. Finch & French, 2018; Flake & Fried, 2020; Hussey & Hughes, 2020). For example, in our case it should exhibit the same five factor structure in each study and across populations. This should be tested using confirmatory factor analysis (e.g. Loehlin & Beaujean, 2017) and measurement equivalence (e.g. Vandenberg & Lance, 2000) to determine, for example, if we are measuring the same five factor construct in a clinical vs. a student sample. Other aspects, such as test-retest reliability over time, should also be examined (e.g. Finch & French, 2018). Third, most samples were collected from Western, Educated, Industrialised, Rich and Democratic (WEIRD) populations (Henrich et al., 2010). Most samples are also based on students, a widespread issue for social psychology and more broadly the social sciences (e.g. Arnett, 2008; Peterson, 2001; Pollet & Saxton, 2019; Schultz, 1969; Sears, 1986). There were, however, several samples from Iran and other non-English speaking countries and some samples from clinical populations. The reliability and validity of the scale should be examined further in such diverse samples.
Finally, many authors collapse the SSEI into a single score rather than treating it as separate subscales in a multi-dimensional measure, as the original creators of the inventory intended. This could potentially cause problems because some subscales may not be correlated. One example is the experience and the moral judgment subscales. Some individuals may have many sexual experiences, but not feel morally satisfied with their actions. Thus, we suggest that in the future researchers should use the subscales separately and make specific predictions about each of these based on previous research.

Future directions
Similar to most work in personality and social psychology (Hussey & Hughes, 2020), most papers reported Cronbach's α but provided only limited information on other aspects of measurement, for example, factorial validity. There are thus several future directions that could result from our synthesis. First, it would be interesting to examine measurement equivalence in clinical versus student samples. For example, do the factors correlate in similar ways in each of these populations? To answer this question, the inventory will need to be utilised in more clinical studies and in clinical studies with larger samples. A second population of interest is men. The current study revealed that relatively few studies have used the SSEI with men, which is perhaps unsurprising considering that it was originally validated on a sample of women, although the measure does not appear to have gendered items. Men's sexual self-esteem is an understudied topic in the literature. Although some studies have examined sexual self-esteem in men who have sex with men, and how this relates to their sexual practices (Kvalem et al., 2016; Stokes & Peterson, 1998; Traeen et al., 2014), little research on heterosexual men's sexual self-esteem has been conducted (for one example, see Ménard & Offman, 2009). The five dimensions on the SSEI could provide insight into men's views of their sexual self-esteem and how it is associated with various antecedents and outcomes, similar to the ways in which it has been used in research on sexuality in women. A study validating the scale with a representative sample of men could be a valuable next step for researchers interested in studying men's sexual self-esteem. In addition to these two directions, further work is needed to address other aspects of validity of the scale.

Conclusion
The SSEI is an important and useful measure for researchers interested in human sexuality. It captures an individual's own view of their sexual practices, attractiveness, control in sexual interactions, moral judgements about their sexuality, and the adaptiveness of their sexual practices. Such information may be key for clinicians, researchers, and public health officials in understanding both adaptive and risky sexual practices. Our meta-analysis shows that the SSEI has good reliability in terms of Cronbach's α and that both the short and long forms, including translated versions, can be used in different countries and in diverse populations.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
Alaric Lloyd graduated with a BSc in Psychology from Northumbria University in 2020. His interests lie in sexuality, personality and forensic psychology.
Genavee Brown is a psychology lecturer and social psychology researcher.
Thomas Pollet is a Professor in psychology. His research focuses on social relationships.