Doubly blind: a systematic review of gender in randomised controlled trials

Background: Although observational data show social characteristics such as gender or socio-economic status to be strong predictors of health, their impact is seldom investigated in randomised controlled trials (RCTs). Objective and design: Using a random sample of recent RCTs from high-impact journals, we examined how the most often recorded social characteristic, sex/gender, is considered in design, analysis, and interpretation. Of 712 RCTs published from September 2008 to 31 December 2013 in the Annals of Internal Medicine, British Medical Journal, Lancet, Canadian Medical Association Journal, or New England Journal of Medicine, we randomly selected 57 to analyse funding, methods, number of centres, documentation of social circumstances, inclusion/exclusion criteria, proportions of women/men, and reporting about sex/gender in analyses and discussion. Results: Participants' sex was recorded in most studies (52/57). Thirty-nine percent included men and women approximately equally. Overrepresentation of men in 43% of studies without explicit exclusions for women suggested interference in selection processes. The minority of studies that did analyse sex/gender differences (22%) did not discuss or reflect upon these differences, or dismissed significant findings. Two studies reinforced traditional beliefs about women's roles: they found no impact of breastfeeding on infant health but nevertheless reported possible benefits. Questionable methods such as changing protocols mid-study, having undefined exclusion criteria, allowing local researchers to remove participants from studies, and suggesting possible benefit where none was found were evident, particularly in industry-funded research. Conclusions: Social characteristics like sex/gender remain hidden from analyses and interpretation in RCTs, with loss of information and embedding of error all along the path from design to interpretation, and therefore to uptake in clinical practice.
Our results suggest that to broaden external validity, in particular, more refined trial designs and analyses that account for sex/gender and other social characteristics are needed.


Introduction
Randomised controlled trials are thought to provide the strongest research evidence of clinical potential or efficacy for medical interventions. By randomly assigning subjects to intervention and control groups, researchers expect both the characteristics of interest and those that are unidentified to be equally distributed across study arms, allowing them to eliminate the effect of individual and social characteristics not being studied.
To examine, rather than control for, the impact of social traits on health outcomes requires a very different approach. Socio-economic status (SES), race/ethnicity, sex/gender, or social connectedness, for example, must then be measured and considered as independent covariates that alter health outcomes. In reality, social traits are not independent but act interdependently to shape opportunities and constraints that may alter gene expression, risk, compliance, access to care, and pathways from exposure to illness (1,2). Study protocols and inclusion criteria should be designed accordingly, with enrolment large enough to allow disaggregated analyses of, for example, results for women and men. The strength of randomisation is that baseline social determinants, although not necessarily static, will be equally distributed and eliminated as sources of bias while researchers manipulate or control exposures of interest. The weakness is that, unless these social circumstances are managed as variables for analysis, their very real impact on the study endpoint, and their interactions with the intervention of interest, remain hidden. The resulting findings will then speak only of efficacy in a population cleansed of personal traits and social circumstances, not of effectiveness and external validity in the real world, where no one is devoid of characteristics such as sex/gender or SES.
Global Health Action 2016. © 2016 Susan P. Phillips and Katarina Hamberg. This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), allowing third parties to copy and redistribute the material in any medium or format and to remix, transform, and build upon the material for any purpose, even commercially, provided the original work is properly cited and states its license.
Lifetime fluctuations in, and the social nature of, circumstances like SES are readily apparent; placing sex and gender among these, however, may require explanation. Sex and gender are two separate but intertwined terms used for categorisation and analyses of men and women. Sex refers to biological attributes and is primarily associated with physical and physiological features, including chromosomes, hormone function, and sexual anatomy. Gender goes beyond biology and refers to the socially constructed roles, behaviours, expressions, and identities of girls, women, boys, men, and gender diverse people. It influences how people perceive themselves and each other, how they act and interact, and the distribution of power and resources in society. Gender is neither static nor something a person possesses; rather, it is an activity, as captured by the concept of 'doing gender'. Doing gender most often incorporates, but can also challenge, explicit and implicit social norms, constraints, and expectations that alter ways of behaving and acting as men and women (3–5). Like SES, gender will vary across settings and over time. Although the impact of a group's or society's gender norms is not ubiquitous or homogeneous, there are commonalities arising from the experience of being, for example, a woman within a given grouping. Furthermore, like SES, gender can, although it does not always, affect health and wellbeing. For example, in many cultures girls are undervalued relative to boys and are therefore fed less. Similarly, women in many countries are less educated than men, not because of limited individual capacity but because societal and cultural norms imply that higher education is only for men. In countries where women are well educated, or even more educated than men, they are still often under-represented in high-ranked and well-paid jobs, suggesting that gender inequality is not restricted to developing countries but is a worldwide phenomenon.
Women currently outlive men globally; however, their longevity advantage has fluctuated, and continues to fluctuate, with time and social circumstances (6). This life expectancy difference likely arises from how men and women live their lives, the work and activities they engage in, and the risks they are exposed to or take (7–9). Changes in other social circumstances may change the expectations of, and roles assigned to, men or women and may accordingly change how they do gender, illustrating that gender is not a fixed characteristic (5).
In reality, it is seldom possible to isolate sex from gender, as biology interacts with social and environmental living conditions and the two become entangled. Genes can be activated or shut down (temporarily or permanently) by environmental factors and ways of living. Since men and women often live different lives with different duties, demands, and resources, this newer epigenetic knowledge adds to insights about how gender and sex are intertwined. The terms sex and gender must therefore be used with caution. In this article, we use the term 'sex-specific' for diseases or conditions that are restricted to either men or women, like prostate cancer or preterm birth. Gender is used as a separate term when we discuss bias related to being either a woman or a man, because the creation of gender bias is by definition a social process whereby preconceptions and ideas about women and men skew research, investigations, or treatment. Sex or gender is also used as a separate term where the authors of a specific reviewed paper did so. Otherwise, we prefer sex/gender as a term that recognises that biology shapes social context which, in turn, shapes biology.
Within a randomly selected sample, the exposures being examined may not have uniform or homogeneous effects across the social groupings of study participants, such as sex/gender, race, or SES (10). If variability arises not from individuals but from a characteristic of a social subgroup, the statistical independence of participants will be jeopardised. Failure to recognise that subjects may not be independent will introduce error despite randomisation (10). Observational research has demonstrated strong and extensive health effects arising from, and associated with, membership in the groupings 'women' and 'men'. Significant sex/gender differences have been well documented in, for example, heart disease (11) and type 2 diabetes (12). Pharmacokinetics may differ for men and women, as can benefits, side effects, and adverse reactions to drugs (13). Unequal access to medical care for women and men in many settings is also important in understanding treatment and health outcomes. Finally, gender bias has been demonstrated in clinical decision-making (14,15). Although sex/gender does not always alter health, the evidence is strong enough to justify considering it, whenever possible, as a modifier of the relationship between intervention and medical outcome (16).
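How a heterogeneous effect can disappear in an analysis of the sample as a whole can be illustrated with a small calculation. The sketch below uses invented counts (hypothetical, not taken from any reviewed trial) in which an intervention lowers event risk among women and raises it by the same amount among men; pooling the arms by ignoring sex/gender yields a risk difference of zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """Event counts for one trial arm (hypothetical data)."""
    events: int
    n: int

    @property
    def risk(self) -> float:
        return self.events / self.n


def risk_difference(treated: Arm, control: Arm) -> float:
    """Absolute risk difference, treated minus control (negative = fewer events)."""
    return treated.risk - control.risk


def pool(*arms: Arm) -> Arm:
    """Collapse strata into a single arm, ignoring sex/gender."""
    return Arm(sum(a.events for a in arms), sum(a.n for a in arms))


# Invented counts for illustration only: 100 women and 100 men per arm.
women_t, women_c = Arm(30, 100), Arm(50, 100)
men_t, men_c = Arm(50, 100), Arm(30, 100)

rd_women = risk_difference(women_t, women_c)  # benefit for women
rd_men = risk_difference(men_t, men_c)        # harm for men
rd_pooled = risk_difference(pool(women_t, men_t), pool(women_c, men_c))

print(f"women {rd_women:+.2f}  men {rd_men:+.2f}  pooled {rd_pooled:+.2f}")
# women -0.20  men +0.20  pooled +0.00
```

The equal cell sizes keep the example free of confounding; the only point is that a null pooled estimate can coexist with opposite effects in women and men.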
To directly study the impact of any characteristic on a particular outcome in an experimental design requires the ability to randomly allocate that characteristic to participants. Although not fixed, social characteristics cannot be randomly assigned. Nevertheless, it is possible to estimate how traits like sex/gender or SES alter outcomes rather than dismissing them as topics not worthy of study in a particular trial. At a minimum, examining their interactions with independent variables will hint at their effect. Powering a study to enable a priori randomisation of recruited women and men separately, so that the study intervention can be examined both within and across these groupings and interactions of social determinants with other variables can be assessed, will increase accuracy and meaning (17–19).
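Randomising recruited women and men separately is commonly implemented as stratified randomisation with permuted blocks. The sketch below is one minimal way to do it, assuming two arms labelled 'A' and 'B', a block size of 4, and strata labelled by recorded sex/gender; the function name and parameters are illustrative assumptions, not taken from any reviewed trial. Because each stratum draws from its own sequence of blocks, allocation stays balanced within women and within men, not only across the whole sample.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Optional


def stratified_block_randomiser(block_size: int = 4,
                                seed: Optional[int] = None) -> Callable[[str], str]:
    """Return allocate(stratum) -> 'A' or 'B', using permuted blocks per stratum.

    Each stratum (e.g. 'woman', 'man') keeps its own queue of shuffled blocks,
    so the two arms are balanced within every stratum after each completed block.
    """
    if block_size < 2 or block_size % 2:
        raise ValueError("block_size must be a positive even number for two arms")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> allocations left in its current block

    def allocate(stratum: str) -> str:
        if not pending[stratum]:  # start a new permuted block for this stratum
            block = ["A", "B"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return allocate


# Hypothetical enrolment stream: 8 women and 12 men arriving in random order.
arrivals = ["woman"] * 8 + ["man"] * 12
random.Random(0).shuffle(arrivals)

allocate = stratified_block_randomiser(block_size=4, seed=2016)
assignments = [(sex, allocate(sex)) for sex in arrivals]

for sex in ("woman", "man"):
    counts = Counter(arm for s, arm in assignments if s == sex)
    print(sex, dict(counts))
```

Permuted blocks are only one option; trials often vary the block size so that upcoming allocations are harder to predict.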
The aim of this systematic sampling review of recent randomised controlled trials (RCTs) is to determine whether and how the social trait of sex/gender is addressed in the design, analysis, and interpretation of study findings, and then to consider the meaning and impact of the methodologies used. We selected sex/gender for specific examination because the categories 'man' and 'woman' were the most commonly identified social characteristics in the reviewed RCTs.

Sample selection
In September 2013, we searched PubMed using the terms randomised controlled trial, clinical trial, human, Annals of Internal Medicine, British Medical Journal, Lancet, Canadian Medical Association Journal, and New England Journal of Medicine, with the search filters clinical trial and September 2008 to 1 July 2013. We then sorted the initial 588 papers retrieved by date of publication, to randomise papers from each journal and ensure sampling of the entire time frame, and selected every 20th paper for inclusion. In January 2014, a similar search with date limits of July 2013 to 31 December 2013 yielded another 124 papers, of which every fourth paper was selected for review. Recent papers were oversampled to ensure that findings reflected the most current research methodology.
When a selected paper was not an RCT (n = 6), the next paper on the list was substituted. To establish our analytical method and construct a data extraction template, five studies were reviewed by both researchers; three more were then reviewed independently and discussed for concordance of data extraction. After each author had reviewed another 10 and 9 studies, respectively, both again checked for inter-reviewer consistency in approach, information extraction, and interpretation of findings, and then independently reviewed 30 more papers (15 each). In all, 57 papers were included in the analysis. Sample selection is summarised in Fig. 1.
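The sampling rule described above (sort by publication date, take every k-th paper, and substitute the next paper on the list when a selected one is not an RCT) can be sketched as follows. The text does not say which paper the count started from, so the `start` offset used here is an assumption, and the `is_rct` flag is a hypothetical stand-in for the reviewers' eligibility judgement.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def systematic_sample(items: Sequence[T], k: int,
                      is_eligible: Callable[[T], bool], start: int = 0) -> List[T]:
    """Select every k-th item of an already ordered list (e.g. sorted by date).

    When a selected item is ineligible, the next eligible item on the list that
    has not already been taken is substituted.
    """
    if k < 1 or not 0 <= start < k:
        raise ValueError("need k >= 1 and 0 <= start < k")
    taken: List[int] = []
    for i in range(start, len(items), k):
        j = i
        while j < len(items) and (j in taken or not is_eligible(items[j])):
            j += 1
        if j < len(items):
            taken.append(j)
    return [items[j] for j in taken]


# Hypothetical records in date order; every 7th one is flagged as not an RCT.
records = [{"id": i, "is_rct": i % 7 != 0} for i in range(588)]
sample = systematic_sample(records, k=20, is_eligible=lambda r: r["is_rct"], start=19)
print(len(sample), [r["id"] for r in sample][:7])
# 29 [19, 39, 59, 79, 99, 120, 139]
```

Here record 119 is flagged as ineligible, so record 120 is taken in its place; the same function with `k=4` would describe the second search.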

Data extraction and analysis
Although many authors in the sample used the concepts 'sex' and 'gender' as interchangeable synonyms, without defining them, when describing inclusion, exclusion, or effects of being men or women, we used the combined term sex/gender in our data extraction templates. Authors generally documented body mass index (BMI) only in studies of diseases where weight was an important factor in a biological sense. However, as BMI is also strongly related to and entangled with level of education, family economy, and other aspects of SES, we included it as a social factor when extracting data. We noted inclusion/exclusion criteria for each social category (sex/gender, race, SES, social connectedness, health behaviours, education, BMI). After identifying that sex/gender was the most commonly recorded of these social categories, we focused further analyses on how sex/gender was or was not addressed in research design, analysis, and interpretation. To do this, we re-read all papers and re-extracted the following data: inclusion/exclusion criteria, whether the study topic was sex-specific, number and proportion of women and men in the study, and how sex/gender was addressed and reported in results and discussion. A narrative summary of overall methodological strengths and shortcomings was also documented for each study.

Results
The 57 trials reviewed are summarised in Table 1 (20–76). In total, six papers reported on sex-specific conditions (36,47,48,58,68,74). Table 2 identifies social characteristics documented across the reviewed papers. The most frequently recorded was sex/gender (52/57, 91%), followed by race or ethnicity (30/57, 53%) and some aspect of SES (9/57, 16%). Although BMI, a possible proxy for SES, was sometimes recorded, it was used solely as a physical indicator of risk. Social connectedness and past or present adversity, both known determinants of health, were not documented in the reviewed trials. Thirty-five studies (61%) had age-related exclusions, few of which were necessary given the condition being assessed. One trial excluded women without explanation: in this study of diabetes, subjects recruited through primary care practices were all men (39). Conversely, in what was designed as a sex-specific study of the impact of peer support for mothers on maternal and child morbidity and mortality in Malawi, researchers allowed men to participate in women's support groups (47). Pregnant women, or those who might become pregnant, were excluded in 13 of the 54 studies (24%) of either women alone or both men and women. Table 3 documents whether and how sex/gender was addressed in the studies reviewed. We considered papers where women and men were included in proportions ranging from 40 to 60% as having equal representation. Of the 51 non-sex-specific trials, 20 (39%) enrolled women and men in roughly equal proportions, 22 (43%) included more than 60% men, in 5 (10%) more than 60% of participants were women, and sex/gender proportions were not documented in three studies conducted among adults and two among children. Not noted in Table 3 is that the majority of papers in the sample used the concepts 'sex' and 'gender' as undefined synonyms when describing inclusions, exclusions, results, and discussions.
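The representation categories used above can be written as a small classification rule. The sketch assumes that each trial reports a count of women and a total, that all remaining participants are men, and that the 40–60% band is inclusive at both ends; the function names are illustrative. The percentage helper rounds to whole numbers, which reproduces the figures reported in this section (for example, 52/57 gives 91% and 22/51 gives 43%).

```python
from typing import Optional


def classify_representation(n_women: Optional[int], n_total: Optional[int]) -> str:
    """Assign a trial to one of the representation categories used in this review."""
    if n_women is None or not n_total:
        return "not documented"
    share_women = n_women / n_total
    if share_women > 0.60:
        return "more than 60% women"
    if share_women < 0.40:  # assuming two categories, this means more than 60% men
        return "more than 60% men"
    return "roughly equal (40-60%)"


def pct(count: int, denominator: int) -> int:
    """Whole-number percentage, as reported in the text."""
    return round(100 * count / denominator)


print(classify_representation(45, 100))   # roughly equal (40-60%)
print(classify_representation(120, 400))  # more than 60% men
print(pct(52, 57), pct(20, 51), pct(22, 51), pct(5, 51))  # 91 39 43 10
```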
Explicit and implicit methodological aberrations were not uncommon (see Table 1), particularly with respect to selection of participants. In five papers, for example, there were unclear reasons for exclusions or high numbers of unexplained dropouts among those already enrolled (24,30,44,52,55). In a different study, 371 (approximately 25%) of those randomised to the treatment group, yet none in the control group, were unavailable to consent and were therefore excluded (53). In another study, almost 80% of recruits were removed (44). We noted that arbitrary and ill-defined options to exclude participants at intake or afterwards were more common in trials with some or all funding from industry (9 of the 10 trials with such options, 90%) (24, 30, 39, 52–55, 60, 73) than in trials funded solely from public sources (44). For five of the eight trials that were not blinded, there was no reason related to the intervention itself that would preclude blinding, the standard methodology (48,53,56,59,75).
Next, we examined whether and how sex/gender differences were analysed and included in results. Of the 49 of 51 studies on non-sex-specific conditions that included both women and men, only 10 (20%) used these categories to differentiate findings, either by disaggregating data (n = 8) (20, 34, 63, 67, 69–72) and/or by examining interactions between sex/gender and other variables of interest (n = 3) (26,63,70).
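Examining an interaction between sex/gender and the intervention amounts to asking whether the treatment effect differs between women and men. The sketch below shows one simple version, a Wald z-test on the difference between the two sex-specific risk differences, using invented counts. It is a swapped-in illustration, not the method of any reviewed trial; a regression model with a treatment-by-sex term is the more common approach in practice, and, like any subgroup comparison, the test is most credible when specified a priori.

```python
import math
from statistics import NormalDist


def risk_difference(e_t: int, n_t: int, e_c: int, n_c: int) -> tuple[float, float]:
    """Risk difference (treated minus control) and its large-sample variance."""
    p_t, p_c = e_t / n_t, e_c / n_c
    variance = p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c
    return p_t - p_c, variance


def sex_interaction_test(women: tuple[int, int, int, int],
                         men: tuple[int, int, int, int]) -> tuple[float, float, float]:
    """Wald z-test of whether the risk difference is the same for women and men.

    Each argument is (events_treated, n_treated, events_control, n_control).
    Returns (difference in risk differences, z, two-sided p-value).
    """
    rd_w, var_w = risk_difference(*women)
    rd_m, var_m = risk_difference(*men)
    if var_w + var_m == 0:
        raise ValueError("variance is zero; the Wald test is undefined")
    diff = rd_w - rd_m
    z = diff / math.sqrt(var_w + var_m)  # women and men are independent samples
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


# Invented counts: benefit in women, harm in men, no effect when pooled.
diff, z, p = sex_interaction_test(women=(30, 100, 50, 100), men=(50, 100, 30, 100))
print(f"difference {diff:+.2f}, z = {z:.2f}, p = {p:.1e}")
```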
Aspects of sex/gender that were described in outcomes were sometimes discussed (20,51,57,63,67,71). Also, in one of the six studies of sex-specific conditions, there was discussion of whether gender aspects such as the social roles, opportunities, constraints, and expectations inherent in being a woman might interact with findings (48). Conversely, identified differences between women and men were, on occasion, not reported in results but alluded to in subsequent discussions (48,60) or in appendices (54). Overall, interactions between sex/gender and other social determinants of health (e.g. SES, education, race) that would enrich understanding were neither included nor discussed as missing explanatory indicators, possible sources of error, or predictors of the outcome.

Discussion
In this systematic sampling review of recent RCTs, we have assessed whether social traits, exemplified by sex/gender, were included in findings or potentially biased them. Studies continue to show that women are underrepresented in enrolment (77,78), in National Institutes of Health (NIH)-funded clinical research despite funding guidelines (79–81), in research on specific diseases (82–84), in analyses (85), and when mortality is an endpoint (86). Our question went beyond inclusion of women to examine whether and how the impact of diversity in baseline social characteristics within study arms was addressed. We selected sex/gender for in-depth examination, recognising that inclusion is a prerequisite for, but does not alone address, the social character of being men and women. It is because dissimilar characteristics of subgroups such as men and women can and do modify outcomes and should be addressed that major    (88). Such resources reaffirm that inclusion alone does not identify impact. Evidence of the differential effect of sex/gender on, for example, cardiovascular disease diagnosis, treatment offered, and prognosis (19,89), or on pharmacodynamics in general, is robust enough to recommend inclusion of sufficient numbers of participants to enable subgroup analyses for women and men (13). Our focus was whether and how sex/gender was addressed in clinical trials, rather than whether this social characteristic modifies outcomes for specific interventions or diseases. Whether social circumstances are acknowledged as integral to outcomes and elevated to the level of variables for analysis, rather than controlled into oblivion, is a matter of methodology across RCTs regardless of their specific topic. We therefore did not limit the sample to studies of particular systems and included all RCTs published in the journals and time frame selected.
This yielded a mix of individual and cluster randomised trials; public and private funders; and pharmaceutical, technological, and educational interventions. To the best of our knowledge, no study prior to ours has systematically assessed sex/gender in the formulation of the research question, design, analysis, and interpretation in randomly selected studies from high-impact medical journals. The composite picture that emerges is one of loss of information at each step along the path from design through recruitment, data analysis, and interpretation, a loss that can embed error in evidence.

Research question and design
In general, men and women were included, although overrepresentation, usually of men relative to women, limited external validity for many of the trials reviewed. The preponderance of men in several studies of diseases with no male predominance, studies that claimed random recruitment of participants, raises questions about interference in the selection process (54,73), as do subjective and undefined exclusions (24, 30, 39, 44, 52–55, 60, 72, 73). In one study, the removal of almost 80% of recruits can only be assumed to have introduced selection bias (44). As mentioned earlier, arbitrary and ill-defined options to exclude participants at intake or during the study were more common in trials with some or all funding from industry (9 of the 10 trials with such options, 90%) (24, 30, 39, 52–55, 60, 73) than in trials funded solely from public sources (44). A European study of interventions to decrease cardiovascular mortality among people with diabetes did not mention why all participants were men. Being a woman was not among the exclusion criteria, recruitment occurred in general practices where women are well represented, neither the interventions, the disease studied, nor the outcome were specific to men, and there was no indication in the title or abstract of the exclusion of women (39). This was the sole study to demonstrate the kind of total blindness to sex/gender that was widely criticised, and that became a reason for denial of public funding, in the United States more than 20 years ago (90). In contrast, a study of whether social networking can increase HIV testing among African- and Latino-Americans intentionally limited research on a non-sex-specific illness and intervention to men who have sex with men (74). By being explicit about their reasons for selecting an all-male population, the authors minimised bias in their research question.
The study by Lewycka et al. (47) on whether peer health education of women could increase breastfeeding rates and decrease morbidity or mortality in Malawi illustrates sex/gender bias via over-inclusion. Allowing men to attend peer groups for women in a setting where women generally lack autonomy created the potential for silencing or coercion of female participants by male participants. Put another way, having men participate in what should have been a study of women's health education introduced potential bias arising from gender inequality. This was not considered in the paper.

Analysis and interpretation
Data related to sex/gender, although available, were often neither used nor discussed. Authors might argue that such data served as evidence that randomisation controlled the impact of baseline characteristics like sex/gender via equal distribution, and that funding limitations precluded powering studies to examine different outcomes for women and men. However, one could suggest that such an argument is evidence of, and will perpetuate, blindness to the impact of sex/gender on health outcomes. There were no examples of a priori consideration of sex/gender through randomisation within the groupings 'men' and 'women' rather than in the sample as a whole. In the 8 studies where results were disaggregated for men and women, these findings were generally not discussed, and none considered reasons for sex/gender differences (20, 34, 63, 67, 69–72). This silence, although methodologically appropriate (because subgroup analysis was not planned a priori), occurred in studies where enrolment was adequate to identify differences, and so a research opportunity was lost. Three examples illustrate this. A study of new drug regimens for rheumatoid arthritis found that women's responses equalled those of men (57). The authors noted that in prior research men had responded better than women to treatment, but made no comment about their novel findings and instead stated that there was no sex differential in response. In a study of interventions immediately following a STEMI (ST-elevation myocardial infarction), statistically significant differences in response between women and men were dismissed as a chance finding (67). Finally, although the only predictor of prevalence of trachoma infection was the proportion of women in each randomised cluster studied (p = 0.002), this was ignored, while statistically insignificant variability across treatments (p > 0.99) was summarised as possibly of importance (38).
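The funding argument can be put in numbers. Under standard normal-approximation assumptions for a continuous outcome (common standard deviation σ, two-sided α = 0.05, 80% power, 1:1 allocation, and equal numbers of women and men), detecting a treatment-by-sex interaction as large as the main effect needs about four times the total sample size. The reason is that the interaction contrast, a difference of two treatment effects each estimated in half the sample, has four times the variance of the overall treatment effect. The sketch below illustrates this general rule of thumb with hypothetical effect sizes; it is not a calculation from any reviewed trial, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def _z_total(alpha: float, power: float) -> float:
    """z_{1 - alpha/2} + z_{power} for a two-sided test."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def total_n_main_effect(delta: float, sigma: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Total N to detect a treatment effect `delta` (two arms, 1:1).

    Var(effect estimate) = 4 * sigma**2 / N.
    """
    return math.ceil(4 * (_z_total(alpha, power) * sigma / delta) ** 2)


def total_n_interaction(delta_int: float, sigma: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Total N to detect a treatment-by-sex interaction of size `delta_int`.

    delta_int = (effect in women) - (effect in men); four equal cells of N/4,
    so Var(interaction estimate) = 16 * sigma**2 / N.
    """
    return math.ceil(16 * (_z_total(alpha, power) * sigma / delta_int) ** 2)


# Hypothetical: an effect of half a standard deviation, outcome SD of 1.
n_main = total_n_main_effect(delta=0.5, sigma=1.0)
n_int = total_n_interaction(delta_int=0.5, sigma=1.0)
print(n_main, n_int)  # 126 503
```

If the difference between women and men is expected to be smaller than the main effect, the required size grows with the inverse square of that difference, which is why such analyses need to be planned and funded a priori.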
At times, authors' statements did not match findings, putting a positive spin on interventions of no benefit or possible harm. New treatments that were no better than controls were termed 'non-inferior' (see comments in Table 1). Although overstating benefit was most common in tests of new drugs, two publicly funded studies highlight how unshakeable beliefs shaped reporting (47,68). Both studies assumed that increasing breastfeeding rates would improve infant health in Africa. In each, the proportion of breastfed babies did increase; however, when there was no subsequent change in designated health outcomes, the authors hypothesised that there were likely other, unmeasured advantages to breastfeeding. The evidence of no benefit from breastfeeding was, for whatever reason, deemed unacceptable to report.
Social characteristics such as sex/gender are not always modifiers of the relationship between interventions and health outcomes. In treatment trials for endocarditis (42) or pancreatic cancer (69), sex/gender as a determinant or an effect modifier seemed unlikely. However, whether and when sex/gender, race, or SES matter will not be detected by randomising them into hiding. Without comment from authors, it is impossible to determine what reasoning preceded limiting analyses to the group as a whole, or whether the impact of social circumstances was considered. Only by including subgroup analyses can researchers ascertain whether and when the measurement of social traits is relevant.

Limitations
Drawing general conclusions about research methodology from analyses of 57 studies must be done with caution. However, because the sample was randomly selected from the 712 RCTs identified, it should be representative of all papers identified in the initial search. In addition, interim findings after analysing 28 of the 57 papers did not change substantially when all reports were included, suggesting that we had identified a pattern and could generalise from it.
One might argue that it is difficult to analyse research from the varied medical areas included in this review in the same way. However, the point here was not to evaluate the relevance of, or the best way to analyse, sex/gender in each disease or condition. Instead, we searched for patterns and an overview of whether researchers seem aware of, and address, the fact that the groups men and women are not identical but often differ with respect to important biological characteristics and social and environmental living conditions.

Conclusion
Few of the randomly sampled RCTs published in five high-impact journals over 5 years assessed whether social circumstances altered outcomes. Fewer still attempted to interrogate findings with respect to sex/gender (or race or SES) and identify whether sex/gender acts as a modifier of the pathway from intervention to outcome. The theoretical robustness of the RCT is that it mimics animal experiments by controlling for, or randomly distributing and therefore removing, unidentified and unmeasured human variability as a source of error. Baseline characteristics such as SES, race, and sex/gender are, however, known determinants of health that can modify the relationship between exposures and outcomes of primary interest. Inherent in the methodology of clinical trials is the clean slate of equal background noise, or equal impact of social characteristics like gender, in all study arms, but also the messiness of failing to hear that noise. It is only by listening, by studying rather than randomising away non-biological traits, that research will identify whether and when social characteristics of participants affect outcomes.
Citation: Glob Health Action 2016, 9: 29597. http://dx.doi.org/10.3402/gha.v9.29597