The critical interpretive synthesis: an assessment of reporting practices

ABSTRACT The value of the critical interpretive synthesis (CIS) for reviewing quantitative and qualitative research, and for critically developing new theory, is increasingly recognized, as evidenced by the growing number of published CIS reviews. However, the flexibility embedded in the method hampers its implementation and exacerbates concerns about trustworthiness. This paper determines the extent of transparent reporting and soundness of execution in published CIS reviews by developing assessment criteria based on CIS key features. We analyzed the reporting practices of 77 CIS reviews published between 2006 and 2018. Findings indicate that the reporting of CIS key features is suboptimal. We recommend that authors document their CIS more thoroughly to increase the transparency of their study, and that they rely on the described guidelines to select and conduct their CIS. To this end, our evaluation criteria could assist authors, reviewers, and journal editors in evaluating the quality of CIS studies.


Introduction
In recent years, various review types have emerged that aim to synthesize qualitative research, quantitative research, or a combination of both (for an overview of existing review methods, see Schick-Makaroff et al., 2016; Tricco et al., 2016a). This proliferation of new review types, however, causes confusion as to which designs can be considered sound methods for reviewing the literature (Templier & Paré, 2017). In addition, these misconceptions create discord over what should be reported regarding the key methodological aspects, quality, and transparency characteristics of the various review genres (Paré et al., 2016; Templier & Paré, 2017).
In response to this need, Paré et al. (2016) and Templier and Paré (2017) generated guidelines to increase trustworthiness in literature reviews in terms of transparency and systematicity. Paré et al. (2016) define transparency as the reproducibility of the review process, which requires explicit reporting of the procedures and analysis techniques used in each step of a review. Systematicity, or soundness of execution, refers to the internal validity of the review. To ensure internal validity, fit-for-purpose methods need to be used and well executed. In other words, authors need to apply appropriate processes to search, screen, analyze, and interpret relevant information in order to achieve specified goals (Paré et al., 2016).
Additionally, Straus et al. (2016) clarified the methodological aspects of various review types by operationalizing the different reviews identified through a scoping review. According to that scoping review, knowledge synthesis methods can be classified along two dimensions: (1) whether quantitative and qualitative research can be integrated, and (2) whether the review is used to establish or refine a theory or phenomenon. Only four review types are able to do both: the critical interpretive synthesis (CIS), the integrative review (IR), the realist review (RR), and the narrative synthesis (NS) (Tricco et al., 2016b).
CIS is a relatively new review type that produces synthesising arguments in the form of a coherent theoretical framework derived from both qualitative and quantitative research. CIS distinguishes itself from other review types through its emphasis on theory development, critical orientation, and flexibility (Dixon-Woods et al., 2006b). One of CIS's greatest advantages, flexibility, is also one of the greatest disadvantages associated with its use, since it may introduce ambiguity in how the review is applied and reported (Kastner et al., 2016). While the IR, RR, and NS have undergone evaluations and further developments that clarify the method (Campbell et al., 2019; Dixon-Woods et al., 2005; Whittemore & Knafl, 2005), similar work is missing for CIS. Yet the degree of flexibility offered by CIS may exacerbate concerns about trustworthiness, emphasizing the need for evaluations and further development in terms of guidelines.
Additionally, Templier and Paré (2017) assessed transparency in review studies and found that theory development reviews in general (such as a CIS) are usually less explicit about their review process. They add that future assessments should focus on the systematicity of the review as well. In fact, there is agreement that recent review types remain underdeveloped and under-evaluated in terms of their purpose and processes, and that more operationalizations of the steps of these review types are needed (Dixon-Woods et al., 2005; Tricco et al., 2016b).
This study aims to answer some of the needs expressed by previous scholars in three ways. First, we develop criteria to evaluate the transparency and systematicity of CIS reviews. These evaluation criteria serve as an indication of the quality and trustworthiness of CIS and may assist referees and journal editors in their assessment of CIS designs.
Second, this study adds to much-requested and much-needed empirical work that clarifies review types, their purposes and processes, identifies their key characteristics, and evaluates their use in practice.
Finally, we develop a hierarchy of the key features of a CIS. To increase the overall quality and trustworthiness of the review, every aspect of the review should be implemented, consistently executed (i.e. systematicity), and clearly reported (i.e. transparency) (Paré et al., 2016; Templier & Paré, 2017). Each review type is characterized by different key features that distinguish it from other review types. However, some features are more central to the review design than others, and central features are more important than peripheral ones. This makes it possible to identify a hierarchy of features and to evaluate the extent to which the central features of a review type have been consistently implemented and clearly reported in research.
In the next section, we discuss transparency and systematicity in literature reviews. This is followed by an explanation of the CIS design. Next, we discuss the method used to evaluate reporting practices of the available CIS literature, followed by an overview of the outcome of the assessment. We conclude with recommendations for future research practices.

Transparency and systematicity
Literature reviews are important to gather and enhance knowledge, develop new theories, and identify avenues for future research (Paré et al., 2016). The trustworthiness and quality of a literature review hinge on the soundness of execution (i.e. systematicity) and explicitness of reporting (i.e. transparency). As with original research articles, systematicity and transparency with regard to the data and applied analysis techniques are essential to evaluate study quality and enhance the reproducibility of literature reviews. To reduce bias, error, and misinterpretation that can otherwise limit the value of the review and its use for future research, the implemented processes and analysis techniques should be well executed and explicitly reported (Altman & Moher, 2013). It is recommended that research articles that do not live up to these standards be rejected for publication by referees and journal editors. A lack of systematicity and transparency in a literature review limits its reproducibility and, therefore, the overall quality of the review. Reproducibility of literature reviews should focus on the applied methodological design as such and not on the findings, since the latter are highly dependent on the authors' judgment (Templier & Paré, 2017).
In their assessment of the reporting practices of various review types, Templier and Paré (2017) found theory development reviews to be less explicit in describing their literature search and selection strategy, quality assessment, and data analysis than other review types such as meta-analyses or scoping reviews. Furthermore, they discuss various good practices and generate recommendations to increase transparency in theory development reviews. First, they state that the iterative nature of the literature search and selection in these review designs should be made clear by reporting each decision the authors make. Doing so allows the reader to judge whether the applied processes, techniques, and decisions are appropriate for the research question. Second, the authors should report whether or not a quality assessment was carried out when searching and selecting articles for further analysis, and explain the quality assessment strategy. Third, theory development reviews should be explicit in describing the analysis techniques used for data extraction by explaining the coding procedure used to identify themes and concepts as well as how the literature was categorized.
In addition, Paré et al. (2016) suggest guidelines on how to increase transparency and systematicity in various review types. Their main argument is to be explicit about every decision and revision made during the review process, even though they acknowledge that flexibility in the application of their guidelines may be necessary. In this respect, they describe their guidelines as a living document applicable to reviews ranging from those adopting a detailed, sequential, a priori plan to those following a highly iterative plan (such as a CIS).
While these recommendations set the base for reporting standards, they remain broad and lack a detailed description of which features are central to a specific review type. A key aspect of systematicity is selecting the appropriate review design based on the research question. Everything else that follows, from the development of a review plan and the literature search to data analysis and synthesis, is guided by the selected review design (Paré et al., 2016). It is, therefore, important for reporting guidelines to emphasize the specificities of the review type and the key features that accompany it.
In this paper, we focus on one specific review genre, the CIS. This relatively new review type is increasingly being used by scholars across disciplines and offers methodological means to produce explanatory theories from a large and diverse set of literature (Bales & Gee, 2012). Over the years, CIS became the most reported knowledge synthesis method (i.e. from 1970 to 2011) (Perrier et al., 2016). Furthermore, CIS is recognized as "one of the best study designs used to provide a fresh interpretation of the data rather than a summary of results, as is often the case with other review types" (Jarvis, 2017, p. 3). The importance of the CIS is thus increasingly recognized, and providing prospective authors with more guidance regarding the execution of a CIS will improve its implementation in practice and increase awareness regarding its trustworthiness.

The critical interpretive synthesis
The CIS provides a systematic, empirical method for combining both qualitative and quantitative forms of research (Bales & Gee, 2012). The technique builds on existing review designs, including an adaptation of meta-ethnography, and uses analysis techniques from grounded theory and processes (i.e. the gathering of literature) from the systematic review (Bales & Gee, 2012; Barnett-Page & Thomas, 2009; Dixon-Woods et al., 2006b).
Similar to the IR, RR, and NS, CIS allows for the integration of quantitative and qualitative empirical studies into a single review (Tricco et al., 2016b). Combining both types of research results is considered a major advantage. The primary authors of the CIS state that this may provide "insightful and illuminating ways of understanding the phenomena" (p. 40), making it especially useful and relevant for policymakers, practitioners, and researchers (Dixon-Woods et al., 2006b).
A number of distinctive features differentiate CIS from other similar review types. One feature is its orientation towards theory development. A RR focuses on testing the applicability of a theory under various circumstances, and an IR is aimed at summarizing theories and evidence to provide a comprehensive understanding of the complex concepts, theories, or problems of a particular phenomenon (Dixon-Woods et al., 2006a; Tricco et al., 2016b; Whittemore & Knafl, 2005). NS is often used in systematic reviews and allows researchers to develop a theoretical model focusing on how, why, and for whom an intervention works (Campbell et al., 2019). CIS, instead, applies an inductive approach, aiming to create an overarching theory by integrating different theoretical categories generated from the available evidence, leading to a more profound understanding of the topic under study (Dixon-Woods et al., 2006b; Schick-Makaroff et al., 2016).
In contrast with other review types, which limit their critical orientation to the inclusion or exclusion of papers in their analyses, the critique in a CIS is "a key part of the synthesis, informing the sampling and selection of material and playing a key role in theory generation" (Dixon-Woods et al., 2006b, p. 6). The dynamic and iterative process of question formulation, source search and selection, and analysis distinguishes it from other review methods in which systematic searches and appraisal techniques are applied (Dixon-Woods et al., 2005). The need for flexibility in CIS is explicitly acknowledged: it fuels the development of emerging theoretical notions and guides the search for articles (Dixon-Woods et al., 2006b; Schick-Makaroff et al., 2016).
Overall, six general activities can be identified that represent the dynamic process of the CIS:
(1) Open research question: CIS starts with the formulation of an open research question, which is refined during the execution of the review and not finalized until its end.
(2) Literature search: a broad searching strategy (e.g. website search, reference chaining, contacting experts) is initiated in addition to a more structured approach.
(3) Literature selection: literature is selected based on likely relevance, including purposive selection with flexible inclusion criteria (not necessarily aiming to identify and include all relevant literature). The ongoing selection of the literature should be informed by the emerging conceptual framework, based on the principle of theoretical saturation.
(4) Quality appraisal: appraisal is based on the content of the paper, its likely relevance, and its theoretical contribution to the CIS. Papers considered to be 'fatally flawed' may be excluded from the synthesis. To assess the quality of a paper, Dixon-Woods et al. (2006b) used five questions focusing on the aims of the research, the research design, the research process, the amount of data, and the method of analysis. Exclusion based on quality is usually deferred until the synthesis phase, since papers considered methodologically weak may still provide relevant insights for the emerging theoretical framework.
(5) Data extraction: CIS demands constant reflexivity, with an ongoing critical orientation to the material achieved by placing the literature within its context. The construction of a theoretical framework starts with the analysis of the sources, using analysis techniques similar to those of meta-ethnography. In doing so, recurring themes are identified using language from the studies themselves.
(6) Formulation of a synthesising argument: finally, the concepts are constantly compared with the data in order to identify the relationships among them. The aim is to develop a synthesising argument in the form of a coherent theoretical framework comprising a network of constructs and the relationships between them.

Literature search
Potentially relevant sources were searched in three academic databases (Web of Science, PubMed, and PsycINFO) and one web search engine (Google Scholar). The search was conducted from October until the beginning of December 2018. Sources were searched from 2006 onwards (i.e. the year in which the CIS was developed). Only one search term was used, 'critical interpretive synthesis', and in Google Scholar this term was searched in the title. This yielded 360 results (Figure 1). After removal of duplicates (N = 160), titles were screened and abstracts were read. Flexible inclusion criteria were used to ensure that a substantial range of CIS reviews was captured. Sources were considered inappropriate if the implementation of a CIS could not be derived from the title or abstract, which led to the exclusion of 94 articles. The presence of one of the following terms in either the title or abstract was considered sufficient indication that a CIS was implemented:
• Critical interpretive synthesis
• Systematic review of qualitative research
• Interpretive synthesis
• Synthesis of qualitative and quantitative research
• Critical review
The full text of 106 papers was assessed for eligibility. Another 29 articles were excluded based on the following criteria: (1) no methodological or protocol papers discussing a CIS, since these sources do not include an actual implementation of a CIS; (2) no conference proceedings or presentations, since these sources do not include a full report of the CIS; and (3) the full text was unobtainable.
Ultimately, we retained and analyzed 77 sources in this paper (for an overview, see Appendix A): 70 journal articles, three doctoral dissertations, three master's theses, and one book chapter.
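As a sanity check, the screening flow reported above can be reproduced arithmetically. The following sketch is our own illustration: the variable names are hypothetical, but the counts are taken directly from the text.

```python
# Screening-flow counts as reported in the text (cf. Figure 1).
initial_hits = 360            # database and Google Scholar results
duplicates = 160              # removed before screening
excluded_title_abstract = 94  # CIS not derivable from title/abstract
excluded_full_text = 29       # protocol papers, proceedings, unobtainable texts

unique_records = initial_hits - duplicates                     # 200
full_text_assessed = unique_records - excluded_title_abstract  # 106
retained = full_text_assessed - excluded_full_text             # 77

print(unique_records, full_text_assessed, retained)  # 200 106 77
```

The counts reconcile: 106 full texts assessed and 77 sources retained, matching the figures reported in the text.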

Data extraction
A data-extraction template was developed to retrieve relevant elements from the sources included for review. The publication year; type of source (e.g., article, book, dissertation); research field according to the Web of Science research areas; search strategy; inclusion of qualitative, quantitative, or mixed-method studies; applied inclusion/exclusion criteria; sampling process; type of quality appraisal; data-extraction and data-analysis techniques; and presence of a theoretical framework were summarized in the template.
This was followed by an assessment of the reporting of the CIS. We relied on the guidelines for transparency and systematicity offered by Templier and Paré (2017) and Paré et al. (2016), and on the operationalization of the CIS design by the original authors and various scholars (Dixon-Woods et al., 2006b; Entwistle et al., 2012; Haddrill et al., 2017; Schick-Makaroff et al., 2016; Tricco et al., 2016a), to identify appropriate evaluation criteria for transparency and systematicity. Overall, seven key features were formulated. However, not every key feature is crucial to achieving the goal of a CIS; in fact, a hierarchy of key features can be described. Based on the emphasis placed on the key features by previous scholars (Dixon-Woods et al., 2006b; Entwistle et al., 2012), the following hierarchy was applied (listed from most to least important):
(1) Ft.1 - data-extraction method for identifying themes/concepts: since the goal of a CIS is to formulate a synthesising argument, the identification of themes (using techniques from meta-ethnography) is required and may therefore be considered an essential element of a CIS.
(2) Ft.2 - formulation of a synthesizing argument: placed second in the hierarchy since the development of a theoretical framework is the major goal of a CIS and cannot be achieved without the identification of themes (cf. ft.1).
(3) Ft.3 - inclusion of various research results: the inclusion of both quantitative and qualitative research is placed third given the emphasis that scholars place on this advantage.
(4) Ft.4 - flexible inclusion criteria: CIS reviews utilize flexible inclusion criteria to select sources that are directly relevant to the research question and the emerging theoretical framework. The applied selection criteria thus shape the development of the theoretical framework and are therefore placed fourth in the hierarchy.
(5) Ft.5 - quality appraisal: placed fifth in the hierarchy since a CIS can also be carried out without a quality appraisal of the included sources.
(6) Ft.6 - two-staged sampling process: the two-staged sampling strategy allows authors to select additional potentially relevant sources that inform the emerging theoretical framework. However, a synthesising argument may also be formulated without a two-staged sampling method.
(7) Ft.7 - broad searching strategy: placed last in the hierarchy since the broadness of the search strategy does not guarantee that more relevant sources will be found. Furthermore, a CIS can also be implemented when no broad searching strategy is applied.
The CIS reviews were scored on their reporting of the key features (i.e. transparency) and on whether the reported feature was implemented according to the CIS design (i.e. systematicity). Papers received one of the following scores:
• 1: transparency and systematicity (T&S)
• 2: transparency but no systematicity (T&NS)
• 3: no transparency and no systematicity (NT&NS)
• 4: no transparency but systematicity (NT&S)
A detailed description of the assessment criteria and scoring can be found in Appendix B. Overall, systematicity is captured by scores 1 and 4, and transparency by scores 1 and 2. The total score of a CIS was calculated by summing the number of key features that were adequately implemented and transparently reported (score '1'), yielding scores ranging from 0 to 7.
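The scoring logic just described can be summarized in a short sketch. This is our own illustration: the function name and the example scores are hypothetical, but the score semantics follow the scheme above.

```python
def summarize_cis_scores(feature_scores):
    """Summarize the four-level scores assigned to the seven key features.

    Score semantics: 1 = T&S, 2 = T&NS, 3 = NT&NS, 4 = NT&S.
    """
    assert len(feature_scores) == 7
    return {
        # systematicity is captured by scores 1 and 4
        "systematic": sum(s in (1, 4) for s in feature_scores),
        # transparency is captured by scores 1 and 2
        "transparent": sum(s in (1, 2) for s in feature_scores),
        # total score: number of features scoring '1' (range 0-7)
        "total": sum(s == 1 for s in feature_scores),
    }

# Hypothetical study: central features (ft.1-4) fully adequate,
# peripheral features (ft.5-7) neither transparent nor systematic.
print(summarize_cis_scores([1, 1, 1, 1, 3, 3, 3]))
# {'systematic': 4, 'transparent': 4, 'total': 4}
```

Such a study would count toward the "total score of 4 or higher" group reported in the Results section.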
The first author assessed the quality of reporting in each study retained in the analysis. To guarantee sufficient inter-rater reliability and an objective assessment of the studies, two independent researchers - with no involvement in writing this paper - conducted an independent assessment of 14.3% of the studies. No significant disagreements occurred (total scores differed by no more than one point on average). The inter-rater reliability (IRR) prior to discussing discrepancies was 75%. After discussion, the IRR increased to 96%, and the operationalization of the assessment criteria for features 1, 2, 4, and 7 was further clarified to ensure an objective assessment of the features.
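The IRR figures reported here are percent agreement between raters. The following sketch shows how such a figure is computed; the function name and the ratings are fabricated for illustration, since the text does not give a formula.

```python
def percent_agreement(rater_a, rater_b):
    """Share (in %) of items on which two raters assigned the same score."""
    assert len(rater_a) == len(rater_b)
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

# Fabricated example: two raters agree on 6 of 8 feature scores -> 75%.
print(percent_agreement([1, 2, 1, 3, 4, 1, 2, 1],
                        [1, 2, 1, 3, 4, 2, 3, 1]))  # 75.0
```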
It is important to stress that the studies are scored on the systematicity and transparent reporting of the key features of a CIS. However, Dixon-Woods et al. (2006b) and Entwistle et al. (2012) often state that a CIS key feature 'may be present'. Our operationalization and assessment of the features are in this sense stricter than initially intended by the authors of the CIS (i.e. Dixon-Woods et al., 2006b), but are preferred in light of the replicability and transparency of research. For example, the authors of the CIS state that a broad searching strategy should be used, yet did not clarify what should be considered 'broad'. We therefore consider the presence of at least three different searching methods (e.g., database search, reference chaining, expert consultation, website search) a broad strategy. Since the aim is to identify as many potentially relevant studies as possible, the strategy is automatically considered broad if experts were consulted (in addition to a database search), since they may directly provide authors with additional relevant sources.
In addition, the authors of the CIS emphasize that the literature selection should be driven by likely relevance. We felt that another aspect should be added to this feature, since many CIS reviews applied specific selection criteria. Even though most selection criteria were used to give direction to the selection of sources (and are in this sense flexible), an emphasis on specific research results (either qualitative or quantitative) may be considered contradictory to the CIS. Therefore, selection criteria are considered flexible when no statements about specific research results are made.
A final note should be made with regard to the extent of transparent reporting of the key features. Even if a key feature is clearly reported, variation is still possible in the amount of detail used to describe it. Since an objective assessment of the extent of transparency is infeasible, we assessed whether or not a key feature was clearly described, including the elements listed in Table 1, Appendix B. We did not evaluate the level of detail offered by the authors.

Results
A total of 77 studies stated that they applied a CIS. The number of studies increased from one in 2007 to 18 in 2018, with a strong rise since 2015. Given the origin of the CIS in health equity research, it is unsurprising that the majority of CIS reviews continue to be published within the field of health care/policy sciences and services. In general, three out of four (73%) studies were published in either the Life Sciences & Biomedicine (LS&B) Web of Science category or combinations of categories that include LS&B. Nonetheless, the publication of CIS reviews in other research areas, such as the social sciences, seems to be increasing since 2013. For an overview of the years and fields in which CIS studies have been published, see Appendix C.
Overall, most of the CIS key features were adequately implemented and reported (i.e. score 1), with the exception of ft.5 (quality appraisal), ft.6 (sampling process), and ft.7 (searching strategy) (Figure 2). Looking at systematicity (i.e. score 1 or 4), we found that approximately 65% of the studies identified themes and reported a synthesizing argument (ft.1 and ft.2), and over 67.5% included quantitative, qualitative, or mixed-method designs in their review (ft.3). In addition, over 84% of the studies utilized flexible selection criteria according to the CIS method (ft.4). The highest-ranked features were therefore adequately implemented in the majority of the studies. Systematicity was lowest for ft.5 to ft.7, with 49% or fewer of the studies receiving a score of 1 or 4.
A similar result is visible for transparency (i.e. score 1 or 2). Over 96% of the studies clearly reported the applied selection criteria (ft.4), regardless of whether these were formulated according to the CIS. Approximately 80% of the studies were transparent in their reporting of the applied search strategy (ft.7), and approximately 90% clearly described the applied analysis technique (ft.1 and ft.2). Transparent reporting was lowest for ft.3, ft.5, and ft.6, with approximately 29% to 39% of studies lacking a clear description of these features.
Since all included sources explicitly mentioned CIS as the design applied in their literature review, one would expect the analysis to be performed according to the CIS. Yet, in all studies scored '2' for ft.1 or ft.2, the authors applied a different analysis technique: either a deductive approach, in which existing concepts, frameworks, or theories were applied to the data in order to generate a synthesizing concept, or summarizing methods (i.e. word counts, categorization based on the methods used) without using techniques from meta-ethnography. This stands in contrast with the CIS, which argues for an inductive approach that allows a theory to emerge from the data.
Overall, over half (64.9%) of the studies showed a total score (i.e. number of '1' scores) of 4 or higher. Approximately one in five (18.2%) studies scored 6 or 7 on the reporting scale of CIS key features. Finally, in four studies, the authors did not adequately implement or report any of the features.

A tree map of CIS key features
Based on the hierarchy of key features, a tree map was made (Figure 3). In this tree map, we focus on the highest-ranked features in the hierarchy (i.e. ft.1 to ft.4), since these can be considered the features most central to CIS, differentiating it from other review types. More details on the other key features can be found in Table 2, Appendix D.
In this tree map, we identify various groups, indicated with the letters A to F, where A (N = 28) represents CIS studies in which the authors adequately implemented and reported all the central key features. Seven studies in this group received a score of '1' on every key feature. Eleven studies in the A group were generally transparent in their reporting of the more peripheral features (i.e. ft.5, ft.6, and ft.7) but varied with regard to systematicity (receiving either score 1 or 2). In the remaining studies (N = 10), the authors lacked transparency on one or more of the peripheral features (receiving either score 3 or 4). However, since the central key features were all adequately implemented and reported, we consider these studies best reporting practices for a CIS review, with emphasis on the studies scoring '1' on every feature.
The studies included in the B group (N = 3) did not apply flexible inclusion criteria but instead focused on either qualitative or quantitative research results. Consequently, this approach involves a narrower selection of potentially relevant studies, whereby relevant information informing the synthesizing argument could be missed. These studies did combine various research results in their review, either by combining qualitative and mixed-method studies or by focusing on qualitative studies and integrating them with previously found quantitative studies (Flemming, 2010a, 2010b). Since various research results were still included in the review, concepts were identified, and a synthesizing argument was visible, the main CIS objective is still fulfilled. The C group (N = 4) contains studies that did not include various research results in their review. Yet combining various research results can be considered something that differentiates CIS from other review types; not doing so may therefore be considered a strong counter-indication of the appropriateness of the CIS. One may therefore question whether a different review type would have been more appropriate here.
Twelve studies are included in the D group. This group contains studies in which the authors adequately identified and reported themes and a synthesizing argument. However, the authors of seven studies did not specify what kind of sources were included in their review and therefore remained vague about the inclusion of various research results (i.e. scoring '3' on ft.3). In the remaining five studies, the authors only stated that various research results were included but neither provided a clear description of the number of different research results included nor described the characteristics of the included sources. This lack of transparency impacts the overall trustworthiness of the review. The E group (N = 22) refers to studies that included a transparent description of the analysis technique applied for identifying themes and a synthesizing argument; the applied technique, however, differed from meta-ethnography. In four studies, the meta-ethnography technique for identifying themes was reported (ft.1), but the authors applied a deductive method when identifying a synthesizing argument by placing the identified themes in existing frameworks, thus combining an inductive approach (i.e. according to the CIS) with a deductive one. The authors of another 12 studies did not adequately implement the CIS method but used a deductive approach or focused on summarizing the included sources. In the remaining six studies, the authors either applied an integrative review focusing on certain concepts or theories, analyzed quantitative and qualitative research results separately, or applied the techniques from meta-ethnography but failed to describe themes or a synthesizing argument (cf. supra).
Finally, the F group includes eight studies. In five studies, the authors neither clearly reported the applied analysis technique nor the themes. In the remaining three studies, the authors remained vague about the applied analysis technique despite identifying themes and presenting a synthesizing argument. Since these features are most central to CIS, the trustworthiness and overall quality of these studies can most clearly be called into question.

Limitations
The assessment performed in this paper considered only those studies that self-identify as a CIS in the title or abstract and explicitly mention CIS in their method section. As a result, we may have underestimated the total number of CIS papers. However, the size of our final database allowed us to make tentative statements about the quality of CIS reviews in general.
Although an inter-rater reliability assessment was carried out on the scoring of the studies, the selection of papers was undertaken by one researcher (the first author). This may have introduced some bias and subjectivity. Using clear objectives, inclusion and exclusion criteria, and a well-defined data-extraction template mitigated some of these risks.
The hierarchy of key features presented in this article is based on the emphasis placed on these features by the original authors and by scholars describing the CIS. We described the hierarchy in the sense that every aspect of the CIS serves to generate a synthesizing argument. However, the hierarchy presented in this paper should not be considered an established fact, and variations may occur depending on the emphasis placed on key features by other scholars. Nonetheless, we believe that ft.1 to ft.4 can be considered central to a CIS. These features not only distinguish a CIS from other review types but are also emphasized by the majority of scholars applying CIS. Therefore, we are confident in their identification as central to CIS and argue that these features need to be transparently described and soundly executed in order for a review to be labelled a trustworthy CIS.
A final limitation of this study concerns the operationalization of the key features. As previously mentioned, many of the key features are described by the primary authors of the CIS as something that 'may' be present in the review. Some studies may therefore still be considered a CIS in the view of the primary authors, even though some features were not implemented according to our assessment criteria. However, the applied definition of the key features was necessary to allow for an objective assessment of the articles. We further tried to mitigate this limitation by reporting a tree map that visualizes how many key features are present and from which point a study deviates from the CIS and diminishes in overall quality and trustworthiness.

Conclusions and implications
In this paper, we have focused on the reporting practices of CIS reviews and provided an overview of CIS publications since the method's inception in 2006. Based on the results of the assessment, we found that most CIS reviews are carried out in the field of health care policy and sciences. Yet in recent years, CIS has increasingly been applied in other scientific disciplines as well, particularly the social sciences. The recent availability of more methodological articles discussing the CIS method (see Note 2) might be a driver of this increase. Consequently, expanding knowledge of CIS may be an important factor in increasing its applicability to other scientific fields.
This paper also established that substantial inter-study variation exists in the reporting standards of CIS reviews. Similar to the findings of Templier and Paré (2017), a lack of transparency was most visible in the description of literature selection and quality assessment. Many CIS studies did not specify whether any form of quality assessment was performed. Even though CIS emphasizes relevance to the research question over study quality, it is recommended that scholars report the nature and extent of any quality assessment they performed and/or whether they decided to emphasize study relevance over study quality (Templier & Paré, 2017). In doing so, we recommend that scholars be mindful that including studies of poor quality may affect their synthesizing argument and ultimately bias the outcome of the CIS.
In most studies, scholars are transparent about their data-analysis technique, notwithstanding variations in the amount of detail offered in the description of the coding process. Some scholars limit the reporting of their analysis to a description of the steps used to identify the themes without providing an overview of the themes themselves, while others provide a detailed description of the identified themes, the higher-order constructs, and how these constructs formed the synthesizing argument.
Additionally, in over 23% of the studies, authors applied a technique other than the inductive meta-ethnographic technique used in a CIS. Most of these authors took a deductive approach and used existing frameworks or concepts to analyze their sources. In a large number of these studies, authors aimed to increase the applicability of existing frameworks and guidelines to new areas. Others provided an overview of existing guidelines, frameworks, or theories, or aimed to define concepts. These goals, however, do not represent a CIS but instead fall under other synthesizing review types, such as the RR or IR. The incorrect classification of these studies may be considered as undermining the trustworthiness of the review.
Overall, these variations indicate that the soundness of execution and the transparency of reporting of various key features are suboptimal. On the one hand, this lack of transparency regarding the analysis and synthesis could be due to the emphasis on the 'authorial voice' by Dixon-Woods et al. (2006b) and the flexibility embedded in the CIS. Even though these characteristics are described as key advantages of CIS over other review types, they may, in fact, introduce ambiguity in how a CIS is applied and reported in research. Bales and Gee (2012) have previously stated that a CIS has some practical limitations due to the difficulty of implementation. However, transparent reporting of the key features and procedures is key to the trustworthiness and overall quality of a literature review (Kastner et al., 2016;Templier & Paré, 2017). As with the meta-ethnography (France et al., 2014), unambiguous reporting of the process and of the methodological decisions made during the review is necessary to unlock the full potential of a CIS.
Scholars should, therefore, be clearly informed about which features need to be applied and reported per type of review. Yet the lack of transparency of a CIS may also be rooted in journal author guidelines or in the article's review process, and not only in CIS itself or in scholars failing to report key features of their CIS review. In the case of well-known review methods, most notably the systematic review, journal editors and article referees are well aware of the reporting standards and features that should be reported in order to properly evaluate study methodology, improve transparency, and assess the overall scientific quality of the study. However, when journal editors and referees are much less familiar with a review type, specific demands on transparency are more difficult to make. In some cases, journal author guidelines leave little space for reporting study methodology, effectively limiting authors in their efforts to be transparent.
On the other hand, the lack of systematicity may be due to the various review types that have emerged over the years and the similarities between them. Moreover, the research domain in which a study is published may influence the review type that is chosen and may be accompanied by specific reporting practices. Nonetheless, the difficulty scholars face in choosing an adequate review type and the lack of systematicity in various CIS reviews show that more clarity is needed regarding the key features of the various review types. In addition, we believe that reporting practices should be of the same quality regardless of the research domain.
In order to increase the overall trustworthiness and quality of scientific journal articles, we advocate the development of reporting standards that clarify the specificities of review types and provide scholars, journal editors, and article referees with an overview of key features that should be reported. In addition, we encourage editors in their efforts to increase the methodological transparency and systematicity of the articles published in their journal and recommend that journal guidelines reflect their efforts and allocate sufficient space (either in text or in appendix) to describe the methodological choices made by the authors.
The content of this paper increases current knowledge of the CIS and its applicability. This may not only affect the potential number of CIS reviews in other scientific fields but may also be relevant for the various journals publishing literature reviews. An overview of the key features of the CIS increases comprehension of this review design, allowing for greater systematicity and transparency when scholars report on the design and when journal editors and referees evaluate the scientific quality of the review article. The hierarchy set forth in our assessment of the CIS key features may therefore be used as a standard for CIS reporting practices, aiding scholars in their future CIS reviews. We hope that our study may serve as a starting point for developing similar studies and standards for methodological reporting in other review genres.
Notes
1. In this paper, CIS reviews are visible before 2006 (i.e. the year in which the CIS was developed). This is due to the fact that the authors categorized the various reviews using an existing overview of review types described by Tricco et al. (2016b).
2. Along with the increase in CIS studies, we noticed that methodological papers discussing the CIS design were published starting from 2012 (N2012 = 1, N2014 = 1, N2016 = 1, N2017 = 3). These methodological papers were found during the search for CIS reviews and were removed from further analysis based on exclusion criterion 1 (cf. Figure 1).

Disclosure statement
No potential conflict of interest was reported by the authors.

APPENDIX B - Assessment criteria of CIS

Ft. 1 Recurring themes/concepts
Score 1: recurring themes/concepts are identified and the analysis technique (based on the meta-ethnography, including an inductive approach) is clearly described.
Score 2: recurring themes/concepts are identified but a different analysis technique (e.g., summarizing methods such as word counts, categorizations, deductive approach) is described. Or no recurring themes/concepts are identified, but the analysis technique is clearly described.
Score 3: no recurring themes/concepts are identified and the analysis technique is not clearly described.
Score 4: recurring themes are identified but the analysis technique (based on the meta-ethnography, including an inductive approach) is not clearly described.

Ft. 2 Synthesising argument
Score 1: a synthesising argument is described and the applied analysis technique (i.e. examining the relationship between the concepts, refining the identified concepts, creating higher-order constructs, and constructing a conceptual/theoretical framework) is described. The analysis technique is based on the meta-ethnography and includes an inductive approach.
Score 2: a synthesising argument is described but a different analysis technique (e.g., summarizing methods such as word counts, categorizations, deductive approach) is used. Or no synthesising argument is visible, but the analysis technique is clearly described.
Score 3: no synthesising argument is described and the analysis technique is not clearly described.
Score 4: a synthesising argument is visible, but the analysis technique is not clearly described.

Ft. 3 Inclusion of various methods
Score 1: selected studies are specified (either in text, table, or appendix, describing the number of different research results included in the review) and include various research results (i.e. quantitative and qualitative and/or mixed methods).
Score 2: selected studies are specified (either in text, table, or appendix, describing the number of different research results included in the review), but only sources with a certain research result (i.e. quantitative or qualitative) are included for further analysis.
Score 3: selected studies are not specified. It is unclear if various research results were included.
Score 4: authors mention that qualitative and quantitative and/or mixed methods sources were included but do not clearly describe the number of the different research results that were included.

Ft. 4 Flexible inclusion criteria
Score 1: selection strategy is described, either by specifying inclusion criteria that allow for the inclusion of both qualitative and quantitative research results, or by specifying that the selection of sources is based on relevance to the research question without utilizing specific criteria.
Score 2: selection strategy is described and selection criteria are specified, but emphasis is placed on specific research results (either qualitative or quantitative).
Score 3: selection strategy is not described. It is unclear how literature was selected for inclusion.

Ft. 5 Quality appraisal
Score 1: quality appraisal is described and based on likely relevance and contribution to the theory that is being developed. Some form of quality appraisal may occur and methodologically weak studies may be excluded. However, emphasis is placed on likely relevance and is also described as such by the authors.
Score 2: quality appraisal is described and performed using existing quality assessment instruments. Methodologically weak studies may be excluded, but no emphasis is placed on likely relevance.
Score 3: quality appraisal is not described.

Ft. 6 Two-staged sampling process
Score 1: sampling strategy is reported (including a description of the number of sources found and selected, in text and/or in a flow chart) and includes a two-staged sampling process starting with purposive sampling, followed by theoretical sampling to add, test, and elaborate the emerging analysis.
Score 2: sampling strategy is reported (including a description of the number of sources found and selected, in text and/or in a flow chart), but no two-staged sampling method is reported.
Score 3: sampling method is not clearly reported (no clear description of the number of sources found and selected) and no two-staged sampling method is described.
Score 4: a two-staged sampling method is reported but no clear description is given of the number of sources found and selected.

Ft. 7 Broad searching strategy
Score 1: at least three searching methods are clearly described (e.g., database search, reference chaining, website search, expert consultation (e.g., professional librarian, team member familiar with the field, information specialist)), including a description of the search terms used, which databases were searched, etc. If experts were consulted (in addition to a database search), the search strategy is automatically considered broad.
Score 2: search strategy is clearly described, but fewer than three search methods were applied.
Score 3: no broad searching strategy is visible or clearly described.
Score 4: a broad searching strategy is visible, but not clearly described (i.e. no clear description of the terms, databases, etc. used).
Note: A score '4' is not applicable for ft.4 and 5 since a lack of transparency automatically entails that this feature was not specified in the article. Therefore, if these features are not reported transparently, this also means no systematicity was visible.
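For readers who wish to apply the rubric above to a new batch of reviews programmatically, it can be encoded as a simple lookup table. The sketch below is purely illustrative and not part of the original assessment protocol; the feature keys and criterion texts are our own shorthand paraphrases of Appendix B, and only two features are shown.

```python
# Illustrative sketch (not part of the original study): the Appendix B rubric
# encoded as a lookup table. Feature keys and paraphrases are hypothetical
# shorthand; ft.2, ft.3, and ft.5-7 would follow the same pattern.
RUBRIC = {
    "ft1_recurring_themes": {
        1: "themes identified; inductive meta-ethnographic technique clearly described",
        2: "themes identified but different technique, or no themes but technique clear",
        3: "no themes identified and technique not clearly described",
        4: "themes identified but technique not clearly described",
    },
    "ft4_flexible_inclusion": {  # per the Note above, no score 4 for ft.4 and ft.5
        1: "selection based on relevance, or criteria admit both result types",
        2: "selection criteria emphasize one kind of research result",
        3: "selection strategy not described",
    },
}

def criterion(feature: str, score: int) -> str:
    """Return the criterion text for a feature/score pair; raise on undefined scores."""
    if feature not in RUBRIC or score not in RUBRIC[feature]:
        raise ValueError(f"score {score} is not defined for feature {feature!r}")
    return RUBRIC[feature][score]
```

Because ft.4 has no score 4, `criterion("ft4_flexible_inclusion", 4)` raises an error, which mirrors the note that a score '4' is not applicable to ft.4 and ft.5.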