Validation of disease states in schizophrenia: comparison of cluster analysis between US and European populations

Background: There is controversy as to whether use of statistical clustering methods to identify common disease patterns in schizophrenia identifies patterns generalizable across countries.
Objective: The goal of this study was to compare disease states identified in a published study (Mohr/Lenert, 2004) considering US patients to disease states in a European cohort (EuroSC) considering English, French, and German patients.
Methods: Using methods paralleling those in Mohr/Lenert, we conducted a principal component analysis (PCA) on Positive and Negative Syndrome Scale items in the EuroSC data set (n=1,208), followed by k-means cluster analyses and a search for an optimal k. The optimal model structure was compared to Mohr/Lenert by assigning discrete severity levels to each cluster in each factor based on the cluster center. A harmonized model was created and patients were assigned to health states using both approaches; agreement rates in state assignment were then calculated.
Results: Five factors accounting for 56% of total variance were obtained from PCA. These factors corresponded to positive symptoms (Factor 1), negative symptoms (Factor 2), cognitive impairment (Factor 3), hostility/aggression (Factor 4), and mood disorder (Factor 5) (as in Mohr/Lenert). The optimal number of cluster states was six. The kappa statistic (95% confidence interval) for agreement in state assignment was 0.686 (0.670–0.703).
Conclusion: The patterns of schizophrenia effects identified using clustering in two different data sets were reasonably similar. Results suggest the Mohr/Lenert health state model is potentially generalizable to other populations.

Schizophrenia is a complex and multidimensional disorder that affects approximately 1% of the world's population (1) and is a leading cause of disability (2). Schizophrenia is broadly characterized by three domains of psychopathology, including negative symptoms (social withdrawal, lack of motivation, and emotional reactivity), positive symptoms (hallucinations, delusions), and cognitive deficits (working memory, attention, executive function) (3).
One of the most widely used instruments to measure the severity of schizophrenia is the Positive and Negative Syndrome Scale (PANSS) (4). The scale was developed using the Brief Psychiatric Rating Scale (5) and the Psychopathology Rating Schedule (6), to provide an extensive assessment of schizophrenia symptoms. Originally, the PANSS consisted of three subscales: positive, negative, and general psychopathology (4). However, factor analyses have shown the existence of other components in the structure of the PANSS items. Despite the variety of factor models described, the five-factor model of PANSS (including negative, positive, excitement, depression, and cognitive impairment dimensions) is the most commonly reported and adopted model in the literature (7–13). Although some studies have found more factors (Van den Oord et al. (14) and Emsley et al. (15) found a six-factor and seven-factor model, respectively), there is now widespread agreement on the five-domain structure.
Factor analyses separate disease symptoms into different domains, but there are questions regarding correlations across these domains, the relative degrees of symptomatology, and whether these patterns and correlations are stable across cultures. In addition, a major concern for health economics evaluation in schizophrenia is the lack of consensus around the definition of disease states that captures the heterogeneity of symptoms.
Few studies have addressed this question of heterogeneity using cluster analytic techniques and none, to our knowledge, has compared the resulting patterns across studies and cultures. Chouinard and Albright (16) conducted a cluster analysis of end-point PANSS scores on 135 patients with chronic schizophrenia. The authors identified five clusters, but only three clusters (mild, moderate, and severe symptoms) could be evaluated due to the small sample size. Dollfus et al. (17) used Ward's method of cluster analysis on PANSS scores of 138 patients. They evaluated five subtypes of schizophrenia: positive, negative, mixed, disorganized, and schizophrenia with few symptoms. Lykouras et al. (18) also used Ward's method on PANSS scores in 255 psychiatric inpatients with schizophrenic disorder. They identified five groups of patients: the first group comprised patients with overall psychopathology of minimal severity; the second group, patients with severe positive symptoms along with symptoms of psychomotor excitement; the third group, patients with severe positive psychopathology only; the fourth group, patients with severe positive, negative, depressive, and cognitive symptoms; and the fifth group, patients with severe negative symptoms only. These studies have limitations due to their relatively small sample sizes for clustering methods, making analyses less stable and less generalizable to other populations.
In 2004, Mohr/Lenert (19) used data from a 1-year clinical trial that collected PANSS scores and costs on 663 US patients with schizophrenia and conducted a k-means cluster analysis on PANSS scores for items in five-factor domains. Statistical analyses first led to a six-state framework, updated to an eight-state framework after expert review, with varying levels of positive symptoms, negative symptoms, and cognitive impairment. Lenert et al. (20,21) published additional results, such as utility values for each of the eight disease states estimated by 620 members of the general public using the standard gamble method. Utility weights for the eight schizophrenia health states ranged from 0.44 to 0.88 and were used in several cost-effectiveness analyses evaluating treatments for schizophrenia (22,23). While factor analyses of the PANSS showed remarkable stability of the structure across international populations (the five dimensions being positive symptoms, negative symptoms, cognitive impairment, hostility/aggression, and mood disorder), it has not been shown whether multidimensional disease states similar to those found in Mohr/Lenert would be obtained in a European population. The validation of such classification in Europe would benefit both clinicians and health economics actors, such as modelers or decision makers. Clinicians may consider these health states to classify patients based on their clinical profile, to understand disease progression, to better define the severity of patients' symptoms, and to prescribe appropriate treatments. Health economic modelers may require a validated classification for elaborating a relevant model structure, in view of future economic evaluations of treatments that will be reviewed by decision makers.
The objective of this analysis was to reassess the factor and cluster analyses performed by Mohr/Lenert (2004) and the state assignment rules developed in this cohort, using data from the European Schizophrenia Cohort (EuroSC) (24). This may be useful for defining health states based on severity of symptoms in the context of pharmacoeconomic model development.
The EuroSC participants were selected to provide a representative sample of the European population with schizophrenia. As such, the data provided a good opportunity to verify the stability of the composition of patient subgroups across two different cultures in a second large data set and to validate the use of the Mohr/Lenert classification (19) in Europe for economic analyses.

Data source
Data from the EuroSC (24) were used for this study (N=1,208). EuroSC was a naturalistic follow-up of a cohort of people aged 18–64 years, suffering from schizophrenia and in contact with secondary psychiatric services. The principal objective of the EuroSC was to identify and describe the types of treatment and methods of care for people with schizophrenia and to correlate these with clinical outcomes, states of health, and quality of life (24–31). Participants were interviewed at 6-monthly intervals for a total of 2 years, until 2002.
The study was carried out from 1998 to 2000 in nine European centers that covered France (N=288), Germany (N=618), and Britain (N=302). Each of these areas covered an urban center of approximately 1 million inhabitants living in a city or in medium-size towns. In each area, patients treated in the 'psychiatric sector' (32) were identified according to the following criteria: diagnosis of schizophrenia according to the DSM-IV criteria (33) and aged 18–64 years. Random sampling from these patients was used to generate a representative sample; only minor clinical and sociodemographic differences were observed between patients from the different countries (33).
This cohort was conducted in accordance with the Declaration of Helsinki and French Good Clinical Practice (34,35). The protocol of this study was approved by the institutional review board or the ethics committee responsible for the participating hospital or institution. Written informed consent was obtained from each participant after the study details had been fully explained.

Statistical analyses
The presented statistical analyses were performed on the EuroSC data only. The results were then compared with the Mohr/Lenert (2004) data findings.

Factor analysis
The first step was to determine the key PANSS elements that describe a domain and to verify whether the EuroSC data led to results similar to those defined in the Mohr/Lenert (2004) study. A principal components analysis (PCA) with Varimax rotation on standardized PANSS scores was conducted using combined visits data. Factor loadings and eigenvalues for each of these domains were provided.

Cluster analysis
As previously performed by Mohr/Lenert (2004), the second step was to conduct a k-means cluster analysis on the sum of standardized PANSS scores within each of the five domains derived from the PCA. The aim of the cluster analysis was to group subjects into similar categories of disease symptoms.
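The domain score described above can be illustrated with a minimal sketch. The item names and ratings below are hypothetical toy values, not EuroSC data, and the use of the population standard deviation for standardization is an assumption (the source does not state which estimator was used).

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw item ratings to z-scores (population SD; an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def domain_scores(item_scores, domain_items):
    """Sum the standardized ratings of the items belonging to one domain."""
    z = {item: standardize(item_scores[item]) for item in domain_items}
    n_patients = len(item_scores[domain_items[0]])
    return [sum(z[item][i] for item in domain_items) for i in range(n_patients)]

# Hypothetical ratings for four patients on three negative-symptom items.
item_scores = {
    "blunted_affect": [1, 3, 5, 7],
    "emotional_withdrawal": [2, 2, 4, 6],
    "poor_rapport": [1, 2, 3, 6],
}
negative = domain_scores(
    item_scores, ["blunted_affect", "emotional_withdrawal", "poor_rapport"]
)
```

Because every item is standardized before summing, each item contributes on the same scale to the domain score regardless of its raw variability.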
The cluster analysis was performed with SAS version 9.3 proc fastclus, following MacQueen's (1967) k-means methodology, using an algorithm in which each item is assigned to the cluster having the nearest centroid (mean). An optimal cluster center minimizes the sum of squared distances. As the number of clusters increases, the root-mean-squared distance to the cluster center (the root-mean-squared error [RMSE]) declines.
The RMSE was tabulated by number of clusters, to examine the rate of change of the RMSE as the number of clusters increased.
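The search for an optimal k can be sketched as follows. This is a plain batch k-means written for illustration, not a reproduction of SAS proc fastclus, whose seeding and RMSE definitions may differ; the function names, the random seeding, and the toy two-dimensional points are all assumptions.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, k, iters=100, seed=0):
    """Batch k-means: assign to nearest center, then recompute means."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[j])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    labels = [nearest(p, centers) for p in points]
    return centers, labels

def rmse(points, centers, labels):
    """Root-mean-squared distance of points to their assigned center."""
    total = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return math.sqrt(total / len(points))

# Toy data: three well-separated groups of four points each.
blobs = [(0, 0), (10, 10), (20, 0)]
offsets = [(-0.5, 0), (0.5, 0), (0, -0.5), (0, 0.5)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

rmse_by_k = {k: rmse(points, *kmeans(points, k)) for k in range(1, 5)}
drop_by_k = {k: rmse_by_k[k - 1] - rmse_by_k[k] for k in range(2, 5)}
```

Inspecting `drop_by_k` shows where adding a cluster stops reducing the RMSE appreciably, which is the kind of pattern the rate-of-change curve is meant to reveal.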
Description of clusters using severity of symptoms
The next step was to describe the clusters as disease states and check whether they corresponded to those derived by Mohr/Lenert (2004). Each cluster was described by three levels of severity (low, moderate, or high) in terms of the first three factors accounting for positive symptoms, negative symptoms, and cognitive impairment. Levels were assigned according to cluster center values and confirmed by average domain scores within clusters.

Association between EuroSC clusters and Mohr/Lenert disease states
The association between EuroSC and Mohr/Lenert clusters was assessed using severity levels of symptoms in the first three domains. These are presented in a contingency table (Table 4). Further description of this step is provided in the Results section.
Agreement among the cluster assignment rules
A combined model was created that represented the synthesis of both the EuroSC clusters and the states in the Mohr/Lenert model. Individual patients were assigned to these composite states based on the closest EuroSC cluster center and using the rules described for cluster assignment in Mohr/Lenert. These rules (Appendix 3 of Mohr/Lenert (2004) (19)) define cut-off points for low, moderate, and high symptoms for each of the five domains considered in the framework.
The level of agreement between the two models in assignment of individual patients to states was estimated using the Fleiss–Cohen weighted kappa coefficient (36).
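As an illustration of the agreement statistic, the sketch below computes a weighted kappa with Fleiss–Cohen (quadratic) disagreement weights. It assumes states are coded as ordered integers 0 to n_levels − 1; the function name and the toy assignments are hypothetical, and no confidence interval is computed.

```python
def fleiss_cohen_kappa(assign_a, assign_b, n_levels):
    """Weighted kappa with quadratic (Fleiss-Cohen) disagreement weights."""
    n = len(assign_a)

    def weight(i, j):
        # Disagreement grows with the squared distance between categories.
        return (i - j) ** 2 / (n_levels - 1) ** 2

    observed = sum(weight(a, b) for a, b in zip(assign_a, assign_b)) / n
    share_a = [assign_a.count(c) / n for c in range(n_levels)]
    share_b = [assign_b.count(c) / n for c in range(n_levels)]
    expected = sum(
        weight(i, j) * share_a[i] * share_b[j]
        for i in range(n_levels)
        for j in range(n_levels)
    )
    if expected == 0:
        raise ValueError("kappa is undefined when both methods use one category")
    return 1 - observed / expected

# Toy example: two assignment methods applied to four patients.
by_cluster_center = [0, 1, 2, 2]
by_cutoff_rules = [0, 1, 2, 1]
kappa = fleiss_cohen_kappa(by_cluster_center, by_cutoff_rules, n_levels=3)
```

In this toy example the single one-step disagreement gives a kappa of 0.8; identical assignments would give 1.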
Statistical analyses were performed using SAS software, version 9.3.

Results
Factor analysis
As found in several previous studies (37) and in the Mohr/Lenert (2004) study, the PCA with Kaiser criterion conducted on EuroSC data retained five factors, accounting for 56% of total variance, to describe the structure of PANSS scores. These factors corresponded to positive symptoms (Factor 1 – eigenvalue: 4.646), negative symptoms (Factor 2 – eigenvalue: 3.954), cognitive impairment (Factor 3 – eigenvalue: 2.966), hostility/aggression (Factor 4 – eigenvalue: 2.705), and mood disorder (Factor 5 – eigenvalue: 2.679). Table 1 presents the factor loadings of these five domains. The factor analysis was highly consistent with numerous other studies (7–13). PCA revealed the same five-factor structure, with the same items representing the factors as in the Mohr/Lenert (2004) study, except for item G12, lack of judgment and insight. This item has a loading value slightly greater on the positive factor than on the cognitive impairment factor (0.46 vs. 0.40).
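The reported share of variance can be checked from the eigenvalues with a short calculation. This sketch assumes that the PCA was run on the 30 standardized PANSS items, so that total variance equals 30, and that the reported eigenvalues are the variances explained by each retained factor.

```python
# Eigenvalues reported for the five retained factors.
eigenvalues = {
    "positive symptoms": 4.646,
    "negative symptoms": 3.954,
    "cognitive impairment": 2.966,
    "hostility/aggression": 2.705,
    "mood disorder": 2.679,
}

N_ITEMS = 30  # number of PANSS items; each standardized item has unit variance

explained = sum(eigenvalues.values())   # 16.950
share = explained / N_ITEMS             # about 0.565
```

Under these assumptions the five factors explain about 0.565 of the total variance, which is consistent with the reported 56%.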

k-Means cluster analysis
A k-means cluster analysis was conducted on the sum of standardized PANSS scores within each of the five domains. As an example, negative symptoms score was defined as the sum of motor retardation, active social avoidance, blunted affect, emotional withdrawal, poor rapport, passive/apathetic social withdrawal, and lack of spontaneity and flow of conversation scores. Items used to define other symptom domains are presented in bold in Table 1.
RMSE and rate of decrease by number of clusters are presented in Fig. 1. [...] the number of disease states, as the rate of decrease of RMSE is sharply reduced at two clusters. However, when examining the curve on rate of decrease of RMSE (in green in Fig. 1), the rate of decrease increases slightly after three, four, and five clusters and then decreases after six clusters. This result suggests that there is no further improvement after six clusters. In agreement with the Mohr/Lenert (2004) study, we assumed that patients with schizophrenia may be optimally described according to six subgroups.

Description of clusters
In this section, the described clusters were derived using the same five-factor model as in the Mohr/Lenert (2004) study.

Characterization of clusters
Cluster center locations (standardized values) according to the first three domains (negative symptoms, positive symptoms, and cognitive impairment) are reported in Table 2. A level (low, low to moderate, moderate, moderate to high, or high) on the three domains was assigned to each cluster using the corresponding cluster center values. A low symptom level was assigned when the center value was close to the minimum value of the six clusters, a moderate symptom level was assigned when close to the value midway between the minimum and the maximum, and a high symptom level was assigned when close to the maximum. Severity levels assigned to each domain are represented by colors in Table 2.
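The level assignment rule described above can be sketched for one domain as follows. This is a simplified illustration that uses only three anchors (minimum, midpoint, maximum) and omits the intermediate levels; the function name and the cluster center values are hypothetical, not those of Table 2.

```python
def severity_levels(center_values):
    """Label each cluster center by the nearest of three anchors."""
    low, high = min(center_values), max(center_values)
    anchors = {"low": low, "moderate": (low + high) / 2, "high": high}
    return [
        min(anchors, key=lambda name: abs(value - anchors[name]))
        for value in center_values
    ]

# Hypothetical standardized center values of six clusters on one domain.
centers = [-0.9, 1.4, 0.2, 0.6, -1.1, -0.3]
levels = severity_levels(centers)
```

A finer rule would add anchors for "low to moderate" and "moderate to high"; the nearest-anchor logic stays the same.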
The disease severity in clusters varied from mild disease (Cluster 5), with low symptoms in all three domains, to high disease (Cluster 2), with high symptoms in all three domains. Clusters 4 and 6 were considered as moderate disease states, although both positive and cognitive symptoms were estimated as low to moderate in Cluster 6 and moderate to high in Cluster 4. The two remaining groups were both severe, with positive symptoms predominant in Cluster 3 and both negative and cognitive symptoms predominant in Cluster 1 (Table 3). Similar levels could be assigned using average domain scores (negative symptoms, positive symptoms, and cognitive impairment) on each cluster (Table 3).

Association and agreement between EuroSC clusters and Mohr/Lenert disease states
In the Mohr/Lenert (2004) study, States 5 and 7, as well as States 3 and 6, were initially grouped together. These clusters were distinguished at a later stage after recommendations by the clinical panel.
The cross-tabulation of the Mohr/Lenert six-cluster model by the EuroSC clusters model is presented in Table 4. The remaining EuroSC Clusters 3 and 4 were both severe. As levels on positive and cognitive symptoms in EuroSC Cluster 4 ranged from moderate to high, Cluster 4 was assigned to Mohr/Lenert Cluster E (combining States 5 and 7). EuroSC Cluster 3 was then assigned to Mohr/Lenert Cluster C (combining States 3 and 6) (Table 3).
We then assigned individual patients to these composite model states based on the two assignment rules. The resulting Fleiss–Cohen weighted kappa coefficient (95% confidence interval) was estimated at 0.686 (0.670–0.703).

Discussion
Developing models that map disease-specific measures such as the PANSS to health states, for example for the purpose of economic evaluation, is a difficult task.
The intent of such cluster analysis is to identify homogeneous clusters of patients whose disease might be determined by biological processes that differ from one group to another. The approach proposed by Mohr/Lenert considered 'big data' statistical methods to identify disease states in schizophrenia through cluster analysis. These disease states were subsequently converted to health states (full descriptions of the quality of life of the individual) for value rating tasks. The question posed by reviewers at the time of publication of the two paired papers (19,20), and today, is whether clustering or statistical methods that use natural covariance to define disease states produce disease states that are generalizable to other populations or merely statistical summaries of the data from one trial. That is to say, can the disease states identified in one trial be reused in others as part of a mapping function?
Using data from the EuroSC, a 2-year observational study of 1,208 patients with schizophrenia, the same clustering method as in the Mohr/Lenert (2004) study was used to verify the stability of the structure and the multidimensional disease states across international populations. Although the EuroSC cohort was conducted about 15 years ago, the evaluation of the severity of symptoms, as assessed by the PANSS, has not changed. Patients still have negative symptoms, positive symptoms, or cognitive symptoms, and therefore the age of the data need not be considered a limitation.
As previously revealed (37) with various populations and cultures, the present factor analysis suggests that dimensions of schizophrenia as measured by the PANSS are well represented by a five-factor structure, corresponding to positive symptoms, negative symptoms, cognitive impairment, mood disorder, and hostility/aggression. The resulting EuroSC five-factor model is similar to the Mohr/Lenert (2004) findings, with the same items representing the factors, except for item number G12 (lack of judgment and insight), which presented a loading value slightly greater on the positive factor than on the cognitive impairment factor (0.46 vs. 0.40).
The cluster analysis is a way of looking at how covariance of symptoms among dimensions in the PANSS is manifest. As in the Mohr/Lenert (2004) study, the present k-means analysis identified six distinct disease state clusters among patients suffering from schizophrenia and in contact with secondary psychiatric services in Europe.
One limitation of the Mohr/Lenert disease states is that they were defined only according to the first three principal component domains identified and did not account for variability in the mood disorders and hostility/aggression seen in schizophrenia. This is a limitation as the presence of mood disorders or hostility/aggression is considered a major treatment outcome by some clinicians and should be used for the definition of disease states. However, Mohr/Lenert (2004) justified this decision, stating that the addition of two more domains would have tripled the number of disease states, even if only two levels were used to describe these domains, making preference rating of the states and application of the model to clinical trial data impractical. Our analysis parallels the Mohr/Lenert (2004) approach. However, understanding which states are comparable with each other in two different analyses over five separate domains might be difficult. To verify whether the cluster analysis on EuroSC resulted in disease states similar to the Mohr/Lenert (2004) study, each cluster was described by its corresponding severity level within each of the positive symptoms, negative symptoms, and cognitive impairment domains.
The description of clusters by their corresponding severity level (low, moderate, severe) of positive symptoms, negative symptoms, and cognitive impairment was assessed using the cluster center locations on the domains. This approach is open to debate, but when describing the average scores of negative, positive, and cognitive symptoms across clusters, it was verified that the scores increased with the attributed severity level. EuroSC clusters were assigned to Mohr/Lenert states using the associated levels of positive symptoms, negative symptoms, and cognitive impairment. It should be noted that the EuroSC clusters model did not fit the Mohr/Lenert clusters model perfectly, probably due to differences in the range of symptom severity in the two populations. In particular, EuroSC Cluster 3 was assigned, with high positive symptoms, to Mohr/Lenert States 3 and 6, with low-to-moderate positive symptoms. However, when we aligned EuroSC Cluster 3 with Mohr/Lenert States 5 and 7 and aligned Cluster 4 with Mohr/Lenert States 3 and 6, substantial or good agreement in state assignment between the EuroSC model and the Mohr/Lenert assignment model was estimated (Fleiss–Cohen's kappa coefficient at 0.665 [0.648–0.682], interpreted using the commonly cited scale (38)). Some variability should be expected, given the differences in populations in the two studies, cultures, and differences in training of the raters measuring PANSS scores.
An advantage of having cluster definition rather than only dimension scores is that participants can be classified with several types of symptoms. Although many studies use dimensional scores (positive and negative subscores of PANSS), clusters with a possibility of mixed profiles may provide a better understanding of the disease. Indeed, it is widely recognized that the number of factors used when considering the PANSS could influence the identification of subtypes of schizophrenia and/or the psychopathological processes underlying them, which may influence prognosis, therapeutic approaches, response to treatment, and prediction of related variables (8, 14, 39–41).
The strengths of this study include the use of a large, highly generalizable cohort of patients with schizophrenia and the use of a validated statistical method, particularly appropriate to highlight the heterogeneity in schizophrenia. Indeed, the k-means clustering approach ensures minimal variation within the clusters but maximal variation among the clusters, creating homogeneous subgroups.
This study also has a number of limitations. Most of them are shared with the Mohr/Lenert (2004) study, as the quantitative analysis used in both studies was similar. Moreover, there is some question of the validity of the PANSS rating in the real world versus randomized clinical trials. In addition, it is worth noting that EuroSC considered patients from the United Kingdom, France, and Germany, but none from late entrants in Europe, with less advanced healthcare. Therefore, the generalizability of such clusters should be taken with caution in such cases.
In conclusion, this study compared two sets of disease states for schizophrenia using the PANSS derived in the Mohr/Lenert (2004) study and in the EuroSC study. Both the factor structure and the number of discrete clusters required to explain variation in symptom levels in empirical models were similar in US and European populations. The resulting substantial agreement in assignment suggests that disease states obtained using k-means clustering from the PANSS are comparable between US and European populations, as are the state assignment rules developed from using each data set. The present findings provide additional support for the validation of the Mohr/Lenert classification and confirm that symptom severity in schizophrenia is characterized by heterogeneity and that similar patterns of heterogeneity exist in two different data sets across two different cultures. Further study is needed to determine if the patterns reflect different phenotypes of the disease, and investigation into the biological basis for symptom clustering should be explored.