The Strengths and Difficulties Questionnaire: the factor structure in a sample of Korean immigrant parents in New Zealand

ABSTRACT OBJECTIVE: The Strengths and Difficulties Questionnaire (SDQ) is a widely used, brief, 25-item instrument for screening for adaptive and problematic behaviour in children and adolescents. Despite its widespread application in child and adolescent research, concerns regarding the construct validity of the instrument have been expressed. Further, to date, limited research has been conducted using Korean and Korean immigrant samples to provide data about the reliability, validity, and factorial structure of the SDQ. The purpose of this study was to examine the construct validity of the parent-informant version of the questionnaire, based on pre-existing models suggested by the extant literature. METHODS: A sample of 207 Korean immigrant parents in New Zealand completed the SDQ for their children (aged 6 to 10 years). The resulting data were subjected to confirmatory factor analysis (CFA), testing four competing models: a three-factor model (internalization dimension, externalization dimension, and a prosocial factor), a five-factor model (emotional symptoms, peer problems, conduct problems, hyperactivity, and prosocial), a six-factor model (a separate uncorrelated method factor, four symptom factors, and the single prosocial factor), and a hierarchical model in which the four first-order problem-oriented scales form a higher order difficulties factor. RESULTS: CFA of the SDQ partially supported the traditional five-factor conceptualization of the SDQ, although some modifications were necessary to reach an acceptable fit. Reliability was a concern, particularly for Emotional Symptoms and Peer Problems. CONCLUSIONS: Results from the revised five-factor model of the SDQ in the present setting should be interpreted with caution. Some items need to be further evaluated and revised to capture the originally intended constructs.


Introduction
Asians are one of the fastest growing ethnic groups in New Zealand, making up 12% of the population. The Asian population grew by 33% between 2006 and 2013 (from 117,000 to 472,000), which is the highest growth rate of any New Zealand ethnic group [1]. Korean immigrants, who are the focus of this paper, comprise the fourth largest Asian ethnic group in New Zealand, with 30,200 people [1]. The rapid increase in contemporary immigration has resulted in immigrant children representing the fastest growing segment of the child population in New Zealand. Compared to non-immigrant children, these children are more likely to exhibit behavioural and emotional disturbance. Recent research confirms that immigration results in tremendous stress for children [2]. The stress may originate from leaving a familiar social context, entering a new country, or adapting to a new cultural environment, which often includes a new school, new language, and different moral values [3]. Although immigrant children generally experience many of the same stressors as non-immigrants do, they may experience an additional stressor due to their minority position in the mainstream society [3]. Ultimately, these stressors may lead to a child developing psychological, behavioural, and social problems [3].
There is some evidence that young immigrant children are at risk of having difficulties with behavioural adjustment. For example, a longitudinal U.S. national study found that immigrant children (African, non-Hispanic, Asian/Pacific Islander) tended to experience more behaviour problems and less social competence than did European-American children upon entering kindergarten [4]. With specific reference to Korean-American adolescents, it has been reported that they tend to experience more mental health problems than do both European-American adolescents [5] and Chinese and Japanese American adolescents [6].
Although child and adolescent psychiatric disorders (e.g. depression, anxiety) are common and treatable, their symptoms often remain undetected and undiagnosed, thus leaving the children untreated for their mental health problems. In fact, young children and adolescents have relatively low access to mental health services in general [7]. One cause of this low rate is that both primary prevention (the prevention of the onset of a targeted condition) and secondary prevention (the identification and treatment of asymptomatic persons who have already developed risk factors but in whom the condition is not clinically apparent) are not well developed in child and adolescent mental health settings [7]. Therefore, validated instruments with the potential to aid in the early detection of children and adolescents at risk for developing psychological and behavioural disorders are of crucial importance. The Strengths and Difficulties Questionnaire (SDQ) [8] was developed in response to the need for a reliable screening instrument to assess social, emotional, and behavioural problems in children and adolescents aged between 4 and 16 years. Its utility lies in identifying young children who may need further assessment and treatment.
In spite of the strong clinical use of the SDQ [9], studies examining its proposed factor structure have yielded mixed results. A number of exploratory factor analyses (EFAs) have found support for a five-factor structure [10,11]. Although other studies that used confirmatory factor analysis (CFA) have also confirmed the five-factor structure [9,12], Dickey and Blumberg [13], for example, failed to replicate the original five-factor structure of the SDQ in a U.S. sample. Using EFA and CFA, they found that a three-factor structure consisting of internalizing problems (emotional problems and peer problems), externalizing problems (hyperactivity and conduct problems), and a positive construal factor consisting of prosocial items provided a superior fit to the data. Dickey and Blumberg [13] acknowledged that their failure to replicate the predicted five-factor solution observed in European samples might be because several of the items from the original British version were modified to be more understandable to American parents. Another study by Palmieri and Smith [14] found that the six-factor model, which contained five correlated factors and a positive construal method factor, fitted the data better than did both the three- and five-factor models.
As the SDQ was first developed in the United Kingdom, recent studies have raised questions of whether it is equally valid in all cultures [7,13]. Although the Korean version of the SDQ (SDQ-Kr) has been used in a small number of studies in Korea [15,16], to date, no studies could be identified that assessed the psychometric properties of the SDQ with a Korean immigrant sample. Further, no studies have examined psychological and behavioural adjustment in Korean immigrant children in New Zealand. Therefore, the purpose of this study was to examine the construct validity of the SDQ [8,11] with a sample of 207 Korean immigrant parents. The fit of four competing models of the SDQ's factor structure, as suggested by the extant literature for the parent-informant version of the SDQ, was tested and compared using a range of fit measures and CFA.

Participants
This study was part of a larger survey study of the parenting practices of Korean immigrant parents in New Zealand [17]. The current study extends the previous work by comparing the goodness-of-fit of competing models using data for 207 parents.
The total sample consisted of 207 Korean immigrant parents (128 mothers and 79 fathers) in New Zealand with a child aged between 6 and 10 years. All parents were born in Korea and had lived in New Zealand for an average of 7.4 years (SD = 5.2). At the time of the survey, the average age was 33.9 years (SD = 12.5) for mothers and 34.3 years (SD = 11.0) for fathers. The parents were well educated, with 99% of fathers and 89% of mothers having a university degree or higher. Annual family incomes were between $NZ 60,000 and $NZ 80,000. All parents had one child between 6 and 10 years old who was the target child for this study. The average age of these children was 7.8 years (SD = 1.8). There were 121 girls and 86 boys.

Procedure
The study's procedures were approved by the University of Auckland Human Participants Ethics Committee (UAHPEC) in New Zealand in 2011 (Ref: 2010559). Written informed consent was obtained from all participants.
After approval from the UAHPEC, participants were mainly recruited with the cooperation of the Korean religious organizations and Korean language schools in New Zealand. The questionnaire package, including an introductory letter describing the study, the questionnaire, and the self-addressed stamped envelope, was distributed by Korean community leaders to eligible parents via post. Other participants were recruited through online postings on Korean community websites, newspapers, and in places frequented by Korean parents. Interested parents contacted the researcher by phone or email to obtain questionnaires via mail. Completed questionnaires were mailed to the researcher within a month.

Measure
The Korean version of the SDQ was used. The Korean version of this scale has been validated in a small number of studies in Korea [15,16]. Parents were asked to complete the parent version of the SDQ and rate their answers on the basis of the child's behaviour over the last six months. Each of the 25 items is rated on a 3-point scale with the following responses: 0 = not true; 1 = somewhat true; and 2 = certainly true. Five of the problem-oriented items are positively worded and reverse-scored. The reverse-scored items are one item from the conduct problems scale, two items from the hyperactivity scale, and two items from the peer problems scale. Each of the five subscales is scored by adding the responses of the constituent items. A total difficulties score is obtained by summing scores for all items except for the prosocial items. Subscale scores range from 0 to 10, and the total difficulties score ranges from 0 to 40. Higher scores on the problem-oriented subscales are indicative of more behavioural problems. Higher scores on the prosocial scale indicate more positive behaviour.
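The scoring rules described above can be illustrated with a minimal Python sketch. The function names, the ratings, and the positions of the reverse-scored items within each subscale are hypothetical and chosen only for illustration; they are not the official SDQ item numbers or scoring software.

```python
def score_subscale(responses, reverse_positions=()):
    """Sum five item ratings (0, 1, 2), flipping reverse-scored items (0 <-> 2)."""
    if len(responses) != 5 or any(r not in (0, 1, 2) for r in responses):
        raise ValueError("expected five ratings coded 0, 1, or 2")
    return sum(2 - r if i in reverse_positions else r
               for i, r in enumerate(responses))


def total_difficulties(scores):
    """Sum the four problem-oriented subscales; the prosocial scale is excluded."""
    return sum(value for name, value in scores.items() if name != "prosocial")


# Hypothetical ratings for one child (not study data).
ratings = {
    "emotional":     [1, 0, 0, 1, 0],
    "conduct":       [0, 2, 0, 0, 0],
    "hyperactivity": [1, 1, 0, 2, 1],
    "peer":          [0, 2, 1, 0, 0],
    "prosocial":     [2, 2, 1, 2, 2],
}
# Hypothetical positions (0-based) of the positively worded problem items:
# one in conduct problems, two in hyperactivity, two in peer problems.
reverse = {"conduct": (1,), "hyperactivity": (3, 4), "peer": (1, 2)}

scores = {name: score_subscale(items, reverse.get(name, ()))
          for name, items in ratings.items()}
total = total_difficulties(scores)
```

Each subscale score falls between 0 and 10 and the total difficulties score between 0 and 40, matching the ranges stated in the text.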

Statistical analyses
The statistical analyses in this study were performed with IBM SPSS (IBM SPSS Statistics V.19 for Windows; IBM, New York, New York, USA) and AMOS v20 [18]. In order to determine the requisite sample size of the study, an a priori power analysis was conducted using G*Power 3.1. It was not appropriate to calculate margins of error for this sample due to the non-representative, non-random processes used to assemble it. Nevertheless, with 207 participants, correlations r > .14 would be statistically significant. The current sample of 207 participants provided a power of .995 to detect a medium effect size of f² = .15 (Cohen's effect for R²). Hence, the actual total sample size of 207 of this study was expected to have sufficient power.
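The threshold of roughly r = .14 for a sample of 207 can be checked with a short sketch. It assumes a two-tailed test at α = .05 and, to stay within the Python standard library, substitutes the normal quantile for the exact t quantile; that substitution is an approximation introduced here, not the procedure reported by the authors.

```python
from math import sqrt
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant (two-tailed) for n pairs.

    Uses r = q / sqrt(q^2 + df) with df = n - 2, where q is the normal
    quantile standing in for the t quantile (close for large samples).
    """
    df = n - 2
    q = NormalDist().inv_cdf(1 - alpha / 2)
    return q / sqrt(q * q + df)


r_min = critical_r(207)  # a little under .14
```

With the exact t quantile the value would be marginally larger, which is consistent with the rounded threshold of .14 given in the text.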
Missing data were minimal in this study. The largest number of missing cases was 15, less than 5% of the total number of cases in the data set. Missing data occurred for 15 participants who intentionally or unintentionally skipped or refused to answer some questions. Missing data were replaced using the Expectation Maximization (EM) algorithm [19].
CFA was used to test a series of alternative plausible models for the structure of the SDQ. CFA is normally considered a large-sample method (N > 500) [20], because of the large number of parameters being evaluated (i.e. means and variances for all observed and latent variables [including residuals], as well as regression weights and intercepts for all path weights) [21]. However, the procedure can be viable with smaller samples, although there is a greater risk that improper solutions (e.g. ultra-Heywood cases such as negative error variances or inter-correlations > 1.00) will occur. Small-sample CFA can be facilitated by several techniques. One of the techniques used in this study was maximum likelihood estimation. Using maximum likelihood estimation, which is robust for non-normality, will assist in ensuring estimable models [22].
Rather than estimating models for mothers and fathers separately, a single model that did not distinguish between the two groups was estimated to improve the probability of generating a proper solution. For each CFA model, multiple fit indices and their respective cut-offs were used to evaluate the global model fit to the data. These include the chi-square (χ²), comparative fit index (CFI), goodness-of-fit index (GFI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). The χ² statistic tends to penalize models with large sample sizes and many degrees of freedom. The lower limit of acceptable CFI values is .80 [23], although Marsh, Hau, and Wen [24] recommended a .90 cutoff criterion for this index. In practice, a value of RMSEA less than .05 indicates a close fit of the model, though values between .06 and .08 suggest reasonable error of approximation. GFI should be equal to or greater than .90 to accept the model [24,25]. SRMR values below .05 are considered indicative of good fit, and values between .05 and .10 reflect moderate fit to the data [25,26].
In addition, due to the large-model conditions and a rather small sample size, a Swain correction was calculated using Boomsma and Herzog's [27] R-function, in order to obtain proper estimates of the model statistics and indexes.
When certain parts of the model did not show acceptable fit, items with statistically non-significant path loadings and items with cross-loadings on other factors or with strong modification indices (i.e. >20) were deleted to ensure that the model had an acceptable fit. The following models were tested, and the fit indices were compared to assess how well each model fit the data. Model 1 is a three-factor first-order solution [13], consisting of an internalization dimension (i.e. five emotional symptoms and five peer problems items), an externalization dimension (five hyperactivity and five conduct problem items), and a prosocial and positive construal factor (five items). Model 2 is the traditional five-factor model postulated by Goodman [11], with each factor comprising five items, in which the relationships among the five factors are explained by their inter-correlations [14]. Model 3 is the six-factor model suggested by Palmieri and Smith [14] and includes all five correlated factors while also specifying a separate uncorrelated method factor on which all five prosocial items and the five reverse-keyed problem-oriented items are loaded. Model 4, the final model tested, is a hierarchical factor analytical model that corresponds to Goodman's [8] claim that the four first-order problem-oriented scales represent a higher order difficulties factor. Because the SDQ emphasizes both behavioural/emotional difficulties and strengths of children, this model postulates that the prosocial scale is a conceptually different construct that forms a separate strengths factor that is lateral but correlated with the second-order difficulties factor [28].
After examining the goodness-of-fit for alternative measurement models for the SDQ, the most appropriate model was selected and descriptive statistics computed. Although Cronbach's alpha is the most commonly used measure of internal consistency reliability, a number of problems arise from its use. For example, information regarding the internal structure of an instrument is not provided by alpha [29]. The omega coefficient is thought to be a better estimate of the reliability. Hence, the omega coefficients for each scale were calculated.
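As a rough illustration of the omega coefficient, the sketch below computes it for a single-factor scale from standardized factor loadings, assuming uncorrelated residuals so that each residual variance equals one minus the squared loading. The source does not state which omega variant or software was used, and the loadings shown are hypothetical values, not estimates from this study.

```python
def omega_total(loadings):
    """Omega for a one-factor scale from standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances),
    with residual variance 1 - loading^2 (assumes uncorrelated residuals).
    """
    common = sum(loadings) ** 2
    residual = sum(1 - value * value for value in loadings)
    return common / (common + residual)


# Hypothetical four-item scale with equal standardized loadings of .60.
omega_example = omega_total([0.6, 0.6, 0.6, 0.6])
```

Unlike alpha, this estimate draws on the fitted factor model, so items with different loadings contribute unequally to the reliability estimate.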

Results
Model fit indices for the competing models can be found in Table 1. A summary of the descriptive statistics can be found in Table 2.
Looking at the results across all models, the GFIs for the models did not meet the required cutoff values (Table 1). The three-factor conceptualization proposed by Dickey and Blumberg [13] provided the worst fit of all the models. The traditional five-factor conceptualization had a better fit to the data but was still inadequate. Five items did not reach the target factor loading of .30, one item from conduct problems loaded on more than one factor, and one item from emotional symptoms loaded onto the conduct problems factor, which departs somewhat from the original five-factor SDQ. Almost all items from the traditional five-factor model loaded on their respective factors, and the average standardized factor loading was .59, .52, .59, .62, and .53 for the emotional symptoms, peer problems, conduct problems, hyperactivity, and prosocial dimensions, respectively.
The six-factor model, which encompassed a separate uncorrelated method factor, was found to fit the data marginally better than the traditional five-factor model, as evidenced by the decrease in the value of chi-square and the improved CFI, GFI, RMSEA, and SRMR; however, the fit indices did not meet the accepted fit criteria. All five of the items comprising the prosocial factor had higher loadings on their original factor than they had on the method factor. Although the six-factor model suggested by Palmieri and Smith [14] had a slightly better fit with the data, many of the items did not load on their predicted factors, and 11 items had loadings less than .30. Finally, the hierarchical model, which contained a second-order factor labelled "difficulties" and a correlated first-order factor labelled "strengths," was also found to fit the data worse than the method factor model.
The results for the five-factor model led to two items being removed from emotional symptoms (i.e. "Nervous and clingy in new situations" and "Many fears, easily scared"), two items being removed from peer problems (i.e. "Picked on or bullied by other children" and "Gets along better with adults than with other children"), and one item being removed from conduct problems (i.e. "Steals from home, school or elsewhere") (Figure 1).
After removing these items, the CFA results indicated that the five-factor solution fit the data marginally well (χ² = 348.8; df = 160; CFI = .77; GFI = .92; RMSEA = .076; SRMR = .077), and adjusting the fit indices with the Swain correction factor of .962 likewise produced marginally adequate fit (χ² = 335.4; df = 160; CFI = .81; GFI = .92; RMSEA = .073). Because the Swain correction factor of .962 is close to 1.00, the sample size of 207 appears to have had little impact on the model statistics.
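The reported RMSEA values can be reproduced from the chi-square statistics with a short sketch. It assumes the common point-estimate formula RMSEA = sqrt(max(χ² − df, 0) / (df(N − 1))) and treats the Swain correction as a simple multiplicative rescaling of χ²; both are assumptions made for illustration, and the output of AMOS or the R function used by the authors may differ in small details.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square, its df, and sample size n."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


chi2, df, n = 348.8, 160, 207   # values reported for the revised five-factor model
swain_factor = 0.962            # Swain correction factor reported in the text

chi2_swain = chi2 * swain_factor          # rescaled chi-square, about 335.5
rmsea_raw = rmsea(chi2, df, n)            # rounds to .076
rmsea_swain = rmsea(chi2_swain, df, n)    # rounds to .073
```

The rescaled chi-square of about 335.5 agrees with the reported 335.4 up to rounding of the correction factor, and both RMSEA values match those given for the revised model.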
All items had statistically significant loadings on the relevant factor. As expected, emotional symptoms, peer problems, conduct problems, and hyperactivity were all negatively correlated with prosocial behaviours, but these four behavioural difficulties were positively correlated with each other (Table 2). This result suggests that the prosocial subscale is a conceptually distinct construct that represents a "separate and positive construal" factor. The inter-correlations among the five factors were mostly small to moderate (.16-.73). For the current study, only three subscales (conduct problems, hyperactivity, and prosocial behaviour) had α > .70. However, the mean inter-item correlations were in the optimal range of .20 to .40 for most of the scales in this study [30].

Notes (Table 1): k = number of items; df = degrees of freedom; CFI = comparative fit index; GFI = goodness of fit index; RMSEA = root mean square error of approximation; SRMR = standardized root mean square residual. a = final revised model used in the study.

Discussion
The purpose of the current study was to examine the structural validity of the parent-informant version of the SDQ with a sample of 207 parents. This study confirmed that the original five-factor structure of the SDQ proposed by Goodman [11] was the best option, although some modifications were necessary to reach an acceptable fit. In this study, consistent with previous research [28], the conduct problems item "Steals from home, school or elsewhere" showed limited variance, with only 1.5% of the parents endorsing this item; the rest of the parents responded that their child did not show this behaviour. The study revealed that removing this item led to a marginal improvement in reliability, even though the resulting scale contained only four items. It would be expected that in mainstream populations, few parents would report this behaviour; hence, the utility of this item in non-extreme parenting situations is reduced. Two items (being picked on or bullied by other children and getting along better with adults than with other children) from the peer problems scale failed to load on their respective factor. It is possible that these behaviours are more difficult for parents to observe and rate than they are for teachers. For example, teachers see children interacting more with other children in the classroom, and getting along with adults may indicate positive relationships with teachers in a formalized setting. Another possible explanation is that these behaviours are more influenced by the setting (e.g. school versus home) or that the subjective norms of parents and teachers differ more for these types of behaviour. In support of this notion, Stone et al. [31] speculated that the low internal consistency of the peer problems scale, as rated by parents, may be because parents are poor judges of children's peer relationship interactions.
Moreover, two items from emotional symptoms (i.e. nervous or clingy and many fears) also failed to load on their respective factor, and similar findings have been reported elsewhere [32]. While Thabet et al. [32] also confirmed the original five-factor structure of the Arabic version of the parent-report SDQ, they found that particular items (e.g. "many fears") appeared to have a different function or meaning than what is seen among children and their parents from Western cultures. In the current study, the removal of and the estimated low frequencies for the items of "many fears" and "nervousness-clingingness" may be speculatively interpreted in a cultural context. These areas of emotional development are no longer perceived as the norm by Korean immigrant parents, with less parent-child physical proximity and protection. An alternative explanation is that children of the parents in the present sample may be less exposed to situations where their fears and nervousness are frequently exhibited. Moreover, the current sample consisted of parents whose children's behaviour mostly fell in the normal range, and very few exhibited behavioural problems.
The current study lends partial support to Goodman's five-factor structure, which suggests that the original component scales may be appropriate for a sample of Korean immigrant parents of children aged between six and ten. The finding that the five-factor model is a better fit than the three-factor, six-factor, and hierarchical models is not surprising in light of previous research. Most attempts at replicating the factor structure of the SDQ have essentially confirmed Goodman's [11] predicted five-factor structure, with minimal cross-loadings observed among subscales and acceptable model fit [12]. Although this study has provided a more comprehensive assessment of the construct validity of the SDQ and made a novel contribution to the literature surrounding the administration of the SDQ to Korean immigrant children, the study is not without its limitations. First, this analysis is constrained by the characteristics of the sample under study. In particular, the sample was restricted to well-adjusted families, thus reducing the generalizability of the results to clinical samples. Additional research with larger, more diverse Korean immigrant samples (e.g. including at-risk children) in New Zealand is needed to confirm the generalizability of the results. Next, McCrory and Layte [28] suggest further revisions of the instrument to consider whether it would be feasible to specifically replace the "Steals from home, school or elsewhere" item with an item that generates greater variability in responses, as it might be less susceptible to socially desirable responding.
Another limitation was the low reliabilities of some of the SDQ subscales. Many studies have consistently reported rather low reliability values for certain subscales (emotional symptoms and peer problems) for the SDQ parent version [33,34]. The low reliabilities for some subscales (esp. α < .70) may arise from two sources: (a) having only a three-option response scale that reduces variability in responses and (b) having relatively few items per scale [14,35]. To deal with this issue, the current study used the omega coefficients as an estimate of reliability. The reliability estimates for most of the scales improved, but the low reliability of the emotional symptoms and peer problems scales could not be resolved. Therefore, findings for emotional symptoms and peer problems should be accepted with caution. The issue of low reliabilities also requires further investigation.
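The effect of having few items per scale can be illustrated with the standardized form of Cronbach's alpha, which depends only on the number of items k and the mean inter-item correlation. The sketch below is illustrative arithmetic under that formula, not a reanalysis of the study data; the correlations of .20 and .40 are simply the bounds of the range cited earlier as optimal.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha: k * r / (1 + (k - 1) * r)."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Five-item scale at the lower and upper bounds of the cited optimal range.
alpha_low = standardized_alpha(5, 0.20)    # about .56
alpha_high = standardized_alpha(5, 0.40)   # about .77

# Dropping to four items lowers alpha for the same mean correlation.
alpha_four_items = standardized_alpha(4, 0.40)   # about .73
```

Under this formula, a five-item scale needs a mean inter-item correlation towards the top of the .20 to .40 range before alpha exceeds .70, which is consistent with short subscales falling below that threshold.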

Conclusions
In conclusion, the current study is the first to provide data on the factor structure of the parent-reported version of the SDQ in a sample of Korean immigrant parents in New Zealand. Although some items from the originally proposed scales were found to be inappropriate for the current sample, the revised five-factor model partially replicated Goodman's predicted five-factor model of the SDQ. However, results based on the revised five-factor model in the current study should be interpreted with caution. It is also noteworthy that the analyses and findings of the current study add to the evidence that the original five subscales may not always tap distinct constructs. Thus, it is important to consider the cultural context when interpreting the screening questionnaire.