Validity and utility of instruments for screening of depression in women attending antenatal clinics in Blantyre district in Malawi

Introduction: Screening instruments should be brief, valid and easy to use if they are to be useful in a busy antenatal clinic in low-resource settings. A short instrument can be used in a busy antenatal clinic in combination with a more detailed instrument once a woman is referred. This study aimed to assess the validity of a range of depression screening instruments and to test the utility of combining these instruments for use in antenatal clinics in Blantyre district, Malawi. Methods: This was a sensitivity analysis study using a sub-sample of 97 pregnant women drawn from a cross-sectional study (sample size = 480) that was screening for depression in eight antenatal clinics. Data from the cross-sectional study for the 97 pregnant women on the 3-item screener, Edinburgh Postnatal Depression Scale (EPDS), Hopkins Symptoms Checklist-15 (HSCL-15) and Self-Reporting Questionnaire (SRQ) were compared with a gold standard, the Mini International Neuropsychiatric Interview (MINI). Sensitivity, specificity and area under the curve (AUC) were calculated to test the validity of the instruments. The utility of various combinations of the instruments was tested using the compensatory, conjunctive, probability and sequential rules. Results: The 3-item screener, EPDS, HSCL-15 and SRQ were valid instruments for screening antenatal depression. Sequential combination of the 3-item screener and SRQ had superior discriminant ability over similar combinations of the 3-item screener and either EPDS or HSCL-15 (sensitivity = 78%, specificity = 88%, AUC = 0.885). Discussion: The 3-item screener, EPDS, HSCL-15 and SRQ are valid instruments for screening depression in local antenatal clinics. The sequential combination of the 3-item screener and SRQ may be a practical, accurate and suitable method for multistage screening of depression in antenatal clinics in Blantyre district, Malawi.


Introduction
Depression is a mood disorder largely characterised by low mood and lack of interest or pleasure, 1 which can affect women during pregnancy. In sub-Saharan Africa, prevalence of antenatal depression ranges from 21% to 47%, significantly contributing to the disease burden of women. 2,3 Depression may cause fatigue, poor concentration and feelings of hopelessness in a pregnant woman. 4 It is often associated with premature birth, intrauterine growth restriction and low birthweight. 5 However, depression is often under-diagnosed by treating health professionals, 6 especially in antenatal care as is seen in Malawi. In that country, midwives generally focus on the physical health of pregnant women and their babies at the expense of mental health.
Pregnant women with depression can be identified through routine screening in antenatal clinics. 7 An instrument for screening of depression should be accurate, reliable and valid to use in antenatal clinics. Screening instruments cannot be valid without being reliable. 8 A reliable instrument for screening of depression should be able to measure depression in pregnant women consistently. 8 According to Wong and Lim, a valid instrument should have an ability to measure what it is supposed to measure. 9 This is determined by its sensitivity, specificity, positive predictive values (PPV) and negative predictive values (NPV). 9 PPV and NPV measure the likelihood that a positive or negative screening test result is accurate for an individual. 10 An instrument with high specificity and PPV 'rules IN' the disease while the one with high sensitivity and NPV 'rules OUT' the disease. 11 Sensitivity and specificity of a screening instrument are often in balance and can vary depending on cut-off scores.
It is recommended that optimum cut-off scores be identified using the Youden index. 12 For effective depression screening in antenatal clinics in low-resource settings, instruments should be accurate. Accuracy refers to the degree to which a measurement represents the true value of an attribute being measured. 13 This can be determined by comparing results from a screening instrument with results generated by a gold standard using scores for area under the curve (AUC), 13 sensitivity and specificity. 14 In this context, the terms accuracy and validity can be used synonymously. Screening instruments that are validated in specific settings such as antenatal clinics have a high likelihood of generating accurate results 15 and may reduce under-diagnosis of depression in these settings. However, screening instruments are not a replacement for gold-standard diagnostic assessments for depression, such as the Mini International Neuropsychiatric Interview (MINI). 16
Lastly, to be effective in a busy antenatal setting, screening instruments should be brief and easy to use. 17 The literature suggests that brief screening instruments have greater utility in low-resource settings. 18 The Edinburgh Postnatal Depression Scale (EPDS) and Self-Reporting Questionnaire (SRQ) have been used in research to detect antenatal depression in Malawi. 2 For these instruments to be considered suitable for use in low-resource settings, they should be easy to administer and acceptable for use by midwives in busy and usually understaffed antenatal clinics. 7 Even brief screening instruments may sometimes be considered too long and time-consuming for routine screening, 19 especially in low-resource settings. As such, ultra-brief screening instruments, which have four items or fewer and require less than 2 min to administer, can be suitable when using staged screening 20 for depression in antenatal clinics with increased workloads. 17
Screening in stages may involve a two-step process where a short screening instrument is used to identify potential cases. 21 For those who screen positive (cases), a second, often more detailed instrument with greater specificity is used to confirm caseness. 21 This approach may be appropriate in busy antenatal settings for which screening for depression is not a core task. As such, the use of an ultra-brief screening instrument as the first step in screening, in combination with a brief screening instrument (to be completed on a smaller group of initial screen positives), may be recommended in these settings. Screening instruments can be combined using compensatory, conjunctive, probability and sequential rules. 22 If screening instruments or a combination of instruments are considered for screening in antenatal settings, it is important that they are reliable and valid in detecting individuals 23 with depression in this setting. A study was conducted to assess the validity of a range of instruments for screening of depression and to test the utility of combining these instruments for use in antenatal clinics in Blantyre district, Malawi.

Materials and methods
This was a sensitivity analysis study, which used a sub-sample drawn from a cross-sectional study (sample size = 480) that was screening for depression using the 3-item screener, EPDS, Hopkins Symptoms Checklist-15 (HSCL-15) and SRQ in eight antenatal clinics in Blantyre district from January to May 2016. The sample size for this sensitivity analysis study was calculated using a sample size calculator. 24 The prevalence of depression among pregnant women in Malawi was estimated at 21%. 2 Using a 95% confidence level, a confidence interval (margin of error) of 7.12%, a proportion of 21% and a population of 480 (the sample size of the cross-sectional study), a sub-sample of 100 was calculated to be sufficient for this study. A research assistant randomly selected the sub-sample of 100 pregnant women from those participating in the ongoing cross-sectional study in the eight antenatal clinics, to be interviewed further by the researcher using the MINI. After randomly picking the first woman, the research assistant sent every third pregnant woman for further interview using the MINI until the desired sub-sample for each of the eight antenatal clinics was achieved. Three pregnant women declined, resulting in a sub-sample of 97 pregnant women (Ndirande [...]). The inclusion criterion for this study was accepting to undergo a further interview on the same day after participating in the cross-sectional study; those who declined were excluded.
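The sub-sample calculation above can be reproduced with a short sketch. The source cites an online calculator (reference 24) without stating its formula, so the code below assumes the standard proportion formula with a finite population correction, and reads the 7.12% figure as the margin of error. The function name and parameters are illustrative only.

```python
import math


def sample_size_finite(population, proportion, margin, z=1.96):
    """Sample size for estimating a proportion in a finite population.

    Assumption: the calculator cited in the text uses the usual
    formula n0 = z^2 * p * (1 - p) / e^2, followed by the finite
    population correction n = n0 / (1 + (n0 - 1) / N).
    z = 1.96 corresponds to a 95% confidence level.
    """
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Inputs reported in the text: N = 480, p = 21%, margin = 7.12%
print(sample_size_finite(population=480, proportion=0.21, margin=0.0712))  # prints 100
```

Under these assumptions the result agrees with the sub-sample of 100 reported in the text.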

Screening instruments
This study used HSCL-15, SRQ and EPDS because they were identified as effective screening instruments for antenatal depression in low-resource settings. 25 The 3-item screener for depression was included because it has been recommended that valid ultra-brief instruments for screening of depression may be more suitable in detecting possible cases of depression in primary care. 26,27 The MINI was also used because it was identified as the most widely used gold standard in low-resource settings. 25 The 3-item screener consisted of two ultra-brief depression screening instruments: Whooley's questions 28 and the one-item screening question. 6 Whooley's questions screen for sadness and loss of interest in the past month. The maximum total score for the 3-item screener was 3 and the cut-off was set at ≥ 1 because each of the two instruments comprising the 3-item screener has a cut-off of 1. Unlike the 3-item screener, the HSCL-15 consists of 15 items from the HSCL-25, a self-report inventory that assesses depressive symptoms a person has been bothered by in the past seven days. 29 Each item is rated on a Likert scale of 1-4 and the average of the 15 items is the depression score, with a cut-off of ≥ 1.75. The maximum average score for HSCL-15 is 4.
The SRQ was designed for screening psychiatric symptoms experienced by an individual in the previous four weeks and consists of 20 questions. 30 The instrument has a maximum total score of 20 with a standard cut-off of ≥ 10. 31 The EPDS is a 10-item self-reported questionnaire which measures depressive symptoms experienced in the past seven days; each item is rated on four exclusive scores (0-3). 32 The instrument has a maximum total score of 30 with a standard cut-off of ≥ 10. 33 As a gold standard, the MINI is a brief structured diagnostic interview for the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV), 16 which was used to confirm presence or absence of depression in pregnant women in this study.
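The scoring rules described above can be summarised in a minimal sketch. The item counts, score ranges and cut-offs are taken from the text; the dictionary layout and function names are hypothetical, and items are assumed to be already coded numerically (for example, any reverse-scored EPDS items are assumed to have been recoded beforehand).

```python
from statistics import mean

# (number of items, minimum item score, maximum item score, cut-off)
# Values follow the descriptions in the text; the structure is illustrative.
INSTRUMENTS = {
    "3-item": (3, 0, 1, 1),       # total score, cut-off >= 1
    "HSCL-15": (15, 1, 4, 1.75),  # average score, cut-off >= 1.75
    "SRQ": (20, 0, 1, 10),        # total score, cut-off >= 10
    "EPDS": (10, 0, 3, 10),       # total score, cut-off >= 10
}


def instrument_score(name, items):
    """Return the score for one respondent on the named instrument."""
    n_items, low, high, _ = INSTRUMENTS[name]
    if len(items) != n_items:
        raise ValueError(f"{name} expects {n_items} items, got {len(items)}")
    if any(not low <= item <= high for item in items):
        raise ValueError(f"{name} items must lie between {low} and {high}")
    # HSCL-15 is scored as the mean of its items; the others are summed.
    return mean(items) if name == "HSCL-15" else sum(items)


def screens_positive(name, items):
    """True if the respondent's score reaches the standard cut-off."""
    return instrument_score(name, items) >= INSTRUMENTS[name][3]


# Hypothetical respondent endorsing 10 of the 20 SRQ items
print(screens_positive("SRQ", [1] * 10 + [0] * 10))
```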

Translation of instruments
Previously validated Chichewa-language versions of EPDS and SRQ were used in this study. 34 The HSCL-15, the MINI and the 3-item screener were translated into Chichewa by the first author and a social worker based on the minimum standards (back-translation and monolingual testing) for applying an instrument that was developed in another language. 35

Data collection
This study used data from a sub-sample of respondents (n = 97) who participated in a cross-sectional study that was screening for depression using the 3-item screener, EPDS, HSCL-15 and SRQ. The research assistant (registered midwife) trained in administration of data-collection instruments collected data for the cross-sectional study. In addition, he recruited a sub-sample (n = 97) of respondents from the cross-sectional study for further interview using the MINI in this sensitivity analysis study. The first author, a mental health nurse, administered the MINI to all respondents who agreed to participate in the sensitivity analysis study to confirm the presence or absence of depression in respondents on the same day. The first author was blind to the respondents' initial screening outcomes in the cross-sectional study. Due to the low literacy levels, the interviewer read the questions and recorded the answers on behalf of respondents.

Data analysis
Data were analysed using Statistical Package for Social Sciences (SPSS®) version 22.0 (IBM Corp, Armonk, NY, USA) and MedCalc® (www.medcalc.org). Prior to data analysis, respondents' outcomes on the MINI were extracted and entered into IBM SPSS® 22.0 together with their data from the cross-sectional study for EPDS, HSCL-15, SRQ and the 3-item screener. Prevalence of depression as determined by the MINI was calculated. A chi-square test was used to test for significant associations between demographic characteristics and depression prevalence. The reliability of each screening instrument was calculated using Cronbach's alpha. In testing the validity of these instruments, Bayesian 2 x 2 tables and the MINI diagnosis of depression as the gold standard were used to compute sensitivity and specificity. PPV and NPV were also calculated to determine the predictive ability of the screening instruments. Receiver operating characteristic (ROC) curve analysis was used to generate AUC, standard cut-off scores, and Youden indices with their associated sensitivity and specificity for each instrument. The utility of combinations of the 3-item screener with either EPDS, HSCL-15 or SRQ to detect depression was tested using the compensatory ('OR') rule, conjunctive ('AND') rule, probability rule and sequential rule. Odds ratios were computed to test the ability of individual instruments and combinations of instruments to predict antenatal depression.
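The validity measures and three of the combination rules described above can be illustrated with a minimal sketch. This is not the SPSS or MedCalc procedure used in the study: the function names are hypothetical, the ten respondents are invented toy data, and the probability rule is omitted because it requires fitting a logistic regression model. The sequential rule is sketched as consulting the second instrument only for respondents who screened positive on the first, which is one reading of the description in the text.

```python
def diagnostic_metrics(screen, truth):
    """Validity measures from a 2 x 2 table of screen result vs gold standard."""
    pairs = [(bool(s), bool(t)) for s, t in zip(screen, truth)]
    tp = sum(s and t for s, t in pairs)          # true positives
    fp = sum(s and not t for s, t in pairs)      # false positives
    fn = sum(not s and t for s, t in pairs)      # false negatives
    tn = sum(not s and not t for s, t in pairs)  # true negatives

    def ratio(num, den):
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "youden": sensitivity + specificity - 1,
    }


def compensatory(first, second):
    """'OR' rule: a case if positive on either instrument."""
    return [bool(a) or bool(b) for a, b in zip(first, second)]


def conjunctive(first, second):
    """'AND' rule: a case only if positive on both instruments."""
    return [bool(a) and bool(b) for a, b in zip(first, second)]


def sequential(first, second_result):
    """Second instrument is administered only to first-stage positives.

    second_result is a callable giving the second instrument's outcome
    for respondent i; it is never called for first-stage negatives.
    Returns the case list and the number of second-stage administrations.
    """
    cases, administered = [], 0
    for i, positive in enumerate(first):
        if positive:
            administered += 1
            cases.append(bool(second_result(i)))
        else:
            cases.append(False)
    return cases, administered


# Toy data (hypothetical): gold standard, ultra-brief and detailed screens
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
brief = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
detailed = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

print(diagnostic_metrics(brief, truth))
print(diagnostic_metrics(compensatory(brief, detailed), truth))
print(diagnostic_metrics(conjunctive(brief, detailed), truth))
cases, administered = sequential(brief, lambda i: detailed[i])
print(diagnostic_metrics(cases, truth), administered)
```

In this sketch the sequential rule classifies the same respondents as cases as the conjunctive rule does; the practical difference is workload, since the detailed instrument is given only to the initial screen positives.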

Findings
A total of 97 pregnant women agreed to participate in the sensitivity analysis study. The respondents were from rural (32%, n = 31) and urban (68%, n = 66) areas of Blantyre district. More than half of them (53.6%, n = 52) had secondary education or above, most were married (74.2%, n = 72) and more than two-thirds were unemployed (71.1%, n = 69). The prevalence of major depression based on the MINI in this sample was 25.8% (n = 25). Major depression was most prevalent amongst unmarried (88%, n = 22) and unemployed pregnant women (80%, n = 20). Age (mean = 26 ± 5.7 years), number of pregnancies (mean = 2.5 ± 1.4) and gestation periods (mean = 27.9 ± 8.1 weeks) for respondents with depression were comparable to those without depression (Table 1).
The 3-item screener at cut-off ≥ 1 was found to be a reliable (Cronbach's alpha = 0.7), accurate (AUC = 0.85) and valid instrument for screening depression among pregnant women. It had a good balance of sensitivity (80%) and specificity (81%), and an NPV of 92%, suggesting that it would be good for 'ruling out' depression. The optimum cut-off score of the instrument was > 1 (Youden index = 0.61) (Table 2). This demonstrated the potential of the 3-item screener as a valid ultra-brief screening instrument for depression during pregnancy. The 3-item screener was also good at predicting depression in pregnant women (OR = 4.1 [2.3-7.4], p < 0.001), with screen positives being four times more likely to have depression.
This study also found that HSCL-15 (cut-off ≥ 1.75) is a reliable (Cronbach's alpha = 0.85), accurate (AUC = 0.91) and valid (sensitivity = 72%, specificity = 93%) instrument for measuring depression (see Table 2). The high specificity (93%) and PPV (78%) showed that HSCL-15 could be a good instrument for 'ruling in' depression. The HSCL-15 had the second highest accuracy (AUC = 0.91) in detecting probable depression cases, confirming its utility as a screening instrument for antenatal depression. When the cut-off score was adjusted from ≥ 1.75 to > 1.7 in order to optimise sensitivity and specificity (Youden index = 0.65), the sensitivity (72%) and specificity (93%) of HSCL-15 remained constant. The HSCL-15 predicted depression in pregnant women very well (OR = 59.3, p < 0.001), with screen positives being 59 times more likely to have depression.

Utility of combining depression screening instruments
The following combination rules were tested in this study: compensatory, conjunctive, probability 36 and sequential. 22

Compensatory ('OR') rule
The 3-item screener and either EPDS, HSCL-15 or SRQ were combined using the compensatory rule such that a respondent was considered a case if she screened positive on either of the two combined instruments. Combination of the 3-item screener and EPDS using the compensatory rule resulted in picking 49 cases, of which one case that was missed by the 3-item screener was picked up by EPDS (Table 3). The 3-item screener detected 48 cases, which included all cases identified by HSCL-15 and SRQ.
There was a substantial increase in sensitivity and a drastic decrease in specificity of EPDS, HSCL-15 and SRQ when they were combined with the 3-item screener using the 'OR' rule, with all combinations having sensitivity above 80% and specificity below 70%.

Conjunctive ('AND') rule
Respondents who screened positive on both combined instruments were considered as cases using the conjunctive ('AND') rule. All the combinations of instruments under this rule had sensitivity of ≤ 42% and specificity of ≥ 90% with AUCs of ≤ 0.772 (see Table 3). Furthermore, the combination of the 3-item screener and EPDS and that of the 3-item screener and SRQ under this rule were poor at discriminating probable cases from non-probable cases (p > 0.05).

Probability combination
Mathematical combination of screening instruments was done using logistic regression to identify combinations which had test scores that best distinguished respondents with antenatal depression from those without. All the combinations performed in this manner achieved sensitivity of ≥ 88% and specificity of ≥ 82% with AUCs of ≥ 0.877 (see Table 3). The probability combination of the 3-item screener and SRQ had the best level of accuracy (AUC = 0.920 [0.856-0.983]) and a good balance between sensitivity (92%) and specificity (83%). Probability combination of the 3-item screener and SRQ was the best predictor of depression (OR = 479 [49-4689], p < 0.001) in this study (Table 4).

Sequential rule
In sequential combination of instruments, all respondents were initially screened using the 3-item screener and all respondents who screened positive (n = 48) were further assessed using EPDS, HSCL-15 and SRQ. Sequential combination of the 3-item screener and other instruments increased sensitivity above that of each instrument when used alone (see Table 3). For most of the sequential combinations, validity in detecting depression decreased below that of the individual instruments. For instance, the AUC of EPDS decreased from 0.850 (0.763-0.915) to 0.775 (0.631-0.883) and specificity decreased from 81% to 52% when the 3-item screener and EPDS were sequentially combined. The sequential combination of the 3-item screener (cut-off > 1) and SRQ (cut-off > 9) had a good balance between sensitivity (78%) and specificity (88%) and demonstrated superior ability in detecting depression (AUC = 0.885 [0.760-0.959]) over other sequentially combined instruments.

Discussion
Availability of an accurate and usable screening instrument helps a health-care system to use its limited resources efficiently to provide care to those who are most vulnerable. 37 Screening instruments with fewer than four questions can effectively detect depression and are considered easy to use in clinical settings. 6,19 This is corroborated by van Heyningen et al., 38 who asserted that a screening instrument for use in antenatal care in low-resource [...] 40 This was achieved by EPDS (sensitivity = 88%, specificity = 74%, optimum cut-off > 6) and the 3-item screener (sensitivity = 80%, specificity = 81%, optimum cut-off > 1), confirming their suitability for screening depression in this population. The 3-item screener had a moderate discriminant ability (AUC = 0.85) in detecting antenatal depression. The 3-item screener is advantageous over EPDS in clinical practice because it is very short, easy to administer and easy to score, making it feasible and acceptable for use in busy settings that have inadequate resources. Therefore, this study suggests that the 3-item screener may be a suitable instrument for initial depression screening in busy antenatal clinics where true and false positives would undergo further screening.
Working from the premise that midwives may be trained to screen and refer antenatal depression cases in low-resource settings, 7 the discriminant validity of screening instruments which can complement each other in detecting a condition if they are combined 41 was tested. Probability combination of the 3-item screener and SRQ provided the best discriminant ability (AUC = 0.92) in this study. Nonetheless, probability combination has limited utility in clinical practice because its outcome scores are arbitrary and do not share attributes of either of the combined instruments, 36 making it difficult to interpret.
The most utility was achieved by sequential combination of the 3-item screener and SRQ, which had the best balance of sensitivity (78%) and specificity (88%) compared with other instruments combined at optimum cut-off scores. This suggests that a multistage process for depression screening 20 can be utilised to administer a combination of an ultra-brief instrument (as initial screener) followed by a more detailed instrument (only to those who initially screened positive) in busy and understaffed antenatal clinics. The 3-item screener and SRQ combination would be feasible and acceptable for use in busy local antenatal clinics where midwives may be required to participate in screening because both instruments have binary questions that would be easy to score and interpret. Screening instruments with binary questions are less time consuming, easy to score 38 and easily understood by illiterate pregnant women. 42

Implications
Screening for depression in antenatal services, which are busy and usually understaffed in low-resource settings, should be done as a multistage process 20 to reduce workload by referring only initial screen positives for more detailed screening. A two-step process can be used where the 3-item screener (an ultra-brief instrument) would initially be used to identify potential depression cases, followed by SRQ (a more detailed instrument) to confirm the cases. Referral for specialist clinical assessment would then be determined by SRQ results. It is therefore recommended that screening and referral protocols which are developed to facilitate the detection of depression during antenatal care should incorporate this two-step process for best utility and accuracy.