Meta-analysis of diagnostic performance of serology tests for COVID-19: impact of assay design and post-symptom-onset intervals

ABSTRACT Serology detection is recognized for its sensitivity in convalescent patients with COVID-19, in comparison with nucleic acid amplification tests (NAATs). This article aimed to evaluate the diagnostic accuracy of serologic methods for COVID-19 based on assay design and post-symptom-onset intervals. Two authors independently searched PubMed, Cochrane Library, Ovid, and EBSCO for case–control, longitudinal and cohort studies that determined the diagnostic accuracy of serology tests in comparison with NAATs in COVID-19 cases, and used QUADAS-2 for quality assessment. Pooled accuracy was analysed using the integrated nested Laplace approximation (INLA) method. A total of 27 studies were included in this meta-analysis, comprising 4 cohort, 16 case–control and 7 longitudinal studies and 4565 participants. Serology tests had the lowest sensitivity at 0–7 days after symptom onset and the highest at >14 days. Total antibody (TAB) tests had a better sensitivity than tests for IgG or IgM only. Tests using combined nucleocapsid (N) and spike (S) proteins had a better sensitivity than those using N or S protein only. Lateral flow immunoassay (LFIA) had a lower sensitivity than enzyme-linked immunosorbent assay (ELISA) and chemiluminescent immunoassay (CLIA). Serology tests will play an important role in the clinical diagnosis of later-stage COVID-19 patients. ELISA tests, and tests detecting TAB or targeting combined N and S proteins, had a higher diagnostic sensitivity than other methods.


Introduction
On 11 March 2020, the World Health Organization (WHO) described the global COVID-19 outbreak as a worldwide pandemic 1 . SARS-CoV-2, the etiologic agent of COVID-19, primarily attacks the human respiratory system and can cause respiratory infections, diarrhoea, and even multiple organ failure in patients 2 . By 10 July 2020, 12,102,328 cases of COVID-19 had been diagnosed worldwide and 551,046 deaths had been reported 3 . At the time of writing, the pandemic was still severe and the likelihood of SARS-CoV-2 persisting within the human population was increasing.
As no definitively effective drugs or vaccines are yet available, rapid diagnosis of SARS-CoV-2 infection, quick isolation of patients, and tracing of their close contacts are currently the most effective means of preventing transmission. At present, the definitive diagnosis of COVID-19 mainly depends on the detection of SARS-CoV-2 RNA by nucleic acid amplification tests (NAATs) such as RT-PCR 4 . Serological methods have also become an important auxiliary testing tool, and play an important role in the diagnosis and epidemiological investigation of COVID-19 cases [5][6][7][8][9][10] . At the time of writing, the United States Food and Drug Administration has granted Emergency Use Authorization for 31 serology test kits 11 . Serological test methods for the detection of anti-SARS-CoV-2 IgG and IgM antibodies include enzyme-linked immunosorbent assay (ELISA), chemiluminescent immunoassay (CLIA), and lateral flow immunoassay (LFIA).
Compared with some NAATs, serological testing is easier to perform and requires less technologically advanced equipment. In addition, blood samples are less likely than respiratory specimens to contain infectious SARS-CoV-2 virus, decreasing the potential risk of infection to laboratory staff 12 . However, questions remain concerning the serological diagnosis of COVID-19. First, studies have reported that seroconversion happens at 3-14 days post symptom onset 13,14 , which may not facilitate early diagnosis of the disease; moreover, the window periods of the different serological tests have not been directly assessed. Second, the specificity and sensitivity of serological methods can vary over the infection time course, and need to be further analysed 15 . Finally, the impact of assay design on the performance of serological tests has yet to be determined.
Meta-analysis is a quantitative evaluation method in evidence-based medicine and is widely accepted as one of the most reliable tools in clinical analysis. Our study evaluated all published case-control, longitudinal and cohort studies for the diagnostic efficacy and characteristics of the current serological tests for COVID-19.

Selection criteria
The inclusion criteria for this meta-analysis were the following: (1) all cohort, case-control, and longitudinal studies published between 1 January 2020 and 30 June 2020; (2) all studies that evaluated the diagnostic performance of serological tests for COVID-19 in comparison with a SARS-CoV-2 NAAT as a reference test; (3) studies from which we could directly or indirectly extract data on true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN); (4) participants were 18-85 years of age; (5) published articles as well as letters and corrected proofs; and (6) only articles in English were included.
The exclusion criteria were the following: (1) preprint articles which had not been peer reviewed; (2) studies whose data overlapped with other published articles; (3) studies in which participants were immunocompromised (cancer, AIDS patients, etc.); (4) studies published before 2020; and (5) studies with more than one "high risk of bias" rating in QUADAS-2 quality assessment domains 2-4.

Search strategy
We searched the databases using the following Medical Subject Heading terms and keywords, alone or in combination: COVID-19, SARS-CoV-2, severe acute respiratory syndrome coronavirus 2, serology, serology test, antibody, antigen, diagnostic test. The main medical databases PubMed, Cochrane Library, EBSCO, and Ovid were searched (full search strategy in supplementary material 1). We limited the search to articles published between 1 January 2020 and 30 June 2020 and written in English.

Study evaluation and data extraction
Two researchers (Wang and Ai) independently scrutinized abstracts and titles to include potentially eligible articles and acquire full texts online. Articles unavailable online were excluded. Then, the same two researchers examined the full texts individually using the preset inclusion and exclusion criteria.
As recommended by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy 16 , we adopted QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) to evaluate the bias and quality of selected studies 17 . The following four domains were considered for risks of bias and application concerns as depicted in the assessment tool: (1) participant selection; (2) index test; (3) reference standard; and (4) flow and timing. Studies with more than one "high risk of bias" rating in the latter three domains were excluded (supplementary material 2).
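The exclusion rule for the latter three QUADAS-2 domains can be written as a small predicate. The following is a minimal sketch, not the authors' workflow: the domain keys, the rating strings, and the function name `exclude_for_bias` are hypothetical labels chosen for illustration, and the two example studies are invented.

```python
from typing import Mapping

# Hypothetical keys for QUADAS-2 domains 2-4 (index test, reference
# standard, flow and timing). Domain 1 (participant selection) is
# deliberately not counted by the rule.
LATTER_DOMAINS = ("index_test", "reference_standard", "flow_and_timing")


def exclude_for_bias(risk_of_bias: Mapping[str, str]) -> bool:
    """Return True when more than one of domains 2-4 is rated 'high'."""
    high_count = sum(
        1 for domain in LATTER_DOMAINS if risk_of_bias.get(domain) == "high"
    )
    return high_count > 1


# Illustrative ratings only; not taken from any included study.
study_a = {"patient_selection": "high", "index_test": "low",
           "reference_standard": "low", "flow_and_timing": "high"}
study_b = {"patient_selection": "low", "index_test": "high",
           "reference_standard": "unclear", "flow_and_timing": "high"}

print(exclude_for_bias(study_a))  # False: one 'high' among domains 2-4
print(exclude_for_bias(study_b))  # True: two 'high' ratings among domains 2-4
```

Note that a high rating in participant selection alone never triggers exclusion under this rule, which is why case-control and longitudinal studies can remain eligible.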
The following information was extracted from the final eligible studies: (1) details of the study: author, title, published date, countries where studies were conducted, study design, participant inclusion manner and criteria, number of enrolled participants and the grouping, number of participants whose results were available; (2) clinical characteristics of participants: age, gender, COVID-19 status; (3) target data: the results of serologic tests and NAATs for COVID-19 (TP, FP, FN, TN) and symptom onset-specimen collection interval (days). One sample per participant was included in the overall sensitivity and specificity, whereas accuracy for each post-symptom-onset interval was taken directly from the data reported in the articles; and (4) test profile: methods for serology and SARS-CoV-2 RNA detection, profile of detected antibodies, and targeted antigen of serologic tests.
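To make the role of the extracted TP, FP, FN and TN counts concrete, the sketch below shows how per-study sensitivity and specificity follow from a 2x2 table against the NAAT reference. The `TwoByTwo` container is a hypothetical name introduced here, and the counts are invented for illustration rather than taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one serology test against the NAAT reference (hypothetical container)."""
    tp: int  # serology positive, NAAT positive
    fp: int  # serology positive, NAAT negative
    fn: int  # serology negative, NAAT positive
    tn: int  # serology negative, NAAT negative

    def sensitivity(self) -> float:
        # Proportion of NAAT-positive participants detected by serology.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of NAAT-negative participants with negative serology.
        return self.tn / (self.tn + self.fp)


# Illustrative counts only; not taken from any included study.
example = TwoByTwo(tp=80, fp=2, fn=20, tn=98)
print(f"sensitivity = {example.sensitivity():.2f}")  # sensitivity = 0.80
print(f"specificity = {example.specificity():.2f}")  # specificity = 0.98
```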

Statistical analysis
We assessed risks of bias and application concerns using the QUADAS-2 tool in Review Manager 5.4 software 18 . Meta-analysis of the selected studies was performed using R software (version 3.6.1) with the meta4diag package 19 . TAB was defined as combined IgG and IgM results, or as directly described in the primary articles. The diagnostic performance of IgG, IgM, and TAB (or combined IgG and IgM) was analysed. Sensitivity and specificity were calculated. Data synthesis was performed using the Bayesian bivariate integrated nested Laplace approximation (INLA) method according to the protocol 19 . Forest plots of point estimates and 95% confidence intervals (95% CI) were provided. Summary receiver-operating characteristic (SROC) curves were plotted to evaluate the heterogeneity (threshold effect) between studies 20 .
Figure 2. Risk of bias and application concerns of included studies assessed using QUADAS-2 tool. Red spots refer to high risk of bias or high concern, yellow to unclear, and green to low.
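As a rough intuition for what pooling sensitivities across studies involves, the sketch below uses a much simpler stand-in than the analysis described above: a univariate random-effects pool on the logit scale with DerSimonian-Laird weights. It is not the Bayesian bivariate INLA model fitted with meta4diag, it ignores specificity and the correlation between the two measures, and the function name `pool_sensitivity` and the input counts are assumptions made for illustration.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_sensitivity(studies):
    """Pool (tp, fn) pairs on the logit scale with DerSimonian-Laird weights.

    Returns (pooled, lower, upper) as proportions with a 95% interval.
    """
    ys, vs = [], []
    for tp, fn in studies:
        # A 0.5 continuity correction keeps the logit finite if a cell is 0.
        a, b = tp + 0.5, fn + 0.5
        ys.append(math.log(a / b))   # logit of the corrected sensitivity
        vs.append(1.0 / a + 1.0 / b)  # approximate variance of that logit

    # Fixed-effect estimate, needed to compute the heterogeneity statistic Q.
    w = [1.0 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_re = [1.0 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Illustrative (tp, fn) counts only; not taken from any included study.
pooled, lower, upper = pool_sensitivity([(45, 55), (80, 20), (60, 15)])
print(f"pooled sensitivity = {pooled:.2f} ({lower:.2f}-{upper:.2f})")
```

A bivariate model such as the one used in the study additionally models sensitivity and specificity jointly, which a univariate pool like this cannot capture.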

Search results
A total of 1876 articles were identified by systematic literature search as of 30 June 2020. A total of 167 studies were selected through title and abstract screening, of which 65 were duplicates and 102 were selected for further review. Through full-text review, 75 articles were excluded as depicted in Figure 1 and supplementary Table 1. A total of 27 articles were finally included for analysis: 16 case-control studies, 7 longitudinal studies, and 4 cohort studies. The assessment of risks of bias and application concerns is described in Figure 2. Of the 27 studies, 23 (85.2%) had a high risk of bias in patient selection because they used a case-control or longitudinal design. We retained these studies for later analysis and evaluated possible risks of bias in the discussion.
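The study-flow numbers in this paragraph can be checked with a few lines of arithmetic. This is only a consistency check on the counts stated in the text; the variable names are chosen here for illustration.

```python
# Counts as stated in the search results paragraph.
identified = 1876
selected_by_title_abstract = 167
duplicates = 65
excluded_at_full_text = 75
by_design = {"case-control": 16, "longitudinal": 7, "cohort": 4}
high_risk_patient_selection = 23

full_text_reviewed = selected_by_title_abstract - duplicates
included = full_text_reviewed - excluded_at_full_text
share_high_risk = 100 * high_risk_patient_selection / included

print(full_text_reviewed)            # 102
print(included)                      # 27
print(sum(by_design.values()))       # 27
print(f"{share_high_risk:.1f}%")     # 85.2%
```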
Detailed characteristics of these 27 articles are shown in Table 1 (Figure 5, supplementary figure).

Diagnostic performance of different serologic test methods, and by targeted antigen
The sensitivity of different serologic methods is plotted in Figure 6(A). Seven studies provided direct comparison between different methods while 20 articles did not (supplementary Table 2

Discussion
Our meta-analysis included 27 articles, comprising 4 cohort studies, 16 case-control studies and 7 longitudinal studies, to evaluate the overall diagnostic performance of serology tests for diagnosis of COVID-19, including the optimum time window and best performing methodology. Serology tests had a sensitivity of less than 40% at 0-7 days post symptom onset. Serology tests detecting TAB had a higher sensitivity than those detecting IgG or IgM alone. Targeting combined N and S proteins had a higher sensitivity than targeting N or S protein alone. LFIA tended to have a lower sensitivity than ELISA or CLIA.
The overall sensitivity of serology tests was poor; thus, negative serological results alone cannot exclude the diagnosis of COVID-19. However, significant variation was observed in the forest plots of the sensitivity of serology tests (Figure 3(B-D)), with a range of 16%-93% for IgG, 42%-92% for IgM and 45%-92% for TAB. We attributed this most likely to different seroconversion times for different antibody classes, and further divided the included articles according to symptom onset-specimen collection interval 48 . Our analysis suggested that serology tests had the lowest sensitivity at 0-7 days post symptom onset and the highest sensitivity at >14 days. Our findings and those of others suggest that 14 days post symptom onset is a point when the sensitivity of serology tests is sufficiently high to replace NAATs for the optimal diagnosis of COVID-19 13,[49][50][51][52] . During the early acute phase of infection, antibody detection might produce numerous false negatives. Nonetheless, detectable antibody responses have been reported in rare cases during the early phase of COVID-19, concurrent with high virus load and a high risk of transmission 53 . In the late phase of disease, on the contrary, seroconversion occurs when virus load begins to decline, and serological tests might play a more important role in the diagnosis of COVID-19. Overall, our pooled analysis suggests a preferred diagnostic algorithm based on days post symptom onset: NAAT alone at 0-14 days, and NAAT combined with a serology test at over 14 days, when virus shedding might drop below the detection limit of most NAATs 54 .
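The suggested algorithm reduces to a single threshold on days post symptom onset. The sketch below encodes it as stated in the text; the function name `suggested_tests` and the string labels are hypothetical, and the code is an illustration of the rule rather than clinical guidance.

```python
from typing import Tuple

SEROLOGY_THRESHOLD_DAYS = 14  # cut-off stated in the text


def suggested_tests(days_post_symptom_onset: int) -> Tuple[str, ...]:
    """Return the tests suggested for a given day post symptom onset.

    NAAT alone at 0-14 days; NAAT combined with serology at over 14 days.
    """
    if days_post_symptom_onset < 0:
        raise ValueError("days_post_symptom_onset must be non-negative")
    if days_post_symptom_onset <= SEROLOGY_THRESHOLD_DAYS:
        return ("NAAT",)
    return ("NAAT", "serology")


print(suggested_tests(5))   # ('NAAT',)
print(suggested_tests(14))  # ('NAAT',)
print(suggested_tests(21))  # ('NAAT', 'serology')
```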
As for serology test methodology, our analysis suggested that tests detecting TAB (or combined IgG and IgM) or targeting N and S combined may provide greater sensitivity than tests detecting a single antibody class or based on N or S alone. LFIA had a lower sensitivity than ELISA or CLIA but provided a fast turn-around time and convenience, and has been authorized by the FDA for emergency use. The choice of serology test methodology should be based on the testing environment and patient population. LFIA tests could prove useful in emergency room, ambulatory and outpatient settings rather than being simply abandoned for their relatively poor performance. We did not stratify our pooled analysis by the assays of different companies, but other head-to-head studies have shown variable concordance between different assays within only a small group of participants 28,38,46,55 . A recent study showed high concordance between Abbott Architect, DiaSorin Liaison, Ortho VITROS, and Euroimmun among 1200 serum samples 56 . Considering that the clinical performance of commercial assays varies with laboratory conditions, the immune status of participants, the time from symptom onset to sample collection, etc., more head-to-head comparisons are needed to establish the concordance between commercial assays on a larger scale.
In this study, most studies had no risk of bias in domains 2-4 and fewer application concerns compared with other meta-analyses of diagnostic test accuracy. We attribute this to the following reasons. First, studies with high risk of bias in domains 2-4 were excluded. The detailed exclusion reasons included no prespecified threshold for the serology test, not using NAATs as the reference test, not all participants receiving NAATs, etc. All of these problems were considered to bring a high risk of bias, whereas problems in the first domain, such as a non-cohort study design or unclear consecutive enrolment, were considered to have less effect on the analysis. Second, COVID-19 is a global public health problem that emerged less than one year ago, and thus studies on the accuracy of serology tests for COVID-19 share some features: (1) participant enrolment was confined to a short time and the criteria were usually not complex, with no clear exclusion criteria; (2) NAATs are the only method suggested by the WHO to diagnose COVID-19; and (3) most case-control studies used serum or blood preserved before 2020 as the control group for determining the accuracy of the serology test. These features also led to a high agreement between enrolled articles in the assessment of risk of bias and application concerns using the QUADAS-2 tool.
Previously, NAATs were the recommended gold standard for COVID-19 diagnosis by the WHO, while antigen tests were not recommended due to insufficient performance data 57,58 . Another concern raised by the WHO regarding serology tests was the relatively long antibody window, with seroconversion occurring during the second week after symptom onset 52 . At present, antibody detection is suggested only for epidemiological research or disease surveillance 5,9,59,60 . This is the first study to meta-analyse the sensitivity of serology tests across different time windows. It also provides a general review of different serology test methods. Tests based on combined IgG and IgM, as well as on combined N and S proteins, performed better than tests based on IgG or IgM alone or on N or S protein alone, while among method formats, LFIA had lower sensitivity than ELISA or CLIA.
This study has some limitations. First, we did not analyse the cross-reactivity/specificity of serology tests for COVID-19. This was limited by data extraction, as most qualified articles did not provide specificity data. Previous studies reported that the serological cross-reactivity between COVID-19 and other coronavirus diseases such as SARS seemed to be high, suggesting that serology tests might produce more false positives and should only be applied as a supplementary tool for clinical diagnosis 61 . Second, 23/27 (85.2%) of the enrolled articles had a high risk of bias owing to case-control or longitudinal design. Specificity in our study might be overestimated because most control groups used samples collected from healthy donors before 2020, which avoids the possible cross-reactivity mentioned above. Another limitation was that we did not analyse the combined diagnostic performance of NAATs and serology tests, because clinically confirmed COVID-19 cases without positive RNA or serology test results were not enrolled into this meta-analysis. According to our study, the combination of these two tests was preferred during the late phase of disease progression. However, the actual sensitivity remains to be evaluated in the future.
Our results highlight that serology tests could play an important role in the diagnosis of suspected COVID-19 infections during the later stage of the disease. In clinical practice, COVID-19 serological tests could contribute to the understanding of the immunological state of the population.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This study was supported in part by the National Natural Science Foundation of China (grant number 82041010), Shanghai Association for Science and Technology (grant number 20411950400), Shanghai Youth Science and Technology Talents Sailing Project (grant number 20YF1404300), and the Investigator Initiated Study grants (grant number Cepheid-IIS-2020-0001) to WHZ.

Contributors
HYW and JWA contributed equally to this article. YWT and WHZ conceptualized the paper. HYW collected and analysed the data. JWA, HYW, and MJL wrote the initial draft, with all authors providing critical feedback and edits to subsequent revisions. All authors approved the final draft of the manuscript. YWT and WHZ are the guarantors. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Competing interests
MJL and YWT are employees of Cepheid, the commercial manufacturer of the Xpert Xpress SARS-CoV-2 test. HYW, JWA and WHZ declare no competing interests.

Data sharing
Additional data will be available on request.

Transparency
The lead authors and manuscript's guarantor affirm that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.