Epigenetic phenotype of plasma cell-free DNA in the prediction of early-onset preeclampsia

Abstract
Background: In the current study, we sought to characterise the methylation haplotypes and nucleosome positioning patterns of placental DNA and plasma cell-free DNA of pregnant women with early-onset preeclampsia using whole genome bisulphite sequencing (WGBS) and methylation capture bisulphite sequencing (MCBS), and to develop a generalised linear model (GLM) incorporating these epigenetic features and examine its diagnostic performance for early-onset preeclampsia.
Methods: This case-control study recruited pregnant women aged at least 18 years who delivered their babies at our hospital. In addition, non-pregnant women with no previous history of disease were included. Placental samples of the villous parenchyma were taken at the time of delivery. Venous blood was drawn from pregnant women during non-invasive prenatal testing at 12–15 weeks of pregnancy and from non-pregnant women during a physical check-up. WGBS and MCBS were carried out on extracted genomic DNA. We then established the GLM by incorporating preeclampsia-specific methylation haplotypes and nucleosome positioning patterns and examined the diagnostic performance of the model by receiver operating characteristic (ROC) curve analysis.
Results: The study included 135 pregnant women and 50 non-pregnant women. Our high-depth MCBS revealed notably different DNA methylation and nucleosome positioning patterns between women with and without preeclampsia. Preeclampsia-specific hypermethylated sites were found predominantly in the promoter regions and were particularly enriched in CTCF motifs on the X chromosome. In total, 2379 preeclampsia-specific methylation haplotypes were found across the entire genome. ROC analysis showed that the area under the ROC curve (AUC) was 0.938 (95% CI 0.877, 1.000). At the optimal GLM cut-off of 0.341, the sensitivity was 95.6% and the specificity was 89.7%.
Conclusion: Pregnant women with early-onset preeclampsia exhibit distinct DNA methylation and nucleosome positioning patterns in placental and plasma DNA.

Plain language summary
Early-onset preeclampsia is a potentially dangerous condition that can have a profound impact on the health of both the expectant mother and her unborn child. This condition is particularly concerning because it is challenging to predict who may be affected using conventional methods such as monitoring blood pressure. In our research, we have developed a non-invasive approach to predict the onset of early preeclampsia. We do this by analysing DNA shed by the placenta, which can be found in the mother's blood. Our method showed high accuracy in our testing populations, and its implications are substantial. By providing an early warning, it could allow early-onset preeclampsia to be identified and addressed well before it becomes a serious health threat. This would allow for timely medical interventions and treatments, improving the well-being of both mothers and their babies.


Introduction
Preeclampsia is a complication of pregnancy that affects 2-8% of pregnant women and contributes to 14% of total maternal deaths (Steegers et al. 2010; Lo et al. 2013; Mol et al. 2016; Sunjaya and Sunjaya 2019). Early-onset preeclampsia accounts for one-fifth of preeclampsia cases and is associated with severe complications and high mortality. Pathological changes in preeclampsia well precede the onset of clinical manifestations, making early diagnosis and intervention based on clinical signs nearly impossible. Meanwhile, daily low-dose aspirin started before 16 gestational weeks can prevent 30-50% of preeclampsia cases (Nicolaides 2017), demonstrating the need for effective biomarkers for predicting early-onset preeclampsia.
Maternal demographic and clinical variables predict early-onset preeclampsia risk with low sensitivity (38-49%) and a high false-positive rate (FPR) (10%) (National Collaborating Centre for Women's and Children's Health 2010; Wright et al. 2015). A Bayesian predictor model incorporating maternal demographic variables, the uterine artery pulsatility index (UtPI), mean arterial pressure (MAP), plasma pregnancy-associated plasma protein-A (PAPP-A) and placental growth factor (PlGF) predicts severe early-onset preeclampsia in a European cohort with a sensitivity of 93% and an FPR of 10% (O'Gorman et al. 2016; 2017). The Bayesian predictor model was further examined in an Asian population and showed a sensitivity of 38-64% and an FPR of 5-10% (Chaemsaithong et al. 2019). In addition, the predictive variables of this model are susceptible to subjective influences or errors, as UtPI is determined by ultrasonography and is operator dependent, and MAP requires multiple measurements in the resting state.
Aberrant DNA methylation during placentation is associated with the occurrence of preeclampsia (Kamrani et al. 2019). Various studies have attempted to identify global and gene-specific methylation patterns in early-onset preeclampsia (Blair et al. 2013; Anton et al. 2014; Liu et al. 2014; Zadora et al. 2017). However, microarray chips and transcriptomic analyses cover only 5% of the whole genome. Meanwhile, whole genome bisulphite sequencing (WGBS) and methylation capture bisulphite sequencing (MCBS) can expand the capture and analysis of methylation patterns to the whole genome level and have been used to elucidate the role of the epigenome in preeclampsia and define gene-specific methylation in the plasma of pregnant women (Ariff et al. 2019; Chu et al. 2021). Cell-free DNA is normally present at low concentrations in the circulating plasma; approximately 10% of the total cell-free DNA is foetal cell-free DNA originating from apoptotic or necrotic trophoblastic placental cells (Ashoor et al. 2013). Cell-free DNA may also carry nucleosome footprints of its tissue of origin (Snyder et al. 2016; Ulz et al. 2016), thus offering a minimally invasive source of DNA for diagnostic purposes.
In this study, we characterised methylation haplotypes and nucleosome positioning patterns of placental DNA and plasma cell-free DNA of patients with and without early-onset preeclampsia using WGBS and MCBS. We then developed a generalised linear model (GLM) incorporating the methylation haplotypes and nucleosome positioning patterns and examined its diagnostic performance for early-onset preeclampsia. The findings may help establish a predictive algorithm with high sensitivity and specificity for early-onset preeclampsia.

Study population
This case-control study recruited pregnant women aged at least 18 years who gave birth at Guangdong Women and Children Hospital from September 2017 to September 2018. Early-onset preeclampsia history was collected by interviewing the pregnant women and reviewing medical records. Pregnancy without hypertension was considered normal pregnancy. Early-onset preeclampsia was defined, following the 2013 guidelines of the American College of Obstetricians and Gynecologists, as a systolic blood pressure ≥140 mmHg and/or diastolic blood pressure ≥90 mmHg between 20 and 34 weeks of pregnancy in the presence of proteinuria, or of any organ damage in the absence of proteinuria. The main exclusion criteria were (1) severe learning handicap or psychiatric diseases, (2) multiple pregnancy or pregnancy by assisted reproductive technology, (3) congenital malformations or major hereditary defects, (4) concurrent malignancy, (5) receipt of aspirin or other blood pressure-lowering drugs, (6) cardiac disease or chronic kidney disease, (7) transfusion during pregnancy, (8) history of hypertension prior to pregnancy, and (9) diabetes, cholestasis syndrome, thyroid dysfunction or immune diseases. In addition, non-pregnant women with no previous history of disease were included. We excluded women with a history of smoking or excessive alcohol consumption.
The study protocol was approved by the Ethics Committee of Guangdong Women and Children's Hospital, Guangzhou, China. Written informed consent was obtained from all participants.

DNA extraction
Placental samples of the villous parenchyma were taken at the time of delivery from four quadrants between the chorionic and basal plates. All samples were rinsed in 0.9% sodium chloride and snap-frozen in liquid nitrogen. Genomic DNA was extracted from the placental tissue using the DNeasy Blood & Tissue Kit according to the manufacturer's protocol (QIAGEN, Venlo, The Netherlands). DNA quality was assessed by DNA concentration and agarose gel electrophoresis. DNA samples that passed quality control had a DNA concentration higher than 1 ng/mL and showed distinct genomic DNA bands on electrophoresis.
Venous blood (10 mL) was taken from pregnant women during non-invasive prenatal testing for T-21, T-18, and T-13 at 12-15 weeks of pregnancy and from non-pregnant women at physical check-up. Plasma cell-free DNA was extracted using the MagPure Circulating DNA LQ Kit (Magen, Guangzhou, China) according to the manufacturer's protocol and stored at −80 °C. Furthermore, placental tissue samples were obtained at the end of pregnancy from women with or without early-onset preeclampsia who underwent caesarean section.

Library preparation
DNA samples extracted from placental tissue (10 µg) were fragmented to ~300 bp using sonication (50 sec, duty factor = 10%, peak incident power = 5 W, cycles/burst = 200; Covaris S220). No size selection was required prior to cell-free DNA sequencing. Genomic DNA libraries were constructed by following the protocol for whole-genome bisulphite sequencing for methylation analysis. The products were treated with EZ DNA Methylation-Gold (Zymo Research, Irvine, CA, USA) according to the manufacturer's protocol. DNA was subjected to end repair, mono-adenylation, and ligation. Bead-based purification (AMPure XP; Beckman Coulter, IN, USA) was performed after the ligation processes. The converted product was amplified using KAPA HiFi HotStart ReadyMix (KAPA Biosystems, MA, USA) following the manufacturer's protocol.
Furthermore, the NimbleGen SeqCap Epi Target Enrichment Kit (Roche NimbleGen, Switzerland) was used to enrich the methylation sequence of cell-free DNA according to the manufacturer's protocol. Library preparation and sequencing were performed according to the methods listed above, with a depth of 120 million reads.

Methylation haplotype machine learning
For every region of interest (i), the cytosine modification read-out is coded as '1' (modified cytosine), '-1' (unmodified cytosine), or '0' (uncertain). All cytosines in the DNA fragments form matrix R, such as [0,1,-1; 0,1,1; 1,1,-1; …], and each cytosine in all samples forms matrix H. For each R[i] from the reads, the proportions of the different methylation haplotypes w[i] were calculated using the R Statistical Package by a formula [equation not recoverable from the source]. For R[i,j] values obtained from samples with and without early-onset preeclampsia, a generalised linear model was used to observe and predict the outcome w[i,j,k].
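To make the read coding concrete, the following is a minimal sketch of how haplotype proportions could be tallied for a single region from reads coded 1, -1, and 0. It is not the authors' formula, which is not available in this text: the function name `haplotype_proportions` is hypothetical, and dropping any read that contains an uncertain call is an assumption made here for illustration.

```python
from collections import Counter


def haplotype_proportions(reads):
    """Return the proportion of each fully called methylation haplotype.

    reads: list of tuples, one per DNA fragment, with one call per CpG in
    the region (1 = modified, -1 = unmodified, 0 = uncertain).
    Assumption: fragments containing any uncertain call are discarded.
    """
    informative = [tuple(r) for r in reads if 0 not in r]
    counts = Counter(informative)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {hap: n / total for hap, n in counts.items()}


# Five hypothetical fragments covering three CpG sites in one region.
reads = [(1, 1, -1), (1, 1, -1), (1, 1, 1), (0, 1, -1), (-1, -1, -1)]
props = haplotype_proportions(reads)
```

Here the fragment with an uncertain call is ignored, so the proportions are computed over the four remaining fragments.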

Nucleosome origin and early-onset preeclampsia-specific positioning analysis
For every region of interest (i), the distributions of fragment start, end and length were estimated using the functions kde and kde2d in R (v3.3.1) and plotted using ggplot2 (v3.1.0). The difference in nucleosome positioning between different samples was also calculated.
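The density estimation step can be illustrated with a one-dimensional Gaussian kernel density estimate of fragment length. This is a sketch only: it uses plain Python rather than the R functions named above, and the fragment coordinates and the bandwidth of 10 bp are made-up values chosen for the example.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical (start, end) coordinates of cell-free DNA fragments.
fragments = [(1000, 1166), (2040, 2207), (5000, 5148), (7300, 7465), (9100, 9420)]
lengths = [end - start for start, end in fragments]

density = gaussian_kde(lengths, bandwidth=10.0)
grid = range(100, 401)            # fragment lengths evaluated, in bp
mode = max(grid, key=density)     # most probable fragment length on the grid
```

With these illustrative fragments the estimated density peaks near the mononucleosomal length of roughly 165 bp, and the single long fragment contributes a small secondary bump.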
Furthermore, the probability density (D) of nucleosome positioning was calculated based on the start and end positions and length probabilities of cell-free DNA. The mixtools package in R was used to run the Expectation-Maximization algorithm to derive the cell-of-origin-specific and allele-specific nucleosome positioning. Nucleosome origin analysis (F) was performed by quadratic programming on the nucleosome probability density map. Early-onset preeclampsia-specific nucleosome positioning analysis was performed by comparing the nucleosome probability density maps from cell-free DNA sequencing of women with and without early-onset preeclampsia.
The nucleosome positions of DNA fragments overlapping each differentially methylated region (DMR) were mapped to a two-dimensional space with fragment start (x) and end (y) to form a distribution. A Gaussian kernel was applied to this distribution with mixtools (1.1.0) to learn the characteristic distribution functions. To deconvolve each individual sample into the characteristic Gaussian distributions, we used the quadprog (1.5.0) package in R on the distribution of DNA fragment ends from that individual. This process was repeated for each DMR.
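The idea behind the deconvolution can be sketched for the simplest case of two reference profiles. With only two sources, the constrained least-squares problem (a small quadratic programme) has a closed-form solution, which is what the code below uses in place of a general solver such as quadprog. The profile values, the labels `placental` and `maternal`, and the function name are hypothetical and serve only to illustrate the calculation.

```python
def two_source_fraction(observed, ref_a, ref_b):
    """Estimate w in [0, 1] minimising ||observed - (w*ref_a + (1-w)*ref_b)||^2.

    All three arguments are density profiles over the same bins. For two
    sources the unconstrained least-squares solution is a projection onto
    the line between the references; clipping to [0, 1] gives the
    constrained optimum.
    """
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        raise ValueError("reference profiles are identical")
    num = sum((o - b) * d for o, b, d in zip(observed, ref_b, diff))
    return min(1.0, max(0.0, num / denom))


# Hypothetical fragment-end density profiles over five bins.
placental = [0.10, 0.40, 0.30, 0.15, 0.05]
maternal = [0.05, 0.15, 0.30, 0.35, 0.15]

# A synthetic plasma sample built as 10% placental and 90% maternal signal.
observed = [0.1 * p + 0.9 * m for p, m in zip(placental, maternal)]
fraction = two_source_fraction(observed, placental, maternal)
```

On this synthetic mixture the estimate recovers the 10% contribution used to build it. With more than two reference distributions a general quadratic programming solver would be needed.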

Construction of a GLM early-onset preeclampsia prediction model
The relative frequency of binary haplotype-preeclampsia outcome was used to generate a methylation haplotype frequency matrix by using principal component analysis (PCA). For single-modality data, a deep autoencoder was used. Each region was coded as 0 (unmethylated), 1 (unknown), or 2 (methylated), indicating whether the fragment was methylated (as shown in Figure 1(A)). Early-onset preeclampsia-specific methylation haplotypes were defined as methylation patterns whose percent coverage in early-onset preeclampsia placenta was more than 20-fold the percent coverage in the placenta of pregnant women without early-onset preeclampsia, and which were not found in plasma cell-free DNA from non-pregnant women. Differentially methylated regions (DMRs) were identified using matrix algebra analysis and confirmed only if one or more methylation patterns on the DMR showed significant differences between early-onset preeclampsia and non-preeclampsia samples. The model was calibrated with 10-fold cross-validation. The classification threshold was optimised by choosing the cut-off with the best diagnostic performance. We used D and F as potentially interacting variables in the GLM. PC1 to PC10 were principal component eigenvectors built upon the methylation haplotype frequency matrix. The phenotype (y) was calculated using a formula [equation not recoverable from the source] in which D, F, and PC1:PC10 were independent variables, and y = 1 indicated PE and y = 0 indicated no PE. Eleven samples (6 pregnant women without and 5 pregnant women with early-onset preeclampsia) were used for training this model, and 84 samples (44 pregnant women without and 40 pregnant women with early-onset preeclampsia) for model validation.
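The selection rule for preeclampsia-specific haplotypes (more than 20-fold coverage relative to control placenta, and absent from non-pregnant plasma) can be written as a simple filter. The sketch below assumes coverage is available as a fraction per haplotype identifier; the function name, the dictionary layout and all numbers are hypothetical.

```python
def pe_specific_haplotypes(pe_cov, ctrl_cov, nonpregnant_cov, fold=20.0):
    """Return haplotype ids that satisfy the specificity rule.

    pe_cov, ctrl_cov: percent coverage of each haplotype in placental DNA
    of women with and without early-onset preeclampsia.
    nonpregnant_cov: coverage in plasma cell-free DNA of non-pregnant women.
    A haplotype missing from a dictionary is treated as having zero coverage.
    """
    specific = []
    for hap, pe in pe_cov.items():
        ctrl = ctrl_cov.get(hap, 0.0)
        seen_in_nonpregnant = nonpregnant_cov.get(hap, 0.0) > 0.0
        if pe > 0.0 and pe > fold * ctrl and not seen_in_nonpregnant:
            specific.append(hap)
    return sorted(specific)


# Hypothetical coverage values for four haplotypes.
pe_cov = {"hapA": 0.42, "hapB": 0.30, "hapC": 0.25, "hapD": 0.10}
ctrl_cov = {"hapA": 0.01, "hapB": 0.05}
nonpregnant_cov = {"hapC": 0.02}

selected = pe_specific_haplotypes(pe_cov, ctrl_cov, nonpregnant_cov)
```

In this example `hapB` fails the 20-fold criterion and `hapC` is rejected because it also appears in non-pregnant plasma, leaving `hapA` and `hapD`.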

Statistical analysis
Differences in methylation loci between groups were analysed using PageRank-like statistics VLR and Student's t test. Differences in placental DNA methylation patterns between women with and without early-onset preeclampsia were examined by Fisher's exact test. A two-sided p < 0.01 was considered statistically significant after multiple testing adjustment with the p.adjust function in R in its default setting. Baseline variables were analysed by Student's t test, and a two-sided p < 0.01 was considered statistically significant. The receiver operating characteristic (ROC) curves were plotted, and the significance of differences in the area under the curve (AUC), sensitivity and specificity was calculated, using SPSS 13.0.
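The default method of R's p.adjust is Holm's step-down procedure. As an illustration of what that adjustment does before the 0.01 threshold is applied, the following plain-Python sketch reproduces the Holm calculation on a few made-up p values; the function name `holm_adjust` and the input values are illustrative only.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of a list of p values.

    The k-th smallest p value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and a running maximum keeps the adjusted values monotone.
    Results are returned in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        scaled = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    return adjusted


raw = [0.001, 0.02, 0.004, 0.03]           # hypothetical raw p values
adj = holm_adjust(raw)
significant = [a < 0.01 for a in adj]      # threshold used in this study
```

For these inputs only the smallest p value remains below 0.01 after adjustment, which shows how the correction tightens the criterion relative to the raw values.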

Demographic and baseline characteristics of the study women
The study flowchart is shown in Supplementary Figures 1 and 2. First, we analysed placental DNA samples to find the methylation patterns of early-onset preeclampsia. Placental DNA samples from 6 women without and 6 women with early-onset preeclampsia underwent WGBS and MCBS. The demographic and baseline characteristics of the study women are summarised in Table 1.

Placental DNA of pregnant women with early-onset preeclampsia exhibits distinct methylation patterns
We aimed to delineate the placental methylation patterns of women with and without early-onset preeclampsia. Our high-depth bisulphite capture sequencing revealed a strong positive correlation between the methylation levels of CpG sites within the whole genome in placental DNA of women with and without early-onset preeclampsia, as shown in the representative scatter plots in Figure 3(A). The mean range of CpG sites in the haplotypes was 3 ± 0.97 (95%CI 2.5). The distribution of early-onset PE-related methylation haplotypes in all the chromosomes is detailed in Table 1. Supervised hierarchical clustering of methylation and hypomethylation using the top 100,000 genome sites with the most significant difference in methylation and hypomethylation patterns between placental DNA samples of women with and without early-onset preeclampsia showed strikingly similar placental DNA methylation patterns among women with early-onset preeclampsia, but notably different placental DNA methylation patterns between women with and without early-onset preeclampsia (heatmap in Figure 3(B)). Intriguingly, early-onset preeclampsia-specific hypermethylated sites were found predominantly in the promoter regions (2-3 kb promoter: 1.71%; 1-2 kb promoter: 2.93%; ≤1 kb promoter: 24.94%) and introns (total 29.32%) (Figure 3(C)). CCCTC-binding factor (CTCF), a zinc finger DNA-binding protein, is involved in chromatin organisation and reorganisation. Notably, early-onset preeclampsia-specific methylation haplotypes were enriched in CTCF on the X chromosome (Figure 3(D)). GO and KEGG pathway analysis of genes within ±1 kb of early-onset preeclampsia-specific methylation sites indicated that these genes were related to signalling pathways implicated in cell proliferation and developmental biology (Figure 3(E,F)).

Figure 1. Placental and plasma DNA of pregnant women with early-onset preeclampsia displays preeclampsia-specific methylation haplotypes and differentially methylated regions. (A) A deep autoencoder and principal component analysis are used to identify early-onset preeclampsia-specific methylation patterns in the region spanning from 100 bp upstream to 100 bp downstream of each early-onset preeclampsia-specific methylation site. The methylation status of each region is coded by 0 (unmethylated), 1 (unknown), or 2 (methylated) by the autoencoder. (B) Distribution of early-onset preeclampsia-specific methylation haplotypes on all the chromosomes of the genome. An early-onset preeclampsia-specific methylation haplotype is established if the percent coverage of the methylation patterns in placental DNA of early-onset preeclampsia women is more than 20-fold of the percent coverage in placental DNA of women without early-onset preeclampsia, and if the pattern is not found in plasma cell-free DNA of non-pregnant women. (C) Differentially methylated regions (DMRs) are identified using matrix algebra analysis as described in Methods. CTRL: pregnant women without early-onset preeclampsia; PE: pregnant women with early-onset preeclampsia. (D) One early-onset preeclampsia-specific methylation haplotype, which is located between nucleotides 149411224 and 149411379 on chromosome 7, is present on placental DNA of a woman with early-onset preeclampsia but absent from placental DNA or plasma cell-free DNA of a pregnant woman without early-onset preeclampsia. Hypermethylation is indicated in red, hypomethylation in blue, and not available in white.

Placental and plasma DNA of pregnant women with early-onset preeclampsia displays specific methylation haplotypes
We used a deep autoencoder and principal component analysis to identify early-onset preeclampsia-specific methylation patterns in the region spanning from 100 bp upstream to 100 bp downstream of each early-onset preeclampsia-specific methylation site (Figure 1(A)). A total of 2379 early-onset preeclampsia-specific methylation haplotypes were identified on all the chromosomes of the genome (Figure 1(B)). Matrix algebra analysis further categorised these early-onset preeclampsia-specific methylation haplotypes into 244 candidate DMRs (Figure 1(C)). An example is illustrated in Figure 1(D), showing one early-onset preeclampsia-specific methylation haplotype located between nucleotides 149411224 and 149411379 on chromosome 7. Furthermore, during pregnancy, placental DNA fragments are shed into the maternal peripheral blood as cell-free DNA. We were interested in exploring whether early-onset preeclampsia-specific methylation haplotypes in placental DNA could also be observed in cell-free DNA. Not unexpectedly, early-onset preeclampsia-specific methylation haplotype fragments from the placenta gradually increased in plasma cell-free DNA as pregnancy advanced.

Placental and plasma DNA of pregnant women with early-onset preeclampsia shows disparate nucleosome positioning patterns
A nucleosome contains multiple haplotypes, and nucleosome positioning patterns could be more informative of methylation changes in the genome. We further sought to examine any early-onset preeclampsia-specific changes in nucleosome positioning by performing MCBS analysis of cell-free DNA samples from 22 pregnant women without early-onset preeclampsia and 6 women with early-onset preeclampsia. We observed early-onset preeclampsia-specific nucleosome positioning in placental DNA of pregnant women with early-onset preeclampsia versus plasma cell-free DNA of non-pregnant women and placental DNA of pregnant women without preeclampsia (Figure S2A). An example is illustrated in Figure S2B, showing the distribution of the length and the density of nucleosomes on chromosome 1 from plasma cell-free DNA of non-pregnant women and pregnant women with and without early-onset preeclampsia.

The diagnostic performance of a generalised linear model (GLM) for predicting early-onset preeclampsia
Given the distinct methylation haplotypes and nucleosome positioning patterns of cell-free DNA and placental DNA of pregnant women with early-onset preeclampsia versus non-pregnant women and pregnant women without early-onset preeclampsia, we constructed a GLM incorporating early-onset preeclampsia-specific methylation haplotypes and nucleosome positioning patterns of plasma cell-free DNA in the training cohort, which included 6 pregnant women without and 5 pregnant women with early-onset preeclampsia. In ROC analysis, the optimal GLM cut-off was 0.341, with a sensitivity of 95.6% and a specificity of 89.7% (Figure S3A). The prediction model was further tested in the validation cohort, which included 44 pregnant women without and 40 pregnant women with early-onset preeclampsia. A significant difference in the GLM score was observed between pregnant women without and with early-onset preeclampsia (Figure S3B). Using the optimal GLM cut-off of 0.341 as the threshold, we correctly identified early-onset preeclampsia in 37 out of 40 pregnant women with early-onset preeclampsia and correctly excluded it in 43 out of 44 pregnant women without early-onset preeclampsia (Figure S3C).
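As a worked illustration of how a score cut-off translates into the validation counts reported above (37 of 40 cases identified, 43 of 44 comparison pregnancies excluded), the sketch below thresholds a list of scores at 0.341 and derives sensitivity and specificity. The individual GLM scores are invented so that the counts match the text; only the counts and the cut-off come from the study, and treating a score at or above the cut-off as positive is an assumption.

```python
def classify(scores, cutoff):
    """Label a sample positive (1) when its score reaches the cut-off."""
    return [1 if s >= cutoff else 0 for s in scores]


def sensitivity_specificity(labels, predictions):
    """Compute sensitivity and specificity from true labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# 40 cases followed by 44 comparison pregnancies (validation cohort sizes).
labels = [1] * 40 + [0] * 44
# Hypothetical scores arranged to reproduce the reported counts.
scores = [0.9] * 37 + [0.2] * 3 + [0.1] * 43 + [0.5] * 1

predictions = classify(scores, cutoff=0.341)
sens, spec = sensitivity_specificity(labels, predictions)
```

These counts correspond to a sensitivity of 37/40 = 92.5% and a specificity of 43/44, or about 97.7%, in the validation cohort.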

Discussion
Efforts in early-onset preeclampsia prediction have been directed towards clinical variables and plasma protein biomarkers, such as fms-like tyrosine kinase-1 (Flt-1), PlGF, and PAPP-A (Poon and Nicolaides 2014; Zeisler et al. 2016; Tarca et al. 2019; Zhang et al. 2019), but without achieving high sensitivity and/or specificity. We believe that our cell-free DNA methylation haplotype-based model could be a valuable tool for clinical practice.
During early-onset preeclampsia, changes in methylation patterns have been reported in placenta samples, in endothelial progenitor cells in venous cord blood, and in omental fat arteries (Mousa et al. 2012; Yeung et al. 2016; Brodowski et al. 2019). Using WGBS, we identified DNA methylation in the whole genome from the placenta of women with and without early-onset preeclampsia. This complete data set allowed us to further characterise the location, level, and density of methylation in the whole genome and delineate early-onset preeclampsia-specific methylation haplotypes.
Using genome-wide methylation bisulphite sequencing analysis, we found that many of the early-onset preeclampsia-specific methylation sites were preferentially distributed in the promoter regions, suggesting that DNA methylation may play a role in regulating early-onset preeclampsia-specific gene expression. Early-onset preeclampsia-specific methylation sites were enriched in the CTCF motif, suggesting that hypermethylation of these loci may lead to higher-order changes in chromatin structure, influencing gene expression in the transcriptionally active domain. Further, consistent with previous findings (Li and Fang 2019), genes related to early-onset preeclampsia-specific methylation sites are involved in pathways such as cancer pathways.
One of the major contributions of our work is the development of a non-invasive method for screening for early-onset preeclampsia using plasma cell-free DNA. Our analysis of cell-free DNA methylation patterns revealed that the start, end, and length of cell-free DNA fragments differed between early-onset preeclampsia patients and healthy pregnant women, which indicated that the positioning of nucleosomes was also altered in early-onset preeclampsia placenta. By combining this parameter with early-onset preeclampsia-specific methylation information, we increased the sensitivity and specificity of our early-onset preeclampsia prediction model. Our data support a role for epigenetic regulation in the onset of preeclampsia and the potential to use methylation information for early detection of early-onset preeclampsia.

Ethical approval
The study protocol was approved by the Ethics Committee of Guangdong Women and Children's Hospital, Guangzhou, China (No. 201701044). All patients gave their written informed consent.

Figure 2. The study flowcharts. (A) The schema for discovery and validation of specific methylation haplotypes and nucleosome positioning patterns in placental and plasma DNA. (B) The schema for construction of the GLM and prediction of PE. (C) A detailed description of the application of the GLM for prediction of early-onset PE. GLM, generalised linear model; MCBS, methylation capture bisulphite sequencing; WGBS, whole genome bisulphite sequencing.

Figure 3. Placental DNA of pregnant women with early-onset preeclampsia exhibits a distinct methylation pattern. (A) Representative scatter plots of methylation levels of CpG sites within the whole genome in placental DNA of women with and without early-onset preeclampsia. (B) Heatmap showing supervised hierarchical clustering of methylation and hypomethylation using the top 100,000 genome sites with the most significant difference in methylation and hypomethylation patterns between placental DNA samples of women with and without early-onset preeclampsia. Each row represents one placental DNA sample; each column represents one genome site. N = 6 per group. (C) Early-onset preeclampsia-specific frequency of differentially methylated regions according to functional and CpG island contextual distribution. (D) Early-onset preeclampsia-specific methylation haplotypes are enriched on CTCF on the X chromosome. (E) GO and (F) KEGG pathway analysis of genes within ±1 kb of early-onset preeclampsia-specific methylation sites. N = 6 per group.