Distinguishing preeclampsia using the falling scaled slope (FSS): a novel photoplethysmographic morphological parameter

ABSTRACT
Background: The presence of preeclampsia (PE) can lead to hemodynamic changes. Previous research suggested that morphological parameters based on photoplethysmographic pulse waves (PPGPW) could help diagnose PE.
Aim: To investigate the performance of a novel PPGPW-based parameter, the falling scaled slope (FSS), in distinguishing PE, and to investigate the advantages of a machine learning algorithm over conventional statistical methods in the analysis.
Methods: Eighty-one PPGPW recordings were acquired for the study (PE, n = 44; normotensive, n = 37). FSS values were calculated and used to construct a PE classifier with the K-nearest neighbors (KNN) algorithm, from which a predicted PE state ranging from 0 to 1 was calculated for each patient. The classifier's performance in distinguishing PE was evaluated using the receiver operating characteristic (ROC) curve and the area under the curve (AUC), and compared with previously published PPGPW-based models.
Result: Compared with previous PPGPW-based parameters, FSS performed better in distinguishing PE, with an AUC of 0.924; the best threshold of 0.498 predicted PE with a sensitivity of 84.1% and a specificity of 89.2%. As for the analysis method, a classifier trained with the KNN algorithm outperformed conventional statistical methods: for the previously proposed comparative hierarchical area ratio (CHAR), the KNN classifier achieved an AUC of 0.891, versus 0.878 and 0.749 for the mean-based and standard-deviation-based analyses, respectively.
Conclusion: The results indicate that FSS might be an effective tool for identifying PE. Moreover, machine learning algorithms could further aid data analysis and improve performance.


Introduction
Preeclampsia (PE) is a multi-system disorder and one of the most hazardous causes of maternal and perinatal morbidity and mortality (1,2). New-onset hypertension is its most representative characteristic (1,2). Although the etiology of the disease has not been fully clarified, PE affects 5%-8% of pregnancies worldwide (1-3). Several risk factors are regarded as contributing to its occurrence, including maternal age, parity, prior PE, multiple gestation, and obesity (2,4,5). PE can affect both mother and infant, causing maternal cerebrovascular, cardiac, hepatic, hematologic, and renal complications, and sometimes prematurity and neonatal mortality (2,6). It is therefore essential to identify PE accurately and to provide the best possible care (2).
In clinical practice, PE screening usually relies on the patient risk factors mentioned above (1,4,7). However, this approach cannot reflect dynamic changes during pregnancy and therefore has limited value for PE prediction. Widely used PE biomarkers partly address these problems, but they usually require expensive, specialized medical equipment and long examination times (1,8-12). A noninvasive inspection method that can be repeated on multiple occasions would therefore make PE diagnosis more convenient.
Recently, the photoplethysmographic pulse wave (PPGPW) has attracted researchers' attention as a potential noninvasive index for PE inspection (13-23). PPGPW records changes in blood volume noninvasively and continuously. It reflects the interaction of arterial and venous blood with the cardiac, respiratory, and autonomic systems (24). Moreover, some findings suggest that pregnancy is associated with several hemodynamic changes (13,22,25,26). These include changes in intravascular volume, cardiac output, and heart rate, a marked decrease in vascular resistance, and a tendency toward decreased mean blood pressure. Preliminary clinical studies found that PPGPW morphology can reflect these changes to a certain extent (14,15). As shown in Figure 1, pulses from preeclamptic and healthy patients differ obviously in morphology. Both the amplitude and the pulse duration differ, and so does the waveform between the peak and the minimum value. Moreover, PE affects whether the pulse shows a distinct notch point, one of the most representative feature points of the pulse. It is therefore reasonable to quantify such morphological differences accurately.
Previous research has already proposed several morphological parameters of the PPGPW and analyzed PE presence using specific mathematical algorithms and tools. The results of these studies showed the potential of the PPGPW to diagnose or predict PE (16-23). However, these studies drew their conclusions with conventional statistical methods, so the original PPGPW data were not fully exploited. This paper therefore makes two contributions. First, a novel PPGPW-based parameter, the falling scaled slope (FSS), was proposed to describe pulse morphology and distinguish PE presence. After specific points in the pulse are located, FSS describes the PPGPW with a set of slopes. Each slope is the ratio of an amplitude to a time duration within the pulse. Second, a machine learning (ML) algorithm was used to analyze the data in this study. ML helped infer significant
Figure 1. Pulse pattern contrast for preeclamptic and healthy patients. Pulse waveforms from patients with and without PE are drawn in different colours and shapes, with magnitudes in arbitrary units (AU). There are obvious differences in amplitude and pulse shape. The notch point is not apparent in the preeclamptic pulse.
In summary, this study analyzed PE occurrence using ML with PPGPW-based morphological parameters.

Data source
Because no standard dataset containing suitable PE cases was available, a dedicated data collection process was required before this research. All PPGPW data used in this paper were obtained from the Department of Gynaecology and Obstetrics at the Women's Hospital, Zhejiang University School of Medicine. A cohort study was conducted from June 2017 to April 2019 with 44 PE patient volunteers (gestational week 33.1 ± 4.1). A further 37 normotensive patient volunteers (gestational week 34.8 ± 4.2) served as the control group. Approval was obtained from the local Research Ethics Committee (No. 20170131), and informed consent was obtained from all patient volunteers.
The study followed the broadened definition proposed by the International Society for the Study of Hypertension in Pregnancy (1). Under it, a previously normotensive patient was diagnosed with PE if her systolic blood pressure (SBP) was ≥140 mmHg and/or her diastolic blood pressure (DBP) was ≥90 mmHg on at least two occasions measured 4 h apart. The hypertension also had to be accompanied, at or after 20 weeks of gestation, by one or more of the following: new-onset proteinuria, maternal organ dysfunction, or uteroplacental dysfunction. Patients with any cardiovascular, renal, or other hypertension-associated disease, in particular hypercoagulable disorders, were excluded from this cohort study. Patients with multiple gestations and those who conceived through artificial insemination were also excluded.
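The labeling rule above can be written as a small predicate. This is a simplified sketch for illustration, not a clinical tool. It makes three assumptions:

- The patient was previously normotensive.
- "At least two occasions measured 4 h apart" means two elevated readings taken at least 4 h apart.
- Any one of the three accompanying features suffices.

The names `BPReading` and `meets_pe_definition` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BPReading:
    hour: float  # measurement time in hours (any common origin)
    sbp: float   # systolic blood pressure, mmHg
    dbp: float   # diastolic blood pressure, mmHg


def is_elevated(r: BPReading) -> bool:
    # SBP >= 140 mmHg and/or DBP >= 90 mmHg
    return r.sbp >= 140 or r.dbp >= 90


def meets_pe_definition(readings, gestational_week, proteinuria,
                        organ_dysfunction, uteroplacental_dysfunction):
    """Simplified PE labeling rule; assumes the patient was previously normotensive."""
    if gestational_week < 20:
        return False
    hours = sorted(r.hour for r in readings if is_elevated(r))
    # At least two elevated readings taken at least 4 h apart
    sustained = len(hours) >= 2 and hours[-1] - hours[0] >= 4
    # Hypertension plus at least one accompanying feature (assumed to be alternatives)
    return sustained and (proteinuria or organ_dysfunction or uteroplacental_dysfunction)
```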

Data acquisition
PPGPW data were acquired using a standard medical monitor, the CARESCAPE B650 Patient Monitor (General Electric, Boston, USA), with an oxygen saturation sensor (DS-100A Dura sensor, OxiMax, Nellcor Puritan Bennett Inc, USA). The CARESCAPE is designed to be robust against motion artifacts and white noise. Patient volunteers maintained a seated posture throughout the PPGPW acquisition. After each volunteer had rested for at least 5 min, data were recorded from the index finger of the non-dominant hand for at least 1 min. The PPGPW signals were sampled at 100 Hz with 12-bit resolution. They were then exported to a PC as [sample time, sample value] pairs in comma-separated values (CSV) files for further analysis.
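A minimal Python sketch for loading one exported recording and checking it against the acquisition settings stated above (100 Hz, 12-bit, at least 1 min). The CSV layout is assumed: no header row, time in seconds in the first column, and the raw sample value in the second. Sample values are assumed to be unsigned 12-bit (0 to 4095). The function names are hypothetical.

```python
import csv
import io

FS_HZ = 100.0        # sample rate stated in the text
ADC_MAX = 2**12 - 1  # 12-bit resolution; unsigned range 0..4095 is an assumption


def read_ppg_csv(stream):
    """Read [sample time, sample value] rows; assumes no header and time in seconds."""
    times, values = [], []
    for row in csv.reader(stream):
        if not row:
            continue  # skip blank lines
        times.append(float(row[0]))
        values.append(float(row[1]))
    return times, values


def check_recording(times, values, fs=FS_HZ, min_seconds=60.0, tol=1e-6):
    """True if sampling is uniform at fs, duration >= min_seconds, and values are in range."""
    dt = 1.0 / fs
    uniform = all(abs((t1 - t0) - dt) < tol for t0, t1 in zip(times, times[1:]))
    long_enough = len(times) / fs >= min_seconds
    in_range = all(0 <= v <= ADC_MAX for v in values)
    return uniform and long_enough and in_range
```

For example, `read_ppg_csv(io.StringIO(text))` parses an in-memory export. A 61 s recording at 100 Hz passes the check, while a 30 s excerpt fails the duration requirement.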

Pulse pre-processing
Before any analysis, the original PPGPW data had to be pre-processed. Pre-processing had three parts:

- cleaning the raw data to discard recordings with severe distortion;
- filtering to reduce noise to an acceptable level;
- localizing and detecting pulse waves with the corresponding algorithm (41,42).

Both manual and automated checks were conducted to ensure that all detected pulses were correct. The PPGPW data were then ready for calculating the morphological parameters.
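The pre-processing algorithm itself is only cited (41,42). The following is therefore a generic stand-in, not the pipeline used in this study. It applies moving-average smoothing and removes baseline wander by subtracting a long moving average. It then segments beats at the minima between detected peaks. The window lengths and the 0.33 s refractory interval are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def moving_average(x, width):
    """Centered moving average (zero-padded at the edges)."""
    return np.convolve(x, np.ones(width) / width, mode="same")


def preprocess(x, fs=100):
    """Light smoothing plus baseline-wander removal (window lengths are assumptions)."""
    x = np.asarray(x, dtype=float)
    smooth = moving_average(x, max(1, int(0.05 * fs)))  # ~50 ms smoothing
    baseline = moving_average(smooth, int(1.5 * fs))    # ~1.5 s baseline estimate
    return smooth - baseline


def detect_peaks(y, fs=100, min_interval_s=0.33):
    """Local maxima above zero; within a refractory interval the larger one is kept."""
    min_gap = int(min_interval_s * fs)
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i] > 0 and y[i] >= y[i - 1] and y[i] > y[i + 1]:
            if peaks and i - peaks[-1] < min_gap:
                if y[i] > y[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks


def segment_pulses(y, fs=100):
    """(start, end) sample indices of beats, delimited by the minima between peaks."""
    peaks = detect_peaks(y, fs)
    feet = [p0 + int(np.argmin(y[p0:p1])) for p0, p1 in zip(peaks, peaks[1:])]
    return list(zip(feet, feet[1:]))
```

Because the moving averages are zero-padded, the first and last second of a recording are distorted. In practice those edge beats would be discarded.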

Falling scaled slope definition
In this study, FSS was defined as a slope vector to quantify the morphological differences between the PPGPW pulses shown in Figure 1. Figure 2 illustrates how the specific points are located when calculating FSS; the detailed steps are as follows. For a single PPGPW beat TT', first locate the peak point P and its projection O on the baseline TT'. Then locate the points Q_i that divide the line OP into n equal subsections. For each Q_i, find the matching point Q_i' on the falling part of the pulse with the same amplitude OQ_i (0 ≤ i ≤ n − 1). Finally, calculate the slope of the line OQ_i' and denote it K_i:

$$K_i = \frac{\overline{OQ_i}}{t_{Q_i'} - t_O}, \qquad \mathrm{FSS} = \left(K_1, K_2, \ldots, K_{n-1}\right)$$

Here, t_O and t_{Q_i'} are the time positions of O and Q_i'. Each slope K_i is thus the ratio of an amplitude to a time duration within the PPGPW beat TT'. Because Q_0 coincides with O on the baseline, the trivial slope K_0 is not included. In this research, n was set to 10, so FSS was a 9-element vector in the subsequent analysis. Figure 2 also boxes two pulses, one from a preeclamptic and one from a healthy patient, and their FSS values are contrasted in Table 1.
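The FSS steps can be sketched in Python with NumPy. The sketch assumes that:

- `beat` holds one pulse from foot T to foot T', with the baseline already at 0;
- K_i is the slope of the line from O to Q_i', that is, the amplitude of Q_i divided by the time from the peak to Q_i';
- Q_i' is the first crossing of that level on the falling part, located by linear interpolation between samples.

The sample rate of 100 Hz and n = 10 follow the text. The function name is hypothetical.

```python
import numpy as np


def falling_scaled_slope(beat, fs=100.0, n=10):
    """Return the (n-1)-element vector (K_1, ..., K_{n-1}) for one beat.

    Assumes the baseline TT' is already at 0 and that K_i is the slope of the
    line from O (the peak's projection on the baseline) to Q_i'.
    """
    beat = np.asarray(beat, dtype=float)
    p = int(np.argmax(beat))   # index of the peak P (and of its projection O)
    amplitude = beat[p]        # length of segment OP
    if amplitude <= 0:
        raise ValueError("beat must have a positive peak above the baseline")
    falling = beat[p:]         # falling part, starting at the peak
    slopes = []
    for i in range(1, n):
        level = i * amplitude / n               # height of Q_i on OP
        below = np.nonzero(falling <= level)[0]
        if below.size == 0:
            raise ValueError("falling part never reaches level %d/%d" % (i, n))
        j = below[0]                            # first sample at or under the level (j >= 1)
        # Linear interpolation between samples j-1 and j locates Q_i'
        frac = (falling[j - 1] - level) / (falling[j - 1] - falling[j])
        dt = (j - 1 + frac) / fs                # time from O to Q_i', seconds
        slopes.append(level / dt)               # K_i = amplitude / time duration
    return np.array(slopes)
```

Consider an ideal triangular pulse whose falling part drops linearly from 1 to 0 in 0.8 s. For it, K_i = 1.25·i/(10 − i), so the slopes grow steeply toward the peak.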

Machine learning algorithm
The purpose of calculating FSS is to determine whether pulses from preeclamptic and healthy patients differ. This is in effect a binary classification task (PE or normotensive) based on FSS (43). Recently, a series of ML algorithms from computer science have been introduced to study PE classification using different types of data (32-40). In general, ML tasks can be divided into supervised and unsupervised learning, depending on whether the data carry labels (43). The task above is supervised learning, with "PE presence" as the data label and the FSS values as the input data. An ML model is trained on part of the input data with a specific algorithm, and its performance is evaluated on the rest with a specific evaluation index. The assessment usually compares the true labels with the predicted labels for data not involved in model training (43).
After preliminary screening by our research group, the K-nearest neighbors (KNN) algorithm, a popular non-parametric supervised ML algorithm, was selected for the classification task (43,44). KNN makes no assumptions about the underlying data distribution and has no explicit model-construction step during training. Instead, it classifies new data points by their similarity to existing examples. To predict the label of a new input, the algorithm finds a certain number of "nearest" data points among the existing examples and retrieves their labels. It then assigns the input the most common class label among these neighbors by plurality vote. "Nearest" refers to the degree of similarity between data points, which is usually assessed with a distance metric. The K in KNN is the number of nearest neighbors that vote (43,44).
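The voting rule described above can be sketched in a few lines of plain Python. Euclidean distance and the default K = 5 are assumptions, since the text states neither the metric nor K. With binary labels, an odd K avoids tied votes. The function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=5):
    """Label of x by plurality vote among its k nearest training examples."""
    # Sort (distance, label) pairs by Euclidean distance to x
    neighbours = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

There is no training step: the "model" is simply the stored examples, which matches the description of KNN as having no explicit model-construction process.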

Parameter evaluation
This section explains the supervised ML procedure and the FSS evaluation in more detail. First, the pulse, rather than the entire PPGPW recording, was treated as the basic analysis element, which increased the number of inputs. Second, the FSS value of each pulse was treated as a vector input, with the nine elements of the FSS vector serving as its features. Each pulse was labeled 1 (PE present) or 0 (PE absent), following the PE state of the patient to whom it belonged. Third, the input data were divided into training and test sets, and 5-fold cross-validation was applied to guard against overfitting. Overfitting occurs when a model fits the given data too closely but performs poorly on unseen data (43). Fourth, the training set was used to construct the classification model with KNN, and the test set was used to evaluate it. The trained model takes the FSS value of a pulse as input and outputs 1 (PE) or 0 (normotensive). Because the pulse was the basic analysis element, each patient's prediction had to be derived from the predictions for all of her pulses. Her PE and normotensive pulses were counted, and the ratio of PE pulses to total pulses was calculated. Thus, each patient had a predicted PE state ranging from 0 to 1 and an actual PE state of 0 or 1. Moreover, other covariates might contribute to PE prediction. A binary logistic regression (BLR) combining these covariates with the predicted PE state was therefore also conducted.
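The pulse-to-patient procedure can be sketched with NumPy. Two details are assumptions, not the authors' stated method. First, folds are formed by patient, so one patient's pulses never appear in both the training and test sets of a fold. Second, the KNN uses Euclidean distance with K = 5. Every pulse receives an out-of-fold 0/1 prediction. Each patient's predicted PE state is then her ratio of PE pulses to total pulses. The function names are hypothetical.

```python
import numpy as np


def knn_predict_batch(train_X, train_y, test_X, k=5):
    """Plurality-vote KNN with Euclidean distance (binary labels 0/1, odd k)."""
    d = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # Fraction of PE neighbours > 0.5 is the majority vote for two classes
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)


def patient_level_states(X, y, patient_ids, k=5, n_folds=5, seed=0):
    """Out-of-fold pulse predictions aggregated into a per-patient PE ratio.

    X: (n_pulses, 9) FSS features; y: 0/1 pulse labels; patient_ids: owner of each pulse.
    """
    patients = np.unique(patient_ids)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(patients), n_folds)
    pulse_pred = np.zeros(len(y), dtype=int)
    for test_patients in folds:
        test = np.isin(patient_ids, test_patients)
        pulse_pred[test] = knn_predict_batch(X[~test], y[~test], X[test], k)
    predicted, actual = {}, {}
    for pid in patients:
        mask = patient_ids == pid
        predicted[int(pid)] = float(pulse_pred[mask].mean())  # PE pulses / all pulses
        actual[int(pid)] = int(y[mask][0])                     # patient's true PE state
    return predicted, actual
```

Grouping by patient is a deliberate choice in this sketch. If one patient's pulses were split across training and test sets, near-identical pulses from the same recording could inflate the cross-validated performance.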

Statistical analysis
Pre-processing was conducted in MATLAB (R2018a, The MathWorks, United States). The KNN classifier was constructed using Anaconda Python (3.6.5, Anaconda, United States) and the scikit-learn package (0.19.1, open source) (33). The performance of FSS was quantified and assessed using SPSS Statistics (25.0, IBM, United States). Student's t-test and the Mann-Whitney U-test were used for continuous variables. The evaluation metric used to validate the models was the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. BLR was applied to all candidate covariates, and the Hosmer-Lemeshow goodness-of-fit test was used to check calibration. All figures were drawn using Origin Pro (b9.2.272, OriginLab Corporation, United States).
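For reference, the AUC from ROC analysis equals the Mann-Whitney U statistic divided by the number of (PE, normotensive) pairs. It is the probability that a randomly chosen PE patient has a higher score than a randomly chosen normotensive one, with ties counted as one half. A plain Python sketch follows; the function name is hypothetical, and SPSS was the tool actually used.

```python
def auc_from_pairs(pe_scores, normal_scores):
    """AUC as the Mann-Whitney probability that a PE score exceeds a normotensive one."""
    wins = 0.0
    for p in pe_scores:
        for q in normal_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pe_scores) * len(normal_scores))
```

The pairwise loop is quadratic, which is negligible for 44 × 37 patients. For large samples a rank-based formula gives the same value faster.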

Demographics
The demographic information of the patient volunteers is shown in Table 2. There were no significant differences in age, height, gestational week at sampling, or heart rate between healthy patients and those with PE. As expected, preeclamptic patients had significantly higher systolic and diastolic blood pressure than healthy patients. Weight and body mass index (BMI) also differed significantly between the groups.

The model's output based on FSS
Figure 3 shows the distributions of the model's predicted proportions based on FSS for the patient volunteers. Patients with and without PE formed distinct clusters, which the SPSS analysis confirmed (p < 0.001). Patients with PE tended to have higher predicted proportions near 1, whereas those without PE had lower ones near 0. ROC analysis quantified the discriminative performance, as shown in Table 3 and Figure 4. SPSS gave an AUC of 0.924 when this output was used to predict PE. The optimal threshold for PE classification was 0.498: when the predicted proportion exceeded 0.498, PE was predicted with a sensitivity of 84.1% and a specificity of 89.2%.
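The text does not state how the optimal threshold was chosen. This sketch assumes a common criterion: the cut-off that maximizes Youden's index J = sensitivity + specificity − 1. The reported figures are consistent with the cohort sizes: 84.1% corresponds to 37 of the 44 PE patients and 89.2% to 33 of the 37 normotensive patients. The function name is hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (cut-off, sensitivity, specificity) maximising Youden's J.

    A patient is classified as PE when score >= cut-off; labels are 1 (PE) or 0.
    Candidate cut-offs are midpoints between adjacent distinct scores, plus one
    below the minimum and one above the maximum.
    """
    values = sorted(set(scores))
    cuts = ([values[0] - 1.0]
            + [(a + b) / 2 for a, b in zip(values, values[1:])]
            + [values[-1] + 1.0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for c in cuts:
        tp = sum(1 for s, l in zip(scores, labels) if s >= c and l == 1)
        tn = sum(1 for s, l in zip(scores, labels) if s < c and l == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec - 1 > best[1] + best[2] - 1:
            best = (c, sens, spec)
    return best
```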

Parameter contrast
The widely used augmentation index (AIX) was not considered for comparison, as it is derived invasively from the central aortic pulse (18). FSS was instead compared with the comparative hierarchical area ratio (CHAR) proposed earlier by our research group (16). As shown in Figure 4, the KNN classifier based on FSS achieved the best performance among all the methods. The AUC of CHAR was calculated on the new dataset following the previous standard-deviation-based and mean-based analysis methods. As a supplement, the AUC of CHAR was also computed using the KNN classifier. Detailed results are given in Table 3. The SPSS analysis gave an AUC of 0.878 when the mean value of CHAR was used to indicate PE and 0.749 when its standard deviation was used. With the KNN classifier, the AUC was 0.891 for CHAR and 0.924 for FSS.

HYPERTENSION IN PREGNANCY

Binary logistic regression
A BLR was applied to distinguish PE. Its covariates were the predicted proportions together with the statistically significant demographic variables in Table 2 (BMI, SBP, and DBP). Table 4 contrasts the BLR models that use the output of the KNN model based on CHAR and on FSS. It also gives the coefficients of the covariates and the calibration results. The Hosmer-Lemeshow test p-values of both BLR models were above 0.05, indicating no significant lack of fit, that is, acceptable calibration. The p-values for the predicted proportions were statistically significant in both models (both < 0.001). In contrast, those for BMI (1.000 versus 0.816), SBP (0.062 versus 0.497), and DBP (0.388 versus 0.851) were not. This suggests that the PPGPW-based parameter played a more important role in the BLR models than the demographic covariates.
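A minimal sketch of the Hosmer-Lemeshow test in plain Python follows. It assumes the usual ten groups of roughly equal size, formed after sorting by predicted probability, with g − 2 = 8 degrees of freedom. For an even number of degrees of freedom, the chi-square tail probability has the closed form e^(−x/2) · Σ_{i=0}^{df/2−1} (x/2)^i / i!. This avoids depending on a statistics library. SPSS may handle tied probabilities differently, so values can differ slightly. The function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for a positive even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2), grouping by sorted risk."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(outcomes[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        variance = expected * (1 - expected / len(idx))  # N * pbar * (1 - pbar)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A p-value above 0.05 only means that the test found no significant lack of fit. With 81 patients and about 8 per group, the test has limited power.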

Discussion
Despite the efforts of previous research, predicting preeclampsia remains challenging because of its high variability between patients and the number of factors involved (1,2,5,6,8-12). This research proposed FSS, a novel morphological parameter of the PPGPW, to distinguish PE using the ML method KNN. An ML classifier was trained on FSS to determine PE presence or absence for each patient, outputting a value between 0 and 1. The closer the value is to 1, the more likely the model considers the patient to have PE. Patients with and without PE formed distinct clusters in the learned model's output. ROC analysis showed that FSS achieved high sensitivity and specificity in distinguishing PE.

The machine learning
As ML technologies have developed, researchers have explored their application to clinical medical problems (27-31), and PE prediction is one of the popular research topics (32-40). ML can help identify the key factors among the many that may contribute to PE, as well as potential relationships between these factors and PE. Such analyses can be difficult for conventional statistical methods (32-40,43). However, previous ML studies on PE were usually based on PE risk factors, PE biomarkers, and mRNAs (32-40). These types of data cannot be acquired without professional medical devices and are usually costly. Furthermore, electrophysiological signals were rarely used in these studies. To address these limitations, this study applied ML methods to PPGPW-based data to identify PE presence. PPG is a common noninvasive signal in clinical practice, and its acquisition can be both fast and low-cost (1).

The KNN algorithm
When constructing the ML model from FSS values, the KNN algorithm was selected as the training method (43,44). Many other supervised ML algorithms exist, and decision tree (DT) and random forest (RF) models were also constructed (43-46). Among these three algorithms, KNN achieved the best performance in predicting PE. Because algorithm comparison was not the focus of this study, those results are not reported here. Nevertheless, the results showed that PE presence might be related to PPGPW-based parameters. Finding the best algorithm for a PE prediction model could therefore be a goal for the next stage.

Parameter effect
The results of this study suggest that FSS, a novel morphological PPGPW parameter, is related to PE presence when analyzed with an ML method. Since an ML model can be regarded to some extent as a black box, why this parameter is effective deserves further discussion (43). Clinical consultation suggested that PE might cause hemodynamic changes in the vascular system (13,22,24-26). These include increased vascular stiffness, vascular resistance, and mean arterial pressure, and decreased small-artery compliance and cardiac output. A reasonable explanation is therefore that the area-based (CHAR) and slope-based (FSS) parameters are two-dimensional reflections of the original PPGPW. CHAR captures it through integrals and FSS through ratios. The detailed parameter values quantify the morphological variations of the PPGPW and ultimately map the hemodynamic changes caused by PE.

Contrast with previous parameters
During preliminary screening, our group defined and designed various morphological PPGPW-based parameters based on arc lengths, angles, and time durations (16). However, our previous research also indicated that these simple one-dimensional parameters had limited ability to distinguish PE, and none produced an AUC above 0.85. We therefore focused on the area-based (CHAR) and slope-based (FSS) two-dimensional parameters to quantify the details and trends of a pulse. As shown in Table 3, the AUC results suggested that the ML method made the best use of the data. They also showed that the novel parameter FSS had an advantage over CHAR in indicating PE. Moreover, the ratio of PE pulses to total pulses computed with the KNN classifier is normalized. It can therefore also be regarded as a probability-like index of whether the patient has PE. The learned best threshold can be viewed as the tolerance level for pulse aberration.

Contrast with statistically significant factors
As a supplement, a BLR analysis examined the roles of the statistically significant factors in this study. As shown in Table 4, the Hosmer-Lemeshow test indicated acceptable calibration for the BLR models. These models combined PPGPW-based parameters with the statistically significant factors. Moreover, the BLR p-values suggested that PPGPW-based parameters were more useful than SBP, DBP, and BMI in distinguishing PE. If this conjecture is confirmed on a larger dataset, the method proposed in this study would merit further research.

Limitations
This research has some limitations. The first is the limited size of the dataset; more PPGPW records could lead to more precise and convincing findings. The second concerns the definition and extraction of PPGPW parameters; the models' performance could improve if better parameters were designed and used as input.
Finally, each result was based on a single parameter. Making full use of parameter combinations would be the logical next step.

Figure 2. Diagram of the falling scaled slope parameter. Pre-processing had already adjusted the baseline TT' to 0. The curve TT' indicates a complete PPGPW beat. Point P is the peak of the pulse, and point O is the vertical projection of P on the baseline. Points Q_i divide the line OP into n pieces, and points Q_i' are the matching points of Q_i on the PPGPW beat.

Figure 3. Predicted proportion distributions indicating PE based on PPGPW morphological parameters. The distributions generated from CHAR (above) and FSS (below) are contrasted. Red circles represent the predicted outputs for preeclamptic patients, and blue squares represent those for healthy patients. The best classification thresholds are also given. A threshold of 0.557 for CHAR achieved a sensitivity of 0.818 and a specificity of 0.892. A threshold of 0.498 for FSS achieved a sensitivity of 0.841 and a specificity of 0.892.

Figure 4. Contrast of ROC curves indicating PE based on the PPGPW morphological parameters. The AUC value for each parameter is marked in the legend. The identity line is drawn as a dash-dot line.

Table 1. Contrast of FSS values for pulses from a preeclamptic and a healthy patient.

Table 2. Demographic and clinical information of control and preeclampsia groups.

Table 3. Detailed AUC values and thresholds for the different models involved.

Table 4. Detailed results for the binary logistic regression and Hosmer-Lemeshow test.