Predicting adverse perinatal outcomes among gestational diabetes-complicated pregnancies using a neural network algorithm

Abstract Objective: The primary aim of this study was to use a neural network model to predict adverse neonatal outcomes in pregnancies complicated by gestational diabetes (GDM). Design: Our model, a fully connected neural network, was implemented in Python 3.6 with the Keras framework built on Google's TensorFlow; XGBoost was used to score the importance of each input factor. We sourced data from the medical records of individuals diagnosed with GDM who delivered at our tertiary medical center between 2012 and 2016. The model included simple pregnancy parameters: maternal age, body mass index (BMI), parity, gravidity, oral glucose test results, treatment modality, and glycemic control. The composite adverse neonatal outcome was defined as any of the following: large or small for gestational age, shoulder dystocia, umbilical cord pH less than 7.2, neonatal intensive care unit (NICU) admission, respiratory distress syndrome (RDS), hyperbilirubinemia, or polycythemia. For model training, 70% of the cohort was randomly selected; each sample in this set consisted of the baseline parameters and the composite outcome. The remaining samples were used to assess the accuracy of the model. Results: The study included a total of 452 participants. The composite adverse outcome occurred in 29% of cases. Our model achieved prediction accuracies of 82% at the time of GDM diagnosis and 91% at delivery. The factors contributing most to the prediction model were maternal age, pre-pregnancy BMI, and the results of the 3-h 100 g oral glucose tolerance test. Conclusion: Our neural network algorithm has significant potential for predicting adverse neonatal outcomes in individuals diagnosed with GDM.

These risks are associated with elevated glucose levels in gestational diabetes (GDM), but the relationship is intricate and influenced by various interconnected factors, including estimated fetal weight and maternal weight [8]. While early delivery may increase the risk of prematurity-related neonatal morbidity, delivery at later gestational ages exposes the neonate to risks such as LGA and macrosomia, with their associated consequences [9]. Therefore, risk-based patient counseling is necessary, as perinatal outcomes differ according to existing risk factors. Early detection of high-risk women might reduce adverse outcomes through earlier or more frequent monitoring, treatment adjustments, and interventions such as early delivery. Stratifying affected women by their risk of pregnancy complications requires a method to estimate an individual's absolute risk of future events from readily available characteristics. Many fields of medicine have seen rapid growth in the development of prediction models; however, such models are rarely translated into clinical practice [10]. Machine learning is a subset of artificial intelligence (AI) that allows computers to "learn" from data without being explicitly programmed. These algorithms, which detect patterns in data, have attracted increasing attention because of their superior predictive ability compared with conventional statistical models [11]. One advantage of this methodology is its ability to find correlations automatically, without handcrafted features. These approaches have yet to be widely tested in obstetrics [12].
Several studies have identified individual risk factors associated with adverse perinatal outcomes in GDM patients [10,13-17], but none have used AI to predict poor neonatal outcomes. In this study, we aimed to determine whether AI and machine learning algorithms, such as neural network models, could predict adverse neonatal outcomes in people with GDM.

Methods
A retrospective study was carried out by reviewing the medical records of all pregnant people diagnosed with GDM who delivered at a single tertiary medical center between November 2012 and July 2016.

Study population
Eligibility criteria included all women with singleton gestations diagnosed with GDM during pregnancy whose diagnosis, follow-up, treatment and delivery all took place at the maternal-fetal medicine clinics and delivery ward of our medical center. All participants had completed a 50 g glucose challenge test (GCT) and a 100 g oral glucose tolerance test (OGTT). Exclusion criteria were multifetal gestation, known genetic or anatomic fetal abnormalities, pre-gestational diabetes or suspected type 2 diabetes diagnosed during pregnancy, and delivery before 34 weeks of gestation.
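The inclusion and exclusion rules above reduce to a simple record filter. The following is a minimal Python sketch, assuming each pregnancy is stored as a record with the hypothetical fields shown; the field names are illustrative and are not taken from the study's database.

```python
from dataclasses import dataclass


@dataclass
class Pregnancy:
    # Hypothetical fields for illustration; not the study's actual data schema.
    fetuses: int
    has_gdm: bool
    completed_gct_and_ogtt: bool
    managed_and_delivered_at_center: bool
    known_fetal_anomaly: bool
    pregestational_or_suspected_t2dm: bool
    ga_at_delivery_weeks: float


def is_eligible(p: Pregnancy) -> bool:
    """Return True if the pregnancy meets every inclusion criterion and no exclusion criterion."""
    included = (
        p.has_gdm
        and p.fetuses == 1  # singleton gestation (multifetal gestations are excluded)
        and p.completed_gct_and_ogtt
        and p.managed_and_delivered_at_center
    )
    excluded = (
        p.known_fetal_anomaly  # known genetic or anatomic fetal abnormality
        or p.pregestational_or_suspected_t2dm
        or p.ga_at_delivery_weeks < 34  # delivered before 34 weeks of gestation
    )
    return included and not excluded
```

A delivery at exactly 34 weeks passes the filter, because the text excludes only deliveries before 34 weeks.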

GDM diagnosis and management
GDM was diagnosed according to the two-step approach [18]. In brief, universal screening was performed with a 50 g, 1-h glucose challenge test (GCT) between 24 and 28 weeks of gestation. Women with a GCT value higher than 140 mg/dL were considered screen positive and underwent a diagnostic 100 g, 3-h oral glucose tolerance test (OGTT) after an overnight fast. GDM was diagnosed if one or more glucose measurements exceeded the established thresholds of the Carpenter and Coustan criteria: fasting > 95 mg/dL; 1-h > 180 mg/dL; 2-h > 155 mg/dL; 3-h > 140 mg/dL [15].
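The two-step rule can be expressed as two small checks. This Python sketch follows the wording above (strict ">" comparisons and "one or more" abnormal values); the function names are hypothetical. Commonly cited versions of the Carpenter-Coustan criteria instead require two or more values that meet or exceed the thresholds, so the number of abnormal values is kept as a parameter.

```python
# Thresholds in mg/dL, as stated in the text.
GCT_CUTOFF = 140
CARPENTER_COUSTAN = {"fasting": 95, "1h": 180, "2h": 155, "3h": 140}


def gct_positive(gct_mg_dl: float) -> bool:
    """Step 1: 50 g, 1-h glucose challenge test screen."""
    return gct_mg_dl > GCT_CUTOFF


def abnormal_ogtt_values(ogtt: dict) -> int:
    """Step 2: count 100 g, 3-h OGTT values above the Carpenter-Coustan thresholds."""
    return sum(ogtt[time] > limit for time, limit in CARPENTER_COUSTAN.items())


def diagnose_gdm(gct_mg_dl: float, ogtt: dict = None, min_abnormal: int = 1) -> bool:
    """Two-step approach; min_abnormal=1 mirrors the text ("one or more")."""
    if not gct_positive(gct_mg_dl):
        return False  # screen negative: no diagnostic OGTT
    if ogtt is None:
        raise ValueError("screen-positive women need a diagnostic OGTT")
    return abnormal_ogtt_values(ogtt) >= min_abnormal
```

For example, a GCT of 150 mg/dL followed by an OGTT with only the 1-h value abnormal (185 mg/dL) counts as GDM under the rule in the text, but not under the two-value variant (`min_abnormal=2`).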
Management of people diagnosed with GDM was provided by a multidisciplinary team led by maternal-fetal medicine specialists. This included lifestyle education on appropriate nutrition and exercise, conducted by a nurse educator and a certified dietician. Antenatal care, including surveillance of glycemic control, maternal weight and fetal growth, was provided at routine follow-up visits every 1-4 weeks. Self-monitoring of blood glucose was performed with glucometers 6 times daily: fasting, pre-prandial, and 1 or 2 h post-prandial. The following targets were used: 90 mg/dL fasting, 95 mg/dL pre-prandial, and 120 mg/dL 2-h post-prandial or 140 mg/dL 1-h post-prandial. The primary recommended treatment was lifestyle modification and diet. If glucose targets were met in more than 80% of measurements, glucose control was considered satisfactory; otherwise, pharmacological treatment was initiated with either glyburide or insulin, at the discretion of the treating physician.
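The 80% rule for glycemic control can be sketched as follows. This is a minimal Python illustration, assuming each self-monitored reading is logged with a hypothetical reading-type key and that a reading at or below its target counts as "met".

```python
# Targets from the text (mg/dL); the reading-type keys are hypothetical labels.
TARGETS = {
    "fasting": 90,
    "preprandial": 95,
    "postprandial_1h": 140,
    "postprandial_2h": 120,
}


def fraction_on_target(readings):
    """readings: list of (reading_type, glucose_mg_dl) tuples."""
    if not readings:
        raise ValueError("no glucose readings")
    on_target = sum(value <= TARGETS[kind] for kind, value in readings)
    return on_target / len(readings)


def glycemic_control(readings, required_fraction=0.80):
    """'good' if targets were met in more than 80% of measurements, otherwise 'poor'."""
    return "good" if fraction_on_target(readings) > required_fraction else "poor"
```

Because the text says "more than 80%", exactly 8 of 10 readings on target is classified as poor control in this sketch.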

Outcome measures
We defined a composite adverse neonatal outcome including any of the following: large for gestational age (LGA), defined as neonatal weight above the 90th percentile for gestational age according to nationally accepted growth curves [19]; small for gestational age (SGA), defined as neonatal weight below the 10th percentile for gestational age according to the same growth curves [19]; shoulder dystocia; umbilical cord pH < 7.2; neonatal intensive care unit admission for any reason; neonatal hypoglycemia (glucose below 45 mg/dL); respiratory distress syndrome; hyperbilirubinemia; or polycythemia.
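The composite outcome is a logical OR over its components. A minimal Python sketch follows; the record fields are hypothetical, and the birthweight percentile is assumed to be precomputed from the national growth curves cited above.

```python
from dataclasses import dataclass


@dataclass
class Neonate:
    # Hypothetical fields; the percentile is assumed to come from national growth curves.
    birthweight_percentile: float
    shoulder_dystocia: bool
    cord_ph: float
    nicu_admission: bool
    min_glucose_mg_dl: float
    rds: bool
    hyperbilirubinemia: bool
    polycythemia: bool


def composite_adverse_outcome(n: Neonate) -> bool:
    """True if any component of the composite adverse neonatal outcome is present."""
    components = {
        "LGA": n.birthweight_percentile > 90,
        "SGA": n.birthweight_percentile < 10,
        "shoulder_dystocia": n.shoulder_dystocia,
        "cord_pH_below_7.2": n.cord_ph < 7.2,
        "NICU_admission": n.nicu_admission,
        "hypoglycemia": n.min_glucose_mg_dl < 45,
        "RDS": n.rds,
        "hyperbilirubinemia": n.hyperbilirubinemia,
        "polycythemia": n.polycythemia,
    }
    return any(components.values())
```

Keeping the components in a named dictionary makes it easy to report which component triggered the composite, which matters when individual events are rare.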

Machine learning algorithm
We applied a machine learning technique to predict the composite adverse neonatal outcome from maternal factors. Machine learning is a subset of artificial intelligence, within the field of computer science, that often uses statistical methods to give computers the ability to learn from data without being explicitly programmed. By learning to predict an expected outcome from given samples, machine learning can find complex, initially hidden connections and correlations that would be nearly impossible to detect otherwise. To learn the correlations between data samples and generalize to unseen data, machine learning typically needs at least hundreds of samples.
Our dataset comprised samples from 452 people with GDM. Each woman's input parameters were simple, basic measurements available in any prenatal follow-up: maternal age, parity, gravidity, pre-pregnancy body mass index (BMI), GCT and OGTT values, maternal weight (pre-pregnancy, at GDM diagnosis, and at delivery), type of GDM treatment (diet, or pharmacological: glyburide or insulin), and degree of glycemic control (categorized as poor or good). These parameters served as the input features for the model to learn from, with the composite neonatal outcome as the output to predict.
Because our dataset was imbalanced (131 samples with the adverse composite neonatal outcome vs. 321 samples without it), we balanced the training dataset by randomly duplicating samples with the adverse composite outcome.
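The random 70/30 split and the duplication of adverse-outcome samples can be sketched as follows. This is a minimal Python illustration, not the study's code; the function names and seeds are hypothetical. Oversampling is applied only to the training set, so duplicated samples never appear in the test set.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def oversample_minority(train, label_of, seed=0):
    """Randomly duplicate minority-class samples until both classes have equal size."""
    rng = random.Random(seed)
    positives = [s for s in train if label_of(s) == 1]
    negatives = [s for s in train if label_of(s) == 0]
    minority, majority = sorted([positives, negatives], key=len)
    if not minority:
        raise ValueError("cannot oversample a class with no samples")
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = list(train) + extra
    rng.shuffle(balanced)
    return balanced
```

With 131 adverse and 321 normal outcomes, `train_test_split` yields 316 training and 136 test samples; after oversampling, the training set contains equal numbers of both classes, while the test set keeps the original class distribution.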
Our model was implemented as a 4-layer fully connected neural network. Input data were batch-normalized, and a dropout layer was applied to prevent the model from memorizing the training samples and overfitting the data. We trained the model for 10,000 epochs with a batch size of 100, using the Adam optimizer [20,21]. The model was coded in Python 3.6 with the Keras framework, based on Google's TensorFlow.
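One reading of the architecture described above, written out as equations, is sketched below. The text specifies four fully connected layers, batch normalization of the input, a dropout layer, the Adam optimizer and a batch size of 100. The ReLU hidden activations, the sigmoid output with binary cross-entropy loss, the placement of dropout before the output layer, and all symbols are assumptions; layer widths are not given in the text and are left unspecified. Here $\gamma_j, \delta_j$ are the learned batch-normalization scale and shift, $\rho$ is the dropout rate, $\alpha$ the learning rate, and $\beta_1, \beta_2$ the Adam decay rates.

```latex
\begin{aligned}
\hat{x}_j &= \gamma_j \,\frac{x_j - \mu_{\mathcal{B},j}}{\sqrt{s^2_{\mathcal{B},j} + \epsilon_{\mathrm{BN}}}} + \delta_j,
  \qquad |\mathcal{B}| = 100 \\
h^{(1)} &= \operatorname{ReLU}\big(W^{(1)}\hat{x} + b^{(1)}\big), \qquad
h^{(\ell)} = \operatorname{ReLU}\big(W^{(\ell)} h^{(\ell-1)} + b^{(\ell)}\big), \quad \ell = 2, 3 \\
\tilde{h}^{(3)} &= \frac{r \odot h^{(3)}}{1 - \rho}, \qquad r_i \sim \operatorname{Bernoulli}(1 - \rho) \quad \text{(training only)} \\
\hat{y} &= \operatorname{sigmoid}\big(w^{(4)\top}\tilde{h}^{(3)} + b^{(4)}\big) \\
\mathcal{L} &= -\frac{1}{|\mathcal{B}|}\sum_{i \in \mathcal{B}} \big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\big] \\
g_t &= \nabla_\theta \mathcal{L}, \qquad
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{m_t / (1 - \beta_1^t)}{\sqrt{v_t / (1 - \beta_2^t)} + \epsilon}
\end{aligned}
```

The last three lines are the standard Adam update applied to every weight and bias $\theta$ once per mini-batch; with 316 training samples before oversampling and a batch size of 100, each epoch consists of only a handful of such updates.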
Two models were evaluated. The neural network achieved the best accuracy and was used for prediction. XGBoost was also used because it gives better visibility into the data, providing an importance score (F score) for each factor.
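In the XGBoost library, the feature importance plotted as "F score" by `xgboost.plot_importance` defaults to the "weight" importance type: the number of times a feature is used to split the data, summed over all trees. Assuming the study used that default, the count can be illustrated on hand-made toy trees. The nested-dictionary tree format below is hypothetical and is not XGBoost's internal representation; on a trained booster, the same count is available from `Booster.get_score(importance_type="weight")`.

```python
from collections import Counter


# Hypothetical toy format: an internal node is a dict holding the split feature
# and its children; a leaf is a float score.
def _count_splits(node, counts: Counter) -> None:
    """Recursively add one count for every split node that uses a feature."""
    if isinstance(node, dict):
        counts[node["feature"]] += 1
        for child in node["children"]:
            _count_splits(child, counts)


def f_scores(trees):
    """'weight'-style importance: split counts per feature, summed over all trees."""
    counts = Counter()
    for tree in trees:
        _count_splits(tree, counts)
    return counts.most_common()


toy_trees = [
    {"feature": "BMI", "children": [{"feature": "OGTT3", "children": [0.1, 0.4]}, -0.2]},
    {"feature": "BMI", "children": [0.3, {"feature": "Age", "children": [0.0, 0.2]}]},
]
```

Calling `f_scores(toy_trees)` ranks BMI first with two splits, followed by OGTT3 and Age with one each. Because the score only counts splits, it is a relative ranking between features rather than a measure of effect size.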

Statistical analysis
Standard statistical analysis was performed using SAS software (SAS Institute, version 9.4). Descriptive statistics are presented as number and percentage for categorical variables and as mean and standard deviation for continuous variables. P values were calculated using the chi-square test or one-way ANOVA, as appropriate, and a p value below 0.05 was considered significant. Accuracy was defined as the proportion of correct predictions among all predictions made by a model, i.e., the number of correct predictions divided by the total number of predictions.
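Accuracy, together with the sensitivity and specificity reported for the regression model, can be computed from the four cells of the confusion matrix. The following is a minimal Python sketch with hypothetical function names, treating 1 as the adverse outcome.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = adverse outcome)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),  # correct predictions / all predictions
        "sensitivity": _ratio(tp, tp + fn),       # true positive rate
        "specificity": _ratio(tn, tn + fp),       # true negative rate
    }
```

Two properties help when reading accuracy figures. First, accuracy equals sensitivity times the proportion of positive cases plus specificity times the proportion of negative cases, so on a given test set it always lies between sensitivity and specificity. Second, with 29% of pregnancies in the adverse group, a trivial model that always predicts a normal outcome would already reach about 71% accuracy (321/452), which serves as a natural baseline.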

Ethics
The study was approved by the local institutional review board at Rabin Medical Center (approval no. 0020-17-RMC). Informed consent was waived because of the retrospective design of the study.

Results
Overall, we retrieved complete data for 452 people with GDM who met our inclusion and exclusion criteria. Baseline characteristics and maternal-neonatal outcomes, stratified by the primary outcome, are presented in Tables 1 and 2, respectively.
We compared 131 (29%) women with an adverse composite neonatal outcome with 321 (71%) women with a normal neonatal outcome. Among the maternal parameters planned a priori as model inputs, only one differed significantly between the subgroups: the mean GCT value was higher in the adverse neonatal outcome group than in the normal outcome group (168.04 ± 32.81 vs. 161.44 ± 25.26 mg/dL, p = 0.008). In addition, the rate of cesarean delivery was higher in the adverse neonatal outcome subgroup (35.6% vs. 21.9%, p = 0.002). Mode of delivery, as a maternal outcome, was not included in the algorithm's pre-defined input data or outcome.
All other parameters did not differ significantly. We applied our neural network algorithm to each patient's simple baseline features, as outlined in the Methods section. In the first run, we used all parameters known before delivery and achieved an accuracy of 91% for identifying women who would deliver a baby with an adverse composite neonatal outcome. Figure 1 illustrates the training process of our model. Figure 2 shows the variable importance in the extreme gradient boosting (XGBoost) model for adverse neonatal outcome among GDM patients, ranked from most to least important.
We also built a logistic regression model. Its sensitivity and specificity were 75% and 76%, respectively (p < 0.01), with an accuracy of 78%.
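The text does not state how the logistic regression model was fitted. Purely as a language-consistent illustration, the following Python sketch fits a logistic model by batch gradient descent on the mean log-loss; the function names, learning rate and epoch count are arbitrary assumptions, and in practice a maximum-likelihood routine from a statistics package would normally be used instead.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; X is a list of feature lists."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss with respect to the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Binary predictions from the fitted probabilities."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]
```

Unlike the neural network, the fitted weights `w` are directly interpretable: each is the change in log-odds of the adverse outcome per unit change in the corresponding (ideally standardized) feature.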
In a second run of the algorithm, we included only parameters known at the time of GDM diagnosis (24-28 weeks of gestation), before GDM follow-up and treatment began. Using these inputs (age, parity, gravidity, BMI, GCT and OGTT values, and maternal weight before pregnancy and at GDM diagnosis), and excluding treatment modality, degree of glycemic control and weight at delivery, we achieved an accuracy of 82% for identifying women who would deliver a baby with an adverse composite neonatal outcome.
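The two runs differ only in which input columns are fed to the model. A minimal Python sketch of the two feature sets follows; the column names are hypothetical, and categorical inputs (treatment modality, glycemic control) are assumed to be numerically encoded already, for example diet = 0, glyburide = 1, insulin = 2, which is itself an illustrative assumption.

```python
# Hypothetical column names for the input features listed in the Methods.
AT_DIAGNOSIS = [
    "age", "parity", "gravidity", "bmi", "gct",
    "ogtt_fasting", "ogtt_1h", "ogtt_2h", "ogtt_3h",
    "weight_pre_pregnancy", "weight_at_diagnosis",
]
# Features that only become available after treatment starts.
POST_DIAGNOSIS = ["treatment_modality", "glycemic_control", "weight_at_delivery"]
# First run: everything known before delivery.
BEFORE_DELIVERY = AT_DIAGNOSIS + POST_DIAGNOSIS


def select_features(record: dict, feature_names: list) -> list:
    """Build the model input vector for one woman from a named-feature record."""
    return [record[name] for name in feature_names]
```

Selecting columns by name, rather than by position, guarantees that the second run cannot accidentally include a post-diagnosis variable.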

Principal findings
We conducted a retrospective analysis of 452 people with GDM, aiming to predict adverse neonatal outcome using a machine learning algorithm. Our main results demonstrate that the algorithm can identify people with GDM who will eventually have an adverse composite neonatal outcome, with an accuracy of 82% at the time of GDM diagnosis and 91% at delivery.
We were able to quantify the relative contribution of each feature to the overall model. Surprisingly, most of the predictive ability rests on factors that precede GDM treatment: the post-diagnosis features (treatment modality, glycemic control and additional weight gain) contributed an additional 9 percentage points of accuracy, from 82% at diagnosis to 91% at delivery.
Several recent studies have likewise identified similar individual features as positive or negative contributors to adverse outcome. In the Atlantic DIP study, there was no difference in hypoglycemia rates between newborns of patients with GDM treated with insulin and those treated with medical nutritional care [22]. Cheng et al. demonstrated that people diagnosed with GDM whose gestational weight gain exceeded the Institute of Medicine (IOM) guidelines had a higher risk of undesirable outcomes, including preterm delivery, macrosomia, and cesarean delivery [17]. Antoniou et al. showed that pre-pregnancy BMI, gestational weight gain, maternal treatment requirement and HbA1c at the end of pregnancy can predict adverse outcomes in people with GDM [23]. Cosson et al. showed that overweight and obesity were associated with LGA infants regardless of GDM status [24].

Results
Overall, the rate of adverse outcomes in our study is similar to that of previous studies [7,21]. Langer et al. [3] reported a similar composite outcome in approximately 37% of people diagnosed with and treated for GDM.
GDM has been thoroughly investigated, and it is well known to cause a variety of maternal, fetal and neonatal complications [3,14,25,26]. Risk factors that increase the chance of an adverse outcome have also been well described. Minji et al. found that pharmacologically treated GDM, increased maternal BMI and increased fetal biometry are all risk factors for adverse neonatal outcome [15]. Feng et al. reported that as the number of hyperglycemic values in the OGTT increased, the risk of LGA and macrosomia increased as well [16]. The Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study demonstrated a continuous relationship between maternal glucose levels, grouped into seven categories for each of the three values of the 75-g OGTT, and rates of cesarean delivery, LGA, clinical neonatal hypoglycemia and fetal hyperinsulinemia [26]. Some of these findings are supported by our data, as we also found that specific factors such as GCT values and poor glycemic control were more prevalent in the adverse outcome population. Until now, however, these risk factors were estimated individually and independently and were not integrated into a single prediction model.
Importantly, most maternal characteristics were similar in our two groups. To the naked eye, this would suggest an inability to identify those at risk. Nevertheless, our prediction model clearly differentiated pregnancies at high versus standard risk of complications.

Clinical implication
GDM prevalence is increasing worldwide [27]. As a result, a greater proportion of pregnancies are identified as high risk and managed with additional education, lifestyle modification, pharmacological therapy and other interventions. However, intervening in a greater proportion of pregnancies has not necessarily led to an overall reduction in pregnancy complications [17]. Furthermore, such interventions increase overall diabetes-related healthcare costs [28] and the psychosocial burden on affected people [29]. Therefore, there is a need for a more advanced, personalized approach to GDM risk and prognosis that considers other relevant clinical factors driving adverse outcomes. Such a prediction model would allow calculation of the odds of a pregnancy complication and support tailor-made management, updated throughout gestation.
Using machine learning and neural networks, we can find correlations and connections between parameters that were previously unknown or unconsidered, to better predict the desired outcome [30]. Unlike conventional methods that estimate a constant weight for each input parameter, a machine learning algorithm iteratively tunes its weights with each training example [12], and the model can be retrained and updated as data from new patients accumulate. In general, more training cases yield better generalization and prediction accuracy; hence, we expect even better results after collecting more patient examples and labels [31]. Today, a physician assesses a patient's condition based on previous knowledge and experience and is usually unable to address all of a patient's characteristics mathematically. A machine learning algorithm trained on hundreds of real examples can give the physician a powerful tool for assessing the patient's condition and prognosis. Weighing continuous parameters, such as OGTT values, together with discrete parameters, such as medication use, is difficult even for an experienced physician, so important factors in decision-making are often neglected. An external algorithm that takes all patient factors into account together and outputs a predicted outcome can be a significant tool in the hands of the physician. With such a tool, redundant interventions may be avoided and adverse neonatal outcomes minimized through appropriate medical intervention, whose impact can be quantified.

Strengths and limitations
Development of models predicting the risk of various obstetric complications has been increasing. However, to the best of our knowledge, no study has yet taken into account the entire set of risk factors to generate a reliable prediction model, either as a calculator or with a more advanced approach such as machine learning. No such algorithm or calculator is published or used routinely in the clinic today. Clinical implementation is lacking, which may be due to limited evaluation of prediction model performance, impact and usefulness in clinical practice [32].
Our study is not free of limitations, the main one being its relatively small sample size. Because generalization and prediction accuracy improve with more training cases, we expect better results after collecting more patient samples and labels. Additionally, this is a single-center retrospective study with limited heterogeneity in the study population, which may differ from the populations of other hospitals. Moreover, because each adverse neonatal outcome occurred only a small number of times, we chose to predict a composite neonatal outcome rather than each event individually. However, the prediction model is general and can be retrained on any new dataset; in this manner, it could potentially be applied to women in other regions, communities and racial groups.
Nevertheless, despite these limitations, we have demonstrated a highly accurate prediction model.

Conclusion
Each woman and each pregnancy is unique and characterized by many parameters that can influence the outcome. Our algorithm showed promising results as a personalized-medicine tool for recognizing pregnant people with diabetes who are at risk of adverse neonatal outcome. The algorithm provides an opportunity for risk assessment, identifying at-risk patients who may benefit from early monitoring and intervention. Our next steps are to validate the algorithm and to perform prospective trials testing whether its use can lower the risk of adverse neonatal outcomes.

Figure 1. Training process of our model. In each epoch (x-axis), the model sees the entire training dataset and tunes its parameters according to the distance between its predictions and the ground-truth outcomes. After each epoch, the model's predictions are evaluated: training loss decreases with epochs, and so does validation loss. The F score presented is a metric specific to the XGBoost method that indicates the relevance of a feature to the prediction; it is used as a relative measure between features.

Figure 2. Variable importance in the extreme gradient boosting (XGBoost) model for adverse neonatal outcome among GDM patients. The top ten variables are ranked from most to least important. BMI: pre-pregnancy body mass index; OGTT3: 3-h oral glucose tolerance test value; OGTT1: 1-h OGTT value; Wg_Diag: maternal weight at GDM diagnosis; OGTT2: 2-h OGTT value; OGTT0: fasting OGTT value; Hg: maternal height; Wg_Pre: pre-pregnancy maternal weight.

Table 1. Maternal demographic and baseline characteristics of the study cohort, stratified according to neonatal outcome. Data are presented as n (%) for categorical variables and mean ± standard deviation for continuous variables. Poor neonatal outcome was defined as a composite neonatal outcome including any of the following: large for gestational age, small for gestational age, shoulder dystocia, umbilical cord pH < 7.2, neonatal intensive care unit admission, respiratory distress syndrome, hyperbilirubinemia or polycythemia.

Table 2. Maternal and neonatal outcomes of the study cohort, stratified according to neonatal outcome. Data are presented as n (%) for categorical variables and as mean ± standard deviation for continuous variables. Poor neonatal outcome was defined as a composite neonatal outcome including any of the following: large for gestational age, small for gestational age, shoulder dystocia, umbilical cord pH < 7.2, neonatal intensive care unit admission, respiratory distress syndrome, hypoglycemia, hyperbilirubinemia or polycythemia.
NICU: neonatal intensive care unit.