Simultaneous segmentation of all four chambers in cardiac ultrasound images

ABSTRACT Echocardiographic analysis of cardiac chamber volumes throughout the cardiac cycle is important for assessing left- and right-heart function. We have developed and validated a method to simultaneously segment all four heart chambers in 3D echocardiography cine loops. We described the heart using several Doo-Sabin models, one for each chamber and one for the pericardium. The model was fitted to 3D echo images using edge detection and a Kalman filter. The average signed error in estimating volume across all chambers was −1.0 ± 12.3 ml, showing that the model slightly underestimates the volumes. The average signed error for the ejection fraction was −3.2 ± 7.4 pts, with the highest error occurring for the right atrium. This all-chamber automatic segmentation method provides high accuracy and may have utility in understanding chamber interactions.


Introduction
Echocardiography is the first-line modality for imaging the heart due to its low cost, real-time visualisation and absence of ionising radiation (Szabo 2013). Segmentation of cardiac chambers in 3D images allows for quantitative functional analysis of volumes and interactions (Frangi et al. 2001). For 3D ultrasound, there exists extensive literature on segmentation of the left ventricle (LV) and right ventricle (RV), for instance, by Orderud and Inge Rabben (2008), Bersvendsen (2016) and Bølviken et al. (2020). These works focused on a single chamber, but all used extended Kalman filters and Doo-Sabin surfaces, as will be done in this paper. Deep learning has also become a prominent way of segmenting the LV, which has been done in 2D by Cui et al. (2022), in multi-view (MV) by Li et al. (2020) and Yang et al. (2020), and in full 3D by Dong et al. (2016). There has also been research on segmentation of the left atrium (LA) in ultrasound images by, for instance, Degel, Navab and Albarquoni (2018), and on the right atrium (RA) by, for instance, Hillier et al. (2010), using convolutional neural networks and topographic cellular contour extraction, respectively.
Complete heart segmentation can provide important information for functional analysis of the heart, as volume measures are often used as an indicator of cardiac function and for discovering abnormalities (Wang et al. 1984; Frangi et al. 2001; Punithakumar et al. 2013). Automatic simultaneous segmentation of all chambers takes less time than evaluating each chamber individually, and the segmentation of one chamber can provide guidance for the segmentation of the others.
Using one model means that intersecting chambers can be prevented, and applying physical constraints like volume preservation becomes possible. In addition, a model segmenting all chambers could help in studying the interactions between the chambers. Several approaches to simultaneous four-chamber segmentation have been made, for instance, by Pace et al. (2015), Zheng et al. (2008), Zhen et al. (2017) and Jafari et al. (2019). These works focus on CT or MR images and did not use Kalman filters or Doo-Sabin models, instead segmenting using tools such as Supervised Descriptor Learning or active learning.
Segmentation of all four chambers of the heart is challenging because of the high anatomical variability between individuals (Pace et al. 2015). For ultrasound images, there are the additional challenges of depth, sector width and probe position (Ostenfeld et al. 2012).
This work uses deformable models to simultaneously segment all four heart chambers in 3D ultrasound images. The algorithm uses a generic model with a strong presupposition regarding the final model's geometry and the model is modified based on an extended Kalman filter (Kalman 1960;Orderud 2010) with edge detection along the surface normals of the model. This model has the potential to work even if some of the four cardiac chambers are not fully shown in the image.
All ultrasound images used for training and validation were collected and anonymised from several institutions under data agreements with GE Vingmed Ultrasound, ensuring compliance with applicable privacy legislation. The training and validation images were kept separate.

The deformable model
A deformable model was constructed to represent the four cardiac chambers and the surrounding myocardium. The Doo-Sabin model, a generalisation of quadratic B-splines originally described by Doo (1978), was used for this purpose. The Doo-Sabin model uses control nodes and a topology between those nodes to generate the surface. The surface can be evaluated locally around each node, splitting the surface into several patches. The Doo-Sabin algorithm gives rounded, organic-looking surfaces, which are well suited for medical shape modelling, and it has been used for that purpose by Orderud and Inge Rabben (2008) as well as Dikici (2013). In addition, the surface vertices of the model are easy to compute, as are the surface normals.
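As an illustration, one Doo-Sabin refinement step for a single face can be sketched as follows. This is a minimal sketch using the standard Doo-Sabin weights, not the implementation used in this work:

```python
import math

def doo_sabin_face_points(face_vertices):
    """Compute the new (refined) points for one face in a Doo-Sabin
    subdivision step.  For an n-sided face, the new point associated
    with vertex i is a weighted average of all face vertices with the
    classic Doo-Sabin weights:
        alpha_0 = (n + 5) / (4n)                 (the vertex itself)
        alpha_j = (3 + 2*cos(2*pi*j/n)) / (4n)   for j = 1..n-1
    """
    n = len(face_vertices)
    weights = [(n + 5) / (4 * n)] + [
        (3 + 2 * math.cos(2 * math.pi * j / n)) / (4 * n) for j in range(1, n)
    ]
    new_points = []
    for i in range(n):
        # weight alpha_j applies to the vertex j steps ahead of vertex i
        point = [0.0, 0.0, 0.0]
        for j, w in enumerate(weights):
            v = face_vertices[(i + j) % n]
            for k in range(3):
                point[k] += w * v[k]
        new_points.append(tuple(point))
    return new_points

# For a unit quad (n = 4, weights 9/16, 3/16, 1/16, 3/16), each new
# point is pulled towards its vertex but stays inside the face.
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
refined = doo_sabin_face_points(quad)
```

Repeated refinement of the coarse control mesh in this way produces the smooth, rounded surfaces the text describes.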
The model consisted of four surface meshes meant to model the endocardium of each of the four heart chambers, and an outer layer representing the epicardium. The latter was used to enforce volume conservation of the cardiac tissue. The myocardium was considered to be nearly incompressible (Tavakoli and Amini 2013), and volume conservation was applied in the algorithm, as detailed in Section 2.3.3. As each of the chambers was modelled as a closed mesh, valves and vessels were not modelled.
In order to create a model where the chambers are modelled together, we devised several sub-models with shared nodes where the chambers meet. The model had 78 nodes, where 34 were associated with the LV, 40 with the RV, 27 with the LA and 20 with the RA. Twenty-one nodes were associated with both the LV and RV, along the septum and the base. Nine nodes were associated with the LV and LA along the LV base and 6 nodes were associated with both the RV and RA along the tricuspid valve. Ten nodes were associated with both LA and RA along the atrial septum and the base. Figure 1 shows the four chambers together, and Figure 2 shows them individually.
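The shared-node construction can be illustrated with a toy example. The node indices and chamber layout below are purely illustrative, not the model's actual 78-node topology:

```python
# Toy illustration (not the paper's actual node layout): each chamber
# mesh stores indices into one global node array, so a node listed in
# more than one chamber is shared automatically.
nodes = {i: [float(i), 0.0, 0.0] for i in range(6)}  # global node positions

chamber_nodes = {
    "LV": {0, 1, 2, 3},
    "RV": {2, 3, 4, 5},  # nodes 2 and 3 sit on the septum, shared with LV
}

def move_node(index, displacement):
    """Displace one global node; every chamber using it sees the change."""
    nodes[index] = [c + d for c, d in zip(nodes[index], displacement)]

shared = chamber_nodes["LV"] & chamber_nodes["RV"]
move_node(2, (0.0, 0.5, 0.0))
# Both the LV and the RV surface now use the updated position of node 2,
# so the chambers deform together along the septum.
```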

Model fitting
In order to fit the model to the image, an extended Kalman algorithm was used (Kalman 1960; Orderud 2010; Bersvendsen 2016). The Kalman algorithm is a method for estimating values based on both theoretical models of how a system functions and measurements of the system. It delivers continuous updates of the state vector as the system it is modelling changes. The extended version has advantages where the system is non-linear, using the Jacobian matrices of the system as input. The extended Kalman filter used here will be referred to simply as the Kalman filter in the rest of this article.
The implementation of the Kalman filter is mainly the same as that used by Orderud (2010), with some changes detailed below. The state vector x output by the Kalman filter consists of three types of values. First, there are values for individual node displacement, changing the shape of the model to fit the image; this change is applied to the model first. Next, there are variables for the thickness of the walls between the models, which are applied second. Finally, the global variables handle the position, scale and rotation of the model. The Kalman filter has two stages for each time-step in the algorithm: prediction and updating. In our case, the first stage consisted of predicting the placement of the model in the current frame based on the value of the previous frame. This was done using a linear function slightly regressing the state vector towards the initial values. The updating stage used edge detection as a measurement to create an updated estimate of the state vector.
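The predict/update cycle described above can be sketched as follows. This is a minimal sketch: the regression coefficient `A`, the noise scales and the edge-measurement callback are illustrative assumptions, not the tuned values of this work:

```python
import numpy as np

def ekf_frame(x_prev, P_prev, x_init, measure, A=0.9, Q_scale=1e-2, R_scale=1e-1):
    """One predict/update cycle of an extended Kalman filter, sketched
    after the description in the text (all tuning values illustrative).

    Prediction regresses the state slightly towards its initial value:
        x_pred = x_init + A * (x_prev - x_init)
    The update uses edge-detection measurements: measure(x) must return
    (residuals v, Jacobian H) for the current state estimate.
    """
    n = len(x_prev)
    Q = Q_scale * np.eye(n)               # process noise
    # --- prediction step ---
    x_pred = x_init + A * (x_prev - x_init)
    P_pred = (A ** 2) * P_prev + Q
    # --- update step (edge measurements) ---
    v, H = measure(x_pred)                # residuals and measurement Jacobian
    R = R_scale * np.eye(len(v))          # measurement noise
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ v
    P_new = (np.eye(n) - K @ H) @ P_pred
    return x_new, P_new
```

Iterating this over frames pulls the state towards the edge measurements while the regression term keeps it anchored near the initial shape.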
The edge detection used as input for the update stage was mainly step edges, but a strongest-gradient edge detection was used in the apical region. Edge detection was done in the normal direction of the surface at each evaluation point with a 2 cm capture range in each direction.
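Both edge criteria can be sketched for a 1D intensity profile sampled along a surface normal. This is an illustrative sketch; the actual detector and its parameters may differ:

```python
import numpy as np

def step_edge_offset(profile, spacing_mm):
    """Locate a step edge in an intensity profile sampled along a
    surface normal (illustrative sketch).  Each candidate split models
    the profile as two constant levels; the split minimising the summed
    squared deviation is the step-edge position.  Returns the offset in
    mm from the profile centre."""
    profile = np.asarray(profile, dtype=float)
    n = len(profile)
    best_i, best_cost = None, np.inf
    for i in range(1, n):
        left, right = profile[:i], profile[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_i, best_cost = i, cost
    return (best_i - n / 2) * spacing_mm

def strongest_gradient_offset(profile, spacing_mm):
    """Alternative criterion, as used apically in the text: position of
    the largest absolute intensity gradient along the profile."""
    grad = np.abs(np.diff(np.asarray(profile, dtype=float)))
    return (int(np.argmax(grad)) + 0.5 - len(profile) / 2) * spacing_mm
```

With the 2 cm capture range from the text, a 20-sample profile at 2 mm spacing would cover the full 4 cm search window.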
Some changes were made to Orderud's image segmentation framework. In order to improve accuracy, restrictions were placed on the Kalman filter output. The apex of the model was constrained to never move above the top of the image nor below the next layer of nodes, and the individual nodes were constrained not to deviate from their initial values beyond a threshold. The threshold was set so that the shape of the model would not degenerate or self-intersect. These restrictions were determined based on training data.
These modifications were added to the filter itself, allowing the next frame's prediction step to use the modified values. This means that unlike the regular Kalman filter, which normally has a step of prediction based on theoretical models of behaviour and then an update based on measurements, this algorithm adds a final step of theoretical adjustments, based on the heart's physical properties. Figure 3 shows two examples of the model being fitted to images.
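The extra constraint stage can be sketched as a clamping step applied after the Kalman update. Thresholds and indices here are illustrative, not the values determined from the training data:

```python
import numpy as np

def constrain_state(x, x_init, max_node_dev, apex_idx, apex_min, apex_max):
    """Post-update constraint stage (sketch; thresholds illustrative).
    Node displacements are clamped to a band around their initial
    values, and the apex coordinate is kept inside [apex_min, apex_max]
    (e.g. below the image top and above the next node layer).  The
    clamped state feeds the next frame's prediction step."""
    x = np.clip(x, x_init - max_node_dev, x_init + max_node_dev)
    x[apex_idx] = np.clip(x[apex_idx], apex_min, apex_max)
    return x
```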

Modelling thickness
We used a thickness parameter to determine the thickness of the walls between the chambers. Each node in the model was associated with a thickness value. After the preliminary node displacement based on node movement was done, the thickness value was used to adjust node placements. This displacement was done in the normal direction of the surface mesh.
The thickness was determined by the Kalman filter, and the divergence theorem was used to estimate the volume of the region around each node. The volume was then divided by the local patch area around the node. This gave the local node thickness. The mathematical foundation of the node thickness and how it interacts with the Kalman filter was covered in detail by Bersvendsen et al. (2017).
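The volume and area computations can be sketched for a triangulated surface. This is a minimal sketch: the paper evaluates the Doo-Sabin surface itself, whereas this example uses a plain triangle mesh:

```python
import numpy as np

def mesh_volume(vertices, triangles):
    """Enclosed volume of a closed triangulated surface via the
    divergence theorem: V = (1/6) * sum over outward-oriented triangles
    of dot(v0, cross(v1, v2))."""
    v = np.asarray(vertices, dtype=float)
    total = 0.0
    for a, b, c in triangles:
        total += np.dot(v[a], np.cross(v[b], v[c]))
    return total / 6.0

def patch_area(vertices, triangles):
    """Summed area of the triangles forming a surface patch."""
    v = np.asarray(vertices, dtype=float)
    area = 0.0
    for a, b, c in triangles:
        area += 0.5 * np.linalg.norm(np.cross(v[b] - v[a], v[c] - v[a]))
    return area

# Local thickness, as described in the text, would then be the regional
# volume divided by the local patch area around the node.
```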
The myocardium was considered to be nearly incompressible due to its high water content (Tavakoli and Amini 2013), meaning the volume should be almost constant. This was accomplished by setting the noise variable of the volume low in the Kalman filter. Change in thickness should mainly be caused by change in patch area due to contraction and expansion of the chambers. Some variation of volume was allowed to capture differences between images.

Validation of the algorithm
The algorithm was validated by comparison with standard software for estimating cardiac function. The evaluation was done using volume metrics, and GE's 4D AutoLVQ, AutoRVQ and AutoLAQ features (EchoPAC SoftwareOnly v204, GE Vingmed Ultrasound, Horten, Norway) with manual editing were used to establish a ground truth for the LV, RV and LA, respectively. The biplane Simpson method was used to estimate the RA volume, as has been done previously by, for instance, Wang et al. (1984) and Aune et al. (2009). The algorithm was evaluated on 42 images. Across those 42 images, the LV was evaluated in 31 images, the RV in 16 images, the LA in 11 images and the RA in 14 images, for a total of 72 chambers. Which chambers were evaluated was determined by manually checking which chambers were fully visible in the image, and this check was done before the algorithm was applied to the images. Validation of the algorithm consisted of evaluating end-systolic and end-diastolic volumes, stroke volumes and ejection fractions.
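The biplane Simpson method of discs used for the RA ground truth can be sketched as follows. The disc counts and diameters in the sanity check are hypothetical:

```python
import math

def biplane_simpson_volume(diam_4ch_mm, diam_2ch_mm, length_mm):
    """Biplane Simpson's method of discs (illustrative sketch).
    The chamber is sliced into n elliptical discs whose orthogonal
    diameters come from two apical views; the volume is the sum of the
    disc volumes:
        V = (pi/4) * sum_i a_i * b_i * (L / n)
    Diameters in mm give a volume in mm^3 (divide by 1000 for ml)."""
    if len(diam_4ch_mm) != len(diam_2ch_mm):
        raise ValueError("need the same number of discs in both views")
    n = len(diam_4ch_mm)
    disc_height = length_mm / n
    volume = 0.0
    for a, b in zip(diam_4ch_mm, diam_2ch_mm):
        volume += math.pi / 4.0 * a * b * disc_height
    return volume

# Sanity check: equal diameters in every disc and both views reproduce
# a cylinder, V = pi * r^2 * L.
```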
To prepare for validation, the model was manually tuned on 11 3D ultrasound images showing a total of 23 chambers. This tuning determined node configuration and layout. The training images were kept separate from the validation images.
For each image, a manual translation of the model was done. The model was translated to align the base of the septum wall in the LV with the image. Aside from that, the algorithm was fully automatic.

Results
Validation of the algorithm was done by comparing it to a ground truth as described in Section 2.4.4. All measures used for validation were based on end-diastolic volumes (EDVs) and end-systolic volumes (ESVs) over a single cardiac cycle, giving 144 different measurements across all 42 images. For all signed results in this paper, a positive value means that the algorithm overestimated the value, while a negative value means that it underestimated it. Average volume values for the ground truth at end-diastole are shown in Table 1.
The average signed error and standard deviation across the 144 measurements were −1.0 ± 12.3 ml. The signed average error and standard deviation were −3.2 ± 13.3 ml for end-diastolic volume estimates and 1.2 ± 10.7 ml for end-systolic volume estimates. The median errors were −3.2 ml and 0.85 ml, respectively. For comparison, the mean ground truth for each chamber was 144 ml for the LV, 81 ml for the RV, 43 ml for the LA and 37 ml for the RA. The signed errors of end-diastolic and end-systolic volume estimates sorted by chamber type are shown in Table 2, and the mean absolute errors (MAE) are shown in Table 3. Table 4 shows the Pearson correlation coefficient for EDV and ESV for each chamber. Calculating the stroke volume difference between model and ground truth, the average signed error and standard deviation across the 72 chambers was −4.4 ± 11.1 ml; the median error was −3.3 ml. Table 5 shows the average error of each chamber type.
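The signed-error statistics and Pearson correlations reported here can be computed as in the following sketch (an illustrative helper with made-up numbers, not the study data):

```python
import numpy as np

def error_stats(estimated_ml, ground_truth_ml):
    """Signed-error statistics as used in the text: positive values
    mean the algorithm overestimates.  Returns the mean, SD and median
    of the signed error, the mean absolute error, and the Pearson
    correlation coefficient (illustrative helper, not the study code)."""
    est = np.asarray(estimated_ml, dtype=float)
    gt = np.asarray(ground_truth_ml, dtype=float)
    err = est - gt
    return {
        "mean_signed": err.mean(),
        "sd_signed": err.std(ddof=1),       # sample standard deviation
        "median_signed": float(np.median(err)),
        "mae": np.abs(err).mean(),
        "pearson_r": np.corrcoef(est, gt)[0, 1],
    }
```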
A measure of the quality of the model's evaluation is a comparison of the stroke volumes of the LV and RV. In theory, the stroke volumes of the two chambers should be equal (Franklin et al. 1962). Five of the images had both the LV and RV fully included in the field of view, and in these images the signed average of the difference between LV and RV stroke volume was 11.6 ± 13.8 ml.
The ejection fraction of each chamber was calculated and compared to the ejection fraction of the ground truth. The average signed ejection fraction error per chamber was −3.2 ± 7.4 pts and the median was −3.1 pts. The highest error was found for the RA. Table 6 shows the signed average ejection fraction error and Pearson correlation coefficient for each chamber type. Figure 4 shows a Bland-Altman plot of the ejection fraction estimates.

Discussion
The purpose of this research was to create a four-chamber model of the heart based on a 3D ultrasound image. Simultaneous segmentation of the chambers has advantages over individual segmentation in that good visibility and segmentation of one chamber can be used to improve the placement of the other chambers, and fewer user-placed landmarks are needed in order to perform segmentation. GE's AutoRVQ requires six points to be manually placed, but by segmenting together with the LV, the LV's placement can be used as a guide. This is true even if the entirety of the LV is not visible, as long as the septal area is well visualised. In the same way, the LA and RA can more easily be placed in the image if the placement of other features can be used. This algorithm required only one input, which is needed due to the large variation in how the heart is placed in the image.
Some restrictions on the model were introduced in Section 2.2. These included restrictions to make sure that the model did not self-intersect and to stop the apex from going above the top of the ultrasound image. The reason for these restrictions was to avoid the outer RV wall being attracted towards the septum, which can happen if the outer wall is not properly visualised. The same is the case for not allowing the LA and RA apices to move too far up towards the base.
The presuppositions about the cardiac geometry are useful for robustness against a reduced view or image quality. In general, adding restrictions to the Kalman filter output based on what output is physically possible can be a good way of improving accuracy. This is especially true if the restrictions are based on different information than what is used in the prediction step, as it means that more of the system's characteristics can be put into the algorithm. Such a stage could have advantages in other applications of the Kalman filter.

It is possible that this four-chamber model could be helpful in cases where the field of view is partially limited for a chamber. The strong presuppositions on the final shape inherent to a deformable model mean that it could be better at inferring the partially seen shape. An example of this is seen in Figure 5. Validating and exploring this approach is an avenue for further research.
The average EF error was slightly negative for all chambers, meaning the four-chamber model underestimated the EF. In addition, the average end-diastolic volume was underestimated for all chambers, while the end-systolic volume was overestimated, pointing to the model being too rigid and not changing enough between end-systole and end-diastole. In terms of median values, however, the signed LV and RV EF errors were positive, pointing to a slight overestimation of EF instead. This seems to be due to a few images with a large underestimation, likely cases with poor image quality and high uncertainty about proper surface placement.
Segmentation of the RA has been less studied than that of the other chambers, and in this work a 2D biplane Simpson method was used for determining volume instead of a true 3D segmentation method. Use of the biplane Simpson method for atrial volumes is considered standard and has been used by, for instance, Wang et al. (1984) and Aune et al. (2009). The latter researched 3D methods of atrial volume quantification, but as those are not considered standard, they were not used as a ground truth in this article. Aune et al. do conclude that a 3D method has higher reproducibility, and the high error of the RA in this article could partially be explained by the ground truth being a 2D method, unlike the 3D methods used for the other chambers. In addition, the Simpson method is not designed for RA volume quantification in particular, and there is no commercial software specifically designed for RA assessment. Using a more generic method rather than a specially designed one as ground truth could further explain the high error.

This work could be considered an extension of the biventricular method by Bersvendsen et al. (2017), also made for segmentation of ultrasound images. The implementation of the multichamber algorithm and the models used are different, even though the mathematics of thickness adjustment is the same. Bersvendsen reported errors of −0.7 ± 5.2 pts for the LV ejection fraction and 2.4 ± 7.2 pts for the RV. This is close to the values found in this paper.
A comparison can also be made to work on one-chamber cardiac segmentation of ultrasound images. Orderud and Inge Rabben (2008) used Kalman filters to segment the LV and had a 3.6 ± 21.4 ml error for ED and a 9 ± 17.4 ml error for ES. The algorithm presented in that paper used fewer control nodes, and that, combined with the increased focus on an accurate apex in this paper, might explain the difference. In general, the four-chamber model is as accurate as other segmentation algorithms using similar techniques.
There are several previous works dealing with four-chamber models, for instance, by Pace et al. (2015), Zheng et al. (2008), Zhen et al. (2017) and Jafari et al. (2019). These works focus on CT or MR images and use different methods than those found in this work. In terms of ultrasound, Medvedofsky et al. (2018) did a four-chamber segmentation of 3D ultrasound images using an automated adaptive analytics algorithm. While all four chambers were segmented, the only metrics were based on LV and LA volumes.
Comparing Pearson correlations between this work and Medvedofsky et al. (2018), the values are similar for LV EDV, LV ESV and LAV, but our work has a lower EF correlation. The LA EF correlation is not listed in the other paper, and it only has a single value for LA volume instead of EDV and ESV. In terms of signed average error for LV EDV, LV ESV and LV EF, this work got better results. Comparison of LA values is slightly more difficult, as they did not provide LA ESV or LA EDV but instead LA volume at LV ES; however, they list a higher signed average error than either of the LA volume errors in this paper.
Deep learning methods have risen in popularity in recent years and have proven a viable tool in many fields, including image analysis (Sermesant et al. 2021). Li et al. (2020) used deep learning on multi-view images of the LV and compared it with several other algorithms on volumetric clinical indices. The signed average errors of their MV-RAN model for LV EDV and ESV were −7.5 ± 11.0 ml and −3.8 ± 9.2 ml, respectively. The signed average error on EF was −0.9 ± 6.8 pts. This means that their error was smaller in terms of both LV EDV and ESV, but EF was essentially the same, with the algorithm in this paper being a small improvement.
Li et al. also calculated the signed errors for clinical indices in the U-Net, U-Net++ and ACNN models. Much like with MV-RAN, their signed error was better than our model for LV EDV and ESV, but comparable or slightly worse in terms of EF.

Figure 5. A comparison of a single-chamber and a four-chamber approach in a case with a partially obscured LV. The four-chamber approach is shown on the left, the single-chamber on the right. The four-chamber version of the LV is better at determining the apex and the base, likely helped by the RV and LA placement.
A possible reason for the difference is their use of multi-view (MV) images instead of the full 3D images used in this work. As both ground truth and model relied on only three 2D images, good accuracy in only those directions would lead to good results, so it might be slightly easier to achieve low errors with multi-view than with 3D. The advantage of our model in terms of EF might be that if the model overestimates the chamber at EDV, it might make a similar overestimation at ESV, cancelling the error out in the EF calculation. This is because the segmentation of each frame depends on the previous one, so if an area has low image quality, an error is likely to persist and be segmented the same way across frames.
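The cancellation argument can be checked with a small numerical example (the volumes below are hypothetical):

```python
def ejection_fraction(edv_ml, esv_ml):
    """EF = (EDV - ESV) / EDV, expressed in percent."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# A consistent relative overestimation cancels in the EF:
true_ef = ejection_fraction(100.0, 40.0)    # true volumes
biased_ef = ejection_fraction(120.0, 48.0)  # both volumes overestimated by 20 %
# biased_ef equals true_ef: the EDV and ESV errors cancel in the ratio.
```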
For deep learning used to segment a fully 3D image, Dong et al. (2016) used a random forest to determine LV volumes and used LV EDV, LV ESV and LV EF as metrics. Their results seem to have wider confidence intervals, with LV EF having a mean error of 4 pts and a confidence interval of (−19 pts, 12 pts), compared to our mean error of −1.2 pts with a confidence interval of (−12.7 pts, 10.5 pts). Nillesen et al. (2016) used a fully automatic algorithm based on deformable models to segment the RV in 3D TEE images. The signed errors found there are slightly larger, which could in part be due to the difference between a fully automatic model and the model in this work, which requires a manual translation. This difference is important to consider in all the above comparisons.
The deep learning articles referred to above have some commonalities in comparison to this article. They do not require user input, as they evaluate the entire image, but they appear to have weaker consistency across time. They all focus on one chamber, whereas the main novelty of this paper is the evaluation of all chambers. This also allows for the evaluation of the less-studied LA and RA. The promising results in this paper suggest that multi-chamber evaluation is a good method. While the EDV and ESV results are slightly worse in this work than in the deep learning ones, the EF values are mostly better.
Pearson correlation can be used to compare the output of the algorithm with the ground truth. The results in this paper can be compared with Moradi et al. (2019) for the LV and Nillesen et al. (2016) for the RV. Our algorithm achieved similar correlation coefficients, with the exception of slightly better values for RV quantification and worse results for the LV ejection fraction correlation, where Nillesen et al. achieved a value of 0.85.
The model could easily be expanded to determine important landmarks in the model or to include more features, like the aortic outflow tract. This would lead to an even more extensive evaluation and could be used to evaluate, for instance, the aortic valve size. This could be a direction of further research.
There are some limitations to the study. Ultrasound images often have a limited field of view, and especially the RV free wall can be difficult to determine in the anterior wall (Ostenfeld et al. 2012). This could indicate that there is some uncertainty to the ground truth values, and this is especially true for the right chambers.
Some chambers had high error either in terms of enddiastolic volume or end-systolic volume estimation. In particular, determining the LV apex and the anterior RV wall proved difficult, as can be expected from the sometimes poor image quality in that area (Ostenfeld et al. 2012).
This model was not made to run in real time. Segmenting a single frame takes an average time of 0.1 s. By removing nodes in areas considered less important, real-time running could be achieved, but for this work accuracy was considered more important.

Conclusions
In this paper, we have developed a method for simultaneous segmentation of the endocardial borders of a four-chamber model of the heart. The algorithm tracks the left and right ventricles and the left and right atria in an ultrasound image using a Kalman filter with several implicit assumptions on the output.
The model achieved good agreement with manual measurements of the chamber volumes and may be useful for volume estimation and landmark placement.