Encoder-decoder with dense dilated spatial pyramid pooling for prostate MR images segmentation

Abstract Automatic segmentation of prostate magnetic resonance (MR) images is of great significance for the diagnosis and clinical management of prostate diseases. It is challenging because of the low contrast at tissue boundaries and the small effective area of the prostate in MR images. To address these problems, we propose a novel end-to-end deep learning network for prostate segmentation, which consists of an Encoder-Decoder structure with dense dilated spatial pyramid pooling (DDSPP). First, the DDSPP module extracts multi-scale convolutional features from the prostate MR images; the decoder then recovers a clear prostate boundary. On 130 MR images, the method produces results competitive with the state of the art, with a Dice similarity coefficient (DSC) of 0.954 and a Hausdorff distance (HD) of 1.752 mm. Experimental results show that our method has high accuracy and robustness.


Introduction
Prostate cancer is one of the most common cancers in men. Statistics from the National Cancer Institute show that 164,690 new cases of prostate cancer were expected in 2018, the highest proportion among all male cancers [1]. The diagnosis of prostate disease has always been a focus of imaging research. Common imaging techniques for the prostate currently include transrectal ultrasound (TRUS), computed tomography (CT), and magnetic resonance imaging (MRI). Compared with the other imaging methods, MRI distinguishes the anatomical regions of the prostate more clearly and is more sensitive to diseased tissue. Therefore, MRI is recognized as the most effective method for diagnosing prostate cancer, and plays an important role in assessing the nature of prostate lesions [2].
Manual segmentation of the prostate in the clinic requires expert interaction, which is time-consuming, poorly reproducible, and dependent on specialist experience. Automatic segmentation can improve both the repeatability of the results and clinical efficiency, and is therefore of considerable clinical significance. Many semi-automatic or fully automated methods [3][4][5][6] have been proposed for the segmentation of various organs and tissues in medical images, but automatic prostate segmentation remains difficult. The main difficulties in segmenting prostate MR images are as follows: (1) Prostate tissue has low contrast with the surrounding tissues, so its boundaries are difficult to distinguish. (2) Because the prostate is small, an MR image contains little effective information. (3) The shape of the prostate varies, which makes segmentation harder. In addition, a complicated algorithm with a long running time may delay clinical diagnosis.
Litjens et al. [7] used anatomy, gray value and texture features to classify the voxels inside the prostate, thereby segmenting its internal and external contours. The outer contour is segmented manually and used to initialize the inner contour, which makes whole-prostate segmentation time-consuming and laborious. Zhang et al. [8] proposed a two-step prostate MRI segmentation method based on edge-distance-adjusted level set evolution to segment the full prostate contour. The segmentation quality of this method depends heavily on the quality of the image being segmented, and training a large number of atlases to segment a single image takes a long time. Jia et al. [9] proposed a coarse-to-fine prostate segmentation approach based on a probabilistic atlas-based coarse segmentation. Mahapatra et al. [10] proposed a fully automated method for prostate segmentation using random forests and graph cuts.
In recent years, deep learning has outperformed previous state-of-the-art methods in many fields, such as computer vision and medical image processing. The availability of large annotated medical imaging datasets now makes it feasible to use deep convolutional neural networks (DCNNs) for medical image segmentation and classification [11]. Zhu et al. [12] proposed a model named Deeply-Supervised CNN to segment prostate MR images. Karimi et al. [13] put forward a prostate MR image segmentation strategy based on a CNN and statistics. Zhan et al. [14] used a deconvolution neural network to segment prostate MR images. These methods solve the prostate MR image segmentation problem to some degree. However, automatic segmentation of prostate MR images is still a major challenge, owing to the large variability of the prostate contour, interference from the surrounding tissue and imaging artifacts.
In this paper, a deep neural network is proposed to solve the above problems. It contains an Encoder-Decoder structure with dense dilated spatial pyramid pooling (DDSPP). First, DDSPP is used to extract multi-scale features from the MR images; the decoder then up-samples these features to obtain the prediction.

DDSPP
Dilated convolution [15], referred to as atrous convolution in [16], can increase the receptive field exponentially without reducing the spatial dimension (Figure 1).
Dilated convolution enlarges the receptive field of the convolution kernel. When the convolution kernel size is $K \times K$ and the rate is $R$, the receptive field of the convolution kernel is
$$F = (R - 1)(K - 1) + K.$$
When rate $= 6$ and the convolution kernel size is $3 \times 3$, the receptive field is $13 \times 13$.
In DeepLabv3 [17], ASPP (Atrous Spatial Pyramid Pooling) is used to obtain multi-scale context information, and the prediction results are obtained directly by up-sampling. The ASPP and cascaded modules with dilated convolution are shown in Figure 2.
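The receptive-field arithmetic for a single dilated kernel can be checked with a few lines of Python. This is a minimal sketch that assumes the standard relation $F = (R - 1)(K - 1) + K$ for a $K \times K$ kernel with rate $R$; the function names are illustrative and do not come from the paper.

```python
def dilated_receptive_field(kernel_size, rate):
    """Side length of the receptive field of one dilated convolution.

    Assumes the standard relation F = (R - 1)(K - 1) + K for a
    K x K kernel with dilation rate R.
    """
    return (rate - 1) * (kernel_size - 1) + kernel_size


def tap_offsets(kernel_size, rate):
    """Input offsets (along one axis) that a dilated kernel samples."""
    return [i * rate for i in range(kernel_size)]


if __name__ == "__main__":
    for rate in (1, 6, 12, 18):
        offsets = tap_offsets(3, rate)
        # The span covered by the taps equals the receptive field.
        span = offsets[-1] - offsets[0] + 1
        assert span == dilated_receptive_field(3, rate)
        print("rate", rate, "taps", offsets, "receptive field", span)
```

With a $3 \times 3$ kernel, the rate of 6 used in the text gives a side length of 13, and a rate of 1 reduces to the ordinary convolution with a side length of 3.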
Combining the advantages of the ASPP and the cascaded modules with dilated convolution, we design the DDSPP, which generates features at more scales over a wider range (Figure 3).
Stacking two convolutional layers gives a larger receptive field. Suppose two convolution layers have filter sizes $K_1$ and $K_2$, respectively; the receptive field of the stack is
$$F = K_1 + K_2 - 1.$$
Stacking dilated convolutions with rates 6 and 12 (effective sizes $K_1 = 13$ and $K_2 = 25$) gives a new receptive field of size $F = 37 \times 37$. Table 1 compares the receptive fields of the DDSPP module and the ASPP module, where $F_1$, $F_2$, $F_3$ are the receptive fields of the dilated convolutions with rate $= 6, 12, 18$ in the ASPP module, and the stacked receptive fields are those shown in Figure 3.
Dense connections between stacked dilated layers therefore compose a feature pyramid with much denser scale diversity, and the receptive fields of DDSPP are larger than those of ASPP.
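The scale diversity argument can be illustrated by enumerating the receptive fields reachable through a densely connected stack. The sketch below rests on several assumptions that are not confirmed by the text: $3 \times 3$ kernels, stride 1, rates 6, 12 and 18 (the ASPP rates mentioned above), and a wiring in which every layer receives the input and the outputs of all earlier layers. The actual DDSPP wiring is the one drawn in Figure 3, and the numbers printed here are not a reproduction of Table 1.

```python
from itertools import combinations


def dilated_receptive_field(kernel_size, rate):
    """Receptive field of one dilated layer: F = (R - 1)(K - 1) + K."""
    return (rate - 1) * (kernel_size - 1) + kernel_size


def stacked_receptive_field(fields):
    """Receptive field of stride-1 layers applied in sequence.

    Generalises F = K1 + K2 - 1 to n layers: sum(F_i) - (n - 1).
    """
    return sum(fields) - (len(fields) - 1)


def dense_path_fields(rates, kernel_size=3):
    """Receptive fields reachable in a densely connected stack.

    Assumption: each layer sees the input and all earlier outputs, so
    every non-empty, order-preserving subset of layers forms a path.
    """
    single = [dilated_receptive_field(kernel_size, r) for r in rates]
    fields = set()
    for n in range(1, len(single) + 1):
        for path in combinations(single, n):
            fields.add(stacked_receptive_field(path))
    return sorted(fields)


if __name__ == "__main__":
    rates = (6, 12, 18)
    parallel = [dilated_receptive_field(3, r) for r in rates]
    print("parallel branches only:", parallel)
    print("dense connections:     ", dense_path_fields(rates))
```

Under these assumptions the parallel branches alone cover three scales, while the dense paths add larger ones, including the 37 obtained in the text by stacking rates 6 and 12.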

Encoder-decoder
The Encoder-Decoder architecture has been successful in many computer vision tasks, such as human pose estimation [18], object detection [19], and semantic segmentation [20][21][22][23]. An Encoder-Decoder network consists of an encoder module and a decoder module. The encoder module gradually reduces the size of the feature maps and captures higher-level semantic information, and the decoder module gradually recovers the spatial information.

Network architecture
In this paper, an encoder with DDSPP is used to extract information from prostate MR images, which captures the edge information of the prostate more clearly. The details of the prostate are then gradually recovered through up-sampling. Because convolution and pooling reduce the image resolution, directly deconvolving the feature map leads to coarse output and the loss of many details. Therefore, we connect the low-level features and high-level features to produce more accurate results. The network architecture is shown in Figure 4.
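The reason for connecting low-level and high-level features can be seen on a toy feature map. The sketch below is not the network of Figure 4: it uses plain Python lists, 2 x 2 average pooling as a stand-in for the encoder, and nearest-neighbour up-sampling as a stand-in for deconvolution. All function names are illustrative.

```python
def avg_pool_2x(fmap):
    """Halve height and width by averaging each 2 x 2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[y][x] + fmap[y][x + 1]
             + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def upsample_2x(fmap):
    """Double height and width by nearest-neighbour repetition."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def concat_channels(*fmaps):
    """Stack same-sized maps along a channel axis: result[c][y][x]."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    assert all(len(f) == h and len(f[0]) == w for f in fmaps)
    return list(fmaps)


if __name__ == "__main__":
    # Low-level map with a sharp vertical edge between columns 0 and 1.
    low_level = [[0.0, 1.0, 1.0, 1.0] for _ in range(4)]
    high_level = avg_pool_2x(low_level)   # encoder: resolution halves
    coarse = upsample_2x(high_level)      # decoder: edge is now blurred
    fused = concat_channels(low_level, coarse)
    print("up-sampled only:", coarse[0])
    print("fused channels: ", [channel[0] for channel in fused])
```

The up-sampled row no longer contains the original edge position, whereas the fused stack still carries it in the low-level channel, which is the information a later convolution can use to sharpen the boundary.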

Experimental results
The experimental platform for this paper is TensorFlow 1.6. Segmentation results are shown in Figure 6. From Figure 6 (a-1 to a-3), (c-1 to c-3) and (d-1 to d-3), we can see that our method accurately segments the prostate MR images, overcomes the influence of the surrounding tissues and identifies the prostate tissue as a whole region. Figure 6 (d-1 to d-3) shows a small difference between our results (green) and the ground truth (red), because the prostate tissue has very low contrast with the surrounding tissues and appears very small in these images. Overall, our method performs well on prostate MR image segmentation.

Performance evaluation
The evaluation aims to measure the performance of the proposed scheme. In this paper, the segmentation precision on prostate MR images is evaluated in terms of shape distance and area overlap. Four metrics are used to evaluate the segmentation algorithm quantitatively: Dice similarity coefficient (DSC), accuracy, Intersection over Union (IoU) and Hausdorff distance (HD).
DSC calculates the degree of similarity between the two contour regions. DSC is computed as
$$\mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN}$$
Figure 6. Segmentation results of our network on representative MR images from different patients.
Accuracy and IoU are defined as
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
where true positive (TP) is the area common to the manual segmentation and the algorithm segmentation, true negative (TN) is the area outside both the manual segmentation and the algorithm segmentation, false positive (FP) is the area inside the algorithm segmentation but outside the manual segmentation, and false negative (FN) is the area contained within the manual outline but missed by the algorithm.
HD reflects the largest difference between two contour point sets. Suppose there are two sets $A = \{a_1, a_2, \ldots, a_p\}$ and $B = \{b_1, b_2, \ldots, b_q\}$; then the HD between the two point sets is defined as
$$HD(A, B) = \max(h(A, B), h(B, A))$$
where $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$ is the directed distance from $A$ to $B$, $A$ is the set of manually segmented contour point coordinates, and $B$ is the set of algorithm-segmented contour point coordinates.
First, we compare the original network (without DDSPP and without connecting the low-level and high-level features) with the networks that use DDSPP, with and without connecting the low-level and high-level features. The results are shown in Figure 7.
Figure 7 shows that the best results are obtained by using the DDSPP architecture and connecting the low-level features and the high-level features.
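The four evaluation metrics used in this section can be sketched in plain Python. This is an illustrative implementation, not the evaluation code used for the reported results: masks are lists of rows of 0/1 values, contours are lists of (x, y) points, the DSC, accuracy and IoU formulas are the standard confusion-count forms, and the Hausdorff distance is returned in pixel units (converting to mm would require the pixel spacing, which is not given here). Empty masks or point sets are not handled.

```python
import math


def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN over two binary masks of equal size."""
    tp = tn = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def dice(tp, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def iou(tp, fp, fn):
    return tp / (tp + fp + fn)


def directed_hausdorff(a, b):
    """h(A, B): largest distance from a point of A to its nearest point of B."""
    return max(
        min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a
    )


def hausdorff(a, b):
    """HD(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    truth = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
    tp, tn, fp, fn = confusion_counts(pred, truth)
    print("DSC", dice(tp, fp, fn))
    print("accuracy", accuracy(tp, tn, fp, fn))
    print("IoU", iou(tp, fp, fn))
    print("HD", hausdorff([(0, 0), (0, 1)], [(0, 0), (3, 0)]))
```

In the toy example the prediction over-segments by one column, which lowers DSC and IoU noticeably while accuracy stays high because of the many background pixels; this is one reason overlap and distance metrics are reported alongside accuracy.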
We compared our method with the advanced methods of [9,14], applied to our datasets, and with convolutional neural networks including U-net [23], PixelNet [24], and DeepLabV3+ [20], which are prominent networks in the domain of semantic segmentation. U-net and PixelNet were based on the VGG-16 model; DeepLabV3+ was based on ResNet.
The larger the DSC and the smaller the HD, the closer the predicted result is to the ground truth. The detailed comparison between the different methods is given in Table 2, from which we can see that the proposed method outperforms the other methods, reaching a DSC of 0.954 and an HD of 1.752 mm.

Discussion
In this study, a deep neural network is proposed to capture the boundary of the prostate. The proposed method leverages the inherent advantages of dilated convolution, the Encoder-Decoder architecture and deep learning. Having trained, tested and validated the network on 1392 prostate MR images, we consistently achieved good qualitative and quantitative results. Thus, we may be able to offer a robust framework for the automated segmentation of prostate tissue.
Compared with the state-of-the-art methods [12][13][14], our approach has several advantages, such as higher accuracy and a larger DSC. Most traditional methods [7,8] rely on the shape of the prostate for segmentation, which is not only time-consuming but also low in accuracy. Although the method in [23], based on the U-Net convolutional neural network, also avoids these steps, as ours does, it places much importance on reducing prediction time, which results in relatively poor accuracy. We also compared our method with PixelNet [24] and DeepLabV3+ [20]; the results show that our method has higher segmentation performance.
These results suggest that the proposed algorithm for segmenting prostate tissue in MR images can be applied well in clinical diagnosis research.

Conclusion
In this paper, we demonstrate that the inherently difficult problems of MR image segmentation can be solved well with deep learning. We have proposed a robust automatic prostate segmentation network that combines the Encoder-Decoder architecture with dense dilated spatial pyramid pooling. On the one hand, DDSPP provides more receptive fields, which shows that resampling features at different scales is effective and can accurately and efficiently classify regions of arbitrary scale. On the other hand, connecting the low-level features and high-level features enhances the robustness of the algorithm and produces competitive results in the decoder part. The experimental results show that the proposed method has better robustness and accuracy than the other methods, indicating strong performance in prostate segmentation.

Disclosure statement
No potential conflict of interest was reported by the authors.