Computer-aided breast cancer diagnosis based on image segmentation and interval analysis

Uncertainty is a principal part of any practical problem. Like any application, image processing involves unknowns that arise from several sources, such as the initial digitization, sampling, noise, the spatial domain, and intensity. This study presents a robust image segmentation method for breast cancer mammography images that accounts for interval uncertainties. Interval analysis is proposed to model the system uncertainties. The main advantage of this method is that it takes errors in the independent variables into account. Fuzzy methods carry an element of subjectivity, while deterministic methods are not applicable in all cases. Moreover, the interval result is always guaranteed to contain the exact result, even though its lower and upper bounds may be overestimated. The main idea is to extend the traditional Laplacian of Gaussian (LoG) filter with interval analysis so that intensity uncertainties are considered. Experiments are carried out on MIAS, a popular breast cancer database for medical image segmentation. The performance of the system is compared with the Prewitt, LoG and Canny filters based on PSNR.


Introduction
Breast cancer is the most common cancer among women worldwide. Although prevention may reduce the risk of this cancer, it cannot cure the large number of breast cancers that have reached advanced stages. Early diagnosis can therefore improve the survival rate. Owing to limited medical facilities, most breast cancer cases, especially in developing countries, are diagnosed in late stages, whereas they could be diagnosed early with machine vision-based medical tools at a reasonable cost [1].
The exact biology of breast cancer is still not fully understood, and only a few major factors, such as family history, are known to increase its incidence in women. Even so, many women with breast cancer have no history of the illness in their families. This kind of cancer cannot be prevented because the reasons for its origin are not yet known, but timely detection can increase a person's chance of complete recovery [2].
However, in many cases breast cancer is not recognized until an advanced stage. This is reflected in the poor survival statistics of people with breast cancer and calls for rapid diagnosis. Mammography has met this need [3,4].
The results of a study of 280,000 women in the USA show that breast self-examination (BSE), clinical examination and mammography help identify cancer early; more than 41% of cancer patients were identified by mammography, and an even larger share of those in the first stages of breast cancer were identified by mammography [5,6].
Similar projects have shown that early detection of breast cancer increases its cure rate. Recently, several research works have been introduced to detect cancer cells in mammography images without human intervention, using image processing, computer vision, etc., to avoid fatigue, inaccuracy and mistakes [7,8].
New technologies based on artificial intelligence have also been introduced to address this issue. Image processing-based methods increase the speed of detection while reducing human error. Besides, this technology can help radiology experts detect the disease better and more easily [8][9][10][11][12][13][14].
A suitable technique for segmenting cancerous images is the starting point for the next stages of diagnosis, such as feature extraction and image classification, which in turn make it easier for specialists to diagnose the disease [15].
In other words, medical image processing, including mammogram image processing, helps radiologists and physicians detect the disease easily while protecting patients from irreparable risks. Because exact segmentation of the cancerous region depends on precise details, the recommended method should be chosen carefully [16].
Image segmentation is a technique for dividing an image into its principal components, i.e. regions that correspond to different objects in the image and are uniform in terms of texture or colour. The regions should not contain small holes, and adjacent regions should differ significantly from one another. Image segmentation is used in areas such as image processing, machine vision, medical image processing, digital libraries, content-based information retrieval in pictures and videos, data transfer through the Internet and image compression [17][18][19][20].
In other words, accurate delineation between lesion and background by image segmentation makes it easier for physicians to diagnose breast cancer [21]. A primary step before image segmentation is sampling and quantizing the input image so that it can be stored in computer memory as a discrete version of the spatial-domain image.
Discretization always has an adverse effect on the input image, namely the loss of some of its information. This loss makes the intensity information of the image pixels uncertain.
These effects propagate to the subsequent steps of image processing, from image segmentation to feature extraction and image classification. For example, in segmentation by image thresholding, these uncertainties make it difficult even to agree on the correct boundary between objects.
Interval analysis is a suitable method here, since it requires only lower and upper bounds for uncertainty modelling. Given the nature of digital images, interval analysis is a good choice for handling the uncertainty. In this paper, interval analysis is employed to extend and design a robust segmentation method for breast cancer images in the presence of the uncertainties explained above.
The rest of the paper is organized as follows. Section 2 briefly describes the interval representation of an image. Sections 3 and 4 review histogram equalization and median filtering, respectively, as preprocessing steps for the input medical images. Section 5 presents Kapur thresholding. In Section 6, the mathematical model of the extended edge detection is introduced based on interval analysis. Implementation results are described and analysed in Section 7, and the conclusion of the study is given in Section 8.

Using interval analysis for image representation
Generally, each image p is described by a matrix of i rows and j columns, so that z = {1, . . . , i} × {1, . . . , j} defines the set of its positions.
Consider the above-described image p, whose pixel value at a position α ∈ z is p(α). Furthermore, n(α) ⊂ z denotes the set of positions in the 3 × 3 neighbourhood centred at α, including α itself, so that |n(α)| = 9 unless α lies on the image margin [29].
As explained in the introduction, discretization of the input image introduces uncertainties that affect the performance of image processing operations partially or completely.
Such uncertainties arise for different reasons, for instance noise in the image or the limited range of brightness intensities available during discretization.
As defined before, an image is discretized in two ways: sampling and quantizing. In this paper, the uncertainty in the intensity of the image, as a part of the sampling uncertainty, is considered.
In brightness sampling, only a finite number of intensities is stored (e.g. a greyscale image with 2^8 tones and an RGB image with 2^24 tones); even with finer coding, the precision of the image intensity remains limited.
Whatever the number of tones used to code an image, some limitation always remains. The measurement error of the tone of a pixel is therefore taken as ±δ.
For a greyscale image p with i rows and j columns, the interval extension of the image intensity is denoted by I(p) and is presented below.
$$I(p)(\alpha) = \big[\max(0,\ p(\alpha) - \delta),\ \min(L,\ p(\alpha) + \delta)\big], \quad \alpha \in z \qquad (1)$$
where L is the maximum intensity value for the image class (for instance, L = 255 for the uint8 class and L = 1 for the double class) and δ is the uncertainty in the intensity. Figure 1 shows a breast cancer image in interval representation (lower and upper bounds).
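The interval representation of an image can be sketched in a few lines. This is a minimal illustration, not the authors' MATLAB code: the function name `to_interval_image`, the list-of-lists image layout, and the clipping of the bounds to [0, L] are assumptions made here for the example.

```python
def to_interval_image(image, delta, max_level=255):
    """Return (lower, upper) bound images for a greyscale image.

    Each pixel value v becomes the interval [v - delta, v + delta],
    clipped to the valid intensity range [0, max_level] (an assumption).
    """
    lower = [[max(0, v - delta) for v in row] for row in image]
    upper = [[min(max_level, v + delta) for v in row] for row in image]
    return lower, upper


# Example: a 2 x 2 uint8-style image with a tone error of +/- 2.
lower, upper = to_interval_image([[0, 100], [254, 255]], delta=2)
```

With δ = 0 both bound images equal the original, which corresponds to a degenerate interval at every pixel.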

Histogram equalization
Histogram equalization is a process for enhancing the contrast of low-contrast images. An image has low contrast when the brightness difference between its highest and lowest values is small. Histogram equalization increases the contrast of the input image as much as possible.
Consider an input image im as an m × n matrix of integer pixel intensities in the interval [0, L−1], where L is the number of possible intensity values. Let h denote the normalized histogram of im, with one bin per intensity:
$$h_n = \frac{\text{number of pixels with intensity } n}{\text{total number of pixels}}, \quad n = 0, 1, \ldots, L-1.$$
Based on the above definition, the histogram-equalized image eq is presented below:
$$eq_{i,j} = \operatorname{floor}\Big((L-1) \sum_{n=0}^{im_{i,j}} h_n\Big),$$
where floor(.) rounds the value down to the nearest integer. This approach is derived by treating the intensities of im and eq as continuous random variables X and Y in the interval [0, L−1]. Y can be presented as
$$Y = d(X) = (L-1) \int_0^X hi_X(x)\,dx,$$
where hi describes the histogram (probability density) of im, and d is the cumulative distribution function of X multiplied by (L−1). d is assumed to be differentiable and invertible. A low-contrast image and its histogram-equalized version are shown in Figure 2.
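As a small illustration of the discrete mapping, the following sketch equalizes an image stored as a list of lists. The function name `equalize_histogram` and the use of integer cumulative counts (to avoid floating-point rounding in the floor operation) are choices made for this example and are not taken from the paper.

```python
def equalize_histogram(im, levels=256):
    """Histogram-equalize an image with integer intensities in [0, levels - 1]."""
    total = sum(len(row) for row in im)

    # Count how many pixels take each intensity value.
    counts = [0] * levels
    for row in im:
        for v in row:
            counts[v] += 1

    # Cumulative counts; dividing by `total` would give the cumulative histogram.
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(running)

    # floor((levels - 1) * cumulative / total), done in exact integer arithmetic.
    return [[(levels - 1) * cumulative[v] // total for v in row] for row in im]


# Example with 4 grey levels.
eq = equalize_histogram([[0, 0], [1, 3]], levels=4)
```

In the example, half of the pixels have intensity 0, so that intensity is mapped to floor(3 × 0.5) = 1, and the brightest pixel is mapped to the top level 3.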

Removing noises based on median filtering
Noise refers to any unintended fluctuation that appears on a measured signal. Any quantity can be affected by noise, and mammography images sometimes contain a small amount of it. Noise is a serious problem for image processing operations, especially for edge detection, which relies on differentiation. Differentiation amplifies the high-frequency content of the image, which directly includes noise. Since part of our work is based on this process, a median filter is employed as a pre-processing step before image segmentation [30,31]. Among different filters, the median filter is widely applied in medical image processing. The main reason is its ability to remove noise while preserving edges.
The median filter acts as a low-pass filter and requires more processing time than other filters. It takes an m × n neighbourhood, sorts the neighbourhood values in ascending order, selects the middle value, and replaces the central pixel with it. The median filter is a useful tool for removing salt-and-pepper noise. For this purpose, a median filter with a 5 × 5 mask is applied to the input images. The larger the mask, the better the noise reduction, but the greater the loss of edges. In the following, a simulated noise sample and its filtered result are shown on a sample image.
When a median filter is applied to a greyscale image, the median of the grey values of the pixels under the mask is computed for each pixel. When a median filter with an odd number of elements is applied to a logical image, the centre pixel is mapped to true if there are more true than false pixels under the mask.
The grey values of the pixels under the mask are sorted, and the median is selected directly if the number of elements is odd or calculated from the two middle values if the number of elements is even. In this research, a 5 × 5 median filter is employed to remove digital noise particles. Median filtering can be applied to remove small noise objects (see Figure 3).
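The filtering step can be sketched as follows. This is an illustrative pure-Python version, not the implementation used in the experiments; the function name `median_filter` and the border handling (the window is simply truncated at the image margin) are assumptions made here.

```python
from statistics import median


def median_filter(image, size=5):
    """Apply a size x size median filter to a greyscale image (list of lists).

    At the image margin the window is truncated to the pixels that exist,
    which is one of several possible border conventions (an assumption).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            # statistics.median averages the two middle values for even counts.
            out[r][c] = median(window)
    return out


# Example: a single "salt" pixel in a flat 5 x 5 region is removed by a 3 x 3 mask.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter(noisy, size=3)
```

Because an isolated outlier never reaches the middle of the sorted window, it is replaced by the surrounding grey value while the flat background is left unchanged.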

Image thresholding using Kapur method
Before applying the proposed edge detection, the filtered image must be thresholded. Here, we use Kapur's method [32,33], which operates on the image histogram. Its purpose is to find the threshold value t that maximizes the image entropy, which describes the compactness and separability of the classes. The entropy reaches its maximum when the optimal threshold is found. To obtain the optimal value in Kapur's method, the following cost function must be maximized:
$$f(t) = H_1 + H_2,$$
where H_1 and H_2 define the entropies of the two classes and, in the standard formulation of the method, are obtained from the normalized histogram h_n as follows:
$$H_1 = -\sum_{n=0}^{t} \frac{h_n}{\omega_1} \ln\frac{h_n}{\omega_1}, \qquad H_2 = -\sum_{n=t+1}^{L-1} \frac{h_n}{\omega_2} \ln\frac{h_n}{\omega_2},$$
with $\omega_1 = \sum_{n=0}^{t} h_n$ and $\omega_2 = \sum_{n=t+1}^{L-1} h_n$.
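A minimal sketch of bi-level Kapur thresholding is given below, assuming the standard formulation in which the sum of the entropies of the two classes is maximized over all candidate thresholds. The function name `kapur_threshold` and the exhaustive search are illustrative choices, not details confirmed by the paper.

```python
from math import log


def kapur_threshold(hist):
    """Return the threshold t that maximizes H1 + H2 for a histogram of counts.

    Class 1 holds intensities 0..t and class 2 holds intensities t+1..L-1.
    """
    total = sum(hist)
    probs = [c / total for c in hist]
    best_t, best_score = 0, float("-inf")
    for t in range(len(probs) - 1):
        w1 = sum(probs[: t + 1])
        w2 = sum(probs[t + 1:])
        if w1 <= 0 or w2 <= 0:
            continue  # one class is empty, entropy is undefined
        h1 = -sum((p / w1) * log(p / w1) for p in probs[: t + 1] if p > 0)
        h2 = -sum((p / w2) * log(p / w2) for p in probs[t + 1:] if p > 0)
        if h1 + h2 > best_score:
            best_t, best_score = t, h1 + h2
    return best_t


# Example: a bimodal histogram with an empty gap between the two modes.
t = kapur_threshold([5, 5, 0, 0, 5, 5])
```

For the bimodal example, any threshold inside the empty gap separates the two modes equally well, and the search returns the first such value.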

Interval analysis
Consider an interval number X = [x] over I(R). The main definition of classic (Minkowski) interval analysis is presented in the following equation:
$$[x] = [\underline{x}, \overline{x}] = \{x \in \mathbb{R} \mid \underline{x} \le x \le \overline{x}\},$$
where $\underline{x}$ and $\overline{x}$ describe the lower and upper bounds, respectively [34,35].
An interval is called degenerate if its lower and upper bounds are equal, i.e. $\underline{x} = \overline{x}$.
There are three main definitions in interval analysis, namely the interval width, the interval mean value and the interval radius, which are presented in the following equations [34,36,37]:
$$w([x]) = \overline{x} - \underline{x} \qquad (8)$$
$$x_c = \frac{\underline{x} + \overline{x}}{2} \qquad (9)$$
$$x_r = \frac{\overline{x} - \underline{x}}{2} \qquad (10)$$
Based on Equations (8)-(10), an interval number can be represented in terms of its centre and δ, the symmetric interval of [x] around that centre.
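These quantities are easy to check numerically. The sketch below represents an interval as a `(lower, upper)` tuple; the helper names are hypothetical and chosen only for this illustration.

```python
def width(iv):
    """Interval width: upper bound minus lower bound."""
    return iv[1] - iv[0]


def midpoint(iv):
    """Interval mean value (centre)."""
    return (iv[0] + iv[1]) / 2


def radius(iv):
    """Interval radius: half of the width."""
    return (iv[1] - iv[0]) / 2


def is_degenerate(iv):
    """True when the lower and upper bounds coincide."""
    return iv[0] == iv[1]


def from_centre_radius(centre, rad):
    """Rebuild the interval from its centre and radius."""
    return (centre - rad, centre + rad)


# Example: a pixel of intensity 100 with a tone error of +/- 2.
pixel = (98, 102)
rebuilt = from_centre_radius(midpoint(pixel), radius(pixel))
```

Rebuilding the interval from its centre and radius returns the original bounds, which is the centre-plus-symmetric-interval view of an interval number.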

The basics of interval analysis
This section gives a short overview of the basic algebraic operations on interval numbers.
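For orientation, the classical interval arithmetic operations can be sketched as below. These are the standard textbook definitions, given here as an assumption about what this overview covers; the function names and the `(lower, upper)` tuple representation are illustrative.

```python
def iadd(a, b):
    """[a] + [b]: add the lower bounds and add the upper bounds."""
    return (a[0] + b[0], a[1] + b[1])


def isub(a, b):
    """[a] - [b]: lower minus upper, upper minus lower."""
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    """[a] * [b]: minimum and maximum of the four endpoint products."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


def idiv(a, b):
    """[a] / [b], defined only when [b] does not contain zero."""
    if b[0] <= 0 <= b[1]:
        raise ZeroDivisionError("divisor interval contains zero")
    return imul(a, (1 / b[1], 1 / b[0]))


# Classical subtraction does not cancel: [1, 2] - [1, 2] is [-1, 1], not [0, 0].
spread = isub((1, 2), (1, 2))
```

The last line shows why a different notion of difference is useful for intervals: subtracting an interval from itself under the classical rule widens the result instead of giving zero.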

Hukuhara difference method
More details can be found in [43][44][45][46]. Since most image segmentation methods involve differentiation of some order, interval analysis-based image segmentation is obtained by combining interval differentiation with image segmentation on the original image. In this section, an interval definition of the derivative operator and its improvements are described. There are several interval-based definitions of the derivative operator. One of the most popular is due to Stefanini et al. [43].
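As an illustration of the difference this section is named for, the generalized Hukuhara (gH) difference of two intervals has a well-known closed form: the result is bounded by the smaller and the larger of the two endpoint differences. The sketch below assumes that closed form and the `(lower, upper)` tuple representation; the function name `gh_difference` is hypothetical.

```python
def gh_difference(a, b):
    """Generalized Hukuhara difference of intervals a and b.

    Uses the closed form [min(d_lo, d_hi), max(d_lo, d_hi)], where
    d_lo and d_hi are the differences of the lower and upper bounds.
    """
    d_lo = a[0] - b[0]
    d_hi = a[1] - b[1]
    return (min(d_lo, d_hi), max(d_lo, d_hi))


# Unlike classical subtraction, an interval minus itself is exactly zero.
zero = gh_difference((1, 2), (1, 2))

# [1, 4] gH-minus [0, 1] gives [1, 3], and [0, 1] + [1, 3] recovers [1, 4].
diff = gh_difference((1, 4), (0, 1))
```

This cancellation property is what makes the gH difference a convenient basis for defining derivatives of interval-valued functions.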

Interval derivation
Consider an interval-valued function f with first-order derivative f′ on the interval [x]. Let x_0 ∈ ]a, b[ and h be such that x_0 + h ∈ ]a, b[. Under this condition, the generalized Hukuhara (gH) derivative of a function f: ]a, b[ → IR at x_0 can be described by the following equation:
$$f'(x_0) = \lim_{h \to 0} \frac{1}{h}\,\big[f(x_0 + h) \ominus_{gH} f(x_0)\big].$$
If f′(x_0) ∈ IR satisfying the equation above exists, f is gH-differentiable at x_0. The continuity of the interval function can be proved in the same setting (Equation (22)). The gH derivative exists if the right derivative f′_r(x_0) and the left derivative f′_l(x_0) are equal, i.e. f′_r(x_0) = f′_l(x_0) = f′(x_0).
Using the centred form of the interval (i.e. x = x_c + x_r I_c), the definition above can be rewritten for a partial derivative as follows:

Interval-based Taylor method
The interval-based Taylor method is obtained by applying and extending the interval centred form to higher-order derivatives. A second-order extension of this form is presented below [47][48][49], in which the elements h_ij of the Hessian are defined piecewise for i, j = 1, . . . , n and are 0 otherwise (Equation (27)). Alternatively, the symmetric form of the Hessian matrix can be considered, with h_ij = ∂²f/(∂x_i ∂x_j). Consequently, the interval-based Taylor expansion for a single-valued system of order n can be presented as follows [50]:

Interval edge detection
The process of characterizing the boundaries between objects in an image is called edge detection. One well-established method for edge detection is the Laplacian of Gaussian (LoG). The theory of LoG was first introduced by Marr and Hildreth, who combined Gaussian filtering with the Laplacian. It has recently lost popularity among edge detectors, so developing the method further may restore its usefulness. The Laplacian is the second-order derivative of the image. Since differentiating an image enhances its high-frequency content, including edges, the Laplacian filter is used to detect the edges of an image.
The LoG approach applies the Laplacian operator after the image has been smoothed by a Gaussian filter to reduce noise. The Laplacian L(i, j) of an image with pixel intensity values ϕ(i, j) is given by
$$L(i, j) = \frac{\partial^2 \phi}{\partial i^2} + \frac{\partial^2 \phi}{\partial j^2}.$$
The equation above can be implemented by a convolution filter. Because the input image is a set of discrete pixels, a discrete convolution kernel is used to approximate the Laplacian. A useful small kernel for the Laplacian is shown in Figure 4.
An important drawback of LoG is that it does not work properly where the image tone (intensity level) varies and is uncertain. A method that considers these uncertainties can therefore improve performance. In the following, we use an interval extension of the Laplacian to improve the performance of LoG.
From Equations (30) and (31), the Laplacian (Hessian) equation is obtained and then extended using interval analysis, which yields the Hessian matrix for an i × j image matrix. The interval extension of the Laplacian as a kernel is indicated in Figure 5.
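To make the idea concrete, the sketch below applies a discrete Laplacian to an interval image using interval arithmetic. It is only an illustration under stated assumptions: the kernels of Figures 4 and 5 are not reproduced here, so the common 4-neighbour kernel with centre weight −4 is assumed, border pixels are left at zero, and the function name `interval_laplacian` is hypothetical.

```python
def interval_laplacian(lower, upper):
    """Interval response of the 4-neighbour Laplacian kernel.

    Assumed kernel: [[0, 1, 0], [1, -4, 1], [0, 1, 0]].
    Neighbours have positive weights, so their lower bounds feed the lower
    response; the centre has a negative weight, so its upper bound does.
    """
    rows, cols = len(lower), len(lower[0])
    lap_lo = [[0] * cols for _ in range(rows)]
    lap_hi = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            lap_lo[r][c] = sum(lower[a][b] for a, b in neighbours) - 4 * upper[r][c]
            lap_hi[r][c] = sum(upper[a][b] for a, b in neighbours) - 4 * lower[r][c]
    return lap_lo, lap_hi


# A flat region with tone error +/- 2: the response interval contains zero.
flat_lo = [[98] * 3 for _ in range(3)]
flat_hi = [[102] * 3 for _ in range(3)]
lo, hi = interval_laplacian(flat_lo, flat_hi)
```

For the flat region the response at the centre is the interval [−16, 16]. Since it contains zero, the intensity uncertainty alone could explain the response, whereas an interval that excludes zero indicates a second-derivative response that persists despite the uncertainty.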

Implementation results
This section presents numerical simulations that analyse the proposed method on breast mammography images. The simulations are run in MATLAB R2017. The Mammographic Image Analysis Society Digital Mammogram Database (MIAS) is employed as a widely used mammographic database [51]. MIAS is a database collected by the UK National Breast Screening Programme. It includes 322 digitized images with the radiologist's "truth" markings. The images are cropped to the size of 1024 × 1024. Figure 6 shows some results of the sequence of simulated operations. From Figure 6, it is observed that the presented edge detection performs well in segmenting the input image in the presence of system uncertainties such as intensity variations.
For a better analysis of the proposed method, it is compared with some popular edge detection methods, including Prewitt, LoG and Canny. For the comparison, the peak signal-to-noise ratio (PSNR) is adopted under two noise environments (Gaussian and salt-and-pepper). Table 1 shows the PSNR for the studied database under salt-and-pepper noise, with the variance (σ) varied from 0.2 to 1 for the different edge detection methods.
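For reference, PSNR between a reference image and a processed image is commonly computed from the mean squared error as 10 log10(MAX² / MSE). The sketch below follows that standard definition; the function name `psnr` and the choice of 255 as the peak value are assumptions for this example, and the exact image pairs compared in the experiments are not specified here.

```python
from math import log10


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    n = sum(len(row) for row in reference)
    mse = sum(
        (a - b) ** 2
        for ref_row, test_row in zip(reference, test)
        for a, b in zip(ref_row, test_row)
    ) / n
    if mse == 0:
        return float("inf")  # identical images
    return 10 * log10(max_value ** 2 / mse)


# Example: one fully saturated error in a 2 x 2 image.
value = psnr([[0, 0], [0, 0]], [[255, 0], [0, 0]])
```

A higher PSNR means the processed image is closer to the reference, which is the sense in which the compared methods are ranked.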
The results show that, in the presence of uncertainties, the PSNR of the proposed system is generally higher than that of the others. Another observation is that, as σ increases, the results of the other techniques get worse while the proposed technique remains consistent. Tables 2 and 3 give the results of applying Gaussian noise to the database for further PSNR analysis of the compared methods. Table 2 uses a Gaussian mean of μ = 0.1, while Table 3 uses μ = 0.5.
For low-variance salt-and-pepper noise, the Canny filter performs well; it even outperforms the proposed method in some cases. But as the variance grows, particularly for Gaussian noise, the presented technique gives better results. Although the method is designed for mammogram images, it can be adapted to analyse breast MRI and thermal images by adjusting the image uncertainties.

Conclusion
An input image always carries some uncertainty, arising from sampling, quantizing, different types of noise, etc. Since processing such images with classic methods is complicated, different methods have been applied to them. An important and sensitive use of image processing is medical image analysis. The main idea of this study is a new methodology for breast cancer diagnosis in medical images in the presence of uncertainties. Here, an extended edge detection method based on the Laplacian of Gaussian is developed for the final segmentation of the images. The proposed method allows the system to account for a large part of the uncertainties, including intensity (tone) variations in the image. The MIAS database is employed to analyse the efficiency of the presented method. Salt-and-pepper and Gaussian noise are also applied to validate the method against the Prewitt, LoG and Canny filters. Experimental results indicate that the presented technique gives promising performance compared with the other methods. The main limitation of interval analysis is that the interval ranges can grow very fast. Also, interval analysis addresses only the range of the uncertainties and gives no information about how likely the extreme values are. Future work may therefore study different types of uncertainty in mammogram and thermal images and use other techniques, such as fuzzy algorithms, to address the problem.