Residual U-Net approach for thyroid nodule detection and classification from thyroid ultrasound images

With so many thyroid nodules discovered by accident, it is critical to recognize as many abnormal nodules as possible from fine-needle aspiration (FNA) biopsies or other medical procedures while excluding those that are almost certainly benign. Thyroid ultrasonography, however, is prone to interobserver variability and subjective interpretation. This study presents a deep learning model for segmenting and classifying thyroid nodules that follows these stages: data collection from a well-known archive, the Thyroid Digital Image Database (TDID), which comprises ultrasound images from 298 patients; preprocessing with an anisotropic diffusion filter (ADF) to remove noise and enhance the images; segmentation using a bilateral filter; feature extraction using the grey level co-occurrence matrix (GLCM); feature selection using Multi-objective Particle Swarm with Random Forest Optimization (MbPSRA); and, finally, classification using Residual U-Net. Experimental evaluation indicates that the proposed model outperforms other state-of-the-art models.


Introduction
Thyroid nodule (TN) detection has grown dramatically during the last two decades, with many more nodules being discovered by chance. Because many thyroid nodules are benign or behave in a non-threatening manner, reliably establishing whether they are benign or malignant can spare patients the time and cost of fine-needle aspiration (FNA) biopsies and/or surgery. Sonography is commonly used to examine these atypical thyroid nodules. Radiologists have identified hypoechogenicity, microcalcifications, hardness and a taller-than-wide shape as sonographic features of thyroid nodules that signal malignancy [1]. The Thyroid Imaging Reporting and Data System (TI-RADS) categories are 'not suspicious', 'probably benign', 'one suspicious feature', 'two suspicious features', 'three or more suspicious features' and 'probable malignancy' [2]. TI-RADS assessment of TNs, however, is time-consuming and often inaccurate. Because current sonographic criteria for identifying malignant nodules are insufficient and the variability of thyroid nodule echo patterns restricts radiologists' judgement [3], radiologists' accuracy relies mainly on personal experience.
The deployment of a validated TI-RADS reporting mechanism would allow clinicians to stop doing routine thyroid ultrasound exams with the assurance that they would not miss a malignancy. It would also give doctors a suggested follow-up approach for patients with a moderate-risk TI-RADS score, sparing the system many unneeded imaging tests and the stress that comes with them. Furthermore, because sonographic properties can be obtained from ultrasound images that are digitized and fed into a machine learning system, radiologists can use TI-RADS scores to learn how the computer classifies the image [4][5][6][7][8].
Thyroid nodules are groups of thyroid cells that have developed abnormally in the thyroid gland. The thyroid is an endocrine gland that produces hormones and releases them throughout the body. It is made up of two lobes connected by an isthmus (or 'bridge'). The isthmus is 1.2 cm long and 1.2 cm broad, and each lobe is pear-shaped, 5 cm long and 2.5 cm wide. The gland regulates thyroid hormone secretion, which is important for body temperature regulation and has a big impact on children's intellect and growth. It also produces hormones that help to manage the body's metabolism. Excessive or insufficient thyroid hormone production (due to a thyroid that is too big or too small, respectively) produces pathological alterations and thyroid abnormalities. As a result, the volume of the thyroid gland is frequently used to identify abnormal thyroid symptoms. Thyroid illnesses are divided into three categories: hyperthyroidism, hypothyroidism and Hashimoto's thyroiditis.
Thyroid hormones regulate a number of activities in the body, including body temperature, digestion and heart rate [8]. These hormones play a vital role in protein synthesis, body temperature regulation, and total energy generation and regulation. Thyroid illnesses are commonly split into two groups: those that primarily impair thyroid function and those that involve thyroid neoplasms or tumours. Both kinds of illness are rather frequent in the general population, and the majority of thyroid problems can be treated properly. Thyroid dysfunction is generally linked with either insufficient hormone production (hypothyroidism) or excessive hormone production (hyperthyroidism). The enlargement of a nodule in the thyroid gland is a warning sign of thyroid cancer. Thyroid nodules cause no symptoms in the majority of people. Benign nodules are generally small (less than one centimetre in diameter) and require regular monitoring. Ultrasound (US) imaging can detect a thyroid tumour in its early stages. In addition to the volume, position and number of nodules, US can assess the distinctness of their borders, nodule contents such as calcium deposits, and the blood flow to the thyroid and its nodules. A sample ultrasound image with a thyroid nodule and thyroxine is shown in Figure 1.
Overgrowth of the thyroid gland can result in the formation of one or more nodules. It is unknown why this occurs. When nodules develop, the primary fear is cancer. Fortunately, malignancy is rare, occurring in fewer than 5% of all nodules. People with a family history of nodules and those who do not get enough iodine are more likely to develop nodules, because iodine is required to produce thyroid hormone. Thyroid nodules come in a variety of types [9]:
• Colloid nodules: thyroid tissue overgrowths that may be single or multiple. These growths are benign (not cancer). They can become quite large, but they do not spread beyond the thyroid gland.
• Inflammatory nodules: nodules that grow when the thyroid gland is inflamed (swollen) for a long time. These growths may or may not be painful.
• Multinodular goitre: a kind of enlarged thyroid that consists of several nodules (mostly benign).
• Hyperfunctioning thyroid nodules: nodules that produce thyroid hormone on their own, bypassing normal feedback control and increasing the risk of hyperthyroidism.
• Thyroid cancer: this is rare, with malignant thyroid nodules accounting for less than 5% of all thyroid nodules.
This paper focuses on an efficient deep-learning model for thyroid nodule detection, with the following objectives:
• An efficient deep-learning model is implemented for the classification of thyroid nodules.
• The deep learning model is applied to ultrasound images of the thyroid.
• Feature extraction and selection use GLCM and Multi-objective Particle Swarm with Random Forest Optimization (MbPSRA) for better extraction and dimensionality reduction.
• Classification is performed with the Residual U-Net architecture.
Paper organization: Section 1 provided an overview of thyroid nodules; the remainder of the paper is organized as follows. The literature review is presented in Section 2, the methodology in Section 3 and the performance analysis in Section 4, and the paper is concluded in Section 5.
Zhou et al. (2018) [10] used U-Net to differentiate thyroid nodules and presented an interactive segmentation strategy based on it with annotation marks as guidance. First, the four endpoints of a nodule's major and minor axes are identified manually. Then, four white dots are drawn directly at these four places in the image to aid the deep neural network's training and inference. Liang et al. (2020) [11] set out to create a multiorgan CAD system based on CNNs for detecting thyroid and breast conditions, and to assess how various preprocessing methods affect its diagnostic effectiveness. Yang et al. (2021) [12] created a multi-task cascade deep learning model (MCDLM) for automated thyroid nodule detection that uses multimodal ultrasound data and incorporates radiologists' domain knowledge (DK). To acquire more precise nodule segmentation results, the authors transfer knowledge across U-Net networks. The ultrasonic features (UF) of the nodule are then quantified as conditions, which aid in the generation of stronger images and discriminators. Chu et al. (2021) [13] proposed a mark-guided deep network segmentation model for thyroid nodules in ultrasound. Their approach achieves results comparable to those of VGG19, Inception V3 and DenseNet 161 in terms of segmentation accuracy and network running time.

Literature review
Edgar Gabriel et al. [5] developed two versions of a code for thyroid FNAC images using texture-based segmentation, which is an important first step towards a fully automated CAD solution. The code was developed with MPI to take advantage of distributed-memory compute resources. Maroulis et al. [10] developed the Variable Background Active Contour algorithm for the detection of abnormal nodules in thyroid images. Compared with the conventional active contour model, the variable background model improved the sensitivity of nodule detection. The authors achieved an average accuracy of 90% for detecting nodules in ultrasound images. To detect and segment abnormal cancer cells in ultrasound thyroid images, Kobayashi et al. [14] used a fuzzy edge detection algorithm and developed improved generalized fuzzy rules for cancer cell boundary segmentation. Savelonas et al. [6] proposed the variable background active contour model for detecting thyroid nodules in ultrasound images. The new model offers edge independence, less smoothing and the ability to handle topological changes. It outperforms the active contour without edges model in terms of accuracy. Accuracy is increased by using a limited image subset as the background, which changes shape appropriately to decrease the effects of background inhomogeneity.
Preeti Aggarwal et al. [7] presented work on automatic segmentation. It summarizes the results obtained on lung CT and thyroid US images using either automatic tools or specific segmentation algorithms. For the segmentation of thyroid US images, two tools are available: Analyze 10.0 and MaZda. Tsuda et al. [3] investigated thyroid cancer in Fukushima residents aged 18-26. The authors used ultrasound images to screen for thyroid cancer and reported their screening results at a 95% confidence level. To detect thyroid nodules in ultrasound images, Nugroho et al. [2] used an active contour bilateral filtering method. With this bilateral filtering approach, the irregular boundary of the thyroid nodule was found and accurately segmented. Before segmentation, the authors detected the speckle noise and removed it from the thyroid image. For detecting the thyroid gland in ultrasound images, Gomathy et al. [4] used principal component analysis. The thyroid area was accurately segmented using the region of interest (ROI) and morphological operations. Du et al. [5] devised a technique for finding thyroid nodules in ultrasound images. An anisotropic diffusion filter (ADF) was used to detect and remove speckle noise. For accurate nodule segmentation, local phase symmetry features were extracted from the thyroid images.

Proposed method
Figure 2 shows the overall architecture of the proposed framework, whose steps are listed below. Images are first gathered from a well-known repository, the Thyroid Digital Image Database (TDID), which contains ultrasound scans of 298 individuals. Because these images come from raw files, they are likely to contain noise and anomalies; an ADF is used to remove these and to enhance the images for the subsequent stages. The preprocessed images are passed to the segmentation stage, where regions are segmented using a bilateral filter. The identified regions are passed to feature extraction, where features are extracted using GLCM, and a subset of these features is then chosen by the feature selection method, MbPSRA. Finally, the selected features are passed to the classification stage, where Residual U-Net produces the output.

Data collection
The problem of classifying thyroid nodules from ultrasound images has been investigated before [15][16][17], although most of the data sets used in that research are private. Furthermore, because of limited time and the particular characteristics of medical conditions, acquiring a large amount of data is exceedingly challenging: it requires expensive image-acquisition equipment and patient participation.
As a result, we used the Thyroid Digital Image Database (TDID), a public thyroid nodule image collection established by Pedraza et al. [15]. The TDID data set was made available in 2015 and includes thyroid ultrasound images from 298 people. From it we were able to collect 450 images of thyroid nodules for our research.
Each image is assessed by radiologists, who assign a Thyroid Imaging Reporting And Data System (TI-RADS) score describing the state of the thyroid area. The TI-RADS score takes one of seven values: 1, 2, 3, 4a, 4b, 4c or 5. Scores 1, 2 and 3 correspond to a normal thyroid (score 1), a benign nodule (score 2) and no abnormal ultrasonography findings (score 3). Scores 4a, 4b, 4c and 5 signify one, two, three or five suspicious thyroid nodule characteristics, respectively. As in previous work, images with these four scores have been used to diagnose thyroid nodules, and the scores serve as ground-truth labels.
The TDID data set is described in Table 1.

Preprocessing
Preprocessing is one of the most important steps in image processing. This stage removes the noise and anomalies present in the images and enhances their quality, which makes the later stages more effective. Here it is carried out with an ADF.

Anisotropic diffusion filter
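Anisotropic diffusion (AD) is adaptive in inhomogeneous regions, such as those around borders and those with few features, in that it does not employ hard thresholds to change its behaviour [18]. AD can improve speckled ultrasound images, although the technique may destroy some data in the process. Perona and Malik [19] suggested that an image can be smoothed with a nonlinear PDE, which in its standard form reads

$$\frac{\partial I}{\partial t} = \operatorname{div}\big(c(|\nabla I|)\,\nabla I\big),$$

where $\nabla$ is the gradient operator, which recognizes the image's edges. Furthermore, the authors suggested two diffusion coefficients:

$$c(|\nabla I|) = \exp\!\left[-\left(\frac{|\nabla I|}{k}\right)^{2}\right], \qquad c(|\nabla I|) = \frac{1}{1 + \left(\frac{|\nabla I|}{k}\right)^{2}},$$

where $k$ represents the edge magnitude. For gradients much smaller than $k$ the coefficient approaches 1, and the diffusion then behaves like Gaussian filtering. A discrete form is given by

$$I_s^{t+\Delta t} = I_s^{t} + \frac{\Delta t}{|\eta_s|} \sum_{q \in \eta_s} c\big(\nabla I_{s,q}^{t}\big)\, \nabla I_{s,q}^{t},$$

where $I_s^{t}$ is the discretely sampled image, $s$ denotes the pixel position in a discrete 2D grid, $\Delta t$ denotes the time step size, $\eta_s$ is the spatial neighbourhood of $s$, $|\eta_s|$ is the number of pixels in the window, and

$$\nabla I_{s,q}^{t} = I_q^{t} - I_s^{t}, \quad \forall q \in \eta_s.$$

Intra-region smoothing and edge preservation are two of AD's key advantages. For images contaminated by additive noise, AD yields extraordinarily good results [18]. This approach is also favoured for its low processing complexity [20].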
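The discrete diffusion scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: it assumes a 4-connected neighbourhood, replicated image borders and the two standard Perona-Malik coefficients, and the function name, parameter names and default values (`n_iter`, `k`, `dt`) are chosen here purely for illustration.

```python
import numpy as np


def anisotropic_diffusion(image, n_iter=10, k=0.1, dt=0.8, coefficient="exp"):
    """Discrete Perona-Malik diffusion on a 2D array (4-connected neighbourhood).

    k  : edge-magnitude parameter; differences much larger than k are preserved
    dt : time step size; dt <= 1 keeps this normalized scheme stable
    """
    img = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        # Replicate the border so that no flux crosses the image boundary.
        padded = np.pad(img, 1, mode="edge")
        # Differences I_q - I_s for the four neighbours q of every pixel s.
        diffs = (
            padded[:-2, 1:-1] - img,  # north
            padded[2:, 1:-1] - img,   # south
            padded[1:-1, 2:] - img,   # east
            padded[1:-1, :-2] - img,  # west
        )
        update = np.zeros_like(img)
        for d in diffs:
            if coefficient == "exp":
                c = np.exp(-((d / k) ** 2))
            else:
                c = 1.0 / (1.0 + (d / k) ** 2)
            update += c * d
        # Average over the neighbourhood size |eta_s| = 4.
        img += (dt / len(diffs)) * update
    return img
```

With intensities scaled to [0, 1], small fluctuations (well below `k`) are averaged out, while a step edge much larger than `k` receives a coefficient close to zero and is left almost untouched.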

Segmentation
As shown in [1,6], lesion shape and border are connected to several important parameters for distinguishing benign from malignant lesions. The borders of most malignant tumours are hazy and uneven. The segmentation process is carried out with a bilateral filter.

Bilateral filter
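Bilateral filtering [21] is a low-pass filtering technique that smooths an image while preserving the quality of object edges. The effectiveness and reliability of different filters in reducing speckle are described in [22]. In its standard form, the filter is

$$h(q) = \frac{1}{k(q)} \sum_{\varepsilon \in Q(q)} f(\varepsilon)\, c(\varepsilon, q)\, s\big(f(\varepsilon), f(q)\big), \qquad k(q) = \sum_{\varepsilon \in Q(q)} c(\varepsilon, q)\, s\big(f(\varepsilon), f(q)\big),$$

where $f$ is the original image, $h(q)$ is the filtered image, $Q(q)$ denotes the neighbourhood window of $q$, $\varepsilon$ is a pixel location in that window and $k(q)$ is a normalizing term. The closeness weight $c(\varepsilon, q)$ and the similarity weight $s(f(\varepsilon), f(q))$ are, respectively, defined as

$$c(\varepsilon, q) = \exp\!\left(-\frac{\lVert \varepsilon - q \rVert^{2}}{2\sigma_c^{2}}\right), \qquad s\big(f(\varepsilon), f(q)\big) = \exp\!\left(-\frac{\lvert f(\varepsilon) - f(q) \rvert^{2}}{2\sigma_s^{2}}\right),$$

where $\sigma_s$ is the standard deviation of the Gaussian over intensity values and $\sigma_c$ is the standard deviation of the Gaussian over the window area.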
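As an illustration of the filter described above, the following brute-force NumPy sketch computes each output pixel as a normalized average of its neighbours, weighted by a spatial Gaussian (`sigma_c`) and an intensity Gaussian (`sigma_s`). It assumes the standard form of the bilateral filter with a square window and replicated borders; the function name and default values are illustrative. The sketch covers only the filtering step, since the text does not state how regions are derived from the filtered image.

```python
import numpy as np


def bilateral_filter(image, radius=2, sigma_c=2.0, sigma_s=0.1):
    """Brute-force bilateral filter for a 2D greyscale array.

    sigma_c : standard deviation of the spatial (closeness) Gaussian
    sigma_s : standard deviation of the intensity (similarity) Gaussian
    """
    f = np.asarray(image, dtype=float)
    rows, cols = f.shape
    padded = np.pad(f, radius, mode="edge")
    weighted_sum = np.zeros_like(f)
    weight_total = np.zeros_like(f)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour values f(eps) at offset (dy, dx) from every pixel q.
            shifted = padded[radius + dy:radius + dy + rows,
                             radius + dx:radius + dx + cols]
            closeness = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_c ** 2))
            similarity = np.exp(-((shifted - f) ** 2) / (2.0 * sigma_s ** 2))
            weight = closeness * similarity
            weighted_sum += weight * shifted
            weight_total += weight
    # The centre pixel always contributes weight 1, so weight_total >= 1.
    return weighted_sum / weight_total
```

Because neighbours across a strong edge receive a similarity weight close to zero, they barely contribute to the average, which is why the edge survives the smoothing.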

Feature extraction
Feature extraction is the process of gathering more detailed information about an image, such as colour, shape and texture. Features carry the significant information of an image and are divided into distinct kinds according to the attributes they describe. Texture refers to a distinctive, spatially repeated surface structure created by repeating a single element or a group of elements at different relative spatial positions. Texture, as seen in clouds or water, is a visual pattern that is not created by the presence of a single colour. It is one of the most important and most frequently used properties for recognizing regions of interest in an image [24,25], so texture feature extraction is a crucial stage of texture analysis. Texture attributes may be extracted in a variety of ways, including structural, statistical, model-based and transform-based approaches. One of the earliest and best-known methods is the Grey Level Co-occurrence Matrix (GLCM) [23]. The GLCM holds second-order statistical information about the spatial relationships between image pixels, and statistical measures can be extracted from the matrices it generates. Haralick advocated the use of a co-occurrence matrix of grey levels and derived the Haralick texture features from the GLCM by identifying 13 related statistical characteristics. The matrix always describes the relationship between two pixels at a time, the reference pixel being the first and the neighbour pixel being the second. The GLCM for four distinct grey levels is shown in Table 2. We have used a one-pixel offset here (a reference pixel and its immediate neighbour). A bigger offset can be employed when the window is large enough.
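To make the construction concrete, the sketch below builds a GLCM for an image with four grey levels and a one-pixel horizontal offset, then computes one Haralick-style statistic (contrast). The 4 x 4 image is a made-up toy example, not the content of Table 2, and the function names are illustrative.

```python
import numpy as np


def glcm(image, levels, offset=(0, 1), normalize=False):
    """Count co-occurrences of (reference, neighbour) grey levels.

    offset is (row step, column step); (0, 1) means the neighbour is the
    pixel immediately to the right of the reference pixel.
    """
    img = np.asarray(image)
    dy, dx = offset
    rows, cols = img.shape
    matrix = np.zeros((levels, levels), dtype=float)
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[img[y, x], img[ny, nx]] += 1
    if normalize:
        matrix /= matrix.sum()
    return matrix


def contrast(p):
    """Contrast of a normalized GLCM: sum of p(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


toy = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
counts = glcm(toy, levels=4)
# counts is [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
```

Entry (i, j) counts how often a pixel with grey level i has a right-hand neighbour with grey level j; dividing by the total number of pairs (12 here) turns the counts into probabilities from which texture statistics are computed.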

Feature selection
The complexity of an image's content produces a very large number of features. As we have already mentioned, not all of an image's features are helpful in solving a particular problem, and even those that are can be redundant. To address this, we applied an optimization strategy that keeps only the most important features. This strategy, which is based on Multi-Objective Particle Swarm Optimization with Random Forest (MbPSRA), has three stages: estimating the relevance of the features, assessing redundancy and optimizing.
Measuring feature relevance: The relevance of a feature is its effectiveness for classification. To calculate the relevance value of each feature we used random forests (RFs), which indicate the features that are most effective during classification. An RF is a set of binary decision trees; it is generally more effective than a single decision tree but has the drawback of being harder to interpret. Each feature is represented in the RF by decision-tree nodes.
Measuring redundancy: Redundancy and feature correlation are related. The more closely features are correlated, the more redundantly they represent the same information. To translate this concept, the dissimilarity matrix of the input entities is computed using the following correlation coefficient:

$$\rho(x, y) = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}},$$

where $x$ and $y$ are two features, var() stands for a feature's variance and cov() stands for the covariance between two features. A complete graph is used to model the problem, with the nodes carrying the relevance value of each entity and the edges carrying the dissimilarity values.
Optimization: From the dissimilarity matrix we build a complete, weighted graph G, where the nodes represent the features and the edge weights reflect the dissimilarity between features. This allows us to optimize the subset of retrieved features. To do this, we run a multi-objective binary PSO on G to locate an ideal subgraph that includes only the most important attributes. This optimization algorithm has been applied to the extracted features.
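The redundancy measure can be sketched as follows. The text does not state how the correlation coefficient is turned into a dissimilarity or which objectives the binary PSO evaluates, so this sketch makes two explicit assumptions: dissimilarity is taken as 1 - |ρ|, and a candidate subset (a binary mask over features) is scored by its total relevance and its mean pairwise dissimilarity. The function names and the toy data are hypothetical, and features are assumed to be non-constant so that every variance is non-zero.

```python
import numpy as np


def correlation_matrix(features):
    """Pearson correlation between feature columns: cov(x, y) / sqrt(var(x) var(y))."""
    x = np.asarray(features, dtype=float)
    centred = x - x.mean(axis=0)
    cov = centred.T @ centred / (x.shape[0] - 1)
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)


def dissimilarity_matrix(features):
    """Assumed dissimilarity: 1 - |rho| (0 for perfectly correlated features)."""
    return 1.0 - np.abs(correlation_matrix(features))


def subset_objectives(mask, relevance, dissimilarity):
    """Two objectives for a binary feature mask (an assumed formulation):
    total relevance and mean pairwise dissimilarity of the selected features."""
    idx = np.flatnonzero(mask)
    total_relevance = float(np.sum(np.asarray(relevance, dtype=float)[idx]))
    if idx.size < 2:
        return total_relevance, 0.0
    sub = dissimilarity[np.ix_(idx, idx)]
    pairs = np.triu_indices(idx.size, k=1)
    return total_relevance, float(sub[pairs].mean())


# Toy data: feature 1 is a linear function of feature 0; feature 2 is uncorrelated.
samples = np.array([[1.0, 3.0, 1.0],
                    [2.0, 5.0, -1.0],
                    [3.0, 7.0, -1.0],
                    [4.0, 9.0, 1.0]])
d = dissimilarity_matrix(samples)
```

In a binary PSO, each particle would hold such a mask, and the swarm would search for masks that keep relevance high while avoiding pairs of highly correlated features.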

Classification
The downsampling path on the left (Figure 5) extracts image features; the upsampling path on the right recovers the image size, improves segmentation accuracy and allows details to be reconstructed through a series of successive transposed convolution operations; and the middle consists of several convolution operations. This structure serves as a link between the encoder and decoder feature maps, bridging the semantic gap. Figure 3 shows the residual U-Net architecture. A shallow convolutional structure is unable to properly capture the image's complex structure, whereas a deep, redundant stack of convolutions leads to vanishing and exploding gradients. Convolutional blocks are used in each layer of the Dense-ResUNet model to extract features. To speed up model convergence (Figure 4), a BatchNorm (BN) layer is added to each of the two convolutional units. Each convolutional layer produces feature maps, as shown in Figure 5.
When the convolution kernel scans a pixel of an image, it uses the detailed content information in the neighbourhood to build semantic image features, and the activation function (ReLU) then characterizes the image features in terms of content and space. In the corresponding expression, Xjl+1 is the feature map of the (l + 1)th layer, Xil denotes the input feature, t denotes the offset term, f represents the activation function (rectified linear unit, ReLU), ij indexes the collection of input feature maps and k denotes the convolution kernel. By reducing the size of the feature maps, the pooling layer passes on high-level information and semantics; in the pooling expression, * refers to the pooling operation of the convolutional structure. Finally, the fully connected layer acts as the prediction layer and completes the image classification task using the maximum likelihood function.
Because of its superior learning capabilities, the U-Net model introduced by Ronneberger et al. (2015) works very well for analysing medical images with small sample sizes and difficult modalities. In Equation (15), s stands for the stride, w for the convolution kernel and X for the input. The encoder's feature maps pass through a dense convolution block. The number of convolution layers in the block is determined by the pyramid level. We suppose that Xld,lc is a model node, with ld referring to the encoder's downsampling layer and lc referring to the dense block's convolution layer along the skip connection. We define xld,lc as the output of Xld,lc; the feature maps xld,lc are then given by Equation (6),
where (•) stands for the convolution step followed by ReLU, M(•) for the upsampling step and [•] for the concatenation step. The pooling layer merges the data it receives; in this work we used max pooling. Equations (7) and (8) are used to compute the output's height and width [26,27]. The padding is denoted by p, the dilation by d, the height by H and the width by W. The network's architecture was created in accordance with these requirements. As with U-Net, Gaussian noise was added to the input, together with histogram equalization, to make the network independent of contrast, and the input was linearly scaled to obtain a Gaussian distribution. An Adam optimizer was used to train the network, with a learning rate of 1e-4 and beta1 = 0.99. Binary cross-entropy was used as the loss function. The network was trained for 350 epochs with a batch size of 32 and early stopping.
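The output height and width of a convolution or pooling layer can be computed as sketched below. Because Equations (7) and (8) are not reproduced here, the sketch assumes the convention documented for PyTorch's `Conv2d` and `MaxPool2d` layers, with kernel size k, stride s, padding p and dilation d; the function name is illustrative.

```python
def conv_output_size(size_in, kernel, stride=1, padding=0, dilation=1):
    """Output length along one axis (height or width) of a conv/pool layer.

    Assumed convention:
        out = floor((in + 2p - d(k - 1) - 1) / s) + 1
    """
    return (size_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# A 3x3 convolution with padding 1 keeps a 256-pixel axis at 256,
# and a 2x2 max pooling with stride 2 halves it to 128.
same = conv_output_size(256, kernel=3, stride=1, padding=1)
pooled = conv_output_size(same, kernel=2, stride=2)
```

Applying the function once to H and once to W gives the feature-map size after each layer, which is useful for checking that encoder and decoder feature maps match before they are concatenated.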

Performance analysis
The model was implemented on hardware comprising a Ryzen 5/7 series CPU, 128 GB of RAM and a 1 TB HDD running Windows 10. The software comprised PyTorch, an open-source Python library for developing deep learning models, and Google Colaboratory, a hosted Google notebook environment.
The proposed model is evaluated against several models (CNN, UNet, Attention UNet, RCNN UNet and Nested UNet) using accuracy, sensitivity, specificity, recall, precision, F1-score, detection rate, TPR, FPR and computation time. Tables 3 and 4 present the overall analysis of the models in terms of accuracy, sensitivity and specificity (Table 3) and precision, recall and F1-score (Table 4). Figure 6 gives a graphical comparison of the models, in which our model performs best (accuracy: 0.95, sensitivity: 0.97, specificity: 0.98).
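For reference, most of the measures listed above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts in the example are invented for illustration and are not results from this study, and the detection rate is omitted because the text does not define it.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall or TPR
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    fpr = fp / (fp + tn)                  # equals 1 - specificity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f1": f1,
        "tpr": sensitivity,
        "fpr": fpr,
    }


# Illustrative counts only (100 malignant and 100 benign cases).
metrics = classification_metrics(tp=95, fp=10, tn=90, fn=5)
```

With these definitions, sensitivity, recall and TPR are the same quantity, and FPR and specificity computed from the same predictions always sum to 1.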
Table 5 presents the overall analysis in terms of detection rate, TPR and FPR. Figure 7 gives a graphical comparison of the various models with the proposed method, in which our model performs best (detection rate: 0.94, TPR: 0.95, FPR: 0.5). Figure 8 shows the TPR and FPR. Figure 9(a) compares the computation time of the different models, and Figure 9(b) depicts accuracy versus the range of epochs taken for execution. Figure 10 shows sample images from the TDID data set.
With a total of 16 3D image volumes, the 3DThyroid dataset was published and designed for thyroid region segmentation.
Our proposed method was used to segment the main part of the thyroid region, as seen in the top row of Figure 11. The segmentation accuracy differs significantly between the TDID and 3DThyroid data sets. This is because the TDID data set contains diseased thyroids, which have a wide range of pixel brightness in the thyroid area.
The generated images are consistent with the standardized terminology because our method synthesizes each image in accordance with its labels. In other cases, as shown in Figure 12, our method creates synthetic training examples whose textures are more in line with the texture of the image. On the other hand, radiologists' expertise in their field is very valuable for identifying diseases. The handcrafted features that we derive from standardized terminology are currently the ones most frequently employed in clinical diagnosis.

Conclusion
This paper shows the importance of thyroid nodule segmentation and classification in ultrasound images. Effective detection was achieved with a deep learning framework. The paper described the procedure of segmenting and classifying thyroid nodule data collected from the TDID data set: the images are segmented using a bilateral filter and then classified using the ResUnet model. Experimental results indicate that ResUnet performs well compared with other models. This paper should also help other researchers to explore deep learning further and to develop new integrated models for even more effective results.

Figure 2. Overall architecture of the proposed framework.

Figure 5. Features of a malignant thyroid nodule.

Figure 9. (a) Models vs computation time; (b) accuracy vs epoch range, in percentage.

Figure 10. Thyroid sample images from the TDID data set.

Figure 11. Segmented result of the thyroid nodule.

Figure 12. Results of synthetic thyroid nodule images for each class: (a) benign and (b) malignant.

Table 2. Illustration of grey levels using GLCM.

Table 3. Overall analysis under accuracy, sensitivity and specificity.

Table 4. Overall analysis under precision, recall and F1-score.

Table 5. Overall analysis under detection rate, TPR and FPR.