Saliency Detection Using a Bio-inspired Spiking Neural Network Driven by Local and Global Saliency

ABSTRACT The detection of the most salient parts of images as objects in salient object detection tasks mimics human behavior, which is useful for a variety of computer vision applications. In this paper, we propose the Local and Global Saliency Driven Dual-Channel Pulse Coupled Neural Network (LGSD-DCPCNN) model as a novel strategy for saliency detection. To obtain visually homogeneous regions and reduce computational cost, the input image is first subjected to superpixel segmentation. The global and local saliency maps are then created using the segmented image's position, color, and textural properties. The LGSD-DCPCNN network is activated using these saliency maps to extract visually consistent features from the input maps and produce the final saliency map. An extensive qualitative and quantitative performance study is undertaken to assess the efficacy of the proposed method. When compared to state-of-the-art approaches, the experimental results show a considerable improvement in the detection of salient regions. Quantitative analysis of the proposed method reveals a significant improvement in the area under the ROC curve (AUC) score, F-measure score, and mean absolute error (MAE) score. The qualitative analysis demonstrates the proposed algorithm's ability to detect multiple salient objects accurately while maintaining significant border preservation.


Introduction
Salient object detection techniques mirror human behavior and identify the most prominent regions of images or scenes as objects. They are used in numerous essential applications in the computer vision area. The human visual system (HVS) has an extraordinary capacity to swiftly notice and emphasize the things or areas in images that are more distinct and conspicuous in appearance. The aim of salient object detection (Shi et al. 2015) is to locate the most distinguishable targets in an image and then segment them from the rest of the image. Salient object detection, unlike image segmentation tasks, focuses on a small number of objects of interest. Because of this valuable characteristic, salient object detection is commonly used as a preliminary step in various applications such as image compression, object recognition, quality assessment, video summarization, image retrieval, object tracking, and image segmentation (Borji et al. 2019; Cong et al. 2018).
Many techniques for recognizing salient objects have emerged in recent years, which mainly depend on two approaches to find saliency: bottom-up saliency models and top-down saliency models. Computer vision researchers typically use the bottom-up saliency model to recreate the process of the human gaze. Bottom-up approaches mainly deal with handcrafted low-level features and are generally data-driven. Bottom-up saliency methods use self-information, histograms, region-based features, locally measured dissimilarities, information-content weighting, and frequency-refined approaches to compute the saliency (Borji et al. 2019; Cong et al. 2018; Duan et al. 2016; Zhang, Wang, and Lv 2016). Zhang et al. (2017; 2018b) used graph-based approaches to find the saliency, which improved saliency detection results with low-level features, an objectness map, and a compactness map. (Lu et al. 2019) and (Qi et al. 2015) used multiple graph-based manifold ranking to detect salient objects. (Wang et al. 2021) used foreground and background seed selection models with a graph-based extended random walk model to generate the saliency, in which the background and foreground seeds are produced using the convex hull approach and boundary prior knowledge, respectively. All these methods mainly use graph-based saliency models that detect salient objects by considering color contrast features. Objects having a color similar to the background therefore cannot be detected by these methods; moreover, textures and edges are not preserved. Bottom-up approaches are simple to implement and computationally efficient; however, their performance is limited for low-contrast and complex patterned images. Top-down saliency detection algorithms, on the contrary, are task-driven and based on task-specific high-level features using convolutional and deep neural networks (Ji et al. 2021).
Utilization of such approaches yields improved performance in this area, but at the cost of large data requirements and substantial computing needs. Also, while the results are quantitatively good, they fall short of preserving the complete boundaries and edges of the objects. As a result, SOD employing handcrafted features still plays an important role for applications where data availability is limited and low computational complexity is preferred along with better preservation of details. Various machine learning (ML)-based bottom-up saliency detection techniques have been introduced in recent years. (Pang et al. 2020a) used a bagging-based distributed learning approach for saliency detection, which selects training samples based on center prior and background prior information. (Lei et al. 2016) used a Bayesian decision framework to refine a rough primary saliency map extracted using other existing techniques. (Tong et al. 2015a) employed the bootstrap learning approach to build a powerful classifier that can distinguish between prominent and background objects. The local saliency detection of the proposed method is also based on an ML-based bottom-up salient object detection approach.
Many recent approaches (Wang and Peng 2021a; Wang et al. 2021) use saliency map integration to produce the final saliency map. Along with properly detecting saliency, it is very important to fuse the different saliency maps well to achieve high performance in a saliency detection task. However, a majority of these approaches (Shariatmadar and Faez 2019; Tong et al. 2015a) use simple pixel-wise addition or multiplication of the global and local saliency maps to generate the final saliency maps. Such approaches overlook the intensity variations of neighboring pixels and may result in edge blurring or artifacts around object boundaries. Some methods use weighted average-based integration of global and local saliency maps, but the weights are mostly selected by trial and error. Meta-heuristic optimization approaches have also been suggested, but they suffer from increased computational cost. To overcome these limitations, the proposed method uses pulse-coupled neural network (PCNN)-based saliency map fusion to provide perceptually appealing results. Recently, PCNN-based approaches have also been explored in the area of saliency detection, wherein the pixel intensities are used to activate the PCNN neurons (Wang and Shang 2020). These approaches have shown significant performance with improved preservation of object boundaries. Some visual saliency-driven PCNN models have also been presented for image segmentation (Z. Yang, Ma, Lian, Guo et al. 2018) and fusion tasks (Yang and Li 2014). Most of the methods discussed above use the PCNN model to directly generate saliency maps without differentiating between the local and global salient features of the input image, while the proposed method uses a dual-channel pulse-coupled neural network (DCPCNN) (Chai, Li, and Qu 2010) to fuse the local and global saliency maps while preserving the perceptual quality of the image.
Considering the discussed limitations of the extant work on salient object detection, the proposed method uses pixel-related superpixel segmentation based on the Gaussian mixture model, which has not been used earlier in a saliency detection task. Also, the combination of various features with the center and objectness priors in generating the global saliency map is novel in the field of saliency detection. Many graph-related saliency techniques (Zhang et al. 2017; Zhang et al. 2018) use only color contrast features to obtain the saliency map, which fails when the contrast between objects and background is similar; these methods also cannot preserve the boundaries of the objects. The local saliency map generation extracts the K-nearest foreground and background superpixels based on the global saliency map and detects saliency using random forest regression. The novelty of the proposed work compared to existing saliency detection techniques is twofold: both the baseline global saliency map and the local saliency map use texture-based features that precisely preserve object boundaries, and the DCPCNN model is used for the first time in a saliency detection task to merge the global and local saliency maps, which helps preserve the visually prominent features of both maps.
The comparison of the proposed approach with the center prior-based GR method and the boundary prior-based MR method is shown in Figure 1. Center prior and boundary prior-based methods generally fail to detect objects that lie away from the image's center or touch the boundary, while the proposed LGSD-DCPCNN can detect objects at the center and at the boundary to a great extent. So the major goal of the proposed work is to combine the image's local detail with global information to provide boundary-preserved and perceptually appealing final saliency maps. Most saliency detection methods take into account only saliency attributes based on the fundamental characteristics of an image, while the overall significance is typically overlooked. The global saliency map represents global information based on low-level features; it recognizes the entire salient object more precisely and with greater integrity. The objectness prior and the center prior used in generating the global saliency map help to detect more than one object accurately but also introduce a bias. On the other hand, the local saliency map takes into account the local spatial variations of an object and helps to detect more precisely those salient objects that are not correctly identified by the global saliency map. So integrating the global and local saliency maps achieves higher performance in saliency detection. In the proposed approach, we use a DCPCNN to process both the global and local information of the source images. The DCPCNN is an extended model of the original PCNN that can process two inputs simultaneously (Chai, Li, and Qu 2010). Processing the global and local salient features together helps retain the most meaningful visual features of the input images and can provide better results. Recently, many researchers have been working on video salient object detection (Xu et al. 2019; Xu et al. 2020), which uses traditional graph-based models to detect salient objects in videos. These methods fail to detect complete salient objects, a limitation that can be overcome by extending the proposed work to the video saliency detection task, as the proposed method is computationally efficient.
Taking into account the earlier discussion, in the presented work the LGSD-DCPCNN model is proposed to merge the image's local details with its global features. The input image is first subjected to superpixel segmentation to obtain visually consistent regions for further processing; working on a smaller number of superpixels reduces the computational complexity of the proposed approach. Then, global and local saliency maps of the superpixel segmented image are generated, and the pixels of the global and local saliency maps are used to activate the neurons of the visual cortex-inspired DCPCNN model. The LGSD-DCPCNN model extracts the prominent visual features of the global and local saliency maps and gives more informative and perceptually appealing resultant saliency maps. The following are the major contributions of the proposed work:
• Unlike existing saliency map integration models, a novel approach for integrating global and local saliency maps using a visual cortex-inspired dual-channel pulse coupled neural network is proposed to generate visually consistent saliency maps with improved boundary preservation.
• To provide computationally efficient performance, a pixel-related Gaussian mixture model (GMM)-based superpixel segmentation is used as an initial step.
• The global and local saliency maps are produced to provide efficient demarcation between foreground and background regions.
• Detailed experiments on four widely known public datasets demonstrate that the proposed method outperforms the most recent unsupervised handcrafted feature-based salient object detection algorithms, detecting more than one salient object accurately while preserving the boundaries and edges of salient objects.
The rest of the paper is structured as follows: Section 2 briefly describes the methods and materials used in implementing the proposed salient object detection method. Section 3 outlines in detail the proposed method's implementation steps. Section 4 presents the experiments and discussions. The run time of the proposed method is discussed in Section 5. Finally, limitations and conclusions are presented in Sections 6 and 7, respectively.

Superpixel Segmentation
The images' visual segments are more appealing to the HVS than raw pixel values. Superpixels are groupings of pixels that are alike in color and other low-level features. A pixel-related GMM enables superpixels to propagate locally throughout an image, reducing computational complexity compared with earlier expectation maximization (EM) GMM algorithms. In salient object detection tasks, the SLIC superpixel segmentation algorithm is widely used. SLIC is effective and computationally efficient (Mu, Qi, and Li 2020), and it is excellent at detecting compact, roughly spherical regions, but it may fail to separate items with odd shapes, such as elongated ones. Pixel-related GMM-based superpixel segmentation (Ban, Liu, and Cao 2018) is also computationally efficient, provides visually consistent superpixels of regular size, and can efficiently detect superpixels with unusual shapes. For a given image I of size w × h, each image pixel is assigned an index j. Image pixels are clustered using the GMM superpixel method depending on w_s and h_s, the superpixel's allowable width and height, chosen so that w mod w_s = 0 and h mod h_s = 0. The number of superpixels, N, is computed as in Eq. 1, and a set of N superpixels, sp_1, sp_2, ..., sp_N, is created.
The label map L_b(j) for superpixel generation based on the pixel-related GMM is given by Eq. 2. Each superpixel is associated with a Gaussian distribution (Ban, Liu, and Cao 2018), defined by the probability distribution function p(z; θ_i) as given in (Ban, Liu, and Cao 2018).
Here, the term ℜ_j denotes the superpixel set belonging to pixel j, which is unique for each j-th pixel. Each superpixel i has a local distributing region around it, called the i-th distributing region. It is subject to a local limitation, meaning that each superpixel can only appear in a small area of the image. As a result, superpixel i must be present in each superpixel set ℜ_j within the i-th distributing region. The total number of elements in the superpixel set ℜ_j is assumed to be constant in this study, as detailed in (Ban, Liu, and Cao 2018).
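As a minimal sketch of the grid bookkeeping described above (the GMM refinement of Ban, Liu, and Cao (2018) itself is omitted), the following computes the superpixel count of Eq. 1 and an initial regular-grid label map, assuming w mod w_s = 0 and h mod h_s = 0 as stated in the text; the function name and grid initialization are illustrative assumptions.

```python
import numpy as np

def init_superpixels(w, h, w_s, h_s):
    """Initial regular partition that a pixel-related GMM would refine."""
    assert w % w_s == 0 and h % h_s == 0, "superpixel size must divide image size"
    n_cols, n_rows = w // w_s, h // h_s
    N = n_cols * n_rows                      # Eq. 1: N = (w / w_s) * (h / h_s)
    # Initial label map: pixel (x, y) belongs to the grid cell that contains it.
    ys, xs = np.mgrid[0:h, 0:w]
    L_b = (ys // h_s) * n_cols + (xs // w_s)
    return N, L_b

N, L_b = init_superpixels(w=320, h=240, w_s=16, h_s=16)
print(N)   # 20 * 15 = 300 superpixels
```

Each of the N grid cells then seeds one Gaussian component, and the GMM refinement reassigns boundary pixels between neighboring components.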

Center Prior
The background is more likely to be found toward the image's edges, as demonstrated in (Wang and Peng 2021a); the global saliency map is constructed under this concept. When capturing images, the objects of interest (persons, cars, boxes, etc.) are mainly placed at the center of the frame, and humans observing a scene likewise tend to focus on its central portion. This tendency of important items to occupy the image's central location is the center prior. The center prior is generally formulated using a Gaussian kernel:

CP(p) = exp( -( (x - X_0)^2 / (2 σ_x^2) + (y - Y_0)^2 / (2 σ_y^2) ) )

where (x, y) are the coordinates of pixel p and (X_0, Y_0) is the center coordinate of the image. σ_x^2 and σ_y^2 are the variances of the image in the x and y directions, respectively.
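The Gaussian center prior can be sketched as follows; the particular variance choice (σ_x = w/2, σ_y = h/2) is an assumption here, since the text leaves the exact values to the implementation.

```python
import numpy as np

def center_prior(h, w):
    """Gaussian center prior: near 1 at the image center, decaying toward borders."""
    x0, y0 = (w - 1) / 2.0, (h - 1) / 2.0       # image center (X_0, Y_0)
    sx2, sy2 = (w / 2.0) ** 2, (h / 2.0) ** 2   # assumed variances
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - x0) ** 2 / (2 * sx2) + (ys - y0) ** 2 / (2 * sy2)))

cp = center_prior(240, 320)
print(cp[120, 160])   # close to 1.0 at the center
```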

Objectness Prior
Another important consideration is the objectness prior (Alexe, Deselaers, and Ferrari 2012), which is utilized to differentiate windows containing salient objects from background windows. (Alexe, Deselaers, and Ferrari 2012) present an objectness measure that combines multiple image cues, such as edge density, color contrast, multiscale saliency, and superpixel straddling, in a Bayesian framework. The approach proposed in (Alexe, Deselaers, and Ferrari 2010) is used to obtain objectness information about a region in the form of windows, each indicating whether that particular region contains a salient object. As a result, we obtain an objectness prior map Op(m) based on how often pixel m falls into objectness windows. The objectness value S_op(s_i) of a region is its average pixel objectness,

S_op(s_i) = (1 / N_{s_i}) Σ_{m ∈ s_i} Op(m),

where N_{s_i} is the total number of pixels in the image region s_i.
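The region-level averaging can be sketched as below; the pixel objectness map Op is assumed to be precomputed (e.g. by the window sampler of Alexe et al.), and the small arrays are synthetic stand-ins.

```python
import numpy as np

def region_objectness(Op, labels):
    """Average the pixel objectness map Op over each labeled superpixel region."""
    scores = np.zeros(labels.max() + 1)
    for i in range(labels.max() + 1):
        mask = labels == i
        scores[i] = Op[mask].mean()   # S_op(s_i) = (1/N_si) * sum of Op over s_i
    return scores

labels = np.array([[0, 0, 1], [0, 1, 1]])
Op = np.array([[0.9, 0.3, 0.6], [0.6, 0.2, 0.4]])
print(region_objectness(Op, labels))   # [0.6, 0.4]
```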

Random Forest Regression
To tackle regression problems, a random forest is an ensemble technique that employs many decision trees together with bootstrap aggregation (bagging). Instead of relying on a single decision tree to determine the outcome, the basic idea is to merge the decisions of multiple trees. Although every individual decision tree has high variance, the variance of their combination is low, since each tree is trained on a different bootstrap sample of the data and the outcome is based on many trees rather than one. In random forest regression, every decision tree is additionally trained on only a few randomly drawn features, and the final output is the average of the outputs of all the trees. Random forest regression is beneficial for very high-dimensional data and gives good performance in salient object detection tasks.
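A small, self-contained illustration of random forest regression with scikit-learn is given below; the feature vectors and targets are synthetic stand-ins, not the paper's actual superpixel features, and the tree count and depth here are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 40))        # 200 samples, 40-D feature vectors
y = X[:, :5].sum(axis=1)         # target depends only on the first 5 features

# Each tree is trained on a bootstrap sample with random feature subsets at
# each split; the forest output is the average of the tree predictions.
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0)
rf.fit(X, y)
pred = rf.predict(X[:5])
print(pred.shape)   # (5,)
```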

Dual-channel Pulse Coupled Neural Network
The PCNN model is a bio-inspired spiking neural network that mimics the neuronal assemblies of the mammalian visual cortex. It is a single-layer, two-dimensional, feed-forward network wherein the neurons are connected laterally (Wang et al. 2016b). Under the impression of an input image, the PCNN neurons respond sharply to various features such as position, orientation, and direction. Further, the responses of neurons within and across neighboring cortical columns are synchronized to generate the final neuronal activity. This feature-linking phenomenon produces coherent spiking of the neurons, enabling the PCNN model to extract visually consistent image features. The DCPCNN is an extended model of the original PCNN that can process two inputs simultaneously (Chai, Li, and Qu 2010). In the mathematical formulation of the DCPCNN model, E^1_{x,y} and E^2_{x,y} denote the input feeding channels, and S^1_{x,y} and S^2_{x,y} denote the external stimuli to the DCPCNN model, which can be either pixel intensities or pixel activities of the input signals. L_{x,y}, U_{x,y}, T_{x,y}, and Y_{x,y} represent the linking input, internal activity, variable threshold, and external activity of the neurons, respectively. The surrounding activities of the neurons are weighted by the synaptic weight matrix W_{x,y} of window size w. n refers to the number of iterations. α_L and α_T denote the decay time constants, and the linking parameters of the two channels are represented by β_1 and β_2. V_T and V_L denote the threshold voltage and linking voltage, respectively. All the free parameters of the DCPCNN model generally depend on the nature of the texture of the image.
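The iteration scheme below sketches a simplified DCPCNN in NumPy, consistent with the symbols just defined. The exact update equations of Chai, Li, and Qu (2010) may differ in detail; the 3×3 weight window, the linking/threshold updates, and the default parameter values (mirroring those used later in the paper) are assumptions for illustration.

```python
import numpy as np

def neighbor_sum(Y):
    """3x3 neighborhood sum with zero padding, standing in for W * Y."""
    P = np.pad(Y, 1)
    h, w = Y.shape
    return sum(P[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1)) - Y

def dcpcnn(S1, S2, n=50, aL=1.0, aT=0.2, b1=3.0, b2=3.0, VL=1.0, VT=10.0):
    """Simplified dual-channel PCNN iteration over two stimuli S1, S2."""
    L = np.zeros_like(S1)          # linking input
    T = np.ones_like(S1)           # dynamic threshold
    Y = np.zeros_like(S1)          # firing map
    for _ in range(n):
        L = np.exp(-aL) * L + VL * neighbor_sum(Y)   # linking decays, neighbors excite
        U1 = S1 * (1 + b1 * L)                       # internal activity, channel 1
        U2 = S2 * (1 + b2 * L)                       # internal activity, channel 2
        U = np.maximum(U1, U2)
        Y = (U > T).astype(S1.dtype)                 # neuron fires when U exceeds T
        T = np.exp(-aT) * T + VT * Y                 # threshold decays, recharges on firing
    return U1, U2, Y

S1 = np.random.default_rng(0).random((8, 8))
S2 = np.random.default_rng(1).random((8, 8))
U1, U2, Y = dcpcnn(S1, S2, n=10)
```

The synchronized firing of neighboring neurons with similar stimuli is what lets the network group visually consistent regions.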

Proposed Method
The workflow of the proposed method based on superpixel segmentation and LGSD-DCPCNN is shown in Figure 2. The step-wise implementation information of the proposed method is given below.
Step 1: Superpixel segmentation In the proposed method, the input image is first converted to the CIELAB color space and segmented into superpixels as indicated in Eq. 6. Here, the number of superpixels N is set to 500.
The labels L_b of the superpixel segmented image are obtained using pixel-related GMM-based superpixel segmentation (Ban, Liu, and Cao 2018) via Eq. 2.
Step 2: Features used for global saliency map generation In the proposed approach, the global saliency map is constructed using data-driven 81-dimensional low-level features listed in Table 1. For every superpixel region sp_i, an 81-dimensional feature vector is constructed that includes location, color, histogram, and textural features.
Step 3: Center prior used for global saliency map generation The center prior of each superpixel is computed using the Gaussian kernel of Eq. 7, where (X̄, Ȳ) is the average coordinate of the superpixel sp_i and (X_0, Y_0) is the center coordinate of the image. σ_x^2 and σ_y^2 are the variances of the image in the x and y directions, respectively.
Step 4: Objectness prior used for global saliency map generation The superpixel region's average level of objectness is given by Eq. 8, where m indexes the pixels contained in the i-th superpixel and N_{sp_i} is the total number of pixels in the i-th superpixel.
Step 5: Global saliency map In the proposed approach, the global contrast of each image region with the boundary image regions is obtained, and the discrepancy between them gives the global saliency map. The contrast of a region with the boundary superpixel regions is measured as the Euclidean distance between the 81-dimensional feature vectors of the i-th superpixel region and the boundary superpixel regions. Euclidean distance is used because the feature vectors in the proposed work are not sparse and it is computationally efficient; due to the non-binary feature data, the Euclidean distance performs better than the L1 distance measure for the proposed work. Smap^G gives the global saliency map of the superpixel segmented image by combining the center prior, the objectness prior, and the feature-vector differences of each superpixel from the boundary superpixels. In the proposed method, Eq. 9 yields global saliency maps that, to a great extent, retain even objects touching the image boundary, although the boundary prior treats the boundary as background.
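The structure of this step can be sketched as below. The multiplicative combination of the boundary feature distance with the center and objectness priors is an assumption about the form of Eq. 9; the function name and the mean-distance aggregation are likewise illustrative.

```python
import numpy as np

def global_saliency(F, boundary_idx, cp, s_op):
    """F: (N, 81) superpixel feature matrix; boundary_idx: boundary superpixel
    indices; cp, s_op: per-superpixel center prior and objectness prior."""
    B = F[boundary_idx]                                    # (Nb, 81)
    # mean Euclidean distance of each superpixel to the boundary superpixels
    d = np.linalg.norm(F[:, None, :] - B[None, :, :], axis=2).mean(axis=1)
    sal = d * cp * s_op                                    # assumed combination
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # scale to [0, 1]

F = np.random.default_rng(0).random((6, 81))
sal = global_saliency(F, boundary_idx=np.array([0, 5]),
                      cp=np.ones(6), s_op=np.ones(6))
```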
Step 6: Features used for generating local saliency map The local saliency map is constructed using the spatial, color, and textural differences of each superpixel from its neighboring foreground and background superpixels. In the proposed method, the global saliency map is taken as the initial saliency map to find the K nearest background and K nearest foreground superpixels, using the Euclidean distance to the superpixel regions of the 2 × mean thresholded global saliency map. The features used for local saliency map generation are listed in Table 2. Salient parts that are not recognized during global saliency map generation are recognized by this local saliency-based method using spatial and color contrast features, with the global saliency map as the initial saliency map. For every i-th superpixel sp_i, we first obtain its K-nearest foreground superpixels {sp_fi^1, sp_fi^2, ..., sp_fi^K} and K-nearest background superpixels {sp_bi^1, sp_bi^2, ..., sp_bi^K}. The value of K is set to 20, as this gave the best F-measure value.

Table 2. Features used in local saliency map generation.
Superpixel's spatial distance from K-nearest foreground superpixels: dimension K
Superpixel's spatial distance from K-nearest background superpixels: dimension K
Superpixel's color distance from K-nearest foreground superpixels: dimension 8K
Superpixel's color distance from K-nearest background superpixels: dimension 8K
Superpixel's texture distance from K-nearest foreground superpixels: dimension 10K
Superpixel's texture distance from K-nearest background superpixels: dimension 10K

The Euclidean spatial-distance feature vectors of the i-th superpixel from its K-nearest foreground and background superpixels, d_fi ∈ R^(K×1) and d_bi ∈ R^(K×1), are computed by Eq. 10.
Here, l_i indicates the location of the i-th superpixel region, and l_fi and l_bi indicate the locations of the K-nearest foreground and background superpixels, respectively, of the i-th superpixel. The color contrast feature vectors of the i-th superpixel region from its K-nearest foreground and background superpixels, c_fi ∈ R^(8K×1) and c_bi ∈ R^(8K×1), are computed by Eq. 11. Eight color channels, i.e. hue, saturation, CIELAB, and RGB, are utilized to form the color contrast feature vector. c_i, c_fi, and c_bi are the 8-dimensional color vectors of the i-th superpixel and of its K-nearest foreground and background superpixels, respectively. The distance vectors d(c_i, c_fi^n) and d(c_i, c_bi^n) indicate the Euclidean distance between the color attributes of the i-th and n-th superpixels, where n ∈ {1, 2, ..., K}, i.e. the K-nearest foreground and background superpixels. The textural distance feature vectors of the i-th superpixel from its K-nearest foreground and background superpixels, d_tfi ∈ R^(10K×1) and d_tbi ∈ R^(10K×1), are computed by Eq. 12.
Here, t(·) denotes the textural attributes of the superpixels, such as gradient mean, gradient direction, and histogram of gradients.

Parameters: number of superpixels (N); K nearest background and foreground superpixels (K); DCPCNN parameters: number of iterations (n), synaptic weight matrix (W_{x,y}) of window size w, decay parameters α_L and α_T, linking parameters of the two channels β_1 and β_2, linking voltage V_L, threshold voltage V_T.
(1) Consider the input image I(x, y). First convert I(x, y) from RGB to the CIELAB color space.
(2) Segment the image into N superpixels as given in Eq. 6, using the label map L_b generated by the pixel-related GMM-based superpixel segmentation algorithm as in Eq. 2.
(3) Obtain the color, spatial-distance, and texture-based 81-dimensional features for global saliency map generation as in Table 1.
(4) Compute the center prior and objectness prior of the superpixel segmented image using Eq. 7 and Eq. 8, respectively.
(5) Compute the global saliency map Smap^G_{x,y} using the difference of the 81-dimensional features from the boundary superpixels, the center prior, and the objectness prior, as per Eq. 9.
(6) Considering Smap^G_{x,y} as the initial saliency map, obtain the K-nearest foreground and background superpixels for each superpixel in the image.
(7) For generating the local saliency map, obtain the spatial-distance, color-difference, and textural-difference features of each superpixel from its K nearest background and foreground superpixels using Eq. 10, Eq. 11, and Eq. 12, respectively.
(8) Compute the local saliency map Smap^L_{x,y} using random forest regression with the features given in Table 2.
(9) Feed the two input channels of the DCPCNN with the pixel intensities of Smap^G_{x,y} and Smap^L_{x,y} and obtain the firing map Y_{x,y} and the internal activities U^1_{x,y} and U^2_{x,y} of the two channels of the DCPCNN using Eq. 13.
(10) Generate the final saliency map Smap^F_{x,y} by applying the choose-max rule to the internal activities U^1_{x,y} and U^2_{x,y} using Eq. 14.
Step 7: Local saliency map The local saliency map is obtained from the above-mentioned feature vectors using the random forest regression (Breiman 2001) algorithm, as it is very effective for high-dimensional feature vectors. For training the random forest, 3000 images of the MSRA-B dataset (Liu et al. 2010) are used, with their annotated ground-truth images as labels. The random forest uses 200 trees with a maximum tree depth of 10 (Becker et al. 2013). The random forest outcome decides whether a superpixel belongs to the background or foreground region, and a corresponding saliency map is generated for the particular image. The local saliency map Smap^L thus preserves local characteristics of the objects that might not be captured by the global saliency map.
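The Eq. 10-12 style distance features that feed this regressor can be sketched as below for a single superpixel. The simple 2-D locations, 8-D color vectors, and per-channel absolute differences are illustrative assumptions; texture distances would be formed identically from the t(·) attributes, and all names are hypothetical.

```python
import numpy as np

def knn_distance_features(i, locs, colors, fg_idx, bg_idx, K=20):
    """Distance features of superpixel i from its K-nearest fg/bg superpixels."""
    df_all = np.linalg.norm(locs[fg_idx] - locs[i], axis=1)
    db_all = np.linalg.norm(locs[bg_idx] - locs[i], axis=1)
    d_f, d_b = np.sort(df_all)[:K], np.sort(db_all)[:K]   # Eq. 10 style: K + K dims
    # indices of the K spatially nearest foreground / background superpixels
    kf = fg_idx[np.argsort(df_all)[:K]]
    kb = bg_idx[np.argsort(db_all)[:K]]
    c_f = np.abs(colors[kf] - colors[i]).ravel()          # Eq. 11 style: 8K dims
    c_b = np.abs(colors[kb] - colors[i]).ravel()          # 8K dims
    return np.concatenate([d_f, d_b, c_f, c_b])

locs = np.array([[0., 0.], [1., 0.], [0., 1.], [2., 2.], [3., 3.], [4., 4.]])
colors = np.random.default_rng(0).random((6, 8))          # 8 color channels
feat = knn_distance_features(0, locs, colors,
                             fg_idx=np.array([1, 2, 3]),
                             bg_idx=np.array([4, 5]), K=2)
print(feat.shape)   # (36,): 2K spatial + 2 * 8K color distances with K = 2
```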
Step 8: Global and local saliency map integration using DCPCNN The Smap^G and Smap^L obtained in Step 5 and Step 7, respectively, are fed to the two channels of the DCPCNN model of Eq. 13. The DCPCNN parameters used are w_dc = 5 × 5, n = 200, α_L = 1, α_T = 0.2, β_1 = 3, β_2 = 3, V_L = 1, and V_T = 10. The internal activities of a neuron for the inputs E^1_{x,y} and E^2_{x,y} are denoted by U^1_{x,y} and U^2_{x,y}, respectively. Based on the internal activities of the two channels, the final saliency map Smap^F_{x,y} is obtained by the choose-max rule of Eq. 14.
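The choose-max fusion rule can be sketched as follows: at each pixel, the final saliency takes the value from whichever channel has the larger internal activity after the DCPCNN iterations. The example maps below are synthetic.

```python
import numpy as np

def fuse_choose_max(smap_g, smap_l, U1, U2):
    """Choose-max rule: pick the channel with the larger internal activity."""
    return np.where(U1 >= U2, smap_g, smap_l)

smap_g = np.array([[0.9, 0.1], [0.4, 0.8]])
smap_l = np.array([[0.2, 0.7], [0.5, 0.3]])
U1 = np.array([[1.0, 0.2], [0.6, 0.9]])   # channel-1 internal activity
U2 = np.array([[0.5, 0.8], [0.7, 0.1]])   # channel-2 internal activity
fused = fuse_choose_max(smap_g, smap_l, U1, U2)
print(fused)
# [[0.9 0.7]
#  [0.5 0.8]]
```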

Experiments and Discussions
This section presents a detailed qualitative and quantitative performance comparison of the proposed LGSD-DCPCNN method with 17 saliency detection methods.

Datasets for Salient Object Detection
Four salient object detection datasets mentioned in Table 3 are used for the performance assessment of the proposed method. The datasets contain images with a complex and cluttered background, multiple objects, and low contrast. The performance is evaluated over the entire dataset to show that the proposed LGSD-DCPCNN method is capable of performing consistently and reliably over a diverse set of images.

Evaluation Metrics
The proposed method's performance is assessed using the evaluation measures tabulated in Table 4: the precision-recall (PR) curve, the mean absolute error (MAE) score, the F-measure score, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC) score. F-measure, recall, and precision are widely used for evaluating the overall performance (Achanta et al. 2009).

Table 3. Salient object detection datasets used for performance assessment.
1. SOD (300 images): many images include multiple prominent items with little color contrast to the background.
2. ECSSD (Shi et al. 2015) (1000 images): contains multiple salient objects and complex backgrounds, which makes the dataset more challenging for the salient object detection task.
3. DUT-OMRON (5168 images): high-quality images with multiple significant objects and relatively cluttered backgrounds.
4. HKU-IS (Li and Yu 2015) (4447 images): the majority of the images are low contrast with several salient objects.

In Table 4, Smap^FT_{x,y} and Smap^FN_{x,y} are the thresholded and normalized saliency maps, respectively, and GT_{x,y} is the ground truth of the particular image. The F-measure combines the recall and precision values into a comprehensive measure for saliency detection tasks. Evaluating these parameters requires thresholded versions of the obtained gray-level saliency maps. To obtain the segmented binary saliency maps, the threshold is varied from 0 to 255 for a saliency map whose pixel grayscale values lie in the range [0, 255]. To evaluate the PR curve, the final saliency map Smap^F_{x,y} is binarized at thresholds from 0 to 255, and recall and precision values are calculated at each threshold to plot the precision-recall curve. At each threshold, FPR and TPR values are also computed to plot the ROC curve. The ROC curve gives a 2D description of the presented model's effectiveness, while the AUC value summarizes this description into a single quantity.
The AUC value is computed as the area under the ROC curve. The true-negative assignment of saliency is not taken into account by the overlap-based performance metrics; these metrics favor approaches that assign strong saliency to prominent pixels while failing to penalize the inability to recognize non-salient regions. In some applications, such as content-aware image resizing, continuous saliency maps have more importance than thresholded binary saliency maps. In such situations, the MAE gives a comprehensive comparison between the ground truth and the saliency map. The MAE is computed between the normalized final saliency map Smap^FN_{x,y}, normalized to the range [0, 1], and the ground truth.
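The MAE and F-measure computations described above can be sketched as follows; the choice beta² = 0.3 is the convention common in salient object detection and is an assumption here, since the exact forms are defined in the paper's Table 4.

```python
import numpy as np

def mae(smap, gt):
    """Mean absolute error between a [0, 1] saliency map and the ground truth."""
    return np.abs(smap - gt).mean()

def f_measure(smap_bin, gt, beta2=0.3):
    """F-measure of a binarized saliency map against a binary ground truth."""
    tp = np.logical_and(smap_bin, gt).sum()
    precision = tp / max(smap_bin.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)

gt = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [0, 0]])
print(f_measure(pred, gt))   # precision 1.0, recall 0.5 -> 0.8125
```

Sweeping the binarization threshold from 0 to 255 and recording precision/recall (and TPR/FPR) at each step yields the PR and ROC curves described above.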

Qualitative Analysis
In the qualitative analysis, the saliency maps obtained by the proposed method and the other saliency detection methods are evaluated subjectively based on criteria such as the degree of isolation between foreground and background regions, how well the salient object region is outlined, homogeneity in highlighting different regions, detection of salient objects against complex and low-contrast backgrounds, and accurate detection of more than one salient object. Experimental results are shown in Figures 3-6 for the DUT-OMRON, ECSSD, HKUIS, and SOD datasets, respectively. It can be observed from Image 1 and Image 2 of Figures 3-6 that the CA (Goferman, Zelnik-Manor, and Tal 2011) and SEG (Rahtu et al. 2010) methods give blurred results with loss of boundary preservation. Figure 3, Image 1 and Image 2 of the DUT-OMRON dataset indicate that the GR, MR, MC (Jiang et al. 2013), RR (Li et al. 2015a), LGF (Tong et al. 2015b), DPSG (Zhou et al. 2017), NCUT (Fu et al. 2015), RCRR (Yuan et al. 2018), and SMD (Peng et al. 2016) methods are able to preserve the boundaries but are unable to detect all the objects present in an image, while the proposed method accurately detects multiple salient objects while preserving the boundaries. In Figure 4, Image 1, the proposed method detects the salient objects while preserving the fine details of the image better than all other methods except SMD (Peng et al. 2016), with which the proposed LGSD-DCPCNN gives comparable performance. Figure 4, Image 2 shows that the proposed method accurately detects the salient object compared to the other methods, even though the colors of the object and background differ little. From Figure 5, Image 1, it can be observed that the proposed method detects multiple salient objects with boundary preservation compared to the GR, MR, MC (Jiang et al. 2013), RR (Li et al. 2015a), LGF (Tong et al. 2015b), DPSG (Zhou et al. 2017), NCUT (Fu et al. 2015), RCRR (Yuan et al. 2018), and SMD (Peng et al. 2016) methods. These methods can preserve the boundaries to some extent but fail to detect multiple salient objects. The SMD (Peng et al. 2016) method can detect multiple salient objects in Figure 5, Image 2, but fails to do so for the image shown in Figure 5, Image 1, while the proposed method detects multiple salient objects for almost all kinds of images in the HKUIS dataset. Figure 6, Image 1 shows that the proposed method accurately detects the complete salient object, whereas all the other methods miss the tail portion of the object. In Figure 6, Image 2, the image contains five salient objects, and only the proposed method detects all of them accurately compared to the other methods. From the qualitative analysis, it can be inferred that the proposed LGSD-DCPCNN method outperforms the other methods on all the datasets, covering different scenarios. In comparison with other saliency methods, LGSD-DCPCNN produces a high-resolution saliency output on a variety of difficult natural images. In particular, when comparatively evaluated with other approaches, LGSD-DCPCNN provides a saliency map that evenly highlights salient regions and efficiently suppresses background regions. It effectively separates background and foreground and detects salient regions even in some complex background images. It also detects more than one salient object more accurately than the other methods.

Quantitative Analysis
Quantitative assessment is also carried out to evaluate the proposed LGSD-DCPCNN and other existing techniques. Figures 7 and 8 show comparative results of LGSD-DCPCNN and the other saliency detection methods in terms of the PR curve and the ROC curve, respectively, on the four benchmark salient object detection datasets listed in Table 3. Figure 7 shows that the LGSD-DCPCNN approach gives comparable or higher performance in terms of the PR curve on all the datasets relative to the mentioned salient object detection methods.
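For reference, a PR curve such as those in Figure 7 is obtained by sweeping a binarization threshold over the saliency map and recording precision and recall at each threshold. The following is a minimal NumPy sketch of that standard procedure; the threshold count of 256 is an illustrative choice, not a detail taken from the paper.

```python
import numpy as np

def pr_curve(sal, gt, n_thresholds=256):
    """Sweep binarization thresholds over a [0, 1] saliency map and collect
    (precision, recall) pairs, as done when plotting a PR curve."""
    gt_pos = gt > 0.5
    n_pos = max(gt_pos.sum(), 1)
    precisions, recalls = [], []
    for t in np.linspace(0.0, 1.0, n_thresholds):
        pred = sal >= t                      # binarize at this threshold
        tp = np.logical_and(pred, gt_pos).sum()
        precisions.append(tp / max(pred.sum(), 1))
        recalls.append(tp / n_pos)
    return np.array(precisions), np.array(recalls)
```

Plotting precision against recall over all thresholds yields the curve; methods whose curves sit closer to the top-right corner perform better.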
Moreover, the LGSD-DCPCNN method outperforms the other methods and gives the best performance on almost all the datasets in terms of the ROC curves shown in Figure 8. A comparative analysis based on MAE, F-measure (Fm), and AUC scores is also performed and presented in Table 5. The proposed LGSD-DCPCNN approach achieves the highest AUC score on all the datasets, which demonstrates its superiority in accurately discriminating the background and foreground areas in salient object detection tasks. Table 5 also shows that the LGSD-DCPCNN method places among the top-performing methods, i.e., at the first, second, or third position, in terms of MAE and Fm values compared to the mentioned saliency detection techniques on all the datasets. By comparing the results in Table 5, the following points can be summarized:
• The CA (Goferman, Zelnik-Manor, and Tal 2011), SEG (Rahtu et al. 2010), and GR methods give poor performance compared to the proposed method.
• The proposed method gives the highest AUC score on all the datasets compared to the other mentioned salient object detection techniques.
• On all the datasets, the proposed method achieves the second or third highest performance in terms of Fm and MAE compared to existing salient object detection methods, but its overall performance is better than that of all the mentioned techniques.
• For the DUT-OMRON dataset: the proposed method achieves an AUC value ≈ 1.46% higher than MC (Jiang et al. 2013) and 24.6% higher than the BSDL (Pang et al. 2020b) and FCB methods; an Fm value ≈ 4.9% higher than TMR (Sun et al. 2022) and 29.38% higher than the FCB and MR methods; and an MAE value ≈ 2.11% lower than BSDL (Pang et al. 2020b) and 31.7% lower than the LPS and MR methods.
• For the HKUIS dataset: the proposed method achieves an AUC value ≈ 0.96% higher than MC (Jiang et al. 2013) and 11.57% higher than the LPS method; an Fm value ≈ 1.15% higher than DPSG (Zhou et al. 2017) and 35.78% higher than the MR method; and an MAE value ≈ 0.61% lower than DPSG (Zhou et al. 2017) and 12.2% lower than the MC (Jiang et al. 2013) method.
• For the ECSSD dataset: the proposed method achieves an AUC value ≈ 0.5% higher than CDHL (F. Wang and Peng 2021b) and 17.95% higher than the FCB (G.H. Liu and Yang 2019) method; an Fm value ≈ 0.3% higher than TMR (Sun et al. 2022) and LGF (Tong et al. 2015b) and 22.2% higher than CDHL (F. Wang and Peng 2021b); and an MAE value ≈ 1.33% lower than LSP (Zhang et al., 2018) and 34.67% lower than the MC (Jiang et al. 2013) method.
• For the SOD dataset: the proposed method achieves an AUC value ≈ 0.4% higher than MC (Jiang et al. 2013) and 17.47% higher than the FCB (Liu and Yang 2019) method; an Fm value ≈ 6.56% higher than SMD (Peng et al. 2016) and 25% higher than the FCB (Liu and Yang 2019) method; and an MAE value ≈ 2.17% lower than SMD (Peng et al. 2016) and 13.48% lower than the MC (Jiang et al. 2013) and LPS (Li et al., 2015a) methods.
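The three scores compared in Table 5 can be computed from a saliency map and its binary ground truth as in the minimal NumPy sketch below. The conventions used here (β² = 0.3 and an adaptive threshold of twice the map's mean for the F-measure) are common choices in the salient object detection literature, not details taken from this paper.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and a binary ground
    truth, both normalized to [0, 1]. Lower is better."""
    return float(np.mean(np.abs(sal - gt)))

def f_measure(sal, gt, beta2=0.3, thresh=None):
    """Weighted F-measure. The map is binarized at `thresh`
    (default: twice its mean, capped at 1, a common adaptive choice)."""
    if thresh is None:
        thresh = min(2.0 * sal.mean(), 1.0)
    pred = sal >= thresh
    gt_pos = gt > 0.5
    tp = np.logical_and(pred, gt_pos).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt_pos.sum(), 1)
    denom = beta2 * precision + recall
    return float((1 + beta2) * precision * recall / denom) if denom > 0 else 0.0

def auc_score(sal, gt):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity:
    the probability that a random foreground pixel outscores a random
    background pixel, counting ties as one half."""
    pos = sal[gt > 0.5].ravel()
    neg = sal[gt <= 0.5].ravel()
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))
```

A perfect saliency map (identical to the ground truth) yields MAE = 0, Fm = 1, and AUC = 1; the pairwise AUC is quadratic in the pixel count, so for full-resolution maps a rank-based implementation would be preferable.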

Performance Comparison with Deep-Learning-Based Techniques
Furthermore, the proposed method's overall performance is compared to deep-learning-based techniques such as KSR (Wang et al. 2016a). Table 6 gives the comparative analysis of these deep-learning-based techniques with the proposed approach based on MAE and AUC scores. It is evident from Table 6 that the LGSD-DCPCNN-based approach achieves equivalent or improved outcomes, and in particular performs better on the DUT-OMRON, SOD, and HKUIS datasets than most of the deep-learning-based approaches. In fact, despite these datasets being known for difficult images with dense backgrounds, low-contrast images, and images with multiple objects, the proposed method produces considerably superior results. The proposed approach detects objects while preserving boundaries more effectively, which is evident from Figure 9, where most of the DL-based techniques fail to produce better qualitative results. Traditional computer vision approaches can often solve problems faster and with fewer lines of code than deep-learning (DL) algorithms, so DL is sometimes superfluous. Deep neural network (DNN) features are specific to the training dataset and, if poorly designed, are unlikely to work well for images outside the training set. Traditional computer vision techniques, on the other hand, are fully transparent, allowing one to assess whether a design will perform outside a training scenario; if anything goes wrong, the parameters can be adjusted to work for a larger range of images. One of the weaknesses of deep-learning algorithms is their poor capacity to learn visual relations or to determine whether items in an image are the same or different, which is very important in a saliency detection task.
The most recent deep-learning algorithms may achieve far higher accuracy, but at the cost of billions of additional math operations and a higher computing-power demand compared to ML-based bottom-up saliency approaches. Hand-crafted feature-based techniques therefore cannot be considered obsolete. In the field of computer vision, the integration of hand-crafted features and deep features is yielding promising results, which can be considered a future scope of the proposed work.

Feature Selection for Global Saliency Map
To further understand the significance of each of the four feature categories, i.e., the color-based, color-histogram-based, texture-based, and location-based features mentioned in Table 1, we constructed four variants, each deleting one of the feature categories, and compared the results for global saliency map generation in Figure 10. The findings show that every feature category has its own set of favorable conditions that the other three categories are unable to address. It is therefore important to consider all the features in the global saliency map generation of the proposed work.
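The leave-one-group-out procedure described above can be sketched as follows. The column ranges assigned to the four feature groups and the scoring callback are hypothetical placeholders; the paper does not specify how the 81 dimensions are partitioned.

```python
import numpy as np

# Hypothetical column indices for the four feature groups of the 81-D
# vector; the actual split used in the paper is not specified here.
FEATURE_GROUPS = {
    "color":      range(0, 30),
    "color_hist": range(30, 55),
    "texture":    range(55, 75),
    "location":   range(75, 81),
}

def ablate(features, labels, train_and_score):
    """Leave-one-group-out ablation: re-evaluate with each feature family
    removed and report the score drop relative to the full feature set.
    `train_and_score(X, y)` is any callback that trains a model on X, y
    and returns a scalar quality score (e.g. AUC)."""
    full = train_and_score(features, labels)
    drops = {}
    for name, cols in FEATURE_GROUPS.items():
        keep = [c for c in range(features.shape[1]) if c not in set(cols)]
        drops[name] = full - train_and_score(features[:, keep], labels)
    return full, drops
```

A large drop for a group indicates that the other three families cannot compensate for it, which is the pattern Figure 10 reports for all four categories.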

Run Time Analysis
This section reports how long the proposed method takes to produce a saliency map. The test is performed on a 300 × 400 image on a 64-bit PC with an i7-4770 3.40 GHz processor and 32.0 GB RAM. MATLAB 2017a is used to run all of the routines. Local and global saliency map generation takes 1.23 s and 0.345 s, respectively, and merging the global and local saliency maps using DCPCNN takes 0.732 s. Our trained random forest model is 3.8 MB in size, making it a lightweight model that is efficient to deploy on hardware for any salient object detection application. By adopting a shallow random forest, the size of the trained random forest regressor can be lowered further.

Limitations and Future Aspects
The generation of the final saliency map using DCPCNN depends mainly on the local and global saliency maps. For generating the local saliency map using an ML-based bottom-up saliency approach, the global saliency map is considered as a baseline map. So, when the global saliency map does not produce good results, the proposed method fails to provide an appropriate saliency map, as can be seen from Figure 11. For the images in Figure 11, the object and background parts are very difficult to differentiate: the baseline global saliency map can recognize a few salient objects, but many background disturbances cannot be efficiently eliminated. Even so, the proposed LGSD-DCPCNN approach can adequately reflect the contrast between the salient object and the background, and based on these observations we conclude that the LGSD-DCPCNN method performs remarkably well even when the baseline saliency map has a poor outcome. Nonetheless, good performance is difficult to achieve when the initial saliency map fails to reveal any important saliency information. This difficulty arises in most ML-based bottom-up saliency approaches, as they require initial knowledge to obtain the training data; it can be tackled using weakly supervised ML-based bottom-up saliency approaches, which may be considered as a future scope of the current work. The discussed limitation can also be mitigated by using distance metrics better suited to higher-dimensional data, such as fractional norms (Lp with p < 1), and by applying a feature reduction technique to reduce the dimension of the 81-dimensional feature vector in global saliency map generation, which can also be considered as a future scope of the proposed work.
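The fractional-norm idea mentioned above can be sketched as follows; the exponent p = 0.5 is an illustrative default, not a value taken from the paper.

```python
import numpy as np

def fractional_distance(x, y, p=0.5):
    """Minkowski-style distance with exponent p < 1. Fractional "norms"
    violate the triangle inequality, so they are not true norms, but they
    keep distances between high-dimensional points better separated than
    the Euclidean norm, which suffers from distance concentration."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))
```

With p = 2 the function reduces to the ordinary Euclidean distance, so it can be dropped into a nearest-superpixel search in place of the usual metric.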

Conclusion
This paper presents a novel salient object detection technique that integrates global and local image features using the LGSD-DCPCNN model. A pixel-related GMM-based superpixel segmentation is applied to the input image first to speed up the computations. An 81-dimensional feature vector, which contains color, statistical, and texture-based features along with objectness and center priors, is used to obtain the global saliency map, while color, spatial, and textural distance features are used to generate the local saliency map via random forest regression, which takes the global saliency map as an initial map to find the nearest background and foreground superpixels. The proposed method effectively merges the local and global information, taking into account the human-vision-consistent features of the global and local saliency maps. The use of DCPCNN accounts for neighborhood pixel variations and helps to preserve object boundaries without introducing blurring or artifacts. The outcome of substantial experiments carried out on a variety of datasets demonstrates that the suggested combination of global and local saliency maps outperforms other existing saliency detection methods in terms of AUC, F-measure, and MAE scores, and gives comparable performance to many deep-learning techniques. The qualitative analysis further shows that the proposed method detects multiple salient objects while preserving the fine details and boundaries of the objects.

Disclosure Statement
No potential conflict of interest was reported by the author(s).

Funding
The authors have no funding to report.