An improved color consistency optimization method based on a reference image contaminated by clouds

ABSTRACT Optimizing color consistency across multiple images is a crucial step in creating accurate digital orthophoto maps (DOMs). However, current color balance methods that rely on a reference image are susceptible to interference from clouds and cloud shadows, making it difficult to ensure color fidelity and a uniform color transition between images. To address these issues, we propose an improved color consistency optimization method that enhances image quality using an optimized low-resolution reference image. First, the original image is used to reconstruct areas of the reference image affected by clouds or cloud shadows; a Poisson blending algorithm performs this cloning seamlessly, minimizing color differences between the reconstructed and surrounding regions. Next, high-frequency information obtained through Gaussian and bilateral filtering is combined by a weighting scheme to smooth image boundaries and ensure color continuity between images. Finally, local linear models are constructed from the optimized reference and down-sampled images to correct the image colors. To validate the robustness of the approach, we tested it on two challenging wide-area datasets. Compared to state-of-the-art methods, our approach offers significant advantages in both quantitative indicators and visual quality.


Introduction
Digital Orthophoto Maps (DOMs) are crucial in land cover classification, change detection, and national geographic feature extraction (Deng et al. 2018; Jalal et al. 2019; Yoo and Lee 2016). Generating a DOM usually involves the seamless fusion of multiple images with extended coverage. However, remote sensing images are typically captured by different sensors at varying angles and under different imaging conditions, resulting in substantial radiometric differences across images. These color inconsistencies can severely degrade the visual quality of mosaic images, and extensive research has been devoted to overcoming this challenge and obtaining seamless mosaics.
Currently, two primary methods are employed for color correction: color transfer and color consistency optimization (Xia, Yao, and Gao 2019). The color transfer method involves two steps: first, a reference image is selected manually or automatically; then image hues are transferred from the reference image to the other images along the shortest path. The focus of this method therefore lies in selecting reference images and determining color transfer paths. Unlike color transfer, color consistency optimization adjusts colors across multiple images globally by minimizing a designed energy function.
Some advanced color transfer techniques preserve the quality of the original image, such as texture details, in addition to adjusting image colors. Su et al. (2014) improved image quality by using multi-scale detail processing and self-learning filtering to minimize the normalized Kullback-Leibler distance, achieving color fidelity, a seamless appearance of detail, and suppression of corrosive artifacts. Similarly, Wang et al. (2017) accurately transferred image tones using an L0-norm gradient-preserving algorithm and similarity-preserving color mapping to maintain image similarity and detail as much as possible. Other scholars have explored the selection of reference images for color transfer. For instance, Zhang et al. (2017) used absolute radiometric calibration images as reference images. Xie et al. (2018) selected an image subset, rather than a single image, as the reference from a weighted image graph. Pastucha et al. (2022) used image position relationships to group similar images and then selected the largest group as the reference group. Although these algorithms achieve more robust color transfer through more accurate references, their results remain vulnerable to external noise and may not produce high-quality output images. Consequently, color transfer methods based on reliable features and robust correspondences have been proposed. For example, Hwang et al. (2019) strengthened the probability modeling and spatial constraints of color transfer in 3D color space to eliminate the impact of mismatches, spatially varying illumination, and noise on the resulting image. He et al. (2019) adopted semantically meaningful dense correspondence to improve the accuracy of color transfer between images with perceptually similar semantic structures. In addition, Oskarsson (2021) proposed a feature-based method that performs robust color transfer fitting for data containing coarse outliers.
As noted above, the reference image and the transfer path are critical to color transfer algorithms. However, the principled selection of reference images remains an unresolved issue. Moreover, even the shortest transfer path cannot avoid accumulated error, which limits the application of color transfer algorithms to large-scale datasets. Color consistency optimization sidesteps both reference-image selection and error accumulation by performing color correction through energy function optimization.
Mainstream color consistency optimization methods have focused on algorithms that integrate the benefits of local and global optimization (Liu et al. 2020, 2021; Yu et al. 2017; Zhang et al. 2020). These approaches adopt a whole-to-part strategy that effectively considers color consistency at both global and local scales. However, they are susceptible to ground feature alterations and geometric deviations between images. To eliminate the negative impact of pixel changes, some methods use pseudo-invariant features (PIFs) or robust models to fit the color correspondence between images and improve overall color consistency (Li et al. 2022; Moghimi et al. 2022). Existing PIF recognition methods, such as those based on multiple rules (Moghimi et al. 2021; Xu et al. 2021) or key point descriptors (Kim and Han 2021; Moghimi et al. 2021, 2022; Varish et al. 2023), are mostly data-oriented in their design, so their applicability to other types of data still needs to be studied. Because visual quality is not fully controlled, the images generated by PIF-based color correction algorithms may not be of high quality. To optimize color consistency while maintaining image quality, several approaches have been investigated. Niu et al. (2019) used a coarse-to-fine strategy and guided filtering to achieve global and local color consistency while retaining the original image's structural information. Liu et al. (2019) proposed a gradient-preservation-based color correction approach for image structure consistency that uses local mapping to correct the approximate region and combines region mapping, feature extraction, and gradient optimization to reduce color differences while protecting image structure. Furthermore, Xia et al. (2019) and Xia et al. (2017) used a parametric quadratic spline model to transform the color consistency constraint into an effective parameter expression and designed an energy function that maintains image quality while optimizing color consistency between images. To enhance image contrast and reduce color inconsistencies, Li et al. (2022) used original color information to increase image contrast, producing high-quality corrected images even when the image contrast is extremely low.
Using external color references is currently a mainstream approach for correcting the color of images that cover wide areas. For example, Zhou (2015) performed gamma correction on the image by leveraging the computed local averages of the original image and an external reference image. Yu et al. (2016) proposed a color balance approach that employs a color reference library. Cui et al. (2021) simulated the image color distribution by extracting low-frequency information from an external low-resolution reference image and achieved automatic color correction using a model that combines defogging and radiometric correction. These methods exploit the advantages of color references and avoid the "two-body" problem (Chen et al. 2014) and the color error propagation of traditional color consistency optimization approaches. However, they may fail when interfering factors, such as clouds and cloud shadows, are present in the reference image.
The studies above show that substantial progress has been made in eliminating color differences and improving image quality. However, the effectiveness of algorithms that rely on external references is vulnerable to incorrect color mapping caused by pixel variations between the reference and original images. The objective of this paper is therefore to introduce an improved color consistency optimization method based on a low-resolution reference image that is contaminated by clouds.
Our research contributions can be summarized as follows: (1) We use Poisson blending to reduce the content changes between the reference and original images caused by various factors, obtaining a more accurate reference color and thus preserving the color fidelity of ground features in the image. This provides a new way to address the incorrect color mapping that afflicts current reference-based color correction methods. (2) Through the proposed image boundary filling and weighting of image high-frequency information, the high-frequency content at image boundaries is effectively smoothed and a uniform color transition between images is achieved.
The remaining parts of this article are organized as follows: Section 2 describes the proposed algorithm in detail. Section 3 provides details on the experiments conducted. Finally, Section 4 presents the conclusions.

Proposed method
As shown in Figure 1, the proposed method consists of two steps aimed at optimizing color inconsistencies between images. The first step uses two optimization modules to enhance the quality of the low-resolution reference image and of the down-sampled image obtained from the original image. In the second step, the resulting images are generated from the optimized reference and down-sampled images. Below, we describe these two steps in detail.

Optimize the reference image
Our objective at this stage is to obtain a reference image whose high-frequency content is comparable to that of the original image. To achieve this, we superimpose the high-frequency component H_srcdown of the down-sampled image I_srcdown onto the low-frequency component L_ref of the reference image I_ref. This yields an image I_dstdown whose low frequency matches the color distribution of the reference image while its high frequency is derived from the original image. To obtain a high-quality I_dstdown, we adopt two optimization modules to process L_ref and H_srcdown, respectively. The following subsections describe the processing flow of these two modules.
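As a sketch, the frequency swap described above can be written as follows; the Gaussian scale `sigma` and the use of `scipy.ndimage` are illustrative choices, not values fixed by the paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def swap_low_frequency(src_down, ref, sigma=15.0):
    """Combine the low frequency of the reference image with the
    high frequency of the down-sampled source image.

    sigma is a hypothetical smoothing scale; the paper does not
    fix a single value here.
    """
    src_down = src_down.astype(np.float64)
    ref = ref.astype(np.float64)
    low_ref = gaussian_filter(ref, sigma)       # L_ref
    low_src = gaussian_filter(src_down, sigma)  # low-frequency part of source
    high_src = src_down - low_src               # H_srcdown
    return low_ref + high_src                   # I_dstdown
```

With a flat source and a flat reference, the output simply inherits the reference level, since the source contributes no high-frequency detail.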

Optimize the L ref based on Poisson blending
Clouds and their accompanying shadows are an unavoidable presence in low-resolution reference images (Wang et al. 2023). If left unaddressed, these disturbances contaminate the low-frequency information extracted from the reference image, which in turn degrades the quality of the resulting image. To demonstrate the impact of clouds and their shadows on color consistency optimization, reference images containing these interferences are used in this paper. The corresponding experimental results are shown in Figure 2.
Figure 2 shows that when clouds and their shadows appear in the reference image, they appear at the corresponding locations of the generated image after color consistency optimization, even if these interfering factors are absent from the corresponding regions of the original image. Careful analysis revealed that clouds and cloud shadows cause pixel changes between the reference and input images. These varying pixels produce incorrect color mappings, which in turn give the resultant image unnatural colors: for example, cloud-free regions in the resultant image exhibit cloudy tones, while cloudy regions do not.
To solve this problem, we employ Poisson blending (Fang et al. 2019; Hu et al. 2020) to reconstruct the content of the original image on the reference image, reducing the number of varying pixels and thus improving the accuracy of the color mapping. It is worth noting that the point of adopting Poisson blending is not merely to remove clouds and cloud shadows but to reduce varying pixels; the proposed method can therefore also handle inter-image content variations caused by other factors. Without this reconstruction, the color correction results exhibit discordant colors in areas where clouds are present; in particular, highlighted clouds are rendered in darker colors. With Poisson blending, there is no significant content change between the output image and the original image, which effectively improves the quality of the color correction results.
Why is Poisson blending used to minimize content changes between images? Because Poisson blending achieves seamless cloning between images: even when there are large color differences between the reference and original images, it can seamlessly reconstruct the content of the original image on the reference image. In the next section, we briefly introduce the processing flow of Poisson blending, taking as an example the reconstruction of the original image's content in the cloud- and cloud-shadow-covered regions of the reference image. The cloud and cloud shadow masks required by the algorithm are extracted manually, and a schematic diagram of the algorithm is shown in Figure 4.
Initially, we convolve the patch and target images with the convolution kernels k_1 = [0, −1, 1] and k_1^T to compute the gradient maps in the x and y directions. Subsequently, the gradients within the cloud and cloud shadow regions of the target image are removed according to the mask image, while the gradients within the corresponding regions of the patch image are retained. The results of this mask operation are superimposed to produce the gradient map of the image to be reconstructed. We then convolve this gradient map with the kernels k_2 = [−1, 1, 0] and k_2^T to compute its partial derivative maps in the x and y directions, and superimpose the two maps to obtain the divergence of the image to be reconstructed. Finally, the following Poisson equation is built and solved using the divergence and the original image (Pérez, Gangnet, and Blake 2003):

Δf = div V,

where Δ is the Laplace operator and div V represents the divergence of the gradient field V(α, β), i.e. div V = ∂α/∂x + ∂β/∂y. The solution f of the Poisson equation is the image to be reconstructed.
To deepen the understanding of how Poisson's equation is established and solved, we illustrate the process in detail with diagrams. As shown in Figure 5, the pixel values at the boundaries of the contaminated region are known. Once the divergence of the image to be reconstructed is determined, the discrete divergence formula yields, for each contaminated pixel (x, y),

f(x+1, y) + f(x−1, y) + f(x, y+1) + f(x, y−1) − 4 f(x, y) = div V(x, y).

Since all quantities in this system of equations except the contaminated pixel values f are known, the system has a unique solution, and the values of the tainted pixels can be calculated by solving it. In summary, regardless of the size of the contaminated region, the Poisson equation can be established to reconstruct it as long as the pixel values at its boundaries (Dirichlet boundary conditions) and the pixel divergence values within it are known.
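The construction and solution of the discrete Poisson equation can be illustrated with a small, self-contained solver. The Jacobi iteration below is a didactic stand-in for the sparse direct solvers used in practice, and the function names are ours:

```python
import numpy as np

def laplacian(img):
    """Discrete Laplacian: sum of the 4-neighbours minus 4 * centre."""
    lap = -4.0 * img.copy()
    lap[1:, :] += img[:-1, :]
    lap[:-1, :] += img[1:, :]
    lap[:, 1:] += img[:, :-1]
    lap[:, :-1] += img[:, 1:]
    return lap

def poisson_reconstruct(target, patch, mask, n_iter=2000):
    """Rebuild the masked region of `target` from the gradients of
    `patch`, keeping the target's own pixels as the Dirichlet boundary.
    The mask must not touch the image border in this sketch.
    """
    div = laplacian(patch.astype(np.float64))  # divergence of guidance field
    f = target.astype(np.float64).copy()
    f[mask] = 0.0                              # unknown interior values
    for _ in range(n_iter):
        nb = np.zeros_like(f)                  # sum of 4-neighbours
        nb[1:, :] += f[:-1, :]
        nb[:-1, :] += f[1:, :]
        nb[:, 1:] += f[:, :-1]
        nb[:, :-1] += f[:, 1:]
        f_new = (nb - div) / 4.0               # Jacobi update of Δf = div V
        f[mask] = f_new[mask]                  # update only masked pixels
    return f
```

A quick sanity check: feeding the target's own gradients back in reproduces the target inside the hole, because the linear system then has the original pixels as its unique solution.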
It is worth noting that, by the Poisson blending principle, the quality of the reconstructed image is sensitive to the pixel values around the contaminated region. To obtain more robust results, we use the morphological dilation operation to extend the boundary of the extracted mask so that the contaminated area is completely covered. Poisson blending is then used to reconstruct the content of the original image on the reference image. Finally, Gaussian filtering is applied to the reference image to separate out its low-frequency information; we choose Gaussian filtering because it is scale- and rotation-invariant and effectively removes most noise that follows a normal distribution.
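The mask dilation step might look as follows; the 3 × 3 structuring element and the margin of three iterations are illustrative assumptions, since the paper does not state the dilation radius:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def expand_mask(mask, margin=3):
    """Dilate the cloud/shadow mask so the contaminated area is fully
    covered before Poisson blending. `margin` (in pixels) is hypothetical."""
    structure = np.ones((3, 3), dtype=bool)  # 8-connected structuring element
    return binary_dilation(mask, structure=structure, iterations=margin)
```

Each iteration grows the mask by one pixel in every direction, so a single seed pixel becomes a (2·margin + 1)-sided square.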

Optimize the H_srcdown based on Gaussian and bilateral filtering
To ensure that the reference image and the original image have the same resolution, we use an average down-sampling technique, described as follows. Let the width and height of the original image be W and H; based on the resolution relationship between the original and down-sampled images, the width and height of the down-sampled image, denoted w and h, can be calculated. The original image is then divided into image blocks of size R × C, where R = H/h and C = W/w, and the mean value of each block is taken as the corresponding pixel of the down-sampled image. However, directly superimposing the extracted high-frequency information can leave visible seams at image boundaries. To demonstrate this problem, we conducted a set of experiments, shown in Figure 6.
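The average down-sampling described above can be sketched with a reshape trick (assuming the image dimensions are exact multiples of the block size):

```python
import numpy as np

def average_downsample(img, R, C):
    """Mean down-sampling: partition the image into R x C blocks and
    take each block's mean as one output pixel."""
    H, W = img.shape
    h, w = H // R, W // C
    # Reshape into (h, R, w, C) so each block occupies axes 1 and 3,
    # then average over those axes.
    blocks = img[:h * R, :w * C].reshape(h, R, w, C)
    return blocks.mean(axis=(1, 3))
```

For a 4 × 4 ramp image and 2 × 2 blocks, each output pixel is simply the mean of the four covered values.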
From Figure 6, it can be clearly seen that the image in the red box region shows obvious stitching traces. To alleviate this problem, we propose a boundary smoothing strategy based on Gaussian and bilateral filtering. The method has two core elements: first, modifying the invalid values around the image; second, weighting the high-frequency information extracted by Gaussian and bilateral filtering.
Inevitably, there are invalid zero values around the processed DOM image. Because of these zeros, the convolution results of Gaussian filtering for pixels at the DOM boundaries tend to be low, which in turn gives the extracted high-frequency information large gradient values at the corresponding positions. To alleviate this problem, we modify the zero values based on the idea of boundary padding, reducing the gradient values of pixels at the image boundary. Boundary filling can be categorized into four types: zero filling, constant filling, mirror filling, and repeat filling. Constant filling and zero filling are ineffective at reducing the gradient values of boundary pixels; therefore, the image should be preprocessed by mirror filling or repeat filling before the two filtering methods are used to extract its high-frequency information. The process of modifying the invalid values around the image based on repeat filling is described below in two steps. Suppose the size of a remote sensing image is M × N. In step 1, each row of pixels is traversed in left-to-right, top-to-bottom order, from row 0 to row M−1. While traversing each row, the column numbers of the zero-valued pixels are recorded first. These column numbers form n mutually disjoint sets, Q_j = {p_j, p_j + 1, ..., q_j}, where p_j > q_{j−1} + 1 for j ∈ {2, 3, ..., n}. Then, based on the following rules, the values of the pixels whose column numbers belong to set Q_i are modified.
(a) If p_i = 0 and q_i = N − 1 hold, then record the row number of the current row and do not modify its pixel values.
(b) Otherwise, the pixels with column numbers p_i, p_i + 1, ..., ⌊(p_i + q_i)/2⌋ are replaced with the pixel whose column number is S, and the pixels with column numbers ⌊(p_i + q_i)/2⌋ + 1, ..., q_i are replaced with the pixel whose column number is E, where S and E denote the columns of the valid pixels adjacent to the set on the left and right, respectively. Notably, we often replace the invalid pixels with the mean value of the non-zero pixels around the pixel with column number S or E in order to prevent noise interference.
In step 2, the row numbers recorded in the first step likewise form n disjoint sets. Following the idea of the first step, the values of S and E are calculated first, after which the pixels in rows p_i, p_i + 1, ..., ⌊(p_i + q_i)/2⌋ are replaced by the pixels in row S, and the pixels in rows ⌊(p_i + q_i)/2⌋ + 1, ..., q_i are replaced by the pixels in row E.

Although modifying the invalid values around the image effectively reduces the gradient values of pixels at the image boundaries, this technique alone is not robust enough for complex image stitching scenarios. Since bilateral filtering preserves image edges, the gradient values in the high-frequency information it produces are small. Reasonably weighting this low-gradient high-frequency information together with the high-frequency information extracted by Gaussian filtering further smooths the gradient values of pixels at the image boundaries. The weights are determined as follows: first, a weight matrix of the same size as the high-frequency information is generated and initialized to all ones; the matrix is then convolved with the same convolution kernel used in the Gaussian filter, and the result is taken as the final weights. Finally, the optimized high-frequency information H_O is obtained by

H_O = W ⊙ H_G + (1 − W) ⊙ H_B,

where ⊙ represents element-wise multiplication; H_G and H_B are the high-frequency information extracted by Gaussian and bilateral filtering, respectively; and W is the weight matrix.
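Step 1 of the repair, applied to a single row, might be sketched as follows; taking S and E to be the valid pixels bordering each zero run is our reading of the notation:

```python
import numpy as np

def repair_row(row):
    """Replace runs of invalid zero pixels in one row (step 1 of the
    padding repair). Returns the repaired row, or None if the whole
    row is invalid (case (a): record the row and skip it)."""
    row = row.astype(np.float64).copy()
    n = len(row)
    if not row.any():
        return None
    j = 0
    while j < n:
        if row[j] == 0:
            p = j
            while j < n and row[j] == 0:
                j += 1
            q = j - 1
            # S / E: nearest valid pixels on the left / right of the run
            # (falling back to the other side when the run touches an edge).
            left = row[p - 1] if p > 0 else row[q + 1]
            right = row[q + 1] if q < n - 1 else row[p - 1]
            mid = (p + q) // 2
            row[p:mid + 1] = left        # first half takes S
            row[mid + 1:q + 1] = right   # second half takes E
        else:
            j += 1
    return row
```

The column pass in step 2 is identical with rows and columns exchanged.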
The important parameters of the Gaussian and bilateral filtering in this subsection are as follows. The key parameters of Gaussian filtering are the convolution kernel size and the standard deviation of the Gaussian distribution. The kernel size k_size is set to one-twentieth of the image diagonal length, and the standard deviation σ is derived from k_size. The key parameters of bilateral filtering are the convolution kernel size and the standard deviations of the Gaussian distributions in the coordinate space and the color space, denoted K_size, σ_space, and σ_color, respectively, where K_size = k_size, σ_space = 50, and σ_color = 12.5. These are all empirical values.
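The weighted combination of Gaussian and bilateral high frequencies can be sketched as below; the brute-force bilateral filter, the wrap-around border handling inside it, and the complementary (1 − W) weighting of the bilateral term are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def bilateral(img, radius=2, sigma_space=50.0, sigma_color=12.5):
    """Brute-force bilateral filter (didactic; wrap-around borders)."""
    out = np.zeros_like(img, dtype=np.float64)
    wsum = np.zeros_like(out)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            w = np.exp(-(dy * dy + dx * dx) / (2 * sigma_space ** 2)
                       - (shifted - img) ** 2 / (2 * sigma_color ** 2))
            out += w * shifted
            wsum += w
    return out / wsum

def blended_high_frequency(img, sigma=3.0):
    """H_O = W * H_G + (1 - W) * H_B, where W is an all-ones matrix
    blurred by the same Gaussian kernel (it dips below 1 only near
    the image boundary)."""
    img = img.astype(np.float64)
    h_g = img - gaussian_filter(img, sigma, mode='constant')  # Gaussian high freq.
    h_b = img - bilateral(img)                                # bilateral high freq.
    w = gaussian_filter(np.ones_like(img), sigma, mode='constant')
    return w * h_g + (1.0 - w) * h_b
```

On a flat image the interior of the result is zero, since both high-frequency maps vanish away from the boundary.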

Color correction
During relative radiometric correction, it is generally assumed that the pixel brightness values of remote sensing images acquired at different times conform to a linear relationship:

I_dst(r, c) = a · I_src(r, c) + b,    (6)

where I_dst(r, c) and I_src(r, c) are the pixel values at (r, c) in the target and original images, respectively; r and c are the row and column numbers of the pixel; and a and b are the parameters of the linear model. Note that the down-sampling result I_dstdown of the target image mentioned in this section refers to the optimized reference image discussed in Section 2.1. Given the wide coverage of satellite images, it is difficult to fit the color differences of all regions of an image with a single linear model. To address this, we propose a block-based linear model construction method: for each image block of size w × h, we assume that the pixels in the block follow the linear model of equation (6).
Summing equation (6) over the block and dividing both sides by the number of pixels (w × h) yields equation (7):

Ī_dst = a · Ī_src + b,    (7)
where Ī_dst and Ī_src represent the mean intensities of the image blocks in the target and original images, respectively. Moreover, when dark-tone objects are present in an image block, the block's brightness mainly originates from diffuse light, and in theory it hardly changes after color correction. That is, when I_src(r, c) → 0, then I_dst(r, c) → 0, and substituting this into equation (6) yields b → 0. Thus, for the image block it holds that

a = Ī_dst / Ī_src.    (8)

Because I_srcdown is obtained by mean down-sampling the original image, the gain coefficient a of the linear model can be calculated from the corresponding pixels of I_srcdown and I_dstdown. To avoid interference from brightness burst points and abnormal values caused by bright ground objects, the gain coefficients of pixels in I_srcdown and I_dstdown that do not conform to the 3σ principle are set to 1; abnormal values in the gain coefficient map are likewise eliminated with the 3σ principle and set to 1. For a source image, its full-resolution gain coefficient map, denoted A, is obtained by bilinear up-sampling. Similarly, I_dstdown and I_srcdown can be up-sampled to the size of the source image by bilinear interpolation. Since the up-sampled images are relatively smooth and fit the original image well, the low-frequency information of the target and source images can be taken from the up-sampled images L_dst and L_src, respectively, which gives equation (9):

L_dst = A ⊙ L_src + b.    (9)
Since b is small relative to A ⊙ L_src, equations (8) and (9) lead to equation (10):

I_dst ≈ A ⊙ I_src.    (10)
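The block-wise gain estimation and correction can be sketched as follows, with the 3σ screening simplified to a single pass and `scipy.ndimage.zoom` standing in for bilinear up-sampling:

```python
import numpy as np
from scipy.ndimage import zoom

def gain_map(src_down, dst_down):
    """Per-pixel gain a = I_dstdown / I_srcdown, with outliers under
    the 3-sigma rule reset to 1, as described in the text."""
    a = np.where(src_down > 0, dst_down / np.maximum(src_down, 1e-6), 1.0)
    mu, sd = a.mean(), a.std()
    a[np.abs(a - mu) > 3 * sd] = 1.0
    return a

def correct(src, src_down, dst_down):
    """Up-sample the gain map bilinearly and apply I_dst = A * I_src
    (the constant b is dropped as negligible, per the derivation)."""
    a = gain_map(src_down.astype(np.float64), dst_down.astype(np.float64))
    fy = src.shape[0] / a.shape[0]
    fx = src.shape[1] / a.shape[1]
    A = zoom(a, (fy, fx), order=1)          # bilinear interpolation
    return A[:src.shape[0], :src.shape[1]] * src
```

For a flat source block at level 50 and a reference block at level 100, the estimated gain is 2 everywhere and the corrected image sits at level 100.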
Formula (10) enables the computation of the color correction result for each image. The resultant image comprises the low frequencies of the color-corrected reference image and the high frequencies of the stretched original image. For color images, the color space must first be converted from RGB to YCbCr, after which each band is corrected in turn. Figure 7 depicts two examples that visually illustrate the effectiveness of the proposed approach. The left column compares the original image (a) with the color-corrected result (c) generated from a reference image (b) containing clouds and cloud shadows. In the red box area of (c), clouds and cloud shadows appear; in the blue box area, significant changes in ground features between (b) and (a) give (c) an unnatural color. In contrast, the result (d) generated by our improved approach from the low-resolution reference image shows clear advantages over (c). The right column of Figure 7 shows another example, in which apparent stitching traces occur in the red box area of (c); our method addresses this issue effectively, as evidenced by (d). Overall, the proposed approach effectively smooths image boundaries and reduces the impact of clouds and their shadows in the reference image.

Experimental datasets
The study compares the performance of various color consistency optimization approaches on two datasets, TAIHU and CHINA, provided by the Land Satellite Remote Sensing Application Center, Ministry of Natural Resources of P.R. China. Prior to color correction, all images in the datasets were converted into a unified coordinate system. Subsequently, the method described in Hong et al. (2022) was used to transform the images into 8-bit images, whose color consistency was then optimized to visualize the algorithmic results. Because optimizing the source images in the CHINA dataset is time-consuming, those images were down-sampled by local averaging, changing the image resolution from 2.1 m to 50 m.
The CHINA dataset comprises 6,766 ZY3-01/02 images, while the TAIHU dataset contains 41 ZY3-01 images, collectively covering a significant portion of China. ZY3-01 and ZY3-02 are the first and second satellites in China's Resource 3 series, respectively. Given the varied ground object types and complex imaging conditions, the images, acquired between 2012 and 2017, exhibit obvious color inconsistency, making these datasets representative benchmarks for measuring the effectiveness of diverse algorithms. Table 1 contains detailed information. Figure 8(a) provides a schematic image of the study area, and Figure 8(c) depicts the external reference image obtained from the LocaSpace Viewer software, with a resolution of 363 × 306 m. The cloud and cloud shadow masks used in the method are extracted manually.

Evaluation metrics
This study employs three objective indicators to quantitatively evaluate the effect of color consistency optimization methods. The first is the quality considering color Euclidean distance (QCCED), which assesses the impact of different approaches on image quality and on the color difference between images; the QCCED value is directly proportional to color consistency, so higher values indicate better consistency. The second is structural similarity (SSIM), which evaluates the similarity of structure, contrast, and brightness between two images, ranging from 0 to 1; higher values indicate less structural difference. The last is the one-dimensional image entropy (OIE), which measures the information richness of the gray-level distribution; richer information indicates better image quality. Note that for the quantitative assessment of color images, the CED value in the QCCED indicator should be computed in the CIELAB color space.
(1) Quality Considering Color Euclidean Distance (QCCED). For two overlapping color-corrected images I_a and I_b, this metric combines their average gradients MG(·) with the color Euclidean distance CED(·) between their overlapping areas, where c is a constant set to 1.
(2) Structural Similarity (SSIM). For two images I_a and I_b with means μ_a and μ_b, variances σ_a² and σ_b², and covariance σ_ab, SSIM is calculated as

SSIM(I_a, I_b) = (2 μ_a μ_b + c_1)(2 σ_ab + c_2) / ((μ_a² + μ_b² + c_1)(σ_a² + σ_b² + c_2)),

where c_1 and c_2 are small constants that stabilize the division.

(3) One-Dimensional Image Entropy (OIE). This indicator is defined as

OIE = − Σ_k p_k log₂ p_k,

where p_k denotes the frequency of pixels with gray value k. For more details on the above indicators, see Hong et al. (2022).
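The OIE indicator, for example, can be computed directly from the gray-level histogram (a sketch for 8-bit images):

```python
import numpy as np

def one_dim_entropy(img):
    """One-dimensional image entropy: H = -sum(p_k * log2(p_k)),
    where p_k is the frequency of grey level k."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()
    p = p[p > 0]                       # 0 * log2(0) is taken as 0
    return float(-(p * np.log2(p)).sum())
```

A constant image has zero entropy, while an image split evenly between two gray levels has an entropy of exactly 1 bit.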

Comparative experiments
To assess the effectiveness of the proposed algorithm, we compared it with two state-of-the-art methods. The first, global tilt adjustment (GTA) (Yu et al. 2017), compensates for contrast, color, and brightness between neighboring images and has been integrated into the OrthoVista module of Inpho 7.0. The second, block adjustment-based radiometric normalization (BARN) (Zhang et al. 2020), optimizes global color differences using block adjustment, roughly classifies pixels based on the normalized difference vegetation index (NDVI), and further reduces local color inconsistencies with a block adjustment strategy for similar pixels. Since the experimental data lack a near-infrared band, we modified BARN to retain only its global color difference optimization. Furthermore, for the CHINA dataset, which contains a large number of images, we grouped the images and used the BARN strategy to select a reference image for each group. These two methods are mainstream color correction methods and hold significant comparative value. We evaluated the experimental results both qualitatively and quantitatively.
Figure 9 displays the complete mosaic image of the TAIHU dataset after processing with the different methods. The image in (b), processed with BARN, exhibits a uniform hue but noticeable stitching traces and color differences in the areas marked by red boxes; moreover, this method requires selecting a reference image, tying the hue of the entire mosaic to that reference. The color inconsistencies between images corrected by GTA are substantially reduced, although GTA relies on the images themselves, leaving significant color differences in the region marked by red boxes in (c); nevertheless, visual analysis suggests that the color consistency of (c) is markedly better than that of (b). Overall, the image in (d), processed with the proposed method, has a consistent color distribution, with color consistency matching that of the reference image.
Table 2 presents the quantitative results obtained by the various algorithms on the TAIHU dataset. The statistics indicate that the QCCED and CED indicators of images processed by our method are significantly better than those of the GTA and BARN methods, which is consistent with the visual assessment. However, there is a slight gap in the OIE indicator compared with the GTA approach, and a noticeable gap in the SSIM indicator compared with the BARN method. The latter arises because the color distribution of the selected low-resolution reference image differs significantly from that of the original image, which reduces the structural similarity between the resulting and original images.
Figure 10 presents the evaluation of the different methods on the CHINA dataset, which covers a vast portion of China. At the macro scale, despite considerable color disparities between the source images, all approaches achieve substantial improvements. For an objective appraisal, we extracted the images from regions A, B, and C in Figure 10, which exhibit substantial color variations caused by sensor distortion and imaging discrepancies. These three sets of experiments allow us to evaluate the performance of the algorithms on datasets covering different regions. The quantitative results of the different methods in the three regions are provided in Table 3. In Figures 11-13, we used a pseudo-color scheme instead of a grayscale scheme to better show the color differences between images.
Figure 11 depicts the outcome of applying the various color correction approaches to area A of the CHINA dataset. Before processing, the source images exhibited noticeable disparities in contrast and brightness. The results of processing the images in region B with the different approaches are presented in Figure 12. The overall visual effect of Figure 12(d) is clearly superior to that of Figure 12(b, c): Figure 12(b) shows a notable brightness difference in the blue box, and apparent stitching traces remain in the red box. Figure 13 illustrates the results of the color correction methods applied to the region C dataset. On the whole, all methods show significant improvements. However, Figure 13(b, c) still exhibit color inconsistencies in the areas marked by the red boxes, indicating the limitations of these approaches. As discussed previously, the brightness differences in the region highlighted by the blue box in Figure 13(b) are caused by significant brightness differences between the reference images. Overall, our method produces the most visually satisfying outcome, as shown in Figure 13(d); this observation is supported by the quantitative results in Table 3.
In summary, the performance of the BARN method depends on the chosen reference image, and this choice is especially critical when processing images with wide coverage. Conversely, the GTA method is generally more effective for simple scenes. However, because GTA optimizes all original images simultaneously, it may not produce the best result for a specific region. Moreover, the color information in image overlap regions often fails to conform to the simple models at different stages. As a result, neither method achieves satisfactory results for images captured under complex imaging conditions. By utilizing the optimized low-resolution reference image, our approach exhibits excellent comprehensive capability and produces visually satisfactory results.
To further visualize the effect of the proposed method, Figure 14 shows the experimental results obtained using sample images from the TAIHU dataset and regions A, B, and C. These images are selected from the areas marked by red boxes in Figures 9, 11, 12, and 13. From Figure 14, it can be seen that there are obvious stitching marks between the original images due to color differences, and that clearly visible seams remain after processing with the BARN and GTA algorithms. Thanks to the edge smoothing strategy based on Gaussian and bilateral filtering, our method achieves a uniform color transition between images, and its overall color consistency is the best.
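The edge smoothing idea described above can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation: the filter parameters, the blending weight `w`, and all function names are our assumptions. The intuition is that a bilateral filter preserves strong edges, so its high-frequency residual is small at image boundaries; blending it with the Gaussian residual suppresses the oversized boundary responses that would otherwise leave stitching traces.

```python
import numpy as np

def gaussian_blur(img, sigma=2.0):
    """Separable Gaussian blur with reflect padding (pure NumPy)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, r, mode="reflect")
    # Horizontal pass, then vertical pass; "valid" trims the padding back off.
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="valid"), 0, tmp)

def bilateral_blur(img, sigma_s=2.0, sigma_r=0.1):
    """Brute-force bilateral filter: Gaussian in space times Gaussian in intensity."""
    r = int(3 * sigma_s)
    pad = np.pad(img, r, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 2 * r + 1, j:j + 2 * r + 1]
            rng_w = np.exp(-(win - img[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * rng_w
            out[i, j] = (wgt * win).sum() / wgt.sum()
    return out

def smoothed_high_frequency(img, w=0.5):
    """Weighted superposition of Gaussian and bilateral high-frequency layers.

    The bilateral residual stays small across strong edges, damping the
    boundary spikes of the Gaussian residual that cause stitching traces.
    """
    hf_gauss = img - gaussian_blur(img)
    hf_bilat = img - bilateral_blur(img)
    return w * hf_gauss + (1 - w) * hf_bilat
```

For a flat region the blended residual is zero, so only genuine texture is carried into the high-frequency layer.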

Conclusion
In order to address the susceptibility of existing algorithms to cloud and cloud shadow disturbances in reference images, we propose a color consistency optimization approach that utilizes optimized low-resolution reference images to enhance the quality of remote sensing images. Our methodology comprises three main steps. First, we use Poisson blending to repair areas affected by clouds and cloud shadows, achieving good results even when original images with radiation differences are used as patches. Second, we employ a boundary smoothing strategy based on Gaussian and bilateral filtering to achieve uniform color transitions between images. Finally, we use local linear models, constructed from the optimized reference image and the down-sampled images, to correct the color of each image.
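The final correction step can be illustrated with a minimal sketch. The function names and the regularizer `eps` below are our assumptions, not the authors' code: a linear model ref ≈ a·src + b is fitted by least squares on a low-resolution pair and then applied to the corresponding full-resolution block, in the spirit of guided filtering.

```python
import numpy as np

def fit_linear_model(src_lr, ref_lr, eps=1e-6):
    """Least-squares fit of ref_lr ≈ a * src_lr + b over one low-resolution block."""
    mu_s, mu_r = src_lr.mean(), ref_lr.mean()
    cov = ((src_lr - mu_s) * (ref_lr - mu_r)).mean()
    var = ((src_lr - mu_s) ** 2).mean()
    a = cov / (var + eps)       # eps guards against flat (zero-variance) blocks
    b = mu_r - a * mu_s
    return a, b

def correct_block(src_block, src_lr, ref_lr):
    """Apply the model fitted at low resolution to the full-resolution block."""
    a, b = fit_linear_model(src_lr, ref_lr)
    return a * src_block + b
```

Fitting many such models over local blocks (rather than one global model) lets the correction adapt to spatially varying color differences.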

Figure 1. The flow chart of the proposed method, with two blue modules highlighted as the major improvements.

Figure 2. An example illustrating the influence of clouds and their shadows on the color correction results. The red box indicates the area that contains clouds and cloud shadows. (c) is the result generated by the unimproved algorithm using texture information extracted from (a) and tonal information obtained from (b).
Figure 3.

Figure 4. The diagrammatic sketch of the approach for repairing areas polluted by clouds and cloud shadows based on Poisson blending. The target image and the patch represent the low-resolution reference image and the original image to be color corrected, respectively. The goal is to restore the contaminated areas on the reference image using information from the original image.

Figure 5. Illustration of how the Poisson equation is established and solved. The pixels a, b, c, and d are contaminated by the cloud and its shadow. f1 is the unknown intensity function in the contaminated region; f2 is the known intensity function on the unpolluted area of image T.
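The Poisson equation sketched in Figure 5 can be solved numerically with simple Jacobi iterations. The toy NumPy sketch below is our own illustration, not the authors' solver, and it makes two simplifying assumptions: neighbors are taken with periodic `np.roll`, so the mask must not touch the image border, and the iteration count is fixed rather than convergence-checked. Inside the mask it solves lap(f) = lap(patch) with f fixed to the target on the boundary, so the patch's gradients are kept while its colors are pulled toward the surrounding cloud-free reference pixels.

```python
import numpy as np

def poisson_fill(target, patch, mask, iters=2000):
    """Fill `mask` pixels of `target` with patch gradients via Jacobi iteration.

    target : cloud-free reference values (Dirichlet boundary condition)
    patch  : source image providing the guidance gradient field
    mask   : boolean array, True inside the contaminated region
    """
    f = np.where(mask, patch, target).astype(float)
    # Discrete Laplacian of the patch (the guidance field's divergence).
    lap = (np.roll(patch, 1, 0) + np.roll(patch, -1, 0) +
           np.roll(patch, 1, 1) + np.roll(patch, -1, 1) - 4.0 * patch)
    for _ in range(iters):
        nb = (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
              np.roll(f, 1, 1) + np.roll(f, -1, 1))
        # Jacobi update inside the mask; boundary pixels stay at target values.
        f = np.where(mask, (nb - lap) / 4.0, target)
    return f
```

With a flat patch (zero gradients), the filled region simply relaxes to the harmonic interpolation of the surrounding target values, which is the limiting behavior one expects from seamless cloning.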
Finally, we calculate the mean of the pixels within each image block and use this value as the pixel value of the down-sampled image. Since the pixels located at image boundaries have larger gradient values, the high-frequency layer H_srcdown extracted directly from I_srcdown by Gaussian filtering has larger values at the corresponding locations, which can lead to obvious splicing traces between the color-corrected images.
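The down-sampling described above amounts to non-overlapping block averaging. A minimal NumPy sketch (the function name is ours, and the image dimensions are assumed divisible by the block size):

```python
import numpy as np

def block_mean_downsample(img, block):
    """Down-sample by averaging non-overlapping block x block windows."""
    h, w = img.shape
    # Reshape to (rows, block, cols, block) and average over each window.
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
```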

Figure 6. An example illustrating how extracting high-frequency information from an image directly with Gaussian filtering affects the results after color correction.

Figure 7. Input images (a); low-resolution reference images from the LocalSpace Viewer software (b); (c) and (d) are images processed by the methods without and with the two optimization modules, respectively. The red boxes highlight areas with mosaic traces, clouds, or cloud shadows. The blue box highlights areas with significant changes in ground features between (a) and (b).

Figure 8. Research data and areas: (a) a schematic image of the study area; (b) original and reference images superimposed together; (c) a low-resolution reference image.
where S(Ia, Ib), C(Ia, Ib), and L(Ia, Ib) are the structure, contrast, and luminance similarities between images Ia and Ib, respectively.
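For reference, the standard SSIM decomposition (Wang et al. 2004) combines these three terms. The exact exponents used in the paper's indicator are not stated here, so the unit-exponent form below is an assumption:

```latex
\mathrm{SSIM}(I_a, I_b) = L(I_a, I_b)\, C(I_a, I_b)\, S(I_a, I_b), \quad
L = \frac{2\mu_a\mu_b + C_1}{\mu_a^2 + \mu_b^2 + C_1}, \;
C = \frac{2\sigma_a\sigma_b + C_2}{\sigma_a^2 + \sigma_b^2 + C_2}, \;
S = \frac{\sigma_{ab} + C_3}{\sigma_a\sigma_b + C_3}
```

where mu and sigma denote local means and standard deviations, sigma_ab is the local covariance, and C1, C2, C3 are small constants that stabilize the ratios.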

Figure 9. Comparison of the results of the different methods on the TAIHU dataset. (a) input images; (b) and (c) results obtained by the BARN and GTA methods, respectively; (d) the result generated by our approach. To exhibit inconsistencies, the red boxes highlight areas with mosaic traces and color differences.

Figure 10. Experimental results on the CHINA dataset. (a) input images; (b) and (c) results corrected by the BARN and GTA methods, respectively; (d) the result generated by our approach. To accurately evaluate the algorithms, we selected the results located in regions A, B, and C of the CHINA dataset for quantitative analysis.
Although Figure 12(c) does not show an obvious uneven brightness distribution, a visible color transition between images remains, as demonstrated in the red box. In summary, Figure 12(d) outperforms the other methods in terms of visual effect. Regarding quantitative evaluation, although Figure 12(c) performs better than Figure 12(d) in the CED indicator, Figure 12(d) achieves the best QCCED score, indicating that its clarity is considerably better than that of Figure 12(c).

Figure 11. Comparison of results obtained using the images in area A. (a) input images; (b) and (c) results obtained by the BARN and GTA methods, respectively; (d) the result generated by our approach. A pseudo-color scheme is utilized to display inconsistencies. The red boxes indicate a clear color difference, whereas the blue boxes indicate overexposed areas.

Figure 12. Experimental results obtained using the images in area B. (a) input images; (b) and (c) results corrected by the BARN and GTA methods, respectively; (d) the result generated by our approach. A pseudo-color scheme is employed to depict the color discrepancies. The red boxes indicate obvious color discontinuity, and the blue boxes highlight overexposed areas.

Figure 13. Experimental results obtained using the images in area C. (a) input images; (b) and (c) results corrected by the BARN and GTA methods, respectively; (d) the result generated by our approach. A pseudo-color scheme is used to display the differences. The red boxes represent areas with significant color discrepancies, whereas the blue boxes highlight underexposed regions.

Figure 14. Experimental results obtained using sample images from the TAIHU and CHINA datasets. (a) from left to right: the original images from the TAIHU dataset, followed by the images generated by BARN, GTA, and our method; (b), (c), and (d) are images from regions A, B, and C, respectively; from top to bottom: the input images, followed by the images corrected by BARN, GTA, and our approach.

Table 1. Detailed information about the datasets.

Table 2. Quantitative results of the different methods on the TAIHU dataset.

Table 3. Quantitative results of the different methods in three areas of the CHINA dataset.