Quantifying trade-offs in satellite hardware configurations using a super-resolution framework with realistic image degradation

ABSTRACT When designing and operating Earth observation satellites, trade-offs must often be made between different hardware components and operational considerations. For example, how much of the mass budget should be spent on imaging lenses, and how long should image exposure times be? The resulting limitations in image resolution and quality can be partially compensated for using super-resolution (SR) techniques. However, deep SR networks recently applied to satellite imagery are often trained and tested on data that lacks the typical image-degrading noise present in real satellite images. In this work, we combine a method for generating realistically degraded satellite images with a deep SR network, in the context of different satellite hardware configurations and geographical types. We use this framework to assess deep SR performance given realistic remote sensing payloads across different terrain types, by evaluating payload- and terrain-dependent SR performance in reconstructing realistically degraded images. The framework allows us to model the effect of alternative satellite hardware configurations on resulting SR image quality, providing insight into optimal satellite operations and payload design in the context of SR-based image quality enhancements.


Introduction
Earth observation satellites have stringent restrictions on their payloads and operational activities. Satellite manufacturers must consider how to construct the best image-capturing system possible given restrictive mass and cost considerations (Curry, La Tour, and Slagowski 2015). For example, larger lenses can obtain higher quality images, but are more massive and generally more expensive. Operational considerations include how long exposure times should be to enhance the signal-to-noise ratio without spending too long on a given target. If satellite image quality can be enhanced in post-processing beyond the quality limitations dictated by the hardware, this has obvious benefits and implications for satellite design and operations.
One way to enhance image quality in post-processing is to leverage modern deep Super-Resolution (SR) networks. SR generates a High-Resolution (HR) output from a Low-Resolution (LR) input, with networks typically trained on LR/HR pairs. With deep SR networks becoming standard as a lightweight approach to image enhancement (Wang, Bayram, and Sertel 2022), satellite payload design that optimally accommodates these networks is essential to the future of remote sensing. Here we develop a technique that uses the quality of super-resolved images under different image degradation conditions to provide insight into the optimal design of satellite payloads.
Deep SR networks applied to satellite imagery have often been trained and tested on data that lacks the realistic image-degrading noise present in real satellite images, often using bicubic downsampling of HR images to generate LR-HR image pairs. However, recent deep SR studies have begun to incorporate more realistic image degradation procedures. Bell-Kligler et al. proposed a new method of using an unsupervised generative adversarial network (GAN) to estimate a realistic image blur kernel directly from an image (Bell-Kligler, Assaf, and Michal 2019). Park et al. extended this to improve blur kernel estimation for larger or more anisotropic kernels (Park, Kim, and Gi Kang 2023). Ji et al. combined the Bell-Kligler et al. blur kernel estimation method with realistic noise injection (Xiaozhong et al. 2020). Their realistic degradation model was later adopted by Zhang et al. in the context of remote sensing imagery (Zhang et al. 2022), who found that a more realistic degradation model has a substantive effect on SR image reproduction quality. In this work, we specify a realistic degradation model which has the same key operations as the degradation models from previous studies (i.e., blurring, downsampling, and noise); however, our model is explicitly parameterized according to the important physical properties of satellite imaging systems. This allows us to adjust satellite design parameters via the model and explore the subsequent effects on deep SR image reconstruction quality. Generating LR images from a physically motivated model in this way has the limitation that we do not expect to estimate the degradation noise or blur kernel as accurately as methods that estimate these properties directly from each image. However, our focus in this study is not to produce the highest possible SR image quality, but to gain insight into satellite imaging systems in the context of SR image reconstruction.
In this study, we specify a pipeline for generating realistically degraded satellite images using a model with modifiable parameters that describe satellite hardware constraints. We then combine this realistic image degradation model with deep SR, in order to simulate the effect that different satellite hardware configurations have on the ability to super-resolve a set of remote sensing images. Specifically, we apply realistic degradation to remote sensing images from the National Agricultural Imagery Program (NAIP) (see Figure 1) and super-resolve them using the Enhanced Deep Super-Resolution Network (EDSR) (Lim et al. 2017). We then use standard Image Quality Assessment (IQA) metrics to assess super-resolution performance, analysing the recovered signal-to-noise in response to image-degrading effects. Our assessment explores the optimal design of remote sensing payloads for SR image reconstruction on satellite image data, with the goal of identifying potential trade-offs between image resolution and cost, thus informing the future construction of remote sensing devices. Finally, we explore a range of terrain image types (see Figure 1) to assess the versatility and robustness of deep SR reconstruction performance on the diverse sets of visual features present in Earth Observation (EO) data. These contributions form the basis for an analysis framework of SR DNNs for remote sensing, using a model of realistic image degradation with modifiable satellite hardware parameters. This framework enables an informed and interpretable approach to SR-focused satellite payload design.

Method
We explore the performance of SR techniques in recovering image resolution following realistic, and non-linear, degradation steps applied to original NAIP 60 cm imagery. The NAIP images were selected by eye to feature a broad geographic range across the U.S. states of Utah, Kentucky, and Massachusetts, qualitatively labelled as Beach, Forest, Rural, Rural with Urban, and Urban (Figure 1). After identifying a single high-resolution image of each geographic type, five crops within each image were selected by hand to incorporate terrain consistent with the given geographic type. The cropped images (five for each geographical type) were used as input to the image processing pipeline.
As shown in Figure 2, the pipeline first systematically degrades each cropped image via a set of realistic degradation parameters. In satellite imaging, SR image reconstruction is fundamentally limited by three main factors: the quality of the satellite's focusing instrument (i.e., mirror), the physical spacing between the individual cells within the Charge-Coupled Device (CCD), and the available exposure time. Therefore, the degradation parameters we explore are the Ground Resolved Distance (GRD; effectively the optical resolution of the satellite), the Ground Sampling Distance (GSD; essentially decreasing the pixel density of the camera), and the Signal-to-Noise Ratio (SNR) level (representing a change in exposure time). This process is explained in Section 2.1. The second stage of our pipeline then super-resolves the degraded images using a deep SR network. In this study, we chose to assess the EDSR network (Lim et al. 2017) at each of its upscaling resolutions: ×2, ×3, ×4 (an overview of the network is detailed in Section 2.2). EDSR was selected for this study as it was the highest-performing general deep SR network at the time of writing and featured multiple levels of magnification for assessment. Finally, we assess the SR recovery performance of the resulting images using two standard image quality metrics, SSIM and PSNR (see Section 2.3). A schematic of the conceptual framework of our study is shown in Figure 3. Source code from our study is available upon request.
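The three pipeline stages (degrade, super-resolve, assess) amount to a loop over the degradation trade-space. The sketch below illustrates that loop; the parameter grids and the `degrade`, `super_resolve`, and `assess` stand-ins are hypothetical placeholders, not values or code from the study:

```python
from itertools import product

# Illustrative parameter grids for the trade-space (assumed values,
# not taken from the study).
GRD_VALUES = [1.0, 2.0, 4.0]      # Ground Resolved Distance (m)
GSD_VALUES = [1.2, 2.4]           # Ground Sampling Distance (m)
SNR50_VALUES = [10, 50, 100]      # SNR at 50% well capacity

def run_tradespace(crops, degrade, super_resolve, assess):
    """Evaluate every (GRD, GSD, SNR50) combination on every crop.

    `degrade`, `super_resolve`, and `assess` are stand-ins for the
    pipeline stages described in Sections 2.1-2.3.
    """
    results = {}
    for grd, gsd, snr50 in product(GRD_VALUES, GSD_VALUES, SNR50_VALUES):
        for name, crop in crops.items():
            lr = degrade(crop, grd=grd, gsd=gsd, snr50=snr50)
            sr = super_resolve(lr)
            # Compare the super-resolved output against the original crop.
            results[(grd, gsd, snr50, name)] = assess(sr, crop)
    return results
```

Each result is keyed by its point in the trade-space, so the box plots and heat maps in Section 3 correspond to marginalizing this dictionary over different axes.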

Image degradation pipeline
The degradation pipeline takes a single image and systematically adds noise according to three input parameters that relate to satellite hardware configurations or operations (GRD, GSD, and SNR). Zhang et al. recently developed a realistic image degradation model applied to satellite imagery (Zhang et al. 2022). Our degradation model is conceptually similar to theirs; however, our model explicitly includes parameters that correspond to the aforementioned physical properties of the satellite hardware (GRD, GSD, and SNR), allowing us to explore satellite hardware design trade-offs in the context of deep SR image enhancement. Our degradation algorithm is illustrated in Figure 2. To simulate optical degradation, we take each cropped image obtained from NAIP imagery, I_orig, and convolve it with the Point Spread Function (PSF) of the system. The PSF was generated by the inverse Fourier transform of the Modulation Transfer Function (MTF), simulated by the auto-correlation of the pupil function. An occluded aperture with an inner-to-outer diameter ratio of 0.4 was assumed for the pupil. This PSF is scaled with a central MTF padded to match the underlying GRD, where the aperture diameter D = NλH/GRD has N steps sampled around the fiducial height H = 500 km (distance to the subject) and central wavelength λ = 560 nm. A 2D convolution (∗) of the PSF is applied to the three colour channels of the original image I_orig separately to produce the blurred image I_blur = I_orig ∗ PSF. This blurred image is resized by the ratio of the original GSD (set to 1 m) to the sensor GSD_sensor to produce an effective sensor image I_sensor as an intermediary step. We simulate a varying signal-to-noise by sampling from a Poisson distribution for each pixel, with mean μ given by the sensor image's pixel intensity and the 8-bit accuracy of the NAIP data, where the equivalent well capacity W ≈ 2(SNR50)². The resulting pixel intensity at each cell is normalized by W × 2^16 to give the sampled image I_sample. Finally, this image is further downsampled to the desired GSD for comparison by resizing with the ratio GSD_sensor/GSD_product, where GSD_sensor is 3 m. This process generates a set of images that capture the spectrum of optical noise over the trade-space.
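The degradation chain (blur, resample to the sensor grid, photon noise, resample to the product grid) can be sketched as follows. This is a simplified stand-in, not the study's code: a Gaussian kernel approximates the MTF-derived PSF (the study builds the PSF from the pupil-function autocorrelation), and the mapping from GRD to kernel width is an illustrative assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def degrade(img, grd=2.0, gsd_product=3.0, snr50=50,
            gsd_orig=1.0, gsd_sensor=3.0):
    """Simplified sketch of the degradation chain in Section 2.1.

    `img` is a single-channel float array in [0, 1]; colour images
    would be processed one channel at a time.
    """
    # 1. Optical blur: kernel width grows with the Ground Resolved
    #    Distance (assumed GRD-to-sigma mapping, for illustration).
    sigma_px = grd / (2.0 * gsd_orig)
    blurred = gaussian_filter(img, sigma=sigma_px)

    # 2. Resample to the sensor grid (GSD_orig -> GSD_sensor).
    sensor = zoom(blurred, gsd_orig / gsd_sensor, order=1)

    # 3. Photon (Poisson) noise with well capacity W = 2 * SNR50**2,
    #    since SNR = sqrt(counts) at 50% well fill.
    well = 2.0 * snr50 ** 2
    counts = np.random.poisson(np.clip(sensor, 0.0, 1.0) * well)
    sampled = counts / well

    # 4. Final resample to the product GSD (GSD_sensor -> GSD_product).
    return zoom(sampled, gsd_sensor / gsd_product, order=1)
```

Raising `snr50` increases the simulated well capacity, so the relative Poisson fluctuations shrink, mirroring the effect of a longer exposure time.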
A given combination of GSD, GRD, and SNR50 thus provides a degraded image that is physically linked to underlying properties of the satellite imaging system. Quantifying SR reconstruction image quality after this degradation process will then allow satellite design and operational parameters to be assessed.

Deep super resolution
For the SR component of our assessment pipeline, we elected to use the single-image deep SR network EDSR (Lim et al. 2017), which has a range of accurate upscaling capabilities. This architecture was pre-trained to upscale input images to ×2, ×3, and ×4 their original resolution. EDSR was trained on the DIV2K high-resolution image data set (Timofte et al. 2017), with the ×2 network trained from scratch, and the ×3 and ×4 networks further optimized using the ×2 network as a baseline through transfer learning. We clarify that no training was undertaken in this study, as our intention is to explore the applicability of pre-trained SR networks to realistically degraded remote sensing image data. We focus on assessing a single SR network, as our assessment is generalizable to any SR network that takes in an LR image and outputs an HR reconstruction. This is because our focus in this work is to provide a model-agnostic explainability framework for remote sensing researchers to assess SR networks for satellite payload design, not to assess the relative performance of different SR networks.
The EDSR architecture is a DNN containing residual blocks of similar network layers. Each block contains two convolution layers with a ReLU activation function between them. The model we employed had 32 layers, with each convolution layer containing 64 feature maps. The residual blocks feed into several upscaling layers of the network that progressively increase the dimensionality of the input image. The number of upscaling layers depends on the upscaling factor, with more layers added from ×2 to ×4.
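A minimal PyTorch sketch of these two building blocks is given below: an EDSR-style residual block (conv-ReLU-conv with a skip connection and no batch normalization) and a pixel-shuffle upscaling tail whose depth depends on the scale factor. The 64-feature-map width matches the configuration described above, but this is an illustrative reimplementation, not the authors' code:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """EDSR-style residual block: two 3x3 convolutions with a ReLU
    between them, added back onto the input via a skip connection."""
    def __init__(self, n_feats=64, res_scale=1.0):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n_feats, n_feats, 3, padding=1),
        )
        self.res_scale = res_scale

    def forward(self, x):
        return x + self.res_scale * self.body(x)

class Upsampler(nn.Sequential):
    """Pixel-shuffle upscaling tail: one stage for x2 or x3,
    two cascaded x2 stages for x4."""
    def __init__(self, scale, n_feats=64):
        layers = []
        for s in {2: [2], 3: [3], 4: [2, 2]}[scale]:
            layers += [nn.Conv2d(n_feats, n_feats * s * s, 3, padding=1),
                       nn.PixelShuffle(s)]
        super().__init__(*layers)
```

The cascaded ×2 stages for the ×4 tail mirror how the deeper tail is built from additional upscaling layers, as described above.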

Image quality assessment
As is customary in the SR literature, we use the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) IQA metrics to quantify the upscaled image quality (Zhihao, Chen, and Hoi 2020). The PSNR and SSIM metrics are each calculated for the (degraded and then) super-resolved output image relative to the unmodified ground-truth input image. This allows us to assess the image reconstruction performance of a given SR technique in the context of the many different physically motivated degradation parameter values we investigate in this work (i.e., different combinations of GSD, GRD, and SNR50 values). Further IQA metrics were also tested in this study (e.g., ERGAS, UIQI) but were found to have too little dispersion in their results to draw meaningful conclusions from, compared to the richer SSIM and PSNR metrics.

Results
We investigated the PSNR and SSIM distributions of the super-resolved images relative to the original cropped images. These original images were first degraded as described in Section 2.1, and then super-resolved as described in Section 2.2. An overview of the degraded crops, and the output from the SR pipeline, is demonstrated in Figure 4. The visible recovery of cars from images degraded even by a factor of 4 in GSD (at high SNR50 values; bottom row of c) is a qualitative demonstration of successful information recovery via EDSR. Supplementary images of degraded and super-resolved images for all terrain types and crop numbers are available on the supplementary GitHub page (https://smpetrie.github.io/superres/).
We assess image reconstruction performance by grouping the output of each individual scaling factor used in the SR pipeline (×2, ×3, ×4) and geographical subset. We consider all SNR50 values associated with each subset and create a box plot over these values for each metric. The PSNR (or SSIM) distributions can be seen in the left (or right) column of Figure 5.
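This grouping can be sketched as follows, with `scores` a hypothetical mapping from geography label to the list of metric values over the SNR50 range (mirroring the per-subset box plots of Figure 5):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt

def boxplot_by_geography(scores, metric="SSIM"):
    """Box-plot one metric, grouped by geographical label.

    Each box summarizes the distribution of the metric over the
    full SNR50 range for that subset.
    """
    fig, ax = plt.subplots()
    labels = sorted(scores)
    ax.boxplot([scores[k] for k in labels])
    ax.set_xticks(range(1, len(labels) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel(metric)
    return fig
```

One such figure per scaling factor and metric reproduces the layout of Figure 5.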

Variability in image reconstruction quality, measured via PSNR
As can be seen in Figure 5a, there is little change in the PSNR metric between different geographical types or GSD scaling values, and little spread in PSNR values across the range of SNR50 considered. Thus, if we were to use the PSNR metric to assess deep SR reconstruction performance, we would find little variability in reconstruction quality between geographical and GSD subsets and across SNR50 values. However, PSNR depends only on relative pixel intensity and does not provide information on the likely information loss, which should depend not only on signal-to-noise but also on the relationship between the GSD (the on-the-ground distance between pixels) and the GRD (the minimum physical size that can be resolved by the satellite). As a result of the negligible change in PSNR values across this trade-space, we argue it makes a poor IQA metric for our investigation, and we do not consider it further in our analysis.

Variability in image reconstruction quality, measured via SSIM
Unlike the near-invariant PSNR distribution in Section 3.1, the computed SSIM values span almost half of the possible range, as shown in the right column of Figure 5. The median values associated with each geographical label decrease modestly with increasing GSD values, but with a large spread within each sample. We also note that the SSIM values show greater variance than the PSNR distributions in Figure 5. Thus, SSIM displays a greater capacity than PSNR for exploring variability in SR image reconstruction quality given different terrain types and satellite hardware constraints. In order to explore the SSIM distribution associated with each geographical label, we look at the individual image reconstruction performance of each of the five cropped images within each subset. We compare the SSIM distribution across SNR50 values for a given geography and Crop ID in Figure 6, with each boxplot pair showing GSD scaling by ×2 (left, light blue) and ×4 (right, dark blue). GSD scaling appears to have more of an effect on the resulting SR-based image reconstruction quality for some geographical types than others. For example, the Urban SSIM distributions are quite similar for GSD values of 1.2 (×2) and 2.4 (×4), for all 5 crops. In contrast, the Rural with Urban SSIM distributions have higher means and larger dispersion for GSD = 1.2 (×2) relative to GSD = 2.4 (×4), for all 5 crops.
We next explore the relative performance of GSD and GRD as a function of SNR50. In Figure 7 we see that, for a given SNR, the recovery performance of the SR technique as measured by SSIM depends more prominently on GRD than GSD. The SNR50 also shows diminishing returns; i.e., comparing SNR50 = 10 to 50 (7a to 7b), the SSIM approximately doubles for all GSD/GRD combinations, but there is little increase in SSIM from SNR50 = 50 to 100 (7b to 7c). This suggests the signal-to-noise ratio could effectively be halved with negligible reduction in satellite performance in the context of deep SR image reconstruction. In addition, it is plausible that a custom-trained EDSR network could increase the SSIM value at which this plateau occurs, or even further increase the reconstruction quality of images with SNR50 > 50.
To explore whether our results would differ with a deep SR network trained on remote sensing data, we also ran our entire degradation-SR pipeline with TransENet (https://github.com/Shaosifan/TransENet), a CNN- and transformer-based network trained on remote sensing imagery (Lei, Shi, and Mo 2022). We found almost identical results to those obtained using the EDSR network, and therefore do not show those results here. On the supplementary GitHub page for this study (https://smpetrie.github.io/superres/), we have included reproductions of Figure 6 for the two models as Figures S1 (EDSR) and S2 (TransENet).

Conclusion
In this paper, we developed a method to combine realistic satellite image degradation with deep SR in the context of modifiable satellite hardware configurations and different geographical types. The resulting framework allowed us to model how modifications to satellite hardware configurations affect the resulting image quality. This enhanced the interpretability of these deep neural networks and provided insight into optimal satellite payloads. In particular, we explored the performance of a deep SR network in super-resolving data that had been realistically degraded via multiple parameters relevant to satellite hardware and operations: the GSD, the GRD, and the overall signal-to-noise of the image (i.e., exposure time).
In evaluating the network with standard IQA metrics, we found that PSNR (along with UIQI and ERGAS) is essentially insensitive to these modifications and unsuited to this kind of study, unlike SSIM, whose greater variability allowed for greater interpretability. Analyses with SSIM indicated that SR image recovery performance is more strongly associated with GRD variations than GSD variations, suggesting that, for satellite payload design, the quality of the focusing optics is more important than the physical CCD pixel size.
The deep SR network had a plateauing response to the signal-to-noise ratio, with the greatest improvement seen as the SNR was increased from low to medium values (SNR50 = 10 to 50), while super-resolving performance was mostly insensitive to further signal increase. Thus, it appears that satellites may be able to dramatically reduce exposure times if an SR network is employed thereafter to remove the added noise.

Figure 2 .
Figure 2. Example schematic of an image crop of the urban geography type with systematic degradation in GSD, GRD and SNR space applied at three different resolution degradation factors (2, 3, and 4). These degraded images are then super-resolved using the EDSR network and compared with the original cropped image.

Figure 3 .
Figure 3. Conceptual framework for this study. The framework allows satellite design trade-offs to be assessed in the context of deep SR pipelines.

Figure 4 .
Figure 4. Example performance of the EDSR pipeline for the Urban geographical type, zoomed in by a factor of 20 on a car park. Degraded images (I) are shown for a given GSD/SNR combination together with the associated SR pipeline output (O) for ×2, ×3 and ×4 degradation (a, b and c, respectively).

Figure 5 .
Figure 5. Box plots demonstrating the recovery performance of the SR technique for various geographical types as assessed by the PSNR and SSIM metrics (a and b), with increasingly degraded GSD factors from top to bottom.The individual box plots are over the full SNR50 range, with median (orange line), quartile (box), and min/max values (whiskers).

Figure 6 .
Figure 6. Similar to the Figure 5 SSIM box plot, but now exploring the individual cropped patches within each geographical type (see Figure 1). Each boxplot pair indicates GSD scaling by ×2 (left boxplot, light blue) and ×4 (right boxplot, dark blue).

Figure 7 .
Figure 7. SSIM heat-maps exploring the recovery metric averaged across all geographical types and crops, across several GSD and GRD values, and across SNR50 values of 10, 50 and 100 (a, b and c, respectively).