Let the loss impartial: a hierarchical unbiased loss for small object segmentation in high-resolution remote sensing images

ABSTRACT Progress in optical remote sensing technology presents both an opportunity and a challenge for the small object segmentation task. However, the gap between human visual cognition and machine behavior still imposes an inherent constraint on the interpretation of small but key objects in large-scale remote sensing scenes. This paper summarizes this gap as a bias of the machine against the small object segmentation task, called the scale-induced bias. The scale-induced bias degrades the performance of conventional remote sensing image segmentation methods. Therefore, this paper applies a straightforward but innovative insight to mitigate the scale-induced bias. Specifically, we propose a universal impartial loss, which leverages a hierarchical approach to alleviate two sub-problems separately. A pixel-level statistical methodology is applied to remove the bias between the background and small objects, and an emendation vector is introduced to alleviate the bias between small object categories. Extensive experiments explicitly show that our method is fully compatible with existing segmentation structures; armed with the hierarchical unbiased loss, these structures achieve satisfactory improvements. The proposed method is validated on two benchmark remote sensing image datasets, where it achieves competitive performance and narrows the gap between human visual cognition and machine behavior.


Introduction
With the advancement of optical remote sensing data capture technology, numerous high-resolution remote sensing images (HRIs) are now being obtained from both satellite and airborne platforms (Mi & Chen, 2020; Tao et al., 2022). Higher resolution also means that humans and machines can perceive detailed features in large-scale HRIs, which presents both a challenge and an opportunity for small object segmentation (SOS). SOS is a widely followed task in the remote sensing interpretation community, and it aims to automatically extract key but small objects in large-scale remote sensing scenes. In recent years, extensive efforts (He et al., 2016; Long et al., 2015; Ronneberger et al., 2015) have been made to explore HRI segmentation and have achieved breakthrough improvements. However, new challenges arise when the objects to be recognized and segmented are small.
Accurately segmenting small objects is an unusual task, owing to the obvious gap between machines and humans in perceiving small objects in large-scale remote sensing scenes. For example, when humans intentionally search for key objects in large-scale remote sensing scenes, such as aircraft at an airport or ships in a port, the human visual perception system selectively ignores background information and focuses on these small but key objects. However, machines lack some of these inductive constraints, which leads directly to the inherent gap between human cognition and machine behavior in solving the SOS task. This inherent gap makes the SOS task inevitably suffer from performance bottlenecks. Specifically, as can be observed in large-scale remote sensing scenes, the background usually includes far more pixels than the small objects (Li et al., 2021; Ma et al., 2022; Segl & Kaufmann, 2001). As presented in Figure 1, we believe that the scale-induced bias affects the SOS task on two levels: the first is the bias between the background and the small object categories, shown in Figure 1(a), and the second is the bias between the small object categories themselves, shown in Figure 1(b).
When deep learning-based segmentation networks are combined with the conventional training approach to optimize the overall loss of the model, the optimizer tends to derive small errors for pixels that belong to the background category (Guo et al., 2019; Li, Huang et al., 2021; Rabbi et al., 2020). However, it is more crucial to minimize the errors for pixels that belong to small object categories, and the segmentation result of small objects is usually unsatisfactory, as small objects always contribute less to the overall segmentation loss. Moreover, we notice that small objects are sensitive to their surroundings (Guo et al., 2019), and the scale-induced bias can lead to severe performance degradation.
An ordinary strategy for optimizing the segmentation of small objects is to enhance the feature representation capability of the model. Fueled by the success of deep learning, many groups (Geiss et al., 2022; Kemker et al., 2018; Zhang, Wang et al., 2022) are dedicated to designing various robust and effective networks, and these attempts have achieved satisfactory gains. However, these methods are excessively dependent on hardware resources and lack generalization ability. Stacked network structures are comparatively weak at narrowing the gap between human perception and machine behavior. Notably, an ideal way to alleviate the scale-induced bias is to leverage the available data and models to improve the training output. We believe that optimizing the training process, i.e. adjusting the weights during loss convergence so as to optimize the output of small object categories, is also a feasible and effective solution.
Inspired by the above discussion and analysis, this paper intuitively and empirically proposes a novel but simple insight to mitigate the scale-induced bias. Specifically, we introduce a hierarchical unbiased loss function (HU-Loss) to tune the weight of each category during training. In HU-Loss, we apply the principal component analysis (PCA) method (Huang et al., 2022; Sabzi et al., 2013) to remove the bias between the background and the small object categories, and then obtain the initial weight of each category in the loss function. The PCA method can find feature representations in the data that are easier for people to understand and can speed up the processing of valuable information. These features can highlight small object representations in large-scale remote sensing image data, and thus effectively alleviate the bias between the background and the small object categories. Considering the bias between the small object categories, we further propose an emendation vector to alleviate the imbalance between each small object category. To sum up, our primary contributions can be summarized as follows: (1) We rethink the constraints and specialness of the SOS task. The inherent gap between human visual cognition and machine behavior still raises challenges for the SOS task. Beyond designing stacked network structures, optimizing the training process is also an ideal way to improve segmentation accuracy, which sheds light on future works. (2) We tune the training process by optimizing the loss function. The proposed unbiased loss function utilizes a hierarchical approach to alleviate the scale-induced bias, which can prevent the performance of the SOS task from collapsing. (3) We validate the proposed hierarchical unbiased loss function on several high-profile deep learning-based segmentation networks. Compared to the baselines, these structures achieve encouraging improvements when combined with the proposed loss function.

Semantic segmentation
Semantic segmentation is a foundational task in the computer vision community, devoted to labelling each pixel within an image according to a set of pre-given categories (Chen et al., 2022; Yuan et al., 2013). Conventional statistical pattern recognition approaches (Tuia et al., 2012; Yi et al., 2012; Zhang et al., 2014) are able to utilize low-level features to generate semantic representations. However, these methods are sensitive to hand-crafted characteristics. In recent years, tremendous advancement has been made thanks to the convolutional neural network (CNN). The fully convolutional network (FCN) adapts to the dense prediction task by replacing fully connected layers with convolutional ones, and the FCN and its extensions (Long et al., 2015; Schuegraf & Bittner, 2019; Tian et al., 2021) represent a milestone breakthrough in this field. Multiple network designs have brought some delicate techniques, for instance, skip connections (Li, Zheng, Duan et al., 2022; Ronneberger et al., 2015), atrous convolution (Chen et al., 2017, 2018; Zhao et al., 2017), attention mechanisms (He et al., 2022; Vaswani et al., 2017; Woo et al., 2018), and encoder-decoder structures (Chen et al., 2021; Long et al., 2015; Wang et al., 2021). In this paper, we also consider the generalization ability of the proposed method with respect to some basic semantic segmentation networks.
In recent years, the Transformer has demonstrated great potential in global information modelling (Song et al., 2023; Wang et al., 2023). In the SOS task, small objects are insensitive to global features due to the tiny size of the objects and the weak correlation between them. Therefore, local features have a greater influence on small objects.

Small object segmentation
The development of satellite and sensor imaging technology makes it possible to capture high-resolution remote sensing images. These images contain clearer and more accurate information about ground objects, which makes the SOS task feasible in large-scale remote sensing scenes (Chong et al., 2022; Neupane et al., 2021). Compared with the conventional segmentation task, the SOS task is more likely to suffer from object identification problems. Many attempts in this field directly utilize or modify available CNN structures without taking into account the intricacy of remote sensing data and the SOS task. This restricted view limits the performance of the SOS task to a certain extent. To alleviate this issue, extensive efforts have been made to design novel network architectures. For example, Chong et al. (2022) observed that small objects have a greater chance of being completely obscured by masks. In order to distinguish these small objects and refine their boundaries, a context union edge network was proposed and achieved state-of-the-art results. Ma et al. (2022)

Methodology
Existing advanced network designs over-rely on hardware resources and lack generalization capability, which weakens their ability to resolve the scale-induced bias. The proposed HU-Loss effectively mitigates the scale-induced bias and provides better generalization capability. In this section, we introduce the hierarchical idea and the new adapted loss function in Section 3.1 and Section 3.2, respectively.

Hierarchical solution
As shown in Figure 2, the scale-induced bias consists of two layers, which presents new challenges for any solution. Solving multi-level biases simultaneously faces considerable challenges and restrictions; for instance, the same operation is not necessarily applicable to different sample scales, and each of the two biases involves a problem of a different scale.
Fortunately, a hierarchical solution can address exactly these two layers of bias. The hierarchical idea decomposes the intricate work and then assigns sub-parts to specialized units, which allows the layers to be independent of each other, so that changes in one layer do not affect the other (An et al., 2020; Bongiorno et al., 2022; Kuang et al., 2021). Specifically, we divide the whole system into two subsystems, which are used to remove biases at different levels. One subsystem is dedicated to eliminating the bias between the background and the small object categories, while the bias between the small object categories is tackled by the other. In Section 3.2, we detail the proposed loss function based on this hierarchical solution.

HU-loss function
The cross-entropy (CE) is derived in the field of machine learning as the difference between the predicted probability distribution generated in training and the true probability distribution (Abdollahi et al., 2021; Abid et al., 2021). However, the SOS task raises new challenges for the conventional CE loss function. The traditional CE loss function treats each category within a large-scale remote sensing scene equally, which may lead the optimizer to focus on deriving small errors for pixels that belong to the background category.
The proposed HU-Loss function assigns a weight to each category, which alleviates the scale-induced bias. The proposed Loss_hu can be derived as

Loss_hu = − Σ_{c=1}^{n} w_c · y_c · log(p_c),   (1)

where n denotes the total number of categories (1 ≤ c ≤ n), p_c ∈ [0, 1] is the network's estimated probability for category c with label y_c = 1, y_c ∈ {0, 1} is a symbolic function that designates the ground-truth class, and w_c represents the weight of category c. As presented in Figure 3, (a) is the optimization process using CE Loss, and (b) is the process using HU-Loss. Although both achieve convergence, (a) ignores the small but key object A. Obviously, the convergence of the loss in (a) is mainly due to the large-scale white background regions being identified; the key object A contributes less to the overall loss. In fact, (a) finds it extremely hard to capture small objects, which poses a great challenge for the SOS task. In contrast, after introducing HU-Loss in (b), the misjudgment of the small object A is directly reflected by an elevated loss value, which further enables the optimizer to improve the optimization output. In the 3rd, 4th and 5th steps of the optimization process, overlooking small object A causes fluctuations in the loss regression process, which enables the optimizer to adjust accordingly in the next stage. In this way, armed with the proposed HU-Loss, models pay more attention to small objects.
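A minimal sketch of the weighted pixel-wise cross-entropy in equation (1) is given below. The function name and the per-pixel probability/label layout are illustrative assumptions, not the paper's implementation.

```python
import math

def hu_loss(probs, labels, weights):
    """Weighted pixel-wise cross-entropy in the spirit of equation (1):
    for each pixel, the loss is -w_c * log(p_c) for the ground-truth class c,
    averaged over all pixels.

    probs   : per-pixel class-probability vectors (each sums to 1)
    labels  : per-pixel ground-truth class indices
    weights : per-class weights w_c (all 1.0 recovers plain CE)
    """
    total = 0.0
    for p_vec, c in zip(probs, labels):
        total -= weights[c] * math.log(p_vec[c])
    return total / len(labels)
```

With uniform weights this reduces to the standard CE loss; raising w_c for a rare small-object class increases its contribution to the overall loss, so misjudging it is no longer hidden by the dominant background pixels.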
An overview of the workflow for calculating the hierarchical unbiased weight w_c is shown in Figure 2. First, we compute pixel-level statistics for the training samples, and the statistics M serve as the input of the first layer. The first layer is utilized to eliminate the bias between the background and the small object categories. In this layer, the PCA method is employed to analyze M and obtain the initial weights w′_s and w′_b. In our model, we consider the different categories as different feature dimensions, and the PCA-based method can analyze and define the weight value of each feature dimension. w′_s, w′_b and M then serve as the input of the second layer.
The output of the first layer can be defined as a set W′ = {w′_1, . . ., w′_n}, where the n-th category is the background; w′_s and w′_b are determined according to equation (2). In the second layer, we propose an emendation vector Ψ to remove the bias between the small object categories and obtain the final weight w_c. The statistical result of M in the second layer can be expressed as p_c = λ(M), where λ(·) represents the statistical process and p_c is the proportion of category c among all small object categories. One of the primary operations is to define Ψ: each emendation component ψ_c of Ψ is derived from these proportions, where θ(·) denotes summing the elements of a set. The output of the second layer can be defined as a set W = {w_1, . . ., w_n}, and w_c is determined from w′_s, w′_b and Ψ. The final weights, w_1, . . ., w_n, can be combined with equation (1) to determine the HU-Loss function.
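The two-layer weighting can be sketched as follows. The paper's first layer uses a PCA-based analysis of the pixel statistics M whose exact form is not reproduced here, so this sketch substitutes a simple inverse-frequency split between the background and the pooled small-object classes; the second-layer emendation likewise uses one plausible normalization. All names and formulas below are illustrative assumptions.

```python
def hierarchical_weights(pixel_counts):
    """Sketch of the two-layer weighting.

    pixel_counts: per-category pixel totals; the last entry is the background.
    Layer 1 (stand-in for the paper's PCA-based analysis): inverse-frequency
    weights for the background vs. the pooled small-object classes.
    Layer 2: an emendation factor that rebalances the small-object classes
    by their share of small-object pixels.
    """
    small = pixel_counts[:-1]
    bg = pixel_counts[-1]
    total = sum(pixel_counts)
    small_total = sum(small)
    # layer 1: initial weights w'_s (shared by small classes) and w'_b
    w_s = total / small_total
    w_b = total / bg
    # layer 2: proportion p_c of each small class among the small objects,
    # then an emendation component psi_c that boosts the rarer classes
    props = [c / small_total for c in small]
    mean_p = sum(props) / len(props)
    psi = [mean_p / p for p in props]
    return [w_s * e for e in psi] + [w_b]
```

The returned list can feed directly into a weighted cross-entropy: rarer small-object classes receive larger weights than both the common small-object classes and the background.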
Higher entropy implies higher uncertainty in the network output, indicating that the network has not learned well. In contrast, lower entropy implies lower uncertainty in the network output and a more accurate estimation. After assigning a weight to each category, the training process can pay more attention to small objects while reducing the loss.

Dataset and evaluation index
In order to assess the effectiveness of the proposed HU-Loss, validation is conducted on two benchmark large-scale remote sensing scene datasets.

iSAID
This dataset was revised from a large-scale detection dataset, and it is dedicated to the small object semantic segmentation task. The iSAID dataset (Ma et al., 2022) provides pixel-level annotation for the images in the DOTA dataset (Xia et al., 2018), correcting the labeling errors in DOTA. This dataset fully reflects the common features and scale distribution differences in remote sensing images. The size of the images ranges from 12,029 × 5014 to 455 × 387. Due to the limitation of cache memory, we divide the original images into 512 × 512 patches during training.
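One simple way to carry out the 512 × 512 cropping is sketched below; the shift-back handling of the last row and column is an assumption, as the paper does not state how borders (or images smaller than the tile, which would need padding) are handled.

```python
def tile_grid(height, width, tile=512):
    """Top-left corners of tile x tile crops covering an image.
    The last row/column is shifted back so every tile stays inside the
    image (partially overlapping the previous tile when needed)."""
    ys = list(range(0, max(height - tile, 0) + 1, tile))
    xs = list(range(0, max(width - tile, 0) + 1, tile))
    if ys[-1] + tile < height:
        ys.append(height - tile)   # cover the bottom border
    if xs[-1] + tile < width:
        xs.append(width - tile)    # cover the right border
    return [(y, x) for y in ys for x in xs]
```

For a 1000 × 1200 image this yields a 2 × 3 grid of crops, each fully inside the image bounds.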

ISPRS Vaihingen
The ISPRS Vaihingen dataset has 33 images, each with three bands corresponding to the near-infrared, red, and green wavelengths, and the ground sampling distance is 9 cm (Zhao et al., 2021). This dataset was captured by an airborne camera and was previously used for the HRI semantic segmentation task. After further processing, houses and vehicles are considered small objects, while vegetation and road surfaces are considered background.
Accuracy and efficiency are significant indices for the evaluation of the SOS task. Therefore, OA, IoU and mIoU are selected as objective indexes (Li, Zheng, Zhang et al., 2022; Zhang, Jiang et al., 2022). OA represents the proportion of all pixels in a prediction map that match the corresponding category in the ground truth, and intersection over union (IoU) is the standard index of semantic segmentation. mIoU is the mean of IoU over all categories, which can be calculated as

mIoU = (1/n) Σ_{c=1}^{n} x_cc / (Σ_{d=1}^{n} x_cd + Σ_{d=1}^{n} x_dc − x_cc),

where x_cd is the number of pixels of category c predicted as category d, and n is the number of categories.
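The mIoU computation from a confusion matrix, as described above, can be sketched as:

```python
def miou(conf):
    """conf[c][d] = number of pixels of ground-truth class c predicted as d.
    IoU_c = x_cc / (sum_d x_cd + sum_d x_dc - x_cc);
    mIoU is the mean of IoU_c over all classes."""
    n = len(conf)
    ious = []
    for c in range(n):
        inter = conf[c][c]                                   # true positives
        union = sum(conf[c]) + sum(row[c] for row in conf) - inter
        ious.append(inter / union)
    return sum(ious) / n, ious
```

For example, a perfect diagonal confusion matrix gives mIoU = 1.0, while off-diagonal entries reduce each per-class IoU through the enlarged union term.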
However, mIoU is not fully suitable for all SOS tasks, because the inherent bias also affects the evaluation metrics: the simple categories, i.e. the background, tend to mislead the overall evaluation, whereas we prefer to accurately evaluate the result of the SOS task. Motivated by this, this paper experimentally proposes a new evaluation metric, called the small object unbiased intersection over union (sIoU), which is dedicated to assessing the result of SOS. In sIoU, we ignore the background and compute the metric over the small object categories only. However, since the bias within the small objects still exists, we introduce the emendation vector Ψ again, i.e. Eq. (5). The sIoU is thus expressed as the Ψ-weighted IoU over the small object categories.
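One plausible reading of sIoU, a Ψ-weighted mean of the per-class IoU over the small object categories only, is sketched below; the exact normalization in the paper's equation may differ, so this is an assumption rather than the definitive formula.

```python
def siou(small_ious, psi):
    """Background-excluded, emendation-weighted IoU.

    small_ious : per-class IoU of the small object categories (background
                 already dropped)
    psi        : the corresponding emendation components of the vector Psi
    Returns the psi-weighted mean of the small-object IoUs.
    """
    return sum(w * i for w, i in zip(psi, small_ious)) / sum(psi)
```

With uniform emendation components this collapses to a plain mean over the small object classes; non-uniform components shift the metric toward the rarer small-object categories.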

Comparison experiments

iSAID
In this section, we employ several popular segmentation networks (e.g. U-Net (Ronneberger et al., 2015), FCN8s (Long et al., 2015), PSPNet (Zhao et al., 2017), and Segmenter (Strudel et al., 2021)) to validate the effectiveness and generalization ability of the proposed HU-Loss function. Under the premise of the same network structure, this paper compares the experimental results obtained by adopting the conventional CE loss function and the proposed HU-Loss function in training, respectively. Furthermore, the selected networks are based on different backbones, which explicitly illustrates that the proposed HU-Loss has outstanding robustness and generalization ability. The segmentation involves six categories (i.e. background, small vehicle, large vehicle, plane, helicopter, and ship).
For brevity, the following abbreviations are used: BG (background), SV (small vehicle), LV (large vehicle), and HL (helicopter).
The results of the quantitative analysis of the different models are presented in Table 1. As shown in Table 1, directly employing existing networks for the small object segmentation task yields unsatisfactory results. The initial input of U-Net is a 3-channel image with four scale depths. ResNet and VGG backbones are selected for FCN8s and PSPNet, respectively. The ResNet backbone has better accuracy and inference speed, while VGG has a simpler structure, and a combination of several small filters is superior to one large filter. U-Net achieves the worst result, 11.63% lower in mIoU than the same network with HU-Loss. Although FCN8s performs fairly well, FCN8s with HU-Loss achieves a gain of 11.29% in mIoU. It is noticed that the gain of PSPNet in mIoU is slightly smaller than that of the other two networks, but a gain of 4.02% in mIoU is still a satisfactory result. Notably, in terms of OA, the addition of HU-Loss brings improvement. However, in PSPNet, the baseline with HU-Loss does not achieve gains, as some categories are adversely affected. The performance of the different models in each category is listed in Table 2.
U-Net performs poorly due to its shallow layers, especially for the helicopter category. However, U-Net with HU-Loss achieves relatively good results: the gain of 29.11% in the helicopter category is an inspiring result. FCN8s with HU-Loss offers a fairly comprehensive improvement over FCN8s: it improves in all small object categories after combining with HU-Loss. PSPNet with HU-Loss has also made some progress, but it suffers some adverse effects in the large and small vehicle categories after binding HU-Loss. The expansion of the receptive field makes the network fail to perceive some small objects, which makes these objects difficult to distinguish in large-scale remote sensing scenes. This issue will be part of our future work. In the SOS task, small objects are insensitive to global features due to the tiny size of the objects and the weak correlation between them. Therefore, local features have a greater influence on small objects, which leads to the unsatisfactory performance of Transformer-based models on this task.
To intuitively present the differences with and without HU-Loss, we select two typical results from an airport scene and a harbor scene, respectively. The results of the qualitative analysis are shown in Figures 4 and 5.
As shown in Figure 4, it can be seen that the small objects are separately distributed. The plane category is relatively easy to recognize, and the segmentation network with HU-Loss achieves better and more refined performance on it. However, the small and large vehicle categories are difficult to distinguish, which leads to severe performance degradation. It is noticed that the small and large vehicles are very small in the large-scale remote sensing scene and are often overlooked by conventional methods. With the adoption of HU-Loss, the recognition of these small objects can be improved. Furthermore, the helicopter category remains a challenge, since helicopters are scarce in this scene. U-Net directly misjudges all helicopters. Fortunately, U-Net with HU-Loss identifies the helicopter within this scene, which explicitly shows that HU-Loss is a significant refinement of the conventional approach.
As shown in Figure 5, we present the segmentation results in the harbor scene. Unlike planes, ships are more diverse in appearance, and their distribution is irregular, which raises new challenges for the segmentation of ships. FCN8s misclassifies the ships as large vehicles because the ships and large vehicles have similar colors. FCN8s with HU-Loss can accurately segment the ships from the large-scale remote sensing scene, and HU-Loss plays a key role in this. The ship category is scarce in the training samples, which reveals that plentiful biased samples mislead the model optimization. However, U-Net with HU-Loss achieves relatively poor performance here, which will be investigated as part of our future work.

ISPRS Vaihingen
In the previous comparison experiments on the iSAID dataset, the conventional structures with HU-Loss are able to recognize tricky small objects, while the baseline networks are hardly able to do so. Further, due to the different properties of each remote sensing dataset (resolution, band combinations, etc.), some baseline networks may already produce satisfactory output on a dataset, yet there is still room for improvement. Beyond substantially improving weak experimental results, further refining outputs that already have a strong foundation is also an important manifestation of the generalization ability of a method. Therefore, we further conduct experiments on the ISPRS Vaihingen dataset, owing to its popularity, categories and band combinations.
Tables 3 and 4 present the results of the quantitative analysis on the Vaihingen dataset. From an overall perspective, the structures used in the comparison experiments all exhibit different degrees of improvement after combining with HU-Loss. U-Net and FCN8s improve on all indexes after combining with HU-Loss, where FCN8s gains 2.26% in sIoU and U-Net gains 8.81% in sIoU, which is a considerable experimental output. After combining PSPNet with HU-Loss, there is a satisfactory improvement in both mIoU and sIoU. However, it fluctuates slightly in OA, which is still due to the effect of the atrous convolution on small objects. The performance of Segmenter is slightly lower than that of the other methods. As mentioned in Section 4.2.1, global features and local features have different impacts on SOS tasks. Segmenter w/ HU-Loss still makes considerable improvements in the car category, whose objects are tiny. How to overcome the effects of atrous convolution and Transformer-based methods on small objects is a problem that we need to continue studying in the future.
Specifically, as shown in Table 4, almost every category is satisfactorily enhanced. In particular, the vehicle category gains the most significant enhancement, due to the fact that the proposed method allows the model to give vehicles more attention during training. U-Net with HU-Loss gains 9.17% in the vehicle category, which is an uplifting improvement. However, after PSPNet is combined with HU-Loss, the background category performs poorly, which will be part of our future work.
As presented in Figure 6, we show the qualitative results on ISPRS Vaihingen. Overall, the individual models are able to identify most of the houses and vehicles. There are many cases of misclassified houses in FCN8s and PSPNet. In (g) and (h), the baseline network, when combined with HU-Loss, differentiates the houses more accurately, as marked by the yellow boxes. Meanwhile, as marked by the yellow boxes in (c), (d), (e) and (f), FCN8s and U-Net are more accurate in distinguishing tiny objects in large-scale remote sensing scenarios. These tiny objects, i.e. vehicles, though small, are indeed the key objects for the SOS task. Through this qualitative analysis, we can see that the proposed method is effective.
In summary, even for original experimental outputs with a strong foundation, the proposed HU-Loss still provides continuous refinement, which is important for further improving the accuracy of the SOS task.

The analysis of small objects and background division effects
As analyzed previously, due to the uniqueness of the SOS task, the key to optimizing its output is to make the model focus more on the small objects rather than the background, which effectively mimics the human visual sensitivity to small objects. To verify the contribution of HU-Loss to such imitation, we analyze the output features of FCN8s for the port scene as an example. Specifically, we select the small object channels in the output feature map for visualization; in the visualization result, the brighter an area, the higher the probability that it belongs to the small objects. Conversely, the darker the area, the lower the probability. The visualization results are shown in Figure 7.
It is obvious that the overall result is much improved after the introduction of HU-Loss. Although both the baseline (c) and the baseline with HU-Loss (d) are able to identify small objects, the baseline model is still prone to focusing on the background, which degrades performance. In addition, in the top right of the scene, the baseline confuses the background with the ship, which leads directly to a misjudgment of these small objects. Furthermore, the baseline feature map visualization is brighter overall, which means that the baseline model does not separate the small objects from the background well. Unlike the general HRI segmentation task, SOS is more susceptible to background interference, which can cause the segmentation of small objects to collapse. In contrast, the baseline combined with HU-Loss achieves better performance. In Figure 7(d), it can be seen that the small objects are marked and the background is effectively restricted, i.e. the background area is darker, which is the more ideal state for the SOS task, where the model pays more attention to small objects.
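The channel-selection visualization described above can be sketched as follows: a softmax over each pixel's logits, then a sum over the small-object channels to obtain the per-pixel "brightness". The flat per-pixel list layout and the function name are illustrative assumptions.

```python
import math

def small_object_prob_map(logits, small_channels):
    """Per-pixel probability of belonging to ANY small-object class.
    For each pixel: numerically stable softmax over its class logits,
    then sum the probabilities of the selected small-object channels.
    Higher values correspond to brighter areas in a Figure-7-style map."""
    out = []
    for pix in logits:
        m = max(pix)                       # subtract max for stability
        exps = [math.exp(v - m) for v in pix]
        s = sum(exps)
        probs = [e / s for e in exps]
        out.append(sum(probs[c] for c in small_channels))
    return out
```

Applied to a full feature map, the resulting values can be rendered as a grayscale image in which small objects appear bright and a well-restricted background appears dark.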
Overall, the proposed HU-Loss helps the original model identify small objects in large-scale remote sensing images more effectively, while also providing some constraints on the background. In this way, the original structures, equipped with HU-Loss, are better able to perform the SOS task.

The analysis of experimental process
In this section, we further discuss the detailed progress of the experiments, which verifies the effectiveness of the proposed HU-Loss. The iSAID dataset is used as an example for this analysis, and the conventional mIoU and the sIoU newly proposed in this paper serve as the indicators. We illustrate the regression process for small objects and the stability of each method combined with HU-Loss. The detailed experimental results are shown in Figure 8.
As shown in Figure 8, at almost every node the experimental output from the baseline with HU-Loss is better than that without it, which explicitly indicates that the proposed HU-Loss can be well incorporated into these deep learning-based architectures and plays a positive role. In addition, we can see that as the number of iterations increases, the experimental outputs become better, which means that HU-Loss does not degrade or destabilize the original network. It is noticed that the fluctuation of PSPNet is somewhat large. This is because the atrous convolution in PSPNet ignores many small objects, which makes it fit faster but remain very unstable afterwards. Moreover, the proposed HU-Loss makes sIoU continuously approach mIoU, which is a good indication that HU-Loss narrows the gap between the background and the small objects in the loss, i.e. HU-Loss makes the regression process pay more attention to small objects.
In conclusion, the proposed HU-Loss plays a positive and significant role when incorporated with conventional deep learning structures; HU-Loss enables these networks to alleviate the bias against small objects during training and thus effectively improves the SOS task, which significantly narrows the gap between human visual perception and machine behavior.

Conclusion
In this paper, we proposed an unbiased loss function to alleviate the scale-induced bias in the SOS task. A hierarchical solution was employed to mitigate the two sub-problems in the scale-induced bias. Moreover, we extended the proposed method to a more challenging generalized setting and produced encouraging improvements compared to the baselines.
In the future, we hope that our work may shed fresh light on small object interpretation and inform the design of loss constraints for weak object detection, weak information extraction, special feature recognition, etc. At the same time, how to balance the segmentation accuracy between the background and small objects will be an issue to address in the next step of our work. In addition, how to make Transformer-based models more suitable for the SOS task is also a problem that we need to continue investigating in the future.

Figure 1. The explanation of the scale-induced bias. (a) the bias between the background and the small object category. (b) the bias between the small object categories; the value is the proportion of each category within the small objects.

Figure 2. Workflow of calculating the hierarchical unbiased weight. (a) the first layer, which is used to eliminate the bias between the background and the small object category; this layer yields the initial weights w′_s and w′_b. (b) the second layer, which is used to eliminate the bias between the small object categories; in this layer, the emendation vector Ψ is introduced to obtain the final weights, w_1, . . ., w_n, for each category.

Figure 3. Sketches of the two loss function optimization processes. (a) CE Loss, and (b) HU-Loss. The pink and gray objects in the figure are small objects that are identified.

Figure 7. The feature map visualization results of FCN8s for the port scene. (a) original image, (b) reference binary chart (black is the background, white is the small objects), (c) FCN8s, and (d) FCN8s w/ HU-Loss.

Table 1. Performance comparison of the different models on iSAID. Baseline means the original network with CE Loss, and green (+XX) and red (−YY) respectively represent the performance improvement or decrease caused by HU-Loss.

Table 2. Performance comparison of the different models per class on iSAID. Baseline means the original network with CE Loss, and green (+XX) and red (−YY) respectively represent the performance improvement or decrease caused by HU-Loss.

Table 3. Performance comparison of the different models on ISPRS Vaihingen. Baseline means the original network with CE Loss, and green (+XX) and red (−YY) respectively represent the performance improvement or decrease caused by HU-Loss.

Table 4. Performance comparison of the different models per class on ISPRS Vaihingen. Baseline means the original network with CE Loss, and green (+XX) and red (−YY) respectively represent the performance improvement or decrease caused by HU-Loss.