Ship detection and identification in SDGSAT-1 glimmer images based on the glimmer YOLO model

ABSTRACT Remote sensing technology has been widely used for marine monitoring. However, due to the limitations of sensor technologies and data sources, effective monitoring of marine ships at night remains challenging. To address these challenges, our study developed SDGST, a high-resolution glimmer marine ship dataset from the SDGSAT-1 satellite, and proposed a ship detection and identification method based on the YOLOv5s model: the Glimmer YOLO model. Considering the characteristics of glimmer images, our model makes several effective improvements to the original YOLOv5s model. In particular, the improved model incorporates a new layer for detecting small targets and integrates the Coordinate Attention (CA) mechanism. To enhance the original feature fusion strategy, we introduced the Bi-directional Feature Pyramid Network (BiFPN). We also adopted the EIOU loss function and replaced the initially defined anchors with clustering results, thus improving detection performance. The mean Average Precision (mAP) reaches 96.7%, a 5.1% improvement over the YOLOv5s model. Notably, the model significantly improves the detection of small ships. It demonstrates superior performance in ship detection under glimmer conditions compared to the original YOLOv5s model and other popular target detection models, and may serve as a valuable reference for achieving high-precision nighttime marine monitoring.


Introduction
China boasts a vast marine territory and rich marine resources. All-day monitoring of maritime activities, such as fisheries and shipping, is of great significance for ocean exploration and the national economy. With the rapid development of remote sensing and target detection technologies, marine ship detection using high-resolution remote sensing images plays a vital role in various fields. Extensive studies have been conducted on marine ship detection and identification using synthetic aperture radar (SAR) images, optical remote sensing images and infrared remote sensing images (Cao et al. 2020; Dong, Liu, and Xu 2018; Liu, Ma, and Chen 2018; Liu et al.). However, nighttime ship detection based on glimmer remote sensing still faces the following challenges: 1. The resolution of conventional glimmer remote sensing sensors cannot meet the requirements for precise detection of marine targets. At present, the commonly used glimmer remote sensing sensors mainly include the Operational Linescan System (OLS) carried by the U.S. Defense Meteorological Satellite Program (DMSP) satellites, with a resolution of 2700 m; the Visible Infrared Imaging Radiometer Suite (VIIRS) carried by the Suomi National Polar-orbiting Partnership (NPP) satellite, with a resolution of about 750 m; and the LJ1-01 satellite developed by Wuhan University, which carries a high-sensitivity night-light camera with a resolution of 130 m. The resolution of these sensors is not sufficient to accurately detect and identify marine targets, as the length and width of ships range from tens to hundreds of meters. 2. Lack of glimmer imaging datasets for marine targets. Compared to other types of remote sensing detectors, glimmer sensors are a relatively recent development and few in number. This has resulted in a lack of high-resolution glimmer imaging datasets for marine targets, posing a challenge for further research. 3. Glimmer images are susceptible to disturbances such as interference from clouds and atmospheric noise, which can affect the identification of targets.
To solve the above problems, we developed a high-resolution representative marine target dataset, SDGST, based on SDGSAT-1 glimmer remote sensing data. We also designed a marine target detection model, Glimmer YOLO, which utilizes glimmer images and is based on the YOLOv5s model. Our main contributions are as follows: 1. We developed the SDGST dataset, a representative marine target dataset at 10 m resolution, on the basis of SDGSAT-1 glimmer images. The SDGST dataset consists of 2038 images at 10 m resolution, containing a total of 7553 marine ship targets. This effort addresses the shortage of glimmer imaging datasets and serves as a valuable prior dataset for the detection and identification of marine ships under low-light conditions. 2. We made a number of improvements to the YOLOv5s model, including a new detection layer and an attention mechanism. We also improved the feature fusion strategy and the loss function, and designed the Glimmer YOLO model to replace preset anchors with clustered anchors. These improvements aim to reduce the model's receptive field, enhance the feature representation, and improve the accuracy of loss backpropagation, thus making the model more suitable for marine ship detection and identification in glimmer images. 3. Using the Glimmer YOLO model, we achieved the detection and identification of a variety of marine targets in multiple nighttime scenarios. Notably, this method greatly improves the detection of small targets, resulting in a multi-class mean Average Precision (mAP) of 96.7%. Its detection performance exceeds that of the original YOLOv5s model and other popular target detection models, and it can serve as a valuable reference for the monitoring and management of nighttime marine activities.
The rest of the paper is structured as follows. In Section 2, we review previous relevant research in the field of marine target detection. Section 3 provides an overview of the data used, describes the SDGST dataset, and introduces the Glimmer YOLO method. In Section 4, we perform experiments and analyze the results. Section 5 discusses the results of the Glimmer YOLO model in comparison to other models, as well as the ablation experiments. The last section concludes the paper.

Relevant research
In recent years, the development of marine target detection methods has progressed through two main stages. The first stage is the early traditional ship detection methods, while the second stage features the rapid development of deep learning-based methods, which rely on massive data and multiple features to identify targets.
Traditional target detection algorithms mainly identify ships by examining their shape and texture features, or use thresholding and statistical analysis techniques for detection. Methods based on shape and texture features rely on manual feature extraction, which increases computational complexity and often leads to high false and missed detection rates. These methods are commonly used in the optical remote sensing and Synthetic Aperture Radar (SAR) domains (Kumar and Selvi 2011). Traditional glimmer remote sensing data, such as DMSP/OLS and NPP/VIIRS data, tend to have relatively low resolution and large differences between the background and the target. When using these data for ship detection based on ship lighting features, many researchers have adopted the threshold method to distinguish between the background and the target. Waluda suggested that the nighttime light radiation obtained through remote sensing primarily comes from ship lights and reflected light from the sea surface, and employed the threshold method to distinguish between the background and the target (Waluda et al. 2004). Saitoh likewise proposed that the light radiation detected by remote sensing at night comes from both ship lights and light reflected off the sea surface (Saitoh et al. 2020). In that study, DN ≥ 30 (i.e. the brightest 50%) of the DMSP/OLS visible band was used to identify ship light pixels, and a threshold of DN ≥ 46.8 was applied to detect fishing boats in the Pacific Ocean. Elvidge et al. proposed an offshore light-fishing boat identification method based on the radiative features of NPP/VIIRS nighttime remote sensing images, including peak detection and fixed-threshold segmentation (Elvidge et al. 2015). Liang et al., using LJ1-01 data as an example, combined the threshold segmentation method with the CFAR method to obtain ship positions (Zhong et al. 2020). Based on image processing analysis, these methods are relatively simple to compute. However, the extracted features can greatly affect the classification accuracy and lack robustness when dealing with multiple targets, which leads to poor detection performance.
Marine target detection based on deep learning with artificial neural networks can be broadly categorized into two main groups. One is two-stage target detection algorithms based on candidate regions, such as R-CNN (Girshick et al. 2014), Fast R-CNN (Girshick 2015), and Faster R-CNN (Ren et al. 2015). In this detection process, the Region Proposal Network (RPN) first identifies Regions of Interest (ROIs) that may contain the target, followed by further refinement and classification. While this approach achieves higher detection accuracy, it also comes with increased time complexity. The other is regression-based single-stage target detection algorithms, such as the YOLO series (Redmon et al. 2016), which directly estimate candidate targets without relying on region proposals. This approach maintains high detection efficiency at the expense of some detection accuracy. These two approaches, along with a range of improved models, have been widely used for ship detection and identification in optical remote sensing and SAR images. Nie et al. achieved the detection of warships and merchant ships in optical remote sensing images based on the Mask R-CNN model (Nie et al. 2018). Zhang et al. added a residual convolution module to the Faster R-CNN model to enhance the feature representation capacity and optimize ship detection (Zhang, Xie, and Zhang 2022). Wang et al. used the SSD model for ship detection and improved the detection performance using transfer learning (Wang, Wang, and Zhang 2018). Zhang et al. proposed a high-speed SAR ship detection method using a Depthwise Separable Convolutional Neural Network (DS-CNN), which reduces the parameters of traditional CNN convolution and improves detection speed (Zhang et al. 2019). Xu et al. proposed GWFEF-Net, which combines dual-polarization feature enrichment to take full advantage of the dual-polarization features of SAR images (Xu et al. 2022a). Chen et al. integrated a lightweight attention module, DAM, into the YOLOv3 model to enhance its focus on ship features (Chen, Shi, and Deng 2021). Huang et al. introduced the Receptive Field Block (RFB) into the YOLOv4 model to expand the receptive field and improve small-target detection (Huang et al. 2023). Han et al. optimized the YOLOv4 backbone network and designed a new receptive field expansion module, DSPP, to improve the model's robustness to target movement (Han et al. 2021). Xu et al. applied the YOLOv5 model to SAR image detection and designed a lightweight cross-stage partial module, L-CSP, to reduce the computational load and achieve accurate ship detection (Xu, Zhang, and Zhang 2022). Applying deep learning to target detection enhances accuracy in complex backgrounds, supporting real-time, precise and robust detection. However, deep learning has not yet been widely applied to glimmer imagery due to its low resolution and the scarcity of extractable features. Moreover, deep learning-based ship detection usually requires a large amount of training data. Therefore, we have developed a high-resolution glimmer remote sensing marine ship dataset and propose a ship detection and identification method applicable to it.

SDGSAT-1 satellite
Developed under the Big Earth Data Science Engineering Project of the Chinese Academy of Sciences, SDGSAT-1 is the first satellite launched by the International Research Center of Big Data for Sustainable Development Goals (CBAS). It adopts a sun-synchronous orbit with a reference altitude of 505 km and an inclination of 97.5°. It is equipped with a glimmer imager (GIU), a multispectral imager (MII), and a high-resolution, wide-swath thermal infrared spectrometer (TIS). It offers multiple observation modes, including 'thermal infrared + multispectral', 'thermal infrared + glimmer' and 'single-payload', which enable all-day, multi-payload coordinated observation. Each payload captures data across a 300 km swath, with a revisit cycle of about 11 days for ground targets. The SDGSAT-1 glimmer imager has one panchromatic band and three color bands, with spatial resolutions of 10 m and 40 m, respectively. Specific technical and performance indicators are shown in Table 1.

Dataset production
We collected 10 m resolution glimmer image data from November 2021 to October 2022, comprising 230 scenes in the Guangdong and Shanghai harbor areas along the coast of China.The data were sourced from the SDG Big Data Platform of CBAS and have undergone geometric correction, along with radiometric calibration.
In creating the dataset, we first cropped the 10 m resolution SDGSAT-1 glimmer imagery into 640 × 640 pixel tiles using a sliding window. We then used the LabelImg tool to annotate the marine targets in various scenarios, such as a pure ocean background, a riverside area, an area close to the coast, and under different weather conditions such as sunny or cloudy. To accurately label ship types, we utilized Automatic Identification System (AIS) data for marine ships, collected concurrently with the imaging. These point-tracking data include both dynamic and static ship information in real time (e.g. ship type, MMSI number, length, width, speed, and heading) and serve as the benchmark from which we can accurately identify the real type of a ship. We categorized ships into groups such as small ships, container ships, and fishing boats based on their dimensions (length and width). Small ships, due to their small size, do not have distinctive outlines and tend to appear as small dots with relatively low luminosity in the images. Container ships, in contrast, have distinct contours, and their average length significantly exceeds that of small ships. Fishing boats, despite being smaller than small ships, have distinctive appearances due to their fishing lights, which produce a broader range of light in the images. The SDGST dataset we developed includes 2038 sub-images covering 7553 diverse marine targets, as detailed in Table 2 below.
As glimmer images are susceptible to clouds and other noise, we took this factor into account during labeling by selecting a balanced mix of images with and without noise effects. To make the dataset applicable to various background scenarios, we included images with both pure marine backgrounds and land-sea interface backgrounds.

YOLOv5 model
YOLOv5, proposed by Glenn Jocher, is a widely used generation of the YOLO series of algorithms, and its framework is shown in Figure 1. Similar to earlier versions of YOLO, YOLOv5's overall framework can be divided into three parts: Backbone, Neck and YOLO Head. The Backbone, also known as CSPDarknet, is the main feature extraction network of YOLOv5. Here, the input image undergoes initial feature extraction, and the features extracted from the feature layers constitute the input image feature set. The original YOLOv5 algorithm extracts three feature layers in the Backbone, followed by feature fusion in the Neck. To achieve this, YOLOv5 adopts a strategy that combines the Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) to facilitate multi-layer feature fusion. The enhanced, effective feature layers are then passed to the YOLO Head for classification and regression. As the classifier and regressor of YOLOv5, the YOLO Head evaluates feature points to determine whether they correspond to targets. In short, the overall workflow of the YOLOv5 network consists of feature extraction, feature enhancement, and prediction of the targets corresponding to feature points. Among the YOLOv5 series, YOLOv5s is the leanest model and performs best on devices with limited computational resources.

General framework of glimmer YOLO
The Glimmer YOLO model was proposed to address the challenge of detecting small marine targets. To this end, we added a new small target detection layer to the feature extraction part, which enriches the model with more extensive position information and helps ease the difficulty of detecting small targets. We also included the Coordinate Attention (CA) mechanism module (Hou, Zhou, and Feng 2021) to enhance the presence of targets of interest on the feature map. Further, we used the K-means clustering method to obtain size clustering results for the smaller targets in the dataset and replaced the default small anchors in YOLOv5 with preselected anchors better suited to our data. In the feature fusion part, we utilized the weighted Bidirectional Feature Pyramid Network (BiFPN) (Tan, Pang, and Le 2020) fusion method to enhance multi-level feature fusion in YOLOv5. We also modified the loss function, replacing the position loss from the Complete Intersection over Union (CIOU) loss (Zheng et al. 2021) with the more advanced Efficient Intersection over Union (EIOU) loss (Zhang et al. 2022) to improve Glimmer YOLO's training performance. Our proposed Glimmer YOLO achieves a mean Average Precision (mAP) of 96.7%, which is 5.1% higher than YOLOv5s, with small marine targets particularly well detected. Its framework is shown in Figure 2.

Coordinate attention
Coordinate Attention (CA) is a novel attention mechanism proposed by Hou et al. (Hou, Zhou, and Feng 2021). Due to computational limitations, the most common attention mechanism in target detection remains the Squeeze-and-Excitation (SE) mechanism (Hu, Shen, and Sun 2018). However, it only considers channel information and ignores position information, which is even more important for detecting small targets. A number of other attention mechanisms that consider position information have been proposed, such as CBAM (Woo et al. 2018) and BAM (Park et al. 2018). However, due to the limitations of the convolutions they use, these mechanisms have higher computational complexity and require more computational resources. The CA mechanism overcomes the drawbacks of the above attention mechanisms by capturing both cross-channel information and position information, which makes the targets of interest more visible on the feature map. Moreover, the CA mechanism is flexible and lightweight, and can be readily incorporated into the network. The SE and CA mechanisms are shown in Figure 3, where C is the number of input feature channels, and H and W are the height and width of the input features, respectively. X Avg Pool refers to the global pooling operation in the horizontal direction, and Y Avg Pool refers to the global pooling operation in the vertical direction. Unlike the SE mechanism, the CA mechanism decomposes the global pooling of Eq. 1:

$z_c = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} x_c(i, j) \quad (1)$
For a given input X, we encode each channel along the horizontal and vertical coordinates using two pooling kernels of sizes (H, 1) and (1, W), respectively. The output of channel c at height h can be expressed as Eq. 2:

$z_c^h(h) = \frac{1}{W}\sum_{0 \le i < W} x_c(h, i) \quad (2)$

Similarly, the output of channel c at width w can be expressed as Eq. 3:

$z_c^w(w) = \frac{1}{H}\sum_{0 \le j < H} x_c(j, w) \quad (3)$

After these two transformations, we obtain two feature outputs that perceive both directions, which differs from the SE mechanism that generates a single feature vector. The results of the above transformations are then used to generate the coordinate attention: convolution and activation operations are applied to the spatial information in both the horizontal and vertical directions to yield the output. We added the CA module to Glimmer YOLO to enhance the prominence of various marine target features during feature extraction. Experimental results confirm that this strategy is highly effective.
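As a minimal, pure-Python sketch of the two directional pooling steps in Eq. 2 and Eq. 3 (the real implementation operates on batched multi-channel tensors and is followed by convolutions and activations), the operations can be written as:

```python
# Illustrative directional pooling from Coordinate Attention, on a single
# channel stored as a list of rows (H x W). Helper names are hypothetical.

def x_avg_pool(feature):
    """Horizontal pooling (Eq. 2): average over the width, one value per row."""
    return [sum(row) / len(row) for row in feature]

def y_avg_pool(feature):
    """Vertical pooling (Eq. 3): average over the height, one value per column."""
    h, w = len(feature), len(feature[0])
    return [sum(feature[i][j] for i in range(h)) / h for j in range(w)]

channel = [
    [0.0, 1.0, 2.0],  # H = 2, W = 3
    [3.0, 4.0, 5.0],
]
print(x_avg_pool(channel))  # [1.0, 4.0]  -> length H
print(y_avg_pool(channel))  # [1.5, 2.5, 3.5]  -> length W
```

Unlike SE's single global average, the two outputs preserve position along one axis each, which is what lets the attention map encode where a target lies.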

Small target detection layer
One reason why YOLOv5 is unfavorable for small target detection is that small samples are small in size, while YOLOv5 uses a relatively large down-sampling multiplier for feature extraction, which makes it difficult for the deeper feature maps to learn the feature information of small targets. Therefore, we added a new output to the feature fusion part after the first C3Conv in the Backbone: a feature layer with a smaller down-sampling multiplier and a smaller receptive field that carries richer feature information for small targets. As Figure 4 shows, the new feature output is fully involved in the feature fusion process in the Neck and finally yields a new detection layer. In total, four detection layers are sent to the YOLO Head for prediction. While adding a new small target detection layer increases the number of parameters to a certain extent, it is simple and efficient for the detection of small targets. Given that most targets in remote sensing images are small, we find this an acceptable trade-off between detection speed and overall detection performance.

BiFPN feature fusion strategy
Since its introduction, FPN (Lin et al. 2017) has been widely used for multi-scale feature fusion. In the feature fusion part, the YOLOv5 network combines FPN and PANet to facilitate feature fusion across different layers. While FPN treats features from different resolutions equally, the authors of BiFPN argue that they contribute differently to the fused features. To address this, Tan et al. proposed the Weighted Bidirectional Feature Pyramid Network (Tan, Pang, and Le 2020), which learns the importance of features from different layers and assigns weights accordingly. We implemented this structure in the YOLOv5 network, as shown in Figure 5.
As shown in Figure 5, circles of different colors represent different connection methods. YOLOv5 uses the torch.cat function to concatenate two feature maps, while BiFPN utilizes a unique weighted fusion method. It is worth noting that BiFPN adds a new skip connection between input and output nodes at the same scale, enabling the connection of three input features. This approach helps to fuse more features without significantly increasing the computational load. When the layer index is h, the feature connections for the same layer are calculated as shown in Eq. 4 and Eq. 5:

$P_h^{mid} = \mathrm{Conv}\left(\frac{w_1 P_h^{in} + w_2 \,\mathrm{Up}(P_{h+1}^{mid})}{w_1 + w_2 + \epsilon}\right) \quad (4)$

$P_h^{out} = \mathrm{Conv}\left(\frac{w_1' P_h^{in} + w_2' P_h^{mid} + w_3' \,\mathrm{Down}(P_{h-1}^{out})}{w_1' + w_2' + w_3' + \epsilon}\right) \quad (5)$

Here, 'mid' means that the high-level features are up-sampled and fused with the low-level features, while 'out' means that the fused low-level features are down-sampled again and then fused with the 'mid' features as well as the input features to form the outputs. We adopted BiFPN to optimize the feature fusion strategy in the Neck so that each output feature map of the Glimmer YOLO model can better reflect the actual features of the target.
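The fast normalized weighted fusion at the heart of BiFPN can be sketched as follows. This is an illustrative stand-in: 1-D lists replace feature maps, the weights are fixed rather than learned, and the convolution applied after fusion is omitted:

```python
def fuse(inputs, weights, eps=1e-4):
    """BiFPN-style fast normalized fusion: scale each input feature by a
    non-negative weight and normalize by the sum of the weights plus a
    small epsilon for numerical stability."""
    total = sum(weights) + eps
    n = len(inputs[0])
    return [sum(w * x[i] for w, x in zip(weights, inputs)) / total
            for i in range(n)]

# Three inputs, as in the skip-connected BiFPN nodes: the layer's input
# feature, its 'mid' feature, and the re-sampled feature of a neighbor layer.
p_in, p_mid, p_neighbor = [1.0, 1.0], [2.0, 2.0], [4.0, 4.0]
print(fuse([p_in, p_mid, p_neighbor], weights=[0.5, 0.3, 0.2]))
```

Because the weights are normalized, the fusion stays bounded regardless of their magnitudes, which is what makes learning them stable in practice.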

K-means anchor
Proper preset anchors are critical to the detection performance of the YOLO and R-CNN series. While the default anchors of YOLOv5 match the size pattern of most targets, there are many small targets under low-light conditions, such as scattered fishing boats, for which the size and aspect ratio of the default anchors may not be well suited. To make the preset anchors more reasonable, we employed the K-means method to cluster all target boxes in the SDGST dataset into 12 categories, corresponding to the 4 detection layers of Glimmer YOLO, as shown in Figure 6 below.
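A toy version of the anchor clustering step might look like the following. Note that we use plain Euclidean distance over (width, height) pairs for brevity, whereas anchor clustering is often performed with a 1 − IoU distance:

```python
import random

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Toy K-means over (width, height) pairs to derive preset anchors.
    boxes: list of (w, h) tuples; returns k cluster centers, sorted."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assign each box to its nearest center (squared Euclidean distance).
        for w, h in boxes:
            j = min(range(k), key=lambda i: (w - centers[i][0]) ** 2
                                            + (h - centers[i][1]) ** 2)
            clusters[j].append((w, h))
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

boxes = [(4.0, 4.0), (5.0, 5.0), (40.0, 40.0), (42.0, 42.0)]
print(kmeans_anchors(boxes, 2))  # [(4.5, 4.5), (41.0, 41.0)]
```

In our setting the resulting centers would replace the preset anchor widths and heights, with 3 anchors assigned to each of the 4 detection layers.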
Through preliminary experiments, we found that the default anchor boxes performed well on larger targets (e.g. large ships and offshore platforms) in the SDGST dataset. Therefore, we chose to replace only the anchors associated with low-level features in the original anchor set, which are responsible for predicting small targets, while the anchors associated with high-level features remained unchanged.

EIOU loss
YOLOv5's loss function consists of three parts: position loss, confidence loss, and category loss. The position loss uses the CIOU loss function, which quantifies the disparity between the actual and predicted anchor boxes through bounding box regression (BBR) (Zheng et al. 2021). CIOU Loss takes into account three geometric factors: the overlap between anchor boxes, the distance between their centroids, and their aspect ratio. It solves certain problems found in the earlier IOU Loss (Yu et al. 2016) and GIOU Loss (Rezatofighi et al. 2019). However, in CIOU Loss the aspect ratio is represented as a relative value, which is somewhat ambiguous and fails to reflect the true differences in width and height and their respective confidences. The problem arises when the model converges to a state in which the predicted box's aspect ratio matches that of the real box: as shown in Figure 7, in this state the width and height of the predicted box cannot be increased or decreased at the same time, which hinders further regression optimization (Zhang et al. 2022). Based on this, Zhang et al. proposed the Efficient Intersection over Union (EIOU) Loss, which solves this problem by splitting the aspect-ratio factor of CIOU Loss and penalizing the width and height differences between the predicted and actual boxes separately. Figure 7 shows how CIOU Loss and EIOU Loss are calculated. The EIOU regression loss is given in Eq. 6:

$L_{EIOU} = L_{IOU} + L_{dis} + L_{asp} = 1 - IOU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{c_w^2} + \frac{\rho^2(h, h^{gt})}{c_h^2} \quad (6)$

As can be seen in Eq. 6, the EIOU Loss consists of three parts: the IOU loss, the centroid distance loss, and the width-height loss, where b and b^gt represent the centroids of the predicted and actual boxes, while h, w, h^gt, and w^gt are the heights and widths of the predicted and actual boxes. ρ²(b, b^gt) denotes the squared Euclidean distance between the two centroids, while ρ²(h, h^gt) and ρ²(w, w^gt) represent the squared differences in the heights and widths of the predicted and actual boxes. c² stands for the squared diagonal of the minimum bounding box that covers both the predicted and actual boxes, and c_w² and c_h² are the squared width and height of this minimum bounding box. Calculated in this manner, the EIOU Loss enables faster model convergence and provides better localization performance than CIOU Loss. As many small targets are present in the SDGST dataset, we adopted EIOU Loss, known for its better localization capacity, to enhance the detection performance on these small targets.
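A minimal sketch of the EIOU computation for axis-aligned boxes in (x1, y1, x2, y2) format (an illustration of the loss terms, not the training code) is:

```python
def eiou_loss(box, box_gt):
    """EIOU loss for two axis-aligned boxes (x1, y1, x2, y2):
    IOU term + normalized centre-distance term + normalized
    width/height difference terms."""
    x1, y1, x2, y2 = box
    g1, h1, g2, h2 = box_gt
    # Intersection and union for the IOU term.
    iw = max(0.0, min(x2, g2) - max(x1, g1))
    ih = max(0.0, min(y2, h2) - max(y1, h1))
    inter = iw * ih
    area = (x2 - x1) * (y2 - y1)
    area_gt = (g2 - g1) * (h2 - h1)
    iou = inter / (area + area_gt - inter)
    # Smallest enclosing box: width, height and squared diagonal.
    cw = max(x2, g2) - min(x1, g1)
    ch = max(y2, h2) - min(y1, h1)
    c2 = cw ** 2 + ch ** 2
    # Squared centre distance rho^2(b, b_gt).
    rho2 = ((x1 + x2) / 2 - (g1 + g2) / 2) ** 2 + \
           ((y1 + y2) / 2 - (h1 + h2) / 2) ** 2
    # Squared width and height differences, each normalized separately.
    dw2 = ((x2 - x1) - (g2 - g1)) ** 2
    dh2 = ((y2 - y1) - (h2 - h1)) ** 2
    return (1 - iou) + rho2 / c2 + dw2 / cw ** 2 + dh2 / ch ** 2

print(eiou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for a perfect match
```

Because the width and height terms are penalized independently, a predicted box whose aspect ratio already matches the ground truth can still shrink or grow each side, which is exactly the CIOU degenerate case described above.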

Data preprocessing
Given that deep learning models require a large amount of data, we performed data enhancement on the SDGST dataset in order to further enrich it and increase its complexity and diversity. All the targets in the SDGST dataset are marine targets, so we adopted mirroring and multi-angle rotation for data enhancement to simulate the various poses of ships at sea, as depicted in Figure 8 below. The SDGST dataset initially contained 2038 images and 7553 targets. After data enhancement, the dataset was expanded to 12,228 images and 45,318 targets, as detailed in Table 3 below. Importing a larger dataset into the training process can improve the detection and generalization capabilities of the model.
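A sketch of the enhancement step, assuming the six variants per image are the original, three rotations (90°, 180°, 270°), and horizontal/vertical mirrors; this matches the 6× expansion from 2038 to 12,228 images, though the exact transform set is an assumption:

```python
def rot90(img):
    """Rotate a 2-D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Return six variants of an image: original, three rotations,
    and horizontal/vertical mirrors (assumed transform set)."""
    r90 = rot90(img)
    r180 = rot90(r90)
    r270 = rot90(r180)
    h_mirror = [row[::-1] for row in img]  # flip left-right
    v_mirror = img[::-1]                   # flip top-bottom
    return [img, r90, r180, r270, h_mirror, v_mirror]

tile = [[1, 2],
        [3, 4]]
print(rot90(tile))          # [[3, 1], [4, 2]]
print(len(augment(tile)))   # 6
```

The bounding-box annotations must of course be transformed with the same rotations and flips as the pixels.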
Initially, we compared the results of the proposed model with and without data enhancement and found that proper data enhancement can effectively improve prediction performance.

Experimental setup
All experiments were conducted on a workstation with 128 GB RAM and an NVIDIA GeForce RTX 3080 GPU, using the PyTorch framework and CUDA 11.6. All images in the dataset are 640 × 640 pixels and are divided into a train-val dataset and a test dataset in a 9:1 ratio. The train-val dataset is further divided into a train dataset and a val dataset in the same ratio. In this way, we can use the val dataset to evaluate the training results in real time and use the test dataset for the final evaluation of the model after training. It should be noted that the data enhancement described in Section 4.1 is performed after the initial division of the dataset. During training, we utilized the Stochastic Gradient Descent (SGD) optimizer to update the model parameters, with an initial learning rate of 0.01, weight decay of 0.0005, momentum of 0.937, and a batch size of 16, for a total of 200 epochs. These settings were applied to all experiments.
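The nested 9:1 splits described above (yielding roughly 81% train, 9% validation, and 10% test) can be sketched as:

```python
import random

def split_dataset(items, seed=0):
    """Shuffle, then apply a 9:1 train-val/test split followed by a 9:1
    train/val split within train-val, as described in the text."""
    random.seed(seed)
    items = items[:]           # avoid mutating the caller's list
    random.shuffle(items)
    n_test = len(items) // 10
    test, trainval = items[:n_test], items[n_test:]
    n_val = len(trainval) // 10
    val, train = trainval[:n_val], trainval[n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 81 9 10
```

Performing this split before data enhancement, as noted above, prevents augmented copies of a training image from leaking into the validation or test sets.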

Evaluation indicators
To provide an objective and accurate evaluation of our proposed model, we adopted multiple metrics as evaluation criteria. Given that our detection task involves multiple categories, we chose the mean Average Precision (mAP) to evaluate detection performance. mAP is the average of the Average Precision (AP) over all categories, where AP is defined as the area under the precision-recall curve:

$AP = \int_0^1 P(R)\,dR$

Precision (P) denotes the proportion of detected positives that are truly positive, while recall (R) denotes the proportion of actual positives that are successfully detected. Their formulas are as follows:

$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}$

True Positive (TP) indicates the number of correctly classified positive samples, i.e. samples that are positive in both the detection results and the ground truth; False Positive (FP) indicates the number of negative samples misclassified as positive, i.e. samples that are positive in the detection results but actually negative; False Negative (FN) represents the number of positive samples misclassified as negative, i.e. samples that are negative in the detection results but actually positive. These indicators are used to calculate the AP, which reflects both the precision and the recall of a category, with a value between 0 and 1; the closer to 1, the better the detection performance of the model. Each category has its own AP value, and mAP averages them to reflect the overall detection performance across categories:

$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$

In particular, we set the IOU threshold to 0.5 to determine whether a target is detected. Further, we used the frames per second (FPS) metric to evaluate the detection speed of the model, and chose floating point operations (FLOPs) and the number of parameters to measure the complexity of the model.
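The metric definitions above can be sketched as follows; `average_precision` uses a simple rectangular approximation of the area under the precision-recall curve over sorted (recall, precision) points, rather than the interpolated AP used by common detection toolkits:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(points):
    """Approximate AP: area under the P-R curve, given (recall, precision)
    pairs sorted by increasing recall, summed as rectangles between
    consecutive recall values."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def mean_ap(aps):
    """mAP: average of the per-category AP values."""
    return sum(aps) / len(aps)

# A detector with 8 true positives, 2 false positives, 2 false negatives:
p, r = precision_recall(tp=8, fp=2, fn=2)
print(p, r)  # 0.8 0.8
```

In the detection setting, a prediction counts as a TP only when its IOU with a ground-truth box exceeds the chosen threshold (0.5 here).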

Experimental results
We trained the YOLOv5s model and the Glimmer YOLO model, with the training curves shown in Figure 9. By the 200th epoch, both the training loss and the validation loss of both models fluctuated only slightly, indicating that the models had converged and become stable. The weights at this epoch were selected as the final version of each model to prevent overfitting.
We then tested both models on the test dataset; the precision and recall rates for the three categories are shown in Table 4 below. The detection performance of small ships improved the most, with an AP increase of 7.9%. Notably, Glimmer YOLO increased the recall of small ships while maintaining the precision rate, thus improving the detection and identification of small targets without degrading the detection of large targets. The overall mAP of our model reaches 96.7%, indicating that it achieves good detection results for different marine targets in glimmer images, and the experimental results meet our expectations. To demonstrate the detection capability of the model, we used the YOLOv5s model and the Glimmer YOLO model to compare identification performance for different ship targets under various conditions and backgrounds, including single-category, multi-category, with or without noise effects, and whether they are located at sea-land boundaries, as shown in Figure 10.
As can be seen, for small ship targets the YOLOv5s model suffers from missed detections, especially in areas with a dense distribution of ships, as shown in Figure 10(a)-(c). Our improved Glimmer YOLO model has a new small target detection layer and an attention mechanism, which focus on enhancing the detection of small targets. As can be observed in the figures, whether for a small ship or a smaller container ship, the Glimmer YOLO model increases the detection recall of such small targets, as shown in Figure 10(k)-(m). For container ships and fishing boats, both models perform very well and can successfully detect ship targets against different backgrounds. However, as the remote sensing images were acquired at different times, the lighting range of fishing boats varies greatly due to factors such as noise, weather, and the imaging system. There are also targets (e.g. fishing boats) in the edge region of the image that the YOLOv5s model sometimes fails to detect, leading to missed or false detections. The Glimmer YOLO model, with its weighted feature fusion and more advanced loss function, has improved generalization and identification capabilities; it is able to detect these targets, thereby increasing the recall rate for fishing boats, as shown in Figure 10(g), (q), (j) and (t). To sum up, the Glimmer YOLO model maintains good detection performance for large ships while enhancing the detection of small targets, raising both the precision rate and the recall rate. Meanwhile, the parameters of the Glimmer YOLO model increased by only 6% compared to YOLOv5s, and the FPS decreased by 15%, which we consider acceptable in order to realize high-precision ship detection.

Comparison with other popular models
To show the advantages of the Glimmer YOLO model, we compared it with a number of popular target detection models, including the single-stage methods YOLOv8, YOLOv3 (Redmon and Farhadi 2018), SSD (Liu et al. 2016) and RetinaNet (Lin et al. 2017), and the two-stage method Faster R-CNN. The detection results of each model are shown in Table 5. The mAP of Glimmer YOLO is significantly improved compared to the original YOLOv5 and the other models, and the training results also show higher stability.

Ablation experiments
To show the contribution of each module over the original YOLOv5 model in more detail, we carried out ablation experiments. Using the original YOLOv5s model as the baseline, we added in turn the new detection layer, the CA attention mechanism, the BiFPN feature fusion strategy, the EIOU loss function, and the K-means clustered anchors, and conducted experiments; the results are shown in Tables 6 and 7.
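The anchor replacement step above clusters the labelled box dimensions of the dataset into a set of representative anchor shapes. The following is a minimal Euclidean k-means sketch over (width, height) pairs; the box values are hypothetical, and YOLO implementations typically refine this with a 1-IoU distance metric and genetic evolution rather than plain Euclidean distance:

```python
import random

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster labelled (width, height) pairs into k anchor shapes.
    Minimal Euclidean k-means; real YOLO anchor fitting usually
    uses a 1-IoU distance and further genetic refinement."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Assign each box to its nearest current center.
            j = min(range(k),
                    key=lambda c: (w - centers[c][0]) ** 2
                                  + (h - centers[c][1]) ** 2)
            clusters[j].append((w, h))
        # Recompute each center as the mean of its cluster.
        centers = [(sum(b[0] for b in cl) / len(cl),
                    sum(b[1] for b in cl) / len(cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return sorted(centers)

# Hypothetical box sizes in pixels: small ships vs. large ships.
boxes = [(10, 10), (12, 11), (11, 9), (100, 50), (98, 52), (102, 48)]
anchors = kmeans_anchors(boxes, k=2)
```

The resulting centers replace the anchors initially defined for generic datasets, so the priors match the actual size distribution of ships in the glimmer imagery.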
As can be seen from Table 6, the addition of the small-target detection layer brings the greatest improvement in overall detection, with a 2% increase in mAP, but it also increases the parameters and GFLOPs and decreases the FPS of the model. For the attention mechanism, we compared CA against alternatives such as SE and CBAM; these were less effective than CA, which also adds virtually no parameters or complexity to the model. BiFPN enhances the model's detection performance while decreasing the number of parameters and GFLOPs, slightly reducing the model's complexity. The optimization of the loss function and the replacement of the K-means anchors do not affect the computational performance or complexity of the model.
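For clarity on the loss-function change: EIOU extends IoU-based regression losses with a center-distance term plus separate width and height penalties, each normalized by the smallest enclosing box (Zhang et al. 2022). A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form, as an illustration of the formula rather than our training code:

```python
def eiou_loss(box1, box2, eps=1e-7):
    """EIOU = 1 - IoU + center term + width term + height term,
    each normalized by the smallest enclosing box."""
    x1, y1, x2, y2 = box1
    X1, Y1, X2, Y2 = box2
    # Intersection and union for the IoU term.
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / (union + eps)
    # Smallest enclosing box and its squared diagonal.
    cw = max(x2, X2) - min(x1, X1)
    ch = max(y2, Y2) - min(y1, Y1)
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared distance between box centers.
    rho2 = ((x1 + x2 - X1 - X2) ** 2 + (y1 + y2 - Y1 - Y2) ** 2) / 4.0
    # Width and height difference penalties.
    wd = ((x2 - x1) - (X2 - X1)) ** 2 / (cw ** 2 + eps)
    hd = ((y2 - y1) - (Y2 - Y1)) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou + rho2 / c2 + wd + hd
```

Penalizing width and height differences directly, rather than only the aspect ratio as in CIOU, gives sharper regression gradients for the small, compact targets that dominate glimmer ship imagery.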

Conclusions
In this study, we constructed the SDGST dataset using high-resolution glimmer remote sensing data from SDGSAT-1, a valuable resource comprising 2083 images and a total of 7553 marine ship targets. This dataset fills a significant gap in a research field where high-quality glimmer ship data has long been lacking. Based on the YOLOv5 model, we introduced several improvements, including a new detection layer, the CA attention mechanism, BiFPN feature fusion, an improved loss function, and an anchor replacement method, resulting in the proposed Glimmer YOLO model. Tailored to the characteristics of glimmer data, our model demonstrates excellent detection performance on the SDGST dataset, achieving an mAP of 96.7%. Compared to the original YOLOv5 model and other popular target detection models, it exhibits a substantial improvement in detection accuracy, particularly for small ships. This enables accurate detection of a wide range of small, medium and large marine objects in glimmer images, which remains a challenge for other models and datasets. Our comprehensive comparison and ablation experiments validate the effectiveness of the proposed model. We believe it can be applied to real-time localization and monitoring of nighttime marine targets, contributing to the development of a comprehensive marine monitoring system. However, we recognize that the SDGST dataset needs to be further extended due to its current size limitations.
In future research, we will focus on enriching and enhancing the glimmer dataset and plan to carry out further experiments to differentiate between finer-grained ship categories to fully exploit the advantages of SDGSAT-1's high-resolution glimmer data.

Figure 4. Glimmer targets in different backgrounds. Small ships are marked with green boxes, container ships with blue boxes, and fishing ships with yellow boxes. Panels (a-c) show a pure ocean background, (d-f) a sea-land interface background, and (g-i) a noise background.

Figure 8. Ship dataset enhancement. (a) is the original image, and (b-f) are the images after mirroring and rotation at different angles.

Figure 10. Detection and identification results for different ship targets by the Glimmer YOLO model and the YOLOv5s model. (a)-(j) Glimmer YOLO model detection and identification results; (k)-(t) YOLOv5s model detection and identification results.
Yang et al. proposed a new linear function for selecting candidate ships based on intensity and texture features (Yang et al. 2013); Qi et al. utilized visual saliency cues to extract candidate ship regions and computed Histogram of Oriented Gradients (HOG) features for ship detection (Qi et al. 2015); Kumar et al. constructed a color-texture hybrid space for ship extraction by combining features such as ship color and texture

Table 1. Technical specifications and performance indicators of the SDGSAT-1 glimmer imager.

Table 2. Number of targets in the SDGST dataset.

Table 3. Number of targets in the SDGST dataset after data enhancement.

Table 4. Detection precision, recall, and AP of Glimmer YOLO for different targets.

Table 5. Comparison results of different models on multiple indicators.

Table 7. Changes in the indicators in the ablation experiments.