Improving the mapping of coastal invasive species using UAV imagery and deep learning

ABSTRACT Spartina anglica C. E. Hubb. is an invasive species of saltmarsh and mudflats in Ireland. It spreads quickly and can form extensive meadows, which can lead to the extinction of native plant and animal species. Traditional ground-based field surveys would be expensive for the scale of monitoring needed to formulate site-specific management strategies. Advanced mapping techniques, such as Unoccupied Aerial Vehicle (UAV) remote sensing and deep learning (DL), offer an opportunity to automatically map invasive species' occurrence. However, training a DL model requires large amounts of labelled data, which can be prohibitively difficult and expensive to obtain. We implemented a DL semantic segmentation technique on UAV imagery of Spartina-invaded habitats and tested a range of hyperparameters (model architectures, encoder backbones and input image patch sizes) to determine an effective segmentation network structure. We also investigated applying data augmentation and pseudo-labelling techniques to increase the size of the labelled dataset. A U-Net architecture with Inception-v3 as its backbone trained on 128 × 128-pixel image patches offered the best model performance (mean Intersection-Over-Union (mIOU) score = 0.832). The model trained on the combined augmented and pseudo-labelled data achieved an mIOU score of 0.712 on a test dataset, whereas model performance decreased by 0.158 when only the original labelled data were used. This result suggests the potential for using these techniques to create more robust models. The proposed methodology demonstrates that the combination of UAV imagery and deep learning could be a promising tool for mapping the distribution of invasive plant species.


Introduction
Invasive alien species (IAS) are organisms found in an environment outside their natural range that cause negative impacts on this new environment (IUCN 2000). They are considered amongst the major drivers of global biodiversity loss as they can alter ecosystem functions and services and cause declines in species richness (Mollot, Pantel, and Romanuk 2017). The adverse impacts of IAS have been recognized by international organizations, such as the International Union for Conservation of Nature (IUCN 2000) and the European Union (EU) through the EU regulation on the prevention and management of the introduction and spread of IAS (European Commission 2014). In Ireland's most recent EU Habitats Directive report, the presence of IAS is one of the major threats to and pressures on habitats, affecting more than 40% of all reported habitats (National Parks and Wildlife Service 2019). Furthermore, the estimated cost of IAS to the Irish economy was €202 million in 2013 (Kelly et al. 2013).
In general, assessment of IAS requires accurate maps to monitor their location and expansion over time. These maps can also be used to identify hotspots where management efforts can be focused, as well as to measure the success of control programmes. This information is commonly obtained through traditional ground-based field surveys, which can be costly, spatially restricted and require considerable time and effort. Furthermore, the rate of IAS introductions is increasing globally, which can limit the potential of manual methods alone for large-scale monitoring (Seebens et al. 2017). Consequently, a more efficient and repeatable alternative is needed for capturing and monitoring the spatial distribution of IAS, especially those with the highest environmental and socioeconomic impact (Nentwig et al. 2018).
Spartina anglica C.E. Hubb. (= Sporobolus anglicus (C.E. Hubb.) P.M. Peterson & Saarela), also known as common cordgrass, is on the list of the most problematic IAS in the EU (Nentwig et al. 2018) and globally (Lowe et al. 2000). This perennial grass, commonly found in saltmarsh and mudflat habitats, spreads by seed dispersal or vegetatively by rhizomes and can form extensive meadows (Figure 1) (Nehring and Adsersen 2006). In Ireland, S. anglica was first introduced in Cork Harbour to bind and stabilize sediment (McCorry and Ryle 2009); it has subsequently spread to other locations on the Irish coast. It invades and displaces EU Habitats Directive Annex I habitats, including tidal mudflats and sandflats, Atlantic salt meadows and Salicornia beds (National Parks and Wildlife Service 2019), reducing feeding habitats for birds (Stokes, O'Neill, and McDonald 2006). Monitoring the location and extent of S. anglica is an important step in assessing its rate of expansion (Brophy et al. 2019).
Advances in mapping technology, such as remote sensing, offer a potential solution to monitoring S. anglica (Huang and Asner 2009). Medium-resolution satellite imagery, typically >10 m/pixel, has been used to detect S. anglica covering large areas (Li, Gao, and Wang 2010; Zuo et al. 2012). However, it is too coarse for detailed mapping, such as delineating S. anglica when mixed with native species or detecting expansion in the form of small clonal patches. Early detection of S. anglica expansion is considered the best approach to prevent further spread, to minimize the cost of controlling the plant and to increase the success of eradicating it (Brophy et al. 2019). Mapping small patches of S. anglica requires sub-metre spatial resolution imagery.
In recent years, Unoccupied Aerial Vehicles (UAVs) have become an invaluable remote sensing platform for mapping and monitoring vegetation (Belcore et al. 2021; Beyer et al. 2019; Kattenborn, Eichel, and Fassnacht 2019). UAVs enable the acquisition of spatially continuous data with centimetre-level resolution. They can also be operated at low altitudes (i.e. below clouds). This means that countries with high levels of cloud cover, such as Ireland, can benefit from using UAVs for monitoring (Manfreda et al. 2018). Hence, UAVs can be considered a suitable platform for mapping S. anglica patches.
Previous remote sensing studies for mapping S. anglica used traditional machine-learning classifiers like Random Forest (Proença et al. 2019; Van Beijma, Comber, and Lamb 2014). This classifier can process a large number of explanatory features that have been manually extracted (handcrafted) (Belgiu and Drăgut 2016). However, choosing relevant features can be time-consuming and involves expert knowledge of the characteristics of the plant species. Deep learning (DL) can eliminate the need for these prespecified handcrafted features (Li et al. 2018). DL is a subfield of machine learning that aims to learn complex patterns and automatically extract high-level features from data rather than manually selecting them (Goodfellow, Bengio, and Courville 2016; Li et al. 2018). In many cases, DL requires large amounts of labelled training data to produce a robust model. The acquisition and building of manually labelled datasets can require considerable effort, cost and time. These requirements can be prohibitive; therefore, a DL approach may not always be feasible.
Data augmentation and semi-supervised learning (SSL) are techniques that can reduce the reliance on large amounts of labelled data. Data augmentation does this by applying transformations to the existing data, thus increasing the amount and diversity of labelled data. Common augmentation techniques involve geometric transformations, including flips, rotations, translations and cropping. More advanced techniques to augment data, including colour space modifications, have recently been developed (Shorten and Khoshgoftaar 2019). In SSL, both labelled and unlabelled data are used to train a model instead of solely using labelled data. Pseudo-labelling (Lee 2013) is an SSL approach that deals with unlabelled data. It uses a model initially trained on labelled data to make predictions for unlabelled data. These newly labelled data are integrated with those previously labelled, the model is retrained, and the process is then iterated. Since labelling images for DL tasks is tedious and expensive in terms of cost and time, while gathering unlabelled data is cheap, the model can take advantage of both the labelled and unlabelled data. However, the DL studies that have investigated the benefits of augmentation and pseudo-labelling techniques have mostly been related to medical images (Chlap et al. 2021; Mao et al. 2022) and other natural images of common objects (Perez and Wang 2017; Zheng et al. 2020). The effectiveness of applying these techniques to UAV-acquired images for mapping invasive species, with the aim of improving accuracy and preventing overfitting and hence improving transferability, has not yet been fully explored.
Semantic segmentation is an image-processing technique that aims to assign a semantic label to every pixel in an image (Yu et al. 2018). With the application of DL, this technique has gained significant attention in many computer vision tasks, such as X-ray image segmentation for medical diagnosis (Ahmed et al. 2020) and road segmentation for self-driving cars (Sharma et al. 2019). It is also becoming popular for processing remote sensing data for many environmental applications, such as vegetation mapping (Balado et al. 2021; Kattenborn, Eichel, and Fassnacht 2019; Osco et al. 2021). In the context of IAS, semantic segmentation could be a foundation for applications such as food security and habitat degradation monitoring. However, due to the need to account for factors such as variable field conditions and plant growth stages, accurate segmentation of plant species remains a challenge for DL.
This study develops an automatic segmentation model for mapping S. anglica using UAV images. The objectives of this study are: (1) to select an effective segmentation network structure by comparing the performance differences of three network hyperparameters: model architecture, encoder backbone and input image patch size; (2) to propose a method combining data augmentation and pseudo-labelling techniques for the segmentation of UAV-acquired images; (3) to assess the performance of the proposed method; and (4) to evaluate the performance of the final segmentation model on a dataset acquired using a different sensor. The aim is that, by using this approach, the mapping and monitoring of S. anglica and other invasive species can be substantially automated.

Materials and methods
The methodology is divided into four stages: data collection, data preparation, model development and selection, and model evaluation (Figure 2). The first stage consists of acquiring UAV images and field data, and the second stage transforms these datasets into a format suitable for training a deep learning model. The third stage examines the effects of applying different modifications (patch size, architecture, pre-trained model, augmentation technique and pseudo-labelling) to the model to facilitate model selection. The fourth stage evaluates the generalization performance of the final model when applied to new and unseen data.

Study area
North Bull Island is a wedge-shaped, low-lying coastal 'barrier island' located in the northern part of Dublin Bay on the east coast of Ireland (Figure 3) (Mathew et al. 2019). It is part of the UNESCO-designated Dublin Bay Biosphere Reserve and the North Dublin Bay Special Area of Conservation site (SAC code 000206, approximately 1,474 ha in area) of the Natura 2000 network. This SAC includes four saltmarsh habitats listed under Annex I of the EU Habitats Directive (National Parks and Wildlife Service 2013). However, S. anglica is also present, forming extensive meadows and smaller patches.
In this study, two sites were chosen to assess how well the method would perform in delineating S. anglica. These sites were located on the sheltered (western) side of North Bull Island within the saltmarsh habitat. The first site (Site_train) was chosen for model training and selection, while the second site (Site_test) was used for model evaluation. In this way, there was no overlap between the model training and evaluation datasets, avoiding any bias in predictions.

Field data
Extensive field surveys were conducted by an experienced ecologist to record the locations of S. anglica point samples using an Emlid Reach RS+ Global Navigation Satellite System (GNSS) device (https://emlid.com/reachrs/). Each point was located at the centre of a homogeneous area of S. anglica cover. The radius of the area around each point, typically 1–10 m, was estimated and recorded by the ecologist. The point data were then converted into polygons using the recorded radii. These polygons were used to guide the preparation of reference data (labels) in Section 2.3.
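To illustrate this conversion, the short sketch below buffers each GNSS point by its recorded radius to produce circular reference polygons. It assumes the points and radii are stored in a CSV file with hypothetical column names and uses GeoPandas, one possible tool for this step rather than the software used in the study.

```python
# Sketch: convert GNSS point samples with recorded radii into circular polygons.
# Assumes a CSV with hypothetical columns "easting", "northing" and "radius_m"
# in a projected CRS (Irish Transverse Mercator, EPSG:2157, is assumed here).
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

points = pd.read_csv("spartina_points.csv")
gdf = gpd.GeoDataFrame(
    points,
    geometry=[Point(xy) for xy in zip(points["easting"], points["northing"])],
    crs="EPSG:2157",
)

# Buffer each point by its recorded radius to approximate the homogeneous area.
gdf["geometry"] = gdf.apply(lambda row: row.geometry.buffer(row["radius_m"]), axis=1)
gdf.to_file("spartina_reference_polygons.gpkg", driver="GPKG")
```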

UAV images
Three field campaigns were conducted to acquire high-resolution (5 cm) UAV imagery, two for Site_train and one for Site_test. For Site_train, both flights (August 2019 and October 2021) used a MicaSense Altum multispectral sensor with five spectral bands: blue, green, red, red edge and near-infrared (MicaSense 2019). For Site_test, a DJI Zenmuse P1 digital camera (dji.com/ie/zenmuse-p1/specs) was used to acquire imagery in October 2022. Due to the different acquisition dates and camera sensors, there were variations in the appearance and colour of the vegetation in the captured images (Figure 4).
Each set of UAV imagery was processed using the Structure-from-Motion technique in Pix4D software (Pix4D, Switzerland, version 4.3.33) to generate an orthomosaic. The processing involves keypoint extraction, keypoint matching between images, camera model optimization, geolocation based on GPS flight trajectories and dense point cloud generation. For this study, only the red, green and blue (RGB) image bands were utilized in the modelling because RGB images are relatively easy to acquire using cheap hardware and would therefore represent a more financially viable option for invasive plant managers applying this technique.
The Site_train orthomosaics were subdivided into three sets: train, validation and unlabelled. The train and validation sets were used to facilitate the DL training process, while the unlabelled set was used for pseudo-labelling (see Section 2.5.2). The Site_test orthomosaic was used to evaluate the generalization performance of the final model (test set). Table 1 summarizes the information on the imagery used in this study. Because different sensors were used for the two sites, the ability of the model to generalize across sensors was also tested.

Data preparation
The reference data (labels) were prepared by manually delineating S. anglica boundaries (in the form of polygons) in a GIS environment (Figure 5 left). This delineation was based on the acquired field data and supported by image interpretation. A binary image was created in which pixels belonging to S. anglica were assigned a value of 1, whereas pixels representing the background (i.e. exposed mud, macroalgae and other plant communities) were assigned a value of 0. Thus, the input for the modelling was a labelled image with two classes: S. anglica and the background (Figure 5 right). This labelling was done for the train, validation and test sets.
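As a sketch of how the delineated polygons can be turned into the binary label image, the snippet below burns the S. anglica polygons into a raster aligned with the orthomosaic. It uses rasterio and GeoPandas, which are assumptions rather than the GIS software used in the study, and the file names are hypothetical.

```python
# Sketch: rasterize delineated S. anglica polygons into a binary label image
# aligned with the RGB orthomosaic (1 = S. anglica, 0 = background).
import geopandas as gpd
import rasterio
from rasterio.features import rasterize

with rasterio.open("site_train_orthomosaic.tif") as src:
    meta = src.meta.copy()

polygons = gpd.read_file("spartina_labels.gpkg")
labels = rasterize(
    [(geom, 1) for geom in polygons.geometry],  # burn value 1 for S. anglica
    out_shape=(meta["height"], meta["width"]),
    transform=meta["transform"],
    fill=0,                                     # background pixels
    dtype="uint8",
)

meta.update(count=1, dtype="uint8")
with rasterio.open("site_train_labels.tif", "w", **meta) as dst:
    dst.write(labels, 1)
```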

Training parameters and evaluation metric
Semantic segmentation was implemented using the Segmentation Models library for Python 3.7, based on Keras 2.9 and TensorFlow 2.9 (Yakubovskiy 2019). The training phase involves passing the image patches through a deep learning network over a certain number of model iterations (epochs); here, all models were trained for 100 epochs. The goal is to improve the result in each epoch by optimizing the model weights. The model weights were only considered and saved if the loss value was lower than the value derived in the previous epoch. The Adam algorithm (Kingma and Ba 2015), with a learning rate of 10⁻⁴, was used for optimization. Binary focal loss (Lin et al. 2018) was selected as the loss function. Given the binary classification problem (absence of S. anglica = 0, presence of S. anglica = 1), the final layer activation was set to a sigmoid function. We used 50% as the threshold to transform the results into a binary classification image (0–50%: class 0, 50.01–100%: class 1). All computations were performed on a local workstation using a CUDA-compatible GPU with 12 GB of GDDR6 memory (NVIDIA GeForce RTX 3060).
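A minimal sketch of this training configuration is given below, using the Segmentation Models library with the tf.keras backend. The backbone, callbacks, batch size and variable names are illustrative assumptions; only the optimizer, learning rate, loss, activation and threshold follow the settings described above.

```python
# Sketch: training setup with Segmentation Models (tf.keras backend), using
# Adam (learning rate 1e-4), binary focal loss, a sigmoid output layer and
# checkpointing of the best weights.
import os
os.environ["SM_FRAMEWORK"] = "tf.keras"  # select the tf.keras backend before import

import segmentation_models as sm
import tensorflow as tf

model = sm.Unet(
    backbone_name="inceptionv3",
    encoder_weights="imagenet",
    input_shape=(128, 128, 3),
    classes=1,
    activation="sigmoid",  # per-pixel probability of S. anglica presence
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=sm.losses.BinaryFocalLoss(),
    metrics=[sm.metrics.IOUScore(threshold=0.5)],  # 50% threshold for the binary map
)

# Keep only the weights that improve on the best loss seen so far (assumed to
# be monitored on the validation set).
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_loss", save_best_only=True
)

# X_train/y_train are (N, 128, 128, 3) image patches and (N, 128, 128, 1) masks.
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=100, batch_size=16, callbacks=[checkpoint])
```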
For model comparison and selection, we used the Jaccard index, also known as the Intersection-Over-Union (IOU) metric. The Jaccard index (Jaccard 1908) is a well-known statistic used to measure the spatial similarity between two sample sets. It is computed by dividing the area of overlap between the model-predicted image and the labelled image (predicted AND labelled) by the area of union between them (predicted OR labelled):

IOU = (Predicted ∩ Labelled) / (Predicted ∪ Labelled)

The value ranges from 0 to 1, where 0 indicates no overlap between the two and 1 indicates complete overlap; hence, the higher the IOU score, the better the model. The mean IOU (mIOU) score was computed for each class by averaging all IOU scores of that class.
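A small worked example of the metric is given below, computed directly from binary prediction and label arrays; this is an illustrative sketch rather than the exact implementation used in the study.

```python
# Sketch: Jaccard index / IOU for binary segmentation masks.
import numpy as np

def iou(pred, label):
    """IOU = |pred AND label| / |pred OR label| for boolean masks."""
    pred, label = pred.astype(bool), label.astype(bool)
    union = np.logical_or(pred, label).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as complete overlap
    return np.logical_and(pred, label).sum() / union

# Tiny example: per-class IOU for S. anglica (1) and background (0).
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
label = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
print(iou(pred == 1, label == 1))  # 2 / 4 = 0.50
print(iou(pred == 0, label == 0))  # 5 / 7 ≈ 0.71
# The mIOU of a class is the mean of its IOU scores over all evaluated patches.
```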

Image patch size and network structure
Image patch size. The labelled train image was split into sub-images called patches, using a sliding kernel with 50% overlap. In this way, we created more training patches. Similarly, the labelled validation image was divided into patches, but with no overlap. This experiment aimed to test the effect of image patch size on the model performance. Hence, we prepared two sets of patches of sizes 128 × 128 and 256 × 256 pixels. As expected, fewer patches were generated with the larger patch size (Table 2).
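A minimal sketch of this patch extraction is shown below; a stride of half the patch size gives the 50% overlap used for the training image, while a stride equal to the patch size gives the non-overlapping validation patches. Array and variable names are illustrative.

```python
# Sketch: split an orthomosaic (H, W, 3) and its label image (H, W) into patches
# with a sliding window; stride = size // 2 gives 50% overlap, stride = size gives none.
import numpy as np

def extract_patches(image, mask, size=128, stride=64):
    patches = []
    height, width = mask.shape
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append((
                image[top:top + size, left:left + size],
                mask[top:top + size, left:left + size],
            ))
    return patches

# train_patches = extract_patches(train_image, train_mask, size=128, stride=64)   # 50% overlap
# val_patches   = extract_patches(val_image, val_mask, size=128, stride=128)      # no overlap
```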
Model architecture and backbone. A general semantic segmentation network structure consists of an encoder and a decoder. An encoder is used to extract features from an image by downsampling, while a decoder is used to upsample the extracted features to their original image dimension. In this study, three encoder-decoder model architectures were implemented to determine the model network structure to be used in segmentation: U-Net (Ronneberger, Fischer, and Brox 2015), LinkNet (Chaurasia and Culurciello 2018) and Feature Pyramid Network (FPN) (Lin et al. 2017). For the feature encoder backbones, we considered ten existing pre-trained models initialized with ImageNet weights (Deng et al. 2009) (Table 3). Further, each network structure was trained on the two image patch sizes (128 × 128 and 256 × 256 pixels). Hence, 60 models were trained and compared in this section. Details of each pre-trained model can be found in the cited references (Table 3).
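The comparison can be sketched as a simple loop over the Segmentation Models constructors, as below. The backbone identifiers follow that library's naming, the list shown is an illustrative subset rather than the exact set in Table 3, and each model would be compiled and trained as in the training sketch above.

```python
# Sketch: enumerate architecture x backbone x patch-size combinations using the
# Segmentation Models constructors, each encoder initialized with ImageNet weights.
import segmentation_models as sm

architectures = {"U-Net": sm.Unet, "LinkNet": sm.Linknet, "FPN": sm.FPN}
backbones = ["vgg16", "vgg19", "resnet50", "resnet152", "resnext50",
             "resnext101", "inceptionv3", "inceptionresnetv2"]  # illustrative subset
patch_sizes = [128, 256]

for arch_name, build in architectures.items():
    for backbone in backbones:
        for size in patch_sizes:
            model = build(
                backbone_name=backbone,
                encoder_weights="imagenet",
                input_shape=(size, size, 3),
                classes=1,
                activation="sigmoid",
            )
            # compile and train on the corresponding patch set, then record the
            # validation mIOU score for this architecture/backbone/size combination
```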
Using mIOU as the evaluation metric, this experiment showed that the U-Net architecture with Inception-v3 as its backbone (U-Net_Inception-v3), trained on 128 × 128-pixel image patches, offered the best model performance. This is further detailed in Section 3.1. Consequently, we used U-Net_Inception-v3 in all subsequent processing.

Data augmentation and pseudo-labelling
Data augmentation. The train image patches were subjected to augmentation techniques using the Albumentations library (Buslaev et al. 2020) in Python. These techniques were grouped into two categories: position and colour space augmentations. In the first group, position augmentation, the orientation of the features on the image was altered; in this case, a rotation scheme was used. Rotation is a suitable augmentation technique for UAV images, as scenes can be viewed from any orientation above. In order to avoid border and resampling artefacts due to arbitrary rotation angles, an image patch (Figure 6a) was rotated around its centre using multiples of 90° (90°, 180° or 270°) (Figure 6b-d). This transformation also requires corresponding rotational changes to the labels. The second group, colour space augmentation, allows random changes in the brightness, hue and saturation of an image by applying the colour jitter technique (Figure 6e-h). The purpose of colour jitter is to simulate images acquired in different natural lighting conditions and at different acquisition dates, which affect the appearance and colour of the vegetation. In contrast to the first group, this technique does not require any changes to the labels, as there is no geometric change as there would be with a rotation.

Pseudo-labelling. There were 28,564 unlabelled patches of 128 × 128 pixels generated for this study. The model trained using the augmented data was used to predict the labels (pseudo-labels) of all pixels of each unlabelled patch (Figure 7). The labelled dataset was then expanded based on the confidence of these pseudo-labels. To improve the overall performance of the final model, we used a threshold to select only those pseudo-labels about which the model was very confident. Since the model outputs prediction probabilities of S. anglica presence between 0 and 1, all predictions satisfying Pr(y > 0.95) + Pr(y < 0.05) > 0.90 were selected and added to the new training set, and the remainder were designated as unlabelled data. The model was then retrained with the new training set consisting of the labelled data and the selected subset of pseudo-labelled data. This new model was then used to predict the remaining unlabelled dataset. This process was repeated until 95% of the unlabelled samples had passed the threshold and had been used to retrain the model.
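A minimal sketch of the augmentation step described above is given below: deterministic 90° rotations are applied to both the image patch and its label, and the Albumentations ColorJitter transform is optionally applied to the rotated image only. The parameter values are illustrative rather than those used in the study.

```python
# Sketch: rotation (multiples of 90 degrees, applied to image and mask together)
# and colour jitter (applied to the image only) augmentation of a labelled patch.
import numpy as np
import albumentations as A

colour_jitter = A.ColorJitter(brightness=0.2, contrast=0.0,
                              saturation=0.2, hue=0.1, p=1.0)  # illustrative values

def augment_patch(image, mask, jitter=False):
    """Return three rotated copies of a labelled patch, optionally colour-jittered."""
    augmented = []
    for k in (1, 2, 3):  # 90, 180 and 270 degree rotations
        image_rot = np.ascontiguousarray(np.rot90(image, k))
        mask_rot = np.ascontiguousarray(np.rot90(mask, k))
        if jitter:
            image_rot = colour_jitter(image=image_rot)["image"]  # mask is unchanged
        augmented.append((image_rot, mask_rot))
    return augmented

# Together with the original patch, this yields the fourfold increase in
# labelled patches described in Section 3.2.
# extra_pairs = augment_patch(patch_image, patch_mask, jitter=True)
```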
For the experiments involving pseudo-labels, we slightly modified the validation set by splitting off samples (n = 100) from the test set and adding them to the validation set, resulting in a new set of validation data (the validation_aug_pseudo set). This was done because the domain gap (different sensor, site and seasonal variation) between the original validation set and the test set meant that optimizing hyperparameters on the original validation set might result in a model that does not generalize well to domain variations. Including validation samples from the target domain allows a model with better generalization characteristics to be selected. Note that the remaining test set did not contain any of the samples that were split off for validation. The effects of augmentation and pseudo-labelling on the accuracy of the model are discussed in Section 3.2.
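The confidence-based selection of pseudo-labels can be sketched as below. The criterion is interpreted here as the fraction of pixels in a patch predicted with high confidence (above 0.95 or below 0.05) exceeding 0.90; the function and variable names are illustrative.

```python
# Sketch: one pseudo-labelling iteration. The current model predicts per-pixel
# probabilities for each unlabelled patch; a patch is accepted if more than 90%
# of its pixels are predicted with high confidence (> 0.95 or < 0.05).
import numpy as np

def select_pseudo_labels(model, unlabelled, bounds=(0.05, 0.95), min_fraction=0.90):
    accepted_images, accepted_masks, remaining = [], [], []
    for patch in unlabelled:
        prob = model.predict(patch[np.newaxis, ...], verbose=0)[0, ..., 0]
        confident = np.mean((prob > bounds[1]) | (prob < bounds[0]))
        if confident > min_fraction:
            accepted_images.append(patch)
            accepted_masks.append((prob > 0.5).astype("uint8"))  # pseudo-label mask
        else:
            remaining.append(patch)
    return accepted_images, accepted_masks, remaining

# The accepted patches and their pseudo-labels are added to the training set,
# the model is retrained, and the selection is repeated on the remaining patches
# until about 95% of the unlabelled samples have been accepted.
```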

Evaluation of the final model
The final model derived from the previous section was evaluated on the test set. To investigate the contribution of data augmentation and pseudo-labelling to the overall model performance on the test set, we ran the same experiment but without implementing data augmentation or pseudo-labelling on the original train data.

Results

Model performance comparison between architectures and backbones using different patch sizes
Comparing all the models produced in Section 2.5.1, U-Net_Inception-v3 applied to the 128 × 128-pixel image patches yielded the highest mIOU score of 0.832 (Table 4). This was followed by the LinkNet_Inception-v3 model applied to the 128 × 128-pixel image patches, which had a slightly lower mIOU score of 0.830. In general, no pattern was observed in which a specific image patch size led to better model performance. This means that patch sizes of 128 × 128 and 256 × 256 pixels, where 1 pixel = 5 cm, can both be used to recognize and capture contextual information of S. anglica in the images. Instead, the model performance varied depending on the architecture and encoder backbone used.
Aside from mIOU scores, training speed varied among models, ranging from approximately 25 minutes to almost 3 hours (Figure 8). Overall, for both patch sizes, the U-Net and LinkNet models were trained at similar speeds, whereas the FPN models were the slowest to train. Regarding the backbones used, ResNeXt-101 was the slowest to train, followed by ResNeXt-50, ResNet-152 and Inception-ResNet-v2. The remaining backbones were trained at similar speeds, each approximately three times faster than the ResNeXt-101 training speed. Furthermore, models trained on 128 × 128-pixel image patches were consistently slower to train than those trained on 256 × 256-pixel image patches, possibly due to the difference in the total number of resulting image patches, as described in Section 2.5.1.
Applying the trained model to a new set of patch-based images was extremely fast and could produce segmented results in less than a second. Figure 9 shows the average inference speed, or the time it took for each model to make predictions on a single image patch, computed over 100 image patches. Comparing all the models, the two VGG backbones exhibited the fastest inference speeds. The U-Net and LinkNet models usually offered faster inference speeds than the FPN models, although the difference was marginal. Although a 128 × 128-pixel image patch comprises only a quarter as many pixels as a 256 × 256-pixel image patch, inference on it was less than four times faster.

Data augmentation and pseudo-labelling
Figure 10 shows the performance comparison between models with and without applying data augmentation and pseudo-labelling techniques. The baseline model (without data augmentation and pseudo-labels) achieved a 0.761 mIOU score on the validation_aug_pseudo data. Applying data augmentation techniques to the initially labelled data improved the model performance, with an increase of 0.070 when using rotations only and a further increase of 0.010 when colour jitter augmentation was applied to the rotated images. Augmentation increased the number of labelled patches fourfold, from 4,092 to 16,368 (+12,276). Pseudo-labelling also contributed to the performance improvement of the model. After iteratively retraining the model with augmented labelled data and unlabelled data with pseudo-labels, its performance slightly improved from 0.841 to 0.851 (Figure 10). However, this modest improvement required the addition of a further 27,570 pseudo-labelled training patches. Figure 11 shows sample predictions on unlabelled data after the first and second iterations. It can be observed that the predictions became more accurate after the second iteration.

Evaluation of the model on test data
The mIOU score when the final model was applied to the test data was 0.712 (Table 5). Results also showed a total mIOU score decrease of 0.158 in the model performance when we eliminated both the augmentation and pseudo-labelling techniques from the process (Table 5). Figure 12 shows sample test images and the comparison between their reference labels and the labels predicted by the model.

Discussion
Our study demonstrates that deep learning-based segmentation can accurately map the distribution of S. anglica in UAV imagery. The results are in line with previous studies that have shown the potential of deep learning for high-resolution mapping of invasive species using UAV imagery (Gonçalves et al. 2022; James, Bradshaw, and McMahon 2020; Qian et al. 2020).
The U-Net_Inception-v3 model was selected based on its performance (highest mIOU score of 0.832). It also offered the best balance between training speed (approximately 34 minutes for 100 epochs on a GPU) and accuracy. However, this model exhibited an average inference speed of 61.19 ms per input image patch, which was 17.94 ms slower than the model with the fastest inference speed (LinkNet_VGG-16). The difference in inference speed between the two models is very small and is unlikely to be a problem in most monitoring programmes. While LinkNet_VGG-16 had the fastest inference speed of 43.25 ms, its mIOU score (0.692) was 0.14 lower than that of U-Net_Inception-v3 (Table 4). Monitoring an invasive species requires a model with good segmentation results so that the spatial extent and rate of expansion of that species can be determined accurately.
The application of augmentation and pseudo-labelling techniques helped improve model robustness by increasing the diversity of the training data. The integration of these techniques by James, Bradshaw, and McMahon (2020) enhanced vegetation mapping. These techniques are advantageous because manual delineation of irregular plant canopies on the imagery is time-consuming and tedious. According to our analysis, there was a substantial increase in the mIOU score after applying data augmentation; however, when a large amount of unlabelled data was subsequently included, there was only a slight improvement in model performance (Figure 10). This result could be because data augmentation had already added a large amount of variance to the training data. For remote sensing, however, acquiring RGB images using low-cost UAVs to generate unlabelled data is relatively simple. Hence, pseudo-labelling can have important practical significance in improving segmentation performance when domain-specific labelled samples are limited.
In general, the time of year when the imagery was acquired can affect model performance, as plant colouration and morphology change with the seasons.

Inaccuracies in the segmentation could be due to the location of the training and test datasets and potential errors in the delineation of S. anglica boundaries. The train images were acquired from a site where areas dominated by S. anglica were intermixed with Salicornia beds. The test images were acquired from a site where areas dominated by S. anglica adjoined Atlantic salt meadows dominated by common saltmarsh-grass (Puccinellia maritima) and sea-purslane (Atriplex portulacoides). It is likely that some S. anglica patches were not correctly segmented due to variations in the species composition surrounding S. anglica between the training and test sites. Even though the features of S. anglica on UAV imagery are quite distinct, manual delineation by image interpretation can be subjective and introduce positional errors. This can happen along the edges of plant canopies where image pixels are spectrally mixed with other plants or other environmental background noise, such as mud. In this study, the chance of incorrect segmentation increased along the edges of S. anglica patches (see Figures 11 and 12). These inaccuracies might affect the training process, thus lowering the predictive accuracy.
A requirement of the presented methodology is a computer that supports the workload associated with deep learning. Computational resources and dataset size both impact the time it takes to train a model. The increasing amount of available remote sensing data means that the demand for computing resources to train models in a reasonable time is also increasing (Chi et al. 2016). Therefore, there is a need for GPUs, which can process multiple computations simultaneously and can significantly speed up deep learning operations. As these models are retrained with more datasets and applied to larger spatial extents, there may be a need for either cloud-based services (e.g. Google Colaboratory (Carneiro et al. 2018)) or national centres for high-performance computing (e.g. European High-Performance Computing), which can provide access to the computing resources needed to accelerate training and inference while requiring minimal computing resources from users. Moreover, recent deep learning architectures proposed for the segmentation of large, high-resolution images could be explored. An example is MFVNet (Li et al. 2023), which utilizes multiple fields of view to efficiently extract information from images of large spatial extents. In this way, the mapping of S. anglica or other target species could be extended to a regional or national scale.
In Ireland, S. anglica is found in many coastal counties and can grow on mudflats and saltmarshes. Saltmarshes occur in a range of geomorphological contexts: estuaries, bays, fringes, sand flats and lagoons (Curtis and Sheehy Skeffington 1998). Considering this variation in the situations where S. anglica can establish, there are opportunities to update our model so that it generalizes well to other locations as more data become available.
The techniques used here provided robust results; however, the model accuracy could be improved further. Future studies could consider more extensive combinations of augmentation techniques and additional domain adaptation techniques, such as CycleGAN (Zhu et al. 2017) and Contrastive Unpaired Translation (Park et al. 2020), which can potentially improve the generalization ability of a deep learning model. Cross-domain segmentation techniques that specifically deal with variations in remote sensing data (e.g. data acquired by different sensors or at different geographic locations) could also be explored (Li et al. 2021). Furthermore, future studies could consider using weighted cross-entropy, with weights decreasing for image pixels nearer the edges of plant canopies (James and Bradshaw 2019). In this way, potential errors caused by the manual delineation of indistinct plant canopy edges can be minimized.

Conclusion
Developing baseline data on the current extent of invasive species, as well as timely and effective monitoring of their expansion, is crucial for biodiversity conservation and the sustainable management of habitats. Advances in UAV remote sensing and deep learning have provided huge potential for accurately mapping these invasive species. The results of this study show that the choice of model network structure and the use of techniques that enhance the size and quality of training data are important decisions when creating robust deep learning models for mapping invasive species.
This study aimed to improve the method for mapping the spatial distribution of S. anglica invading saltmarsh habitats by using deep learning-based semantic segmentation applied to high-resolution UAV imagery. The results indicated that the U-Net architecture with Inception-v3 as the encoder backbone, trained on 128 × 128-pixel image patches, was the best model in terms of model performance. Applying data augmentation to the initially labelled data increased the mIOU score by 0.08, with a further small improvement of 0.01 after adding pseudo-labels. These techniques improved model robustness. The final model evaluated on a separate test dataset achieved an mIOU score of 0.712, indicating good generalization ability.
For practical purposes, the segmentation model developed in this study can be utilized for spatiotemporal mapping of S. anglica distribution. These high-resolution maps are useful for understanding the pattern and rate of S. anglica invasion and potentially for evaluating the effectiveness of control efforts. The proposed methodology is transferable and can be adapted for mapping other invasive species.

Figure 1 .
Figure 1. Spartina anglica (at the back) displaces low-growing native Salicornia plants (in the foreground) (photo taken at North Bull Island, Ireland; August 2021).

Figure 2 .
Figure 2. Workflow using a semantic segmentation deep learning algorithm.

Figure 3 .
Figure 3. Map location of the study area on North Bull Island, Ireland (left) and close-ups of the training and test study sites (right).

Figure 4 .
Figure 4. Comparison between the UAV images taken at Site_train (left) and Site_test (right).

Figure 5 .
Figure 5. Sample RGB image and delineated polygons of S. anglica boundaries (left). Binary image representing the presence of S. anglica (white pixels) and its background (black pixels) (right).

Figure 6 .
Figure 6. Original image patch (a) and generated images after multiples of 90° rotation (b-d) and colour jitter (e-h) augmentations were applied.

Figure 8 .
Figure 8. Training speed comparison of each model for different combinations of backbone, model architecture and patch size.

Figure 9 .
Figure 9. Inference speed (in milliseconds) on a single image patch for different combinations of backbone, model architecture and patch size.

Figure 10 .
Figure 10. Number of labelled patches used to train a model and its corresponding performance, represented by the mIOU score.

Figure 11 .
Figure 11. Sample RGB images (left) and the equivalent probability images as the training iterations progress. The colour scale indicates the probability that Spartina anglica is present at the location of each pixel (1.0 = present, 0.0 = absent).

Figure 12 .
Figure 12. Sample RGB test images and their corresponding segmentation reference labels and model predictions, with an image indicating incorrectly classified pixels (red).

Table 1 .
Information on the acquired UAV imagery.

Table 2 .
Information about the number of patches for the two patch sizes.

Table 3 .
List of encoder backbones used in this study.

Table 4 .
Mean IOU scores (mIOU) when using different encoder backbones in U-Net, LinkNet and FPN applied to two different image patch sizes. The text in bold indicates the model with the highest performance and its corresponding mIOU score.