Predicting grain protein concentration in winter wheat (Triticum aestivum L.) based on unpiloted aerial vehicle multispectral optical remote sensing

ABSTRACT Prediction models for crude protein concentration (CP) in winter wheat (Triticum aestivum L.) based on multispectral reflectance data from field trials in 2019 and 2020 in southern Sweden were developed and evaluated for independent trial sites. Reflectance data were collected using an unpiloted aerial vehicle (UAV)-borne camera with nine spectral bands having similar specification to nine bands of Sentinel-2 satellite data. Models were tested for application on near-real time Sentinel-2 imagery, on the prospect that CP prediction models can be made available in satellite-based decision support systems (DSS) for precision agriculture. Two different prediction methods were tested: linear regression and multivariate adaptive regression splines (MARS). Linear regression based on the best-performing vegetation index (the chlorophyll index) was found to be approximately as accurate as the best performing MARS model with multiple predictor variables in leave-one-trial-out cross-validation (R2 = 0.71, R2 = 0.70 and mean absolute error 0.64%, 0.60% CP respectively). Models applied on satellite data explained to a small degree between-field variations in CP (R2 = 0.36), however did not reproduce within-field variation accurately. The results of the different methods presented here show the differences between methods used and their potential for application in a DSS.


Introduction
Grain crude protein concentration (CP) is an important baking quality indicator in bread wheat (Triticum aestivum L.). The quantity and also quality, of CP (primarily the proteins glutenin and gliadin) affect gluten formation and the physical properties of bread dough (Gooding and Davies 1997). In many countries, wheat grain intended for milling and baking is sold for a higher price if a certain CP threshold is exceeded. In Sweden, that threshold is often 11.5% protein on a dry matter basis. CP concentration is therefore determined on all grain deliveries, using near infrared transmittance (NIT) sensing. It is also routinely determined in grain samples from winter wheat field trials, using the same technology.
Yield maps from monitors on combine harvesters are already used as a tool for farmers to evaluate crop management and for guidance in precision management in coming seasons, e.g. by splitting fields into management zones (Mulla 1993;Martínez-Casasnovas et al. 2018;Miao et al. 2018). Combine harvesters equipped with NIT sensors for CP mapping during harvest are also available (as described in Taylor et al. (2005) and Thylén and Algerbo (2001)), however, these are not yet as widely used as the yield mapping counterpart. Field zoning based on expected CP is an alternative method. With CP estimation before harvest, there is the option to split fields into harvesting zones with different expected CP in the current year. This would provide the option to exploit the spatial heterogeneity by selling some grain loads as bread wheat for a higher price and other loads as fodder wheat. Such models could also be useful for the grain industry, providing information on available quality at harvest. For example, Freeman et al. (2003) have shown how pre-harvest prediction of winter wheat grain yield and/or protein using the normalised difference vegetation index (NDVI; Rouse et al. 1974) could assist farmers in generating yield maps and reliable product marketing. However, the spatial pattern of CP within fields, and the relationship with for example yield, is complex and can vary substantially between years (e.g. Delin 2004).
Grain proteins are synthesised from nitrogen (N) that is translocated from other plant organs (50-70%) and from ongoing N-uptake during grain filling (30-50%) (Gooding 2009). It has been demonstrated in field trials that CP can be increased when additional N is applied in Zadok's growth stage (Zadok et al. 1974) DC37 or later (Finney et al. 1957;Gooding and Davies 1992;Hamnér et al. 2017;Rodrigues et al. 2018;Hu et al. 2021. Sieling and, even though N use efficiency may be lowered by very late applications (Gooding 2009). If CP could be predicted from spectral reflectance data sensed by cameras mounted on UAVs, satellites, ground vehicles or handheld instruments, CP predictions could serve as decision support for late-season N fertilisation aiming at meeting certain CP targets. Bastos et al. (2021) made a review of CP predictions, including remote and proximal sensing as well as on-combine sensors. They concluded that using on-combine protein measurements generated more accurate predictions than what could be achieved using proximal or remote sensing during the growing season. Prediction modelling of CP based on proximal and remote sensing data is often less successful than e.g. prediction models of yield (e.g. Freeman et al. 2003;Øvergaard et al. 2013; Barmeier et al. 2017). Barmeier et al. (2017) used a hyperspectral field sensor in field experiments during anthesis (DC 65) in malting barley, and used partial least squares regression (PLSR) for developing prediction models of protein content. Validation was done in independent field trials, but the models performed poorly (R 2 = 0.28), indicating the challenges in estimating protein content in harvested grain. Similar to Hansen et al. (2002) they did not find a clear effect of nitrogen fertiliser level on protein content. Prey and Schmidhalter (2019) tested a large number of vegetation indices and their correlation to grain N concentration, and found that only few parts of the electromagnetic spectrum within the visible-near infrared (NIR) region proved to be useful. Bands corresponding to around 780 nm in the lower part of the NIR region, combined with a band in the upper part of the red edge region performed best. Zhou et al. (2021) compared linear regression and some machine learning (ML; Liakos et al. 2018) methods for CP prediction in four fields in Japan, using data from a multispectral UAV based sensor. The results showed that it was difficult to predict protein accurately, although some ML methods seemed to perform better than the linear models. Bastos et al. (2021) reported that most studies on CP predictions in grain based on remote and proximal sensors probably overestimated the model accuracy and precision since the models were often not tested across different spatio-temporal scales. Börjesson and Söderström (2003) showed that the best time for making protein predictions for winter wheat and malting barley (Hordeum vulgare L.) when using spectral data from handheld sensors is at the end of anthesis (Zadok's DC69). Basnet et al. (2003) and Prey and Schmidhalter (2019) also suggested that canopy reflectance derived just after anthesis are best correlated with grain protein content, and Bastos et al. (2021) found that this was most commonly reported in the studies included in their review. The reason may be that it provides information on sources of N available for protein formation through late season N remobilisation to the grain (e.g. Hansen et al. 2002). Söderström et al. (2010) demonstrated that it is possible to map CP in malting barley based on remotely sensed crop canopy reflectance together with ancillary variables. Börjesson et al. (2019) demonstrated that CP prediction in winter wheat could be performed based on a combination of early (DC37) and late (DC73) satellite reflectance data. Their prediction models had mean absolute error (MAE) of <1% CP when tested on independent fields.
The seasonal dynamics of N supply, from fertilisers and from mineralisation of organic matter in the soil in relation to the dynamics of (other) yield-influencing environmental conditions, such as water availability, disease, temperature and radiation, affect the CP content in the harvested grain (see e.g. Gooding 2009). Small on-farm experiments (OFE; Lacoste et al. 2022) called zero-plots and max-plots can be used for determining optimal N fertilisation rates (see e.g. Lory and Scharf 2003;Raun et al. 2011). Zero-plots are strategically placed plots in commercial fields (often 10-100 m 2 ) left without any N fertiliser application, which are used to monitor N-limited crop growth as a proxy for soil N supply. Max-plots are plots with an N-fertilisation level high enough for N not to limit yield, and crop growth in max-plots can be used as a proxy for potential Nuptake or potential yield Piikki and Stenberg 2017). Since it is now relatively common to use zero-plots and sometimes also maxplots in cereal fields, it is interesting to investigate how these plots can be useful in CP prediction modelling. It has been shown, e.g. by Pettersson and Eckersten (2007), that there are differences between grain crop cultivars in terms of both canopy properties (e.g. structure or albedo) that affect vegetation indices and CP levels. Thus, it is possible that cultivar-specific models would perform better than general models. On the other hand, when general models are parameterised based on all cultivars used in a field trial, the resulting larger calibration dataset may give more robust models.
Satellite-based decision support systems (DSS) can reach many farmers and cover very large areas in comparison with handheld or tractor-based sensor systems. In agricultural research, an abundance of data on crop qualities are available from field trials. Sensors carried by unpiloted aerial vehicles (UAVs) are suitable for collecting data in single fields for different crop management purposes (Du et al. 2017). It could be beneficial to use UAVs to collect spectral data in field trials, generate prediction models for crop properties and then apply such models on satellite images used in DSS to upscale the research results in practical implementation.
The aim of the present study was to develop and evaluate prediction models for CP at harvest in winter wheat, based on optical remote sensing data from UAV-borne multispectral sensors acquired in development stages DC69-73, in field trials. In this study we used a selection of vegetation indices based on spectral bands in the red to NIR region which in previous studies have been useful for describing the protein content (e.g. Freeman et al. 2003;Pettersson et al. 2006;Prey and Schmidhalter 2019). For modelling we used both linear regression and multivariate adaptive regression splines (MARS; Friedman 1991). MARS is a non-parametric ML method that in previous studies has been useful when dealing with remote sensing data in relatively small datasets (e.g. Filippi et al. 2014;Söderström et al. 2015). Special attention was paid to validation in order to validate on independent data. Moreover, the models developed were applied on satellite images currently used in DSSs and performance was evaluated against field observations. The following tests were carried out: . Screening for best CP modelling approach, by leaveone-trial-out cross-validation of eight different modelling approaches comprising linear regression models or MARS models with different predictors (combinations of vegetation indices in trial plots with realistic N rates and chlorophyll index in zeroplots and max-plots) . Test of cultivar-specific models, by leave-one-trial-out cross-validation . Test of models on Sentinel-2 data, by comparing predictions with CP values from grain samples collected across five fields.

Sites
The study area was Skåne (Scania) county in southern Sweden (approx. 55-56°N, 12-14°E). Data were used from five field trials, conducted at different locations. Three of the trials were performed in 2019 (at Stora Markie, Tommarp and Alnarp) and two in 2020 (at Lund and Brantevik) ( Figure 1). All field trials were part of an ongoing national trial series (L7-0150, Nitrogen demand of different winter wheat cultivars 1 ). The winter wheat cultivars grown were different between years, and 10 different cultivars were tested in each year. Each trial had six split N treatments (the latest applied in Zadok's development stage DC37). Total N rates were: 0, 80, 140, 200, 260 and 320 kg N ha −1 . The trial was replicated four times. In total, one trial consisted of 240 plots, of approximately 2 × 10 m in size ( Figure 2). The focus in this study was on winter wheat cultivars intended for bread wheat, for which CP is critically important. Six bread wheat cultivars were grown in the field trials in 2019 and 2020 (cultivars: Etana, Hallfreda, Julius, Linus, Praktik and RGT Reform). The data for the satellite application were collected from five sites in the same region (A-E in Figure 1).

UAV measurements
Data were collected with a UAV octocopter (Explorian-8, Pitchup AB, Gothenburg, Sweden), equipped with a MAIA-S2 multispectral camera (Eoptis Srl, Trento, Italy), a global navigation satellite system (GNSS) unit and an incoming light sensor (ILS). The MAIA-S2 sensor has nine bands with centre wavelengths in the range 443-865 nm (Table 1). These bands correspond to nine of the bands in the Sentinel-2 satellite system (ESA, EU) (Nocerino et al. 2017). The UAV was flown autonomously (at 80 m height and flight speed 5 m s −1 ), using a pre-planned flight mission, in DC65-75, covering anthesis to medium milk development stages. Details of the flights are summarised in Table 2. The flights were mostly carried out between 12:00 and 16:30 h local time, when the solar incidence angle varied between about 40 and 55°. Each flight took about 10 minutes, during a period with uniform cloud conditions (clear sky or complete overcast), no precipitation the day before the flight, and not windier than a gentle breeze. At each trial site, 10 reflectance calibration panels (MosaicMill Oy, Vantaa, Finland) measuring 50 cm × 50 cm were placed on the ground (five along each end of the trial). These panels had known near-lambertian reflectance characteristics (2%, 9%, 23%, 44% and 75%) within the 400-900 nm range of the electromagnetic spectrum.
The UAV images were collected with at least 80% overlap both along and between flight lines. During a flight, the UAV's position was logged using GNSS and the incoming light from the ILS was logged at each photo point. Post-processing of images was performed using Multicam Stitcher Pro 1.1 provided by the manufacturer of the MAIA-S2 sensor. This software corrects for geometrical distortion and radial distortion of the raw images and stitches the images of each of the nine bands into one multispectral image. The software also incorporates data from the ILS to radiometrically calibrate the images. The output images had pixel size of about 3 × 3 cm. Orthomosaics were created from the output images using the web application Solvi (https://solvi.ag; Solvi AB, Gothenburg, Sweden). These were downloaded and further processed in ArcGIS (ESRI Inc., Redlands, CA, USA). Using the reflectance panels, a linear function was derived to empirically recalculate the digital numbers of the orthomosaics to reflectance. The median reflectance of each band and for each trial plot was calculated (excluding a 0.2-0.3 m buffer zone along the plot edges). Data on harvested yield (kg ha −1 ) and protein content (CP in % of dry matter) were extracted for each plot in the Nordic Field Trials System (NFTS; https://nfts.dlbr.dk; Danish Technological Institute and SEGES, Aarhus, Denmark). CP in this dataset was determined by FOSS Infratec1241 NIT equipment (FOSS, Hillerød, Denmark). These data were combined with the UAV data for the statistical modelling and analyses.

Statistical analyses and modelling
An initial screening process revealed potential problems with the data from the flight at Alnarp. In this particular field trial there was an unusually large amount of soil nitrogen available (the zero-plot yield was about 10 tonnes ha −1 ) which made treatments in this trial similar and not representative for Swedish conditions, therefore this trial was excluded from analysis. In addition, two plots in the Brantevik trial and 14 plots in the Lund trial were excluded, due to missing data in the NFTS.
Plots where the applied N rate was >0 kg N ha −1 and <320 kg ha −1 were selected for model calibration and validation. The plots with 0 kg N ha −1 and 320 kg N ha −1 were denoted as zero-plots and max-plots, respectively and were not used in general modelling. A remote sensing-based vegetation index from these plots was introduced as a predictor. Reflectance values of different replicates (blocks) were averaged. The initial dataset contained 1184 records, and cleaning, treatment subsetting and block aggregation resulted in 96 records for building predictive models.
The correlations among and between predictors and CP content were explored using the Spearman  correlation coefficient (r). The statistical procedures linear regression modelling (as explained in Hastie et al. 2009) and multivariate adaptive regression splines (MARS, Friedman 1991) were used in this study for building predictive models. Linear regression is a simple statistical procedure of fitting a linear model with one, or more (if multiple linear regression), predictor variables. MARS is a more flexible, non-parametric regression method which can predict using non-linear relationships with multiple predictors.

Vegetation indices
Seven vegetation indices (VIs) were calculated from the reflectance values in the different bands in the electromagnetic spectrum (Table 3). These were: (i) Optimised soil-adjusted vegetation index (OSAVI) (Huete 1988;Rondeaux et al. 1996), which was developed in an effort to minimise soil brightness influence by use of red and near-infrared (NIR) wavelengths. (ii) Red-edge inflection point (REIP), the computed wavelength where the crop canopy reflectance spectrum has its inflection point in the red-edge wavelength region, which is strongly related to chlorophyll content (Reusch 1997). A method to approximate this point presented by Guyot et al. (1988) was used in this study. (iii) Transformed chlorophyll absorption in reflectance index (TCARI) (Kim et al. 1994;Haboudane et al. 2002), which is one of several 'CARI' indices and indicates the relative abundance of chlorophyll. (iv) TC/OS, a ratio introduced by Haboudane et al. (2002) that is very sensitive to chlorophyll content variations and to variations in leaf area index (LAI: defined as half the total area of green elements of the canopy per unit horizontal ground area). This index is not sensitive to altitude. TC/OS has been found to show good results for protein predictions in malting barley (Pettersson et al. 2006). (v) Chlorophyll index (CI) (Gitelson et al. 2003), is another relevant simple ratio index. (vi). Normalised difference vegetation index (NDVI) (Rouse et al. 1974), is a very common NIRvisible-based ratio calculation introduced by Rouse et al. (1974). (vii) Normalised difference red-edge index (NDRE) (Barnes et al. 2000) is calculated in a similar manner to NDVI and includes a red-edge band instead of a red band, making it less sensitive to saturation if biomass is high compared with NDVI. The equations in Table 3 show how these VIs were calculated, with reflectance (ρ) followed by the MAIA-S2 band number.

Modelling strategies
The eight modelling strategies tested (Table 4) were combinations of two different model types, linear regression (a) and MARS modelling (b), and four different predictor sets (1-4). Linear regression models were based on a VI highly correlated with CP content, and MARS models were based on all seven VIs in Table  3. In some strategies, in addition to the VIs, the CI values in the zero-plots (denoted CI-zero) and/or the max-plots (denoted CI-max) were included as predictors. The index CI was pre-selected since this index was found to have the highest correlation with protein in this dataset. Irrespective of model type, the predicted values were constrained within reasonable limits. Predicted CP values below 8% were set to 8% and predicted CP values higher than 13.5% were set to 13.5%.  All seven VIs + CI-zero MARS LTO 3b All seven VIs + CI-max MARS LTO 4b All seven VIs + CI-zero + CI-max MARS LTO All models were evaluated by leave-one-trial-out cross-validation (LTO). Two strategies were also evaluated by comparison with independent crude protein (CP) observations in five production fields. VI, vegetation index; LR, linear regression; MLR, multiple linear regression; MARS, multivariate adaptive regression splines. CI-zero, chlorophyll index in zero-plot; CImax, chlorophyll index in max-plot.

Model cross-validation
To assess the prediction accuracy when applying a modelling strategy to new sites, leave-one-trial-out (LTO) cross-validation was performed. Records were repeatedly split into 'test data' (the record(s) for which a prediction was made) and 'training data' (the records used to parameterise the model). With each iteration, all data from one trial were assigned to the test set and the remaining records were assigned to the training set.
To determine prediction accuracy, two validation measures (mean absolute error, coefficient of determination) were calculated from the measured and predicted N-uptake values. Mean absolute error (MAE) is the average of the absolute prediction errors. Coefficient of determination (R 2 ) quantifies the prediction goodness-of-fit.

Model parameterisation
After LTO cross-validation, the final models were parameterised for each of the eight strategies using all data (no trials left out). In linear regression models (1a-4a in Table 4), all predictors obtained were included in the final model, however in the parameterisation of MARS model predictors with a little predictive power were discarded. Therefore, the final model may not include all predictors. Modelling was also carried out separately for each specific cultivar in the dataset. The cultivar-specific models were validated for the best-performing strategies in the general models.

Model field application
Model application was tested for general models from strategies 1a and 1b (Table 4) using satellite data for five winter wheat fields in Skåne county (locations A-E in Figure 1). These models were implemented on satellite data, Sentinel-2, L2A processing type, from June 25, 2020 (the cloud-free image expected to most closely correspond to DC 69-75). Field data were collected just before harvest in the year 2020. On each sampling location, one sample consisted of seven subsamples (for sampling stability), covering an area of approximately 3 × 3 m. There were eight samples obtained on each field. A total of 34 records remained after removal of sample points with incorrect location references, or those collected very close to field boundaries.

Software
The data were stored in a SQLserver database (Microsoft, Redmond, Washington, USA) accessed via SQL Server Management Studio (Microsoft, Redmond, Washington, USA) and analyses were carried out using R (R Core Team, 2021), including package 'Earth' (Milborrow 2021). ArcGIS 10.7 (Esri Inc., Redlands, California, USA) was used for spatial data analysis and display.

Results
In most compiled predictor sets, there were strong correlations between variables (Figure 3). For example, NDRE75 proved to be highly correlated with three other VIs, CI, TCOS and REIP. The VIs TCARI and TCOS were also very highly correlated, while TCARI and OSAVI showed very low correlation. All except two predictor pairs r was above 0.5. For the indices, the Spearman correlation ranged between −0.90 and 0.96. Wolters et al. (2021) have shown a good relationship with reflectance calculated to CI and N-uptake. In this study, the CI index also had the best correlation with grain CP (r = 0.87) content and was selected as the index to use in linear models.

Performance of general models
A summary of the results from LTO cross-validation for the six different bread wheat varieties is given in Table  5. Using the linear regression method and the best-performing index (CI), R 2 = 0.71 the MAE was 0.64% CP. Inclusion of CI values from zero-plots reduced the accuracy of the model slightly, to R 2 = 0.60 with MAE 0.71% CP. The max-plot CI value (CI-max) did not influence the prediction outcome substantially, R 2 = 0.71, MAE = 0.64% CP. A model with both the zero-plots and the max plot gave R 2 = 0.60, MAE = 0.71% CP for the linear method.
Leave-one-trial-out cross-validation with the MARS method gave lower accuracy in the results. The number of indices selected by the MARS method varied from three to five. Using all seven indices at the start of the procedure resulted in a model with R 2 = 0.50, MAE = 0.90% CP. The VIs: CI, OSAVI, TCOS and REIP were selected in this model. In the MARS method, the model was improved with the introduction of zero-plot CI values R 2 = 0.63, MAE = 0.70% CP. The opposite was the case when max-plots were used for modelling, R 2 = 0.36, MAE = 1.19% CP. The model performance improved again when VI values for both zero-plots and max-plots were introduced to the model R 2 = 0.70, MAE = 0.60% CP. In all models, for both methods, MAE was <1.20% CP.
Prediction results for all eight modelling strategies summarised in Table 5 are also presented in Figure 4. In two cases (1a, 3a), linear models were generally a better fit than the predictions with MARS modelling and in two cases (2b, 4b) the MARS models performed slightly better. In the prediction graphs for linear predictions, there was less variation between sites than in the MARS method graphs. For the site 'Tommarp', predicted values were lower than observed in the first three models (Figure 4).

Performance of cultivar-specific models
Results from LTO cross-validation of cultivar-specific models for strategies 1a and 4b (the best-performing strategies for general models for the two model types, Figure 4) are summarised in Table 6. Using the linear modelling method with CI (strategy 1a), it was possible to parameterise well-performing cultivar-specific models. Cultivar-specific linear models performed better than general models. For MARS models, more variation in the performance of different cultivar-specific models was found (Table 6). In both methods, the varieties 'RGT Reform' and 'Etana' stood out as good-performing varieties.

Satellite application
Model performance evaluation for a satellite dataset on five fields (strategy 1a-b) showed that the linear regression model based on one VI performed better (higher R 2 ) than the multivariate model based on four VIs ( Figure 5). However, it is clear from Figure 5 that the model mainly explained between-field variation in those cases. The CP variation within the fields was small and could not be predicted well by the model. In application of the linear model there was a general prediction bias, with most predictions being higher than the observed values.

Discussion
Field trials and decision support systems Field trials are important for agricultural research. From a practical application perspective, it is advantageous if results can quickly reach end-users, in the form of advice and recommendations . One part of the ongoing digitalisation of agriculture is increasing use of digital DSSs in precision agriculture. Some of these systems are using remote sensing multispectral data derived with satellites, or less commonly, UAVs. It is important to develop different types of relevant models that are suitable for implementation in DSSs. Thus, the aim in the present study was to develop a model for protein prediction at harvest in winter wheat using data from a UAV-borne sensor flown over a number of field trials during two seasons and transfer that model to a satellite data processing DSS and test it on a few fields. This approach, moving from . Leave-one-trial-out cross-validation for the eight modelling strategies (see Table 5). Predicted crude protein (CP) percentage vs. observed CP percentage.
trials to DSS application, revealed opportunities and challenges.

UAV data collection
Ideally, UAV data is collected in all field trials during the same crop development stage, at the same time of day, and under similar weather conditions (e.g. Souza de et al. 2021). In reality this is not always feasible, especially in a project with trial sites located far apart (Figure 1). We aimed for a relatively narrow crop development window (DC69-73; as suggested in earlier studies, e.g. Bastos et al. 2021;Börjesson and Söderström 2003; Prey and Schmidhalter 2019;) however, in one of the trials used in the modelling measurements were made at DC75. Note also that the DC specified is the manually assessed average DC of each trial, although there were some differences within each trial, both random and between varieties and N rates. Since weather conditions were different, the aim was at least for uniform weather conditions where possible during individual flights, although three different types of weather were encountered during the flights: overcast, sunny and hazy (Table 2).

Trial design
The purpose of the trial design ( Figure 2 & Table 2), was to test the response of different wheat varieties to different N rates. This means that the trials were, when possible, located in places where the crop variation was only driven by the treatment (i.e. different N rates and cultivars). In farmers' fields, this of course rarely is the case (Colaço and Bramley 2018). Conventional plot experiments are likely not the ideal method to evaluate variable rate technologies implemented to accommodate the effects of spatial variability in CP. Other factors may limit crop growth to different extents in different parts of a field, such as availability of other soil nutrients or water. A difficulty with the use of reflectance data is that the causes of variation in reflectance from the crop are usually not known. Studies separating e.g. water status from N status may provide valuable context in this case (Reese et al. 2010;Kusnierek and Korsaeth 2015). Table 6. Validation statistics from leave-one-trial-out crossvalidation of linear or multiple linear regression models based on the best single index for strategy 1a (see Table 4) and Multivariate Adapted Regression Splines (MARS) prediction results for strategy 4b.   Table 4) as applied on Sentinel-2 data from the five test fields (locations A-E in Figure 1).

Protein prediction models
Even with the well-known relationship between CP and N rate (Terman et al. 1969;Hamnér et al. 2017;Sieling and Kage 2021), it is challenging to model protein content in field crops (Colaço and Bramley 2018), even when distinct fertilisation steps are present in the dataset (Pettersson et al. 2006). In this study we did not find benefit in using a more complex CP prediction model (MARS model based on multiple indices including zero-plots and max-plots) (R 2 = 0.70; MAE = 0.60% CP) over using the best linear model (based on one vegetation index, CI) (R 2 = 0.71; MAE = 0.64% CP). It appeared that the ML model did not optimise with this small dataset, and the independent validation. Given that it can be difficult to compare validation statistics between different studies, these models seem to perform relatively well compared to a statistical summary of reported research in this field (Bastos et al. 2021). In terms of best-performing index (CI), our study is in line with earlier work (e.g. Reusch 2005; Prey and Schmidhalter 2019) which indicates that an index based on two bands in the NIR-red-edge region is better correlated to N concentration and N-uptake than commonly used indices, such as NDVI and MSAVI2. There were in general small or no improvements in model performance when zero-plots and/or max-plots were included as predictors in the models (Table 5). In practical agricultural production in Sweden, the use of zero-plots and max-plots as a means of improving supplementary N fertilisation rate at DC37-45 in winter wheat appears to be increasing (Hushållningssällskapet 2021). However, data collection was done here at a considerably later stage, when the benefit of zero-plots and max-plots for protein prediction models may be limited. . There was much variation in the model accuracy between the different models including these predictors, and there was no proven benefit in preparing zero-plots and max-plots for this type of CP prediction.
The linear cultivar-specific protein prediction models worked better than the general model (Tables 5 and 6). The LTO validation statistics showed good performance, with the best variety being RGT Reform (R 2 = 0.87; MAE = 0.40% CP). When modelling was performed for each variety individually, the MARS model performed less well, possibly due to the relatively small number of observations used for calibration. When the MARS method was used, the variety 'RGT Reform' again performed best (R 2 = 0.84; MAE = 0.46% CP). With a more extensive dataset, the results would likely have been more clear. The response to N varies between cultivars (Fowler 2003). Some cultivars tend to be both highyielding and at the same time have high CP content, whereas others can be high-yielding but the CP content will be lower. The yield-CP interaction differs among the cultivars tested in this study. For example, the cultivar Etana is a high-yielding highprotein cultivar, and in comparison the cultivar Hallfreda tends to have lower CP at the same yield, whereas the cultivar Praktik has a lower yield at the same CP (Hammarstedt 2021). This interaction is likely important when using canopy reflectance for modelling CP, and modelling yield variation is often easier than modelling CP (e.g. Barmeier et al. 2017).

Transfer of UAV data to satellite data
A question raised in the section on model field application, is whether it is possible to transfer models between platforms. Application of the UAV-based models on Sentinel-2 satellite data revealed that transfer of the model resulted in a low prediction accuracy and bias in the predictions. We used a UAV-borne sensor (MAIA-S2), which sensed bands with similar spectral characteristics as the intended satellite to be used in a DSS (Sentinel-2), however there are likely still discrepancies between the acquired reflectance values from the two sensing systems (shown in e.g. Bukowiecki et al. 2021;Peng et al., 2021;Rasmussen et al. 2021;Sarvia et al. 2021). Panels with known reflectance properties were used in the field trials and the digital numbers were calculated using an empirical relationship derived in each trial. Here we applied the protein prediction models on Sentinel-2 L2A data, which presumes that data from the two platforms are comparable. The prediction accuracy was not good when the models were applied, which indicates that the comparability in the field of data from MAIA-S2 and Sentinel-2 requires further investigation ( Figure  5). An issue with applying models in satellite based DSS is that images from suitable dates (the growth stages for which the model was parameterised) must be available. The protein prediction model in this study was developed based on data collected at crop development stages DC69-75, a relatively large DC range, however a satellite image was not always available for that period. If the model is to be applied over large areas and the crop growth stages in individual fields are not known, the model performance may be limited. Earlier research suggested that spectral models during anthesis could be affected by large phenological shifts during that period, and that the most stable relationships between N concentration and canopy reflectance could be found during milk-ripening stage (DC73; Prey and Schmidhalter 2019). In this case, the DC was not recorded in the fields. The Swedish Board of Agriculture has a website where weekly recordings of DC are reported (https://etjanst. sjv.se/povpub-gui/#/karta?produktionsinriktning= jordbruk, accessed March 10, 2022), and by following information from this page we noticed it is possible that in part of south-western Sweden the DC in winter wheat on June 25, 2020 was slightly higher than DC75.
Validation of the performance of a satellite-based model requires carefulness (e.g. Bastos et al. 2021). Collecting ground truth protein data in a growing wheat crop is difficult if the data are to be comparable to values sensed by a satellite such as Sentinel-2. The Sentinel-2 pixels are 400 m 2 (20 × 20 m) for the bands used in the best-performing models. Ground truth data in the field were subsamples collected within about 3 × 3 m areas. In this case, the sampling locations were not positioned according the outline of the pixels of the Sentinel-2 data. Even if this had been done, minor shifts in the georeferencing of the satellite image, practicalities during field work, and accuracy of positioning of the grain sampling location, still make it difficult to match a sample with a satellite image pixel. The sampling procedure is usually challenging and will inevitably have induced uncertainty in the results. In future research, the procedure of collecting ground truth data with modification will better match to the satellite data. The results reported in Figure 5 should be interpreted with the issues described above in mind. There appears to be some bias in the predictions (see linear model application in Figure 5), with the model over-predicting the protein content compared with the ground observations. If all observations are regarded collectively, the model made a poor prediction on the satellite dataset (R 2 = 0.36) and the within-field variation was not captured by the model. There was no field check done on the crop development stage for the different fields, and therefore the satellite data was selected from regional general information on reported crop growth stages in winter wheat, which will differ between fields. The simpler linear model performed better than the more flexible MARS model. The modelling results of the different methods presented here add to the general knowledge base on CP estimation and protein models will maybe be applicable in a DSS in the future, when a different approach has been tested. Predicting and mapping the variation of protein content in grain crops remains a challenge, as also reported by Bastos et al. (2021). Further work on finding functional methods using proximal and remote sensing is recommended. Note 1. The trial series forms part of Sverigeförsöken. It was conducted by Hushållningssällskapet and funded by Stiftelsen lantbruksforskning. Agronomic data from the trials are publicly available in the Nordic Field Trials System (NFTS; https://nfts.dlbr.dk; The Danish Technological Institute and SEGES, Aarhus, Denmark). More method details on the trials are also available in this database.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Data availability
Data used in this study are not openly available, for inquiry please contact the authors.