Detection of grassland mowing frequency using time series of vegetation indices from Sentinel-2 imagery

ABSTRACT Management intensity deeply influences meadow structure and functioning, therefore affecting grassland ecosystem services. Conservation and management measures, including European Common Agricultural Policy subsidies, should therefore be based on updated and publicly available data about management intensity. The mowing frequency is a crucial trait to describe meadows management intensity, but the potential of using vegetation indices from Sentinel-2 imagery for its retrieval has not been fully exploited. In this work we developed on the Google Earth Engine platform a four-phases algorithm to identify mowing frequency, including i) vegetation index time-series computing, ii) smoothing and resampling, iii) mowing detection, and iv) majority analysis. Mowing frequency during 2020 of 240 ha of grassland fields in the Italian Alps was used for algorithm optimization and evaluation. Six vegetation indexes (EVI, GVMI, MTCI, NDII, NDVI, RENDVI783.740) were tested as input to the proposed algorithm. The Normalized Difference Infrared Index (NDII) showed the best performance, resulting in mean absolute error of 0.07 and 93% overall accuracy on average at the four sites used for optimization, at pixel resolution. A slightly lower accuracy (mean absolute error = 0.10, overall accuracy = 90%) was obtained aggregating the maps to management parcels. The algorithm showed a good generalization ability, with a similar performance between global and local optimization and an average mean absolute error of 0.12 and an overall accuracy of 89% on average on the sites not used for parameters optimization. The lowest accuracies occurred in intensively managed grasslands surveyed by one satellite orbit only. This study demonstrates the suitability of the proposed algorithm to monitor very fragmented grasslands in complex mountain ecosystems. Google Earth Engine was used to develop the model and will enable researchers, agencies and practitioners to easily and quickly apply the code to map grassland mowing frequency for extensive grasslands protection and conservation, for mowing event verification, or for forage system characterization.


Introduction
Grasslands are one of the most widespread ecosystems and they are rapidly changing in extent, structure, and functioning (Zarei et al. 2020;Tasser et al. 2007;Scotton, Sicher, and Kasal 2014). Grasslands functioning and stability are crucial as grasslands cover one-third of the earth's terrestrial surface and 70% of the global agricultural area: they are the basis of many livestock production systems, and provide carbon storage, water purification, erosion control, biodiversity, and recreation (Reynolds and Frame 2005).
Meadow ecosystem services can strongly be affected by management intensity, commonly described using some parameters such as volume of cut grass, number of cuts per year, and nitrogen input levels as fertilizer or manure (Velthof et al. 2014). Among meadow regulating services, carbon storage is often under-considered despite grasslands globally store about 50% more carbon than forests due to their very wide geographic distribution (Conant 2010). Management intensity can affect this trait as intensively managed grasslands are plowed every few years, a practice that releases carbon into the atmosphere (Xiaojun et al. 2010). Many authors found that cut grasslands have a better impact on water quality and water resources than crops, but sewage fertilization can cause an abrupt increase in nitrate leaching (Benoit and Claude Simon 2004). Intensively managed meadows host a limited number of wild bee species (Johan et al. 2020;Klaus et al. 2021) and few rare and specialist plant species (Hilpold et al. 2018). Bird species richness is also negatively correlated with the abundance of intensive meadows, which host mostly generalist species (Assandri et al. 2019). Transformations in grassland management intensity have different drivers and severities in different regions (Oenema, De Klein, and Alfaro 2014).
Changes in the socio-economic drivers on the Alps and other mountain region are exposing meadows to an intensification of their use in lowland areas and an abandonment in marginal areas. In addition, the availability of cheaper feed and forages from lowlands opened the nutrient cycles in many mountain farming systems. Nitrogen loads increased in many areas, despite a decrease of Livestock Units (e.g. −17% between 1980 and 2000 in Europe (Streifeneder et al. 2007;Cocca et al. 2012)). Consequently, the amount of ecosystem services provided by grasslands is decreasing, so that several policies and measures were introduced in many countries to support grassland management and conservation.
Among these, the 2013 EU Common Agricultural Policy (CAP) measure aiming at the conservation of permanent grasslands subsidizes meadow mowing but does not distinguish among grassland management intensities. Many authors warn those untargeted subsidies may lead to further intensification and abandonment (Herzon et al. 2018;Pe'er et al. 2014Pe'er et al. , 2017. However, more targeted policy measures would require updated information about meadow management intensity that was almost unavailable until the last few years as field surveys are very expensive and management intensity data were not required to the farmers by the EU. In the last decades, remote Sensing (RS) has increasingly been used for ecosystem monitoring, as it provides accessible and reliable data at a very high spatial and temporal resolution. RS has been used for land-use classification, biomass estimation, disturbance detection, to monitor seasonal changes, and many other fundamental applications which enable an improved global change impact assessment and comprehension (Drusch et al. 2012). In grassland studies, the number of RS applications significantly increased in the last two decades (Reinermann, Asam, and Kuenzer 2020), spanning from botanical composition, structure and phenology to fodder quality and quantity and management regimes (Wachendorf, Fricke, and Möckel 2018;Kim et al. 2020;Hua, Sirguey, and Ohlemüller 2021;Yan et al. 2020). By finding significant relationships with proper spectral vegetation indices (VIs), researchers were able to create models to assess fundamental grasslands traits. Mowing detection using remote sensing is a very new research field as the first algorithm was developed only in 2010 by Courault et al. (2010) and was based on LAI and NDVI time series derived from FORMO-SAT2 imagery. Other continuous monitoring optical sensors like MODIS (Halabuk et al. 2015;Estel et al. 2018) have been used, although their temporal and spatial resolution limited the application in intensively managed landscapes and in relatively large grasslands parcels (larger than 1 ha). The high temporal resolution and the continuity of synthetic aperture radar (SAR) backscatter data contributed to the adoption of this technology in many studies (Taravat, Wagner, and Oppelt 2019;Siegmund et al. 2019;Grant et al. 2015;Tamm et al. 2016;Zalite et al. 2016;Kaupo et al. 2013;Voormansik et al. 2016), but slope orientation and roughness of the parcels demonstrated to still hinder the detection of mowing events (Mathilde De, Radoux, and Defourny 2021;Wachendorf, Fricke, and Möckel 2018).
Thanks to its higher temporal and spatial resolution, Sentinel-2 (S2) imagery can overcome some of the limitations imposed by the previous optical sensors and has already been used for mowing detection at both regional (Kolecka et al. 2018) and national scales, in combination with Landsat images (Griffiths et al. 2020;Schwieder et al. 2021) and with active sensors (Lobert et al. 2021). The algorithm developed by Kolecka et al. (2018) allowed the correct detection of 77% of mowing events and is based on the detection of drops in the NDVI time series. Cloudy pixels dates are one of the major issues in VI time series (VITS) analysis and can be tackled by cloud masking and by VITS smoothing (Halabuk et al. 2015;Jin and Bing 2013;Garioud et al. 2019). Since Griffiths et al. (2020) and Kolecka et al. (2018) developed their models, S2 cloud masking improved (Frantz et al. 2018) leaving space to further algorithms development and increased efficiency of smoothing processes that perform better on less noisy time series.
Grasslands are changing rapidly and enhanced tools to monitor their management intensity are urgently needed to target conservation measures and actions. Despite the recent improvements in cloud masking of S2 images which provide more reliable high-resolution data, there are still few newly developed algorithms estimating grasslands mowing frequency. A more accurate estimation model could be used to analyze grassland systems in terms of management intensity and to monitor extensively managed grasslands, which are typically associated with high conservation value and high abandonment risk, and therefore in compelling need of targeted subsidies.
Our study aims to develop a model for mowing frequency detection based on VITS analysis and integrating masking, smoothing, resampling, and drop detection processes. Since prior knowledge of management parcel geometries is often unavailable, mowing frequency should be estimated at both pixel and parcel resolution. To make the model affordable to agencies also in mountain areas the model is based on free S2 imagery, does not need local calibration and has been tested with reference data from fragmented and steep grassland areas. The algorithm was developed and can be run on Google Earth Engine platform (GEE) (Gorelick et al. 2017). This platform gives the possibility to build and optimize models testing various VIs with a high computational capacity and to provide local agencies with models which are replicable and easily applicable in different areas.

Study sites
The algorithm was developed and tested on 240 ha of grassland fields located in the Province of Trento (northeast Italy), at the southern border of the European Alps ( Figure 1). The local climate depends primarily on elevation, which ranges from 60 to 3769 m a. s. l., and only secondarily on latitude. The province is classified as temperate oceanic according to the Worldwide Bioclimatic Classification System (Sboarina et al. 2004). Average yearly snow cover duration is between 20 and 40 days at altitudes lower than 1350 m a.s.l., between 50 and 65 days at altitudes between 1350 m a.s.l. and 1600 m a.s.l. (Marcolini et al. 2017).
Grasslands represent one of the main land covers in the province as they occupy 17% of the total area and 81% of the utilized agricultural area. Over 80% of grassland area is managed as pasture and only less than 20% of them are mowed (ISTAT 2010). Pastures are mainly located on the steeper slopes and on high altitude sites and are grazed by cattle in the period between June and September. Mown meadows are distributed at the valley's bottoms (where three or four cuts are carried out per vegetative season), on valley sides (one or two cuts) and on high-altitude plateaus (only one cut). Due to a fragmented property structure, the management is very patchy. Mowingparcels are usually smaller than one hectare and the width of the parcels is often less than 30 m.

Field data
The reference data of mowing frequency used to optimize and validate the algorithm cover 240 grassland hectares at four sites (i.e. Lusia, Predazzo, Viote, Vigolana) and store information about the number of mowing events occurred in each meadow parcel during 2020 (Table 1). For the sites of Vigolana, and part of Lusia and Predazzo the information was acquired through farmers interviews. Farmers were asked to point on a map the exact location of the meadow they manage, to draw the parcels on a very high resolution RGB image (Bing Maps) and to indicate the number of mowing events they performed during 2020 on each management parcel. Dates of mowing were not asked because we were interested only in mowing frequency and not in temporal accuracy, which is anyway at least partially lost during smoothing and resampling processes. For the site of Viote and part of Lusia and Predazzo photo interpretation on RGB daily Planet imagery (Planet Team 2018) at 3 m spatial resolution and visual inspection of a break in the NDVI curve were used to manually define the mows. All Planet images covering at least partially the study areas during the growing season were downloaded, resulting in an observed day every 1.39 days, 1.43 days and 1.64 days at the Lusia, Predazzo and Viote sites, respectively. To limit the edge effect, we selected only parcels large enough to contain a square of side 20 m which is twice the highest spatial resolution of S2 NIR and visible spectral bands. The parts of the parcels polygons narrower than 10 m were also removed from the dataset. To avoid mixed pixels all the parcels were shrunk by 5 m using the buffer tool in Qgis (QGIS Development Team 2021). Pastures and grazed meadows were identified through farmers' and local experts' interviews and were not included in the dataset. The average size of the (unshrunken) parcels ranges from 3689 m 2 at the Vigolana site to 15,000 m 2 at the Viote site.
The Lusia and Viote sites are located at altitudes higher than 1200 m a. s. l. and are therefore managed very extensively, with zero to two mowing per year with uncut corresponding to meadows not mown in last few years and still not colonized by woody vegetation. The Predazzo and Vigolana sites, on the other hand, are located at lower altitudes and managed more intensively, with one to four mowings per year. The slopes in the four considered sites are quite shallow, with average parcels slope of 10°, 4°,9°,12° at the Lusia, Predazzo, Viote, Vigolana site, respectively.

Imagery data
Level 2A multispectral satellite data acquired by the Sentinel-2 (S2) constellation accessed through Earth Engine Data Catalog were used in this study. The S2 images are characterized by 13 bands distributed in the visible, near infrared and shortwave infrared parts of the spectrum. Four bands are characterized by a 10 m spatial resolution (bands 2, 3, 4, 8), six by a 20 m spatial resolution (bands 5, 6, 7, 8A, 11 and 12) and three by 60 m spatial resolution (bands 1, 9 and 10). The S2 mission manages two identical polar orbiting satellites which survey earth from an altitude of 786 km. Their revisiting time is five days at the equator but nearer to the poles the orbits overlaps and therefore the revisiting time is shorter in overlapping areas. The sites of Lusia and Predazzo are surveyed by one orbit (i.e. orbit 22) and their revisiting time is five days. The sites of Vigolana and Viote, instead, are surveyed by two orbits (22 and 65) and their revisiting time is between two and three days.
Only images acquired during the 2020 growing season were considered. To set the start and end of growing season we followed the references proposed by Gensler (1946). The start of grassland growing season can be set when the daily mean temperature determined on multiple year time series reaches 7.5 C° and the end of the growing season can be set at 5 C°. In our study we divided grassland parcels -according to their altitude-in low altitude grasslands (<1200 m a. s. l.) and highaltitude grasslands (>1200 m a. s. l.). We set the start of the growing season respectively on April 15 th and on May 15 th and the end of the growing season on November 15 th and on October 15 th .

Methods
In Figure 2 a scheme of the proposed mowing detection algorithm is shown. In the following subsections every step is described in detail. The entire workflow was implemented in GEE (Gorelick et al. 2017).
In summary, the algorithm is based on the analysis of the VITS. Between the beginning and the end of the vegetative period, there are one or more growth periods and mowings of the grass. During the grass growth, the value of the VI increases until the farmer performs the cut which causes a sudden decrease in the index value. The number of sudden decreases in the index value followed by the slow increase in the index value represents the mowing frequency that the algorithm is expected to determine. The algorithm development was carried out at the pixel level. In a final step, the best model developed at the pixel level was tested for accuracy at the parcel level.

Masking and vegetation index computing (phase 1)
S2 image pixels with cloud probability higher than 5% were discarded as cloudy pixels provide erroneous index values, which the algorithm could erroneously interpret as a mowing event. The adopted cloud probability was the one created with the S2-cloud-detector library and was provided as pixel property for each image in GEE. We masked out pixels with snow probability higher than 5% using MSK_SNWPRB band, distributed by ESA, and pixels identified as cirrus clouds and as shadows by the SCL band, distributed by ESA. Six VITS (EVI, GVMI, MTCI, NDII, NDVI, RENDVI 783.740 ) were computed in GEE based on the formulas described in Table 2. VIs were chosen based on recommendation proposed by Davidson, Wang, and Wilmshurst (2006), Imran et al. (2020) and Reinermann, Asam, and Kuenzer (2020) and are presented in Table 2. All the S2 bands used for VI computation were resampled in GEE using nearest neighbor method to the resolution of the NIR and red band (10 m).

Smoothing and resampling (phase 2)
Omissions in cloud masking result in erroneous VIs values and a simple drop-detecting algorithm could wrongly consider the drops as mowing events and therefore rise the commission error. A smoothing process is therefore needed (Hird and McDermid 2009). We applied a running-median smoother to the raw VITS to overcome abrupt drops in the time series caused by unmasked cloudy observations (Jin and Bing 2013).   A running-median smoother is resistant to outliers as it efficiently removes the invalid low values caused by unmasked cloudy observations. In the preliminary analyses of our study, we found that in low-intensively managed grasslands, VI values can fluctuate during the summer for reasons that are different from mowing and are usually referable to water stress and heat stress. As we are interested only in major drops caused by mowing events, we reduced the temporal resolution of the time series using a fixed Resampling Interval (RI) to obtain Resampled Time Series (RTS). RTS is computed by defining a dates' list starting from half RI after the start of the growing period date and prosecuting with dates at RI intervals until the end of the growing period. To each date in the list the mean of STS values falling in NDW before and after each point is assigned. Raw and derived vegetation index time series of a grassland pixel are displayed in Figure 3 as an example.

Mowing detection (phase 3)
Mowing events cause remarkable drops not only in the VITS (Stendardi et al. 2019), but also in the RTS, so we set the condition that a local minimum (i.e. an index value lower than previous value and following value) should reach a minimum drop (DROP) from the maximum in the last two points (MTS) of RTS to be interpreted as a mowing event. To obtain the Pixel Resolution Mowing Frequency (PiMF), the algorithm detects and counts the mowing events (defined as local minimum in the masked, smoothed, resampled time series) which cause a minimum drop from previous values. The two conditions are mathematically stated in equation 1, where RTS is the Resampled Time Series, MTS is the Maximum Time Series and DROP is an optimized parameter that defines the minimum percentage difference between MTS and RTS to identify a mowing event.

Majority analysis (phase 4)
To reduce the "salt and pepper" effect in the final mowing events map, we performed a majority analysis using a 3 × 3 pixels kernel, obtaining the Corrected pixel Mowing Frequency (CPiMF). This operation removes abnormal frequency values of some pixels (noise) from PiMF replacing them with values calculated from the majority of their neighboring cells. We have chosen a small kernel size (3x3 pixels) because the scale of management was often as small as a few pixels (Qian, Zhang, and Qiu 2005). Then, for each parcel we calculated the mode of the CPiMF of its pixels and obtained the Parcel Mowing Frequency (PaMF).

Design of experiment and accuracy assessment
To define the most accurate prediction algorithm and to measure its generalization capability, the following four experiments were carried out: (1) Experiment 1: parameters optimization and vegetation index choice. The accuracies obtainable using different VIs were tested, optimizing the three parameters (NDW, RI, DROP) at the four sites. Parameters' levels to be tested were chosen based on preliminary analyses which revealed the ranges in which the most promising accuracies could be obtained in our study sites. We included 7 values of DROP (from 0 to 0.35), 15 values of RI (from 6 to 20), 5 values of NDW (from 6 to 10). We defined the best general optimization as the combination of parameters that gives the lowest Mean Absolute Error (MAE). We computed the mean MAE across all four sites for each parameter combination and we chose the parameter combination that determined the lowest MAE, as visually described in Figure 4. The MAE measures the average error regardless of its sign and gives the magnitude of the error in the same unit as the prediction, in this case, the mowing frequency (Lobert et al. 2021;Congalton and Green 2009). The MAE was computed as follows: (1) where n is the number of pixels, Ŷ is the predicted mowing frequency and Y is the reference mowing frequency. Overall accuracy was also computed and reported. The overall accuracy is simply the sum of the major diagonal (i.e. the correctly classified sample units) divided by the total number of sample units in the confusion matrix (Congalton and Green 2009). We used the function ee.FeatureCollection. errorMatrix implemented within the GEE platform to obtain the error matrix, ee.
ConfusionMatrix.accuracy to obtain the Overall Accuracy, whereas the MAE was computed in GEE following equation 2. To understand how much accuracy is lost by generalizing the optimization of the algorithm, we compared the accuracies of locally optimized (separately in each site) algorithms and globally optimized (on all four sites together) algorithms ( Figure 4).
(2) Experiment 2: testing of algorithm phases. All phases of the algorithm were tested in order to see if they were necessary to remarkably increase its accuracy. The phases presented in Figure 2 were therefore combined as presented in Table 3.
At each phase combination, the parameters optimized in the previous phase combination were reoptimized, as their best values could change in the new phase combination. The optimization was performed finding the best parameter combination for all study sites (global optimization). In order to carry on such experiments, some adjustments were necessary to the algorithm code at the first phase combination. In the algorithm considering only phases 1 and 3, a daily time series was built by applying a linear interpolator between each valid observation. MTS were computed taking the maximum value in the N Days Backward (NDB). The best NDB was optimized, choosing between values in sequence from −30 to −5 with step 5.
(1) Experiment 3: comparison of pixel and parcel resolution accuracy. We calculated the mode of each parcel's CPiMFs obtained with the best globally optimized model. The accuracy of the resulting PaMF was compared to that of CPiMF.    (2) Experiment 4: generalization error estimation. We performed a spatially stratified k-fold cross validation, using the sites as stratification layer to decrease correlation between optimization and validation pixels. We iteratively optimized the parameters of the model on three sites out of four, and we measured the accuracy obtained at the fourth site. We averaged the four accuracies obtained on the "left-out" site and we compared it to the four accuracies obtained on the optimization dataset to estimate the generalization capabilities of the proposed model.

Masking processes
The number of available unmasked observations per pixel ( Figure 5) considerably varied in the four study sites, depending on the topographical and geographical location of the study site, on the number of cloudy days and on the length of the growing season. The topography -especially the altitude-affects cloud distribution and snow persistence and therefore also the spatial distribution of valid observations (i.e. cloud, shadows, and cirrus free). In addition, altitude is the main determinant for the length of the growing season and therefore for the total number of dates to be considered. The average interval between unmasked observations is 9.19 days, 8.33 days, 4.15 days, and 4.25 days, respectively at the Lusia, Predazzo, Viote, Vigolana site. Sites located in the west of the province (Vigolana and Viote) are revisited every two to three days, so their time series is denser than the time series of sites located in the east of the province (Predazzo and Lusia), surveyed by one orbit only. The percentage of unmasked observations (average of site pixels) is similar in all four sites, between 57% (Lusia site) and 61% (Viote site).

Parameters optimization and vegetation index choice
NDII was chosen as the VI of the final algorithm, as it performed better than all other VIs, resulting in a MAE of 0.07 (average of all sites; Figure 6). Also, GVMI and NDVI performed quite well, with a MAE of 0.09 and 0.12, respectively (average of all sites). In Figures 7, 8, the accuracies of NDII models with different DROP, NDW, RI across the four sites are reported. Viote and Vigolana sites generally showed a higher accuracy compared to Lusia and Predazzo and parameters optimization affected in different ways the results across sites. For NDW higher than 9 days there was a strong increase in MAE for the Lusia site, while MAE decreased in the Viote site. Predazzo site performed better with shorter RI, while Vigolana and Viote with longer RI. DROP did not strongly influence the accuracy on all sites except the Vigolana site, where there was an increase in MAE for DROP higher than 20%. The best global (for the four sites together) optimization for NDII was: DROP = 15%, NDW = 9 days, RI = 11 days. In all four sites the commission and omission error were quite balanced, resulting in a percentage of pixels with predicted mowing frequency higher than reference mowing frequency similar to the percentage of pixels with predicted mowing frequency lower than reference (Figure 9). In all four study sites, the overall accuracy was higher than 85% and the MAE was equal or lower than 0.16 ( Figure 6, Table 4). The MAE  obtained using locally optimized algorithms was slightly lower than MAE obtained using globally optimized algorithms in all four sites ( Figure 10).
In Figures 11 and 12, reference and predicted mowing frequency maps of the Predazzo and Viote sites are displayed. Predicted values were obtained using the best global optimization.

Testing of algorithm phases
In all the four sites, the algorithm which included all phases was by far the one that provided the lowest MAE (best results displayed in Figure 13).
In the optimization of the algorithm involving only phases 1 (VITS preparation) and 3 (drop and local minimum detection), a DROP of 75% from previous values and a NDB of ten days proved to be the best optimized parameters on average. The Viote and the Vigolana sites performed better with a higher DROP, with the lowest MAE (0.65 and 0.79 respectively) in correspondence to a DROP of 95%. The Lusia and the Predazzo sites performed better with a lower DROP, with the lowest MAE (0.25 and 0.39 respectively) in correspondence to a DROP of 60% and 65%. Adding the smoothing and resampling phases to the algorithm contributed to a substantial improvement in accuracy across all four sites, and the best results on average were found using a DROP of 15%, an NDW of 9 days and a RI of 12 days. The final algorithm which also includes the majority analysis (phase 4) gave the best results and the best global optimization parameters were DROP = 15%, NDW = 9, and RI = 11.

Comparison of pixel and parcel resolution accuracy
In the Viote and Vigolana sites, the aggregation of the parcel CpiMPs to PaMF through the mode rule did not impact the accuracy, while at the Lusia and Predazzo it determined a small increase in the MAE (Figure 14). On average at the four study sites we obtained a 90% overall accuracy and 0.10 MAE, and only 30 parcels out of 301 which were not correctly classified (Table 5).

Generalization error estimation
The average MAE obtained on the validation dataset (0.12) was almost double than the average MAE obtained on optimization dataset (0.07) but it is still very low, indicating that just approximately one pixel out of ten was wrongly classified (Figure 15). The third iteration (the one excluding the Viote site from the optimization dataset) gave the highest MAE on the optimization dataset and the lowest MAE on the validation dataset, whereas the second iteration (the one excluding the Predazzo site from the optimization dataset), on the opposite, gave the lowest MAE on the optimization dataset and the highest MAE on the validation dataset.

Discussion
Mapping mowing frequency over complex landscapes in mountain areas is crucial to inform conservation and management policies but is challenging as imagery with high spatial and temporal resolution is needed. In this study, we developed a new mowing detection algorithm based on S2 imagery in GEE, optimized and validated in four study sites. Results indicate that it can be successfully used, as the MAE of the complete model is 0.07, while the overall accuracy is 93% on average at the sites used for optimization and 0.12 and 89%, respectively, at sites not used for optimization.

Novel aspects
In addition to the use of S2 imagery for mowing detection, that has been exploited a few times so far, the major novel aspects of the present study are that very small and fragmented parcels were used as reference, that the algorithm works at pixel level and that the algorithm can be run using a provided ready to use code working on one planetary-scale cloud platform using free imagery.
Reference dataset consists of particularly small and fragmented hay meadows that are typical of mountain areas, whereas previous research work focused mainly on much more homogeneous landscapes, so their accuracy in complex landscapes was therefore not tested. In our study, 53% of unshrunken parcels Figure 13. MAE of algorithms with increasing complexity. NDII was used as vegetation index. For phase description see Table 3. The values indicate the highest accuracy obtained with global optimization. Pixel level. were smaller than 0.5 ha and 78% were smaller than 1 ha whereas in (Garioud et al. 2019) the parcel average size was 5.1 ha and in (Griffiths et al. 2020) all parcels were larger than 1 ha. Only Kolecka et al. (2018) included parcels with sizes comparable to the ones we included. The analysis concept in our study builds on the pixel level because in most of grassland systems management parcels, defined as grassland unfragmented parcels mowed on the same days, are not available a priori, as administrative boundaries often differ substantially from real field limits (Inglada et al. 2012). An algorithm working at pixel level can analyze a meadow system without prior information about management parcels and without losing the possibility to aggregate later the results at parcel level as we did at paragraph 3.4, which is very useful for various purposes such as subsidies granting and fertilization plan development. The accuracies obtained at parcel level were slightly lower than at the pixel level at the Lusia and Predazzo sites, where the size of the erroneously classified parcels was on average 35% and 50%, respectively, of a similar size to the correctly classified parcels. Also, in the other sites the wrongly predicted parcels are on average much smaller than the correctly predicted, 13% the size of correctly predicted parcels at the Viote site and 26% the size of correctly predicted parcels at the Vigolana site. This accuracy reduction was probably due to residual (after the initial edge pixels elimination by buffering procedure) edge effect of mixed pixels.
The development of our algorithm in GEE allowed us to access, process and display S2 data on only one platform and will allow algorithm's users to easily run the code on continuously updated imagery and in other regions, without needing to download and process the imagery. It is not possible to report exact computation times in GEE because they vary in each run, as the system handles resource allocation and parallelism. As an example, however, less than one minute is needed to compute and display the mowing frequency of hay-meadows in 20 km 2 , and 2 minutes for exporting the mowing frequency map in ".tiff" format.
Mowing frequency maps produced adopting the proposed algorithm can be a valuable and reliable tool to identify extensively managed meadows needing protection and conservation measures. They can be used also to remotely verify that a mowing event occurred, which is frequently required not only to obtain CAP subsidies, but also to characterize forage systems at a regional level, using mowing events as a proxy to estimate nitrogen removal and forage production (Griffiths et al. 2020).

Accuracy and generalization capability
The possibility to reliably apply the algorithm to other areas after appropriate testing is suggested by the results of the k-fold cross validation. The average MAE obtained on the sites excluded from parameters optimization process is 0.12, with an overall accuracy of 89% on average. Also, local and global optimization did not give significantly different results, indicating that the algorithm is very flexible and that globally optimized parameters perform well in various different situations.
Some previous studies using SAR data reported an overall accuracy in mowing detection of 86% (Taravat, Wagner, and Oppelt 2019) using artificial neural networks from a set of Sentinel-1 derived variables, but models were trained and tested on just ten intensively managed parcels. Grant et al. (2015) reached a detection rate of 74% and Mathilde De, Radoux, and Defourny (2021) correctly identified only 56% of grasslands. The highest overall accuracy reported using optical sensors is 85% and was obtained by Halabuk et al. (2015) as the best result of a cut-uncut classification in extensively managed grasslands. In a time series analysis approach Kolecka et al. (2018) using a drop-detection algorithm achieved an overall accuracy of 77% of correctly detected mowing. Estel et al. (2018), detecting local minima in a MODIS NDVI time-series, correctly identified 80% of mowing frequency. The combination of Sentinel-2 and Landsat-8 imagery resulted in denser time-series which were analyzed by Schwieder et al. (2021) using machine learning algorithms and leaded to a mean absolute percentage error between 35% (2020) and 40% (2018) whereas combining active and passive imagery Lobert et al. (2021) obtained a MAE of 0.369, 0.321, 0.420, 1.44 on grasslands with one to four mowing events per year, respectively. Our work, which benefits from S2 high temporal resolution, novel cloud masking and smoothing and resampling processes, reached a MAE of 0.12 and an overall accuracy of 89% on average on the sites excluded from parameters optimization process.

Parameters optimization and vegetation index choice
Building a mowing frequency reference dataset from optical imagery is not a straightforward task because of lack of temporal resolution caused by cloudy observations (Halabuk et al. 2015). In our experience the temporal resolution provided by Planet imagery was sufficient to detect mowing events, probably because farmers do not perform mowing in cloudy periods and wait a clear-sky window of at least 2 days to perform mowing. Furthermore, during summer clouds are much more common in the afternoon than in the morning (Whitcraft et al. 2015), when the Planet images are acquired.
NDVI is by far the most used index in previous studies about grasslands management and intensity and it describes the difference between reflectance in the red and near infrared regions (Reinermann, Asam, and Kuenzer 2020). In our study, however, NDII gave the best MAE, on average 0.05 points lower than NDVI. NDII is computed as the normalized difference between the red and the SWIR region, a wavelength that is sensitive to leaf water content. Observing NDVI and NDII profile we found that mowing events cause much more remarkable drops in NDII than in NDVI and that NDVI saturates before NDII as biomass increases. The canopy water content and canopy structure traits change strongly during the mowing event determining a wider range of NDII values. The wider range of values NDII can assume compared to VIs sensitive to chlorophyll content proved to result in a higher algorithm's accuracy. The increase in accuracy provided by the SWIR wavelength is higher than the decrease caused by the lower spatial resolution of the Sentinel SWIR band (20 m) compared to NIR band (10 m). GVMI, which is computed using the SWIR 2 band (2.2 µm), gave accuracies that are comparable to that obtained using NDII (average MAE = 0.09 vs 0.07), whereas EVI, MTCI and RENDVI provided lower accuracies although they have widely been used in remote sensing of grassland biophysical parameters (Sakowska, Juszczak, and Gianelle 2016;Reinermann, Asam, and Kuenzer 2020;Imran et al. 2020;Halabuk et al. 2015).
The sites surveyed by two orbits (Viote and Vigolana) gave much higher accuracies. In these sites the shorter revisiting time provides a denser time series that is less affected by missing (cloud masked) observations. Only at the most intensively managed site, the Predazzo site, the MAE increases using longer RIs, whereas in extensively managed sites like the Viote site the MAE decreases for longer RIs. The longer RI in extensively managed sites reduces the possibility of false detections caused by VI fluctuations, whereas in intensively managed grassland is not able to describe the quick development of grassland biomass and cover.

Testing of algorithm phases
The complete algorithm -which includes all the four phases-provided the highest accuracies and was therefore chosen. The smoothing and resampling phases proved to be crucial to diminish the effect of invalid low values caused by unmasked cloudy observations and by small fluctuations of index values. The simple drop detection algorithm (phases 1 and 3) gave very low accuracy for example at the Viote site where there were three unmasked cloudy observations that caused abrupt drops in VITS that were detected as mowing events. These unmasked cloudy observations were smoothed by the running median, and the Viote sites is the one with the highest accuracies using the complete algorithm.
Following the experience of Halabuk et al. (2015) who obtained lower accuracies classifying cut and uncut hay meadows by smoothing NDVI and EVI time-series using the Fourier adjustment, smoothing processes were not included in past models because of the risk of losing small fluctuations which might be linked with mowing (Bekkema and Eleveld 2018). Also Jin and Bing (2013), proposing a temporal smoothing algorithm, advise that smoothing may not be suitable for modeling anthropogenic activities where an abrupt drop of the NDVI value reflects the actual situation rather than contamination. The running median smoother used in our work, however, proved to fix single invalid low values caused by unmasked cloudy observations. S2 temporal resolution proved to be sufficient to provide proper values and therefore to describe biomass evolution under the considered management intensities. Majority analysis significantly improved the algorithm accuracy, decreasing the MAE from 0.13 to 0.7 on average. Isolated pixels fixed by majority analysis are mainly located at parcel edge and in areas possibly shadowed by surrounding woodlands. Small and isolated trees were found to be one cause of "salt and pepper effect" at the Viote site. On more productive grasslands, lodging could be a possible cause of patchy anomalous mowing frequency values. Lodging is not a rare phenomenon on productive grasslands and can significantly alter grasslands structure and physiology, and therefore their spectral signature.

Limitations and further improvement
The major limitations of the presented algorithm are the spatial resolution, the prior management type detection, the lack of temporal accuracy, the reliability in areas with very different phenology. The spatial resolution of S2 imagery limits the accuracy of the algorithm in very fragmented parcels and in long and narrow management parcels which frequently occur in mountain areas. The algorithm was tested only on hay-meadows, and the process does not include a prior management type detection (grazed, mowed, mixed) like the one presented by Dusseux, Corpetti, and Hubert-Moy (2013). Grazed parcels should therefore be avoided, as grazing events with large stock density could be interpreted as mowing events.
The algorithm is specifically designed to detect annual grassland mowing frequency and not to predict mowing dates. In fact, smoothing and resampling phases improved impressively the algorithm accuracy, but changed the temporal resolution so that resampled dates can not be used to define precise mowing dates.
The algorithm should be tested and probably adapted before use in areas with very different climate and phenology. In Mediterranean grasslands, for example, the growing season is limited by high temperatures and the sudden decrease of water content that may occurs at the start of summer may be wrongly interpreted as a mowing event by the algorithm. In more cloudy regions, on the other hand, the lower density of the time series could affect algorithm accuracy, since the algorithm has been tested only in sites where the average number of days between uncloudy observations ranges from 4.15 (Viote site) to 9.19 (Lusia site).
Further improvement and optimization of the algorithm could be the inclusion of a classification algorithm for detecting management parcel geometries, the type of grassland management and the automatic definition of the start and the end of growing season at a pixel-size resolution (Jönsson and Eklundh 2004). A pixel-level automatically defined growing season would avoid the necessity to manually define growing season based on available climatic data and would model a growing season more similar to real one in each grassland pixel. As the frequency of cloudy masked and invalid unmasked pixels proved to considerably affect the algorithm accuracy and algorithm performed less well when the mowing frequency was higher (Predazzo), its validation in other climatic regions would be important. In addition to these, orbit overlap giving better results suggests that accuracy could be improved by increasing the density of the time series either by adding optical sensors (Griffiths et al. 2020;Lobert et al. 2021;Stumpf et al. 2020) or by multimodal approaches (Garioud et al. 2019; D'Andrimont, Lemoine, and Van der Velde 2018).

Conclusions
This study assessed the potential of a new algorithm based on S2 imagery time series for detecting mowing events. Using reliable reference data obtained by Planet daily imagery and farmers interview, it was possible to test several vegetation indices and processing phases. Masking, smoothing and resampling phases and optimization of algorithm's parameter allowed to correctly identify the mowing frequency in 93% of the pixels, with a MAE of 0.07 on average, and 90% of parcels were correctly classified (Overall accuracy at parcel level) on sites used for optimization. NDII performed better than other indices probably because it assumes a wider range of values before and after mowing events.
The low MAE obtained on the sites excluded from parameters optimization process (MAE = 0.12, overall accuracy = 89%) suggest that the developed algorithm may be applicable on other grassland areas, and new studies are needed to confirm this. The code was developed in GEE, a platform that can access and process continuously updated images worldwide, so that agencies and practitioners can easily run the algorithm as only start and end of growing season, and hay-meadows parcel geometries are required as an input parameter. The resulting mowing frequency maps can inform grasslands conservation and management policies by identifying extensively managed grasslands.