Crop identification by massive processing of multiannual satellite imagery for EU Common Agricultural Policy subsidy control

ABSTRACT The early and automatic identification of the crops declared by farmers is essential for streamlining European Union Common Agricultural Policy (CAP) payment processes. Currently, field inspections are partial, expensive, and introduce considerable delays into the process. Time series of satellite images of cultivated plots can be used to train neural networks that model the declared crop. Once the patterns of a crop have been obtained, the correspondence between a declaration and the neural network model can be predicted systematically and used for monitoring the CAP. In this article, we propose a neural network learning model that uses as training examples the pixels of cultivated plots taken from satellite images over a period of time. We also propose using several years in the training so that the patterns generalise, rather than being tied to the climatic characteristics of a specific year. The article also describes the use of the model to learn the multi-year pattern of tobacco cultivation, with very good results.


Introduction
The European Union (EU) allocates 38% of its annual budget to the Common Agricultural Policy (CAP), taking into account that the multiannual financial framework proposed by the Commission for the period 2021-2027 stands at 1.13 trillion euros (European Commission, 2019). The CAP, the oldest policy in the EU, is based on a system of subsidies and agricultural programmes that aims to provide affordable, safe, high-quality food; guarantee an equitable standard of living for farmers; and conserve natural resources while respecting the environment. In the EU, 12 million farmers cultivate 48% of the land to feed a market of 500 million consumers, generating 44 million jobs throughout the food chain. Both the significant budgetary effort and the strategic importance of the agricultural sector make it necessary to control the aid, in order to guarantee its proper distribution. Traditionally, these controls have been based on field visits that cover only 5% of the declared plots. Instead, the EU proposes that, within a few years, the control of plantations be extended systematically to all farmers who receive aid, using data from the Copernicus Sentinel satellites (Devos et al., 2018). It is therefore very convenient to automate the download of all the appropriate Sentinel images, instead of retrieving them manually from the Copernicus hub.
Agricultural monitoring has been a crucial application of Earth observation since the first satellites were launched. Remote sensing offers a non-destructive means of providing recurrent information, from the local to the global scale, in a systematic way, thereby enabling the characterisation of spatiotemporal variability within a given area (Weiss et al., 2020). In this respect, satellite imagery has long been used to monitor crops. Traditionally, the Landsat constellation has been used both to determine land cover types and to keep track of agricultural production; at present, it is still used to determine crop area adjustment policy in terms of the magnitude and direction of agricultural land-use change (L. Yang et al., 2019). Nevertheless, the dense time series of the Sentinels offer a unique opportunity to systematically monitor crops on a weekly repeat cycle (from 5 to 12 days, depending on the data type and the region of the world) (Veloso et al., 2017). In this respect, the higher spectral resolution of Sentinel-2 in the red edge potentially gives it a greater capacity to distinguish different types of natural vegetation and crops than Landsat-8 (Sánchez-Espinosa & Schröder, 2019). The ever-increasing interest that the Copernicus programme is attracting in the agricultural field is introducing the use of other satellites, such as Sentinel-1 or Sentinel-3, for example in monitoring agricultural droughts with high consistency (Hu et al., 2019). However, the Sentinel-2 time series remains the mission most commonly used for agricultural monitoring.
There are many practical agricultural applications that have been made possible thanks to remote sensing, for example harvest and biomass crop estimation (Shanahan et al., 2001; Yang et al., 2000). One of the most interesting applications of precision agriculture is the possibility of detecting crops at large scale and low cost. To this end, numerous researchers (Conrad et al., 2010; Yang et al., 2011) have used multispectral remote sensing images on temporal and spatial scales. The differences in reflectance among crops, as well as the differences in their phenology, give each crop a more or less distinct spectral signature, although it is highly influenced by the climatic conditions during the measurements. Yang et al. (2011) found other limitations, such as the similarity of the spectral signatures of different crops, the natural plot-to-plot variability of the same crop, and changes in the pattern of individual crop phenology. Schmedtmann and Campagnolo (2015) showed that even for hard-to-identify crops (grasslands, fallow and uncultivated areas) their automatic recognition system, based on the spectral signature, achieved a high success rate of around 80%; this percentage increased to 100% for crops such as rice and corn. Some authors, such as Hao et al. (2016) and Godinho et al. (2017), work at the pixel level, a common methodology for detecting the spectral signatures of different crops. Other authors have used multi-temporal, multi-polarisation and multi-frequency synthetic aperture radar (SAR) imagery for crop identification and crop growth monitoring with great success (Chen et al., 2007; Lopez-Sanchez et al., 2011, 2012; McNairn et al., 2009; Stankiewicz, 2006). Taking into account the temporal evolution of satellite images, Szostak et al. (2018) used Sentinel-2 images to analyse abandoned agricultural parcels and forest succession, with good results. Xu et al. (2019) propose transferring seasonal/yearly training data from Landsat to achieve fine temporal resolution mapping with time and labour efficiency. In a recent review, Liu et al. (2019) presented many studies (Forkuor et al., 2014; McNairn et al., 2009; Skakun et al., 2016; Villa et al., 2015) indicating that the combined use of optical and SAR data can improve the accuracy of crop identification.
Crop identification from multitemporal and multispectral satellite imagery is usually done using Machine Learning (ML), and this is, specifically, the EU Joint Research Centre proposal (Devos et al., 2018). Many studies (Kamilaris & Prenafeta-Boldú, 2018) apply different ML algorithms to crop phenology identification, with cases of very significant success. A popular branch of ML is Deep Learning (DL), which adds more complexity to feature learning: it aims at learning feature hierarchies in which features from higher levels of the hierarchy are composed of lower-level features. To apply DL successfully, a large amount of data is needed to generate the characteristic pattern of the classes to be learned. In the agricultural domain, if it is possible to collect satellite imagery data on a large set of plots of a crop over several years, the phenological pattern of the crop under study can be captured with great precision.
Recently, several studies have used Convolutional Neural Networks (CNN), a DL algorithm, to identify different land covers. In this respect, Syrris et al. (2019) experimented with four CNN variants to detect land cover in general (with one general class concerning crops): a standard CNN, a fully convolutional network (FCN), a U-Net and a SegNet, with four groups of 11 Sentinel-2 images (44 images altogether). Other examples are the study by Nevavuori et al. (2019), which used CNNs to detect different types of crops, but with UAV images, and the work of Phartiyal and Singh (2018), which used 5 SAR and optical images. The study by Ji et al. (2018), which uses a 3D CNN for crop classification with 9 Gaofen images to build the CNN and 4 images for testing, should also be noted.
These previous studies have achieved very good results by applying deep learning to discover crop patterns from satellite images. Some use images from short time periods, analysis by zones, or calculated indices (such as NDVI, CI or NDWI). However, those approaches focus on the behaviour of certain crops in a specific territory, using a small number of images over a limited period of time, rather than proposing a general methodology.
The contribution of this paper is a method for crop identification that uses pixel-level satellite information as training examples for neural networks. The pixels are treated as elements independent of the plot, using all the satellite bands and the information from all the images throughout the year. A massive use of data is thus achieved, which improves the learning process, yielding patterns that are less dependent on the meteorological peculiarities of a specific year. In our case, multi-category crop classification is not necessary, because the local administration only needs to verify whether a plot corresponds to the declared crop. Section 2 proposes a framework that uses the pixel of a Sentinel image as the smallest unit for identifying a crop plot. In the Sentinel satellite image data, a pixel has 12 spectral bands of raw data, which makes it possible to gather the complete evolution of the pixel throughout the year (over 70 images per year) as examples for the learning process of the neural network. This allows us to use all the available information massively, processing it at the finest granularity and studying its behaviour throughout the whole evolutionary cycle. We have applied this method to the identification of the tobacco crop in the northern region of Extremadura, given its great economic importance for the region and the strict controls applied, gathering data for three years (2017, 2018 and 2019), as described in Section 3. In Section 4, we show the learning results of the neural network from experiments with a single year, pairs of years, and all three years together. Section 5 analyses and discusses these results, and Section 6 presents the conclusions and possible applications of the proposal.

Method
This study presents a method for generating crop patterns from the Sentinel satellite images, by combining the images with the digitized polygons of different crop parcels declared in several years. We apply artificial neural networks (ANN) that use, as training examples, the multispectral data of a pixel in all of the Sentinel images in a year.
Using the pixel as the minimum unit of prediction offers several benefits: it allows the exact surface covered by a crop to be detected, and it offers more flexibility in controlling the quality and quantity of the samples. For these reasons, this approach was chosen instead of using the full polygon as a unit, accepting that it may increase the training time.
The whole process is divided into three main phases, as can be seen in Figure 1: a first step, in which all the Sentinel data from several years are downloaded and clipped images are generated for every polygon, as explained in Section 2.2; a second step, which prepares the input pixels for the ANN, including filtering to discard contaminated data, as explained in Section 2.3; and a third step, described in Section 2.4, comprising the operations needed to train an ANN model on the previously computed information.

Sentinel-2 data
The Sentinel-2 mission is a constellation of two satellites (A and B) that has provided revisits every ten days since 2015 (Sentinel-2A) and 2017 (Sentinel-2B); since 2018, they provide a global revisit of five days. They share the same orbit, phased at 180º. The orbit of the Sentinel-2 mission is sun-synchronous, so that the angle of solar incidence is always maintained. The images have different spatial resolutions depending on the spectral band: 10 m for B2 (490 nm), B3 (560 nm), B4 (665 nm) and B8 (842 nm); 20 m for B5 (705 nm), B6 (740 nm), B7 (783 nm), B8a (865 nm), B11 (1610 nm) and B12 (2190 nm); and 60 m for B1 (443 nm), B9 (940 nm) and B10 (1375 nm). The Level-2A products used in this study are calibrated images that provide Bottom-Of-Atmosphere (BOA) reflectances. They are generated from Sentinel-2 Level-1C Top-Of-Atmosphere (TOA) products using the Sen2Cor algorithm, which performs atmospheric corrections to remove the effects of the atmosphere and deliver BOA reflectances.

Massive processing of satellite imagery
The source information with which this process works is the geographic information about all the polygons that declare agricultural plantations in an area. This information might have different sources and types, and the format of the source files may not be the same either. The first, and one of the most important steps, is to unify this information.
All these polygons have to be processed and converted into a standard format. One appropriate format is GeoJSON, because many libraries and extensions work with it, and it can be easily integrated into map systems. Once the GeoJSON files are generated, the downloading process starts. To make it automatic and massive, a Python library is used to connect to the Copernicus Open Access Hub in an easy way. First, the bounding box of each polygon is calculated, to improve performance and avoid possible errors with the coordinates. Given the bounding box coordinates and a date range (in our case, a single date is requested in each step), the API returns a list of the Sentinel images that contain the specified coordinates.
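The query step above can be sketched in Python with the sentinelsat library, a common client for the Copernicus Open Access Hub. This is an illustrative sketch, not the authors' code: the helper names, the hub URL and the single-date query window are our assumptions.

```python
from typing import List, Tuple

def bounding_box(polygon: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) for a polygon's (lon, lat) vertices."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return (min(lons), min(lats), max(lons), max(lats))

def query_sentinel2_l2a(bbox, date, user, password):
    """Query the hub for Sentinel-2 L2A products covering bbox on a single date."""
    # Imported lazily so the pure helper above works without sentinelsat installed.
    from sentinelsat import SentinelAPI
    api = SentinelAPI(user, password, "https://apihub.copernicus.eu/apihub")
    footprint = (f"POLYGON(({bbox[0]} {bbox[1]}, {bbox[2]} {bbox[1]}, "
                 f"{bbox[2]} {bbox[3]}, {bbox[0]} {bbox[3]}, {bbox[0]} {bbox[1]}))")
    return api.query(footprint, date=(date, date), platformname="Sentinel-2",
                     producttype="S2MSI2A")
```

Filtering for the `S2MSI2A` product type implements the preference for Level-2A images described below, avoiding on-the-fly atmospheric correction.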
Given a list of candidate images, each image must be checked to see whether it has already been downloaded and stored locally. Storing the images locally saves time, since the number of studied crops may increase in the near future and a specific Sentinel image may be needed again. If no image in the list has been downloaded yet, the most appropriate one must be chosen, so some conditions are set, filtering only L2A images (correcting Level-1C images with Sen2Cor would delay the process). The chosen image is then downloaded and stored. To reduce future processing time, in line with the massive-processing goal, the Sentinel image is clipped and a single image is stored for every processed polygon; this may also enable future applications in itself.
For clipping Sentinel images, it is usual to use desktop programs such as SNAP (Sentinel Application Platform). This kind of software was discarded because our approach aims at the massive and automatic treatment of data. However, SNAP offers a Python interface named Snappy, through which all the SNAP operations can be accessed from a Python script. This is how the Sentinel image is clipped, returning the minimum rectangular image that contains a polygon. All the clipped images generated are also stored. In the clipping process, resampling unifies the band resolutions to 10 × 10 m.
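As a hedged sketch of this clipping step, SNAP's Subset operator can be invoked through Snappy roughly as follows; the helper names and parameter choices are our assumptions, and the GPF call needs a working SNAP installation.

```python
def to_wkt(bbox):
    """Build the WKT rectangle string that SNAP's geoRegion parameter expects."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return (f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
            f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))")

def clip_product(product, wkt_polygon):
    """Clip an opened Sentinel product to a WKT region with SNAP's Subset operator."""
    # snappy requires a local SNAP installation, so it is imported lazily here.
    import snappy
    HashMap = snappy.jpy.get_type("java.util.HashMap")
    params = HashMap()
    params.put("geoRegion", wkt_polygon)
    params.put("copyMetadata", True)
    return snappy.GPF.createProduct("Subset", params, product)
```

The returned product is the minimum rectangle containing the polygon, which is then written to disk for each declared plot.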
Based on the previous clipped images, it must be checked which pixels correspond to the polygon. This selection is made by generating the real coordinates of the four corners of every pixel and checking their position in relation to the polygon coordinates. After this verification, the real coordinates of every relevant pixel are stored in the database. By discarding the pixels that touch the polygon but are not entirely inside it, we prevent noise in our results from different possible causes, such as trees, roads, etc.
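The corner check can be implemented without any GIS library. The following pure-Python sketch (our illustration, not the authors' code) keeps a pixel only when all four of its corners lie inside the polygon, using the standard ray-casting test:

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: is (x, y) inside the polygon given as a list of vertices?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray going right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def pixel_fully_inside(corner_coords, poly):
    """A pixel is kept only if all four of its corners fall inside the polygon."""
    return all(point_in_polygon(x, y, poly) for x, y in corner_coords)
```

Pixels straddling the boundary fail the test and are discarded, which is exactly the noise-prevention behaviour described above.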

Treatment and selection of training pixels for the ANN
The presence of clouds is a very persistent problem in remote sensing; it is therefore important to process the information so as to eliminate data that would distort the values in the learning process. In this method, various approaches are applied, such as those described by Braaten et al. (2015) and Hollstein et al. (2016). We have implemented these algorithms at the pixel level, so the pixels affected by cloud cover can be detected and discarded. However, this may cause further problems when unifying the ANN input at the pixel level, because there might be dates without any valid information due to clouds. To solve this, we propose calculating mean values over short periods of time. Considering the periodicity of Sentinel images, the best approach found was to use fortnights as time units.
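A minimal numpy sketch of this fortnightly averaging follows; the binning of acquisition dates into fortnights is simplified and the function name is ours.

```python
import numpy as np

def fortnight_means(pixel_series, valid_mask, n_fortnights=12):
    """
    pixel_series: (n_dates, n_bands) reflectances for one pixel over the season.
    valid_mask:   (n_dates,) booleans, False where the pixel is cloud-contaminated.
    Returns an (n_fortnights, n_bands) matrix of per-fortnight means; a fortnight
    with no valid observation yields NaN, so that pixel can then be discarded.
    """
    n_dates, _ = pixel_series.shape
    # Evenly assign the dated observations to consecutive fortnight bins.
    bins = np.minimum(np.arange(n_dates) * n_fortnights // n_dates, n_fortnights - 1)
    out = np.full((n_fortnights, pixel_series.shape[1]), np.nan)
    for f in range(n_fortnights):
        sel = (bins == f) & valid_mask
        if sel.any():
            out[f] = pixel_series[sel].mean(axis=0)
    return out
```

With twelve fortnights and the twelve Sentinel-2 bands, flattening this matrix yields the 144-value vector that serves as one training entry.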
Only the 12 raw source bands are used in the training, avoiding any calculated index such as NDVI. It is important to emphasise that every single entry is the information of a single pixel over time.
Making the entry at the pixel level, instead of using complete features (polygons), helps avoid invalid information, by detecting clouds and noise and discarding those anomalous pixels.

ANN architecture
With the data above, the next step is to choose a machine learning technique to identify the crop patterns. Our method does not prescribe a specific one; rather, it establishes a systematic processing of multitemporal satellite images for large-scale crop classification, compatible with any machine learning technique that matches this approach. In the experiment in Section 3, we used well-known dense neural networks (DNNs), which suit the proposal and yielded sufficiently good results. Other alternatives used in similar projects, for example Random Forest (Chunhui et al., 2017) or more complex CNNs such as VGG (Abdul et al., 2019), may also be appropriate, but the good results of the DNNs allowed us to test the method and carry out the experiment.
Once the technique is selected, the next step is to define the internal architecture. In our experiment, we used three hidden layers, as this is one of the most usual configurations. The final architecture is thus composed of an input layer, three hidden layers and one output layer. It should be noted that the main goal of the proposal is to discern whether a polygon corresponds to one crop or not, rather than categorising different crops; for this reason, the output layer has only one final node. The activation function for all the hidden layers is the Rectified Linear Unit, and the output layer uses SoftMax as the activation function.
Some techniques were applied to improve the performance of the DNN: reducing the learning rate when a monitored metric (the loss value) has stopped improving, and EarlyStopping (Zhang & Yu, 2005), which stops training when a monitored quantity has stopped improving.
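The architecture and training refinements above can be sketched with Keras. This is a hedged sketch: the hidden-layer sizes and patience values are our illustrative assumptions (the paper does not state them), and we use a sigmoid for the single output node, the practical single-node counterpart of a softmax.

```python
def input_dim(n_fortnights=12, n_bands=12):
    """Input layer size: one node per (fortnight, band) pair."""
    return n_fortnights * n_bands

def build_model(n_fortnights=12, n_bands=12):
    """Dense network: 144 inputs, three hidden ReLU layers, one output node."""
    # Imported lazily; requires TensorFlow/Keras. Hidden sizes are illustrative.
    from tensorflow import keras
    model = keras.Sequential([
        keras.layers.Input(shape=(input_dim(n_fortnights, n_bands),)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        # A softmax over one node is constant, so a sigmoid plays its role
        # for the single "is it the declared crop?" output.
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

def training_callbacks(patience_lr=3, patience_stop=10):
    """ReduceLROnPlateau and EarlyStopping, both monitoring the validation loss."""
    from tensorflow import keras
    return [
        keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=patience_lr),
        keras.callbacks.EarlyStopping(monitor="val_loss", patience=patience_stop,
                                      restore_best_weights=True),
    ]
```

The callbacks list would be passed to `model.fit(..., callbacks=training_callbacks())` so that the learning-rate reduction and early stopping described above take effect during training.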

ANN training
The common approach to neural network training is to split all the data into three large groups: a training set, handled by the ANN to train the model; a validation set, necessary to evaluate the hyper-parameters from the ANN; and a test set, to calculate different metrics, measure the prediction accuracy and evaluate the final model.
To preserve randomness while ensuring that the experiment can be replicated, the data are randomly split using a fixed seed. In this approach, the distribution of the three sets is 70% for the training set, 15% for the validation set and 15% for the test set. Distributing the data randomly also ensures the geographical distribution of the data across the three groups.
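A minimal sketch of this reproducible 70/15/15 split (the function name and seed value are illustrative assumptions):

```python
import numpy as np

def split_indices(n, seed=42, train=0.70, val=0.15):
    """Shuffle indices 0..n-1 with a fixed seed and cut into train/val/test."""
    rng = np.random.default_rng(seed)   # fixed seed makes the split replicable
    idx = rng.permutation(n)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Because the seed is fixed, re-running the experiment produces exactly the same partition, while the shuffle itself spreads the pixels geographically across the three sets.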
In any neural network training, it is also very important to have negative examples. In this case, our method focuses on detecting a single crop against several others, which are considered negative examples, as well as arable lands and pastures. It is important that several of the negative examples come from crops similar to the one to be learned. The quantity of negative examples should be balanced against the positive ones, and also balanced among themselves across the different crop types. After the selection, all the positive and negative examples must be shuffled before training, to prevent the ANN from overfitting.
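The balancing of negative examples can be sketched as follows; this is a simplified illustration under our own naming, where each pool holds the candidate pixels of one negative crop.

```python
import numpy as np

def balanced_negatives(neg_pools, n_positive, seed=42):
    """
    Draw an equal share of negative pixels from each crop pool so that the
    total roughly matches the number of positive (target-crop) examples.
    neg_pools: list of 1-D arrays, one per negative crop (e.g. corn, rice, tomato).
    """
    rng = np.random.default_rng(seed)
    share = n_positive // len(neg_pools)
    picks = [pool[rng.choice(len(pool), size=min(share, len(pool)), replace=False)]
             for pool in neg_pools]
    return np.concatenate(picks)
```

Before training, the positives and these negatives would be concatenated and shuffled together, as described above.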

Study area
This experiment focuses on Extremadura (Figure 2), the fifth largest Spanish region, with 41,633 km². It is located in the south-west of Spain and is one of the historical transit areas, as part of the Vía de la Plata [Silver Route], and a necessary nexus between two large capitals, Madrid and Lisbon. Its territory is divided between two large hydrographic basins, those of the Tagus and Guadiana rivers, which make a great percentage of its land very productive. The climate is mainly Mediterranean, except in some parts of the north and west. However, rainfall is scarce in almost the entire region, and the temperatures follow a very pronounced pattern, so the water balance is clearly negative. Nonetheless, agriculture remains one of the main economic drivers of the region. In 2018, the gross domestic product (GDP) of agricultural production reached 1,506.8 million euros. The agricultural sector in this region has a weight (7.8%) that is three times higher than the national average (2.6%), showing the importance that the primary sector continues to have in Extremadura's production. The crops covering the largest area are cereals (280,257 ha), olive trees (261,349 ha), and vineyards and fruit trees (195,930 ha), although the economic importance of tomatoes and tobacco should also be highlighted (Coleto Martínez et al., 2019).

Study data
Although crop geographical demarcation has been registered by the Extremadura Regional Government's CAP agency for several years, this study only considered the years 2017, 2018 and 2019. This is because the precision of the information increases every year, due to the more sophisticated techniques implemented to delineate the spatial graphic declaration. Nevertheless, in the first years more inconsistencies may appear that must be solved. For every year, there is information for about a million and a half polygons, including crops, grasses, fallow, trees, etc. The study focuses on the detection of tobacco, using other crops with similar phenological patterns (corn, rice and tomato) as negative examples, but the methodology could obviously be applied to other crop types.
Another reason for using different years is to prevent the prediction model from learning the weather features of a specific year. Considering that the project goal is to predict the corresponding crop in a non-trained year, it is important to obtain a generalised crop pattern.
For the three years of the study there are 13,957 polygons of tobacco for positive examples, and 56,422 polygons of corn, 28,128 of rice and 25,241 of tomato for negative examples. These crops were selected because of their great similarity, as they are summer crops, in order to get a better verification method.
For this experiment, we can use a large number of training examples because we assume that almost all the farmers' declarations match the actual crop. According to data from the Junta de Extremadura paying agency, in the on-the-spot controls for the years 2015-2019 for these four crops, only 1.41% did not fully or partially match what was declared.
The method for checking that the collected information was correct was to calculate, for each crop, the average value of all (valid) pixels of all polygons during a year. The validity of a pixel is given by the absence of clouds, whose detection was described in Section 2.3. As explained at the beginning of Section 2.2, summer crops were selected, so the average values for each band should show a significant variation between June and September, as seen in Figure 3.
The experiment focuses on the period between mid-April and mid-October, which covers the so-called summer crops, resulting in twelve fortnightly periods which, multiplied by the twelve bands of information, form a 12 × 12 matrix as the entry for the DNN; the input layer is therefore composed of 144 input nodes.

Results
We have validated our model with experiments on a large number of plots of a given crop over several years, provided by the declarations held by the Regional Government of Extremadura's CAP agency. The aim of the experiment was to show that training and validating with several years' input yields good accuracy, as expected. The experiments focused on tobacco, with a large number of plots (13,957) in 2017, 2018 and 2019, generating models with the different year combinations. The main motivation is that the weather may significantly affect the phenological process of the crop: a drought may delay the process, and heavy rains could bring it forward. The 13,957 plots of tobacco over the three years of the study provide 545,682 positive examples for the DNN, as described in the next section.

Data material
As explained in Section 2.2, a positive example is the data of one pixel across all the images of a year, and only the valid pixels are selected as entries to the DNN. The number of pixels for each crop depends mostly on the size of the polygons, as well as on the weather conditions. For example, tobacco polygons are more affected by clouds as a whole, because most of them are cultivated in the same region, as seen in Figure 4, so it is more likely that there are days with no information for almost any polygon. Despite calculating the values by fortnights to mitigate this issue, it may still affect the number of entries. Table 1 presents the number of valid pixels for each crop for the three studied years.

Dataset partitioning
Having distributed the pixels into the three sets (training, validation and test), it is important to separate the training set into positive (+) and negative (-) examples. As a reminder, tobacco is the crop chosen for the current experiment, so all tobacco training pixels are considered positive examples. Since the other three crops provide the negative examples, the number of possible pixels is split into three, obtaining the number of negative pixels for each crop, as can be checked in Table 2.

By-pixel crop pattern detection using multiple years
This section presents the data obtained after the learning process of the DNN for the three years of the study. It is important to emphasise that all the tests were done with the same test set, as explained in Section 4.2. This test set was generated by considering all the tobacco test data (116,931 pixels) and a balanced distribution of the other three crops used as negative examples, resulting in a total test set of 229,672 entries. Figure 5 shows the confusion matrices for different permutations of training years, comparing the expected results with the predicted ones; the matrices are quite similar.

Results per year
Although the results obtained by training with a single year are quite good, Figure 6 shows that combining different years in the training phase yields even better results, due to the presence of data from different years in the test set.

Results for three years together
After training some models with different year combinations, the target of the experiment was to achieve a final model by training with the three years studied. The results of this approach are presented in Figure 7. We can also check the increase in accuracy between the different combinations in Figure 8.

Discussion
The previous section described the performance of several ANN models in identifying tobacco crops against three other crops with similar characteristics (corn, rice and tomato), showing the results generated by the models when grouping years. As mentioned, several studies use ANNs for the recognition of crop patterns (Ji et al., 2018; Kamilaris & Prenafeta-Boldú, 2018; Nevavuori et al., 2019; Syrris et al., 2019) and employ several satellite images of an area, studying indices such as the NDVI to learn the phenological pattern of a crop. In our study, we have chosen to use all the available information from the satellite sensors (the 12 bands) without calculating any index that combines these bands; that is, we have not performed any manual pre-processing of data or selection of features, as proposed by deep learning techniques. The Regional Government of Extremadura has provided us with a sufficient set of data with the graphic declaration of the plots. Using the complete plot as the training unit would present two problems: (1) there would be examples that include few pixels and others that cover several hectares; and (2) plots that partially contain clouds or out-of-range data, especially large ones, would contribute average band values computed over all their pixels, distorting the sample. In our case, taking as input the information of all the bands of one pixel over a year allows, on the one hand, a considerably greater number of positive examples. In addition, those pixels that have invalid data over a considerable period of time (in our case a month), mainly due to the existence of clouds, can be removed from the input sample.
On the other hand, the main contribution of our model is the use of as many years as possible to identify the pattern of a crop. If the ANN learns from a single year's crops, it is more likely to identify a pattern tied to the weather conditions of that year; if it is given several years of learning, it will have a greater ability to predict crops in coming years.
As the experiment in the previous section shows, we generated several ANN models using individual years, pairs of years, and the three years together. As expected, for a test set that includes data from the three years, a model trained by combining data from several years achieves higher accuracy.
As Figure 8 shows, by combining the pixels of two years' plots, we obtain an accuracy of 0.9575 for 2017-2018, 0.9647 for 2017-2019, and 0.9749 for 2018-2019. Although these results are already very significant, given the large number of input elements, the accuracy increases further when the information from the three years is used together.

Conclusion
In this study, we have developed a method for identifying crop patterns with ANNs from temporal sequences of satellite images. The approach is to establish, as the learning unit, the information of one pixel of a crop plot throughout the year (from around 83 images per year with 12 spectral bands of data). This allows a huge amount of valid information to be provided to the ANN for learning. Given this massive use of learning examples, it is possible to reduce noisy data, which may come from cloudy days or plot borders. This new approach therefore focuses on using as many learning examples as possible, not only by treating the pixel as the learning unit but also by using the greatest number of years. Thus, the learning patterns of the ANN models are generalised, eliminating dependence on the size of the plot or the weather characteristics of a specific year, with the goal of improving the prediction of non-trained year data, as explained in Section 3.2.
We have applied our method to the identification of tobacco plots against three other crops (corn, tomato and rice) that have a similar phenological pattern, in order to test the learning process. The findings, described in Section 4, present a very significant level of success and show that the results improve as more years are considered. In addition, crop identification is also performed per pixel; that is, for each pixel within the plot being analysed, the level of agreement with the learned pattern is shown. In this way, it is even possible to identify which part of the plot coincides with the declared crop. This is very important because it makes it possible to detect, for example, a mismatch in the boundaries of the plot, or areas where the crop has grown poorly compared to the declared crop.
The learning and evaluation processes have been carried out using, as training units, the data of one whole year. However, the learning and prediction periods can also be limited to specific periods of interest (for example, tillage, planting or harvesting). This may be useful for detecting early deviations from the declared crop pattern if satellite images are collected in real time.
For the method to be used, taking into account millions of training pixels with year-round information and data for several years, significant storage capacity is needed, as well as considerable time to obtain and process the necessary information. This is the main disadvantage of the proposed method, since it can take months to collect the training examples needed to build the ANN model. On the other hand, once the information is stored and processed, generating different ANN models is fast and flexible. This characteristic will allow the early identification of the coincidence of a crop with the learned patterns. Thus, for future CAP campaigns, the Regional Government of Extremadura will use this model to verify systematically and generally the coincidence of the crops declared by farmers, saving the costs of field observations and speeding up the payment processes.