Paddy crop insurance using satellite-based composite index of crop performance

Abstract The well-known area-yield crop insurance contract, which guarantees a certain percentage of normal yield over an insured area, is losing its effectiveness because of poor-quality yield data. This paper introduces a “satellite-derived crop health index” as an alternative to yield data in such an insurance model. The new approach was implemented in the 2020 crop season, covering 3.5 million ha of paddy crop over 3200 Insurance Units in the West Bengal state of India. Sentinel satellite data, gridded weather data, and mobile-app-based field data were analyzed to generate a paddy crop map and crop health indicators, namely NDVI, LSWI, Backscatter and FAPAR, for the current (2020) and past years (2016-2019). Using the metrics derived from these indices and the entropy technique, a composite index of crop performance called Crop Health Factor (CHF), ranging from 0 to 1, was generated. Deviations of CHF and yield between the years showed good correlation. The CHF data successfully replaced the yield data for indemnity and pay-out assessments in 2020, as notified by the Government in advance. Thus, this project provides an entry point for developing remote-sensing-based transformative crop insurance solutions to enhance risk transfer in agriculture, which is perhaps the most plausible way forward.


Introduction
Globally, agriculture is exposed to multiple hazards leading to frequent crop losses. As a result, crop insurance has become an indispensable risk management tool in the agriculture sector. Agricultural risk-sharing through crop insurance has existed in many countries, in many forms, and for many decades (Skees et al. 2005). Still, the need for developing innovative crop insurance products that are actuarially sound is well recognized by both developed and developing nations (Quirion 2013, Department of Agriculture and Cooperation 2014). each crop in each IU. The limited number of measurements and their proneness to subjectivity have become major constraints in generating reliable yield estimates. As a result, the estimated yield of an IU tends to deviate from the actual yield, leading to data disputes and delays in claims settlements.
Thus, the biggest challenge in area-yield crop insurance in India continues to be the generation of accurate crop yield data for the current and past years in the insurance units (Department of Agriculture and Cooperation 2014, Murthy et al. 2018). A review by Smith and Watts (2009) indicates that index-based insurance schemes in developing countries face considerable problems of basis risk and non-availability of reliable historical data, making the long-run scalability and sustainability of agricultural insurance a big challenge.
DAC&FW has taken up many initiatives to improve the crop yield estimation procedure since the launch of PMFBY in 2016 (www.pmfby.gov.in). Technology development agencies of both Government and private sectors are currently developing new yield estimation methods or finding a substitute for yield data using various datasets, models, and analysis techniques. Modeling crop yields is complicated because many factors, such as weather, soil, crop variety, and crop management, have to be parameterized. Such estimations at local scales like the IU are even more challenging than regional-scale estimations because capturing variability at local scales requires good-quality datasets. Input data at diverse scales, lack of reliable training data, and lack of control over model error are significant limitations for developing scalable crop yield estimation models for insurance purposes (Manivasagam and Rozenstein 2020, Klompenburg et al. 2020). Therefore, more research is needed to customize crop yield models to suit the requirements of area-yield crop insurance contracts.
Thus, improving crop yield estimates to suit the requirements of crop insurance, either through (a) increasing/optimising manual measurements or (b) adopting modelled yield estimates, is still associated with various challenges in India and elsewhere. As a way out, index-based insurance schemes that link payouts to crop performance proxies rather than measured losses have been proposed to improve crop risk management (Hess and Syroka 2005, Hazell et al. 2010, Vroege et al. 2019). However, there is as yet no operational index of crop performance that suits the needs of area-yield index insurance contracts. The current work addresses this gap in the crop insurance space. The objective of this work is to develop an alternative measure of the paddy crop's performance, called Crop Health Factor (CHF), using mainly remote sensing data, and to use this measure in place of yield data in the existing area-yield insurance contract. All other terms and conditions of the existing area-yield insurance contract remain the same. The scientific rationale behind the proposed new measure is discussed first. The other elements of the work, namely methodology, results, merits and challenges of the new method, and conclusions, are presented in subsequent sections.

Rationale behind the proposed new measure of crop loss
CHF is a composite index of crop performance incorporating multiple physical and biophysical parameters related to crop health. It is a quantitative measure of crop health and overall crop performance. The research findings that establish the rationale for adopting CHF are: (a) an objectively measured yield-proxy index is a better basis for designing crop insurance contracts than subjectivity-prone manual yield estimates, (b) satellite and weather datasets currently permit a more objective assessment of crop health at moderate spatial and temporal scales, and (c) composite indicators are effective in reducing complex processes to simple, easily understood comparisons. Index-based insurance schemes are superior to conventional indemnity-based schemes using measured losses, and hence these schemes are widely recognized for their successful implementation (Miranda and Farrin 2012, Ibarra and Skees 2007, Leblois et al. 2014). Such schemes will become more effective if an objectively measured index guides the crop loss assessment.

Remote sensing data in crop insurance
Many index insurance contracts based on yield or weather have effectively used digital technologies such as mobile apps and satellites over the last 10 years to enhance outreach, reduce operational costs and premiums, and expedite claims settlement in Africa (Raithana and Priebe 2020). Vroege et al. (2019) reviewed the index-insurance schemes over grasslands in Europe and North America and concluded that satellite-based indices would help in designing more efficient insurance schemes in agriculture.
Recent developments in remote sensing technology and weather instrumentation have been enormous, permitting close monitoring of crops with various bio-physical indices. Both optical and microwave data of moderate resolutions of 10-20 m are available once in 5-12 days. Satellite indices are extremely useful to detect vegetation/ crop health status and its anomalies from normal caused by various risks such as droughts, floods, and pests. Increased availability of satellite and weather indices provides enormous opportunities for near real-time monitoring of crops.
The Copernicus program of the European Space Agency started a new era of remote sensing by providing an unprecedented amount of free data in multiple spectral channels, comprising optical and radar data. With this dense data, close monitoring of crops is possible, enhancing the ability to capture multiple crop risks. Continuity of Sentinel data is guaranteed up to 2030 (https://www.esa.int/Applications/Observing_the_Earth/). Veloso et al. (2017) showed that frequently available Sentinel data, both optical and microwave, are helpful in capturing the phenological stages of various crops, viz., wheat, paddy, soybean, and maize. Roumiguié et al. (2015) developed a Forage Production Index using the fractional green vegetation cover integral from medium-resolution data for index-based insurance over grasslands. Bokusheva et al. (2016) developed index-insurance contracts for wheat crop using the satellite-derived vegetation condition index (VCI) and temperature condition index (TCI), covering Northern Kazakhstan. A copula approach was adopted to model wheat yield and satellite indices, and the results indicated substantial risk reduction in these new contracts. The study also suggests that satellite data of higher spatial resolutions would enhance the effectiveness of insurance contracts. Some of the operational index-based insurance products for grasslands using NDVI/rainfall in Spain, Mexico, the U.S.A., and Canada are summarized in Roumiguié et al. (2017). Ye et al. (2017) developed a snow index insurance for livestock in Mongolia, using the percentage height of grass covered by snow, the stress imposed by snow, and the duration of stress as the trigger for insurance payments, replacing the existing livestock Commercial Mortality Index. This new index was found to be superior to the existing product. Möllmann et al. (2019) demonstrated that crop insurance contracts using remote sensing-based vegetation health indices performed better than those using weather indices like temperature and precipitation. The study concludes that satellite data of higher spatial resolutions covering critical stages of crops would further improve the effectiveness of the insurance. Kölle et al. (2020) reported that the satellite indices VCI and TCI outperformed the meteorological indices in crop risk management through index-based insurance, covering non-irrigated olive trees in Spain.

Composite indexing method
Composite indicators have gained popularity in recent years for agricultural assessments because of their multidimensionality, flexibility, transparency, and simplicity. These indices are now increasingly used in environment, economy, society, and technology development applications (Saisana and Tarantola 2002, OECD 2008). The greatest strength of these indicators lies in their ability to summarize complex, multidimensional processes and provide a big picture that is easy to interpret and act on. The advantages and disadvantages of using composite indicators have been debated extensively. The composite indicators approach was adopted for drought vulnerability assessment and climate change impact assessments (Wilhelmi and Donald 2002, Tsheko 2004, and Murthy et al. 2015), for identification of desertification hotspots (Singh and Ajai 2019), for agricultural water resources management (Liu et al. 2019) and for agriculture sustainability assessment (Talukder et al. 2017). The Aggregate Drought Index by Keyantash and Dracup (2004), the Joint Drought Index by Kao and Govindaraju (2010), the Combined Drought Indicator by Sepulcre-Cranto Horison and Sigleton (2012), and the Multivariate Standardised Drought Index by Hao and AghaKouchak (2014) followed an integrated approach using multiple sub-indicators. Waseem et al. (2015) and Rajsekhar et al. (2015) reported that a composite index of drought using multiple meteorological, hydrological, and agricultural sub-indicators has excellent potential for a comprehensive assessment of drought conditions.

Study area
The current project was implemented in West Bengal, a predominantly agrarian state in eastern India (Figure 1). The state is located between 21°31′ and 27°14′ North latitude and 85°91′ and 89°53′ East longitude. Annual rainfall ranges from 1250 mm to 2500 mm in different districts. The agricultural economy is greatly dependent on the southwest monsoon from the Bay of Bengal. Floods, land erosion, drought, and other natural calamities often affect agricultural production. The total cultivable area of West Bengal is about 5.6 million hectares, which is 63% of its geographical area. About 65% of the net cropped area is irrigated. The gross cropped area is 9.4 million hectares, with a cropping intensity of 182%. Rice is the dominant crop in the State, followed by potato, wheat, mustard, and jute. Tea is also grown in the State (https://agricoop.nic.in/en/agriculturecontingency/west-bengal, Murthy et al. 2018). The State's 22 districts cover about 3200 IUs (Figure 1). A small group of villages constitutes an IU.

Methodology and data
The framework of CHF computation is presented in Figure 2. Satellite-based crop mapping, satellite and weather data analysis for generating crop health indicators, field data collection and analysis, composite index generation, and insurance loss assessment are the main tasks in this project.

Field data collection
Mobile technology has tremendously improved field data collection by producing real-time crop status and crop management data. A robust system of mobile-based crop surveillance was part of this project for close and continuous monitoring of the agricultural situation. A mobile app was developed with a set of attributes for field data collection and was provided to the field functionaries. Protocols for field data collection were developed. Field data covered most of the IUs, 2-3 times during the crop season. There were two types of field data points, i.e., reference points and random points. Reference points were observed multiple times to monitor crop performance, whereas random points were covered at least once in a season. It was ensured that every block (a group of IUs) was represented by at least 10-20 data points, and that 1-2 data points represented each IU within the block at each round of data collection. Around 130 thousand field data points were collected for the 2020 season, each consisting of 15 attributes and photographs of the fields. Thus, an extensive repository of field data points was used in the analysis to provide substantive evidence of the ground situation. These field data points were linked to satellite indices for crop classification and crop condition assessment.

Paddy crop mapping
The generation of an accurate crop map for the current and historical years is the first important step in the CHF generation process. Paddy layer was generated with SAR and Optical data for five years, i.e., 2016 to 2020. The crop is transplanted mainly in August and harvested by December. In a few districts, crop harvesting extends till the first fortnight of January.
Paddy crop mapping with SAR backscatter data is well reported. In India, paddy crop mapping and inventory have been carried out for many years using SAR data (www.mncfc.gov.in). Extensive field data points for the current season and limited field data points for the past years were utilized in this project. A decision-rules approach was adopted for classification. Rules were defined for each district separately by analyzing the ground truth data together with backscatter data for signature generation. Three backscatter images representing the early, active growth, and peak growth stages of paddy were used to define the decision rules to discriminate the crop. This methodology is widely followed for mapping the monsoon season's paddy crop in India (www.mncfc.gov.in), and the same methodology was adopted for paddy crop mapping in this work.

Basic indices data
The following set of indices derived from satellite and weather datasets that are well reported to be related to crop condition were part of a vast repository of data in the current project.

Normalized difference vegetation index (NDVI)
NDVI is a chlorophyll-based crop vigor index derived from the reflectance of red and NIR (Near Infra-Red) bands. It is well established that NDVI can quantify the impact of seasonal crop stress events, characterize crop health and map the phenological patterns of crops (Malingreau 1986). NDVI and yield relations are highly variable, and hence NDVI alone may not improve the crop insurance product design (Makaudze and Miranda 2010, Smith and Watts 2009).

Land surface wetness index (LSWI)
Indices based on the reflectance of Shortwave Infrared (SWIR) bands are sensitive to moisture available in the soil and the crop canopy (Wang et al. 2008). LSWI is one such popular index for crop stress detection (Gu et al. 2007). The combination of NDVI and LSWI would amplify the anomalies and become more responsive to the ground agricultural situation (Gu et al. 2007).

Radar backscatter
Microwaves are sensitive to the water content in the soil and vegetation and other variables influencing backscatter, i.e., soil roughness and vegetation structure. The unique sensitivity of microwave scattering to crop structure has led to many studies using SAR backscatter for crop monitoring (Nelson et al. 2014, Yuzugullu et al. 2017).

Fraction of photosynthetic active radiation (FAPAR)
Fraction of Photosynthetic Active Radiation absorbed by the chlorophyll pigments of crop canopy is based on canopy structure/status and illumination conditions (Baret et al. 2007). It is a biophysical variable closely associated with carbon dioxide assimilation and biomass production by crops/vegetation. FAPAR provides a more meaningful measure of vegetation status than Vegetation Indices (Running et al. 2004, Meroni et al. 2013). The Copernicus Global Land Service (CGLS) provides global time series of FAPAR data at a resolution of 300 m and a frequency of 10 days.

Satellite data products and indices generation
Fortnightly greenest-pixel composites of the Sentinel-2 surface-reflectance product (L2A) were band-stacked, exported, and downloaded through the Earth Engine cloud computing environment using JavaScript (Gorelick et al. 2017). Subsequent index generation and compositing were done using ERDAS IMAGINE 16.1 and ArcGIS Desktop 10.6. NDVI, using Band 8 and Band 4 data, and LSWI, using Band 8 and Band 12 data, were computed for each fortnight of the crop growing season.
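The index arithmetic described above can be sketched as follows. This is only a minimal illustration, assuming the usual normalized-difference form for both indices and surface reflectance scaled to 0-1; the function names and the sample reflectance values are hypothetical and do not reproduce the ERDAS IMAGINE/ArcGIS workflow used in the project.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


def ndvi(nir_b8, red_b4):
    # NDVI from Sentinel-2 Band 8 (NIR) and Band 4 (red)
    return normalized_difference(nir_b8, red_b4)


def lswi(nir_b8, swir_b12):
    # LSWI from Sentinel-2 Band 8 (NIR) and Band 12 (SWIR)
    return normalized_difference(nir_b8, swir_b12)


# Hypothetical reflectances of one paddy pixel in one fortnightly composite
pixel = {"B4": 0.05, "B8": 0.45, "B12": 0.15}
print(round(ndvi(pixel["B8"], pixel["B4"]), 3))   # 0.8
print(round(lswi(pixel["B8"], pixel["B12"]), 3))  # 0.5
```

Returning None for a zero denominator is a design choice of this sketch, so that no-data pixels can be skipped when IU averages are computed.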
Temporal GRD products of C-band Sentinel-1B, which provide backscatter normalized to ground area with a 12-day repeat cycle, were used in the study. The Sentinel-1A data were pre-processed through the following steps: (a) radiometric calibration; (b) speckle filtering using a 5 × 5 adaptive Lee filter; and (c) Range Doppler terrain correction using the elevation data from the 1 arc-second SRTM DEM product. In this process, data are resampled to a grid of 10 m spacing, preserving the 20 m spatial resolution. The linear backscatter data were then mosaicked and subset over the study region. Linear backscatter intensity in VH polarisation (σ0VH) was further used in the study because of its consistent response to crop growth.
A time series of the Copernicus Global Land FAPAR product derived from Proba-V (SPOT) and OLCI (Sentinel-3) was used. For the study period, FAPAR Version-1 products available at 300 m resolution and a 10-day interval were downloaded from http://land.copernicus.eu/global. The PROBA-V/OLCI FAPAR datasets were generated by applying a neural network to instantaneous top-of-canopy reflectance from Sentinel-3 OLCI and daily top-of-aerosol input reflectance from PROBA-V. The detailed retrieval algorithm can be found in Baret et al. (2016). The dataset has been validated for different biomes and is reported to be of good quality by many studies (Brown et al. 2020, Fuster et al. 2020).

Weather data
Two indicators were derived from rainfall data: (a) the season's rainfall, which is crucial to represent drought and flood effects, and (b) the number of rainy days, indicating rainfall distribution. Any day recording more than 2.5 mm of rainfall is called a rainy day (www.imd.gov.in). These two rainfall-based derivatives are recommended indicators for crop stress assessment in the National Drought Manual of India (www.agricoop.gov.in). Other weather parameters (temperature, wind speed, and humidity) were also analyzed to assess the effect of specific risks like extreme temperatures, pests, and diseases. The weather data were sourced from the India Meteorological Department, and the resolution of these datasets ranges from 10 to 25 km, which is adequate to represent the average weather situation of an IU.
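The two rainfall indicators can be derived from a daily rainfall series as sketched below. This is an illustrative sketch only: the function name and the ten-day sample record are hypothetical, and the 2.5 mm threshold is applied as a strict "more than", following the wording above.

```python
RAINY_DAY_THRESHOLD_MM = 2.5  # rainy-day threshold quoted in the text


def rainfall_indicators(daily_rain_mm):
    """Return (season total in mm, number of rainy days) for one IU."""
    season_total = sum(daily_rain_mm)
    rainy_days = sum(1 for r in daily_rain_mm if r > RAINY_DAY_THRESHOLD_MM)
    return season_total, rainy_days


# Hypothetical ten-day record for one IU (mm per day)
daily = [0.0, 12.4, 2.5, 3.1, 0.0, 45.0, 1.2, 0.0, 8.8, 2.6]
total, days = rainfall_indicators(daily)
print(round(total, 1), days)  # 75.6 5
```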

CHF generation
The important elements of the CHF generation procedure include the selection of input variables, data matrix preparation for all the IUs and years, grouping of IUs in each district, data normalization, weights generation, and index development followed by its validation.
Metrics from primary indices data
As indicated above, optical and microwave data of Sentinel satellites were fully exploited in this project. Fortnightly and monthly time-composite NDVI and LSWI images and 12-day interval SAR VH backscatter images were generated from July to January for all five years. The paddy crop layer and the shapefile of IUs were used to create the data of sub-indicators. The input parameters of the model, along with their functional relationship with CHF, are shown in Table 1. Maps of the eight input parameters for the 2020 crop season are furnished in the Appendix. Such maps were also generated for all the remaining years in the project.
From the NDVI profile of each year, the maximum NDVI value, occurring at the n-th fortnight, and the NDVI of either the (n+1)-th or the (n-1)-th fortnight, whichever is higher, were averaged. Averaging two values of NDVI reduces uncertainty and ensures a better representation of the season's maximum value. A similar approach was adopted for computing the season's maximum LSWI and Backscatter. Integrated VH backscatter represents the total backscatter of all the data of the growing season. Integrated FAPAR (sowing to harvesting) was obtained by summing the monthly composite FAPAR data from September to the first fortnight of November, when the crop was actively growing and in its maximum vegetative phase. Crop condition variability was represented by the maximum of the Coefficient of Variation (CV) values of NDVI and LSWI. CV was computed using the mean and standard deviation of the respective index in a given IU.
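The season-maximum and variability metrics described above can be sketched as follows. The sketch assumes a simple list of fortnightly IU-average values for the profile and a list of pixel values within an IU for the CV; the use of the population standard deviation is an assumption, since the text does not state which form was used. All names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def season_maximum(profile):
    """Average the peak value with the higher of its two neighbours."""
    n = max(range(len(profile)), key=lambda i: profile[i])
    neighbours = []
    if n > 0:
        neighbours.append(profile[n - 1])
    if n < len(profile) - 1:
        neighbours.append(profile[n + 1])
    if not neighbours:  # single-value profile: nothing to average with
        return profile[n]
    return (profile[n] + max(neighbours)) / 2


def coefficient_of_variation(values):
    """CV (%) = standard deviation / mean * 100 (population SD assumed)."""
    return pstdev(values) / mean(values) * 100


# Hypothetical fortnightly NDVI profile of one IU
ndvi_profile = [0.25, 0.40, 0.62, 0.78, 0.74, 0.55, 0.30]
print(round(season_maximum(ndvi_profile), 2))  # 0.76

# Hypothetical pixel values inside one IU at peak season
ndvi_pixels = [0.70, 0.74, 0.78, 0.66]
lswi_pixels = [0.40, 0.52, 0.35, 0.47]
variability = max(coefficient_of_variation(ndvi_pixels),
                  coefficient_of_variation(lswi_pixels))
```

The integrated metrics (VH backscatter and FAPAR) reduce to a plain `sum` over the composites of the chosen period and are therefore not shown.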
Justification for the above parameters is drawn from the reported research. Season's maximum NDVI and LSWI, included in the model, were reported to be effective in crop condition assessment and crop yield modeling by many studies (Son et al. 2020, Islam et al. 2021). Setiyono et al. (2017) used the SAR backscatter to infer the LAI values of rice crop and used them as relative leaf growth rate parameters in the ORYZA model. Moran et al. (2011) reported that the time series of σ0 offers reliable information about the crop growth stage, such as jointing and heading in grain crops and leaf development and reproduction. FAPAR integral over the season is one of the indicators of biomass production from vegetation. Roumiguié et al. (2015) used FAPAR integral in the index-based insurance over grasslands. FAPAR was successfully used in grain yield estimation procedures by Tripathy et al. (2014) and Dong et al. (2020). Rainfall and rainy days are vital determinants in crop production in India (Murthy et al. 2015).
Rainfall quantity has differential effects on crop performance. An increase in rainfall up to a certain level benefits the crop, and beyond that level it harms the crop. In this project, only the benefit-causing range of rainfall was included, with its upper limit fixed at 150% of normal rainfall. The India Meteorological Department defines the normal rainfall range as 80-120% of the long-term average (www.imd.gov.in). An additional 30% was allowed on the assumption that it benefits the water-loving paddy crop, which fixes the upper limit of rainfall at 150% of the long-term average. It is assumed that the negative effects on crop performance of excess rainfall beyond the 150% limit would be captured by other parameters, i.e., NDVI, LSWI, and Backscatter. The impact of deficit rainfall on the crop would in any case be reflected in these spectral indices.
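The benefit-causing limit on rainfall amounts to a simple cap, sketched below. The sketch assumes that the rainfall indicator is expressed as a percentage of the long-term average before capping; the text does not state the unit used in the model, and the function name and sample values are hypothetical.

```python
def capped_rainfall_percent(actual_mm, long_term_avg_mm, cap_percent=150.0):
    """Season rainfall as % of long-term average, capped at the benefit limit."""
    percent_of_normal = actual_mm / long_term_avg_mm * 100
    return min(percent_of_normal, cap_percent)


# Upper bound of the IMD normal range (120%) plus the additional 30% allowance
assert 120 + 30 == 150

print(capped_rainfall_percent(900, 1200))   # 75.0  (deficit, passed through)
print(capped_rainfall_percent(2400, 1200))  # 150.0 (excess, capped)
```

With this cap, an IU receiving 200% of normal rainfall scores the same on the rainfall indicator as one receiving 150%, leaving the spectral indices to register any flood damage.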

Stratification of the IUs
The IUs of each district were first segregated into homogeneous groups based on the NDVI, rainfall, and soil water holding capacity of a normal year, i.e., 2018 in the present case. It was ensured that each group consists of at least 4 IUs that are contiguously located. Pooling contiguous IUs into a group increases the number of crop scenarios. For example, a group with four IUs and four crop years (2016 to 2019) has 16 crop growing scenarios, adequate to capture the variability. The dynamic ranges of the input indicators of the CHF model are more or less the same within each group because it represents a homogeneous crop growing environment. On the other hand, if each IU is treated as a discrete/standalone entity, there are only four crop growing scenarios, and such a limited database cannot represent the total variability of crop responses, leading to biased weights. All the steps in CHF computation (data normalization, weights generation, and applying weights for final index generation) were carried out in each stratum independently.

Data normalization
The data were checked for redundancy by comparing the inter-correlations between the variables. In all cases, the correlations were less than 0.5 and insignificant, and hence no further tests of multicollinearity were conducted. The input indicators of the model were in different units. Their functional relationship with CHF is either positive or negative (Table 1). To make these indicators unit-free, data normalization was done by following the widely recommended Min-Max approach (OECD 2008).
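For input indicators that have a positive relationship with CHF, normalization was done using the formula

$$x'_{ij} = \frac{x_{ij} - \min_i(x_{ij})}{\max_i(x_{ij}) - \min_i(x_{ij})}$$

For input indicators that have a negative relationship with CHF, normalization was done using the formula

$$x'_{ij} = \frac{\max_i(x_{ij}) - x_{ij}}{\max_i(x_{ij}) - \min_i(x_{ij})}$$

where $x'_{ij}$ is the normalized value of the input indicator $x_{ij}$, and the minimum and maximum are taken over all observations of the $j$-th indicator in the stratum.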
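A minimal sketch of this normalization step is given below, assuming the usual Min-Max forms (x - min)/(max - min) for positively related indicators and (max - x)/(max - min) for negatively related ones, applied to all observations of one indicator within a stratum. The function name, the handling of a constant indicator, and the sample values are assumptions made for illustration.

```python
def min_max_normalize(values, positive=True):
    """Min-Max normalize one indicator over all IU-years of a stratum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: an indicator with no spread is mapped to 0 everywhere
        return [0.0 for _ in values]
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


# Hypothetical indicator values for three IU-years
raw = [10.0, 20.0, 40.0]
print(min_max_normalize(raw, positive=True))   # [0.0, 0.333..., 1.0]
print(min_max_normalize(raw, positive=False))  # [1.0, 0.666..., 0.0]
```

For any observation, the two forms add up to 1, so a negatively related indicator (such as the CV of NDVI/LSWI) is simply reversed in direction before aggregation.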

Weights generation and CHF computation
After normalization, the input indicators ranged from 0 to 1. Derivation of weights for the input indicators is vital in constructing composite indices (Brooks et al. 2005). There are many methods of weight generation in a composite data framework (OECD 2008). Wilhelmi and Donald (2002) selected weights based on the relative contribution of each factor in their drought vulnerability study. Li et al. (2006) used Principal Component Analysis to generate weights for the variables. Brooks et al. (2005) assigned equal weights to each indicator in their study. Murthy et al. (2015) adopted the variance approach in their study on drought proneness. Feature extraction techniques such as Principal Component Analysis, Partial Least Squares, and multi-criteria decision models are also used to aggregate different variables into a single index. However, the linearity assumption in data transformation is a serious limitation of most feature extraction techniques (Rajsekhar et al. 2015). The entropy technique, based on information theory, depends on the degree of disorder of the information. It is a more effective information measure, providing balanced relationships and less biased weights than linear methods (Waseem et al. 2015, Rajsekhar et al. 2015, Liu et al. 2019). The entropy-based weighting technique was adopted in this project, considering its merits. The entropy-derived variability of a feature and its weight are directly related. This technique does not involve any assumptions, cumbersome derivations, or transformations.
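Consider a normalized data matrix D consisting of m observations of n features, where $x_{ij}$ denotes the value of the $j$-th feature of the $i$-th observation. In the standard entropy-weight formulation, the entropy of the $j$-th feature, $E_j$, can be calculated as:

$$E_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}, \qquad p_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}}$$

The weight of the feature can then be calculated as:

$$w_j = \frac{1 - E_j}{\sum_{k=1}^{n} (1 - E_k)}$$

Given the weights ($w_j$) and the normalized feature values of the $i$-th IU ($x_{ij}$), the Crop Health Factor (CHF) can be calculated as:

$$CHF_i = \sum_{j=1}^{n} w_j x_{ij}$$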
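The weighting and aggregation steps can be sketched as below. The sketch assumes the standard entropy-weight formulation: each feature column is converted to proportions p_ij = x_ij / sum_i(x_ij), the entropy is E_j = -(1/ln m) * sum_i(p_ij ln p_ij), the weight is w_j = (1 - E_j) / sum_k(1 - E_k), and CHF is the weighted sum of the normalized indicators. Whether the project used exactly this variant is not confirmed by the text; the function names, the fallback rules, and the sample matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for normalized values (rows = observations, m >= 2)."""
    m, n = len(matrix), len(matrix[0])
    diversity = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            entropy = 1.0  # assumption: an all-zero feature carries no information
        else:
            probs = [x / total for x in column]
            # 0 * ln(0) is taken as 0, so zero entries are skipped
            entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        diversity.append(max(0.0, 1.0 - entropy))
    spread = sum(diversity)
    if spread == 0:
        return [1.0 / n] * n  # assumption: fall back to equal weights
    return [d / spread for d in diversity]


def crop_health_factor(row, weights):
    """Weighted sum of the normalized indicators of one IU-year."""
    return sum(w * x for w, x in zip(weights, row))


# Hypothetical stratum: 4 IU-years x 3 normalized indicators
stratum = [
    [1.0, 0.5, 0.2],
    [0.0, 0.5, 0.9],
    [0.5, 0.5, 1.0],
    [0.8, 0.5, 0.0],
]
weights = entropy_weights(stratum)
chf_values = [crop_health_factor(row, weights) for row in stratum]
print([round(w, 3) for w in weights])
print([round(c, 3) for c in chf_values])
```

In this example the second indicator is constant across observations, so its entropy is 1 and its weight is effectively zero: an indicator that does not vary within a stratum contributes nothing to discriminating crop performance. Because the weights sum to 1 and the inputs lie between 0 and 1, every CHF value also lies between 0 and 1.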

Results
This innovative method, the first of its kind in the country, was implemented in the 2020 crop season covering 3.5 million ha of paddy in the West Bengal state of India. The State's Department of Agriculture had issued the notification in advance vide G.O.No: 524-(nab)-AG/O/Crop Ins/7C-06/2020 dated 10.06.2020, indicating that the new method of crop insurance, called 'Technology-based Bangla Shasya Bima Scheme', was to be implemented in the Aman crop season of 2020. The scheme followed the area approach, and the Gram Panchayat, a small group of villages, was the IU, as mentioned in the previous section. Around 6.2 million out of 7.2 million farmers were enrolled in this scheme, indicating high outreach. The Aman season starts in the second fortnight of July and ends in the second fortnight of December.
A dedicated interactive dashboard representing all the key features and parameters, such as area-wise enrolment, farmer demographics, CHF values, other satellite-based parameters/indices, and claim amount payable, was deployed. Adequate publicity was given in all the villages of the notified districts/areas. Electronic and print media were utilized to create awareness about the benefits and provisions of the Scheme among the cultivators and the agencies involved in implementing the Scheme. Publicity efforts were initiated a month prior to the start of the coverage period. All the published material and information were uploaded to the crop insurance portal along with coverage/frequency/duration date etc. Capacity building of the field functionaries for effective implementation of the scheme and training workshops/sensitization programs were also organised.

Paddy crop distribution map
The classification was performed for each district separately, and the outputs were verified extensively by superimposing them on cloud-free optical data of October/November, when the crop was at its maximum vegetative phase (Figure 3). Field data points were also used to check the classification performance. It was ensured that commission errors were <10%, while omission errors of up to 20% were tolerated. With this strategy on mapping error, the IU-average crop health indicators represent the crop pixels more closely.
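The acceptance check on mapping errors can be illustrated as below. The sketch assumes the conventional definitions for the paddy class (commission error as the share of mapped paddy points that are not paddy on the ground, omission error as the share of ground paddy points missed by the map); the counts and function name are hypothetical.

```python
def mapping_errors(true_positive, false_positive, false_negative):
    """Return (commission %, omission %) for the paddy class."""
    commission = false_positive / (true_positive + false_positive) * 100
    omission = false_negative / (true_positive + false_negative) * 100
    return commission, omission


# Hypothetical field-check counts for one district
commission, omission = mapping_errors(true_positive=850,
                                      false_positive=60,
                                      false_negative=150)
accepted = commission < 10 and omission <= 20
print(round(commission, 1), round(omission, 1), accepted)  # 6.6 15.0 True
```

Keeping commission errors tighter than omission errors favours a paddy mask that contains few non-paddy pixels, which is what matters when the mask is used to average crop health indicators over an IU.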

CHF values of current and past years
The weights and CHF values for one of the districts, namely Purba Bardhman, are presented in Table 2 as an illustration. Tables for the remaining districts are not presented, since all of them follow the same approach and interpretation, and to avoid too many tables in the manuscript. The district has 220 IUs, which were clubbed into 17 groups. The number of IUs in these groups ranges from 9 to 20. There were eight input indicators in the CHF model. The data matrix consisted of 220 IUs, eight input indicators, and 5 years. CHF was developed separately for each group in two steps: the first step was CHF generation with past years' data, and the second step was to apply the weights to the normalized indicators of the current crop season, i.e., 2020, to generate its CHF. Weights of sub-indicators vary between groups but remain the same for different IUs within a group. Weights are the same for different years in a given group. Therefore, CHF values of an IU for different years are comparable and can be used to discriminate the years based on crop performance. CHF maps of the five years from 2016 to 2020, along with the 4-year (i.e., 2016-2019) average map covering all the IUs of the State, are presented in Figure 4. CHF values of the current year (2020) and the past years' average for different IUs are plotted for one district, namely Purba Bardhman, in Figure 5.

Validation of CHF
Although CHF is expected to act as a close yield proxy, CHF values and the current yield estimates of IUs are not directly comparable because CHF is derived based on the total enumeration method, whereas the yield estimates are based on a few manual measurements. Further, the reliability of yield estimates is also a concern, as discussed in previous sections, limiting such comparisons. Therefore, validation of CHF in this project was done at a coarser level (i.e., block). The underlying assumption is that the current yield estimates are relatively error-free when aggregated to coarser scales. Block represents a group of IUs. Multiple blocks constitute a district. Yield and CHF data of IUs were aggregated to block level, based on an area-weighted approach considering the respective paddy area. Data gaps in yield data were identified and excluded from the analysis. Thus, the CHF and yield data were available for four years from 2016 to 2019 for different blocks.
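The block-level aggregation can be sketched as an area-weighted mean, as below. The handling of yield data gaps (skipping IUs whose value is missing) is an assumption about how the exclusion was applied; the function name and the sample values are hypothetical.

```python
def area_weighted_mean(values, paddy_areas):
    """Area-weighted mean over IUs, skipping IUs whose value is missing (None)."""
    pairs = [(v, a) for v, a in zip(values, paddy_areas) if v is not None]
    total_area = sum(a for _, a in pairs)
    if total_area == 0:
        return None  # no usable IU in this block
    return sum(v * a for v, a in pairs) / total_area


# Hypothetical block with three IUs: CHF, yield (kg per ha), paddy area (ha)
iu_chf = [0.60, 0.70, 0.50]
iu_yield = [4200.0, None, 3800.0]   # yield of the second IU is a data gap
iu_area = [1000.0, 3000.0, 1000.0]

block_chf = area_weighted_mean(iu_chf, iu_area)
block_yield = area_weighted_mean(iu_yield, iu_area)
print(round(block_chf, 2), block_yield)  # 0.64 4000.0
```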
CHF is an index ranging from 0-1, whereas the yield data is in absolute quantities, i.e., kg per ha. Therefore, CHF deviation and yield deviation between two years were compared. Deviations were expressed as percent deviation of year1 CHF/yield from that of year2. Data of CHF and yield deviations for all pairs of years from 2016 to 2019 were generated and pooled. The scatter diagram between CHF deviations and yield deviations in Figure 6 reveals that these two deviations were well correlated, with a correlation coefficient of 0.85. Reduction in CHF was found to be associated with reduction in yield and vice versa. Thus, sensitivity of CHF to yield changes was established.
Considering the bias in IU yields and the averaging effect when aggregated to a slightly coarser level, the extent of association achieved in this analysis is noteworthy, establishing the fitness of CHF for its targeted application.
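The deviation-based comparison described above can be sketched as follows. The sketch assumes unordered pairs of years within each block, a deviation defined as 100 x (year1 - year2) / year2, and a Pearson correlation on the pooled values; the function names and the data layout are illustrative assumptions.

```python
import math
from itertools import combinations

def percent_deviation(value_year1, value_year2):
    """Percent deviation of year1 from year2."""
    return 100.0 * (value_year1 - value_year2) / value_year2

def pooled_deviations(series_by_block):
    """series_by_block: {block: {year: value}}.

    Returns deviations for every pair of years in every block,
    in a fixed (block, earlier year, later year) order so that CHF and
    yield series built the same way stay aligned.
    """
    pooled = []
    for block in sorted(series_by_block):
        by_year = series_by_block[block]
        for year2, year1 in combinations(sorted(by_year), 2):
            pooled.append(percent_deviation(by_year[year1], by_year[year2]))
    return pooled

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```

With four years of data, each block contributes six year pairs to the pooled sample, so `pearson(pooled_deviations(chf), pooled_deviations(yields))` compares the two deviation series pair by pair.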
Besides the correlation analysis, the CHF-based crop performance in the current year was also verified using the ground truth data collected during the season with Mobile App. Attribute data in the mobile app related to crop conditions were used to corroborate with CHF deviations. Farmer inquiries were made to infer the crop status. This comparison has indicated good agreement between CHF and the field situation. However, this is only an indicative analysis and not an objective assessment.

CHF based loss assessments and claims settlement in 2020 crop season
The 2020 crop season was, in general, a normal season with no major incidents of crop risk. CHF-based crop risk assessment and compensation payouts adopted the following formula, and the claim calculation procedure for an IU is furnished in Table 3. As per the notification of the Government, the indemnity level for paddy crop is 80% of the normal CHF, called the 'Threshold CHF', for each IU. Normal CHF is the average of the past four years in the current project, as per the notification. That means 80% of normal crop performance (CHF) was guaranteed. The % reduction from the Threshold CHF determines the insurance payout. If the current year's CHF of an IU is 70% of its Threshold CHF, the insurance payout would be 30% of the sum insured, and all the insured farmers in the IU would be compensated uniformly as per the area approach principle. Thus, the insurance mechanism here was modified to 'area-crop performance' from the existing 'area-yield' basis. Although the 2020 crop season was generally normal, there were certain IUs with poor crop conditions. These IUs were detected by CHF deviations, leading to a total eligible claim of Rs. 1130 Million. Loss assessment and claims determination were completed within 30-45 days after crop harvest. There is scope for reducing the turnaround time further in subsequent seasons once the methodology is fully stabilised.
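The payout rule stated above can be expressed as a short sketch. It assumes that the payout is zero when the current CHF is at or above the Threshold CHF and that the shortfall is applied linearly to the sum insured; the function names and the sample figures are hypothetical and do not reproduce Table 3.

```python
def threshold_chf(past_chf, indemnity_level=0.80):
    """Threshold CHF: indemnity level times the average CHF of past years."""
    return indemnity_level * sum(past_chf) / len(past_chf)

def payout(current_chf, threshold, sum_insured):
    """Payout as the percent shortfall from the Threshold CHF.

    No payout is made when the current CHF meets or exceeds the threshold.
    """
    shortfall = max(0.0, (threshold - current_chf) / threshold)
    return shortfall * sum_insured

# Hypothetical IU: past four years' CHF average 0.75, so the threshold is 0.60.
threshold = threshold_chf([0.70, 0.75, 0.80, 0.75])

# Current CHF of 0.42 is 70% of the threshold, so 30% of the sum insured is paid.
claim = payout(0.42, threshold, sum_insured=1000.0)
```

Because the rule depends only on the IU-level CHF, every insured farmer in the IU receives the same payout percentage, consistent with the area approach.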

Accounting for end-of-season risks in the CHF
The crop risks that occur during the reproductive stage of crops (flowering to harvest period), i.e., in the time window of 30-40 days before harvest, are not parameterized in the CHF model. Hence, a Correction Factor (CF) approach was developed to correct the CHF for such risks. These risks include (a) unseasonal rains/floods/cyclones (submergence, lodging, panic harvest, etc.), (b) weather aberrations such as hot and dry winds and rise in temperatures (poor grain setting and grain development), (c) weather-induced pest/disease incidence, and (d) pest incidence following floods (BPH in rice). Generally, floods and cyclones occur during the pre-harvest period of the crop in the State, causing crop lodging, submergence, and mechanical damage in extreme cases. Lodged crops sometimes recover, leading to reduced losses. SAR data offers a unique opportunity to discriminate between lodged and normal crops and thus helps monitor crop lodging caused by various risk events (Han et al. 2017, Van Delden et al. 2010). Chouhan et al. (2020) introduced the concept of Crop Angle Inclination to quantify the crop lodging effect and assessed it using multi-temporal and multipolarimetric SAR data. A sudden increase in cross-polarized return was observed over the affected jute crop area due to lodging and partial inundation (Chakraborty et al. 2021). Thus, pre- and post-event satellite data could effectively detect the changes in crop vigor and surface wetness. If combined effectively, these data can produce an objective assessment of the affected crop area.
The methodology framework for generating CF in the event of crop risks occurring in the terminal part of the season has two components, namely (a) mapping the

Hedging effectiveness
Hedging effectiveness (HE) of insurance contracts indicates their risk-reducing potential. Hedging effectiveness varies with the indices used in designing the contracts. A more effective contract is associated with higher risk-reducing efficiency and lower basis risk. Vedenov and Barnett (2004) applied three different measures, namely mean root square loss, value-at-risk, and certainty-equivalent revenues on the time series yield data, to measure the efficiency of insurance instruments. HE of insurance contracts is determined by comparing the semi-variance of un-insured yields with insured yields using simulated data points (Vedenov and Barnett 2004). Kölle et al. (2020) observed that HE varied among the indices VCI, TCI, and VHI and among provinces, and that the Gumbel copula showed higher efficiency than the Gaussian copula. Salgueiro (2019) carried out HE with real and simulated detrended yield data and computed the Expected Shortfall.
Quantitative assessment of HE, through the metrics reported in the past studies, requires time series data on CHF and yield, which is a limitation in the current project. Moderate resolution satellite indices at 10-20 m from the Sentinels have been available only since 2016, and hence the CHF data in the current project is available from 2016 only. The yield data at the IU level is available for the past eight years, but it has many quality-related issues. Therefore, as an indicative analysis, hedging patterns of yield and CHF-based methods were evaluated by comparing the crop loss assessments resulting from these two methods.
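The semi-variance comparison mentioned above can be illustrated with a minimal sketch. It assumes a downside semi-variance measured against a fixed target and an effectiveness ratio of the form 1 - SV(insured) / SV(uninsured); this particular formulation, the function names, and the sample values are assumptions made for illustration and are not taken from the cited studies.

```python
def downside_semivariance(values, target):
    """Mean squared shortfall of values below the target."""
    shortfalls = [max(0.0, target - v) for v in values]
    return sum(s * s for s in shortfalls) / len(values)

def hedging_effectiveness(uninsured, insured, target):
    """Relative reduction in downside semi-variance due to insurance.

    Returns 1.0 when insurance removes all downside risk and 0.0 when
    it removes none (assumed formulation).
    """
    sv_uninsured = downside_semivariance(uninsured, target)
    if sv_uninsured == 0:
        raise ValueError("no downside risk to hedge")
    return 1.0 - downside_semivariance(insured, target) / sv_uninsured

# Hypothetical outcomes per season without and with insurance payouts.
uninsured = [10.0, 6.0, 8.0, 4.0]
insured = [10.0, 8.0, 8.0, 7.0]
he = hedging_effectiveness(uninsured, insured, target=8.0)
```

Applying such a measure in practice needs a long or simulated series of outcomes, which is the data limitation discussed in this section.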
CHF and yield data of paddy of IUs for 2016, 2017, 2018, and 2019 were analyzed. A small fraction of IUs was not included in the analysis due to yield data gaps. The 2019 crop season was assumed to be the insurance-assessment season, and the threshold value was generated using the CHF data of the previous 3 years (2016, 2017, and 2018). The insurance loss assessment for the year 2019 was computed based on the % reduction of CHF/yield from the respective thresholds and represented in four classes as given below:
1. Greater than -10% deviation, i.e., either no loss or loss less than 10%
2. -10 to -20% deviation, i.e., loss is 10 to 20%
3. -20 to -30% deviation, i.e., loss is 20 to 30%
4. -30 to -40% deviation, i.e., loss is 30 to 40%

The number of IUs under different agreement classes of CHF and yield losses is summarized in Table 4.
It is evident from Table 4 that in 1466 IUs (54%), the diagonal elements of the table, there was perfect agreement between the losses assessed by CHF and yield. In 519 IUs (19%), there was a deviation of one class (+/- side) between the two. Of these, in 348 IUs, the yield-based loss was higher (up to 20%), and in 171 IUs, the CHF-based loss was higher (up to 20%). Again, in 518 IUs (19%), the yield-based loss was significantly higher (>20%) than that of CHF. In 207 IUs (8%), the CHF-based assessment leads to significantly higher payouts than yield. Even where there is good agreement between the CHF and yield methods, the CHF approach is preferable because its methodology is more transparent and less vulnerable to bias.
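The class-wise comparison summarized above can be sketched as a small cross-tabulation. The sketch assumes that class boundaries fall at -10%, -20%, and -30%, that boundary values belong to the more severe class, and that all deviations below -30% are placed in class 4; the function names and the sample deviations are hypothetical.

```python
def loss_class(deviation_pct):
    """Map a percent deviation from the threshold to loss class 1-4."""
    if deviation_pct > -10:
        return 1  # no loss, or loss below 10%
    if deviation_pct > -20:
        return 2
    if deviation_pct > -30:
        return 3
    return 4

def agreement_matrix(chf_devs, yield_devs):
    """4 x 4 count matrix: rows are CHF classes, columns are yield classes."""
    matrix = [[0] * 4 for _ in range(4)]
    for chf_dev, yield_dev in zip(chf_devs, yield_devs):
        matrix[loss_class(chf_dev) - 1][loss_class(yield_dev) - 1] += 1
    return matrix

def summarize(matrix):
    """Counts of IUs with exact, one-class, and larger disagreement."""
    exact = sum(matrix[i][i] for i in range(4))
    one_class = sum(
        matrix[i][j] for i in range(4) for j in range(4) if abs(i - j) == 1
    )
    total = sum(sum(row) for row in matrix)
    return exact, one_class, total - exact - one_class

# Hypothetical deviations (%) for four IUs.
table = agreement_matrix([-5, -15, -25, -35], [-2, -25, -5, -38])
exact, one_class, larger = summarize(table)
```

The diagonal of the matrix holds the IUs where both methods assign the same loss class, which corresponds to the perfect-agreement count reported from Table 4.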

Discussion
Methodological interventions for improving crop risk assessment make the crop insurance products actuarially stronger with wider acceptability. Several crop insurance evaluation studies conducted by the World Bank and others (Rao 2010, Department of Agriculture and Cooperation 2014) have widely acknowledged that emerging technologies like remote sensing and GIS need to be effectively utilized to improve the crop yield data and crop risk assessment in crop insurance. Replacing or reducing the dependency on the conventional yield estimates is the most important requirement to enhance the effectiveness and sustenance of the area-yield insurance scheme in India.
As discussed in previous sections, models and algorithms to generate the yield estimates are still not mature for operational implementation. Klompenburg et al. (2020), in their review of machine learning/deep learning algorithms for crop yield prediction, indicated that the most challenging aspects were selecting the best model and best input datasets and suggested further research to address these challenges. They found that the most used features in the models are temperature, rainfall, and soil type, and the most applied algorithm is Artificial Neural Networks. Despite the numerous studies reported on crop yield estimation using various models, the performance of these models is still to be improved to desired levels (Filippi et al. 2019). Farmers and other stakeholders in the insured regions could not understand the complexity of crop models and hence prefer the indices that they can observe themselves (Binswanger-Mkhize 2012).
Numerous data and indices from satellites and weather stations, and the data analysis techniques currently available, offer enormous scope for close monitoring of crops. For example, NASA, the European Space Agency (ESA), and the Japan Aerospace Exploration Agency (JAXA) developed the COVID-19 Earth Observation Dashboard and are using satellite data to monitor the impacts of COVID-19 on the agricultural situation worldwide. The wealth of data from multiple satellites is being used to track crop planting, harvest patterns, and crop health progression (https://www.esa.int/Applications/Observing_the_Earth/). Therefore, the current project has focused on exploiting richly available satellite data to develop a crop health index for insurance application. Satellite data utilization for paddy crop mapping, phenology tracking, and condition monitoring is a well-established application. Many of the abiotic and biotic stress effects are detected by satellite indices. Numerous research publications show that backscatter data has a unique advantage for rice monitoring and flood impact assessment. Thus, remote sensing of paddy crop, a proven application, was exploited in this project to achieve the goal of developing a measure for crop loss assessment.
The novelty of this work is the development of a unified index summarizing the crop performance, based on a multi-parameter approach involving spectral indices, weather data, and local crop growing conditions. The multiple input indicators of the CHF model complement and supplement each other and permit a comprehensive assessment of the crop situation. The entropy analysis followed here is a well-recognized method for generating unbiased weights for the input variables. The strong correlation between CHF and yield deviations is an interesting result of this project that has a direct bearing on the end-use of CHF for crop insurance. The implementation of CHF-based crop risk assessment in crop insurance reflects stakeholders' acceptance of such technology intervention.
Thus, this study has made an entry point for introducing alternative risk assessment measures such as CHF in the conventional 'Area-yield framework', substituting the yield data. The new index can also be combined with the existing yield estimates with certain weightage, thereby minimizing the impact of biased yield estimates and improving the crop insurance contract. It is also evident from the current project that assessing crop performance with multiple satellite indices is more practical, objective, and transparent than estimating crop yields with cumbersome manual measurements. The composite index procedure does not involve complex empirical derivations or data transformations, thus making it easy to compute and interpret.
The claims settlement process becomes more objective, systematic, transparent, and timely by adopting CHF. Such features would eventually regulate the insurance premiums also. The CHF framework permits full control on data and methodology and is highly amenable to adopting advanced data analysis techniques such as machine learning for further improvements. Recalibrating historic years' CHF data in line with subsequent developments is easy, whereas such adjustment of historic yield data is difficult. The CHF methodology can be extended to other paddy growing regions also. We hope it also works well for other crops like wheat, where the relation between satellite indices and yield is widely reported.
CHF type index insurance greatly reduces moral hazard, adverse selection, and the cost of insurance, and enhances the actuarial soundness of the scheme. With an independently verifiable index like CHF, reinsurance support is also assured, thus enabling insurance companies to transfer part of their risk.
The CHF's risk-reducing ability is also more than that resulting from the yield data because of its stronger scientific basis, lesser proneness to moral hazard factors, and lesser basis risk. In respect of economic viability also, the new approach scores better because satellite-based surveillance is much cheaper nowadays than ground-based methods. This is particularly true for agriculture applications covering larger geographic areas. The use of data-centric technologies and geospatial tools of data analysis is going to increase in the agriculture sector in the future, and dependence on technology-driven indices for evaluating agriculture performance is expected to grow.
The important challenge is to evolve a correction mechanism to CHF to account for multiple crop risks occurring in the season that are not detected by the input indicators defined in the model. This can be addressed by adopting picture-based data collected by smart phones. Ceballos et al. (2019) conclude that smart phone based field photographs complement the existing insurance products. Longer time series data of CHF is needed to model the distribution and assess the hedging effectiveness. Non-availability of moderate resolution optical and SAR data for historic years is a limitation. As a way out, the CHF and yield data points of the five years generated in this project can be simulated using parametric distributions, and such data can be analysed for measuring the hedging effectiveness of the two measures. Implementation of the proposed CHF model requires infrastructure, data collection systems, and skilled human resources. The impact of such technology interventions in crop insurance has to be assessed by conducting farmer surveys and analyzing various associated parameters.

Conclusion
The need for either replacing or reducing the dependence on the current yield data in crop insurance is increasingly recognized by all the stakeholders in India to sustain crop insurance contracts. The results of the current work are quite relevant to addressing this well-recognized requirement in crop insurance in the country. The new basis for crop loss assessment reported here indicates that index-based insurance, linking payouts to crop performance proxies rather than measured losses, is a workable proposition to improve the crop risk-sharing mechanism.
Manual yield measurements and the resulting yield estimates for the insured regions are fraught with limitations related to sampling methods, enumeration procedure, and moral hazards. Similarly, adopting yield modeling techniques is also not feasible now since such techniques for local level applications are not yet mature for routine implementation. Therefore, adopting a satellite-based index of crop performance is practical, objective, and transparent. The ever-increasing challenges with the existing yield-based crop loss assessment are thus addressed through a technology-based solution in this project.
Our project has introduced an alternate measure of crop risk assessment called CHF exploiting the potential of richly available remote sensing data. This new measure has successfully replaced the conventional yield data in the existing area-yield crop insurance contract. If the replacement of age-old yield measurement system is difficult, a hybrid approach can be formulated with a combination of yield and CHF data to reduce the dependence on bias-prone yield data.
With an independently verifiable index like CHF, reinsurance support can be easily available, enabling the insurance companies to transfer part of their risk. Thus, CHF type index-insurance has many beneficial features over conventional yield-based insurance. The CHF-based index insurance method developed in this project needs to be replicated for another 1-2 years to fine-tune the model and check the consistency of results. Such efforts are needed before handing over the technique to a third party for regular implementation. There is scope for strengthening the index further with additional features.
Thus, adopting data-driven and evidence-based indices like CHF would certainly bring a paradigm shift in the crop insurance system and result in sustainable business models. The impact of such transformative insurance solutions at the grassroots level must be assessed by conducting systematic and structured farmer surveys. The outcome of such exercises is helpful to address the gaps in the new approaches.

this project could not have been initiated and completed. Sincere thanks to the former and the present Directors of NRSC and Deputy Directors (Remote Sensing Applications), NRSC, who have continuously supported this innovative project. Comments of the anonymous referees have helped to improve the manuscript, and we remain thankful for the same.

Data availability statement
Open-source data and materials that support the analysis and results of this paper will be shared. Other input and derived datasets will be shared in a restricted manner at the authors' discretion since they involve the financial and intellectual resources of the authors. Further, the research reported here has potential implications for developing new crop insurance models, for which the interests of the original contributors need to be protected.

Disclosure statement
No potential conflict of interest was reported by the authors.