Estimation of rainfall-induced surface runoff for the Assam region, India, using the GIS-based NRCS-CN method

ABSTRACT The NRCS-CN method, integrated with GIS and remote sensing, can be used to estimate curve numbers (CN) and surface runoff in geohydrological systems. The study area is divided into 63 sub-basins, and the land use land cover (LULC)-hydrologic soil group (HSG) complex is identified for each sub-basin. CN values for three antecedent moisture conditions (AMC) are calculated and corrected for surface slope variations. Surface runoff depth is determined using 16 years of rainfall data (2005–2020). The average runoff depth and mean annual precipitation range from 444.50 to 1960.55 mm and from 936.99 to 3520.55 mm, respectively. For all sub-basins, strong correlations are observed between runoff depth and rainfall (R² ≥ 0.8) and between simulated and measured runoff (R² ≥ 0.8). The Nash–Sutcliffe model efficiency coefficient (NSE) values indicate that the model's efficiency is satisfactory to good.


Introduction
With increasing urban sprawl and changing land-use patterns, the available water resources on Earth are under excessive pressure (Bera et al., 2021; Satya et al., 2020). The threat of urban flooding is also rising because of increased impervious land, a higher ratio of rainfall to infiltration rate, and a lower groundwater recharge rate (Devi et al., 2019; Mukherjee et al., 2018; Pathak et al., 2020). Hydrology plays a significant role in effective water resource management by quantifying and assessing the relationships among precipitation, catchment area, length of the dry period, storage capacity, runoff, evaporation, and transpiration (Psomiadis et al., 2020; Tian et al., 2020). Floods and other natural hazards affect the socio-economic conditions of a region. Accurate estimation of rainfall-induced surface runoff is essential for water resource management and flood control measures (Abdessamed & Abderrazak, 2019).
Runoff is the portion of precipitation that flows to streams or nearby waterways after losses to the soil and vegetation (Kumari et al., 2019). It mainly occurs when the precipitation rate exceeds the infiltration rate and depends on rainfall intensity, slope, soil texture, catchment area, LULC, etc. (Garg et al., 2013; Kumar, 2021; Rao, 2020; Saran et al., 2021). The Natural Resources Conservation Service-Curve Number (NRCS-CN) method, developed by the NRCS of the United States Department of Agriculture (USDA) in 1976, is frequently used to estimate surface runoff (Akbari et al., 2021; Farran et al., 2021).
The method is based on a simple empirical formula and estimates rainfall-runoff relationships from curve number maps, so detailed information on the spatial distribution of LULC and soil profiles is required (Ajmal et al., 2016; Deshmukh et al., 2013; Ebrahimian et al., 2012; Zhang, 2019). Remote sensing (RS) and geographical information systems (GIS) are efficient and powerful tools for hydrogeomorphological modeling (Patil et al., 2008; Verma et al., 2017). GIS enables the use of digital elevation models for hydrological modeling based on the topography of a region. RS helps to extract the topography, LULC, soil profile, drainage characteristics, etc., of a region, which can be stored as a geo-referenced database in GIS. The extracted layers are then combined with meteorological parameters in the GIS environment, where they are further analyzed, interpreted, and visualized to develop rainfall-runoff models (Ebrahimian et al., 2009; Mohammad & Adamowski, 2015; Rajbanshi, 2016). Calculating runoff from a watershed with traditional methods, especially in inaccessible terrain, is time-consuming and complex because of the spatiotemporal variability of hydrological parameters, but geospatial datasets have made runoff estimation reliable and flexible, and the approach is widely accepted by researchers, hydrologists, and planners (Ajmal et al., 2016; Bal et al., 2021; Lian et al., 2020).
Many researchers have estimated surface runoff using the GIS-based NRCS-CN method (Al-Ghobari et al., 2020; Kumar et al., 2016; Rao, 2020; Tirkey et al., 2014; Verma et al., 2017). Satheeshkumar et al. (2017) analyzed the efficiency of GIS-based runoff estimation for the Pappiredipatti watershed using the NRCS-CN model for different AMCs. Rawat and Singh (2017) observed that rainfall and runoff, estimated by the NRCS-CN method using Landsat-7 ETM+ and NOAA data for the Jhagrabaria watershed in Allahabad district, are strongly correlated. Pathan and Joshi (2019) estimated runoff for the Karjan reservoir basin in Narmada district using the GIS-based NRCS-CN method and found a strong correlation between measured and simulated runoff. Kumar (2021) estimated runoff for the Sind river basin by integrating the NRCS-CN method with GIS and concluded that the technique is efficient for large datasets and broader environmental sites.
The present study aims to develop a GIS-based rainfall-induced runoff model for the Assam region, India, using the NRCS-CN method. The Assam region is mainly covered by the floodplains of the Brahmaputra river and is highly flood-prone during the monsoon. The model is intended to provide a decision-support system for implementing effective and sustainable water resource management and flood mitigation measures in flood risk zones.

Study area
The Assam region is situated in north-east India (24°8′N to 28°2′N latitude and 89°42′E to 96°E longitude), covering approximately 78,438 km², with elevations of 45-1964 m above mean sea level. It is divided into five administrative divisions, i.e. Barak valley, Lower, Northern, Upper, and Central Assam (Figure 1). The region is prone to multiple natural hazards (Dixit et al., 2016; Raghukanth et al., 2011). The climate is tropical monsoon rainforest with heavy rainfall and high humidity (Garaga et al., 2020; Gupta & Dixit, 2022). Summer temperatures range from 32°C to 38°C and winter temperatures from 8°C to 20°C. The region receives heavy rain from the southwest monsoon, mainly from May to September, and annual rainfall ranges between 1500 and 3750 mm (Sharma et al., 2020). The population depends primarily on agriculture for income, and the population growth rate is about 16.93% (Census, 2011).
Because of the heavy monsoon and the Brahmaputra river, the Assam region experiences floods every year, leaving lakhs of people homeless and damaging agricultural land and crops (Kumar, 2021; Pal & Singh, 2018).

Dataset used
In the present study, elevation, LULC, rainfall, and other datasets are obtained from different sources; details are provided in Supplementary Table S1.

Natural Resources Conservation Service-Curve Number (NRCS-CN) method
The NRCS-CN method estimates direct runoff using a simple empirical equation based on rainfall, soil, LULC, and the curve number (CN) (Figure 2) (Mishra et al., 2018; NRCS, 2004; USDA, 1985). The CN is determined by the HSG-LULC complex and the antecedent moisture condition (AMC) (Ebrahimian et al., 2009; Mohammad & Adamowski, 2015). The method assumes that the ratio of actual retention ($F$) to potential maximum retention ($S$) equals the ratio of direct runoff ($Q$) to rainfall ($P$) minus initial abstraction ($I_a$), as given by equation (1):
$$\frac{F}{S} = \frac{Q}{P - I_a} \qquad (1)$$
Here, $F$, $S$, $P$, $Q$, and $I_a$ are in mm. The actual retention $F$ is given by equation (2):
$$F = (P - I_a) - Q \qquad (2)$$
Combining equations (1) and (2), $Q$ can be expressed as shown in equation (3):
$$Q = \frac{(P - I_a)^2}{(P - I_a) + S} \qquad (3)$$
The relation between initial abstraction ($I_a$) and potential maximum retention ($S$) is given by equation (4), where $\lambda$ is the initial abstraction coefficient, which can vary from 0 to infinity; a value of 0.2 is recommended for general use.
$$I_a = \lambda S \qquad (4)$$
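The value of $Q$ can be expressed for two ranges of $P$, as given by equations (5) and (6):
$$Q = \frac{(P - \lambda S)^2}{P + (1 - \lambda) S} \quad \text{for } P > \lambda S \qquad (5)$$
$$Q = 0 \quad \text{for } P \le \lambda S \qquad (6)$$
Since $I_a$ is a function of $S$, replacing $I_a$ with $0.2S$ yields equation (7):
$$Q = \frac{(P - 0.2S)^2}{P + 0.8S} \qquad (7)$$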
The relation between the potential maximum retention $S$ and the curve number $CN$ can be represented as (Pathak et al., 2020; Tirkey et al., 2014):
$$S = \frac{25400}{CN} - 254 \qquad (8)$$
where $CN$ is a dimensionless parameter ranging between 0 and 100, $S$ can vary between 0 and ∞, and the constants 25400 and 254 express $S$ in mm. Equation (8) maps $S$ onto the dimensionless CN scale, which behaves more linearly than $S$ under operations such as averaging, weighting, and interpolation (NRCS, 2004). Three levels of AMC are used: AMC-I, AMC-II, and AMC-III for dry, normal, and wet conditions, respectively (NRCS, 2004).
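Equations (4)-(8) combine into a short routine. The sketch below is a minimal illustration, not the authors' implementation: it assumes the equations are applied to one rainfall depth at a time (for example, a single event or daily total in mm), and the function names `potential_retention_mm` and `runoff_depth_mm` are hypothetical.

```python
def potential_retention_mm(cn: float) -> float:
    """Potential maximum retention S in mm from a curve number, equation (8)."""
    if not 0.0 < cn <= 100.0:
        raise ValueError("CN must lie in (0, 100]")
    return 25400.0 / cn - 254.0


def runoff_depth_mm(p_mm: float, cn: float, lam: float = 0.2) -> float:
    """Direct runoff Q in mm for one rainfall depth P in mm.

    lam is the initial abstraction coefficient (lambda in equation (4));
    the default 0.2 reproduces equation (7).
    """
    s = potential_retention_mm(cn)
    ia = lam * s                      # initial abstraction, equation (4)
    if p_mm <= ia:                    # equation (6): no direct runoff yet
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm + (1.0 - lam) * s)  # equation (5)
```

For example, with CN = 80 the retention is S = 63.5 mm and I_a = 12.7 mm, so 100 mm of rain gives Q ≈ 50.5 mm, while 10 mm of rain on the same surface produces no direct runoff.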
The CNII is further converted to CNI and CNIII for AMC-I and AMC-III, respectively, by equations (9) and (10).
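Equations (9) and (10) are not reproduced in this text. As an assumption, the sketch below uses the AMC conversions commonly quoted from Chow, Maidment and Mays (Applied Hydrology, 1988); the form used by the authors may differ, and `cn_amc1` and `cn_amc3` are hypothetical names.

```python
def cn_amc1(cn2: float) -> float:
    """Dry-condition (AMC-I) curve number from CNII (assumed Chow et al., 1988 form)."""
    return 4.2 * cn2 / (10.0 - 0.058 * cn2)


def cn_amc3(cn2: float) -> float:
    """Wet-condition (AMC-III) curve number from CNII (assumed Chow et al., 1988 form)."""
    return 23.0 * cn2 / (10.0 + 0.13 * cn2)
```

With these formulas, CNII values of 69 and 86 map to approximately 48.3 and 72.1 for AMC-I and to approximately 83.7 and 93.4 for AMC-III; both conversions leave CN = 100 unchanged.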
Each sub-basin contains several LULC-HSG complexes, so a CN value is determined for each complex, and the composite (area-weighted) curve number ($CN_w$) is then estimated using equation (11):
$$CN_w = \frac{\sum_{i} CN_i \, A_i}{A} \qquad (11)$$
where $CN_i$ is the curve number of sub-region $i$, $A_i$ is the area associated with that curve number, and $A$ is the total area over which $CN_w$ is computed. Slope is a significant factor in determining runoff and CN, so it is important to use slope-corrected CN values (Garg et al., 2013). In the present study, the CN values are adjusted for slope using equation (12), given by Huang et al. (2006), where CNIIα denotes the slope-corrected CNII for normal conditions and α is the average slope of the basin.
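The area weighting of equation (11) and the slope adjustment can be sketched as follows. Equation (12) is not reproduced in the text; `slope_corrected_cn2` assumes the form usually cited from Huang et al. (2006), CNIIα = CNII (322.79 + 15.63α)/(α + 323.52) with α in m/m, which should be checked against the original paper before use. Both function names are hypothetical.

```python
def composite_cn(cn_values, areas):
    """Area-weighted composite curve number CN_w, equation (11)."""
    if len(cn_values) != len(areas) or not areas:
        raise ValueError("need exactly one area per curve number")
    total_area = sum(areas)  # A in equation (11)
    return sum(cn * a for cn, a in zip(cn_values, areas)) / total_area


def slope_corrected_cn2(cn2, slope):
    """Slope-adjusted CNII; slope is the mean basin slope in m/m.

    Assumed form of equation (12), as commonly cited from Huang et al. (2006).
    """
    return cn2 * (322.79 + 15.63 * slope) / (slope + 323.52)
```

For instance, `composite_cn([70.0, 90.0], [1.0, 3.0])` returns 85.0. Under the assumed form, the adjustment factor is almost exactly 1 near a 5% slope (α = 0.05) and grows with α, so steeper basins receive higher CNII values.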

Delineation of sub-basins and extraction of streams using DEM data
The elevation profile of the study area is generated from the 1 arc-second (30 m resolution) SRTM Digital Elevation Model (DEM), from which the stream network and 63 sub-basin layers are extracted using GIS hydrology tools (Figure 3a) (Taesiri et al., 2020). The 30 m SRTM DEM has relatively high vertical accuracy and is well suited to extracting slope and drainage networks (Rawat et al., 2019).
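Stream and sub-basin extraction in GIS hydrology toolsets usually starts from a flow-direction grid. As an illustration only (not the authors' ArcGIS workflow), the sketch below implements the common D8 rule: each cell drains to the neighbour with the steepest downhill gradient, encoded with the 1-128 direction codes used by ArcGIS. The function name is hypothetical; pit filling, flow accumulation, stream thresholds, and ArcGIS's own tie-breaking are omitted, and here ties go to the first neighbour checked.

```python
import math

# ArcGIS-style D8 codes keyed by (row offset, column offset).
D8_CODES = {
    (0, 1): 1,     # east
    (1, 1): 2,     # south-east
    (1, 0): 4,     # south
    (1, -1): 8,    # south-west
    (0, -1): 16,   # west
    (-1, -1): 32,  # north-west
    (-1, 0): 64,   # north
    (-1, 1): 128,  # north-east
}


def d8_flow_direction(dem, cell_size=30.0):
    """Return a grid of D8 codes; 0 marks cells with no lower neighbour."""
    rows, cols = len(dem), len(dem[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_drop, best_code = 0.0, 0
            for (dr, dc), code in D8_CODES.items():
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols):
                    continue  # neighbour lies outside the grid
                # Diagonal neighbours are sqrt(2) cell widths away.
                dist = cell_size * (math.sqrt(2.0) if dr and dc else 1.0)
                drop = (dem[r][c] - dem[rr][cc]) / dist
                if drop > best_drop:
                    best_drop, best_code = drop, code
            out[r][c] = best_code
    return out
```

On a grid whose elevations fall from west to east, interior cells receive code 1 (east), and cells on the lowest column receive 0 because no neighbour is lower.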

Accuracy assessment of LULC
The LULC map for the study area is derived from 10 m resolution Sentinel-2 imagery and classified into eight classes (Karra et al., 2021). Its accuracy is assessed against Google Earth satellite images by computing the overall accuracy (OA) and the Kappa (K) statistic from a detailed error matrix for each classified image (Hishe et al., 2020; Vignesh et al., 2021), using equations (13) and (14), respectively. OA is calculated by dividing the total number of correctly classified pixels by the total number of pixels considered, so each class contributes in proportion to its share of the sample (Verma et al., 2017).
$$OA = \frac{\sum_{i=1}^{n} M_{ii}}{\sum_{i=1}^{n}\sum_{j=1}^{n} M_{ij}} \times 100 \qquad (13)$$
where $n$ is the number of classes, $M_{ii}$ is the number of sample points in row $i$ and column $i$ of the matrix, and $M_{ij}$ is the element in row $i$ and column $j$. The Kappa coefficient typically lies between 0 and 1, and a higher value denotes higher accuracy (Rwanga & Ndambuki, 2017). It is widely used to assess the accuracy of remote sensing classifications because it accounts for agreement that could occur by chance (Talukdar et al., 2020).
$$K = \frac{N\sum_{i=1}^{n} M_{ii} - \sum_{i=1}^{n} M_{i+} M_{+i}}{N^2 - \sum_{i=1}^{n} M_{i+} M_{+i}} \qquad (14)$$
where $N$ is the total number of sample points, $M_{ii}$ is the number of sample points on the diagonal of the matrix, and $M_{i+}$ and $M_{+i}$ are the row and column totals for class $i$, respectively.
For each LULC class, the user's accuracy (UA) and producer's accuracy (PA) are also calculated using equations (15) and (16):
$$UA_i = \frac{N_{ii}}{N_{i,\text{row}}} \times 100 \qquad (15)$$
$$PA_i = \frac{N_{ii}}{N_{i,\text{column}}} \times 100 \qquad (16)$$
where $N_{ii}$ is the number of samples of class $i$ correctly matched in the $i$th row and column, $N_{i,\text{row}}$ is the total number of samples in the $i$th row, and $N_{i,\text{column}}$ is the total number of samples in the $i$th column.
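Equations (13)-(16) can all be computed from the error matrix. This minimal sketch assumes rows hold the classified (map) labels and columns the reference (Google Earth) labels, and that every class has at least one sample in its row and column; `accuracy_metrics` is a hypothetical name.

```python
def accuracy_metrics(matrix):
    """Return OA (%), Kappa, per-class UA (%) and PA (%) for a square error matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)               # N
    diag = sum(matrix[i][i] for i in range(n))            # sum of M_ii
    row_tot = [sum(matrix[i]) for i in range(n)]          # M_i+
    col_tot = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # M_+i
    chance = sum(row_tot[i] * col_tot[i] for i in range(n))
    oa = 100.0 * diag / total                              # equation (13)
    kappa = (total * diag - chance) / (total ** 2 - chance)  # equation (14)
    ua = [100.0 * matrix[i][i] / row_tot[i] for i in range(n)]  # equation (15)
    pa = [100.0 * matrix[i][i] / col_tot[i] for i in range(n)]  # equation (16)
    return oa, kappa, ua, pa
```

For the two-class matrix `[[45, 5], [10, 40]]`, this gives OA = 85%, K = 0.70, UA of 90% and 80%, and PA of about 81.8% and 88.9%.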

Determination of correlation between rainfall and runoff depth
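To estimate the correlation between rainfall ($P$) and runoff depth ($R$), linear regression is performed for each of the 63 sub-basins using equation (17) (Subramanya, 2006):
$$R = mP + n \qquad (17)$$
The correlation coefficient ($r$) of each sub-basin is obtained from equation (18), and its square is the coefficient of determination:
$$r = \frac{N\sum PR - \sum P \sum R}{\sqrt{\left[N\sum P^2 - \left(\sum P\right)^2\right]\left[N\sum R^2 - \left(\sum R\right)^2\right]}} \qquad (18)$$
where $N$ is the number of observations, $n$ is the intercept, and $m$ is the slope of the regression line.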
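Equations (17) and (18) reduce to a few sums. The sketch below is a plain least-squares fit in the notation above (m slope, n intercept, N observations), applied to paired rainfall and runoff depths in mm for one sub-basin; the function name is hypothetical.

```python
import math


def rainfall_runoff_regression(rain, runoff):
    """Fit R = m*P + n (equation 17) and return m, n, r (equation 18) and r**2."""
    if len(rain) != len(runoff) or len(rain) < 2:
        raise ValueError("need at least two paired observations")
    big_n = len(rain)                       # N, number of observations
    sp, sr = sum(rain), sum(runoff)
    spp = sum(p * p for p in rain)
    srr = sum(q * q for q in runoff)
    spr = sum(p * q for p, q in zip(rain, runoff))
    slope = (big_n * spr - sp * sr) / (big_n * spp - sp ** 2)        # m
    intercept = (sr - slope * sp) / big_n                            # n
    r = (big_n * spr - sp * sr) / math.sqrt(
        (big_n * spp - sp ** 2) * (big_n * srr - sr ** 2))
    return slope, intercept, r, r ** 2
```

For perfectly linear data such as P = [1, 2, 3, 4] and R = [3, 5, 7, 9], the fit returns m = 2, n = 1, and r = r² = 1.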

Determination of correlation between simulated and measured runoff depth
Runoff is generally measured in the field following a rainfall event. In the present study, since conventional hydrological data were not available for all the sub-basins, the measured runoff of each sub-basin is calculated using the regression equation (19), with N observations of runoff determined by the NRCS-CN method (R) and rainfall (P) as input parameters (Chanu et al., 2015; Kumar et al., 2016). The constants x and y are calculated using equations (20) and (21).

Performance evaluation of the method
The performance of the method is evaluated to determine the goodness of fit between the simulated and measured values (Ritter & Munoz-Carpena, 2013). To evaluate its accuracy, the Nash-Sutcliffe model efficiency coefficient (NSE) is determined using equation (22) (Chanu et al., 2015; Deshmukh et al., 2013; Nash & Sutcliffe, 1970; Zhang, 2019):
$$NSE = 1 - \frac{\sum_{i=1}^{N}\left(Q_{msd,i} - Q_{sim,i}\right)^2}{\sum_{i=1}^{N}\left(Q_{msd,i} - Q_{mean}\right)^2} \qquad (22)$$
where $Q_{msd}$, $Q_{sim}$, and $Q_{mean}$ denote the measured, simulated, and mean measured runoff depth (mm), respectively. An NSE of 1 (100%) indicates perfect agreement between measured and simulated values. An NSE of zero indicates that the simulated values predict no better than the mean of the measured values, and negative values indicate that the mean is the better predictor.
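Equation (22) can be evaluated for one sub-basin with the minimal sketch below; `nash_sutcliffe` is a hypothetical name, and the inputs are assumed to be paired measured and simulated runoff depths in mm.

```python
def nash_sutcliffe(measured, simulated):
    """Nash-Sutcliffe efficiency, equation (22)."""
    if len(measured) != len(simulated) or len(measured) < 2:
        raise ValueError("need at least two paired values")
    q_mean = sum(measured) / len(measured)
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))  # model error
    var = sum((m - q_mean) ** 2 for m in measured)                 # spread about the mean
    if var == 0.0:
        raise ValueError("NSE is undefined when all measured values are equal")
    return 1.0 - sse / var
```

Identical series give 1.0, and a simulation that always returns the measured mean gives 0.0, matching the interpretation of NSE given above.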

LULC map and its accuracy assessment
The LULC map is classified into eight classes, and its accuracy is validated using the overall accuracy and the Kappa coefficient. A total of 884 sampling points are taken and validated against Google Earth images. The overall accuracy and Kappa coefficient are 89.81% and 0.87, respectively. The user's and producer's accuracies for each LULC class are greater than 80% (Table 1) (Zhou et al., 2018).
According to the LULC map, most of the study area is covered by forest and agricultural land (Figure 3b). Forest areas predominate in the hilly region of Central Assam, whereas agricultural land dominates the floodplains of the Brahmaputra river and the Barak river basin, extending from Lower to Upper Assam. A high built-up area density is observed in Lower, Northern, and Upper Assam. Areas with low vegetation cover, such as built-up areas, barren land, and scrubland, have higher runoff volumes and show a good correlation between rainfall and runoff. Similar results were reported by Bal et al. (2021), Koneti et al. (2018), and Mukherjee et al. (2018), who observed a strong correlation between vegetation cover and runoff depth.

Hydrologic soil group (HSG)
The HSG map is classified into four hydrologic soil groups, i.e. A, B, C, and D (Figure 3d). According to NRCS US (2009), the four groups differ in texture, water transmission rate, and infiltration capacity: (i) Group A (sand, loamy sand, or sandy loam) has low runoff potential, high infiltration rates, and a high water transmission rate; (ii) Group B (silt loam or loam) has a moderate infiltration rate; (iii) Group C (sandy clay loam), with moderately fine to fine texture, has a low infiltration rate; and (iv) Group D (clay loam, silty clay loam, sandy clay, silty clay, or clay) consists of clay soils with the highest runoff potential and very low infiltration rates.
The Brahmaputra basin is predominantly covered by HSG C, and the Barak river basin is dominated by HSG D. HSG B (loam) is present in some parts of Lower Assam, Central Assam, and Barak valley. HSG A, which has the highest infiltration rate, occurs mainly in the upper part of Lower Assam (NRCS, 2009). Because HSG C and HSG D dominate, the study region has slow transmission and infiltration rates and can therefore generate a large amount of surface runoff during a rainfall event (Al-Ghobari et al., 2020). Sub-basins dominated by HSG C and HSG D are therefore expected to have higher runoff depths.

Estimation of curve number (CN)
In the present study, the CN values of each sub-basin are determined by intersecting the LULC and HSG layers. The CNI, CNII, and CNIII values are determined for the corresponding antecedent moisture conditions, i.e. AMC-I, AMC-II, and AMC-III (Table 2). The CNI values for […] (Al-Ghobari et al., 2020; Kumar et al., 2016). Shi and Wang (2020) and Huang et al. (2006) highlighted the significance of slope-corrected curve numbers, since surface runoff generation increases with steeper slopes. Therefore, the CNI, CNII, and CNIII values are further corrected for the surface slope variation of the study area (Garg et al., 2013; Huang et al., 2006; Shi & Wang, 2020). The slope-corrected CNI, CNII, and CNIII values range from 48 to 72, 69 to 86, and 83 to 93, respectively, differing only slightly from the uncorrected curve numbers (Figure 4d-f). Because only a small part of the study area falls in the steep slope class, the influence of LULC and HSG outweighs that of slope, so slope correction changes the spatial distribution of curve numbers only slightly. In Central Assam and Barak valley, the slope-corrected curve numbers are lower than in the floodplains because of high vegetation density.

Rainfall data
The spatial distribution of mean annual precipitation (MAP) is estimated using the inverse distance weighting (IDW) interpolation method (Caloiero et al., 2020; Al-Ghobari et al., 2020). MAP ranges from 1552 to 3520 mm in Lower and Upper Assam, 1956 to 3520 mm in Barak valley, 1552 to 1955 mm in Northern Assam, and 936 to 1551 mm in Central Assam (Figure 5a). The daily rainfall data are further analyzed to determine the AMC, which is used to estimate the corresponding CN values, the initial abstraction for each year, and the corresponding maximum retention (Farran & Elfeki, 2020; Karunanidhi et al., 2020; Sanz-Ramos et al., 2020).
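The IDW interpolation and the AMC step can be sketched as below. Assumptions not stated in the text: an IDW power of 2, gauge coordinates in a projected system in metres (for example, UTM zone 46N), and the 5-day antecedent rainfall thresholds commonly tabulated in NRCS handbooks (35.6 and 53.3 mm in the growing season; 12.7 and 27.9 mm in the dormant season). Both function names are hypothetical.

```python
import math


def idw(x, y, stations, power=2.0):
    """IDW estimate at (x, y) from stations given as (xs, ys, value) tuples."""
    num = den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value          # point coincides with a gauge
        w = 1.0 / d ** power      # closer gauges get larger weights
        num += w * value
        den += w
    return num / den


def amc_class(rain_5day_mm, growing_season=True):
    """AMC class from 5-day antecedent rainfall (assumed NRCS thresholds)."""
    low, high = (35.6, 53.3) if growing_season else (12.7, 27.9)
    if rain_5day_mm < low:
        return "I"                # dry
    if rain_5day_mm > high:
        return "III"              # wet
    return "II"                   # normal
```

A point midway between two gauges recording 1000 mm and 3000 mm receives their simple average, 2000 mm, because both weights are equal.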

Estimation of surface runoff
The runoff depth ranges from 444 mm to 1724 mm (Figure 5b). In Lower Assam, most sites show moderate to very high runoff depth (956-1724 mm) due to extensive built-up area, high CN, high precipitation, and sparse vegetation cover. In Northern and Upper Assam, the runoff depth falls in the very low to high range (444-1394 mm). In Central Assam, dense vegetation cover, low precipitation, and lower CN values keep the estimated runoff depth in the very low to low classes (444-955 mm). Areas with high rainfall, sparse vegetation cover, and impervious soil profiles experience high surface runoff because of reduced infiltration and abstraction (Deshmukh et al., 2013). Steep slopes generate more runoff than flat plains because the infiltration rate, initial abstraction, and recession period decrease as the slope angle increases (Ajmal et al., 2016; Ebrahimian et al., 2012).

Estimation of correlation between rainfall and simulated runoff
The present study analyzes the relationship between rainfall and simulated runoff by estimating the correlation coefficient from a linear regression for each sub-basin (Subramanya, 2006; Tirkey et al., 2014). The R² value for every sub-basin is greater than 0.80, indicating a strong correlation between rainfall and runoff depth (Supplementary Table S2). The regression results for selected sub-basins are illustrated in Figure 6a-f.

Estimation of correlation between simulated and measured runoff depth
The correlation between measured and simulated runoff is estimated for each sub-basin, and R² is greater than 0.80 in every case, showing that the measured runoff of each sub-basin is strongly correlated with the simulated runoff (Supplementary Table S2). The results for sub-basins 6, 11, 26, 38, 40, and 50 are shown in Figure 7a-f.
The temporal variation of rainfall and of simulated and measured runoff from 2005 to 2020 shows that simulated and measured runoff depths increase with increasing precipitation (Figure 8).

Validation of the model
The NSE assessment reveals that the NRCS-CN method is efficient and reliable in predicting runoff depth for the Assam region. The range of the NSE coefficient estimated for the study area lies between 0.623-0.786 (Supplementary Table S2). According to Moriasi et al. (2007), the model's performance falls under the satisfactory class for NSE coefficient values ranging between 0.50-0.70, and in the good class from 0.70-0.80. Out of 63 sub-basins, the NSE value of 45 sub-basins falls under the good category.

Discussion
For effective flood risk management in flood-prone regions like Assam, it is essential to analyze the relationship between precipitation and runoff within a watershed. The LULC and soil condition of the land affect its infiltration capacity, and surface runoff occurs when the precipitation rate exceeds the infiltration rate (Bal et al., 2021; Bera et al., 2021; Deshmukh et al., 2013; Satya et al., 2020). The results indicate that the NRCS-CN model is effective and reliable for the Assam region, and it can also be applied to other areas with a high potential for waterlogging during heavy rainfall. Areas dominated by soil groups with low infiltration rates, such as HSG C and HSG D, and settlement areas with impervious land experience higher runoff (Akbari et al., 2021; Kumari et al., 2019). For some parts of the study area, surface slope variation also affects the estimated curve numbers (Ajmal et al., 2016; Ebrahimian et al., 2012; Garg et al., 2013). The correlation coefficients estimated for rainfall vs. simulated runoff and for simulated vs. measured runoff show a strong positive correlation in each sub-basin, and the results are comparable with the studies conducted by Al-Ghobari et al. (2020), Zhang (2019), Ajmal et al. (2016), Chanu et al. (2015), and Tirkey et al. (2014). This strong positive correlation supports the suitability of the NRCS-CN model for estimating surface runoff. The model's efficiency is further examined through an NSE assessment for each sub-basin, and the majority of the NSE values fall in the good category (Oliveira et al., 2016). Geospatial techniques can also help establish the relationship between rainfall and runoff more precisely by providing accurate information on watershed characteristics in rugged and inaccessible terrain (Verma et al., 2017; Zhang, 2019).
Figure 7. Correlation between simulated and measured runoff depth.
The study also shows that the social and ecological components of a system interact in a complex, non-linear manner. It is therefore essential to give attention to better irrigation practices, water conservation policies, the development of green urban spaces, and the improvement of drainage networks.

Conclusion
The present study conducts runoff estimation for the Assam region using the GIS-based NRCS-CN method.
The study area is divided into 63 sub-basins, and the LULC-HSG complex is generated in GIS for each sub-basin. The study area is dominated by forest and agricultural land with HSG C and HSG B soils. The mean annual rainfall is estimated for 2005-2020, and the study area receives substantial precipitation, ranging from 936 mm to 3520 mm. For each LULC-HSG complex, CNI, CNII, and CNIII are estimated for AMC-I, AMC-II, and AMC-III, respectively, and are further corrected for surface slope variation. Higher CN values are found to be associated with steeper slopes. Rainfall and simulated runoff are strongly correlated in each sub-basin, with R² greater than 0.80, and a similar result is observed for the correlation between simulated and measured runoff depth. The runoff depth for the study area ranges from 444 to 1960 mm. Moderate to very high runoff is observed in Lower Assam, very low to high runoff in Northern and Upper Assam, and very low to low runoff in Central Assam. The study also reveals that the spatial variation of runoff depends on the distribution of LULC, soil permeability, and rainfall intensity. The NSE coefficient of each sub-basin indicates that the model's performance ranges from satisfactory to good. Compared with conventional hydrological methods, the NRCS-CN method is more suitable for estimating rainfall-induced surface runoff for the Assam region.
Estimating surface runoff is demanding because it requires accurate LULC, soil, and meteorological data and correctly assigned CN values. Remote sensing and GIS help extract and simulate the required parameters and provide a better technique for estimating the hydrological characteristics of a watershed. The results help explain the rainfall-runoff behavior of the study area. They can also support the development of water resource and flood management systems at the sub-basin or watershed level by integrating remote sensing and GIS applications where data availability is limited.
ESRI ArcGIS 10.5 software is used for the spatial analysis of data, the generation of the layers used in the study, and the calculation of the estimated parameters.
The LULC map is obtained from https://livingatlas.arcgis.com/landcover/. Google Earth Pro is used for the accuracy assessment of the LULC map.
The coordinate system utilized for the analysis is UTM-WGS 1984, zone 46N.

Data Availability Statement
The authors confirm that the data supporting the findings of this study are available within the article and some of the raw data were generated at our laboratory and derived data supporting the findings of this study are available upon reasonable request.

Declaration of Competing Interests
The authors declare that they have no known competing financial interests or non-financial interests or personal relationships that are directly or indirectly related to the work submitted for publication that could have appeared to influence the work reported in this paper.

Disclosure statement
No potential conflict of interest was reported by the author(s).