Analysis of soil organic matter influencing factors in the Huangshui River Basin by using the optimal parameter-based geographical detector model

Abstract To quantitatively analyze the spatial distribution of soil organic matter (SOM) content and its main influencing factors in the Huangshui River Basin, Qinghai Province, China, a total of 13 factors including topography, climate, soil properties, and vegetation cover were selected. The influence of these factors on the spatial distribution of SOM content was analyzed using the geographical detector model and the Pearson correlation coefficient method. The results showed that the average SOM content in the cultivated soil layer of the Huangshui River Basin was 28.38 g·kg−1. Factor detection and Pearson correlation analysis indicated that the three dominant factors driving differences in the spatial distribution of SOM content were mean annual temperature, altitude, and soil pH. SOM content was negatively correlated with temperature and positively correlated with elevation. Pairwise factor interactions were of the bi-enhance or nonlinear-enhance type: the interaction between two factors improved the explanatory power for SOM, indicating that the spatial differences in SOM arise from a complex combination of factors. This study provides a basis for precision fertilization, soil improvement, and soil quality improvement in the Huangshui River Basin.


Introduction
Soil nutrients are a critical regulator of ecosystem responses to global change, and their role in regulating ecosystem structure and function is widely recognized (Sundert et al. 2020). Soil organic matter (SOM) is a key indicator for assessing soil fertility and quality, and it is a major component of the terrestrial carbon cycle (Xu XB et al. 2023). SOM is also closely linked to global warming (Ondrasek et al. 2019; Chen SC et al. 2022). Although SOM accounts for only a small fraction of soil composition, it plays a crucial role in promoting vegetation growth and enhancing soil fertility. Analyzing the factors that influence the spatial distribution of SOM content is therefore vital for sustainable agricultural development (Heil et al. 2022; Wang Q et al. 2022).
Soil nutrients are important soil attributes; among them, SOM content shapes soil structure and cation exchange capacity, thereby influencing agricultural productivity and soil security (Xu YM, Li, et al. 2022; Yan et al. 2022; Chen SC et al. 2022; Tibhirine et al. 2023). In recent years, a growing number of studies have examined the factors influencing the regional spatial distribution of SOM, and the mechanisms by which different factors affect SOM content have received considerable attention. Research has shown that SOM content differs among fields in different regions (Xia et al. 2022; Bouasria et al. 2022). Under the impact of human activities, changes in land use type also have a significant effect on the spatial distribution of SOM content (Goswami et al. 2020; Nadal-Romero et al. 2023; John et al. 2022). Inherent soil properties and agricultural management practices both affect the dynamics of SOM content (Richardson et al. 2023). Crop rotation, changes in fertilizer application, and agricultural management systems, among others, can influence the spatial distribution of SOM content (Han et al. 2022; Ning et al. 2022). Straw incorporation is a common agricultural management practice (Yan et al. 2022; Bouasria et al. 2021) that can enhance soil physicochemical properties and may therefore affect SOM content.
In addition to human activities, environmental factors also shape the spatial distribution of SOM content. Research has demonstrated that SOM content increases with elevation (Raluca et al. 2022). Taken together, altitude, slope, and planting method can explain up to 73.8% of the variation in SOM content (Huynh et al. 2022), indicating that the combined influence of human activities and natural environmental factors strongly affects the spatial distribution of SOM. Climate conditions also affect SOM content. Temperature controls the rate of organic matter decomposition through its effect on enzyme activity, thereby affecting soil fertility (Du et al. 2021). Precipitation, meanwhile, can alleviate agricultural drought, raise soil moisture content, and promote the utilization of SOM, and thus plays an important role in the variation of SOM content (Chen QY et al. 2020; Wang JF and Lu 2017).
In summary, many factors have a significant impact on the spatial distribution of SOM. The geographical detector (Wang JF et al. 2010), a powerful tool for exploring the relationships between geographic phenomena and their driving factors, has been applied in multiple fields, including the spatial heterogeneity of soil heavy metals (Liang et al. 2023; Wang GJ et al. 2022), land use change (Xu DH, Zhang, et al. 2022), land desertification (Wang YF et al. 2022), and eco-environmental quality (Long et al. 2023). Several studies using the geographical detector have found that the dominant factors contributing to the spatial heterogeneity of SOM are mean annual temperature, soil type, and land use type (Gangopadhyay and Reddy 2022; Gao et al. 2022; Wang Q et al. 2022). However, the main controlling factors of SOM spatial distribution may differ with geographic location and research scale (Zhao et al. 2023; Wiesmeier et al. 2019), and research on the factors influencing SOM spatial distribution at the watershed scale is lacking. Analyzing the spatial variation characteristics and influencing factors of SOM in the Huangshui River Basin, China, is therefore particularly important for watershed-scale agricultural management and precision fertilization.
Conducting a principal factor analysis with the geographical detector requires discretizing continuous variables. Previous studies that used the geographical detector to explore the driving factors of SOM spatial heterogeneity have often categorized continuous variables empirically, overlooking how the choice of discretization method and number of classes quantitatively affects the detected spatial heterogeneity of SOM at the watershed scale. The novelty of this study lies in using the optimal parameters-based geographical detector (OPGD) model (Song et al. 2020) to evaluate five discretization methods, namely the equal interval, natural breaks, quantile, geometrical interval, and standard deviation (SD) methods, when analyzing the spatial distribution characteristics of SOM. The study aims to comprehensively identify the main driving factors of SOM spatial heterogeneity and the types of interaction among them.

Overview of the study area
The Huangshui River is an important tributary of the upper Yellow River. Located in the northeastern part of Qinghai Province, China, the Huangshui River Basin lies in the transitional zone between the Qinghai-Tibet Plateau and the Loess Plateau. The basin includes 4 districts and 3 counties of Xining City, 4 counties of Haidong City, and part of Haiyan County in Haibei Prefecture, and it accounts for a significant share of Qinghai Province's political, economic, and population weight. Known as the "Mother River" of Qinghai Province, it is the most economically developed region in the province. The basin has a continental climate, with high elevation (around 3000 m on average) and marked topographic variation. Conditions are arid, with low rainfall throughout the year, an average annual temperature of around 8 °C, and an average annual precipitation of approximately 400 mm (Wang PQ et al. 2022). Because of its relatively high elevation and limited precipitation, agriculture in the basin is dominated by dryland farming of crops such as corn, spring wheat, and rapeseed. An overview of the study area is shown in Figure 1.

Soil sample collection
Considering ease of sampling and transport convenience in the study area, a total of 110 topsoil samples (0-20 cm) were collected in April 2021. The latitude and longitude of each sampling point were recorded with a Global Positioning System (GPS) receiver. The samples were air-dried naturally, cleared of impurities, and sieved. SOM content and pH of the samples were then determined in the laboratory. During data preprocessing, 3 samples were excluded as outliers, leaving a total of 107 soil samples for the study. The distribution of the sampling points is shown in Figure 1.

Other data sources
The Digital Elevation Model (DEM) used in this study, with a resolution of 30 m, was obtained from the Geospatial Data Cloud platform of the Chinese Academy of Sciences (https://www.gscloud.cn/). The land use data came from the 30-m global land cover dataset GlobeLand30 (Jun et al. 2014; http://www.globeland30.org/). The soil type data were obtained from the Natural Environment and Data Center of the Chinese Academy of Sciences (https://www.resdc.cn/). The main cultivated soil types in the study area include castanozems, chernozems, dark felty soils, and sierozems. Soil types are classified according to the "Chinese Soil Classification System," which provides the names and codes of soil classes in China. The climate data, with a spatial resolution of 1 km, were sourced from the National Earth System Science Data Center, National Science and Technology Infrastructure of China (http://www.geodata.cn) (Peng et al. 2019). The NDVI data, with a spatial resolution of 30 m, were obtained from the National Ecosystem Science Data Center, National Science and Technology Infrastructure of China (http://www.nesdc.org.cn) (Yang et al. 2019).

Data processing
Five factors were derived from the DEM of the Huangshui River Basin: slope, aspect, curvature, relief degree of land surface (RDLS), and topographic wetness index (TWI). The land use data were reclassified into six categories: cropland, forestland, grassland, water area, construction land, and unused land. The climate data, originally at 1 km resolution, were resampled to 30 m to match the spatial resolution of the other factors, and the temperature and precipitation values were then extracted at each sampling point. So that soil pH could be classified like the other continuous variables, the pH values measured at the sampling points were interpolated into a spatially continuous surface using inverse distance weighting (IDW). The spatial distribution of the factors used in this study is shown in Figure 2.
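The IDW step can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the paper does not report its IDW settings, so the power parameter of 2, the use of all sample points (no search radius or neighbour limit), planar (projected) coordinates, and the function name `idw` are all assumptions.

```python
import numpy as np

def idw(sample_xy, sample_values, target_xy, power=2.0):
    """Inverse distance weighting (hypothetical helper).

    Each target value is a weighted mean of all samples with weights 1 / d**power.
    power=2 and using every sample point are assumptions, not the paper's settings.
    Coordinates are assumed to be planar (e.g. projected metres).
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    target_xy = np.atleast_2d(np.asarray(target_xy, dtype=float))

    # Pairwise Euclidean distances, shape (n_targets, n_samples)
    d = np.linalg.norm(target_xy[:, None, :] - sample_xy[None, :, :], axis=2)

    out = np.empty(len(target_xy))
    for i, row in enumerate(d):
        exact = row == 0.0
        if exact.any():
            # A target that coincides with a sample point takes that sample's value
            out[i] = sample_values[exact][0]
        else:
            w = 1.0 / row**power
            out[i] = np.sum(w * sample_values) / np.sum(w)
    return out

# Hypothetical pH values at three sample points, interpolated to one grid cell
pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
ph = [7.8, 8.2, 8.0]
print(idw(pts, ph, [(5.0, 5.0)]))
```

Because IDW is a weighted average, interpolated values always stay within the range of the measured values; in practice the power and neighbourhood would be tuned, for example by leave-one-out cross-validation.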

Analytical method
In this study, terrain, climate, soil attribute, vegetation coverage, and human activity factors were first obtained for the Huangshui River Basin. The values of these factors at each sampling point were then extracted, as continuous variables without discretization, using the latitude and longitude of the measured soil samples. Next, five methods, namely the equal interval, natural breaks, quantile, geometrical interval, and standard deviation methods, were used to discretize the continuous variables. Finally, the Geodetector and Pearson correlation analysis were used to analyze the factors influencing the spatial differences in SOM in the Huangshui River Basin, Qinghai Province, China. The technical flowchart is illustrated in Figure 3.

Geographic detector model
The geographical detector is a statistical method for studying the spatial heterogeneity of a phenomenon in geographical space and revealing its underlying driving forces. It assumes that if an independent variable (x) has an effect on a dependent variable (y), the two will show similar spatial distributions. The geographical detector consists of four detectors: the risk detector, factor detector, ecological detector, and interaction detector.
(1) The factor detector detects the spatial heterogeneity of SOM and the extent to which a factor explains that heterogeneity (Wang JF and Xu 2017; Zhu et al. 2020). The explanatory power of an influencing factor for the spatial differentiation of SOM is measured by the Geodetector q-statistic (Wang JF et al. 2016), which ranges from 0 to 1. A larger q indicates that the independent variable explains more of the spatial variation in the dependent variable, and a smaller q indicates that it explains less; the q-statistic is therefore used to identify the dominant factors of spatial variation in SOM content in the study area. It is calculated as follows:
$$q = 1 - \frac{\sum_{h=1}^{L} N_h \sigma_h^2}{N \sigma^2} = 1 - \frac{SSW}{SST}, \qquad SSW = \sum_{h=1}^{L} N_h \sigma_h^2, \qquad SST = N \sigma^2$$
where $h = 1, 2, \ldots, L$ indexes the strata (categories) of y or x; $N_h$ is the number of sample points in stratum h, and $N$ is the number of sample points in the whole study area; $\sigma_h^2$ is the variance of the dependent variable in stratum h, and $\sigma^2$ is the variance of the dependent variable in the whole study area; SSW is the within-stratum sum of squares, and SST is the total sum of squares of the whole study area.
(2) The interaction detector identifies how well the dependent variable is explained when two influencing factors act together, i.e., whether the interaction between factors enhances or weakens their explanatory power for the dependent variable. The types of interaction are listed in Table 1.
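(3) The ecological detector tests whether two factors differ significantly in their effect on the spatial distribution of SOM content. It is measured by the F-statistic:
$$F = \frac{N_{x_1}(N_{x_2}-1)\,SSW_{x_1}}{N_{x_2}(N_{x_1}-1)\,SSW_{x_2}}, \qquad SSW_{x_1} = \sum_{h=1}^{L_1} N_h \sigma_h^2, \qquad SSW_{x_2} = \sum_{h=1}^{L_2} N_h \sigma_h^2$$
where $N_{x_1}$ and $N_{x_2}$ are the sample numbers of the independent variables $x_1$ and $x_2$, respectively; $L_1$ and $L_2$ are the numbers of strata of the two independent variables, respectively; and $SSW_{x_1}$ and $SSW_{x_2}$ are the within-stratum sums of squares of the strata formed by $x_1$ and $x_2$.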
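The three detectors described above can be sketched in a few lines, assuming each factor has already been discretized into class labels at the sampling points. The function names are hypothetical; the interaction categories follow those commonly listed for the Geodetector (presumably those of Table 1); and treating the ecological F-test as one-sided (upper tail, degrees of freedom N−1 and N−1) is an assumption about how significance is assessed.

```python
import numpy as np
from scipy.stats import f as f_dist

def ssw(y, strata):
    """Within-stratum sum of squares: sum over strata of N_h * sigma_h^2."""
    y, strata = np.asarray(y, dtype=float), np.asarray(strata)
    return sum(np.sum(strata == h) * y[strata == h].var() for h in np.unique(strata))

def q_statistic(y, strata):
    """Factor detector: q = 1 - SSW / SST with SST = N * sigma^2 (y must not be constant)."""
    y = np.asarray(y, dtype=float)
    return 1.0 - ssw(y, strata) / (len(y) * y.var())

def interaction_q(y, strata1, strata2):
    """Interaction detector: q of the overlay, where each (x1, x2) class pair is a new stratum."""
    combined = [f"{a}|{b}" for a, b in zip(strata1, strata2)]
    return q_statistic(y, combined)

def interaction_type(q1, q2, q12, tol=1e-9):
    """Classify an interaction by comparing q(x1 ∩ x2) with q(x1) and q(x2)."""
    if q12 < min(q1, q2):
        return "nonlinear-weaken"
    if q12 < max(q1, q2):
        return "uni-weaken"
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    return "nonlinear-enhance" if q12 > q1 + q2 else "bi-enhance"

def ecological_detector(y, strata1, strata2, alpha=0.05):
    """Ecological detector: F = N1(N2-1)SSW1 / (N2(N1-1)SSW2); returns (F, p, significant)."""
    n1 = n2 = len(y)  # both factors stratify the same sample points
    F = n1 * (n2 - 1) * ssw(y, strata1) / (n2 * (n1 - 1) * ssw(y, strata2))
    p = f_dist.sf(F, n1 - 1, n2 - 1)  # assumed one-sided (upper-tail) test
    return F, p, p < alpha

# Hypothetical toy data: SOM at 8 points and two already-discretized factors
som = [1, 2, 3, 4, 5, 6, 7, 8]
temp_class = [0, 0, 0, 0, 1, 1, 1, 1]
slope_class = [0, 0, 1, 1, 0, 0, 1, 1]
q1, q2 = q_statistic(som, temp_class), q_statistic(som, slope_class)
q12 = interaction_q(som, temp_class, slope_class)
print(q1, q2, q12, interaction_type(q1, q2, q12))  # this toy design is additive: "independent"
print(ecological_detector(som, temp_class, slope_class))
```

Note that overlaying two stratifications can only split existing strata, so the in-sample interaction q is never smaller than the larger single-factor q; with this construction only the enhancing (or independent) categories can actually occur.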

Correlation analysis
Correlation analysis is a statistical method for measuring the linear relationship between two variables and is widely used in many fields (Deng et al. 2023; Ejaz et al. 2023). To analyze the similarity between the influencing factors and the spatial distribution of SOM, this study also applied the Pearson correlation coefficient method to the factors influencing the spatial differences in SOM.
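A minimal sketch of the Pearson step, using SciPy's `pearsonr` to obtain r and a two-sided p-value for each continuous factor. The helper name `pearson_table` and the example values are hypothetical and are not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_table(som, factors):
    """Pearson r and two-sided p-value between SOM and each continuous factor,
    sorted by |r| so that the strongest linear associations come first."""
    som = np.asarray(som, dtype=float)
    rows = []
    for name, values in factors.items():
        r, p = pearsonr(np.asarray(values, dtype=float), som)
        rows.append((name, float(r), float(p)))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)

# Hypothetical values at five sampling points (illustration only)
som = [50.0, 40.0, 30.0, 20.0, 10.0]
factors = {"temperature": [2.0, 4.0, 6.0, 8.0, 10.0], "twi": [1.0, 3.0, 2.0, 3.0, 1.0]}
for name, r, p in pearson_table(som, factors):
    print(f"{name:12s} r = {r:+.2f}  p = {p:.3f}")
```

Unlike the q-statistic, r carries a sign, which is why the two analyses are complementary.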

Discretization methods
Equal interval method divides the range of the influencing factor into intervals of equal width and is suitable when the data are relatively uniformly distributed. Natural breaks method places class breaks at natural gaps in the data, grouping similar values together and maximizing the differences between classes. Quantile method divides the factor into classes based on percentiles, so that each class contains roughly the same number of samples. Geometrical interval method applies a logarithmic transformation to the factor and then divides it into classes.
Standard deviation method divides the dataset based on the standard deviation of the factor, which effectively reflects the degree of variability in the factor.
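The five schemes can be sketched as functions that return interior class breaks. The exact rules used by the GIS software in this study are not reported, so several choices here are assumptions: the geometrical interval is implemented as equal-width classes on a log scale after shifting the data so the minimum maps to 1; the standard deviation breaks are spaced one SD apart and centred on the mean; and natural breaks use the exact Fisher-Jenks dynamic programme (minimum total within-class sum of squares). All function names are hypothetical.

```python
import numpy as np

def equal_interval(x, k):
    """k classes of equal width between min(x) and max(x)."""
    x = np.asarray(x, dtype=float)
    return np.linspace(x.min(), x.max(), k + 1)[1:-1]

def quantile_breaks(x, k):
    """k classes holding (roughly) equal numbers of samples."""
    return np.quantile(np.asarray(x, dtype=float), np.linspace(0.0, 1.0, k + 1)[1:-1])

def geometrical_interval(x, k):
    """Equal-width classes on a log scale (assumed form); x is shifted so min(x) maps to 1."""
    x = np.asarray(x, dtype=float)
    shift = 1.0 - x.min()
    return np.geomspace(x.min() + shift, x.max() + shift, k + 1)[1:-1] - shift

def std_dev_breaks(x, k):
    """Breaks one standard deviation apart, centred on the mean (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    return x.mean() + x.std() * (np.arange(1, k) - k / 2)

def natural_breaks(x, k):
    """Fisher-Jenks natural breaks: exact minimum of the total within-class sum of squares."""
    v = np.sort(np.asarray(x, dtype=float))
    n = len(v)
    c1 = np.concatenate([[0.0], np.cumsum(v)])
    c2 = np.concatenate([[0.0], np.cumsum(v * v)])

    def ss(i, j):  # sum of squares of v[i:j] around its own mean
        s = c1[j] - c1[i]
        return (c2[j] - c2[i]) - s * s / (j - i)

    cost = np.full((k + 1, n + 1), np.inf)
    start = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for m in range(1, k + 1):          # number of classes used so far
        for j in range(m, n + 1):      # first j sorted values are classified
            for i in range(m - 1, j):  # class m covers v[i:j]
                c = cost[m - 1, i] + ss(i, j)
                if c < cost[m, j]:
                    cost[m, j], start[m, j] = c, i
    uppers, j = [], n
    for m in range(k, 1, -1):          # walk back: class m starts at index i
        i = start[m, j]
        uppers.append(v[i - 1])        # upper limit of class m - 1
        j = i
    return np.array(uppers[::-1])

def classify(x, breaks):
    """Class index 0..k-1; a value equal to a break falls into the lower class."""
    return np.digitize(np.asarray(x, dtype=float), breaks, right=True)

print(natural_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))
```

The natural breaks loop is O(k·n²), which is negligible for about a hundred sampling points and up to 8 classes.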

The spatial distribution characteristics of SOM
The spatial distribution of SOM content in the study area was obtained by inverse distance weighting interpolation, as shown in Figure 4. The interpolated SOM content ranges from 4.90 to 59.78 g·kg−1, with higher values in the northern and western parts of the study area than in the southern and eastern parts. Because the northwestern region lies at higher elevations than the southeastern region, this pattern suggests a positive correlation between SOM content and elevation. This finding is consistent with Zhao et al. (2023) and suggests that higher elevations correspond to lower temperatures and sparser vegetation, conditions that slow SOM decomposition and result in higher SOM content.

Discretization of continuous variables
In this paper, five methods were used to discretize the continuous variables, and the optimal discretization method and number of classes were selected for each factor based on its explanatory power for SOM (q-statistic). Each continuous variable was divided into 3 to 8 classes, as shown in Figure 5. The q-statistic differs considerably among discretization methods and numbers of classes. The q-statistic of precipitation fluctuates as the number of classes increases; its largest value (0.13) is obtained when precipitation is divided into 8 classes using the geometrical interval method.
Similarly, for the temperature factor, the q-statistic increases with the number of classes when the geometrical interval method is applied, whereas the other discretization methods produce fluctuating q-statistics. The maximum q-statistic for temperature (0.31) is attained with 8 classes using the quantile method. The pH factor follows a similar pattern, with its maximum q-statistic also obtained with the quantile method. The choice of discretization method strongly affects the q-statistic for slope: the geometrical interval and equal interval methods both yield higher q-values than the other methods.
The q-statistic of the elevation factor reaches its maximum when elevation is divided into 7 classes. The q-statistic of profile curvature tends to increase with the number of classes, with the SD method yielding the highest value (0.14). Plan curvature behaves similarly, but the natural breaks method produces its highest q-statistic (0.16). TWI has minimal influence on SOM, and its q-statistic varies little among discretization methods. For RDLS, the q-statistic changes little under the equal interval and quantile methods, whereas the other methods cause marked fluctuations. The q-statistic of NDVI remains relatively stable and is hardly affected by the discretization method. In conclusion, the discretization method has a significant impact on the q-statistic, and each continuous variable requires an appropriate discretization method to achieve its maximum q-statistic.
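The selection rule described above (keep, for each factor, the method and number of classes with the largest q) can be sketched as a grid search. This is a simplified illustration of the OPGD idea (Song et al. 2020), not its reference implementation: only two of the five methods are included for brevity, the function names are hypothetical, and the example data are synthetic.

```python
import numpy as np

def q_statistic(y, strata):
    """Geodetector q = 1 - SSW / SST."""
    y, strata = np.asarray(y, dtype=float), np.asarray(strata)
    ssw = sum(np.sum(strata == h) * y[strata == h].var() for h in np.unique(strata))
    return 1.0 - ssw / (len(y) * y.var())

def equal_interval(x, k):
    return np.linspace(np.min(x), np.max(x), k + 1)[1:-1]

def quantile_breaks(x, k):
    return np.quantile(x, np.linspace(0.0, 1.0, k + 1)[1:-1])

METHODS = {"equal interval": equal_interval, "quantile": quantile_breaks}

def optimal_discretization(x, y, methods=METHODS, class_counts=range(3, 9)):
    """Return (q, method name, k) with the largest q over all methods and class counts."""
    x = np.asarray(x, dtype=float)
    best = (-np.inf, None, None)
    for name, breaks_fn in methods.items():
        for k in class_counts:
            strata = np.digitize(x, breaks_fn(x, k), right=True)
            q = q_statistic(y, strata)
            if q > best[0]:
                best = (q, name, k)
    return best

# Synthetic example: 107 points whose SOM rises with a hypothetical elevation factor
rng = np.random.default_rng(0)
elev = rng.uniform(2200, 4000, 107)
som = 0.01 * elev + rng.normal(0, 5, 107)
print(optimal_discretization(elev, som))
```

Because finer classifications tend to raise the in-sample q, it may be worth also checking the significance of each selected configuration and the number of samples per class.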

Factor analysis of the spatial distribution of SOM content
The factor detector, interaction detector and ecological detector in the Geodetector were used to quantitatively analyze the 13 influencing factors leading to the spatial variation of SOM content in the study area, and to explore the dominant factors of the spatial variation of SOM content.

Analysis of single factor results
The factor detector was used to analyze the influence of 13 factors on SOM content: elevation, aspect, slope, NDVI, soil pH, soil type, temperature, precipitation, land use type, TWI, RDLS, plan curvature, and profile curvature. The results of the factor detection are shown in Figure 6. The q-statistics of the 13 factors ranged from 0.042 to 0.309, indicating that the factors differ in their influence on the spatial distribution of SOM content. Temperature, elevation, and soil pH had the highest explanatory power for SOM, with q-statistics of 0.309, 0.251, and 0.169, respectively. The strong influence of temperature can be attributed to the rise in soil enzyme and microbial activity with increasing temperature, which affects the decomposition rate of soil organic matter and thus SOM content in the study area (Alkorta et al. 2017; Hu et al. 2023). Profile curvature followed, with a q-statistic of 0.141, and the NDVI factor, representing vegetation growth, had a q-statistic of 0.132. Land use type, TWI, and aspect had the lowest influence on SOM, with q-statistics of 0.004, 0.044, and 0.059, respectively. In summary, elevation, temperature, and soil pH are the dominant factors affecting the spatial distribution of SOM in the Huangshui River Basin, with variations in elevation and temperature driving differences in SOM, whereas land use type and TWI have the least effect.

Factor interaction results
Spatial variation in regional variables often results not from a single factor but from multiple factors acting together. An interaction detector was therefore used to analyze the combined effects of pairs of factors on the spatial distribution of SOM content; the results are presented in Table 3. As Table 3 shows, the explanatory power for the spatial distribution of SOM increases when two factors interact. Individually, the factors explain only a modest share of the spatial variation in SOM content. However, when elevation and plan curvature interact, the q-statistic reaches 0.591, well above the values for elevation (q = 0.251) or plan curvature (q = 0.119) alone. The interaction between elevation and pH yields an explanatory power of 0.547 for SOM, and the interaction between TWI and temperature yields 0.542. The explanatory power of NDVI also rises markedly when it interacts with elevation (q = 0.444). This can be attributed to the fact that NDVI reflects vegetation coverage, and vegetation growth depends on suitable temperatures and sufficient precipitation, both of which vary with elevation in the study area. Hence, interactions between NDVI and other factors enhance the explanatory capacity for SOM. The explanatory power of aspect is likewise enhanced through interaction, for example with temperature (q = 0.477), mainly because different aspects receive different amounts of light, which, combined with temperature variation, affects SOM content. These analyses indicate that the spatial distribution of SOM is influenced by multiple interacting factors: although the influence of any single factor is limited, their combined effects are of the bi-enhance or nonlinear-enhance type.

Analysis of variance between factors
The ecological detection results in Table 3 compare the effects of each pair of influencing factors on the spatial distribution of SOM. The temperature factor differs significantly from all other factors except elevation, which further indicates that temperature and elevation are the dominant factors affecting the spatial distribution of SOM.

Comparison of the results of geographic detector and Pearson correlation analysis
The Pearson correlation coefficient is a common measure of the correlation between two variables, and this paper compares the factor detection results of the Geodetector model with Pearson correlation coefficients. Because Pearson correlation analysis requires continuous variables, the three discrete factors (land use type, soil type, and aspect) were excluded; the results for the remaining factors are shown in Figure 7. Pearson correlation analysis reveals whether each factor is positively or negatively correlated with SOM content, whereas the Geodetector reflects only the magnitude of the association. For example, temperature is significantly negatively correlated with SOM content, consistent with faster SOM decomposition at higher temperatures leading to lower SOM levels. The Geodetector q-statistic, which ranges from 0 to 1, reflects only the relative magnitude of influence.
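The difference between the two measures can be shown with a small synthetic example: Pearson's r changes sign when SOM is mirrored (multiplied by −1), while the q-statistic for the same temperature classes is unchanged, because q depends only on variances. The data and the three class breaks are hypothetical.

```python
import numpy as np

def q_statistic(y, strata):
    """Geodetector q = 1 - SSW / SST."""
    y, strata = np.asarray(y, dtype=float), np.asarray(strata)
    ssw = sum(np.sum(strata == h) * y[strata == h].var() for h in np.unique(strata))
    return 1.0 - ssw / (len(y) * y.var())

# Hypothetical data: SOM falls as mean annual temperature rises
temperature = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
som = np.array([45.0, 41.0, 38.0, 33.0, 30.0, 26.0, 21.0, 18.0])
classes = np.digitize(temperature, [4.5, 6.5], right=True)  # three temperature classes

r = np.corrcoef(temperature, som)[0, 1]
print(f"Pearson r = {r:.3f}")  # negative: the sign gives the direction
# q is identical for SOM and its mirror image, so it carries magnitude only
print(f"q(SOM) = {q_statistic(som, classes):.3f}, q(-SOM) = {q_statistic(-som, classes):.3f}")
```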

Influence of environmental factors on the spatial distribution of SOM content
Terrain factors, as the primary environmental factors, play a dominant role in the spatial distribution of SOM content. Wicaksono et al. (2019) demonstrated a significant effect of elevation on the spatial variation of SOM (p < 0.05). Among the three topographic factors elevation, aspect, and slope, elevation had the highest explanatory power for the spatial distribution of SOM, followed by slope and aspect. Elevation influences SOM because changes in altitude alter temperature and precipitation, which in turn lead to differences in the spatial distribution of SOM (Dong et al. 2018). The interaction analysis shows that the explanatory power of elevation increases when it interacts with other terrain factors; in particular, the interaction between elevation and plan curvature explains SOM content much better than curvature alone. The interaction between elevation and TWI reached an explanatory power of 0.463 for SOM content, higher than the q-values of either factor acting individually. In the Pearson correlation analysis between SOM and the terrain and vegetation factors (Figure 7), elevation and NDVI showed significant positive correlations with SOM, similar to the findings of Wu et al. (2021). Terrain variation produces differences in vegetation growth, while aspect and slope influence solar radiation and water-thermal conditions; together these affect soil microbial and enzyme activity and thereby lead to differences in SOM. The correlation analysis between SOM and climate factors (Figure 7) revealed a significant negative correlation between SOM and temperature: as the mean annual temperature increases, soil respiration and the decomposition rate of organic matter accelerate, consistent with the studies of Li et al. (2020) and Wen et al. (2023).

Influence of land use type on SOM content
Land use and agricultural management practices have important effects on soil properties (Tamburi et al. 2020), so understanding how land use change influences the spatial distribution of SOM is crucial. Land use type is a categorical variable, and its impact on the spatial distribution of SOM can be analyzed with the geographical detector. When land use type interacts with temperature, it shows a high explanatory power (q = 0.332) for the spatial distribution of SOM content. Land use type is the main anthropogenic factor influencing SOM content, with a particularly significant impact in cultivated lands. In contrast, forests and grasslands at higher elevations experience less human disturbance, have richer vegetation, and have relatively higher SOM content than other areas. Land use also influences soil erosion rates, thereby affecting soil deposition and loss in different areas and leading to variations in SOM content (Komolafe et al. 2021). With the continued advancement of modernized agriculture, increasing fertilizer application and irrigation have a significant impact on the spatial distribution of SOM content. The factors influencing the spatial distribution of SOM content therefore need to be considered from multiple perspectives to quantify the explanatory power of each factor for the spatial variability of SOM.

Conclusion
This study analyzed the factors influencing the spatial distribution of SOM content in the arable layer using the optimal parameters-based geographical detector model. The SOM content in the cultivated layer of the Huangshui River Basin, China, ranged from 4.31 to 59.79 g·kg−1, with an average of 28.38 g·kg−1, indicating a medium level. The coefficient of variation was 39.36%, indicating moderate variability. Different discretization methods produced different q-statistics for the continuous variables; for each variable, the discretization method and number of classes with the largest q-statistic were selected to better identify the main controlling factors of the spatial distribution of SOM content. Among the 13 influencing factors, temperature had the highest explanatory power for SOM (q = 0.309), followed by elevation (q = 0.251).
Pairwise interactions between influencing factors were of the bi-enhance or nonlinear-enhance type: the explanatory power of two interacting factors for the spatial distribution of SOM exceeded that of either factor alone, so interactions significantly increased the explained variation of SOM in the Huangshui River Basin. The results indicate that the spatial distribution of SOM content is jointly determined by multiple factors. By analyzing the factors influencing the spatial distribution of SOM in farmland of the Huangshui River Basin, China, with the geodetector model, this paper provides a theoretical basis for farmland soil improvement and soil quality improvement in the region.

Figure 1. The geographical location of the study area and distribution of sample points.

Figure 4. Spatial distribution of SOM content.

Figure 7. Comparison of Pearson correlation and geographical detector results for the relationships between organic matter and the influencing factors.

Table 1. Type of interaction results.
The descriptive statistics of the SOM content in the study area are given in Table 2. The SOM content in the Huangshui River Basin ranges from 4.31 to 59.79 g·kg−1, with an average of 28.38 g·kg−1, which is a moderate level according to the nutrient classification standards of the Second National Soil Survey. The coefficient of variation (CV) reflects the spatial variability of SOM content: CV < 25% indicates weak variability, 25% < CV < 75% indicates moderate variability, and CV > 75% indicates strong variability. The CV of SOM in the study area is 39.36%, indicating moderate variability (Zhang et al. 2022).
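The CV calculation and the variability thresholds quoted above can be sketched as follows. Using the sample standard deviation (ddof=1) is an assumption, as is assigning the boundary values 25% and 75% to the moderate class (the text does not say where exact boundary values fall); the function names are hypothetical.

```python
import numpy as np

def coefficient_of_variation(values):
    """CV (%) = 100 * standard deviation / mean; the sample SD (ddof=1) is an assumption."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()

def variability_level(cv):
    """Thresholds as stated in the text: <25% weak, 25-75% moderate, >75% strong."""
    if cv < 25.0:
        return "weak"
    if cv <= 75.0:
        return "moderate"
    return "strong"

print(variability_level(39.36))  # CV reported for SOM in the study area: "moderate"
```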

Table 2. Descriptive statistical analysis of SOM in the study area.
Note: p < 0.05 indicates statistical significance.

Table 3. Interaction detection and ecological detection results.