Assessment of Two Methods for Predicting Soil Retention Relationship from Basic Soil Properties

The purpose of this study was to develop the best transfer functions for estimating the soil water retention curve (SWRC) for Iraqi soils using multiple regression methods. Soil samples were collected from 30 different sites in Iraq at two depths (0–0.3 m and 0.3–0.6 m) to create a database for the development of predictive transfer functions. The database included information on soil particle size distribution, carbonate minerals, mass density, particle density, organic matter, saturated hydraulic conductivity, capillary height, and available water limits. Explanatory variables (EV) were the measured characteristics, while response variables (RV) were the volumetric water contents measured at different potentials (0, 5, 10, 33, 500, 1000, 1500 kPa). Two methods were used to develop predictive transfer functions: the logit model and the beta model. Prediction accuracy was assessed using mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²). The results showed that the variables included in the derivation of the two models for predicting θ(Ψ) were similar, except at θ(0). The variables w1 (w1 = 2P


INTRODUCTION
Soil hydraulic properties play a crucial role in regulating the transport of water and solutes within the soil. Various applications, such as irrigation, land use planning, drainage, and drought risk assessment, require a thorough understanding of these properties (Dobarco et al., 2019). Among them, the soil water retention curve (SWRC) is of particular interest because it describes the relationship between soil water potential (Ψ) and soil water content (θ) (Kudayr and Salim, 2019; Sisson et al., 1988). Hence, determining the SWRC is essential for these applications. The SWRC also plays an important role in describing other soil hydraulic properties such as saturated and unsaturated conductivity (Salim and Salih, 2008; Seki, 2023), diffusivity (Salim and Atee, 2007), and soil permeability (Zeng et al., 2019). The SWRC is primarily influenced by the soil's texture, structure, bulk density, organic matter, pore size geometry and distribution, and carbonate minerals (Mahdi, 2008; Mahdi and Naji, 2016).
The relationship between volumetric water content and soil water potential is usually measured in the laboratory with devices designed on the basis of the pore size distribution in the soil matrix, such as pressure plates and pressure membranes. It can also be measured directly in the field using a tensiometer. However, these methods are time-consuming, laborious, and costly (Azmi et al., 2019; Roux, 2019). In addition, soil properties show inherent spatial variation between and within locations. Therefore, many researchers have sought to predict the SWRC indirectly (Al-Hasani et al., 2021; Ati et al., 2014; Seki, 2023; Vereecken et al., 2016). Moreover, the existence of databases of soil physical and chemical properties, and the ease of access to them, has prompted researchers to use these data to develop functions that link these characteristics to hydro-related properties (Castellini and Iovino, 2019).
Pedotransfer functions (PTFs) translate the easy-to-measure data we have (for example, texture class, particle size distribution, bulk density, and organic matter) into the hard-to-measure data we need (soil hydraulic data such as the SWRC). The use of PTFs requires some care, because PTFs derived from the soils of one region may not be suitable for other regions owing to large differences in soil properties. These differences may affect the accuracy of the predicted soil water content. Therefore, choosing the appropriate PTF for a particular region and soil type is essential for the accuracy of the estimates (Medeiros et al., 2014; Pachepsky and Hill, 2017).
To obtain a good PTF for predicting θ(Ψ), the model must respect the range constraint of 0 to 1 (McNeill et al., 2018). This constraint is usually handled either by applying a logit transformation to the response variable or by using a generalized linear model such as beta regression. The logit transformation is commonly used for continuous response variables bounded between 0 and 1. Applied to soil water content, it linearizes the response and so allows standard linear regression analysis. However, this method has some limitations. First, the parameters must be interpreted in relation to the mean of the logit-transformed response rather than the response itself. Second, soil water content tends to be skewed and may display heteroscedasticity, with greater variation around the mean and less variation as the response approaches zero (Paraiba et al., 2013). One way to address these limitations is to assume a beta distribution for the response variable (McNeill et al., 2018).
This study aims to develop and evaluate the best model for predicting the SWRC of Iraqi soils using two methods, logit transformation and beta regression, based on easy-to-measure soil characteristics, in order to reduce the time, effort, and cost required by traditional measurement methods.

Soil samples and data preparation
In order to conduct this study, a total of 60 soil samples were procured from 30 distinct locations across Iraq, as delineated in Figure 1. These samples were obtained from two depths: the surface layer (0-0.3 m) and the subsurface layer (0.3-0.6 m).
After the soil samples were acquired, a series of physical and chemical laboratory analyses were conducted to determine various soil properties, including particle size distribution (PSD), bulk density (ρb), particle density (ρs), porosity (f), available water (AW), carbonate content (carbonate), organic matter (om), and saturated hydraulic conductivity (KS). The methods employed for these analyses were as described in [21]. Additionally, the capillary height (h) was determined using the method outlined by Miller and Bresler (1977), with the height after 7 days taken as an explanatory variable (EV). Descriptive statistics for the measured soil physical and hydraulic properties used to derive the PTFs are given in Table 1. Furthermore, sixty θ(Ψ) relationships were determined by measuring the volumetric water content at various potentials (0, 5, 10, 33, 50, 100, 500, 1000, and 1500 kPa) using a pressure plate apparatus. These water contents were regarded as the response variable (RV).
To demonstrate the significance of soil carbonates in predicting the soil water retention curve (SWRC) within the range [0, 1], the soil particle size distribution (PSD) fractions were converted from the ternary system (the texture triangle formed by the $P_{\text{sand}}$, $P_{\text{silt}}$, and $P_{\text{clay}}$ fractions) to the binary (Cartesian) system (McNeill et al., 2018). This conversion incorporated the soil carbonate content, as formulated in Equation 1:
$$P_{\text{sand}}^{\circ} + P_{\text{silt}}^{\circ} + P_{\text{clay}}^{\circ} + P_{\text{carbonate}} = 1 \qquad (1)$$
where: $P_{\text{sand}}^{\circ} = (1 - P_{\text{carbonate}})P_{\text{sand}}$; $P_{\text{silt}}^{\circ} = (1 - P_{\text{carbonate}})P_{\text{silt}}$; $P_{\text{clay}}^{\circ} = (1 - P_{\text{carbonate}})P_{\text{clay}}$.
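The rescaling in Equation 1 can be illustrated with a minimal Python sketch. The function name and the sample fractions below are hypothetical and are not taken from the study's database; the sketch only assumes that the sand, silt, and clay fractions are expressed as proportions summing to 1.

```python
def rescale_psd_with_carbonate(p_sand, p_silt, p_clay, p_carbonate):
    """Rescale sand, silt and clay fractions by (1 - P_carbonate), as in Equation 1.

    All inputs are proportions (not percentages). The three texture
    fractions are assumed to sum to 1 before rescaling.
    """
    if abs(p_sand + p_silt + p_clay - 1.0) > 1e-6:
        raise ValueError("sand, silt and clay fractions must sum to 1")
    if not 0.0 <= p_carbonate <= 1.0:
        raise ValueError("carbonate fraction must lie in [0, 1]")
    scale = 1.0 - p_carbonate
    return p_sand * scale, p_silt * scale, p_clay * scale


# Hypothetical sample: 30% sand, 45% silt, 25% clay, 20% carbonate.
sand_c, silt_c, clay_c = rescale_psd_with_carbonate(0.30, 0.45, 0.25, 0.20)

# The rescaled fractions plus the carbonate fraction close to 1.
closure = sand_c + silt_c + clay_c + 0.20
```

After rescaling, the four components form a closed composition, which is the property Equation 1 expresses.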
It is widely acknowledged that the three particle-size fractions are strongly and significantly correlated, whatever type of correlation is considered (Yan et al., 2022), because any change in one fraction inevitably affects the other two. Regression procedures are commonly carried out in Cartesian space because it is more convenient to work with. Moreover, converting to Cartesian space can reduce the apparent correlation between the proportions of soil particles by eliminating the structural correlation between particle classes (McNeill et al., 2018). To convert from the ternary system to the Cartesian system, the approach proposed by Cornell (1981) is followed, which involves creating two variables via Equations 2 and 3:

Model derivation
Two distinct approaches were employed to model θ(Ψ):

Logit model
This model was constructed using the statistical software SAS version 9.4 (Statistical Analysis Systems) (SAS, 2023), with the response variable transformed using the logit transformation given in Equation 4:
$$\operatorname{logit}\,\theta(\Psi_i) = \ln\left[\frac{\theta(\Psi_i)}{1 - \theta(\Psi_i)}\right] \qquad (4)$$
where: $\theta(\Psi_i)$ is the volumetric water content at a specified potential value. The purpose of this transformation is to give the response variable a normal and symmetrical distribution at the different potential values. One of the conditions for constructing this model is that the values of the response variable must be limited to the range [0, 1]. The logit model takes the following form:
$$\operatorname{logit}\,\theta(\Psi_i) = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \qquad (5)$$
where: $a_0$ is the intercept; $a_1, \ldots, a_n$ are the regression coefficients; $x_1$ to $x_n$ are the explanatory variables representing the soil properties used in this study. The model was optimized by applying a backward elimination method to select the EV at the P < 0.1 level of significance.
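The logit workflow described above (transform the response, fit a linear model, back-transform the prediction) can be sketched as follows. This is a simplified illustration under stated assumptions, not the SAS procedure used in the study: it uses a single explanatory variable, omits backward elimination, and the porosity and water content values are hypothetical.

```python
import math


def logit(theta):
    """Logit transform of a water content strictly between 0 and 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return math.log(theta / (1.0 - theta))


def inverse_logit(z):
    """Back-transform from the logit scale to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_simple_ols(x, y):
    """Ordinary least squares for y = a0 + a1 * x (one predictor only)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1


# Hypothetical data: porosity as the only explanatory variable and
# volumetric water content at one potential as the response.
porosity = [0.42, 0.45, 0.48, 0.51, 0.55]
theta = [0.30, 0.33, 0.35, 0.39, 0.42]

# Fit on the logit scale, then back-transform the predictions.
a0, a1 = fit_simple_ols(porosity, [logit(t) for t in theta])
theta_pred = [inverse_logit(a0 + a1 * f) for f in porosity]
```

Because the predictions are back-transformed with the inverse logit, they always fall between 0 and 1, which is the range constraint the transformation is meant to enforce.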

Beta model
The logit model is commonly used in soil science and water resources to transform soil water content into a linear response for regression analysis. However, this approach has limitations. One challenge is the skewed distribution of soil water content, which may result in heteroscedasticity. Additionally, interpreting the model parameters in terms of the mean of the transformed response can be difficult. To address these limitations, a beta regression model can be used, which assumes that the response follows a beta distribution (Ferrari and Cribari-Neto, 2004). The beta regression model is developed in a similar way to the logit model, but it allows a more accurate interpretation of the response parameters. The beta model was fitted in R studio version 4.2.0 statistical software (R Core Team, 2020). The beta model takes the following form:
$$g(\mu) = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \qquad (6)$$
where: $g(\mu)$ is a link function that relates the mean ($\mu$) of the response variable to a set of linear predictors (in this study us- as link function); $a_0$ is the intercept; $a_1, \ldots, a_n$ are the regression coefficients; $x_1$ to $x_n$ are the explanatory variables representing the soil properties used in this study. The model was optimized by applying the backward elimination method to select the EV at the P < 0.1 level of significance.
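To make the structure of Equation 6 concrete, the following sketch evaluates the log-likelihood of a beta regression in the mean-precision parameterization of Ferrari and Cribari-Neto (2004), in which the response follows Beta(μφ, (1 − μ)φ). Two points are assumptions of this sketch rather than statements about the study: a logit link is used for g(μ), and the precision φ is treated as a fixed constant. The data and coefficient values are hypothetical, and no optimizer is included.

```python
import math


def beta_log_density(y, mu, phi):
    """Log density of Beta(mu * phi, (1 - mu) * phi) evaluated at y in (0, 1)."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_regression_loglik(coeffs, phi, rows, responses):
    """Log-likelihood for g(mu) = a0 + a1*x1 + ... + an*xn with a logit link.

    coeffs    -- [a0, a1, ..., an]
    phi       -- precision parameter (assumed constant here)
    rows      -- one list of explanatory variables per observation
    responses -- observed water contents, each strictly inside (0, 1)
    """
    total = 0.0
    for x, y in zip(rows, responses):
        eta = coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))
        mu = 1.0 / (1.0 + math.exp(-eta))  # inverse of the assumed logit link
        total += beta_log_density(y, mu, phi)
    return total


# Hypothetical data: porosity as the single explanatory variable.
rows = [[0.42], [0.48], [0.55]]
responses = [0.30, 0.35, 0.42]

# Coefficients whose fitted means lie near the data score higher than
# coefficients whose fitted means lie far from the data.
ll_reasonable = beta_regression_loglik([-2.5, 4.0], 200.0, rows, responses)
ll_poor = beta_regression_loglik([2.0, 0.0], 200.0, rows, responses)
```

Fitting the model amounts to maximizing this log-likelihood over the coefficients and φ. In practice that step is delegated to statistical software, as was done in this study.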
The predictive capabilities of the logit and beta models were evaluated using goodness-of-fit statistics: mean bias error (MBE), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²). These were computed from the measured values y_i, the predicted values ŷ_i, the mean of the measured values ȳ, and the total number of observations n.

Table 2 presents the regression coefficients and coefficients of determination for the pedotransfer functions predicting θ(Ψ) using both the logit and beta models. The table shows that the two models are similar in the number and type of explanatory variables used to derive the predictive models for the SWRC points, with the exception of θ(0), when the backward elimination method is employed for variable selection. Specifically, the logit model for predicting θ(0) includes saturated hydraulic conductivity and soil porosity as explanatory variables, while the beta model adds w2, particle density, and bulk density. These common soil characteristics are important in determining water content at different potential points. The results indicate that w1 (which represents the soil particle size distribution of sand, silt, and clay together with the soil content of carbonate minerals), soil porosity, available water, and capillary height are the most important explanatory variables in predicting volumetric water content for the SWRC points. w1 is included in both the logit and beta models for the potential range of 10 to 1500 kPa. Soil porosity is an explanatory variable in the logit and beta models predicting volumetric water content for the potential range of 0 to 33 kPa and at 1500 kPa. Available water is included in the logit and beta models predicting volumetric water content for the potential range of 10 to 100 kPa and at 1500 kPa.
Capillary height is included in the derivation of the logit and beta models predicting volumetric water content for the potential range of 33 to 1500 kPa.

These soil properties are crucial in determining the θ(Ψ) relationship, because PSD together with carbonate minerals affects pore geometry and pore space distribution. In fine-textured soils, where small particles (clay and silt) dominate, a high proportion of the pore space consists of small pores, which increases the soil's ability to retain water compared with coarse-textured soils, where large particles (e.g., sand) dominate and water holding capacity is reduced due to lower porosity. Furthermore, specific surface area increases with decreasing soil particle size, and since the specific surface area of soil particles plays a significant role in water holding capacity, an increase in specific surface area increases the soil's ability to hold water. Capillary height and available water are also affected by PSD and pore size distribution: smaller pore sizes lead to a greater capillary height and more available water, which explains why these variables contribute to most of the logit and beta models.

Table 3 presents an assessment of the performance of both the logit and beta models in predicting θ(Ψ) using MBE, MAE, RMSE, and R² at different levels of water potential. The better model was taken to be the one that produced the lowest MAE and RMSE values, the highest R² value, and the least biased MBE value (closest to zero). Overall, the results indicate that the logit and beta models performed equally well according to the MAE, RMSE, and R² criteria over the range of potentials tested. However, the models differed when evaluated using the MBE measure. Specifically, the beta model outperformed the logit model in MBE for potentials between 5 and 1500 kPa.
The MBE values for the beta model ranged from -0.00013 to 0.00004, while those for the logit model ranged from 0.00011 to 0.00169. At θ(0), however, the logit model produced a slightly better MBE value (0.00001) than the beta model (0.00002).

The similar MAE, RMSE, and R² values indicate that the logit and beta models have similar predictive accuracy: both explain a similar amount of variance in the data and make similarly accurate predictions. However, the difference in MBE values suggests that the logit model carries a bias that is not present in the beta model. Because its MBE values are closer to zero, the beta regression model gives less biased estimates of the response variable than the logit-transformed model. This difference in bias may arise because the beta regression model is specifically designed for modelling continuous proportions, whereas the logit-transformed model assumes a linear relationship between the explanatory variables and the transformed response variable. In other words, the beta regression model is better suited to the type of data being analysed and describes the underlying relationships more accurately, resulting in less bias.
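The four evaluation measures used in this comparison can be sketched in Python as follows. Two conventions here are assumptions of the sketch and may differ from the ones used in the study: the bias is computed as predicted minus measured, and R² is computed as one minus the ratio of the residual sum of squares to the total sum of squares. The measured and predicted values are hypothetical.

```python
import math


def evaluate_predictions(measured, predicted):
    """Return MBE, MAE, RMSE and R2 for paired measured/predicted values."""
    n = len(measured)
    # Assumed sign convention: residual = predicted - measured.
    residuals = [p - m for m, p in zip(measured, predicted)]
    mbe = sum(residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    # Assumed definition: R2 = 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot
    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical water contents: errors of +0.01 and -0.01 alternate.
measured = [0.30, 0.35, 0.42, 0.25]
predicted = [0.31, 0.34, 0.43, 0.24]
scores = evaluate_predictions(measured, predicted)
```

In this hypothetical example the positive and negative errors cancel, so the MBE is essentially zero while the MAE and RMSE are both about 0.01. This shows why two models can agree on MAE, RMSE, and R² yet still differ in MBE.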

RESULTS AND DISCUSSION
Figure 2 shows a linear (1:1) relationship between the measured and predicted water content values of the pedotransfer functions (PTFs) for both the logit and beta models. This indicates close agreement between the measured and predicted water contents at each specific tension for both models.