Artificial Neural Networks vs Long Short-Term Memory Prediction of Solid Flow in Tafna Basin (North-West Algeria)

The main objective of this work is to select the most reliable machine learning model for predicting the solid flow generated in the Tafna basin (North-West Algeria). Two models are compared: artificial neural networks (ANN) and long short-term memory (LSTM). The sediment load is recorded at three hydrometric stations. The efficiency and performance of the two models are verified using the correlation coefficient (R²), the Nash-Sutcliffe coefficient (NSC) and the root mean square error (RMSE). The simulated solid load correlates very well with the observations, and the ANN model gave relatively better results than the LSTM model, with low RMSE values recorded. This confirms that artificial intelligence models remain effective for the treatment and prediction of hydrological phenomena, such as the estimation of the solid load in such a watershed.


INTRODUCTION
In arid and semi-arid regions such as North Africa, siltation of dams is the most direct and serious consequence of water erosion in watersheds. Precise quantification and characterization of hydro-sedimentary transport in rivers is very important for environmental and natural resources management (Tabatabaei et al., 2019). The preservation of land and water resources is one of the most important challenges of watershed management, especially in arid and semi-arid areas. Siltation has many negative effects, including a significant reduction in the storage capacity of irrigation canals and a clear deterioration of the water quality of dams (Gwapedza et al., 2021). It thus affects the structure of dams and the supply of domestic, agricultural and industrial water (Sirabahenda et al., 2020). Moreover, several meteorological and hydrographic variables in Mediterranean basins influence the sedimentation process, which makes solid load prediction a very complex operation (Zeyneb et al., 2022). The problem of sediment deposition at the watershed scale has led several researchers to propose various empirical methodologies aimed at quantifying solid transport (Adib and Mahmoodi, 2017). Due to its Mediterranean climate and the irregularity of its fluvial regime, northwest Algeria is one of the zones most vulnerable to soil erosion (Nadia and Boulemtafes, 2018). The consequences of water erosion in Algeria are disastrous, leaving a bare landscape crisscrossed by intense gullying, particularly in mountainous regions with a dense hydrographic network. The operational dams are therefore endangered, especially in the west of Algeria, where 47% of the total land is affected (Semari and Korichi, 2023). However, the problem is very difficult and complex, and it is far from being solved with empirical formulas because of the considerable differences between the estimates given by each formula.

ECOLOGICAL ENGINEERING & ENVIRONMENTAL TECHNOLOGY
To overcome these obstacles, new estimation methods must be explored. Using various water erosion simulation models, researchers can predict sediment levels and identify vulnerable locations (de Vente and Poesen, 2005). Given their high potential, high accuracy and ease of learning, and thanks to advances in computing and data science, researchers are increasingly moving towards regression and machine learning techniques to predict solid flow (Qs) (Valentine and Kalnins, 2016).
Recently, machine learning techniques such as artificial neural networks (ANN) have been widely applied in hydrology for modeling rainfall-runoff relationships as well as water erosion and bed load processes. However, deep learning methods such as long short-term memory (LSTM) networks, which could capture the nonlinearity and non-stationarity of hydrological applications, have been little studied for time series prediction of hydrological sequences (Hu et al., 2018). In this context, several studies aimed at predicting solid flows have been carried out, particularly in Mediterranean basins. For the Hounet wadi in western Algeria, Beddal et al. (2020) used both multilinear regression (MLR) and back-propagation neural network (BPNN) models. Their results showed that the BPNN approach is more effective in modeling this nonlinear and complex process. To calculate the average sediment load in the Himalayan basins in India, Pham et al. (2018) used two different algorithms: the feed-forward neural network (FFNN) and the radial basis function (RBF) network. Their investigation confirms that the FFNN model outperformed the RBF model when estimating daily sediment load.
Another study, conducted by Shadkani et al. (2021) on the Mississippi River, showed that the multi-layer perceptron-stochastic gradient descent model (MLP-SGD) exhibits superior predictive capabilities compared to both gradient-boosted tree (GBT) and multi-layer perceptron (MLP) approaches in predicting suspended sediment concentration. Likewise, Latif et al. (2023) used LSTM, support vector machine (SVM) and MLP models to predict sediment transport in the Johor River in Malaysia; their investigation confirms that the LSTM and SVM approaches outperformed the MLP method. Kaveh et al. (2021) used daily flow and the time series of suspended sediment concentration (SSC) of the Schuylkill River in Manayunk, Philadelphia, USA to estimate the solid load. The LSTM technique proved superior to the FFNN and the adaptive neuro-fuzzy inference system (ANFIS) methods. In another study, conducted by Fang and Shao (2022), the LSTM method was also used to model and predict the rainfall-runoff relationship. It showed that the LSTM produces reliable results and accurately predicts the peak value. On a local scale, Zeyneb et al. (2022) carried out a study on five basins in eastern Algeria in which the ANN method outperformed ANFIS in predicting suspended sediment concentrations.
Based on this literature review and to the best of our knowledge, few studies focus on the prediction of solid flow (Qs), especially in western Algeria. Thus, this study examines the reliability of machine learning algorithms in predicting the solid flow generated by the Tafna basin, located in northwest Algeria. The Tafna is characterized by a long period of drought over the previous decade namely between , in which annual rainfall recorded a drop of around 40% on average (Meddi et al., 2010), in addition to the predominance of silt and clayey-sandy textures. The consequences of bed load transport are manifested in the siltation of many dams located downstream of the Tafna basin. The prediction and quantification of the solid load in the Tafna wadi therefore become essential to plan protection works and hence reduce the rate of siltation. The objective of the study is to highlight the extent of water erosion of soils as well as the complex processes that affect the movement of suspended sediments in this basin, based mainly on data monitored at three hydrometric stations, namely Beni Bahdel (160402), Chouly Pont RN7 (160601) and Pierre de Chat (160801).

Study area
The Tafna basin is located in the north-western part of Algeria, in the Tlemcen region, between latitudes 35.5° and 36° North and longitudes 0.5° and 2° East (Fig. 1). The basin shares its western border with Morocco. It is bounded to the south by the Tellian Atlas Mountains, to the north by the Mediterranean Sea, and to the east by the Macta and central Oranian coastal basins. The Tafna Wadi is 190 km long and drains an area of 7245 km² before emptying into the Mediterranean Sea; the altitude in the basin varies from zero to 1900 m.
The Tafna basin is also distinguished by great spatiotemporal variability in rainfall. Precipitation can be three to four times higher in the wettest years than in the driest years (Meddi et al., 2010). From a geological and geomorphological point of view, the study area is a basin filled with Quaternary and Miocene alluvium. The dynamics of the Tafna wadi dominate this marly basin, which is typically not very resistant to erosion.

Data processing
The data used in this study were collected from the National Water Resources Agency (ANRH) at three stations (Fig. 1): Beni Bahdel (ST: 160402), Chouly Pont RN7 (ST: 160601) and Pierre de Chat (ST: 160801). The data are the daily liquid flow rates Ql (m³/s), used as input data for the learning process, and the solid flow rates Qs (kg/s), used as calibration data, over the period from 1990 to 2010. The series were normalized using Equation 1:

$$X_{norm} = \frac{X_i - X_{min}}{X_{max} - X_{min}} \quad (1)$$

where: X_norm - normalized value, X_i - observed value, X_min and X_max - the minimum and maximum values in the series, respectively.
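The normalization step can be illustrated with a short Python sketch. It assumes that Equation 1 is the usual min-max scaling, as the symbols X min and X max suggest; the function names and the sample Ql values are hypothetical and serve only as an illustration.

```python
import numpy as np

def minmax_normalize(series):
    """Scale a 1-D series to [0, 1] using its own minimum and maximum."""
    x = np.asarray(series, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_denormalize(x_norm, x_min, x_max):
    """Map normalized values back to the original units."""
    return np.asarray(x_norm, dtype=float) * (x_max - x_min) + x_min

# Hypothetical daily liquid flow rates Ql in m3/s (illustrative values only)
ql = [0.5, 2.0, 6.5, 12.5]
ql_norm, ql_min, ql_max = minmax_normalize(ql)
ql_back = minmax_denormalize(ql_norm, ql_min, ql_max)
```

Keeping the minimum and maximum of the training series makes it possible to convert the predicted Qs back to kg/s after simulation.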
Table 1 shows the maximum and minimum values as well as the standard deviation of the liquid and solid flow rates recorded at the three hydrometric stations in the Tafna basin. The pooled data from all stations constitute a global model used to check the cross-validation in the basin. On a temporal scale, significant variability in the liquid flow rates was observed throughout the study period from 1990 to 2010, with pronounced fluctuations in daily flow rates in the basin (Fig. 2); the average is approximately 6.51 m³/s. Likewise, the daily solid flows recorded in the Tafna basin fluctuated significantly during the study period (Fig. 2). Maximum values were observed in 1990, 1991, 1996, 2000, 2004 and 2009, with an annual average of 29.60 kg/s, suggesting a significant erosive load in the Tafna basin. Given this very pronounced fluctuation between the minimum and maximum flow rates, reflected in the high standard deviation, it is essential to carry out a prediction analysis using both artificial intelligence and deep learning techniques.

Applied model
The prediction of solid loads in the Tafna basin is made by applying the artificial neural network approach, a class of machine learning methods frequently used in data classification and regression. We are particularly interested in the multilayer perceptron architecture. Conventional machine learning techniques are limited in their ability to process natural data in their raw form. Deep learning, in contrast, allows computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. It can discover intricate structure in data sets and adjust its internal parameters using the backpropagation algorithm. To our knowledge, however, few studies use deep learning in hydrology, especially for large time-series datasets. The LSTM model, a deep learning (DL) method used for the analysis of complex and voluminous data, is therefore also applied to predict the solid load generated in the Tafna basin. Figure 3 shows the basic architecture and algorithm of each approach and the specific parameters used.

Artificial neural networks
The artificial neural network is one of the most well-known and effective methods for classification and regression in various scientific fields. Neural networks find many applications in hydrology and time series prediction (Kumar et al., 2012). An MLP model containing ten neurons is used in this study (Fig. 4a). Backpropagation is selected as the training algorithm, and the two layers use the tansig and purelin transfer functions, respectively (Rahman et al., 2022). In addition, the observed data of each station are divided into two parts: 70% for training and the remaining 30% for testing. The model uses the following relation to calculate the output value of each layer:

$$Y = W X + B \quad (2)$$

where: Y - output value, X - input value, W - weight matrix, B - bias.
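A minimal NumPy sketch of such a network is given here for illustration. It assumes that the ten neurons form a single hidden layer with the tansig transfer function, followed by one purelin output neuron, and that the 70/30 split is taken in chronological order; neither detail is stated in the text. The weights are random and untrained, and all names and sample values are hypothetical.

```python
import numpy as np

def tansig(z):
    """Hyperbolic tangent sigmoid transfer function (same values as tanh)."""
    return np.tanh(z)

def purelin(z):
    """Linear transfer function (identity)."""
    return z

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass: input -> tansig hidden layer -> purelin output."""
    hidden = tansig(x @ w1 + b1)      # shape (n_samples, n_hidden)
    return purelin(hidden @ w2 + b2)  # shape (n_samples, 1)

def chronological_split(x, y, train_fraction=0.7):
    """Assumed split: first 70% of the record for training, the rest for testing."""
    n_train = int(round(train_fraction * len(x)))
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]

# Untrained random weights; in practice they are fitted by backpropagation
rng = np.random.default_rng(0)
n_hidden = 10
w1 = rng.normal(scale=0.5, size=(1, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

# Hypothetical normalized series: Ql as input, Qs as target
ql_norm = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
qs_norm = ql_norm ** 2
x_tr, y_tr, x_ts, y_ts = chronological_split(ql_norm, qs_norm)
qs_pred = mlp_forward(x_tr, w1, b1, w2, b2)
```

Each layer applies its transfer function to the weighted sum of its inputs plus a bias, which is why purelin leaves the output layer as a plain linear combination of the hidden activations.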

Long short-term memory
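The LSTM model is a deep learning model that belongs to the recurrent neural network (RNN) family. Thanks to its automatically controlled loops, LSTM is particularly well suited to time series processing because it can track information over a long period of time and create paths that allow gradients to flow continuously during the learning phase (Bengio et al., 1994; Gers et al., 2000). The LSTM model uses Equations 3-8 to calculate its parameters and make the required predictions:

$$F(t) = \sigma(W_F [H(t-1), X(t)] + b_F) \quad (3)$$
$$I(t) = \sigma(W_I [H(t-1), X(t)] + b_I) \quad (4)$$
$$\hat{I}(t) = \tanh(W_C [H(t-1), X(t)] + b_C) \quad (5)$$
$$C(t) = F(t) \cdot C(t-1) + I(t) \cdot \hat{I}(t) \quad (6)$$
$$O(t) = \sigma(W_O [H(t-1), X(t)] + b_O) \quad (7)$$
$$H(t) = O(t) \cdot \tanh(C(t)) \quad (8)$$

where: W and b - the weight matrix and bias of each gate, σ - the sigmoid activation function, [H(t-1), X(t)] - the concatenation of the previous hidden state and the current input.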
The LSTM model automatically regulates information updates in the cell state (Fig. 4b). The forget gate F(t) manages the connection of the input data X(t) and the previous hidden state H(t-1) to the cell state C(t), which determines whether to forget X(t) and H(t-1). The input gates I(t) and Î(t) determine whether to pass X(t) and the previous hidden state H(t-1), processed by the activation function σ, to the cell state C(t). The output gate O(t) controls whether to send the processed X(t) and the previous hidden state H(t-1) to the next hidden state H(t) (Fan et al., 2023).
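The gate mechanism described above can be sketched as a single time step in NumPy. This is an illustration under the assumption of the standard LSTM formulation, not the implementation used in the study; the parameter names (W_f, b_f, and so on), the hidden size and the input values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, written as sigma in the text."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state H(t) and cell state C(t)."""
    z = np.concatenate([h_prev, x_t])                    # [H(t-1), X(t)]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])     # forget gate F(t)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])     # input gate I(t)
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])   # candidate state
    c_t = f_t * c_prev + i_t * c_hat                     # cell state C(t)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])     # output gate O(t)
    h_t = o_t * np.tanh(c_t)                             # hidden state H(t)
    return h_t, c_t

def init_params(n_input, n_hidden, seed=0):
    """Small random weights and zero biases for the four gates (untrained)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_input))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params

# Run three hypothetical normalized Ql values through the cell
n_input, n_hidden = 1, 4
params = init_params(n_input, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for ql_t in [0.1, 0.4, 0.9]:
    h, c = lstm_step(np.array([ql_t]), h, c, params)
```

Because the cell state is updated additively (the forget gate scales the old state and the input gate scales the candidate), information can persist across many time steps, which is the property that makes the model attractive for daily flow series.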
The comparison of the two models allows us to select the most appropriate one to predict the solid loads in the Tafna basin. The calculated solid flows Qs provide useful information on the effectiveness and performance of these models through the training (TR) and testing (TS) phases.
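The three performance indicators used in this comparison can be computed as in the following sketch. It assumes that R² is the squared Pearson correlation between observed and simulated values and that NSC follows the usual Nash-Sutcliffe definition; the text does not give the formulas, and the function names and sample values are hypothetical.

```python
import numpy as np

def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    r = np.corrcoef(np.asarray(obs, dtype=float), np.asarray(sim, dtype=float))[0, 1]
    return r ** 2

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe coefficient: 1 is a perfect fit, 0 matches the mean of obs."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    """Root mean square error, in the units of the (normalized) series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return np.sqrt(np.mean((obs - sim) ** 2))

# Hypothetical normalized solid flows: a simulation with a constant bias of 0.1
qs_obs = np.array([0.1, 0.2, 0.3, 0.4])
qs_sim = qs_obs + 0.1
scores = {
    "R2": r_squared(qs_obs, qs_sim),
    "NSC": nash_sutcliffe(qs_obs, qs_sim),
    "RMSE": rmse(qs_obs, qs_sim),
}
```

The biased example shows why the indicators are read together: a constant offset leaves R² at 1 while lowering NSC and raising RMSE.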

ST: Beni Bahdel (160402)
At this station, the learning results show a slightly superior performance of the ANN model compared to the LSTM model, with impressive metrics (Table 2): R²: 0.66-0.98, NSC: 0.77-0.80, and in particular low RMSE values, which vary between 0.0096 and 0.02. Compared with the other stations, both the ANN and LSTM models indicate a better correlation with station 160801 (Fig. 5).

ST: Chouly Pont RN7 (160601)
At the Chouly station (160601), the simulation results (Table 3) indicate a superior performance of the LSTM model compared to the ANN model according to the performance indicators: R²: 0.5-0.86, NSC: 0.11-0.17, and in particular low errors, with RMSE varying between 0.033-0.018. In the cross-validation process, the ANN model indicates a very good correlation with station 160801, for which the lowest error of 0.0174 is recorded (Fig. 6).

ST: Pierre de Chat (160801)
According to the simulation results for the Pierre de Chat station (160801), the ANN model indicates superior performance compared to the LSTM model (Table 4), as the performance indicators confirm: R²: 0.70-0.98, NSC: 0.83-0.84, RMSE: 0.0168-0.027. The best value is recorded in the CV model with station 160601, with high values of R² = 0.98 and NSC = 0.85; the error is also very low, estimated at 0.0096 (Fig. 7).

Global model
The global analysis, which groups together all hydrometric stations, shows superior performance    5). The best value is recorded in the CV model with both stations 160402 and 160601, for which low errors are recorded, respectively (RMSE: 0.008-0.005) (Fig. 8).
The ANN model appears to be the most efficient choice, with a slight advantage over the LSTM model. This does not affect the credibility, reliability and effectiveness of the LSTM model. Both models considered in this study are viable and efficient, which supports their applicability to predicting solid loads in particular and hydrological phenomena in general in the Tafna basin. It should be noted that the intensity of water erosion depends on several factors: rainfall erosivity; the slope, which acts directly on the kinetic energy of the runoff; the land cover, which absorbs the kinetic energy of raindrops and increases soil resistance to erosion; and finally soil erodibility, which is closely related to soil texture and structure.

CONCLUSION
In this study, two artificial intelligence models, namely ANN and LSTM, were applied to simulate the solid flow generated in the Tafna basin, situated in northwest Algeria. Like all semi-arid regions, this basin suffers from many problems linked to water erosion, such as rapid siltation of dams. The data used are the solid flows recorded at three hydrometric stations: Beni Bahdel (160402), Chouly Pont RN7 (160601) and Pierre de Chat (160801). To examine the correlation between these stations, we added a global model which groups all the solid flow data from the stations in the learning process. Before training, the database was processed and normalized to ensure the validity and accuracy of the results.
The comparison between the simulated and recorded solid flow rates was verified using three performance parameters: the correlation coefficient R², the Nash-Sutcliffe coefficient NSC and the root mean square error RMSE. The simulation was carried out separately for each hydrometric station and globally for the entire Tafna basin. The performances of the two models are of a comparable level and indicate that the ANN model is slightly better than the LSTM model. Overall, the simulated results underline the quality of both models in terms of validation criteria and confirm their relevance for predicting solid flow rates Qs using the observed liquid flow Ql. This can help in planning effective solutions to reduce erosion and ensure the sustainability of hydrotechnical structures.
This study emphasizes the usefulness of machine learning and deep learning models for describing the relationship between the factors that amplify water erosion, particularly in arid and semi-arid Mediterranean basins. Despite the promising results of these models, it is important to recognize the difficulty of predicting values for various systems, especially when dealing with instantaneous data. The modeling of solid load remains a subject of investigation, given the complexity of natural phenomena and the non-linearity of the relationships between the intervening variables, namely rainfall erosivity, slope, land cover and soil erodibility. Nevertheless, artificial intelligence models remain a useful part of the decision-making toolkit.

Table 2. Performance indicators of the ANN and LSTM models of ST: 160402

Table 3. Performance indicators of the ANN and LSTM models of ST: 160601

Table 4. Performance indicators of the ANN and LSTM models of ST: 160801