Deep Learning Forecasts of Soil Moisture: Convolutional Neural Network and Gated Recurrent Unit Models Coupled with Satellite-Derived MODIS, Observations and Synoptic-Scale Climate Index Data

Remotely sensed soil moisture forecasting through satellite-based sensors to estimate the future state of the underlying soils plays a critical role in planning and managing water resources and sustainable agricultural practices. In this paper, Deep Learning (DL) hybrid models (i.e., CEEMDANCNN-GRU) are designed for daily time-step surface soil moisture (SSM) forecasts, employing the gated recurrent unit (GRU), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and convolutional neural network (CNN). To establish the objective model’s viability for SSM forecasting at multi-step daily horizons, the hybrid CEEMDAN-CNN-GRU model is tested at 1st, 5th, 7th, 14th, 21st, and 30th day ahead period by assimilating a comprehensive pool of 52 predictor dataset obtained from three distinct data sources. Data comprise satellite-derived Global Land Data Assimilation System (GLDAS) repository a global, high-temporal resolution, unique terrestrial modelling system, and ground-based variables from Scientific Information Landowners (SILO) and synoptic-scale climate indices. The results demonstrate the forecasting capability of the hybrid CEEMDAN-CNN-GRU model with respect to the counterpart comparative models. This is supported by a relatively lower value of the mean absolute percentage and root mean square error. In terms of the statistical score metrics and infographics employed to test the final model’s utility, the proposed CEEMDAN-CNN-GRU models are considerably superior compared to a standalone and other hybrid method tested on independent SSM data developed through feature selection approaches. Thus, the proposed approach can be successfully implemented in hydrology and agriculture management.


Introduction
The precise requirements for water resource supply, constant monitoring, and forecasting are changing continuously with population growth, agricultural and human activities. Any variations in weather and perturbations in climate patterns due to anthropogenicallyinduced factors affect usable water distribution and accessibility. Instead of precipitation playing a paramount role, the terrestrial water basin tends to dominate the actual functioning of the hydrological, ecological, and inter-coupled socio-economic systems [1]. Notably, the knowledge of fundamental components of water reservoirs, e.g., soil moisture (SM) and streamflow, is essential for an effective water resources management strategy. SM also governs the physical interactions between land and the atmosphere [2,3] and acts as a driver to feed irrigation systems [4], grazing and crop yield predictions [5]. A decline in groundwater reduces soil water content and the storage volume in underlying soils. A lack of soil moisture can affect agricultural and hydro-meteorological processes. Therefore, predictive models providing prior information on monitoring and forecasting water, such as in this study, are critical to soil moisture forecasts as a principal regulating factor in groundwater hydrology to understand the soil's future state.
With increasing computer power, researchers are developing intelligent models to extract features in historical data (e.g., SM). Such models demonstrate acceptable skills in forecasting hydro-metrological variables, e.g., precipitation [6][7][8][9], drought [10], streamflow [11,12], runoff [13,14], floods [15,16], soil moisture [17], water demand and water quality [18][19][20][21]. However, very few studies have focused on the prediction of soil moisture, with most examples being the artificial neural networks (ANN) [22] and the extreme learning machines (ELM) [23]. Irrespective of the model type and domain of applications, accurately forecasted soil moisture presents a greater understanding of water resources and agricultural management, leading to more sustainable decisions. Intelligent systems based on deep learning utilise feature extraction and reveal the compounded association between predictors and targets [24]. Hence, soil moisture prediction with advanced algorithms is a highly practical tool for agricultural water management. DL methods, however, are yet to be explored in the present study region (i.e., Australian Murray Darling Basin). In this study, we adopt a gated recurrent unit (GRU) neural networks as a modified long-short term memory (LSTM) that has attracted good research attention [25]. There appear to be only a few studies on GRU-based models, especially in hydrology [26,27]. Convolutional Neural Networks (CNNs) is a useful feature extraction method to improve the overall predictive process [28]. Therefore, an integration of CNN and GRU can, in foreseeable possibilities, lead to a robust pre-processing of data providing a viable option to improve the model's forecasting skill. This has been evident in some studies that integrated CNN with LSTM for improved performance, with Ghimire et al. [28] showing the superior skill of the CNN-LSTM model in the problem of solar radiation. Integration of deep learning (i.e., CNN-GRU) for soil moisture forecasting is yet to be tested explicitly, with no studies previously using this method, the focus of this study.
Given the stochastic nature of hydrological variables, multi-resolution analysis (MRA) can enhance any model's performance as a tool to reveal the data features. Conventional MRA, for example, discrete wavelet transforms (DWT), have long been implemented [29][30][31][32]. However, DWT appears to have drawbacks, and this critical issue is resolved by the maximum-overlap discrete wavelet transform (MODWT), an advanced DWT method [11,33,34]. In this study, we adopt an improved version of EMD, i.e., complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) to implement a self-adaptive decomposition of the predictor variables [23]. In CEEMDAN-based decomposition, a coefficient representing Gaussian white noise with a unit variance is added consecutively at each stage to reduce the forecasting procedure's complexity, avoiding the time series' intricacy [35]. Previous studies used CEEMDAN in forecasting soil moisture [23,36] with an earlier version (i.e., EEMD) used in forecasting streamflow [37] and rainfall [38][39][40]. Moreover, The multivariate empirical mode decomposition (MEMD) is a self-adaptive algorithm that establishes multivariate inputs to perform a proper investigation [41]. The MEMD method has been successfully applied in time series forecasting [42,43]. The study incorporates the CEEMDAN method as neither the EEMD nor the CEEMDAN decomposition approach has been assimilated with any deep learning approach (i.e., GRU) to produce a soil moisture forecast system, as attempted in the present study.
Climate indices have long been recognised as a useful synoptic-scale indicator of teleconnections representing climate variability [9,44]. La Niña, represented by climate indices, is accountable for substantial rainfall in eastern Australia, whereas the El Niño phenomenon is related to drought [45]. However, El Niño Southern Oscillation (ENSO) has a potential impact on precipitation in northern and eastern Australia [46]. Considering the substantial effects of ENSO phenomena on Australia's climate variability, some

Convolutional Neural Network
To build the CEEMDAN-CNN-GRU hybrid model trained for daily SSM forecasts, this study purposely employs the Convolutional Neural Networks (CNN) for optimal feature extraction from the input dataset. CNN's have some similarities with conventional neural networks. They are, however, different in their connectivity between and within neuronal layers. In conventional neural networks, every neuron is wholly connected to all neurons in prior layers, whereas single layer neurons do not contribute to the model's network. CNN's are similar to Feed Forward Neural Networks [54], with its model architecture having three layers based on pooling, convolutional, and fully connected layer settings.
The connected layer is employed to estimate objective variables depending on the predictor variable's input features. CNN has proven to be a reliable modelling tool to extract hidden features in inputs and generating filters capturing data features in predictors [55]. To extract the pattern in an objective variable (i.e., SSM) and associated predictor variables, each convolutional layer is established as follows [56]: Here, W k is referred to as the weight of the kernel associated with kth feature map, f is the activation function, and the operator of the convolutional procedure is denoted by In a GRU Network, two input features, including the input vector x(t) and output vector h(t − 1), are present in each layer. The yield of each gate is achieved by logical operation and non-linear transformation of predictors. Moreover, the association between predictors and predictand can be defined as follows: In a GRU Network, two input features, including the input vector x(t) and output vector h(t − 1), are present in each layer. The yield of each gate is achieved by logical operation and non-linear transformation of predictors. Moreover, the association between predictors and predictand can be defined as follows: where r(t) is the reset gate vector, z(t) is defined as the update gate vector, W and U are parameter metrics and vector. σ h is referred to as a hyperbolic tangent, and σ g is defined as a sigmoid function. Finally, given the architecture of GRU, a training approach is chosen, which includes backpropagation through time. Based on previous studies, Adam optimiser was implemented as it has enhanced expertise.

Hybrid CNN-GRU. Neural Network
In this paper, the hybrid modelling approach utilises a deep learning method built upon a feature extraction procedure under a forecast model framework. This research demonstrates how the CNN-GRU model comprised of three-layered CNN is used for feature extraction to generate future changes in the objective variable (i.e., SSM). In particular, the GRU layer is employed to integrate input features extracted by the CNN algorithm to finally forecast the target variable (i.e., SSM) with minimal training and testing error.

Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)
As elucidated in Section 1, CEEMDAN is adopted as an improved version of EMD and EEMD to perform a self-adaptive decomposition of model input signals [23] prior to modelling the target variable. The CEEMDAN decomposition process commences by discretising the n-length inputs of any model χ(t) into intrinsic mode functions (IMFs) and residues to comply with tolerability provision. Nevertheless, to ensure no leakage of information in the IMFs and residues from the training series into the future (i.e., testing and validation subset), the decomposition is performed separately for training, validation, and testing. The actual IMF is produced by taking the mean of the EMD-grounded I.M.F.s across a trial and the combination of white noise to model the predictor-target variables.
Assume that we have D-dimensional set, with n-length X i matrix (i.e., inputs selected by two-phase decomposed sub-series achieved during the decomposition) and the 1dimensional surface soil moisture as the target variable. The difference between CEEMDAN and EEMD is that in the CEEMDAN case, a restricted noise (ε i ) across [0, 1] is included at every single decomposition stage, calculated to induce the IMF to take the lead to insignificant error. Considering E j (.) as an operator producing Jth modes obtained from EMD, we follow Torres et al. [58] to implement the CEEMDAN process as follows: Step 1: The decomposition of p-realizations of χ[n] = ε 1 ω p [n] using EMD to develop their first intrinsic approach, as explained according to the equation: Step 2: Putting k = 1, the 1st residue is computed following Equation (7).
Remote Sens. 2021, 13, 554 6 of 30 Step 3: Putting k = 2, the 2nd residual is obtained as: Step 4: Setting k = 2 . . . K calculates the kth residue as: Step 5: Now we decompose the realisations Res k [n] + ε 1 E 1 (ω p [n]), Here, k = 1, . . . K until their first model of EMD is reached; here, the (k + 1) is: Step 6: Now the k value is incremented, and steps 4-6 are repeated. Consequently, the final residue is achieved: Here, K is defined as the limiting case (i.e., the highest number of modes). To comply with the replicability of the earliest input, χ[n], the following is performed for the CEEMDAN approach.
The additive noise demonstrates that signal-to-noise ratio (ε) is operated at every phase [59,60] and must connect the low magnitude with high-frequency signals in the data [61,62]. Figure 1a provides the CEEMDAN decomposed IMFs and residuals and CNN architecture.

Feature Selection: Neighbourhood Component Analysis
The selection of features within the inputs used to forecast soil moisture is vital in applying a predictive model. This is implemented to reduce the dimensionality of model inputs and computational cost, including the desired improvements in the forecasting accuracy and interpretation of the predictive model characteristics and nature of its predictors [59,[63][64][65]. This study has adopted Neighbourhood Component Analysis (NCA) based on regressions applied to segregate the potential input variables from 52 predictor variables. Introduced by Yang et al., this method uses a competent, non-rectilinear, and non-parametric implanted approach. The MATLAB function called "fsrnca" performs NCA feature selection with regularisation to learn feature weights for the minimisation of an objective function that measures the average 'leave-one-out' regression loss over the training data. The NCA process's fsrnca approach is adopted to train a variable set to better understand the importance of features through weight by minimising the objective function and calculating the regression loss of predictive model for soil moisture forecasts.
Consider training a dataset T = {(x i , y i ): i = 1, 2, 3,..., N} where x i ∈ R P is the feature vectors (i.e., predictor variables), y i ∈ R is the target (i.e., SSM), and N is the sample number for the training set. A function g(x) : R P → R is absorbed by fsrnca algorithm to forecast the response y from several input variables, optimising their nearest spaces. The weighted distance (D w ) amongst any two samples is calculated as: Remote Sens. 2021, 13, 554 7 of 30 where x a and x b are the two samples used during training, and w j is defined as the weightrelated to the jth feature. Furthermore, the probability distribution (p αβ ) is employed to increase its leave-one-out forecasting correctness in the training phase. By contrast, the probability is that x α chooses x β as its reference argument. The algorithm acquires a weighting vector 'w' for gradient the ascent method to determine the feature subset with a regularisation factor to prevent overfitting.

Study Area and Description of Predictive Model Development Dataset
For the first time, this study aims to build a new forecast for daily surface soil moisture (SSM) with convolutional-gated recurrent unit neural networks within the Australian Murray Darling Basin (MDB). The MDB covers~1,042,730 km 2 (or 14%) of mainland Australia [24,66] and~67% of agricultural lands [67]. As illustrated (Figure 2), the sites are selected based on climate class and soil type diversity, namely Menindee, Deniliquin, Fairfield, and Gabo Island. The appropriate selection of predictors related to the objective variable has a crucial role in predictive model design. To build a robust model, we adopt remotely sensed MODIS satellite-derived data identified as potential predictor variables in other studies, e.g., solar radiation prediction [24, 71,72]. We consider different studies that demonstrate the potential utility of synoptic-scale climate indices that modulate Australian rainfall and crops [41,73,74]. This study integrates three unique data (i.e., satellite-derived data, climate indices, and ground-based variables) to capture a diverse suite of predictive features to forecast SSM, enabling the deep-learning approach a significant edge over the solely station-based models.  Table 1. It should be noted that the site Gabo Island is located at the border of the MDB region for comparison purposes with the other study stations, whereas 20 lakes surround Menindee in a harsh desert environment. The site Fairfield lies within the savannah climate class with land-use patterns of dryland cropping [23]. Figure 2 also shows a histogram of monthly surface soil moisture patterns for the candidate sites. The appropriate selection of predictors related to the objective variable has a crucial role in predictive model design. To build a robust model, we adopt remotely sensed MODIS satellite-derived data identified as potential predictor variables in other studies, e.g., solar radiation prediction [24, 71,72]. We consider different studies that demonstrate the potential utility of synoptic-scale climate indices that modulate Australian rainfall and crops [41,73,74]. This study integrates three unique data (i.e., satellite-derived data, climate indices, and ground-based variables) to capture a diverse suite of predictive features to forecast SSM, enabling the deep-learning approach a significant edge over the solely station-based models.

MODIS Satellite Dataset
Our hybrid deep learning model (i.e., CEEMDAN-CNN-GRU) is built upon NASA's Geospatial Online Interactive Visualization and Analysis Infrastructure (GIOVANNI) repository (1 February 2003 to 31 March 2020). GIOVANNI represents a powerful online visualisation and analysis tool for geoscience datasets, capturing 2000 satellite variables [75,76]. In this study, MODIS-based predictor variables, presented in Table 2, are utilised to design and evaluate the hybrid CEEMDAN-CNN-GRU model for SSM forecasting. These are extracted from the GLDAS system representing the high-temporal resolution terrestrial modelling system consisting of the land surface state and several flux parameters with three temporal resolution products: hourly, daily, and monthly. Our study has used GLDAS 2.0 datasets extracted in daily temporal resolutions available publicly. The study utilised MODIS-based surface soil moisture (SSM) data as a target variable obtained from the GLDAS 2.0 model.

Scientific Information for Landowners (SILO) Dataset
To increase the pool of predictors, enabling effective feature engineering and increased performance of the DL model, this study selects nine meteorological variables from Scientific Information for Landowners (SILO): https://www.longpaddock.qld.gov.au/silo/ ppd/index.php (accessed on 31 December 2020). SILO, managed by the Department of Environment and Science, Queensland Government [77], is popular for studying the Australian climate. Table 2 provides a list of SILO data.

Climate Indices
In previous studies, e.g., [9,29,59,74] on modelling precipitation, streamflow, and soil moisture, the role of synoptic-scale and climate indices were found significant in improving the overall model. In this study, twenty-one climate indices are thus obtained from many sources: National Climate Prediction Centre, Australian Bureau of Meteorology [70], and National Oceanic and Atmospheric Administration (NOAA) with daily sea surface temperature (Nino1 + 2SST, Nino3SST, Nino3.4SST, Nino4SST) over 1 March 2003 to 31 March 2020 from KNMI Climate Explorer [78]. As the positive SOI is related to La-Nina and negative SOI concurs with El-Nino events [79,80], this study has used all of these indices due to strongly correlated rainfall with lagged SOI showing high predictability of rainfall from August-November [44,81]. To further enhance the predictive skill of the deep learning model, we consider Madden-Julian Oscillation (MJO) known to produce a substantial effect on tropical weather [70], which indeed entails a change in rainfall, wind, sea surface temperature (SST), and cloudiness [82]. Hence, eight daily MJO indices were adopted from KNMI. Climate Explorer [78], together with Interdecadal Pacific Oscillation (IPO), was introduced by Henley et al. [83], collected from NOAA National Climate Prediction Centre. Detailed information on climate indices and SSTs are in Table 2.

Predictive Model Development
To design a forecast model for SSM over multi-step periods of 1st, 5th, 7th, 14th, 21st, and 30th day lead time, three distinct datasets from satellites (i.e., GIOVANNI), climate indices, and ground source (SILO) for 17 years, 1 February 2003 to 31 March 2020 are used. Hybrid DL is implemented under Intel i7 @ 1.5 GHz and 16 GB memory. The proposed model algorithms were demonstrated using freely available DL libraries, namely the Keras [85,86] and TensorFlow [87] libraries. MATLAB 2020 software is used for Neighbourhood Component Analysis feature selection with packages matplotlib, and Minitab is used to visualise the forecasted SSM in the testing phase.
Data-driven models were built by normalising the input variables, transforming these predictors into a more consistent form [88]. To ensure the variable features were given proportional attention in network training, all were normalised [89] between (0, 1) [41,53,90].
In Equation (15), x is the respective variable, x min is the minimum value, x max is the maximum and x norm is the normalised value. After normalising the variables, the datasets are partitioned into training (February 2003-December 2013), validation (January 2014-December 2016), and testing (January 2017-March 2020) subsets. Figure 3 shows the methodological steps of the proposed CEEMDAN-CNN-GRU model. CEEMDAN is implemented in four stages. and 30th day lead time, three distinct datasets from satellites (i.e., GIOVANNI), climate indices, and ground source (SILO) for 17 years, 1 February 2003 to 31 March 2020 are used. Hybrid DL is implemented under Intel i7 @ 1.5 GHz and 16 GB memory. The proposed model algorithms were demonstrated using freely available DL libraries, namely the Keras [85,86] and TensorFlow [87] libraries. MATLAB 2020 software is used for Neighbourhood Component Analysis feature selection with packages matplotlib, and Minitab is used to visualise the forecasted SSM in the testing phase.
Data-driven models were built by normalising the input variables, transforming these predictors into a more consistent form [88]. To ensure the variable features were given proportional attention in network training, all were normalised [89] between (0, 1) [41,53,90].
In Equation (15), is the respective variable, is the minimum value, is the maximum and is the normalised value. After normalising the variables, the datasets are partitioned into training (February 2003-December 2013), validation (January 2014-December 2016), and testing (January 2017-March 2020) subsets. Figure 3 shows the methodological steps of the proposed CEEMDAN-CNN-GRU model. CEEMDAN is implemented in four stages.

Feature Selection
By incorporating the MODIS satellite and ground and climate indices, this study has utilised 52 different predictors for SSM forecasting; hence, feature selection was crucial for data pre-processing. This is because irrelevant and redundant features increase the network size, congestion and cause a reduction in the algorithm's speed, reducing the efficiency of the predictive model [91]. Therefore, our study has used the NCA algorithm to screen an optimal set of predictor variables out of the 52-variable set. In general, fsrnca calculates every predictor's relative weight against a target (SSM), illustrated in Figure 4. Following this, the standalone GRU and hybrid CNN-GRU models were executed with predictors added one by one from the highest feature to the lowest feature weight until an optimal performance was achieved. Figure 5 illustrates the the relative root mean squared error (RRMSE) value of different combinations prepared based on NCA. Tables A1-A6 shows the GRU and CNN-GRU model's performance accordingly.

Feature Selection
By incorporating the MODIS satellite and ground and climate indices, this study has utilised 52 different predictors for SSM forecasting; hence, feature selection was crucial for data pre-processing. This is because irrelevant and redundant features increase the network size, congestion and cause a reduction in the algorithm's speed, reducing the efficiency of the predictive model [91]. Therefore, our study has used the NCA algorithm to screen an optimal set of predictor variables out of the 52-variable set. In general, fsrnca calculates every predictor's relative weight against a target (SSM), illustrated in Figure 4. Following this, the standalone GRU and hybrid CNN-GRU models were executed with predictors added one by one from the highest feature to the lowest feature weight until an optimal performance was achieved. Figure 5 illustrates the the relative root mean squared error (RRMSE) value of different combinations prepared based on NCA. Tables A1-A6 shows the GRU and CNN-GRU model's performance accordingly.   Table 2. Figure 4 illustrates the respective feature weights of predictor variables, using the Menindee station as an example. For the 1st day of SSM forecasting, the root zone soil moisture (kg m −2 ) is found to generate the highest feature weight, whereas, for the 5th day, groundwater storage (mm) is found to be the most significant feature weight. Notably, the groundwater storage contributed to the largest feature weighted for the 7th, 14th, 21st, and 30th day SSM forecasting. This evaluation indicates that groundwater has a strong influence on SSM over inter-daily scales. Tables S1-S6 illustrates the input combination for SSM forecasting in the nth day lead period with their respective forecasting performance with CNN-GRU and GRU model. It is imperative to note that fsrnca algorithm is used in two distinct phases before applying the hybrid-deep learning (i.e., CEEMDAN-CNN-GRU) model. In the first phase, fsrnca attains the feature weights and acquires the optimal predictor variable list required for SSM forecasts. Subsequently, the second phase incorporates the data decomposition process utilising CEEMDAN to each variable selected from the feature weights. Finally, the feature weight is calculated for IMF (t) deduced for each predictor variable against the objective variable (i.e., SSM). Here, the term t refers to the number of IMFs for each variable, removing four to five least significant features from the hybrid CEEMDAN-CNN-GRU model. Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 31  Figure 4 illustrates the respective feature weights of predictor variables, using the Menindee station as an example. For the 1st day of SSM forecasting, the root zone soil moisture (kg m −2 ) is found to generate the highest feature weight, whereas, for the 5th day, groundwater storage (mm) is found to be the most significant feature weight. Notably, the groundwater storage contributed to the largest feature weighted for the 7th, 14th, 21st, and 30th day SSM forecasting. This evaluation indicates that groundwater has a strong influence on SSM over inter-daily scales. Tables S1-S6 illustrates the input combination for SSM forecasting in the n th day lead period with their respective forecasting performance with CNN-GRU and GRU model. It is imperative to note that fsrnca algorithm is used in two distinct phases before applying the hybrid-deep learning (i.e., CEEMDAN-CNN-GRU) model. In the first phase, fsrnca attains the feature weights and acquires the optimal predictor variable list required for SSM forecasts. Subsequently, the second phase incorporates the data decomposition process utilising CEEMDAN to each variable selected from the feature weights. Finally, the feature weight is calculated for IMF (t) deduced for each predictor variable against the objective variable (i.e., SSM). Here, the term t refers to the number of IMFs for each variable, removing four to five least significant features from the hybrid CEEMDAN-CNN-GRU model.

Hybrid Deep Learning Algorithm Implementation
Before applying the CEEMDAN-CNN-GRU model in the 1st, 5th, 7th, 14th, 21st, and 30th day SSM forecasts, hyperparameter selection is undertaken through a grid search procedure whose theoretical descriptions are provided in Section 2. Table 3 shows the hyperparameters, optimal GRU architecture, and CNN-GRU with input combinations deduced from the feature weight matrix. Finally, the deep learning forecast model combining a data decomposition (i.e., CEEMDAN) stage with a three-layered feature extraction stage (i.e., CNN) and feature selection stage (i.e., fsrnca) is implemented to forecast SSM.
The proposed CEEMDAN-CNN-GRU model is implemented in four stages, as shown in Figure 3. Firstly, CEEMDAN is applied to decompose historical training data into IMFs and residual signals (Figure 1a) followed by segregation of each IMFs and residual, such as collecting all the IMF1 for predictor variables. The relative feature weights of respective IMFs related to IMF of the target variable (i.e., SSM) are determined. The optimal signal selection enables the algorithm to remove the least important feature-weighted IMFs, allowing the predictive model network to be noise-free. Finally, the forecasted SSM utilising the CEEMDAN-based model (i.e., the hybrid CEEMDAN-CNN-GRU) is obtained by aggregating the IMFs of the predictor variables. The robustness of the model is investigated by several evaluation criteria (Section 3.2.3).  It is worth noting that climate indices (CIs) have a notable signature of climate variability in Australia, leading to substantial influence on rainfall and a potential effect on future surface soil moisture patterns. In the final task, climate indices' relative contribution to building the CEEMDAN-CNN-GRU model is assessed by Multivariate Adaptive Regression Splines (MARS) utilising the ARESLab toolbox. Following Friedman [86], MARS can determine each predictor variable's significance by evaluating its complex and non-linear interaction with the target (i.e., SSM) based on best regressors and provide the importance of each variable. The relative importance of any predictor variable is the square root of GCV (Generalised Cross-Validation) with all basic functions involving the respective variable minus the root square of the GCV score of that full model. However, this process is scaled in such a way that the relative importance has a value of 100, expressed: Here, enp is the significant number of model parameters, p = k + c (k − 1)/2; k = basis function in MARS model; c = penalty (set to 2 or 3). However, if enp is greater or equal to N, GCV is an Inf, which indicates the model is flawed [92].

Predictive Model Evaluation
The efficacy of deep learning hybrid model is evaluated using different performance evaluation criteria e.g., Pearson's Correlation Coefficient (r), root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE) [93], mean absolute error (MAE), and Kling-Gupta efficiency [94]. Due to geographic differences between the study stations, we employ relative error-based metrics: i.e., relative RMSE (denoted as RRMSE) and relative MAE (denoted as RMAE). The appraisal of a predictive model's efficacy depends on the exactness between the predicted and observed values. RMSE is an appropriate measure of model performance compared to MAE when the error distribution in the tested data is Gaussian [95] but for an improved model evaluation, the Willmott's Index (WI) and Legates-McCabe's (LM) Index are used as more sophisticated and compelling measures [96,97]. Mathematically, the metrics are as follows: Correlation coefficient (r): Mean absolute error (MAE: kg m −2 ): Root mean squared error (RMSE: kg m −2 ): Nash-Sutcliffe Efficiency (NSE): Kling-Gupta efficiency (KGE): Mean Absolute Percentage Error (MAPE, %): Willmott's Index (WI): Legates-McCabe's Index (LM): Relative Root Mean Squared Error (RRMSE, %): Relative Mean Absolute Error (RMAE, %): Absolute percentage bias (APB, %): In Equations (17)- (27), SSM obs and SSM f or represents the observed and forecasted values for ith test value; SSM obs and SSM f or refer to their averages, accordingly, and N is defined as the number of observations, while CV stands for the coefficient of variation. CV is a standardised measure of the dispersion of the frequency distribution.

Results
The practical utility of the hybrid DL (i.e., CEEMDAN-CNN-GRU) model is established by integrating diverse data in its training and model testing phase. Significant features from predictor variables are used by incorporating NCA, and the predictive model is evaluated using statistical metrics (Equations (17)-(27)), infographics, and visualisations to appraise the degree of agreements between simulated and observed soil moisture. By several measures, the CEEMDAN-CNN-GRU model appears to outperform all the comparative models with superior r and NSE and low RMSE, MAE, and APB in the testing phase. An extensive analysis of tabulated results (Table 4) provides convincing arguments that the hybrid deep learning method is effective for surface soil moisture forecasts and can perhaps be a potential tool in agriculture water management. However, among all study sites, the CEEMDAN-CNN-GRU model for the Menindee station showed the best performance, considering r (0.996), NSE (0.995), and lowest RMSE (0.021), MAE (0.013), and APB (0.359) values for the 1st day of SSM forecasting. The performance of this model is followed by the CEEMDAN-GRU and CNN-GRU model.
For the 5th day of SSM forecasting, the results of the objective model for Menindee had the best performance (r = 0.993; NSE = 0.991; RMSE = 0.040 kg m −2 ) followed by Deniliquin (r = 0.989; NSE = 0.975; RMSE = 0.091 kg m −2 ). Likewise, for the 7th, 14th, 21st, and 30th days of SSM forecasting, the CEEMDAN-CNN-GRU model outperformed the other models by a notable margin for all the respective periods of SSM forecasting. However, a site-specific signature in the model accuracy was also evident, with the results for Menindee registering the lowest value of RMSE generated by the CEEMDAN-CNN-GRU model. In terms of MAE, the CEEMDAN-CNN-GRU model returned the lowest value for Menindee, suggesting that the CEEMDAN-CNN-GRU model was a potential forecasting tool SSM at the 1st, 5th, and 7th day ahead periods. Not surprisingly, in accordance with other studies, e.g., the present study indicates that as the length of the forecasting period was increased, the model's performance appear to reduce at a significant rate in such a way that the r-values reduced by 0.30%, 1.10%, 9.15%, 11% and 15% for the 1st to 5th, 7th, 14th, 21st and 30th day of SSM forecasting. The change of the performance metrics (i.e., NSE, MAE, and APB) for longer-term horizons relative to the shorter-term horizons also concurred with the respective changes in the r-values and is consistent with earlier studies [60,98]. For a longer-term horizon, the present r value was lower, and the MAE increased, suggesting that for the longer forecast horizon, the model appeared to lose the relevant data features in the predictor variables required to maintain precise SSM forecasting performance. The hybrid CEEMDAN-CNN-GRU model is further evaluated using a probability plot of errors at the 95th percentile, including those of the benchmark model (i.e., CNN-GRU, CEEMDAN-GRU) and the standalone model (i.e., GRU) with an illustration for Menindee at the different nth (n = 1, 5, 7, 14, 21 and 30) days ( Figure 6). The CEEMDAN-CNN-GRU model results show that~95% of SSM forecasting had the lowest error (<0.1) for the 1st and 5th days of SSM forecasting. Among all the predictive models and the forecast periods over nth days, the GRU-based model showed a more significant proportion of |FE| values at a 95% confidence level. Notably, consistently good results were also achieved for the other stations (i.e., Deniliquin, Fairfield, and Gabo Island), which are shown in supplementary materials ( Figure S1a-c). The lowest value of |FE|, with <0.063 with a 95% percentile, was evident for Fairfield compared to the other two study stations. The correlation between observed and forecasted daily surface soil moisture datasets generated by the proposed CEEMDAN-CNN-GRU model vs. the corresponding benchmark models (i.e., CNN-GRU and GRU), for the case of Menindee station, is illustrated in Figure 7. The correlations for the hybrid GRU model are positioned close to the observed SSM values up to the 7th day, revealing a high degree of forecasting accuracy. An improvement in the model's forecasting performance was attained by applying the CNN algorithm (i.e., soil moisture generated by the CNN-GRU model) and data decomposition (i.e., CEEMDAN-CNN-GRU) method on standalone GRU model. The disparity between the forecasted SSM and the reference SSM values was significantly higher for the 14th, 21st, and 30th days of SSM forecasting, which concurs with earlier metrics suggesting a potential inadequacy of the data features long time ahead periods [60]. Table 4. Evaluation of hybrid CEEMDAN-CNN-GRU vs. benchmark (CNN-GRU, CEEMDAN-GRU, GRU) models for the specific case of Menindee study site. The correlation coefficient (r), root mean square error (RMSE; Kg m −2 ), mean absolute error (MAE; Kg m −2 ), and Nash-Sutcliffe coefficient, NS) is computed between forecasted and observed surface soil moisture for the 1st day, 5th day, 7th day, 14th day, 21st day, and 30th day ahead periods in the testing phase. The optimal model is boldfacede.      To further analyse the tested models' performances, we adopt the Legates and McCabe's Index [99] as a cross-validation metric for simulated data. This metric has a better model penalisation skill when high SSM values are expected in the testing set [41]. This is illustrated in Figure 9 in terms of a polar plot of the LM values for the hybrid deep learning approach (i.e., CEEMDAN-CNN-GRU) and other models for the different day ahead forecasting. The LM values accumulated across all stations in the case of CEEMDAN-CNN-GRU have a superior result with the highest LM ≈ 0.962 for Menindee and the lowest LM for the case of Gabo Island (LM ≈ 0.846) in the 1st Day ahead SSM forecasting. In agreement with earlier results, the LM values for the 14th, 21st, and 30th day ahead for other models were comparatively smaller. Figure 10a     To further analyse the tested models' performances, we adopt the Legates and Mc-Cabe's Index [99] as a cross-validation metric for simulated data. This metric has a better model penalisation skill when high SSM values are expected in the testing set [41]. This is illustrated in Figure 9 in terms of a polar plot of the LM values for the hybrid deep learning approach (i.e., CEEMDAN-CNN-GRU) and other models for the different day ahead forecasting. The LM values accumulated across all stations in the case of CEEMDAN-CNN-GRU have a superior result with the highest LM ≈ 0.962 for Menindee and the lowest LM for the case of Gabo Island (LM ≈ 0.846) in the 1st Day ahead SSM forecasting. In agreement with earlier results, the LM values for the 14th, 21st, and 30th day ahead for other models were comparatively smaller. Figure 10a,b is a contour plot of KGE and MAPE for the hybrid DL approach (i.e., CEEMDAN-CNN-GRU) along with its benchmark (i.e., CNN-GRU) and standalone (i.e., GRU) methods for all four stations in MDB at different nth (n = 1, 5, 7, 14, 21 and 30) days in forecasting SSM. This infographic verifies the robustness of the proposed objective model that attains the highest KGE values and the lowest MAPE values for 1st and 5th day of SSM forecasting. gure 8. Scatter plot of the forecasted and observed SSM for Menindee, Deniliquin, Fairfield, and Gabo Island stations at fferent n th (n = 1 and 7) day ahead. A least square regression line, y = mx + C, and coefficient of determination (R 2 ) is own in each sub-panel.  However, for the 14th, 21st, and 30th day of SSM forecasting, the KGE values range between 0.40 and 0.80, and the MAPE values range from 4-11%, demonstrating a slightly lower forecast accuracy relative to the 1st and 5th day of SSM forecasting. Figure 11 illustrates the absolute forecasted error (|FE|) using all the four candidate study sites' implemented models. The box plot demonstrates the data dispersal in terms of the forecasted (SSM for ) SSM. Figure 11     It is noteworthy that in this study, two distinct algorithms, namely the CEEMDAN and CNN, are used to improve the GRU-based predictive model. Therefore Figure 12 shows the effect of applying CEEMDAN and CNN as data pre-processing and feature extraction methods incrementally, respectively, on the per cent change in RMAE values within the testing SSM values. In terms of 1st, 5th, and 7th day of Menindee station, the RMAE (%) values of CEEMDAN-CNN-GRU model (where both CEEMDAN and CNN are integrated) appeared to decrease by~87%, 68%, and 54%, respectively. Similarly, for the 1st-day forecasting taking the example of Fairfield station, the CNN feature-extraction skill reduced the error of~55%, whereas an additional decrease in RMAE of~18% was noted integration of the CEEMDAN selected variables (CEEMDAN-CNN-GRU). Additionally, for Deniliquin and Gabo Island study sites, the SSM forecasting for the 1st day ahead evaluated through RMAE values decreased by slightly less than 20%. It is worth mentioning that the per cent increase in RMAE was~5% for Menindee for the 30th day ahead SSM forecasting with similar deductions for the other stations.  We further show the CEEMDAN-CNN-GRU hybrid model's skill for seasonal forecasting for the different day ahead periods to better understand the seasonal effects of models used in SSM prediction. Figure 13 displays the average observed vs forecasted SSM on a seasonal basis (i.e., austral summer, autumn, winter, and spring) generated by CEEMDAN-CNN-GRU model in case of Menindee study site. The forecast error across these seasons is relatively insignificant, occupying values of (0, 0.16) kg m −2 to demonstrate the exceptional skill of the objective model. Notably, the 1st and 5th day ahead of observed and forecasted SSM for austral summer, spring, winter, and autumn appear to match with the forecast error (|FE|) < 0.04 kg m −2 , whereas, for winter, the |FE| values are slightly higher for the 5th day ahead SSM forecasting. Not surprisingly, the CNN-GRU model possesses a larger error, ranging from 0.04 to 0.18 kg m −2 , establishing the CNN-GRU model's relatively poor performance compared with the hybrid CEEMDAN-CNN-GRU model. For the case of the 30th day ahead SSM forecasting, the study site Menindee registered a higher uncertainty for austral summer (0.18<|FE|< 0.18 kg m −2 ) compared with winter and spring (0.14 < FE < 0.15 kg m −2 ). This indicates that the hybrid CEEMDAN-CNN-GRU model developed with NCA and CEEMDAN algorithms employing MODIS-derived satellite data, ground-based observations, and climate indices can be considered ideal in multi-step SSM forecasting.

Discussions
Based on the results, we note the effects of climate indices on surface soil moisture as non-negligible. In this paper, analysing this impact is undertaken using two ways. Firstly, the NCA algorithm provides key information about how climate indices affect SSM. For example, for SSM forecasting, climate indices based on SOI, EPO, MJOs and SST were found to significantly affect the SSM. Secondly, GCV values based on a MARS model were calculated following Friedman [100] approach to deduce the importance of input features. The contributory influence appeared to be between 12% and 53% according to GCV for

Discussions
Based on the results, we note the effects of climate indices on surface soil moisture as non-negligible. In this paper, analysing this impact is undertaken using two ways. Firstly, the NCA algorithm provides key information about how climate indices affect SSM. For example, for SSM forecasting, climate indices based on SOI, EPO, MJOs and SST were found to significantly affect the SSM. Secondly, GCV values based on a MARS model were calculated following Friedman [100] approach to deduce the importance of input features. The contributory influence appeared to be between 12% and 53% according to GCV for Menindee station, and similarly notable effect for the other study sites. Specifically, the lowest percentage of~12% of GCV was found for the 14th, and the highest percentage (~53%) was found for the 28th Day ahead SMM forecasting. In a nutshell, we note that climate indices make a moderate to high contribution in forecasting surface soil moisture within the Murray Darling Basin.
Neighbourhood Component Analysis (NCA) was utilised to examine significant features from a relatively large pool (or 52 different) data related to soil moisture. In data-driven modelling, selecting predictor variables is crucial, as improper variables with weak relationships against SSM can lead to undesirable uncertainties in the model. As per evaluations in Tables S1-S6, a combination of predictor variables deduced by NCA at six different lead time SSM forecasting was significant, and this result concurred with previous studies [59,101].
The objective approach based on NCA yielded good accuracy (i.e., CEEMDAN-CNN-GRU), demonstrating that best predictors were attained through a careful variable selection stage (by NCA) and feature extraction stage (by CNN and CEEMDAN methods). Accordingly, the proposed forecast model for SSM was sufficiently robust in daily and seasonal tests, as well as through the inclusion of synoptic-scale features, i.e., those captured from patterns in the SST and MJO series. The probability of absolute error placing within the 95th percentile and the substantial seasonal forecasting of SSM indicates that the model can handle satellite-derived variables' error. Our study also suggests that groundwater recharge, deep percolation, and plant uptake, which are essential factors to concentrate soil moisture in different layers [57], can be ideal variables to better understand SSM characteristics while also assisting in the prediction of future changes.
The present model's performance revealed that a shorter period forecast (i.e., 1st, 5th, or 7th) was more precise, whereas a longer forecast horizon (i.e., 14th, 21st, and 30th) registered a lower accuracy than that of the shorter span of SSM forecasting. One plausible reason for this is that our predictive model appeared to struggle to capture enough input features from the dataset for a more extended time-step forecast (i.e., 30th day against 7th day). Considering the reduction in feature capturing capability of the model, we can say that as the time series data approached close to the 7th-day boundary, the model would capture it with good forecast accuracy. Undoubtedly, this occurs due to a loss of data features in the predictor-target matrix. This indeed concurs with earlier studies (e.g., [60,98], where models for the 1-and 2-day ahead modelling horizon was more accurate than the 30-day horizon for river flow forecasting, and the 1-and 3-month runoff model was more accurate than the 6-month runoff model predicting 1-, 3-, and 6-month ahead runoff in the Yingluoxia watershed, Northwestern China. The hybrid deep learning approach (i.e., CEEMDAN-CNN-GRU) incorporated with MODIS satellite-derived data, ground-based SILO data, and climate mode indices (representing synoptic-scale climate features) can be a good modelling tool to predict soil moisture or other hydrological variables at multi-step lead times, including its future use in water resource management and sustainable agriculture.

Conclusions
This study reports the performance efficacy of a DL data-driven (CEEMDAN-CNN-GRU) model based on the Gated Recurrent Unit (GRU) for daily surface soil moisture forecasting at multi-step horizons. The hybrid CEEMDAN-CNN-GRU model was built by integrating MODIS sensors (satellite-derived data), ground-based observations, and climate indices tested at important stations in the Australian Murray Darling Basin. To attain an accurate and reliable model for soil moisture, a feature extraction (i.e., CNN) and feature (or variable) selection algorithm (i.e., NCA) was used, with tests at 1st, 5th, 7th, 14th, 21st, and 30th day ahead period. The input variables, comprised initially of 52 different predictors, were extracted from March 2003 to March 2020 and screened accordingly, using the NCA algorithm through a feature selection stage, to select the most relevant input variables required to forecast daily-scale soil moisture. Three other benchmarking models (i.e., CEEMDAN-GRU, CNN-GRU, and GRU) were built and evaluated against statistical score metrics and visual analysis to ascertain the predictive skill of the objective model of observed and forecasted datasets in the testing phase. The results revealed that NCA was a practical approach to acquire the best features from an optimal set of predictor variables. The hybrid CEEMDAN-CNN-GRU model has significantly improved the decomposition of input variables to provide more defined soil moisture prediction features. Thus, the proposed CEEMDAN-CNN-GRU model yielded an acceptable level of accuracy when applied at the 1st, 5th, and 7th day ahead SSM forecasting against standalone GRU model registering a comparatively higher forecast error at all these periods. This superior performance was also endorsed with low MAE values, ranging from 0.013 kg m −2 to 0.067 kg m −2 , 0.030 kg m −2 to 0.075 kg m −2 , and 0.057 kg m −2 to 0.113 kg m −2 for the 1st, 5th, and 7th day ahead period. Other results also supported the practical utility of the CEEMDAN-CNN-GRU model. For example, the probability plot of absolute error for Menindee station has 95% of SSM forecasting with the lowest error bracket (<0.1) at the 1st, and 5th day SSM prediction, and these results were better than earlier studies on forecasting soil moisture prediction, e.g., [23,36,59,102]. As the present study has focused on daily scale prediction, in a future study, researchers may also adopt the CEEMDAN-CNN-GRU model to utilise the global climate model (GCM) model-simulated variables to estimate future SSM under global warming scenarios.
Supplementary Materials: The following are available online at https://www.mdpi.com/2072-429 2/13/4/554/s1, Table S1: Performance of CNN-GRU and GRU model to forecast the 1st day Surface Soil moisture of Minendee Station with the optimum forecasting results based on the Nash-Sutcliffe coefficient (NS) and mean absolute error (MAE; Kg m −2 ) for the testing phase, Table S2: Caption  identical to Table S1, except for the 5th day. Table S3: Caption identical to Table S1, except for the 7th day, Table S4: Caption identical to Table S1, except for the 14th day, Table S5: Caption identical  to Table S1, except for the 21st day, Table S6: Caption identical to Table S1, except for the 30th day, Figure