Modelzoo

The following section gives an overview of all implemented models in NeuralHydrology. Conceptually, all models in our package consist of two parts: the model class (which constitutes the core of the model as such) and the model head (which relates the outputs of the model class to the predicted variables). The section Model Heads provides a list of all implemented model heads, and the section Model Classes a list of all model classes. If you want to implement your own model within the package, the section Implementing a new model is the best place to start; it provides the necessary details to do so.

Model Heads

The head of the model is used on top of the model class and relates the outputs of the Model Classes to the predicted variable. Currently four model heads are available: Regression, GMM, CMAL and UMAL. The latter three heads provide options for probabilistic modelling. A detailed overview can be found in Klotz et al. “Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling”.

Regression

neuralhydrology.modelzoo.head.Regression provides a single-layer regression head that includes different activation options for the output (namely linear, relu, and softplus).

It is possible to obtain probabilistic predictions with the regression head by using Monte-Carlo Dropout, which is enabled in the config.yml by setting mc_dropout. The sampling behavior is governed by the number of samples (n_samples) and the approach for handling negative samples (negative_sample_handling).
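
As a minimal config.yml sketch, a regression head with Monte-Carlo Dropout might be configured as follows (the concrete values, and the negative_sample_handling option shown, are illustrative assumptions; check the configuration documentation for the authoritative list of options):

head: regression
output_activation: linear
mc_dropout: True
n_samples: 100
negative_sample_handling: clip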

GMM

neuralhydrology.modelzoo.head.GMM implements a Gaussian Mixture Model head. That is, a mixture density network with Gaussian distributions as components. Each Gaussian component is defined by two parameters (mean and variance), and the components are combined by a set of weights. The current implementation of the GMM head uses two layers. Specific output activations are used for the variances (torch.exp()) and the weights (torch.softmax()).

The number of components can be set in the config.yml using n_distributions. Additionally, the sampling behavior at inference time is defined in the config.yml by setting the number of samples (n_samples) and the approach for handling negative samples (negative_sample_handling).
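
For example, a GMM head with three components and inference-time sampling might be configured like this (the values are illustrative):

head: gmm
n_distributions: 3
n_samples: 100
negative_sample_handling: clip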

CMAL

neuralhydrology.modelzoo.head.CMAL implements a Countable Mixture of Asymmetric Laplacians head. That is, a mixture density network with asymmetric Laplace distributions as components. The name is a homage to UMAL, which provides an uncountable extension. The CMAL components are defined by three parameters (location, scale, and asymmetry) and linked by a set of weights. The current implementation of the CMAL head uses two layers. Specific output activations are used for the component scales (torch.nn.Softplus(2)), the asymmetries (torch.sigmoid()), and the weights (torch.softmax()); in our preliminary experiments, this heuristic achieved better results.

The number of components can be set in the config.yml using n_distributions. Additionally, one can sample from CMAL; this behavior is defined by setting the number of samples (n_samples) and the approach for handling negative samples (negative_sample_handling).

UMAL

neuralhydrology.modelzoo.head.UMAL implements an Uncountable Mixture of Asymmetric Laplacians head. That is, a mixture density network that uses an uncountable set of asymmetric Laplace distributions as components. The uncountable property is achieved by implicitly learning the conditional density and approximating it, when needed, by Monte-Carlo integration over sampled asymmetry parameters. The UMAL components are defined by two parameters (location and scale) and linked by a set of weights. The current implementation uses two hidden layers. The output activation for the scale differs notably from the original implementation in that it is upper-bounded (using 0.5*torch.sigmoid()).

During inference, the number of components and weights used for the Monte-Carlo approximation is defined in the config.yml by n_taus. The additional argument umal_extend_batch makes it possible to explicitly account for this integration step during training by repeatedly sampling the asymmetry parameter and extending the batch by n_taus. Furthermore, depending on the output activation used, the sampling of the asymmetry parameters can yield unwarranted model behavior; therefore, the lower and upper bounds of the sampling can be adjusted using the tau_down and tau_up options in the config.yml. The sampling for UMAL is defined by choosing the number of samples (n_samples) and the approach for handling negative samples (negative_sample_handling).
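
A possible config.yml sketch combining the UMAL options described above (all values are illustrative, and the bounds shown for tau_down and tau_up are assumptions):

head: umal
n_taus: 100
umal_extend_batch: True
tau_down: 0.01
tau_up: 0.99
n_samples: 100
negative_sample_handling: clip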

Model Classes

BaseModel

Abstract base class from which all models derive. Do not use this class for model training.

ARLSTM

neuralhydrology.modelzoo.arlstm.ARLSTM is an autoregressive long short-term memory (LSTM) network that assumes one input is a time-lagged version of the output. All features (x_d, x_s, x_one_hot) are concatenated and passed to the timeseries network at each time step, along with a binary flag that indicates whether the autoregressive input (i.e., lagged target data) is missing (False) or present (True). The length of the autoregressive lag can be specified in the config file by specifying the lag on the autoregressive input. Any missing data in the autoregressive inputs is imputed with appropriately lagged model output, and gradients are calculated through this imputation during backpropagation. Only one autoregressive input is currently supported, and it is assumed to be the last variable in the x_d vector. This model uses a standard PyTorch LSTM cell but runs it one timestep at a time, and is therefore significantly slower than the CudaLSTM.
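
As a rough sketch of such a setup, assuming the lagged_features config argument is the mechanism used to create the time-lagged copy of the target, and that the generated feature is listed last under dynamic_inputs (the feature names, the generated _shift1 suffix, and the model abbreviation are assumptions for illustration):

model: arlstm
dynamic_inputs:
    - prcp(mm/day)
    - QObs(mm/d)_shift1
lagged_features:
    QObs(mm/d): 1
target_variables:
    - QObs(mm/d)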

CudaLSTM

neuralhydrology.modelzoo.cudalstm.CudaLSTM is a network using the standard PyTorch LSTM implementation. All features (x_d, x_s, x_one_hot) are concatenated and passed to the network at each time step. If statics/dynamics_embedding are used, the static/dynamic inputs will be passed through embedding networks before being concatenated. The initial forget gate bias can be defined in config.yml (initial_forget_bias) and will be set accordingly during model initialization.
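
A minimal config.yml sketch for a CudaLSTM with a static embedding network (the hidden size, forget gate bias, and embedding specification are illustrative; of the embedding sub-keys, only hiddens is referenced elsewhere on this page, the others are assumptions):

model: cudalstm
hidden_size: 128
initial_forget_bias: 3
statics_embedding:
    type: fc
    hiddens:
        - 30
    activation: tanh
    dropout: 0.0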

CustomLSTM

neuralhydrology.modelzoo.customlstm.CustomLSTM is a variant of the CudaLSTM that returns all gate and state activations for all time steps. This class is mainly implemented for exploratory purposes. You can use the method model.copy_weights() to copy the weights of a CudaLSTM model into a CustomLSTM model. This makes it possible to use the fast CUDA implementation for training and to use this class only for inference, when more detailed outputs are needed. You can also use this model during training (model: customlstm in the config.yml) or as a starting point for your own modifications to the LSTM cell. Note, however, that the runtime of this model is considerably slower than that of its optimized counterparts.

EA-LSTM

neuralhydrology.modelzoo.ealstm.EALSTM is an implementation of the Entity-Aware LSTM, as introduced in Kratzert et al. “Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets”. The static features (x_s and/or x_one_hot) are used to compute the input gate activations, while the dynamic inputs x_d are used in all other gates of the network. The initial forget gate bias can be defined in config.yml (initial_forget_bias). If statics/dynamics_embedding are used, the static/dynamic inputs will first be passed through embedding networks. The output of the static embedding network will then be passed through the input gate, which consists of a single linear layer.

EmbCudaLSTM

Deprecated since version 0.9.11-beta: Use CudaLSTM with statics_embedding.

neuralhydrology.modelzoo.embcudalstm.EmbCudaLSTM is similar to CudaLSTM, with the only difference that static inputs (x_s and/or x_one_hot) are passed through an embedding network before being concatenated to the dynamic inputs x_d at each time step.

GRU

neuralhydrology.modelzoo.gru.GRU is a network using the standard PyTorch GRU implementation. All features (x_d, x_s, x_one_hot) are concatenated and passed to the network at each time step. If statics/dynamics_embedding are used, the static/dynamic inputs will be passed through embedding networks before being concatenated.

Hybrid-Model

neuralhydrology.modelzoo.hybridmodel.HybridModel is a wrapper class to combine data-driven methods with conceptual hydrological models. Specifically, an LSTM network is used to produce a dynamic parameterization for a conceptual hydrological model. The inputs for the model are split into two groups: i) the inputs going into the LSTM (dynamic_inputs, static_attributes, etc.) and ii) the inputs going into the conceptual model (dynamic_conceptual_inputs). If the features used in the data-driven part are also used in the conceptual model, one should use the duplicate_features configuration argument. Due to the mass-conservative structure of the conceptual part, one also has to add the input features of the conceptual model and the target variable to custom_normalization, for example:

dynamic_inputs:
    - prcp(mm/day)
duplicate_features:
    - prcp(mm/day)
dynamic_conceptual_inputs:
    - prcp(mm/day)_copy1
custom_normalization:
    prcp(mm/day)_copy1:
        centering: None
        scaling: None
    QObs(mm/d):
        centering: None
        scaling: None

Mamba

neuralhydrology.modelzoo.mamba.Mamba is a state space model (SSM) using the PyTorch implementation https://github.com/state-spaces/mamba/tree/main from Gu and Dao (2023).

There are two required dependencies for Mamba: mamba_ssm and causal-conv1d, which provide the Mamba SSM layer and a simple causal Conv1d layer used inside the Mamba block, respectively. Note the version requirement: causal-conv1d>=1.1.0.

There are three hyperparameters which can be set in the config file:

  • mamba_d_conv: local convolution width (default: 4)

  • mamba_d_state: SSM state expansion factor (default: 16)

  • mamba_expand: block expansion factor (default: 2)
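
A config.yml sketch using the defaults listed above (the model abbreviation is assumed to follow the usual lowercase class-name convention):

model: mamba
mamba_d_conv: 4
mamba_d_state: 16
mamba_expand: 2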

MC-LSTM

neuralhydrology.modelzoo.mclstm.MCLSTM is a mass-conserving model architecture inspired by the LSTM, proposed by Hoedt et al. (2021). The implementation included in this library is the exact model configuration that was used for the hydrology experiments in the linked publication (for details, see Appendix B.4.2). The inputs for the model are split into two groups: i) the mass input, whose values are stored in the memory cells of the model and from which the target is calculated, and ii) auxiliary inputs, which are used to control the gates within the model. In this implementation, only a single mass input per timestep (e.g. precipitation) is allowed, which has to be specified with the config argument mass_inputs. Make sure to exclude the mass input feature, as well as the target variable, from the standard feature normalization. This can be done using the custom_normalization config argument by setting the centering and scaling keys to None. For example, if the mass input is named “precipitation” and the target feature is named “discharge”, this would look like this:

custom_normalization:
    precipitation:
        centering: None
        scaling: None
    discharge:
        centering: None
        scaling: None

All inputs specified by the dynamic_inputs config argument are used as auxiliary inputs, as are (possibly embedded) static inputs (e.g. catchment attributes). The config argument head is ignored for this model and the model prediction is always computed as the sum over the outgoing mass (excluding the trash cell output).
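
Putting the config arguments mentioned above together, an MC-LSTM setup for the precipitation/discharge example might be sketched as follows (the auxiliary input name and the model abbreviation are placeholders/assumptions; the custom_normalization block from above still applies):

model: mclstm
mass_inputs:
    - precipitation
dynamic_inputs:
    - temperature
target_variables:
    - discharge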

MTS-LSTM

neuralhydrology.modelzoo.mtslstm.MTSLSTM is the model proposed by Gauch et al. “Rainfall–Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network”. This model allows training on more than one temporal resolution (e.g., daily and hourly inputs) and returns multi-timescale model predictions accordingly. A more detailed tutorial will follow shortly.
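
As a rough sketch of a daily-plus-hourly setup (the per-frequency form of seq_length and the model abbreviation are assumptions for illustration; use_frequencies is the config argument referenced in the template docstring at the end of this page):

model: mtslstm
use_frequencies:
    - 1D
    - 1H
seq_length:
    1D: 365
    1H: 336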

ODE-LSTM

neuralhydrology.modelzoo.odelstm.ODELSTM is a PyTorch implementation of the ODE-LSTM proposed by Lechner and Hasani. This model can be used with unevenly sampled inputs and can be queried to return predictions for any arbitrary time step.

Transformer

neuralhydrology.modelzoo.transformer.Transformer is the encoding portion of a standard transformer network with self-attention, using the standard PyTorch TransformerEncoder implementation. All features (x_d, x_s, x_one_hot) are concatenated and passed to the network at each time step. If the number of inputs is not divisible by the number of transformer heads (transformer_nheads), it is necessary to use an embedding network that guarantees divisibility. To achieve this, use statics/dynamics_embedding, so that the static/dynamic inputs are passed through embedding networks before being concatenated. The embedding networks then map the static and dynamic features to size statics/dynamics_embedding['hiddens'][-1], so the total embedding size will be the sum of these values. Instead of a decoder, this model uses a standard head (e.g., linear). The model requires the following hyperparameters to be specified in the config file (an example config snippet follows the list):

  • transformer_positional_encoding_type: whether to “sum” or “concatenate” the positional encoding with the other model inputs.

  • transformer_positional_dropout: fraction of dropout applied to the positional encoding.

  • transformer_nheads: number of self-attention heads.

  • transformer_dim_feedforward: dimension of the feedforward networks between self-attention heads.

  • transformer_dropout: dropout in the feedforward networks between self-attention heads.

  • transformer_nlayers: number of stacked self-attention + feedforward layers.
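
A config.yml sketch with illustrative values for these hyperparameters, together with abbreviated embedding specifications whose combined output size (32 + 32 = 64) is divisible by the number of heads:

model: transformer
transformer_positional_encoding_type: sum
transformer_positional_dropout: 0.1
transformer_nheads: 4
transformer_dim_feedforward: 256
transformer_dropout: 0.1
transformer_nlayers: 4
statics_embedding:
    hiddens:
        - 32
dynamics_embedding:
    hiddens:
        - 32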

Handoff-Forecast-LSTM

neuralhydrology.modelzoo.handoff_forecast_lstm.HandoffForecastLSTM is a forecasting model that uses a state handoff to transition from a hindcast sequence (LSTM) model to a forecast sequence (LSTM) model. The hindcast model is run from the past up to the present (the issue time of the forecast) and then passes the cell state and hidden state of the LSTM into a (nonlinear) handoff network, which is used to initialize the cell state and hidden state of a new LSTM that rolls out over the forecast period. The handoff network is implemented as a custom FC (fully connected) network, which can have multiple layers, and is configured via the state_handoff_network config parameter. The hindcast and forecast LSTMs have different weights and biases, different heads, and can have different embedding networks. The hidden size of the hindcast LSTM is set using the hindcast_hidden_size config parameter and the hidden size of the forecast LSTM is set using the forecast_hidden_size config parameter; both default to hidden_size if not set explicitly.

The handoff forecast LSTM model can implement a delayed handoff as well, such that the handoff between the hindcast and forecast LSTM occurs prior to the forecast issue time. This is controlled by the forecast_overlap parameter in the config file, and the forecast and hindcast LSTMs will run concurrently for the number of timesteps indicated by forecast_overlap. We recommend using the ForecastOverlapMSERegularization regularization option to regularize the loss function by (dis)agreement between the overlapping portion of the hindcast and forecast LSTMs. This regularization term can be requested by setting the regularization parameter in the config file to include forecast_overlap.
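
A sketch of the config arguments mentioned in this section (all values are illustrative; the form of the state_handoff_network specification is an assumption modeled after the embedding-network format, and the model abbreviation is assumed):

model: handoff_forecast_lstm
hindcast_hidden_size: 128
forecast_hidden_size: 128
forecast_overlap: 7
state_handoff_network:
    hiddens:
        - 128
regularization:
    - forecast_overlap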

Multihead-Forecast-LSTM

neuralhydrology.modelzoo.multihead_forecast_lstm.MultiheadForecastLSTM is a forecasting model that runs a sequential (LSTM) model up to the forecast issue time, and then directly predicts a sequence of forecast timesteps without using a recurrent rollout. Prediction is done with a custom FC (fully connected) layer, which can have multiple layers. Do not use this model with forecast_overlap > 0.

Sequential-Forecast-LSTM

neuralhydrology.modelzoo.sequential_forecast_lstm.SequentialForecastLSTM is a forecasting model that uses a single sequential (LSTM) model that rolls out through both the hindcast and forecast sequences. The difference between this and a standard CudaLSTM is (1) this model uses both hindcast and forecast input features, and (2) it uses a separate embedding network for the hindcast period and the forecast period. Do not use this model with forecast_overlap > 0.

Stacked-Forecast-LSTM

neuralhydrology.modelzoo.stacked_forecast_lstm.StackedForecastLSTM is a forecasting model that uses two stacked sequential (LSTM) models to handle the hindcast and forecast periods. The hindcast and forecast sequences must be the same length, and the forecast_overlap config parameter must be set to the correct overlap between these two sequences. For example, if we want to use a hindcast sequence length of 365 days to make a 7-day forecast, then seq_length and forecast_seq_length must both be set to 365, and forecast_overlap must be set to 358 (=365-7). Outputs from the hindcast LSTM are concatenated to the input sequences of the forecast LSTM. This causes a lag of (seq_length - forecast_overlap) timesteps between the latest hindcast data and the newest forecast point, meaning that forecasts do not receive information from the most recent dynamic inputs. To solve this, set the bidirectional_stacked_forecast_lstm config parameter to True, so that the hindcast LSTM runs bidirectionally and all of its outputs receive information from the most recent dynamic input data. Be aware, however, that this can potentially result in hairy forecasts.
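
The 365-day hindcast / 7-day forecast example from above, expressed as a config sketch (the model abbreviation is assumed):

model: stacked_forecast_lstm
seq_length: 365
forecast_seq_length: 365
forecast_overlap: 358
bidirectional_stacked_forecast_lstm: True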

Implementing a new model

The listing below shows the skeleton of a template model you can use to start implementing your own model. Once you have implemented your model, make sure to modify neuralhydrology.modelzoo.__init__.get_model(). Furthermore, make sure to select a unique model abbreviation that will be used to specify the model in the config.yml files.

from typing import Dict

import torch

from neuralhydrology.modelzoo.basemodel import BaseModel


class TemplateModel(BaseModel):

    def __init__(self, cfg: dict):
        """Initialize the model

        Each model receives as only input the config dictionary. From this, the entire model has to be implemented in
        this class (with potential use of other modules, such as FC from fc.py). So this class will get the model inputs
        and has to return the predictions.

        Each Model inherits from the BaseModel, which implements some universal functionality. The basemodel also
        defines the output_size, which can be used here as a given attribute (self.output_size)

        Parameters
        ----------
        cfg : dict
            Configuration of the run, read from the config file with some additional keys (such as number of basins).
        """
        super(TemplateModel, self).__init__(cfg=cfg)

        ###########################
        # Create model parts here #
        ###########################

    def forward(self, data: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        """Forward pass through the model

        By convention, each forward pass has to accept a dict of input tensors. Usually, this dict contains 'x_d' and,
        possibly, x_s and x_one_hot. If x_d and x_s are available at multiple frequencies, the keys 'x_d' and 'x_s'
        have frequency suffixes such as 'x_d_1H' for hourly data.
        Furthermore, by definition, each model has to return a dict containing the network predictions in 'y_hat',
        potentially in addition to other keys. LSTM-based models should stick to the convention to return (at least)
        the following three tensors: y_hat, h_n, c_n (or, in the multi-frequency case, y_hat_1H, y_hat_1D, etc.).

        Parameters
        ----------
        data : Dict[str, torch.Tensor]
             Dictionary with tensors
                - x_d of shape [batch size, sequence length, features] containing the dynamic input data.
                - x_s of shape [batch size, features] containing static input features. These are the concatenation
                    of what is defined in the config under static_attributes and evolving_attributes. In case not a single
                    camels attribute or static input feature is defined in the config, x_s will not be present.
                - x_one_hot of shape [batch size, number of basins] containing the one hot encoding of the basins.
                    In case 'use_basin_id_encoding' is set to False in the config, x_one_hot will not be present.
                Note: If the input data are available at multiple frequencies (via use_frequencies), each input tensor
                    will have a suffix "_{freq}" indicating the tensor's frequency.

        Returns
        -------
        The network prediction has to be returned under the dictionary key 'y_hat' (or, if multiple frequencies are
        predicted, 'y_hat_{freq}'). Furthermore, make sure to return predictions for each time step, even if you want
        to train sequence-to-one. Which predictions are used for training the network is controlled in the train_epoch()
        function in neuralhydrology/training/basetrainer.py. Other return values should be the hidden states as 'h_n' and cell
        states 'c_n'. Further return values are possible.
        """
        ###############################
        # Implement forward pass here #
        ###############################
        pass