Adding a New Model

This tutorial shows how you can add a new model to the NeuralHydrology modelzoo. As an example, we’ll implement a GRU.

The tutorial is rendered from a Jupyter notebook that is hosted on GitHub. If you want to run the code yourself, you can find the notebook here.

[1]:

import inspect
from typing import Dict

import torch
from torch import nn

from neuralhydrology.modelzoo import get_model
from neuralhydrology.modelzoo.head import get_head
from neuralhydrology.modelzoo.basemodel import BaseModel
from neuralhydrology.utils.config import Config

Template

Every model has its own file in neuralhydrology.modelzoo and follows a common template that you can find here.

The most important points about these templates are:

All models inherit from the BaseModel that’s implemented in neuralhydrology.modelzoo.basemodel.
All models’ constructors take just one argument, an instance of the configuration class (Config). The constructor initializes the model and its components.
Finally, each model implements its own logic in the forward method. This is where the actual magic happens: The forward method takes the input data during training and evaluation and uses it to generate a prediction.

In the following steps, we’ll go over the constructor and the forward method in more detail.

Adding a GRU Model

So, let’s follow that template and add a GRU model. Fortunately, there already exists a GRU implementation in the PyTorch libary, so we can wrap our code around that existing model. This way, we can be pretty sure to get a correct and reasonably fast implementation without much effort.

For the sake of brevity, we’ll omit the docstrings in this example. If you actually implement a model for production use, you should always write the documentation right within your code.

GRU Components

Every model’s constructor receives a single argument: an instance of the run configuration. Based on this config, we’ll construct the GRU.

Like most our models, the GRU will consist of three components:

An optional input layer that acts as an embedding network for static or dynamic features. If used, the features will be passed through a fully-connected network before we pass them to the actual GRU. If no embedding is specified, this layer will do nothing.
The “body” that represents the actual GRU cell.
The “head” that acts as a final output layer.

To maintain a modular architecture, the input and head layers should not be implemented inside the model. Instead, we should use the InputLayer in neuralhydrology.modelzoo.inputlayer and the get_head function in neuralhydrology.modelzoo.head which will automatically construct layers that fit to the run configuration.

[2]:

class GRU(BaseModel):

    # specify submodules of the model that can later be used for finetuning. Names must match class attributes
    module_parts = ['embedding_net', 'gru', 'head']

    def __init__(self, cfg: Config):

        super(GRU, self).__init__(cfg=cfg)

        # retrieve the input layer
        self.embedding_net = InputLayer(cfg)

        # create the actual GRU
        self.gru = nn.GRU(input_size=self.embedding_net.output_size, hidden_size=cfg.hidden_size)

        # add dropout between GRU and head
        self.dropout = nn.Dropout(p=cfg.output_dropout)

        # retrieve the model head
        self.head = get_head(cfg=cfg, n_in=cfg.hidden_size, n_out=self.output_size)

Implementing the Forward Pass

Now we have a class called GRU, but we haven’t yet told the model how to process incoming data. That’s what we do in the forward method.

By convention, our models’ forward method accepts and returns dictionaries that map names (strings) to tensors. The input dictionary (data) usually contains at least a key ‘x_d’ and possibly ‘x_s’ and ‘x_one_hot’. We say “usually”, because models that support simultaneous prediction at multiple timescales (e.g., MTS-LSTM) will get one ‘x_d’ and ‘x_s’ for each timescale, suffixed with the frequency identifier (e.g., ‘x_d_1h’ for hourly dynamic inputs). ‘x_s’ and ‘x_one_hot’ are tensors by themselves; ‘x_d’ is again a dictionary that maps each dynamic input feature name to the corresponding tensor.

But for this example, let’s assume a single-timescale model. Let’s dive deeper into what each of the input values contain:

Key	Shape	Description
‘x_d’	`{k: [batch size, sequence length, features]}`	dictionary of the dynamic input data tensors.
‘x_s’	`[batch size, features]`	static input features. These are the concatenation of what is defined in the run configuration under ‘static_attributes’ and ‘evolving_attributes’. If not a single static or evolving attribute is defined in the config, ‘x_s’ will not be present.
‘x_one_hot’	`[batch size, number of basins]`	one-hot encoding of the basins. If ‘use_basin_id_encoding’ is set to False in the run configuration, ‘x_one_hot’ will not be present.

Now, given these input data we’re supposed to generate a prediction that we return as ‘y_hat’ (multi-timescale models would return ‘y_hat_1h’, …). The returned ‘y_hat’ should contain a prediction for the full input sequence (not just the last element), even if you’re using sequence-to-one prediction. The loss will sort out which of these predictions actually need to be used in the current training configuration. All models should at least return ‘y_hat’, but we can return any other potentially useful information. In our case, we can additionally return the final hidden state that we’ll receive from the PyTorch GRU implementation. The naming convention for hidden states is to call them ‘h_n’.

So, here we go:

[ ]:

def forward(self, data: dict[str, torch.Tensor | dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:

    # possibly pass dynamic and static inputs through embedding layers, then concatenate them
    x_d = self.embedding_net(data, concatenate_output=True)

    # run the actual GRU
    gru_output, h_n = self.gru(input=x_d)

    # reshape to [batch_size, 1, n_hiddens]
    h_n = h_n.transpose(0, 1)

    pred = {'h_n': h_n}

    # add the final output as it's returned by the head to the prediction dict
    # (this will contain the 'y_hat')
    pred.update(self.head(self.dropout(gru_output.transpose(0, 1))))

    return pred

# usually, we'd implement the forward pass right where we define the class.
# For this tutorial, we've broken it down into the constructor and the forward pass,
# so now we'll just add the forward method to the GRU class:
GRU.forward = forward

As you see, much of the heavy lifting is being done by existing methods, so we just have to wire everything up. The input layer merges the static inputs (data['x_s'] and/or data['x_one_hot']) to each step of the dynamic inputs (data['x_d']) and returns a single tensor that we can pass to the GRU cell.

Using the Model

That’s it! We now have a working GRU model that we can use to train and evaluate models. The only thing left is registering the model in the get_model method of neuralhydrology.modelzoo to make sure we can specify the model in a run configuration.

Since GRU already exists in the modelzoo, it’s already there:

[4]:

print(inspect.getsource(get_model))

def get_model(cfg: Config) -> nn.Module:
    """Get model object, depending on the run configuration.

    Parameters
    ----------
    cfg : Config
        The run configuration.

    Returns
    -------
    nn.Module
        A new model instance of the type specified in the config.
    """
    if cfg.model in SINGLE_FREQ_MODELS and len(cfg.use_frequencies) > 1:
        raise ValueError(f"Model {cfg.model} does not support multiple frequencies.")

    if cfg.model == "cudalstm":
        model = CudaLSTM(cfg=cfg)
    elif cfg.model == "ealstm":
        model = EALSTM(cfg=cfg)
    elif cfg.model == "customlstm":
        model = CustomLSTM(cfg=cfg)
    elif cfg.model == "lstm":
        warnings.warn(
            "The `LSTM` class has been renamed to `CustomLSTM`. Support for `LSTM` will we dropped in the future.",
            FutureWarning)
        model = CustomLSTM(cfg=cfg)
    elif cfg.model == "gru":
        model = GRU(cfg=cfg)
    elif cfg.model == "embcudalstm":
        model = EmbCudaLSTM(cfg=cfg)
    elif cfg.model == "mtslstm":
        model = MTSLSTM(cfg=cfg)
    elif cfg.model == "odelstm":
        model = ODELSTM(cfg=cfg)
    else:
        raise NotImplementedError(f"{cfg.model} not implemented or not linked in `get_model()`")

    return model

Since GRU is registered as a model, you can now specify model: gru in the run configuration and use the model, just like any other. For an example of training and evaluating a model, take a look at the introduction tutorial.