A mixture density network with Laplace distributions as components.
The CMAL-head uses an additional hidden layer to give it more expressiveness (same as the GMM-head).
CMAL is better suited for many hydrological settings, as it handles asymmetries with more ease. However, it is also
more brittle than GMM and throws exceptions more often. Details on CMAL can be found in [1].
Parameters:
n_in (int) – Number of input neurons.
n_out (int) – Number of output neurons. Corresponds to 4 times the number of components.
x (torch.Tensor) – Output of the previous model part. It provides the basic latent variables to compute the CMAL components.
Returns:
Dictionary containing the mixture component parameters and weights, where the key ‘mu’ stores the means,
the key ‘b’ the scale parameters, the key ‘tau’ the skewness parameters, and the key ‘pi’ the weights.
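A minimal sketch of such a head in PyTorch may clarify the shapes and activations involved. The hidden-layer size, the tanh activation, and the specific output activations are illustrative assumptions, not the library's exact implementation; only the output dictionary keys (‘mu’, ‘b’, ‘tau’, ‘pi’) follow the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CMALHeadSketch(nn.Module):
    """Illustrative CMAL head: one extra hidden layer, then 4 parameter
    groups per mixture component (mu, b, tau, pi)."""

    def __init__(self, n_in: int, n_hidden: int, n_components: int):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        # n_out = 4 * number of components, as stated in the docs.
        self.out = nn.Linear(n_hidden, 4 * n_components)

    def forward(self, x: torch.Tensor) -> dict:
        h = torch.tanh(self.hidden(x))  # activation choice is an assumption
        mu, b, tau, pi = self.out(h).chunk(4, dim=-1)
        return {
            "mu": mu,                         # means: unconstrained
            "b": F.softplus(b) + 1e-6,        # scales: strictly positive
            "tau": torch.sigmoid(tau),        # skewness: bounded in (0, 1)
            "pi": torch.softmax(pi, dim=-1),  # weights: sum to 1
        }
```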
A mixture density network with Gaussian distributions as components. Good references are [2] and [3]. The latter
forms the basis for our implementation. As such, we also use two layers in the head to provide it with
additional flexibility, an exponential activation for the variance estimates, and a softmax for the weights.
Parameters:
n_in (int) – Number of input neurons.
n_out (int) – Number of output neurons. Corresponds to 3 times the number of components.
x (torch.Tensor) – Output of the previous model part. It provides the basic latent variables to compute the GMM components.
Returns:
Dictionary containing mixture parameters and weights; where the key ‘mu’ stores the means, the key
‘sigma’ the variances, and the key ‘pi’ the weights.
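A sketch of the GMM head, analogous to the CMAL one. The hidden-layer size and its activation are assumptions; the exponential activation for the variances and the softmax for the weights follow the description above.

```python
import torch
import torch.nn as nn


class GMMHeadSketch(nn.Module):
    """Illustrative GMM head: one extra hidden layer, then 3 parameter
    groups per mixture component (mu, sigma, pi)."""

    def __init__(self, n_in: int, n_hidden: int, n_components: int):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        # n_out = 3 * number of components, as stated in the docs.
        self.out = nn.Linear(n_hidden, 3 * n_components)

    def forward(self, x: torch.Tensor) -> dict:
        h = torch.tanh(self.hidden(x))  # activation choice is an assumption
        mu, sigma, pi = self.out(h).chunk(3, dim=-1)
        return {
            "mu": mu,                         # means: unconstrained
            "sigma": torch.exp(sigma),        # variances: exp keeps them positive
            "pi": torch.softmax(pi, dim=-1),  # weights: sum to 1
        }
```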
Single-layer regression head with different output activations.
Parameters:
n_in (int) – Number of input neurons.
n_out (int) – Number of output neurons.
activation (str, optional) – Output activation function. Can be specified in the config using the output_activation argument. Supported
values are {‘linear’, ‘relu’, ‘softplus’}. If not specified (or if an unsupported activation function is specified), the head
defaults to the ‘linear’ activation.
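A sketch of the single-layer regression head with these output activations. The fallback-with-warning behavior for unsupported activations is an assumption; the source only states that the head defaults to ‘linear’.

```python
import warnings

import torch
import torch.nn as nn


class RegressionHeadSketch(nn.Module):
    """Illustrative single-layer regression head with a configurable
    output activation ('linear', 'relu', or 'softplus')."""

    def __init__(self, n_in: int, n_out: int, activation: str = "linear"):
        super().__init__()
        layers = [nn.Linear(n_in, n_out)]
        if activation == "relu":
            layers.append(nn.ReLU())
        elif activation == "softplus":
            layers.append(nn.Softplus())
        elif activation != "linear":
            # Unsupported value: fall back to linear (warning is an assumption).
            warnings.warn(f"Unsupported activation '{activation}', using 'linear'.")
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```

A ‘softplus’ output activation is a common choice when the target (e.g. discharge) must be non-negative, since it is smooth and strictly positive.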
An implicit approximation to the mixture density network with Laplace distributions that does not require
pre-specifying the number of components. An additional hidden layer is used to give the head more expressiveness.
General details about UMAL can be found in [4]. A major difference between their implementation
and ours is the binding function for the scale parameter (b). The scale needs to be lower-bounded. The original UMAL
implementation uses an ELU-based binding. In our experiments, however, this produced under-confident predictions
(variances that were too large). We therefore opted for a tailor-made binding function that bounds the scale from below and
above using a sigmoid. It is very likely that this needs to be adapted for non-normalized outputs.
Parameters:
n_in (int) – Number of input neurons.
n_out (int) – Number of output neurons. Corresponds to 2 times the output-size, since the scale parameters are also predicted.
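The sigmoid-based scale binding described above can be sketched as follows. The concrete bounds (`b_min`, `b_max`) are hypothetical values chosen for normalized targets, not the library's defaults, and would likely need adjustment for non-normalized outputs.

```python
import torch


def umal_scale_binding(raw_b: torch.Tensor,
                       b_min: float = 1e-4,
                       b_max: float = 1.0) -> torch.Tensor:
    """Bound the raw scale predictions into (b_min, b_max).

    The sigmoid maps any real input into (0, 1); rescaling then confines
    the scale parameter both from below and from above, unlike an
    ELU-based binding, which only enforces a lower bound.
    """
    return b_min + (b_max - b_min) * torch.sigmoid(raw_b)
```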