samplingutils

neuralhydrology.utils.samplingutils.bernoulli_subseries_sampler(data: ndarray, missing_fraction: float, mean_missing_length: float, start_sampling_on: bool = True) → ndarray

Samples a timeseries according to a pair of Bernoulli processes.

The objective is to sample subsequences of a given timeseries under two criteria:
  1. Expected long-term missing ratio, i.e., the total fraction of points in the timeseries that are not sampled: missing_fraction.

  2. Expected length of continuous subsequences of un-sampled data: mean_missing_length.

This is done by sampling two Bernoulli processes with different rate parameters: one samples on-shifts and the other samples off-shifts. An ‘on-shift’ occurs when the state of the sampler transitions from ‘off’ (not sampling) to ‘on’ (sampling); an ‘off-shift’ is the reverse. The rate parameters for the two processes are derived from the input parameters explained above.

Parameters:
  • data (np.ndarray) – Time series data to be sampled. Must have shape (N, 1), where N is the length of the timeseries.

  • missing_fraction (float) – Expected total fraction of points in a time series that are not sampled.

  • mean_missing_length (float) – Expected length of continuous subsequences of un-sampled data from the timeseries.

  • start_sampling_on (bool) – Whether to start with the sampler turned “on” (True) or “off” (False) at the first timestep of the timeseries.

Returns:

A copy of the timeseries with NaNs replacing elements that were not sampled.

Return type:

np.ndarray
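The link between the two input parameters and the two Bernoulli rates can be illustrated with a small sketch. This is not the library's implementation; the exact rate derivation is an assumption consistent with the description above.

```python
import numpy as np

def bernoulli_subseries_sampler_sketch(data, missing_fraction,
                                       mean_missing_length,
                                       start_sampling_on=True, seed=0):
    # Rate derivation (an assumption consistent with the description):
    # off-runs should last mean_missing_length steps on average, so the
    # off->on ("on-shift") probability is 1 / mean_missing_length; the
    # on->off ("off-shift") probability is then chosen so the long-run
    # fraction of "off" steps equals missing_fraction.
    p_on_shift = 1.0 / mean_missing_length
    p_off_shift = p_on_shift * missing_fraction / (1.0 - missing_fraction)

    rng = np.random.default_rng(seed)
    out = data.astype(float)  # float copy so NaN can be written
    on = start_sampling_on
    for t in range(len(out)):
        if not on:
            out[t] = np.nan  # not sampled at this timestep
        # Flip the sampler state with the appropriate Bernoulli rate.
        if rng.random() < (p_off_shift if on else p_on_shift):
            on = not on
    return out
```

Over a long series the fraction of NaNs converges to p_off_shift / (p_on_shift + p_off_shift) = missing_fraction.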

neuralhydrology.utils.samplingutils.sample_cmal(model: BaseModel, data: Dict[str, Tensor], n_samples: int, scaler: Dict[str, pandas.Series | xarray.Dataset]) → Dict[str, Tensor]

Sample point predictions with the Countable Mixture of Asymmetric Laplacians (CMAL) head.

This function generates n_samples CMAL sample points for each entry in the batch. Concretely, the model is executed once (forward pass) and then the sample points are generated by sampling from the resulting mixtures. General information about CMAL can be found in [1].

The negative sample handling currently supports (a) ‘clip’ for directly clipping sample_points at zero and (b) ‘truncate’ for resampling sample_points that are below zero. The mode can be defined by the config argument ‘negative_sample_handling’.
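The two modes can be sketched as follows. This is illustrative only, with a generic `sample_fn` standing in for the mixture sampler; the real logic lives inside the sampling functions and operates on torch tensors.

```python
import numpy as np

def handle_negatives_sketch(sample_fn, n, mode="clip", max_retries=50):
    # Illustrative sketch of 'clip' vs. 'truncate'; max_retries is an
    # assumption to guard against non-terminating resampling.
    samples = sample_fn(n)
    if mode == "clip":
        return np.clip(samples, 0.0, None)  # negatives become exactly 0
    # 'truncate': redraw only the negative entries until none remain
    for _ in range(max_retries):
        neg = samples < 0
        if not neg.any():
            break
        samples[neg] = sample_fn(int(neg.sum()))
    return samples
```

Note that 'clip' piles probability mass at exactly zero, while 'truncate' reshapes the distribution to its non-negative part.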

Note: If the config setting ‘mc_dropout’ is true this function will force the model to train mode (model.train()) and not set it back to its original state.

Parameters:
  • model (BaseModel) – A model with a CMAL head.

  • data (Dict[str, torch.Tensor]) – Dictionary containing input features as key-value pairs.

  • n_samples (int) – Number of samples to generate for each input sample.

  • scaler (Dict[str, Union[pd.Series, xarray.Dataset]]) – Scaler of the run.

Returns:

Dictionary containing the sampled model outputs for the predict_last_n (config argument) time steps of each frequency. The shape of the output tensor for each frequency is [batch size, predict_last_n, n_samples].

Return type:

Dict[str, torch.Tensor]
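For intuition, a single asymmetric Laplacian can be sampled by inverting its CDF, and mixture sampling then reduces to first choosing a component by its weight. A minimal numpy sketch follows; the parameter layout (per-component arrays pi, mu, b, tau) is an assumption, and the library itself works on torch tensors.

```python
import numpy as np

def sample_asymmetric_laplacian(mu, b, tau, rng):
    # Inverse-CDF sampling from asymmetric Laplacian components with
    # location mu, scale b, and asymmetry tau in (0, 1).
    u = rng.random(np.shape(mu))
    x = np.empty(np.shape(mu))
    left = u < tau  # sample falls on the left branch of the CDF
    x[left] = mu[left] + b[left] / (1.0 - tau[left]) * np.log(u[left] / tau[left])
    r = ~left
    x[r] = mu[r] - b[r] / tau[r] * np.log((1.0 - u[r]) / (1.0 - tau[r]))
    return x

def sample_cmal_mixture(pi, mu, b, tau, n_samples, rng):
    # Two-step mixture sampling: pick a component index by weight pi,
    # then draw from that component's asymmetric Laplacian.
    k = rng.choice(len(pi), size=n_samples, p=pi)
    return sample_asymmetric_laplacian(mu[k], b[k], tau[k], rng)
```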

neuralhydrology.utils.samplingutils.sample_gmm(model: BaseModel, data: Dict[str, Tensor], n_samples: int, scaler: Dict[str, pandas.Series | xarray.Dataset]) → Dict[str, Tensor]

Sample point predictions with the Gaussian Mixture (GMM) head.

This function generates n_samples GMM sample points for each entry in the batch. Concretely, the model is executed once (forward pass) and then the sample points are generated by sampling from the resulting mixtures. Good references for learning about GMMs are [2] and [3].

The negative sample handling currently supports (a) ‘clip’ for directly clipping sample_points at zero and (b) ‘truncate’ for resampling sample_points that are below zero. The mode can be defined by the config argument ‘negative_sample_handling’.

Note: If the config setting ‘mc_dropout’ is true this function will force the model to train mode (model.train()) and not set it back to its original state.

Parameters:
  • model (BaseModel) – A model with a GMM head.

  • data (Dict[str, torch.Tensor]) – Dictionary containing input features as key-value pairs.

  • n_samples (int) – Number of samples to generate for each input sample.

  • scaler (Dict[str, Union[pd.Series, xarray.Dataset]]) – Scaler of the run.

Returns:

Dictionary containing the sampled model outputs for the predict_last_n (config argument) time steps of each frequency.

Return type:

Dict[str, torch.Tensor]
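The core of GMM sampling is the same two-step scheme as for the other mixture heads: choose a component by its weight, then draw from that component's Gaussian. A numpy stand-in for the torch-based implementation:

```python
import numpy as np

def sample_gmm_sketch(pi, mu, sigma, n_samples, rng):
    # Pick a component index by weight pi, then draw from that
    # component's Gaussian (per-component arrays are an assumption).
    k = rng.choice(len(pi), size=n_samples, p=pi)
    return rng.normal(mu[k], sigma[k])
```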

neuralhydrology.utils.samplingutils.sample_mcd(model: BaseModel, data: Dict[str, Tensor], n_samples: int, scaler: Dict[str, pandas.Series | xarray.Dataset]) → Dict[str, Tensor]

MC-Dropout-based point prediction sampling.

Naive sampling: this function runs n_samples forward passes for each sample in the batch. It is currently only useful for models with dropout, to perform MC-Dropout sampling. Note: Calling this function will force the model to train mode (model.train()) and not set it back to its original state.

The negative sample handling currently supports (a) ‘clip’ for directly clipping sample_points at zero and (b) ‘truncate’ for resampling sample_points that are below zero. The mode can be defined by the config argument ‘negative_sample_handling’.

Parameters:
  • model (BaseModel) – A model with a non-probabilistic head.

  • data (Dict[str, torch.Tensor]) – Dictionary containing input features as key-value pairs.

  • n_samples (int) – Number of samples to generate for each input sample.

  • scaler (Dict[str, Union[pd.Series, xarray.Dataset]]) – Scaler of the run.

Returns:

Dictionary containing the sampled model outputs for the predict_last_n (config argument) time steps of each frequency.

Return type:

Dict[str, torch.Tensor]
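The idea behind the repeated forward passes can be sketched with a toy numpy model whose dropout is left active at inference time; both the stacking helper and the toy layer are hypothetical, not part of the library.

```python
import numpy as np

def mc_dropout_sample_sketch(forward_fn, x, n_samples, rng):
    # Naive MC-Dropout: n_samples stochastic forward passes per input,
    # stacked into one array. forward_fn must keep its dropout active,
    # which is what forcing the model into train mode achieves.
    return np.stack([forward_fn(x, rng) for _ in range(n_samples)])

def toy_forward(x, rng, p=0.5):
    # A toy layer with inverted dropout left on at inference.
    mask = rng.random(x.shape) >= p
    return float((x * mask / (1.0 - p)).sum())
```

The spread across the stacked predictions is what serves as the uncertainty estimate.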

neuralhydrology.utils.samplingutils.sample_pointpredictions(model: BaseModel, data: Dict[str, Tensor], n_samples: int, scaler: Dict[str, pandas.Series | xarray.Dataset]) → Dict[str, Tensor]

Point prediction samplers for the different uncertainty estimation approaches.

This function provides point sampling for the different uncertainty estimation approaches: Gaussian Mixture Models (GMM), Countable Mixtures of Asymmetric Laplacians (CMAL), Uncountable Mixtures of Asymmetric Laplacians (UMAL), and Monte-Carlo Dropout (MCD). Note that MCD can be combined with the other approaches by setting mc_dropout to True in the configuration file.

There are also options to handle negative point prediction samples that arise while sampling from the uncertainty estimates. This functionality currently supports (a) ‘clip’ for directly clipping values at zero and (b) ‘truncate’ for resampling values that are below zero.

Parameters:
  • model (BaseModel) – The neuralhydrology model from which to sample.

  • data (Dict[str, torch.Tensor]) – Dictionary containing input features as key-value pairs.

  • n_samples (int) – The number of point prediction samples that should be created.

  • scaler (Dict[str, Union[pd.Series, xarray.Dataset]]) – Scaler of the run.

Returns:

Dictionary containing the sampled model outputs for the predict_last_n (config argument) time steps of each frequency.

Return type:

Dict[str, torch.Tensor]
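The dispatch logic can be sketched as below. The stubs are hypothetical stand-ins for the head-specific samplers documented on this page, and the real function reads the head (and the mc_dropout flag) from the run configuration rather than taking it as an argument.

```python
def _stub_sampler(name):
    # Hypothetical stand-ins for the head-specific samplers.
    return lambda model, data, n_samples, scaler: {"y_hat": (name, n_samples)}

_SAMPLERS = {"gmm": _stub_sampler("gmm"),
             "cmal": _stub_sampler("cmal"),
             "umal": _stub_sampler("umal")}

def sample_pointpredictions_sketch(head, model, data, n_samples, scaler):
    # Dispatch by head name; fall back to MC-Dropout sampling for
    # non-probabilistic heads.
    sampler = _SAMPLERS.get(head.lower(), _stub_sampler("mcd"))
    return sampler(model, data, n_samples, scaler)
```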

neuralhydrology.utils.samplingutils.sample_umal(model: BaseModel, data: Dict[str, Tensor], n_samples: int, scaler: Dict[str, pandas.Series | xarray.Dataset]) → Dict[str, Tensor]

Sample point predictions with the Uncountable Mixture of Asymmetric Laplacians (UMAL) head.

This function generates n_samples UMAL sample points for each entry in the batch. Concretely, the model is executed once (forward pass) and then the sample points are generated by sampling from the resulting mixtures. Details about the UMAL approach can be found in [4].

The negative sample handling currently supports (a) ‘clip’ for directly clipping sample_points at zero and (b) ‘truncate’ for resampling sample_points that are below zero. The mode can be defined by the config argument ‘negative_sample_handling’.

Note: If the config setting ‘mc_dropout’ is true this function will force the model to train mode (model.train()) and not set it back to its original state.

Parameters:
  • model (BaseModel) – A model with a UMAL head.

  • data (Dict[str, torch.Tensor]) – Dictionary containing input features as key-value pairs.

  • n_samples (int) – Number of samples to generate for each input sample.

  • scaler (Dict[str, Union[pd.Series, xarray.Dataset]]) – Scaler of the run.

Returns:

Dictionary containing the sampled model outputs for the predict_last_n (config argument) time steps of each frequency.

Return type:

Dict[str, torch.Tensor]

neuralhydrology.utils.samplingutils.umal_extend_batch(data: Dict[str, Tensor], cfg: Config, n_taus: int = 1, extend_y: bool = False) → Dict[str, Tensor]

This function extends the batch for use with UMAL (see [5]).

UMAL makes an MC approximation to a mixture integral by sampling random asymmetry parameters (tau). This can be parallelized by expanding the batch for each tau.

Parameters:
  • data (Dict[str, torch.Tensor]) – Dictionary containing input features as key-value pairs.

  • cfg (Config) – The run configuration.

  • n_taus (int) – Number of taus by which to expand the batch.

  • extend_y (bool) – Option to also extend the labels/y.

Returns:

Dictionary containing expanded input features and tau samples as key-value pairs.

Return type:

Dict[str, torch.Tensor]
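The expansion amounts to tiling the batch once per tau and attaching the sampled asymmetry parameters. A minimal numpy sketch, assuming a 2-D (batch, features) input and a uniform tau range, both of which are assumptions:

```python
import numpy as np

def umal_extend_batch_sketch(x, n_taus, rng, tau_low=0.01, tau_high=0.99):
    # Tile the batch n_taus times and attach one random asymmetry
    # parameter tau per copy, so all taus are evaluated in parallel.
    batch = x.shape[0]
    x_ext = np.tile(x, (n_taus, 1))                       # (n_taus * batch, features)
    taus = rng.uniform(tau_low, tau_high, size=(n_taus * batch, 1))
    return x_ext, taus
```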
