Configuration Arguments
This page provides a list of possible configuration arguments. See the file examples/config.yml.example for an example of what a config file could look like.
General experiment configurations
experiment_name: Defines the name of your experiment that will be used as a folder name (+ date-time string), as well as the name in TensorBoard. Curly brackets insert other configuration arguments into the experiment name. For example, the argument experiment_name: batch-size-is-{batch_size} will yield batch-size-is-42 if the batch_size argument is set to 42. Furthermore, the special expression {random_name} (in the experiment name) provides a randomly named string (e.g., yellow-frog) that can make experimental runs easier to recognize.
run_dir: Full or relative path to where the run directory is stored (if empty, runs are stored in ${current_working_dir}/runs/).
train_basin_file: Full or relative path to a text file containing the training basins (use data set basin id, one id per line).
validation_basin_file: Full or relative path to a text file containing the validation basins (use data set basin id, one id per line).
test_basin_file: Full or relative path to a text file containing the test basins (use data set basin id, one id per line).
train_start_date: Start date of the training period (first day of discharge) in the format DD/MM/YYYY. Can also be a list of dates to specify multiple training periods. If a list is specified, train_end_date must also be a list of equal length. Corresponding pairs of start and end date denote the different periods.
train_end_date: End date of the training period (last day of discharge) in the format DD/MM/YYYY. Can also be a list of dates. If a list is specified, train_start_date must also be a list of equal length.
validation_start_date: Start date of the validation period (first day of discharge) in the format DD/MM/YYYY. Can also be a list of dates (similar to train period specifications).
validation_end_date: End date of the validation period (last day of discharge) in the format DD/MM/YYYY. Can also be a list of dates (similar to train period specifications).
test_start_date: Start date of the test period (first day of discharge) in the format DD/MM/YYYY. Can also be a list of dates (similar to train period specifications).
test_end_date: End date of the test period (last day of discharge) in the format DD/MM/YYYY. Can also be a list of dates (similar to train period specifications).
per_basin_train_periods_file: As an alternative to specifying a global train period for all basins (using train_start_date and train_end_date), it is also possible to use individual periods for each basin. For that to work, you have to create a dictionary with one key per basin id. For each basin id, the dictionary contains two keys, start_dates and end_dates, each of which holds a list of pandas Timestamps that indicate the start and end dates of the train periods. start_dates and end_dates have to be lists, even in the case of a single period per basin (see example below). Then use the pickle library to store this dictionary to disk, and use the path to this pickle file as the value for this config argument.
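import pandas as pd

# An explicit format avoids any ambiguity between DD/MM/YYYY and MM/DD/YYYY.
dates = {
    'basin_a': {
        'start_dates': [pd.to_datetime('01/01/1980', format='%d/%m/%Y')],
        'end_dates': [pd.to_datetime('31/12/1999', format='%d/%m/%Y')]
    },
    'basin_b': {
        'start_dates': [pd.to_datetime('01/01/1980', format='%d/%m/%Y'), pd.to_datetime('01/01/2000', format='%d/%m/%Y')],
        'end_dates': [pd.to_datetime('31/12/1990', format='%d/%m/%Y'), pd.to_datetime('01/01/2005', format='%d/%m/%Y')]
    }
}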
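Such a dictionary can be written to disk with the standard-library pickle module. The following is a minimal sketch, assuming a hypothetical output file name (train_periods.p) and a single made-up basin id; the path of the resulting file is what per_basin_train_periods_file would point to.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical basin id and period, for illustration only.
periods = {
    'basin_a': {
        'start_dates': [pd.to_datetime('01/01/1980', format='%d/%m/%Y')],
        'end_dates': [pd.to_datetime('31/12/1999', format='%d/%m/%Y')],
    }
}

# Hypothetical file name; a temporary directory keeps the sketch free of side effects.
out_file = Path(tempfile.mkdtemp()) / 'train_periods.p'
with out_file.open('wb') as fp:
    pickle.dump(periods, fp)

# Read the file back to confirm that the structure survives the round trip.
with out_file.open('rb') as fp:
    restored = pickle.load(fp)
```

In a real setup the file would be written to a permanent location, and that location would be entered in the config.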
per_basin_validation_periods_file: Same as per_basin_train_periods_file but indicating individual periods that are used as validation periods.
per_basin_test_periods_file: Same as per_basin_train_periods_file but indicating individual periods that are used as test periods.
seed: Fixed random seed. If empty, a random seed is generated for this run.
device: Which device to use, in the format cuda:0, cuda:1, etc., for GPUs, mps for macOS with Metal support, or cpu.
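The curly-bracket substitution described for experiment_name behaves much like Python string formatting. The sketch below only imitates that behavior with str.format and a hypothetical config dictionary; it is not the library's implementation, and the word lists used for {random_name} are made up.

```python
import random

# Hypothetical config values; only keys referenced in the template matter.
config = {
    'experiment_name': 'batch-size-is-{batch_size}-{random_name}',
    'batch_size': 42,
}

# Made-up word lists that imitate the {random_name} placeholder.
adjectives = ['yellow', 'quiet', 'rapid']
animals = ['frog', 'heron', 'otter']
random_name = f'{random.choice(adjectives)}-{random.choice(animals)}'

# Every other config argument is available for substitution.
values = {key: value for key, value in config.items() if key != 'experiment_name'}
resolved = config['experiment_name'].format(random_name=random_name, **values)
print(resolved)
```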
Validation settings
validate_every: Integer that specifies the interval (in epochs) at which validation is performed. If empty, no validation is done during training.
validate_n_random_basins: Integer that specifies how many random basins to use per validation. Values larger than n_basins are clipped to n_basins.
metrics: List of metrics to calculate during validation/testing. See neuralhydrology.evaluation.metrics for a list of available metrics. Can also be a dictionary of lists, where each key-value pair corresponds to one target variable and the metrics that should be computed for this target. This makes it possible to compute different metrics per target during validation and/or evaluation after the training.
save_validation_results: True/False. If True, stores the validation results to disk as a pickle file. Otherwise they are only used for TensorBoard. This differs from save_all_validation_output in that only the predictive outputs are saved, and not all of the model output features.
save_all_validation_output: True/False. If True, stores all model outputs from the validation runs to disk as a pickle file. Defaults to False. This differs from save_validation_results in that here all model output is saved, which can result in quite large files. Predictions in this file are not rescaled; they are the raw output from the model.
General model configuration
model: Defines the model class, i.e. the core of the model, that will be used. Names have to match the values in this function, e.g., [cudalstm, ealstm, mtslstm].
head: The prediction head that is used on top of the output of the model class. Currently supported are regression, gmm, cmal, and umal. Make sure to pass the necessary options depending on your choice of the head (see below).
hidden_size: Hidden size of the model class. In the case of an LSTM, this reflects the number of LSTM states.
initial_forget_bias: Initial value of the forget gate bias.
output_dropout: Dropout applied to the output of the LSTM.
Regression head
Can be ignored if head != 'regression'
output_activation: Which activation to use on the output neuron(s) of the linear layer. Currently supported are linear, relu, softplus. If empty, linear is used.
mc_dropout: True/False. Whether Monte-Carlo dropout is used to sample during inference.
GMM head
Can be ignored if head != 'gmm'
n_distributions: The number of distributions used for the GMM head.
n_samples: Number of samples generated (per time-step) from GMM.
negative_sample_handling: How to account for negative samples. Possible values are none for doing nothing, clip for clipping the values at zero, and truncate for resampling values that were drawn below zero. If the last option is chosen, the additional argument negative_sample_max_retries controls how often the values are resampled.
negative_sample_max_retries: The number of repeated samples for the truncate option of the negative_sample_handling argument.
mc_dropout: True/False. Whether Monte-Carlo dropout is used to sample during inference.
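The three negative_sample_handling options can be pictured with a small sketch. It uses a plain Gaussian as a stand-in for the mixture and hypothetical function names; what happens to values that are still negative after the last retry is an assumption here (they are clipped), not a statement about the library.

```python
import random


def handle_negative_samples(samples, draw, method='none', max_retries=1):
    """Post-process samples in the spirit of negative_sample_handling.

    draw is a zero-argument callable that returns one fresh sample.
    """
    if method == 'none':
        return list(samples)
    if method == 'clip':
        return [max(s, 0.0) for s in samples]
    if method == 'truncate':
        result = list(samples)
        for _ in range(max_retries):
            negative = [i for i, s in enumerate(result) if s < 0.0]
            if not negative:
                break
            for i in negative:
                result[i] = draw()  # resample only the negative entries
        # Assumption: leftovers that are still negative are clipped at zero.
        return [max(s, 0.0) for s in result]
    raise ValueError(f'Unknown method: {method}')


rng = random.Random(0)


def draw():
    # Stand-in for drawing one sample from the predicted distribution.
    return rng.gauss(0.5, 1.0)


raw = [draw() for _ in range(1000)]
untouched = handle_negative_samples(raw, draw, method='none')
clipped = handle_negative_samples(raw, draw, method='clip')
resampled = handle_negative_samples(raw, draw, method='truncate', max_retries=5)
```

Clipping piles probability mass at exactly zero, whereas resampling redistributes it over the positive range, at the cost of extra draws.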
CMAL head
Can be ignored if head != 'cmal'
n_distributions: The number of distributions used for the CMAL head.
n_samples: Number of samples generated (per time-step) from CMAL.
negative_sample_handling: Approach for handling negative samples. Possible values are none for doing nothing, clip for clipping the values at zero, and truncate for resampling values that were drawn below zero. If the last option is chosen, the additional argument negative_sample_max_retries controls how often the values are resampled.
negative_sample_max_retries: The number of repeated samples for the truncate option of the negative_sample_handling argument.
mc_dropout: True/False. Whether Monte-Carlo dropout is used to sample during inference.
UMAL head
Can be ignored if head != 'umal'
n_taus: The number of taus sampled to approximate the uncountable distributions.
umal_extend_batch: True/False. Whether the batches should be extended n_taus times, to account for a specific approximation density already during the training.
tau_down: The lower sampling bound of the asymmetry parameter (should be above 0, below 1, and smaller than tau_up).
tau_up: The upper sampling bound of the asymmetry parameter (should be above 0, below 1, and larger than tau_down).
n_samples: Number of samples generated (per time-step) from UMAL.
negative_sample_handling: Approach for handling negative samples. Possible values are none for doing nothing, clip for clipping the values at zero, and truncate for resampling values that were drawn below zero. If the last option is chosen, the additional argument negative_sample_max_retries controls how often the values are resampled.
negative_sample_max_retries: The number of repeated samples for the truncate option of the negative_sample_handling argument.
mc_dropout: True/False. Whether Monte-Carlo dropout is used to sample during inference.
Multi-timescale training settings
These are used if model == mtslstm.
transfer_mtslstm_states: Specifies if and how hidden and cell states are transferred from lower to higher frequencies. This configuration should be a dictionary with keys h (hidden state) and c (cell state). Possible values are [None, linear, identity]. If transfer_mtslstm_states is not provided or empty, the default is linear transfer.
shared_mtslstm: If False, will use a distinct LSTM with individual weights for each timescale. If True, will use a single LSTM for all timescales and use one-hot-encoding to identify the current input timescale. In both cases, transfer_mtslstm_states can be used to configure hidden and cell state transfer.
Mamba settings
These are used if model == mamba.
mamba_d_conv: Local convolution width.
mamba_d_state: State Space Model state expansion factor.
mamba_expand: Block expansion factor.
Transformer settings
These are used if model == transformer.
transformer_nlayers: Number of multi-head self-attention layers in the transformer encoder.
transformer_positional_encoding_type: Choices are [sum, concatenate]. Used to change the way that the positional encoding is used in the transformer embedding layer. sum means that the positional encoding is added to the values of the inputs for that layer, while concatenate means that the encoding is concatenated as additional input features.
transformer_dim_feedforward: Dimension of dense layers used between self-attention layers in the transformer encoder.
transformer_positional_dropout: Dropout applied only to the positional encoding before it is used in the transformer encoder.
transformer_dropout: Dropout used in transformer encoder layers.
transformer_nhead: Number of parallel transformer heads.
XLSTM settings
These are used if model == x_lstm.
xlstm_num_blocks: Number of stacked xLSTM blocks.
xlstm_slstm_at: Indices of blocks of scalar-memory.
xlstm_heads: Number of heads.
xlstm_kernel_size: Convolutional kernel size.
xlstm_proj_factor: Projection factor.
ODE-LSTM settings
These are used if model == odelstm.
ode_method: Method to use to solve the ODE. One of [euler, rk4, heun].
ode_num_unfolds: Number of iterations to break each ODE solving step into.
ode_random_freq_lower_bound: Lowest frequency that will be used to randomly aggregate the first slice of the input sequence. See the documentation of the ODELSTM class for more details on the frequency randomization.
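To illustrate what ode_method and ode_num_unfolds control, the sketch below integrates a scalar ODE dy/dt = f(y) with the three named schemes, splitting each solving step into several sub-steps. It is a generic textbook illustration with hypothetical function names, not the ODELSTM implementation.

```python
import math


def ode_step(f, y, dt, method='euler'):
    """Advance y by one step of size dt for the autonomous ODE dy/dt = f(y)."""
    if method == 'euler':
        return y + dt * f(y)
    if method == 'heun':
        k1 = f(y)
        k2 = f(y + dt * k1)
        return y + dt * 0.5 * (k1 + k2)
    if method == 'rk4':
        k1 = f(y)
        k2 = f(y + 0.5 * dt * k1)
        k3 = f(y + 0.5 * dt * k2)
        k4 = f(y + dt * k3)
        return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    raise ValueError(f'Unknown method: {method}')


def solve(f, y0, elapsed, method='euler', num_unfolds=4):
    """Integrate over `elapsed` by unfolding it into num_unfolds equal sub-steps."""
    y = y0
    dt = elapsed / num_unfolds
    for _ in range(num_unfolds):
        y = ode_step(f, y, dt, method)
    return y


def decay(y):
    # Test problem dy/dt = -y with known solution y(t) = exp(-t).
    return -y


exact = math.exp(-1.0)
errors = {m: abs(solve(decay, 1.0, 1.0, method=m) - exact) for m in ('euler', 'heun', 'rk4')}
```

On this test problem the error shrinks from euler to heun to rk4 for the same number of unfolds, while the number of function evaluations per sub-step grows from one to two to four.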
MC-LSTM settings
These are used if model == mclstm.
mass_inputs: List of features that are used as mass input in the MC-LSTM model, i.e. whose quantity is conserved over time. Currently, the MC-LSTM configuration implemented here only supports a single mass input. Make sure to exclude this feature from the default normalization (see MC-LSTM description).
Hybrid-Model settings
These are used if model == hybrid_model.
conceptual_model: Name of the hydrological conceptual model that is used together with a data-driven method to create the hybrid model, e.g., [SHM].
dynamic_conceptual_inputs: List of features that are used as input in the conceptual part of the hybrid model.
warmup_period: Number of time steps (e.g. days) before the information produced by the data-driven part is used in the conceptual model.
Handoff Forecast Model settings
forecast_hidden_size: Integer hidden size for the forecast (decoder) LSTM. This will default to hidden_size.
hindcast_hidden_size: Integer hidden size for the hindcast (encoder) LSTM. This will default to hidden_size.
state_handoff_network: Embedding network that defines the handoff of the cell state and hidden state from the hindcast LSTM to the forecast LSTM.
forecast_overlap: An integer number of timesteps where forecast data overlaps with hindcast data. This does not add to the forecast_sequence_length, and must be no larger than the forecast_sequence_length.
Multihead Forecast Model settings
forecast_network: Fully coupled network with one or multiple layers (this is an Embedding Network type, see documentation herein) that defines the forecast (decoder) portion of the multi-head (non-rollout) forecast model.
Stacked Forecast Model settings
bidirectional_stacked_forecast_lstm: Whether or not the hindcast LSTM in a stacked forecast model should be bidirectional.
forecast_hidden_size: Integer hidden size for the forecast (decoder) LSTM. This will default to hidden_size.
hindcast_hidden_size: Integer hidden size for the hindcast (encoder) LSTM. This will default to hidden_size.
Embedding network settings
These settings define fully connected networks that are used in various places, such as the embedding network for static or dynamic features in the single-frequency models or as an optional extended input gate network in the EA-LSTM model. For multi-timescale models, these settings can be ignored.
statics_embedding: None (default) or a dict that defines the embedding network for static inputs. The dictionary can have the following keys:
    type (default ‘fc’): Type of the embedding net. Currently, only ‘fc’ for fully-connected net is supported.
    hiddens: List of integers that define the number of neurons per layer in the fully connected network. The last number is the number of output neurons. Must have at least length one.
    activation (default ‘tanh’): Activation function of the network. Supported values are: ‘tanh’, ‘sigmoid’, ‘linear’, and ‘relu’. The activation function is not applied to the output neurons, which always have a linear activation function. An activation function for the output neurons has to be applied in the main model class.
    dropout (default 0.0): Dropout rate applied to the embedding network.
    Note that for EA-LSTM, there will always be an additional linear layer that maps to the EA-LSTM’s hidden size. This means that the embedding layer output size does not have to be equal to hidden_size.
dynamics_embedding: None (default) or a dict that defines the embedding network for dynamic inputs. See statics_embedding for a description of the dictionary structure.
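As an illustration of how hiddens and activation shape an embedding network, the following sketch runs a forward pass through a small fully connected network in plain Python. The random weights, zero biases, and helper name are hypothetical, and dropout is omitted; the point is only that the last entry of hiddens fixes the output size and that the output layer stays linear.

```python
import math
import random

ACTIVATIONS = {
    'tanh': math.tanh,
    'sigmoid': lambda x: 1.0 / (1.0 + math.exp(-x)),
    'linear': lambda x: x,
    'relu': lambda x: max(x, 0.0),
}


def fc_forward(inputs, hiddens, activation='tanh', seed=0):
    """Forward pass through a fully connected net with randomly drawn weights."""
    if len(hiddens) < 1:
        raise ValueError('hiddens must contain at least one layer size')
    rng = random.Random(seed)
    act = ACTIVATIONS[activation]
    x = list(inputs)
    for layer, n_out in enumerate(hiddens):
        # One weight row per output neuron; biases are left at zero for brevity.
        weights = [[rng.uniform(-1.0, 1.0) for _ in x] for _ in range(n_out)]
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        # The activation is applied to hidden layers only, not to the output layer.
        if layer < len(hiddens) - 1:
            x = [act(v) for v in x]
    return x


static_features = [0.2, -1.3, 0.7]  # made-up static attributes
embedding = fc_forward(static_features, hiddens=[8, 4], activation='tanh')
```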
Training settings
optimizer: Specify which optimizer to use. Currently supported are Adam and AdamW. New optimizers can be added here.
loss: Which loss to use. Currently supported are MSE, NSE, RMSE, GMMLoss, CMALLoss, and UMALLoss. New losses can be added here.
allow_subsequent_nan_losses: Defines a number of training steps for which a loss value of NaN is ignored; no error is raised, and the training loop instead proceeds to the next iteration step.
target_loss_weights: A list of float values specifying the per-target loss weight when training on multiple targets at once. Can be combined with any loss. By default, the weight of each target is 1/n with n being the number of target variables. The order of the weights corresponds to the order of the target_variables.
regularization: List of strings or 2-tuples with regularization terms and corresponding weights. If no weights are specified, they default to 1. Currently, two regularizations are supported: (1) tie_frequencies, which couples the predictions of all frequencies via an MSE term, and (2) forecast_overlap, which couples overlapping sequences between hindcast and forecast models. New regularizations can be added here.
learning_rate: Learning rate. Can be either a single number (for a constant learning rate) or a dictionary. If it is a dictionary, the keys must be integers that reflect the epochs at which the learning rate is changed to the corresponding value. The key 0 defines the initial learning rate.
batch_size: Mini-batch size used for training.
epochs: Number of training epochs.
max_updates_per_epoch: Maximum number of weight updates per training epoch. Leave unspecified to go through all data in every epoch.
use_frequencies: Defines the time step frequencies to use (daily, hourly, …). Use pandas frequency strings to define frequencies. Note: The strings need to include values, e.g., ‘1D’ instead of ‘D’. If used, predict_last_n and seq_length must be dictionaries.
no_loss_frequencies: Subset of frequencies from use_frequencies that are “evaluation-only”, i.e., the model will get input and produce output in the frequencies listed here, but they will not be considered in the calculation of loss and regularization terms.
seq_length: Length of the input sequence. If use_frequencies is used, this needs to be a dictionary mapping each frequency to a sequence length, else an int.
forecast_seq_length: Length of the forecast sequence. This is the number of timesteps from the total seq_length that are part of the forecast rather than the hindcast. Note that this does not add to the total seq_length, and thus the forecast sequence length must be less than the total sequence length.
forecast_overlap: An integer number of timesteps where forecast data overlaps with hindcast data. This does not add to the forecast_sequence_length, and must be no larger than the forecast_sequence_length. This is used for ForecastOverlapMSERegularization in the handoff_forecast_model and for the stacked_lstm model.
predict_last_n: Defines which time steps are used to calculate the loss, counted backwards. Can’t be larger than seq_length. Sequence-to-one would be predict_last_n: 1 and sequence-to-sequence (with e.g. a sequence length of 365) predict_last_n: 365. If use_frequencies is used, this needs to be a dictionary mapping each frequency to a predict_last_n value, else an int.
target_noise_std: Defines the standard deviation of Gaussian noise which is added to the labels during training. Set to zero or leave empty to not add noise.
clip_gradient_norm: If set to a value, clips the norm of the gradients to that value during training. Leave empty for no clipping.
num_workers: Number of (parallel) threads used in the data loader.
save_weights_every: Interval (in epochs) at which the weights of the model are stored to disk. 1 means to store the weights after each epoch, which is the default if not otherwise specified.
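The dictionary form of learning_rate can be read as a step schedule: at each epoch listed as a key, the rate switches to the corresponding value and stays there until the next key. Below is a minimal sketch of that lookup, with a hypothetical helper name and made-up values; the library's own handling may differ in detail, for example in whether epochs are counted from 0 or 1.

```python
def learning_rate_at(epoch, schedule):
    """Return the learning rate in effect at `epoch`.

    schedule is either a single number or a dict mapping epoch -> learning rate.
    """
    if not isinstance(schedule, dict):
        return float(schedule)
    # The most recent key that is not in the future determines the rate.
    reached = [start for start in schedule if start <= epoch]
    if not reached:
        raise ValueError('schedule needs a key at or before the requested epoch')
    return schedule[max(reached)]


schedule = {0: 1e-3, 10: 5e-4, 25: 1e-4}  # made-up values
rates = [learning_rate_at(epoch, schedule) for epoch in (0, 9, 10, 24, 25, 40)]
constant = learning_rate_at(3, 0.01)
```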
Finetune settings
Ignored if mode != finetune
finetune_modules: List of model parts that will be trained during fine-tuning. All parts not listed here will not be updated. Check the documentation of each model to see a list of available module parts.
Logger settings
log_interval: Interval at which the training loss is logged, by default 10.
log_tensorboard: True/False. If True, writes logging results into a TensorBoard file. The default, if not specified, is True.
log_n_figures: If set to an integer value n greater than 0, saves the predictions of n (random) basins as plots during validation.
save_git_diff: If set to True and NeuralHydrology is a git repository with uncommitted changes, the git diff will be stored in the run directory. When using this option, make sure that your run and data directories are either not located inside the git repository, or that they are part of the .gitignore file. Otherwise, the git diff may become very large and use up a lot of disk space. To make sure everything is configured correctly, you can simply check that the output of git diff HEAD only contains your code changes.
Data settings
dataset: Defines which data set will be used. Currently supported are camels_us (CAMELS (US) data set by Newman et al.), camels_gb (CAMELS-GB by Coxon et al.), camels_cl (CAMELS-CL by Alvarez-Garreton et al.), camels_br (CAMELS-BR by Chagas et al.), camels_aus (CAMELS-AUS by Fowler et al.), lamah_{a,b,c} (LamaH-CE by Klingler et al.), hourly_camels_us (hourly forcing and streamflow data for 516 CAMELS (US) basins, published by Gauch et al.), and generic (can be used with any dataset that is stored in a specific format, see documentation for further information).
data_dir: Full or relative path to the root directory of the data set.
train_data_file: If not empty, uses the pickled file at this path as the training data. Can be used to avoid creating the same data set multiple times, which saves disk space and time. If empty, creates a new data set and optionally stores the data in the run directory (if save_train_data is True).
cache_validation_data: True/False. If True, caches validation data in memory for the duration of training, which speeds up the overall training time. By default True, since even larger datasets are usually just a few GB in memory, which most modern machines can handle.
dynamic_inputs: List of variables to use as time series inputs. Names must match the exact names as defined in the data set. Note: In case of multiple input forcing products, you have to append the forcing product behind each variable. E.g., ‘prcp(mm/day)’ of the daymet product is ‘prcp(mm/day)_daymet’. When training on multiple frequencies (cf. use_frequencies), it is possible to define dynamic inputs for each frequency individually. To do so, dynamic_inputs must be a dict mapping each frequency to a list of variables. E.g., to use precipitation from daymet for daily and from nldas-hourly for hourly predictions:
    dynamic_inputs:
      1D:
        - prcp(mm/day)_daymet
      1h:
        - total_precipitation_nldas_hourly
    When nan_handling_method is set, this can also be specified as a list of lists. Then, each inner list defines a feature group. Refer to nan_handling_method for a description of how this can be used to handle missing input data.
forecast_inputs: These are dynamic features (exactly like dynamic_inputs) that are used as inputs to the forecasting portion of a forecast model. This allows different features to be used for the forecast and hindcast portions of a model. If forecast_inputs is present, then all features in this list must also appear in the dynamic_inputs list, which will contain both forecast and hindcast features.
    Note that this does not currently support a forecast rollout, meaning that because forecast inputs behave the same way as dynamic inputs, the forecast input for two timesteps ahead of time t will be the same as the forecast input for one timestep ahead of time t+1.
    Note also that forecasting (and forecast inputs) is not supported for multi-timescale models.
hindcast_inputs: These are the same as forecast_inputs except that they are for the hindcast portion of a forecast model. As with forecast_inputs, these dynamic inputs must be included in the dynamic_inputs list.
target_variables: List of the target variable(s). Names must match the exact names as defined in the data set.
clip_targets_to_zero: Optional list of target variables to clip to zero during the computation of metrics (e.g., useful to compute zero-clipped metrics during the validation between training epochs). Will not affect the data that is saved to disk after evaluation. That is, the raw model outputs are always saved in the result files. Therefore, you may need to manually clip the targets to zero if you load the model outputs from file and want to reproduce the metric values.
duplicate_features: Can be used to duplicate time series features (e.g., for different normalizations). Can be either a str, list or dictionary (mapping from strings to ints). If a string, duplicates the corresponding feature once. If a list, duplicates all features in that list once. Use a dictionary to specify the exact number of duplicates you like. To each duplicated feature, we append _copyN, where N is a counter starting at 1.
lagged_features: Can be used to add a lagged copy of another feature to the list of available input/output features. Has to be a dictionary mapping from strings to int or a list of ints, where the string specifies the feature name and the int(s) the number of lagged time steps. Those values can be positive or negative (see pandas shift for details). If a list of integers is provided, only unique values are considered. We append _shiftN to each lagged feature, where N is the shift count.
autoregressive_inputs: Currently, only one autoregressive input is allowed, and only one output feature is allowed in an autoregressive model. This is a list of target feature(s) to be used as model inputs. These will be lagged by some number of timesteps > 0, and therefore must appear in the list of lagged_features. Autoregressive inputs are appended to the end of the dynamic features list when building the dataset(s). Missing data is supported in autoregressive inputs. During runtime, autoregressive models append binary flags as inputs to indicate missing data. Autoregressive inputs only work with models that support autoregression and will throw an error if they are included in a config file for a model that does not support autoregression. Leave empty if none should be used.
nan_handling_method: One of [masked_mean, input_replacing, attention]. These strategies can be used to train and evaluate models when some of the input data is missing some of the time. To use this, specify dynamic_inputs as a list of lists. Each list then defines a feature group.
    The masked_mean strategy will embed each of these feature groups with its own embedding network (defined via dynamics_embedding) and then average the embeddings of all groups that are not missing at a given timestep.
    The attention strategy is a more general version of masked_mean. It uses an attention mechanism where the query is the concatenation of static attributes (or their embedding) and a positional encoding, and the keys and values are the embeddings of each feature group.
    The input_replacing strategy replaces all NaNs with zeros, concatenates all features from all feature groups, and adds a binary flag for each feature group to indicate missing values.
nan_sequence_probability: If set, the dataset will yield samples where each feature group is missing with the specified probability during training. Any combination of feature groups can be dropped, but the dataset takes care not to drop all sequences of all groups at the same time.
nan_step_probability: Same as nan_sequence_probability except it defines the probability of dropping individual time steps of a feature group, rather than the whole sequence. Note that it is possible for all three time steps to be dropped at the same time.
nan_handling_pos_encoding_size: Defines the size of a sine/cosine positional encoding to be added to each feature group (for masked_mean), the concatenated vector of feature groups (for input_replacing), or to the inputs of the query embedding (for attention).
random_holdout_from_dynamic_features: Dictionary to define timeseries features to remove random sections of data from. This allows for conducting certain types of missing data analyses. Keys of this dictionary must match exact names of dynamic inputs as defined in the data set. Values are a dict with keys “missing_fraction” and “mean_missing_length”, both with float values, representing (1) the long-term fraction of data to be randomly removed from a given feature, and (2) the expected value of the length of continuous subsequences removed from the timeseries. These two distribution parameters do not consider whether there are any NaNs in the original timeseries. Only works for timeseries features (inputs and targets). Leave empty if none should be used.
custom_normalization: Has to be a dictionary, mapping from time series feature names to centering and/or scaling. Using this argument allows you to override the default zero mean, unit variance normalization per feature. Supported options for centering are ‘None’ or ‘none’, ‘mean’, ‘median’ and ‘min’. None/none sets the centering parameter to 0.0, mean to the feature mean, median to the feature median, and min to the feature minimum, respectively. Supported options for scaling are ‘None’ or ‘none’, ‘std’, ‘minmax’. None/none sets the scaling parameter to 1.0, std to the feature standard deviation and minmax to the feature max minus the feature min. The combination of centering: min and scaling: minmax results in min/max feature scaling to the range [0,1].
additional_feature_files: Path to a pickle file (or list of paths for multiple files), containing a dictionary where each key corresponds to one basin id and each value is a date-time indexed pandas DataFrame. Allows adding arbitrary data that is not included in the standard data sets. Convention: If a column is used as static input, the value to use for a specific sample should be in the same row (datetime) as the target discharge value.
evolving_attributes: Columns of the DataFrame loaded with the additional_feature_files that should be used as “static” features. These values will be used as static inputs, but they can evolve over time. Convention: The value to use for a specific input sequence should be in the same row (datetime) as the last time step of that sequence. Names must match the column names in the DataFrame. Leave empty to not use any additional static feature.
use_basin_id_encoding: True/False. If True, creates a basin-one-hot encoding as a(n) (additional) static feature vector for each sample.
timestep_counter: True/False. If True, creates a sequence of counting integers over the forecast sequence length as a dynamic input. This input is used to signal forecast lead time for an unrolling forecast. A similar dynamic input of constant zeros is added to the hindcast inputs. If a forecast model is not used, then setting timestep_counter to True will return an error.
static_attributes: Which static attributes to use (e.g., from the static camels attributes for the CAMELS dataset). Leave empty if none should be used. For hydroatlas attributes, use hydroatlas_attributes instead. Names must match the exact names as defined in the data set.
hydroatlas_attributes: Which HydroATLAS attributes to use. Leave empty if none should be used. Names must match the exact names as defined in the data set.
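The centering and scaling options of custom_normalization amount to subtracting a centering parameter and dividing by a scaling parameter. The sketch below spells this out in plain Python for a single feature; the helper name and data are hypothetical, and details such as the use of the population standard deviation are assumptions rather than statements about the library.

```python
import statistics


def normalize(values, centering='mean', scaling='std'):
    """Apply (value - center) / scale with custom_normalization-style options."""
    centering = str(centering).lower()  # accepts None, 'None' and 'none'
    scaling = str(scaling).lower()

    if centering == 'none':
        center = 0.0
    elif centering == 'mean':
        center = statistics.fmean(values)
    elif centering == 'median':
        center = statistics.median(values)
    elif centering == 'min':
        center = min(values)
    else:
        raise ValueError(f'Unknown centering: {centering}')

    if scaling == 'none':
        scale = 1.0
    elif scaling == 'std':
        scale = statistics.pstdev(values)  # assumption: population std
    elif scaling == 'minmax':
        scale = max(values) - min(values)
    else:
        raise ValueError(f'Unknown scaling: {scaling}')

    return [(v - center) / scale for v in values]


precip = [0.0, 2.0, 4.0, 10.0]  # made-up feature values
standardized = normalize(precip)  # default: zero mean, unit variance
unit_range = normalize(precip, centering='min', scaling='minmax')
unchanged = normalize(precip, centering=None, scaling=None)
```

With centering set to min and scaling set to minmax, the smallest value maps to 0 and the largest to 1, which is the [0,1] scaling mentioned in the custom_normalization description.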
CAMELS US specific
Can be ignored if dataset not in ['camels_us', 'hourly_camels_us']
forcings: Can be either a string or a list of strings that correspond to forcing products in the camels data set. Also supports maurer_extended, nldas_extended, and (for hourly_camels_us) nldas_hourly.