is_train (bool) – Defines whether the dataset is used for training or evaluation. If True (training), means/stds for each feature
are computed and stored to the run directory. If one-hot encoding is used, the mapping for the one-hot encoding
is also created and stored to disk. If False, a scaler input is expected, as is the id_to_int input
if one-hot encoding is used.
period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.
basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate
basin file, corresponding to the period.
additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, mapping from a basin id to a pandas DataFrame. This DataFrame will be added to the data
loaded from the dataset, and all columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and
‘target_variables’.
id_to_int (Dict[str, int], optional) – If the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or
‘test’, this input is required. It is a dictionary, mapping from basin id to an integer (the one-hot encoding).
scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling
for each feature and is stored to the run directory during training (train_data/train_data_scaler.yml).
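The interplay between is_train, scaler and id_to_int can be illustrated with a minimal sketch. The helper name `prepare_encodings` and its signature are hypothetical and not part of the library; the sketch also assumes a population standard deviation and skips writing anything to the run directory.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

Scaler = Dict[str, Dict[str, float]]


def prepare_encodings(
    features: Dict[str, List[float]],
    basins: List[str],
    is_train: bool,
    scaler: Optional[Scaler] = None,
    id_to_int: Optional[Dict[str, int]] = None,
    use_basin_id_encoding: bool = False,
) -> Tuple[Scaler, Dict[str, int]]:
    """Hypothetical helper mirroring the is_train contract described above."""
    if is_train:
        # Training: derive centering/scaling statistics from the data itself.
        # Assumption: population standard deviation; the library may differ.
        scaler = {
            name: {"mean": mean(values), "std": pstdev(values)}
            for name, values in features.items()
        }
        # Training: create the basin id -> integer mapping only if requested.
        id_to_int = (
            {basin: index for index, basin in enumerate(basins)}
            if use_basin_id_encoding
            else {}
        )
        return scaler, id_to_int

    # Evaluation: the statistics must come from the training run.
    if scaler is None:
        raise ValueError("scaler is required when is_train is False")
    if use_basin_id_encoding and id_to_int is None:
        raise ValueError("id_to_int is required when basin id encoding is used")
    return scaler, id_to_int or {}
```

In this sketch, the values returned from the training call are what a validation or test call would receive as its scaler and id_to_int arguments.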
Load CAMELS GB attributes from the dataset provided by [2].
Parameters:
data_dir (Path) – Path to the CAMELS GB directory. This folder must contain an ‘attributes’ folder containing the corresponding
csv files for each attribute group (ending with _attributes.csv).
basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins
are returned.
Returns:
Basin-indexed DataFrame, containing the attributes as columns.
Return type:
pd.DataFrame
Raises:
FileNotFoundError – If no subfolder called ‘attributes’ exists within the root directory of the CAMELS GB data set.
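The behaviour described above can be approximated with the following sketch. It is not the library's implementation: the function name `load_attributes_sketch` is hypothetical, the id column name `gauge_id` is an assumption, and to stay dependency-free it returns nested dictionaries instead of a pd.DataFrame.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def load_attributes_sketch(
    data_dir: Path, basins: Optional[List[str]] = None
) -> Dict[str, Dict[str, str]]:
    """Hypothetical stand-in: merge all *_attributes.csv files by basin id."""
    attributes_dir = Path(data_dir) / "attributes"
    if not attributes_dir.is_dir():
        raise FileNotFoundError(f"Attribute folder not found at {attributes_dir}")

    merged: Dict[str, Dict[str, str]] = {}
    for csv_file in sorted(attributes_dir.glob("*_attributes.csv")):
        with csv_file.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Assumption: the basin identifier column is named 'gauge_id'.
                basin_id = row.pop("gauge_id")
                merged.setdefault(basin_id, {}).update(row)

    # Optionally restrict the result to the requested basins.
    if basins is not None:
        wanted = set(basins)
        merged = {b: attrs for b, attrs in merged.items() if b in wanted}
    return merged
```

Values are kept as strings here; a DataFrame-based loader would normally infer numeric types per column.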
Load the time series data for one basin of the CAMELS GB data set.
Parameters:
data_dir (Path) – Path to the CAMELS GB directory. This folder must contain a folder called ‘timeseries’ containing the forcing
file for each basin as a .csv file. The file names have to start with ‘CAMELS_GB_hydromet_timeseries’.
basin (str) – Basin identifier number as string.
Returns:
Time-indexed DataFrame, containing the time series data (forcings + discharge).
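A rough sketch of the file lookup and time indexing might look as follows. The function name `load_timeseries_sketch` is hypothetical, and two details are assumptions rather than documented behaviour: that the basin id appears in the file name delimited by underscores, and that the time column is named `date` in ISO format. The sketch returns a date-keyed dictionary rather than a DataFrame.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Dict


def load_timeseries_sketch(data_dir: Path, basin: str) -> Dict[date, Dict[str, str]]:
    """Hypothetical stand-in: read one basin's time series, keyed by date."""
    timeseries_dir = Path(data_dir) / "timeseries"
    candidates = [
        path
        for path in sorted(timeseries_dir.glob("CAMELS_GB_hydromet_timeseries*.csv"))
        # Assumption: the basin id is delimited by underscores in the file name.
        if f"_{basin}_" in path.name
    ]
    if not candidates:
        raise FileNotFoundError(f"No time series file for basin {basin}")

    with candidates[0].open(newline="") as handle:
        # Assumption: the time column is called 'date' and uses YYYY-MM-DD.
        return {
            date.fromisoformat(row.pop("date")): row
            for row in csv.DictReader(handle)
        }
```

Matching on the delimited id rather than a plain substring avoids picking up a basin whose identifier merely contains the requested one.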