is_train (bool) – Defines whether the dataset is used for training or evaluation. If True (training), the means/stds for each feature
are computed and stored to the run directory. If one-hot encoding is used, the mapping for the one-hot encoding
is also created and stored to disk. If False, a scaler input is expected and, if one-hot encoding is used, the
id_to_int input as well.
period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.
basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate
basin file, corresponding to the period.
additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, each mapping from a basin id to a pandas DataFrame. This DataFrame will be added to the data
loaded from the dataset, and all of its columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and
‘target_variables’.
id_to_int (Dict[str, int], optional) – If the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or
‘test’, this input is required. It is a dictionary, mapping from basin id to an integer (the one-hot encoding).
scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling
values for each feature, which are stored to the run directory during training (train_data/train_data_scaler.yml).
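The split between training and evaluation described for is_train, id_to_int and scaler can be pictured with a minimal sketch in plain Python. This is not the library's implementation: `fit_scaler`, `apply_scaler`, `build_id_to_int` and `one_hot` are hypothetical helpers, the feature values are made up, and the use of the population standard deviation is an arbitrary choice for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def fit_scaler(features: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute per-feature mean and std. Only done on training data."""
    return {
        name: {"mean": mean(values), "std": pstdev(values)}
        for name, values in features.items()
    }


def apply_scaler(
    features: Dict[str, List[float]], scaler: Dict[str, Dict[str, float]]
) -> Dict[str, List[float]]:
    """Center and scale features with statistics computed earlier."""
    return {
        name: [(v - scaler[name]["mean"]) / scaler[name]["std"] for v in values]
        for name, values in features.items()
    }


def build_id_to_int(basins: List[str]) -> Dict[str, int]:
    """Assign each basin id a fixed integer position. Only done in training."""
    return {basin: index for index, basin in enumerate(basins)}


def one_hot(basin: str, id_to_int: Dict[str, int]) -> List[int]:
    """Turn a basin id into a one-hot vector using the stored mapping."""
    vector = [0] * len(id_to_int)
    vector[id_to_int[basin]] = 1
    return vector


# Training: statistics and mapping are created (and would be written to disk).
train_features = {"prcp": [0.0, 2.0, 4.0]}
scaler = fit_scaler(train_features)
id_to_int = build_id_to_int(["01013500", "01022500"])

# Evaluation: both objects are passed in and reused unchanged.
test_features = {"prcp": [2.0, 6.0]}
scaled_test = apply_scaler(test_features, scaler)
encoded = one_hot("01022500", id_to_int)
```

Reusing the training statistics and mapping at evaluation time keeps the inputs on the same scale and the one-hot positions in the same order as those seen during training.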
Load the CAMELS US attributes from the dataset provided by [3].
Parameters:
data_dir (Path) – Path to the CAMELS US directory. This folder must contain a ‘camels_attributes_v2.0’ folder (the original
data set) containing the corresponding txt files for each attribute group.
basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins
are returned.
Returns:
Basin-indexed DataFrame, containing the attributes as columns.
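As a rough illustration of combining one text file per attribute group into a single basin-indexed table, the sketch below uses only the standard library and returns nested dictionaries instead of a DataFrame. It is not the library's function: `load_attribute_files` is a hypothetical helper, and the `;` delimiter and the `gauge_id` column name are assumptions about the file layout that are not stated above.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def load_attribute_files(
    attributes_dir: Path,
    basins: Optional[List[str]] = None,
    id_column: str = "gauge_id",  # assumed name of the basin id column
    delimiter: str = ";",  # assumed column separator
) -> Dict[str, Dict[str, str]]:
    """Merge all .txt files in a folder into {basin id: {attribute: value}}."""
    merged: Dict[str, Dict[str, str]] = {}
    for txt_file in sorted(attributes_dir.glob("*.txt")):
        with txt_file.open(newline="") as fp:
            for row in csv.DictReader(fp, delimiter=delimiter):
                basin = row.pop(id_column)
                # Attributes from different files end up in the same record.
                merged.setdefault(basin, {}).update(row)
    if basins is not None:
        # Keep only the requested basins; an unknown id raises a KeyError.
        merged = {basin: merged[basin] for basin in basins}
    return merged
```

Values are kept as strings here; a DataFrame-based loader would typically also convert them to numeric types.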
Load the discharge data for a basin of the CAMELS US data set.
Parameters:
data_dir (Path) – Path to the CAMELS US directory. This folder must contain a ‘usgs_streamflow’ folder with 18
subdirectories (for the 18 HUCs) as in the original CAMELS data set. Each HUC folder contains the discharge files
(.txt), whose names start with the 8-digit basin id.
basin (str) – 8-digit USGS identifier of the basin.
area (int) – Catchment area (m²), used to normalize the discharge.
Returns:
Time-indexed pandas.Series of the discharge values (mm/day).
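To make the area normalization concrete, here is a minimal sketch of converting a volumetric flow rate into a depth per day over the catchment. It assumes the raw discharge is given in cubic feet per second, which is not stated above, and `cfs_to_mm_per_day` is a hypothetical helper rather than the library's function.

```python
CUBIC_FEET_TO_CUBIC_METERS = 0.028316846592  # exact, since 1 ft = 0.3048 m
SECONDS_PER_DAY = 86_400
MM_PER_M = 1_000


def cfs_to_mm_per_day(discharge_cfs: float, area_m2: float) -> float:
    """Convert discharge (assumed ft^3/s) to mm/day over a catchment area in m^2."""
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    # Volume of water leaving the basin per day, in cubic meters.
    volume_m3_per_day = discharge_cfs * CUBIC_FEET_TO_CUBIC_METERS * SECONDS_PER_DAY
    # Spread that volume over the catchment area to get a depth, then m -> mm.
    return volume_m3_per_day / area_m2 * MM_PER_M
```

Because the function only uses arithmetic operators on the discharge, it can also be applied element-wise to a pandas.Series of discharge values.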
Load the forcing data for a basin of the CAMELS US data set.
Parameters:
data_dir (Path) – Path to the CAMELS US directory. This folder must contain a ‘basin_mean_forcing’ folder containing one
subdirectory for each forcing. The forcing directories have to contain 18 subdirectories (for the 18 HUCs) as in
the original CAMELS data set. Each HUC folder contains the forcing files (.txt), whose names start with the
8-digit basin id.
basin (str) – 8-digit USGS identifier of the basin.
forcings (str) – Name of the forcing, e.g. ‘daymet’ or ‘nldas’. Must match a folder name in the ‘basin_mean_forcing’ directory.
Returns:
pd.DataFrame – Time-indexed DataFrame, containing the forcing data.
int – Catchment area (m²), specified in the header of the forcing file.
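The directory layout described for the forcing and discharge files (HUC subdirectories holding .txt files whose names start with the basin id) suggests a simple recursive lookup. The sketch below shows one way to do it with pathlib; `find_basin_file` is a hypothetical helper, not part of the documented API, and it does not parse the file header or contents.

```python
from pathlib import Path


def find_basin_file(root: Path, basin: str) -> Path:
    """Return the first .txt file below root whose name starts with the basin id."""
    # rglob searches root and all nested subdirectories (e.g. the HUC folders).
    matches = sorted(root.rglob(f"{basin}*.txt"))
    if not matches:
        raise FileNotFoundError(f"No file for basin {basin} found under {root}")
    return matches[0]
```

For forcing data, the search root would be the folder of the chosen forcing, for example `find_basin_file(data_dir / "basin_mean_forcing" / "daymet", "01013500")`, where `data_dir` is the CAMELS US directory.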