CamelsCL

class neuralhydrology.datasetzoo.camelscl.CamelsCL(cfg: Config, is_train: bool, period: str, basin: str = None, additional_features: List[Dict[str, pandas.DataFrame]] = [], id_to_int: Dict[str, int] = {}, scaler: Dict[str, pandas.Series | xarray.DataArray] = {})

Bases: BaseDataset

Data set class for the CAMELS CL dataset by [1].

For more efficient data loading during model training/evaluating, this dataset class expects the CAMELS-CL dataset in a processed format. Specifically, this dataset class works with per-basin csv files that contain all timeseries data combined. Use the preprocess_camels_cl_dataset() function to process the original dataset layout into this format.

Parameters:
  • cfg (Config) – The run configuration.

  • is_train (bool) – Defines if the dataset is used for training or evaluating. If True (training), means/stds for each feature are computed and stored to the run directory. If one-hot encoding is used, the mapping for the one-hot encoding is created and also stored to disk. If False, a scaler input is expected, as is the id_to_int input if one-hot encoding is used.

  • period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.

  • basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate basin file, corresponding to the period.

  • additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, mapping from a basin id to a pandas DataFrame. This DataFrame will be added to the data loaded from the dataset, and all its columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and ‘target_variables’.

  • id_to_int (Dict[str, int], optional) – If the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or ‘test’, this input is required. It is a dictionary, mapping from basin id to an integer (the one-hot encoding).

  • scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling for each feature and is stored to the run directory during training (train_data/train_data_scaler.yml).

References

[1] Alvarez-Garreton, C., Mendoza, P. A., Boisier, J. P., Addor, N., Galleguillos, M., Zambrano-Bigiarini, M., Lara, A., Puelma, C., Cortes, G., Garreaud, R., McPhee, J., and Ayala, A.: The CAMELS-CL dataset: catchment attributes and meteorology for large sample studies – Chile dataset, Hydrol. Earth Syst. Sci., 22, 5817–5846, https://doi.org/10.5194/hess-22-5817-2018, 2018.
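
A minimal usage sketch, assuming a run configuration YAML (my_run_config.yml is a placeholder name) that sets dataset: camels_cl and whose data_dir points to a CAMELS-CL folder containing the preprocessed per-basin csv files:

    from pathlib import Path

    from neuralhydrology.datasetzoo.camelscl import CamelsCL
    from neuralhydrology.utils.config import Config

    # Placeholder run configuration; it must set dataset: camels_cl and point
    # data_dir to the CAMELS-CL folder with the 'preprocessed' sub-folder.
    cfg = Config(Path("my_run_config.yml"))

    # Training: means/stds of each feature are computed and stored to the run directory.
    train_ds = CamelsCL(cfg=cfg, is_train=True, period="train")

    # Validation/test: pass the scaler (and id_to_int, if basin id encoding is used)
    # that were created during training.
    # val_ds = CamelsCL(cfg=cfg, is_train=False, period="validation", scaler=scaler)
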
neuralhydrology.datasetzoo.camelscl.load_camels_cl_attributes(data_dir: Path, basins: List[str] = []) → pandas.DataFrame

Load CAMELS CL attributes.

Parameters:
  • data_dir (Path) – Path to the CAMELS CL directory. Assumes that a file called ‘1_CAMELScl_attributes.txt’ exists.

  • basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.

Returns:

Basin-indexed DataFrame, containing the attributes as columns.

Return type:

pd.DataFrame
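
A short example sketch; the data directory and basin identifiers are placeholders:

    from pathlib import Path

    from neuralhydrology.datasetzoo.camelscl import load_camels_cl_attributes

    # Placeholder path to the CAMELS-CL root folder containing '1_CAMELScl_attributes.txt'.
    data_dir = Path("/data/CAMELS-CL")

    # Attributes of all basins ...
    all_attributes = load_camels_cl_attributes(data_dir)

    # ... or only of a subset of basins (identifiers are strings).
    subset = load_camels_cl_attributes(data_dir, basins=["1001001", "1001002"])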

neuralhydrology.datasetzoo.camelscl.load_camels_cl_timeseries(data_dir: Path, basin: str) → pandas.DataFrame

Load the time series data for one basin of the CAMELS CL data set.

Parameters:
  • data_dir (Path) – Path to the CAMELS CL directory. This folder must contain a folder called ‘preprocessed’ containing the per-basin csv files created by preprocess_camels_cl_dataset().

  • basin (str) – Basin identifier number as string.

Returns:

Time-indexed DataFrame, containing the time series data (forcings + discharge).

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If no sub-folder called ‘preprocessed’ exists within the root directory of the CAMELS CL dataset.
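
A short sketch of loading the preprocessed time series of a single basin; the path and basin identifier are placeholders:

    from pathlib import Path

    from neuralhydrology.datasetzoo.camelscl import load_camels_cl_timeseries

    # Placeholder path; the folder must contain the 'preprocessed' sub-folder
    # created by preprocess_camels_cl_dataset().
    data_dir = Path("/data/CAMELS-CL")

    # Raises FileNotFoundError if the 'preprocessed' sub-folder does not exist yet.
    df = load_camels_cl_timeseries(data_dir, basin="1001001")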

neuralhydrology.datasetzoo.camelscl.preprocess_camels_cl_dataset(data_dir: Path)

Preprocess CAMELS-CL data set and create per-basin files for more flexible and faster data loading.

This function will read in all daily time series csv files and create per-basin csv files in a new sub-folder called “preprocessed”. This code is specifically designed for the “CAMELS-CL versión 2022 enero” release of the dataset.

Parameters:

data_dir (Path) – Path to the CAMELS-CL data set. All csv files from the original dataset should be present in this folder.

Raises:

FileExistsError – If a sub-folder called ‘preprocessed’ already exists in data_dir.
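
A minimal sketch of the one-time preprocessing step; the path is a placeholder:

    from pathlib import Path

    from neuralhydrology.datasetzoo.camelscl import preprocess_camels_cl_dataset

    # Placeholder path to the folder with the original "CAMELS-CL versión 2022 enero" csv files.
    data_dir = Path("/data/CAMELS-CL")

    # Creates data_dir / 'preprocessed' with one combined csv file per basin;
    # raises FileExistsError if that sub-folder already exists.
    preprocess_camels_cl_dataset(data_dir)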