CamelsGB

class neuralhydrology.datasetzoo.camelsgb.CamelsGB(cfg: Config, is_train: bool, period: str, basin: str = None, additional_features: List[Dict[str, pandas.DataFrame]] = [], id_to_int: Dict[str, int] = {}, scaler: Dict[str, pandas.Series | xarray.DataArray] = {})

Bases: BaseDataset

Data set class for the CAMELS GB dataset by [1].

Parameters:
  • cfg (Config) – The run configuration.

  • is_train (bool) – Defines whether the dataset is used for training or evaluation. If True (training), the means and standard deviations of each feature are computed and stored in the run directory. If one-hot encoding is used, the mapping for the one-hot encoding is also created and stored to disk. If False, a scaler input is expected, as is the id_to_int input if one-hot encoding is used.

  • period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.

  • basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate basin file, corresponding to the period.

  • additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, each mapping from a basin id to a pandas DataFrame. Each DataFrame is added to the data loaded from the dataset, and all of its columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and ‘target_variables’.

  • id_to_int (Dict[str, int], optional) – Required if the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or ‘test’. It is a dictionary mapping from basin id to an integer (the one-hot encoding).

  • scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling for each feature and is stored to the run directory during training (train_data/train_data_scaler.yml).
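The train/evaluation contract described for is_train, scaler and id_to_int can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the helper names (fit_scaler, apply_scaler, build_id_to_int, one_hot), the dictionary layout, the use of the population standard deviation, the feature name and the basin ids are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def fit_scaler(train_features: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute per-feature centering (mean) and scaling (std) from training data.

    Hypothetical helper: the real scaler layout and std definition may differ.
    """
    return {
        name: {"center": mean(values), "scale": pstdev(values)}
        for name, values in train_features.items()
    }


def apply_scaler(
    features: Dict[str, List[float]], scaler: Dict[str, Dict[str, float]]
) -> Dict[str, List[float]]:
    """Normalize features with statistics that were computed during training."""
    return {
        name: [(v - scaler[name]["center"]) / scaler[name]["scale"] for v in values]
        for name, values in features.items()
    }


def build_id_to_int(basins: List[str]) -> Dict[str, int]:
    """Map each basin id to an integer position (created once, during training)."""
    return {basin: index for index, basin in enumerate(basins)}


def one_hot(basin: str, id_to_int: Dict[str, int]) -> List[int]:
    """Turn the integer position of a basin into a one-hot vector."""
    vector = [0] * len(id_to_int)
    vector[id_to_int[basin]] = 1
    return vector


# Training: statistics and the id mapping are computed from the training data.
train_scaler = fit_scaler({"precipitation": [1.0, 2.0, 3.0]})
train_id_to_int = build_id_to_int(["10002", "10003"])

# Validation/test: the stored statistics and mapping are reused, not recomputed.
test_scaled = apply_scaler({"precipitation": [2.0]}, train_scaler)
test_encoding = one_hot("10003", train_id_to_int)
```

Reusing the training statistics at evaluation time keeps the inputs on the scale the model saw during training, which is why both arguments are required for the ‘validation’ and ‘test’ periods.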

References

neuralhydrology.datasetzoo.camelsgb.load_camels_gb_attributes(data_dir: Path, basins: List[str] = []) → pandas.DataFrame

Load CAMELS GB attributes from the dataset provided by [2].

Parameters:
  • data_dir (Path) – Path to the CAMELS GB directory. This folder must contain an ‘attributes’ folder containing the corresponding csv files for each attribute group (ending with _attributes.csv).

  • basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.

Returns:

Basin-indexed DataFrame, containing the attributes as columns.

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If no subfolder called ‘attributes’ exists within the root directory of the CAMELS GB data set.
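The folder layout and error behaviour described above can be sketched with the standard library alone. This is an illustrative approximation, not the library's code: the function name load_attributes_sketch, the id column name "gauge_id", the returned nested dictionary (instead of a DataFrame) and all demo file names and values are assumptions.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def load_attributes_sketch(
    data_dir: Path, basins: Optional[List[str]] = None
) -> Dict[str, Dict[str, str]]:
    """Merge all '*_attributes.csv' files into one basin-indexed mapping."""
    attributes_path = Path(data_dir) / "attributes"
    if not attributes_path.is_dir():
        raise FileNotFoundError(f"Attribute folder not found at {attributes_path}")

    merged: Dict[str, Dict[str, str]] = {}
    for csv_file in sorted(attributes_path.glob("*_attributes.csv")):
        with csv_file.open(newline="") as fp:
            for row in csv.DictReader(fp):
                basin_id = row.pop("gauge_id")  # assumed name of the id column
                merged.setdefault(basin_id, {}).update(row)

    if basins:
        # Keep only the requested basins; otherwise all basins are returned.
        merged = {b: merged[b] for b in basins if b in merged}
    return merged


# Demo with a temporary directory and made-up attribute values.
with tempfile.TemporaryDirectory() as tmp:
    attr_dir = Path(tmp) / "attributes"
    attr_dir.mkdir()
    (attr_dir / "climatic_attributes.csv").write_text(
        "gauge_id,p_mean\n10002,2.3\n10003,2.7\n"
    )
    (attr_dir / "topographic_attributes.csv").write_text(
        "gauge_id,area\n10002,30.1\n10003,72.5\n"
    )
    demo_attributes = load_attributes_sketch(Path(tmp), basins=["10003"])
```

Values are kept as strings here to keep the sketch short; a DataFrame-based loader would typically parse them into numeric columns.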

References

neuralhydrology.datasetzoo.camelsgb.load_camels_gb_timeseries(data_dir: Path, basin: str) → pandas.DataFrame

Load the time series data for one basin of the CAMELS GB data set.

Parameters:
  • data_dir (Path) – Path to the CAMELS GB directory. This folder must contain a folder called ‘timeseries’ containing the forcing files for each basin as a .csv file. The file names have to start with ‘CAMELS_GB_hydromet_timeseries’.

  • basin (str) – Basin identifier number as string.

Returns:

Time-indexed DataFrame, containing the time series data (forcings + discharge).

Return type:

pd.DataFrame
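As a rough illustration of the file lookup described above, the following standard-library sketch finds the time series file of one basin and indexes its rows by date. It is not the library's implementation: the function name load_timeseries_sketch, the assumption that the basin id appears as "_<basin>_" in the file name, the "date" column with ISO-formatted dates, the returned dictionary (instead of a DataFrame) and the demo file name, columns and values are all hypothetical.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict


def load_timeseries_sketch(data_dir: Path, basin: str) -> Dict[datetime, Dict[str, float]]:
    """Load one basin's time series into a date-indexed mapping."""
    timeseries_path = Path(data_dir) / "timeseries"
    if not timeseries_path.is_dir():
        raise FileNotFoundError(f"No 'timeseries' folder found at {timeseries_path}")

    # File names must start with 'CAMELS_GB_hydromet_timeseries'; matching the
    # basin as '_<basin>_' inside the name is an assumption of this sketch.
    candidates = [
        f
        for f in sorted(timeseries_path.glob("CAMELS_GB_hydromet_timeseries*.csv"))
        if f"_{basin}_" in f.name
    ]
    if not candidates:
        raise FileNotFoundError(f"No time series file found for basin {basin}")

    series: Dict[datetime, Dict[str, float]] = {}
    with candidates[0].open(newline="") as fp:
        for row in csv.DictReader(fp):
            date = datetime.strptime(row.pop("date"), "%Y-%m-%d")  # assumed column
            # Empty cells are treated as missing values.
            series[date] = {k: float(v) if v else float("nan") for k, v in row.items()}
    return series


# Demo with a temporary directory and made-up forcing and discharge values.
with tempfile.TemporaryDirectory() as tmp:
    ts_dir = Path(tmp) / "timeseries"
    ts_dir.mkdir()
    (ts_dir / "CAMELS_GB_hydromet_timeseries_10002_demo.csv").write_text(
        "date,precipitation,discharge_spec\n1970-10-01,1.5,0.4\n1970-10-02,0.0,\n"
    )
    demo_series = load_timeseries_sketch(Path(tmp), "10002")
```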