CamelsUS

class neuralhydrology.datasetzoo.camelsus.CamelsUS(cfg: Config, is_train: bool, period: str, basin: str = None, additional_features: List[Dict[str, pandas.DataFrame]] = [], id_to_int: Dict[str, int] = {}, scaler: Dict[str, pandas.Series | xarray.DataArray] = {})

Bases: BaseDataset

Data set class for the CAMELS US data set by [1] and [2].

Parameters:
  • cfg (Config) – The run configuration.

  • is_train (bool) – Defines whether the dataset is used for training or evaluation. If True (training), the mean and standard deviation of each feature are computed and stored in the run directory; if one-hot encoding is used, the mapping for the one-hot encoding is also created and stored on disk. If False, the scaler argument is required, as is id_to_int if one-hot encoding is used.

  • period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.

  • basin (str, optional) – If passed, only the data for this basin is loaded. Otherwise, the basin(s) are read from the basin file corresponding to the period.

  • additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, each mapping a basin id to a pandas DataFrame. These DataFrames are added to the data loaded from the dataset, and all of their columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and ‘target_variables’.

  • id_to_int (Dict[str, int], optional) – If ‘use_basin_id_encoding’ is True in the config and period is either ‘validation’ or ‘test’, this input is required. It is a dictionary mapping each basin id to the integer used for its one-hot encoding.

  • scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling values for each feature and is stored in the run directory during training (train_data/train_data_scaler.yml).
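The normalization described for is_train and scaler can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the usual scheme of subtracting a per-feature center and dividing by a per-feature scale, uses plain dictionaries with hypothetical ‘center’/‘scale’ keys instead of the pandas/xarray objects the real scaler holds, and uses the population standard deviation.

```python
from statistics import mean, pstdev
from typing import Dict, List


def fit_scaler(train_features: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute a center (mean) and scale (std) per feature from training data only."""
    scaler = {}
    for name, values in train_features.items():
        std = pstdev(values)
        # Sketch-specific choice: avoid division by zero for constant features.
        scaler[name] = {"center": mean(values), "scale": std if std > 0 else 1.0}
    return scaler


def apply_scaler(features: Dict[str, List[float]],
                 scaler: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """Normalize any period's features with statistics fitted on the training period."""
    normalized = {}
    for name, values in features.items():
        center = scaler[name]["center"]
        scale = scaler[name]["scale"]
        normalized[name] = [(v - center) / scale for v in values]
    return normalized


# Fit on the training period, then reuse the same statistics for evaluation.
train = {"prcp": [0.0, 2.0, 4.0], "tmax": [10.0, 10.0, 10.0]}
scaler = fit_scaler(train)
test = apply_scaler({"prcp": [2.0, 6.0], "tmax": [12.0]}, scaler)
```

Statistics are fitted only on training data and reused unchanged for validation and test, which is why a stored scaler must be supplied whenever is_train is False. Falling back to a scale of 1.0 for constant features is a choice of this sketch, not documented library behavior.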

References

neuralhydrology.datasetzoo.camelsus.load_camels_us_attributes(data_dir: Path, basins: List[str] = []) → pandas.DataFrame

Load CAMELS US attributes from the dataset provided by [3].

Parameters:
  • data_dir (Path) – Path to the CAMELS US directory. This folder must contain a ‘camels_attributes_v2.0’ folder (the original data set) with one txt file for each attribute group.

  • basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.

Returns:

Basin-indexed DataFrame containing the attributes as columns.

Return type:

pandas.DataFrame
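Merging the per-group attribute files into a single basin-indexed table can be sketched with the standard library. The function name load_attributes_sketch is hypothetical, and the sketch assumes the attribute files are named ‘camels_*.txt’, are semicolon-separated, and share a ‘gauge_id’ column; all values are kept as strings.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional


def load_attributes_sketch(data_dir: Path,
                           basins: Optional[List[str]] = None) -> Dict[str, Dict[str, str]]:
    """Merge all attribute-group files into {gauge_id: {attribute: value}}.

    Hypothetical helper; assumes semicolon-separated 'camels_*.txt' files
    that each contain a 'gauge_id' column.
    """
    attributes_dir = Path(data_dir) / "camels_attributes_v2.0"
    merged: Dict[str, Dict[str, str]] = {}
    for txt_file in sorted(attributes_dir.glob("camels_*.txt")):
        with open(txt_file, newline="") as fp:
            for row in csv.DictReader(fp, delimiter=";"):
                # Keep the id as a string so leading zeros (e.g. '01013500') survive.
                gauge_id = row.pop("gauge_id")
                # Columns from every attribute group accumulate in the same basin entry.
                merged.setdefault(gauge_id, {}).update(row)
    if basins:
        # Restrict to the requested basins; an unknown id raises KeyError.
        merged = {basin: merged[basin] for basin in basins}
    return merged
```

The sketch uses None instead of a mutable [] default for basins. Its result can be turned into a basin-indexed DataFrame with pandas.DataFrame.from_dict(merged, orient="index"), after which numeric columns still need converting, for example with pandas.to_numeric.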

References

neuralhydrology.datasetzoo.camelsus.load_camels_us_discharge(data_dir: Path, basin: str, area: int) → pandas.Series

Load the discharge data for a basin of the CAMELS US data set.

Parameters:
  • data_dir (Path) – Path to the CAMELS US directory. This folder must contain a ‘usgs_streamflow’ folder with 18 subdirectories (one for each of the 18 HUCs), as in the original CAMELS data set. Each HUC folder contains the discharge files (.txt), whose names start with the 8-digit basin id.

  • basin (str) – 8-digit USGS identifier of the basin.

  • area (int) – Catchment area (m²), used to normalize the discharge.

Returns:

Time-indexed pandas.Series of the discharge values (mm/day).

Return type:

pandas.Series
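Normalizing by catchment area amounts to converting a volumetric flow into a water depth per day. A minimal sketch, assuming the raw USGS files report discharge in cubic feet per second and that area is given in m² as documented above; the function name cfs_to_mm_per_day is hypothetical:

```python
# 1 ft = 0.3048 m exactly, so 1 ft^3 = 0.028316846592 m^3 = 28,316,846.592 mm^3.
CUBIC_MM_PER_CUBIC_FOOT = 28_316_846.592
SECONDS_PER_DAY = 86_400
SQUARE_MM_PER_SQUARE_M = 1_000_000


def cfs_to_mm_per_day(q_cfs: float, area_m2: float) -> float:
    """Convert discharge in ft^3/s to a depth over the catchment in mm/day (hypothetical helper)."""
    # Volume passing the gauge in one day, in mm^3.
    volume_mm3_per_day = q_cfs * CUBIC_MM_PER_CUBIC_FOOT * SECONDS_PER_DAY
    # Spread that volume evenly over the catchment area expressed in mm^2.
    return volume_mm3_per_day / (area_m2 * SQUARE_MM_PER_SQUARE_M)


# 100 ft^3/s draining a 100 km^2 (1e8 m^2) catchment.
print(round(cfs_to_mm_per_day(100.0, 1e8), 4))  # → 2.4466
```

As a sanity check, 1 m³/s over 1 km² corresponds to 86.4 mm/day; expressing discharge as a depth makes basins of very different sizes directly comparable.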

neuralhydrology.datasetzoo.camelsus.load_camels_us_forcings(data_dir: Path, basin: str, forcings: str) → Tuple[pandas.DataFrame, int]

Load the forcing data for a basin of the CAMELS US data set.

Parameters:
  • data_dir (Path) – Path to the CAMELS US directory. This folder must contain a ‘basin_mean_forcing’ folder with one subdirectory for each forcing product. Each forcing directory must contain 18 subdirectories (one for each of the 18 HUCs), as in the original CAMELS data set. Each HUC folder contains the forcing files (.txt), whose names start with the 8-digit basin id.

  • basin (str) – 8-digit USGS identifier of the basin.

  • forcings (str) – Name of the forcing product, for example ‘daymet’ or ‘nldas’. Must match a folder name in the ‘basin_mean_forcing’ directory.

Returns:

  • pd.DataFrame – Time-indexed DataFrame, containing the forcing data.

  • int – Catchment area (m²), specified in the header of the forcing file.
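Locating a basin's forcing file inside the HUC subfolders and reading the area from its header can be sketched as follows. Both helper names are hypothetical, and the sketch assumes the area is stored as an integer on the third line of the file header, with the data table following the header lines.

```python
from pathlib import Path


def find_forcing_file(data_dir: Path, basin: str, forcings: str) -> Path:
    """Locate the single forcing file for a basin in any HUC subfolder (hypothetical helper)."""
    forcing_dir = Path(data_dir) / "basin_mean_forcing" / forcings
    # rglob searches every subdirectory, so the basin's HUC does not need to be known.
    matches = sorted(forcing_dir.rglob(f"{basin}*.txt"))
    if not matches:
        raise FileNotFoundError(f"No {forcings} forcing file found for basin {basin}")
    if len(matches) > 1:
        raise ValueError(f"Multiple {forcings} forcing files found for basin {basin}")
    return matches[0]


def read_area_from_header(file_path: Path) -> int:
    """Return the catchment area (m^2), assumed to be on the third header line."""
    with open(file_path) as fp:
        fp.readline()  # first header line (skipped)
        fp.readline()  # second header line (skipped)
        return int(fp.readline().strip())
```

In a full loader, the remaining whitespace-separated table could be parsed from the same open file handle once the header lines are consumed, for example with pandas.read_csv(fp, sep=r"\s+").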