CamelsBR

class neuralhydrology.datasetzoo.camelsbr.CamelsBR(cfg: Config, is_train: bool, period: str, basin: str = None, additional_features: List[Dict[str, pandas.DataFrame]] = [], id_to_int: Dict[str, int] = {}, scaler: Dict[str, pandas.Series | xarray.DataArray] = {})

Bases: BaseDataset

Data set class for the CAMELS-BR dataset by [1].

For more efficient data loading during model training and evaluation, this dataset class expects the CAMELS-BR dataset in a processed format. Specifically, it works with per-basin csv files that each contain all time series data of one basin combined. Use the preprocess_camels_br_dataset() function to convert the original dataset layout into this format.
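To illustrate what "all time series data combined" means, the following stdlib-only sketch outer-joins several per-variable series on their dates and writes a single CSV table for one basin. The helper `combine_series` and the variable names `streamflow` and `precipitation` are hypothetical; the actual column names and file layout are whatever preprocess_camels_br_dataset() produces.

```python
import csv
import io
from typing import Dict


def combine_series(series: Dict[str, Dict[str, float]]) -> str:
    """Outer-join per-variable series (date -> value) into one CSV table.

    Dates missing for a variable are left empty, so every variable keeps
    its own coverage while all variables share a single date column.
    """
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    dates = sorted({d for values in series.values() for d in values})
    columns = sorted(series)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["date"] + columns)
    for d in dates:
        writer.writerow([d] + [series[c].get(d, "") for c in columns])
    return buffer.getvalue()


# Hypothetical daily values for one basin; names are illustrative only.
example = {
    "streamflow": {"1990-01-01": 12.5, "1990-01-02": 11.8},
    "precipitation": {"1990-01-02": 3.2, "1990-01-03": 0.0},
}
print(combine_series(example))
```

Reading one such file per basin is a single file access, instead of opening one file per variable.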

Parameters:
  • cfg (Config) – The run configuration.

  • is_train (bool) – Defines whether the dataset is used for training or evaluation. If True (training), means/stds for each feature are computed and stored to the run directory; if one-hot encoding is used, the mapping for the one-hot encoding is also created and stored to disk. If False, the scaler argument is required, as is id_to_int if one-hot encoding is used.

  • period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.

  • basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate basin file, corresponding to the period.

  • additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, mapping from a basin id to a pandas DataFrame. This DataFrame will be added to the data loaded from the dataset, and all of its columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and ‘target_variables’.

  • id_to_int (Dict[str, int], optional) – If the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or ‘test’, this input is required. It is a dictionary mapping from basin id to an integer (the one-hot encoding).

  • scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling for each feature and is stored to the run directory during training (train_data/train_data_scaler.yml).
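The interplay of is_train, scaler and id_to_int can be summarized in a small sketch: feature statistics and the basin-to-integer mapping are derived from the training data only and then reused unchanged for validation and test. This is a simplified, stdlib-only illustration; the helpers `fit_scaler`, `apply_scaler` and `build_id_to_int` and the basin ids are hypothetical, and the actual BaseDataset stores its scaler in train_data/train_data_scaler.yml and may compute center and scale differently per feature.

```python
import statistics
from typing import Dict, List


def fit_scaler(train: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Compute per-feature center (mean) and scale (sample std) on training data only."""
    return {
        name: {"center": statistics.mean(values), "scale": statistics.stdev(values)}
        for name, values in train.items()
    }


def apply_scaler(data: Dict[str, List[float]],
                 scaler: Dict[str, Dict[str, float]]) -> Dict[str, List[float]]:
    """Normalize data from any period with the stored training statistics."""
    return {
        name: [(v - scaler[name]["center"]) / scaler[name]["scale"] for v in values]
        for name, values in data.items()
    }


def build_id_to_int(basins: List[str]) -> Dict[str, int]:
    """Map each training basin id to a fixed integer for one-hot encoding."""
    return {basin: i for i, basin in enumerate(sorted(basins))}


# Training period (is_train=True): derive the statistics and mapping once.
scaler = fit_scaler({"precipitation": [0.0, 2.0, 4.0]})
id_to_int = build_id_to_int(["60615000", "10100000"])  # hypothetical basin ids

# Validation/test period (is_train=False): reuse them instead of recomputing.
print(apply_scaler({"precipitation": [2.0, 6.0]}, scaler))
print(id_to_int)
```

Fitting on the training period only keeps validation and test data out of the normalization, which is why scaler (and id_to_int, when basin encoding is used) must be passed in when is_train is False.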

References

neuralhydrology.datasetzoo.camelsbr.load_camels_br_attributes(data_dir: Path, basins: List[str] = []) → pandas.DataFrame

Load CAMELS-BR attributes.

Parameters:
  • data_dir (Path) – Path to the CAMELS-BR directory. Assumes that the subdirectory 01_CAMELS_BR_attributes is located in the data directory root folder.

  • basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.

Returns:

Basin-indexed DataFrame, containing the attributes as columns.

Return type:

pd.DataFrame
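The basins argument acts as a row filter on the basin-indexed attribute table. The sketch below mimics that behavior with the standard library: it joins several attribute tables that share a basin-id column into one record per basin and then keeps only the requested basins. The id column name `gauge_id`, the helper `merge_attributes` and the inline CSV contents are assumptions for illustration; the real function reads the files under 01_CAMELS_BR_attributes and returns a typed pandas DataFrame, whereas values here stay strings.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional


def merge_attributes(
    tables: Iterable[str], basins: Optional[List[str]] = None
) -> Dict[str, Dict[str, str]]:
    """Join CSV attribute tables on 'gauge_id' (assumed column name).

    With no basins (None or an empty list), all basins are returned,
    mirroring the documented default.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for text in tables:
        for row in csv.DictReader(io.StringIO(text)):
            basin = row.pop("gauge_id")
            merged.setdefault(basin, {}).update(row)
    if basins:
        # Unknown basin ids raise a KeyError in this sketch.
        merged = {b: merged[b] for b in basins}
    return merged


# Hypothetical attribute tables; column names are illustrative only.
climate = "gauge_id,p_mean\n10100000,5.1\n60615000,3.9\n"
topography = "gauge_id,area\n10100000,845.0\n60615000,1280.0\n"
attrs = merge_attributes([climate, topography], basins=["60615000"])
print(attrs)
```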

neuralhydrology.datasetzoo.camelsbr.load_camels_br_timeseries(data_dir: Path, basin: str) → pandas.DataFrame

Load the time series data for one basin of the CAMELS-BR data set.

Parameters:
  • data_dir (Path) – Path to the CAMELS-BR directory. This folder must contain a folder called ‘preprocessed’ containing the per-basin csv files created by preprocess_camels_br_dataset().

  • basin (str) – Basin identifier number as string.

Returns:

Time-indexed DataFrame, containing the time series data (forcings + discharge).

Return type:

pd.DataFrame

Raises:

FileNotFoundError – If no sub-folder called ‘preprocessed’ exists within the root directory of the CAMELS-BR dataset.
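The documented lookup and error behavior can be sketched as follows. The file name pattern `<basin>.csv`, the `date` column and the helper `read_basin_csv` are assumptions made for this illustration (the real file names are whatever preprocess_camels_br_dataset() writes), and the library function returns a time-indexed pandas DataFrame rather than a list of rows.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def read_basin_csv(data_dir: Path, basin: str) -> List[Dict[str, str]]:
    """Read one per-basin CSV from data_dir/'preprocessed' (hypothetical layout)."""
    preprocessed_dir = data_dir / "preprocessed"
    if not preprocessed_dir.is_dir():
        # Same condition as the documented FileNotFoundError.
        raise FileNotFoundError(
            f"No 'preprocessed' folder in {data_dir}. "
            "Run preprocess_camels_br_dataset() first."
        )
    with open(preprocessed_dir / f"{basin}.csv", newline="") as f:
        return list(csv.DictReader(f))


missing_error: Optional[FileNotFoundError] = None
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    try:
        read_basin_csv(root, "10100000")
    except FileNotFoundError as err:
        missing_error = err  # no 'preprocessed' folder yet

    (root / "preprocessed").mkdir()
    (root / "preprocessed" / "10100000.csv").write_text(
        "date,streamflow\n1990-01-01,12.5\n1990-01-02,11.8\n"
    )
    rows = read_basin_csv(root, "10100000")

print(missing_error)
print(rows)
```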

neuralhydrology.datasetzoo.camelsbr.preprocess_camels_br_dataset(data_dir: Path)

Preprocess CAMELS-BR data set and create per-basin files for more flexible and faster data loading.

This function will read in all time series text files and create per-basin csv files containing all time series features at once in a new subfolder called “preprocessed”. Only the 897 basins for which both streamflow and forcings exist are considered. Note that simulated streamflow only exists for 593 of the 897 basins.

Parameters:

data_dir (Path) – Path to the CAMELS-BR data set containing the different subdirectories that can be downloaded as individual zip archives.

Raises:
  • FileExistsError – If a sub-folder called ‘preprocessed’ already exists in data_dir.

  • FileNotFoundError – If any of the subdirectories of CAMELS-BR is not found in data_dir, specifically the folders whose names start with 03_ through 13_.
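The two documented failure modes correspond to guard checks that can run before any file is written. This is a minimal sketch, assuming that the required folders are exactly one per two-digit prefix 03 through 13 and that the existing-output check comes first; both are assumptions, and the library's actual checks may differ. `check_camels_br_layout` is a hypothetical helper, not part of neuralhydrology.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_camels_br_layout(data_dir: Path) -> Path:
    """Validate data_dir before preprocessing and return the output folder path."""
    out_dir = data_dir / "preprocessed"
    if out_dir.exists():
        # Refuse to overwrite earlier results (documented FileExistsError).
        raise FileExistsError(f"{out_dir} already exists.")
    for prefix in range(3, 14):  # assumed: one folder each for 03_* ... 13_*
        if not any(p.is_dir() for p in data_dir.glob(f"{prefix:02d}_*")):
            raise FileNotFoundError(f"No folder matching {prefix:02d}_* in {data_dir}.")
    return out_dir


first_error: Optional[FileNotFoundError] = None
second_error: Optional[FileExistsError] = None
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for prefix in range(3, 13):  # deliberately omit 13_*
        (root / f"{prefix:02d}_dummy").mkdir()
    try:
        check_camels_br_layout(root)
    except FileNotFoundError as err:
        first_error = err

    (root / "13_dummy").mkdir()
    out_dir = check_camels_br_layout(root)  # all folders present: passes
    out_dir.mkdir()
    try:
        check_camels_br_layout(root)
    except FileExistsError as err:
        second_error = err

print(first_error)
print(second_error)
```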