CamelsBR
- class neuralhydrology.datasetzoo.camelsbr.CamelsBR(cfg: Config, is_train: bool, period: str, basin: str = None, additional_features: List[Dict[str, pandas.DataFrame]] = [], id_to_int: Dict[str, int] = {}, scaler: Dict[str, pandas.Series | xarray.DataArray] = {})
Bases: BaseDataset
Data set class for the CAMELS-BR dataset by [1].
For more efficient data loading during model training and evaluation, this dataset class expects the CAMELS-BR dataset in a processed format. Specifically, it works with per-basin csv files that contain all timeseries data combined. Use the preprocess_camels_br_dataset() function to process the original dataset layout into this format.
- Parameters:
cfg (Config) – The run configuration.
is_train (bool) – Defines if the dataset is used for training or evaluating. If True (training), means/stds for each feature are computed and stored to the run directory. If one-hot encoding is used, the mapping for the one-hot encoding is created and also stored to disk. If False, a scaler input is expected and similarly the id_to_int input if one-hot encoding is used.
period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.
basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate basin file, corresponding to the period.
additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, each mapping from a basin id to a pandas DataFrame. This DataFrame will be added to the data loaded from the dataset, and all of its columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and ‘target_variables’.
id_to_int (Dict[str, int], optional) – If the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or ‘test’, this input is required. It is a dictionary mapping each basin id to an integer (the one-hot encoding).
scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling for each feature and is stored to the run directory during training (train_data/train_data_scaler.yml).
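The relationship between is_train, scaler and id_to_int can be illustrated with a minimal sketch. It uses only the standard library and does not reproduce the library's internals: the helper names fit_scaler, apply_scaler and build_id_to_int, the dictionary keys "center" and "scale", the basin ids and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List

Features = Dict[str, List[float]]


def fit_scaler(features: Features) -> Dict[str, Dict[str, float]]:
    """Compute per-feature centering and scaling from training data only."""
    return {
        "center": {name: mean(values) for name, values in features.items()},
        "scale": {name: pstdev(values) for name, values in features.items()},
    }


def apply_scaler(features: Features, scaler: Dict[str, Dict[str, float]]) -> Features:
    """Normalize features with statistics that were computed beforehand."""
    return {
        name: [(v - scaler["center"][name]) / scaler["scale"][name] for v in values]
        for name, values in features.items()
    }


def build_id_to_int(basins: List[str]) -> Dict[str, int]:
    """Map every basin id to an integer index for one-hot encoding."""
    return {basin: index for index, basin in enumerate(basins)}


# Training step (is_train=True): statistics and mapping are created here.
train_features = {"precipitation": [0.0, 2.0, 4.0]}
scaler = fit_scaler(train_features)
id_to_int = build_id_to_int(["basin_a", "basin_b"])  # placeholder basin ids

# Evaluation step (is_train=False): the stored scaler is reused, not refitted.
test_features = {"precipitation": [2.0, 6.0]}
test_scaled = apply_scaler(test_features, scaler)
```

Reusing the training statistics for the validation and test periods keeps all periods on the same scale, which is why both inputs are required when is_train is False.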
References
- neuralhydrology.datasetzoo.camelsbr.load_camels_br_attributes(data_dir: Path, basins: List[str] = []) pandas.DataFrame
Load CAMELS-BR attributes.
- Parameters:
data_dir (Path) – Path to the CAMELS-BR directory. Assumes that the subdirectory 01_CAMELS_BR_attributes is located in the data directory root folder.
basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.
- Returns:
Basin-indexed DataFrame, containing the attributes as columns.
- Return type:
pd.DataFrame
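The effect of the optional basins argument can be sketched with plain dictionaries standing in for the basin-indexed DataFrame. The function name select_basins, the attribute name and the basin ids are hypothetical. How the library treats basin ids that are absent from the attribute files is not stated here; this sketch raises a KeyError in that case.

```python
from typing import Dict, List, Optional

Attributes = Dict[str, Dict[str, float]]


def select_basins(attributes: Attributes, basins: Optional[List[str]] = None) -> Attributes:
    """Return attributes for the requested basins, or for all basins if none are given."""
    if not basins:
        # An empty or missing list means "no filter".
        return dict(attributes)
    missing = [basin for basin in basins if basin not in attributes]
    if missing:
        raise KeyError(f"No attributes found for basins: {missing}")
    return {basin: attributes[basin] for basin in basins}


# Placeholder attribute table, keyed by basin id.
all_attributes = {
    "basin_a": {"area": 120.5},
    "basin_b": {"area": 48.0},
    "basin_c": {"area": 910.2},
}

subset = select_basins(all_attributes, ["basin_c", "basin_a"])
everything = select_basins(all_attributes)
```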
- neuralhydrology.datasetzoo.camelsbr.load_camels_br_timeseries(data_dir: Path, basin: str) pandas.DataFrame
Load the time series data for one basin of the CAMELS-BR data set.
- Parameters:
data_dir (Path) – Path to the CAMELS-BR directory. This folder must contain a folder called ‘preprocessed’ containing the per-basin csv files created by preprocess_camels_br_dataset().
basin (str) – Basin identifier number as string.
- Returns:
Time-indexed DataFrame, containing the time series data (forcings + discharge).
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If no sub-folder called ‘preprocessed’ exists within the root directory of the CAMELS-BR dataset.
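The documented failure mode can be sketched as a path check that runs before any file is read. This is an illustration under stated assumptions, not the library's code: the helper name find_basin_file, the file naming scheme "<basin>.csv" and the basin id are hypothetical.

```python
import tempfile
from pathlib import Path


def find_basin_file(data_dir: Path, basin: str) -> Path:
    """Return the expected per-basin csv path, failing early if preprocessing is missing."""
    preprocessed_dir = data_dir / "preprocessed"
    if not preprocessed_dir.is_dir():
        raise FileNotFoundError(
            f"No 'preprocessed' folder found in {data_dir}. Run the preprocessing step first."
        )
    # Assumed naming scheme: one csv file per basin, named after the basin id.
    return preprocessed_dir / f"{basin}.csv"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Before preprocessing: the folder does not exist, so the check fails.
    try:
        find_basin_file(root, "12345678")  # placeholder basin id
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # After preprocessing: the folder exists and a path is returned.
    (root / "preprocessed").mkdir()
    basin_file = find_basin_file(root, "12345678")
```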
- neuralhydrology.datasetzoo.camelsbr.preprocess_camels_br_dataset(data_dir: Path)
Preprocess CAMELS-BR data set and create per-basin files for more flexible and faster data loading.
This function reads all time series text files and creates per-basin csv files containing all timeseries features at once in a new subfolder called “preprocessed”. It only considers the 897 basins for which streamflow and forcings exist. Note that simulated streamflow only exists for 593 out of 897 basins.
- Parameters:
data_dir (Path) – Path to the CAMELS-BR data set containing the different subdirectories that can be downloaded as individual zip archives.
- Raises:
FileExistsError – If a sub-folder called ‘preprocessed’ already exists in data_dir.
FileNotFoundError – If any of the subdirectories of CAMELS-BR is not found in data_dir, specifically the folders starting with 03_* up to 13_*.
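The two error conditions above amount to a precondition check on data_dir. The following is a minimal sketch of such a check, assuming that each required subdirectory can be recognized by its numeric prefix ("03_" through "13_"). The names check_preconditions, REQUIRED_PREFIXES and outcome, as well as the placeholder folder names, are hypothetical and not taken from the library.

```python
import tempfile
from pathlib import Path

# Prefixes "03_", "04_", ..., "13_" of the subdirectories that must be present.
REQUIRED_PREFIXES = [f"{number:02d}_" for number in range(3, 14)]


def check_preconditions(data_dir: Path) -> None:
    """Raise if preprocessing was already done or if input folders are missing."""
    if (data_dir / "preprocessed").exists():
        raise FileExistsError(f"'preprocessed' already exists in {data_dir}.")
    missing = [
        prefix
        for prefix in REQUIRED_PREFIXES
        if not any(path.is_dir() for path in data_dir.glob(f"{prefix}*"))
    ]
    if missing:
        raise FileNotFoundError(f"No folders found for prefixes: {missing}")


def outcome(data_dir: Path) -> str:
    """Report the result of the check as a short string."""
    try:
        check_preconditions(data_dir)
        return "ok"
    except (FileExistsError, FileNotFoundError) as err:
        return type(err).__name__


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_result = outcome(root)  # no subdirectories yet

    for prefix in REQUIRED_PREFIXES:
        (root / f"{prefix}placeholder").mkdir()  # placeholder folder names
    complete_result = outcome(root)  # all required folders present

    (root / "preprocessed").mkdir()
    rerun_result = outcome(root)  # output folder already exists
```

Checking for an existing ‘preprocessed’ folder first means a second run fails before any input is read, so earlier output is never overwritten silently.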