HourlyCamelsUS

class neuralhydrology.datasetzoo.hourlycamelsus.HourlyCamelsUS(cfg: Config, is_train: bool, period: str, basin: str = None, additional_features: list = [], id_to_int: dict = {}, scaler: dict = {})

Bases: CamelsUS

Data set class providing hourly data for CAMELS US basins.

This class extends the CamelsUS dataset class with hourly input and output data. Currently, only NLDAS forcings are available at an hourly resolution.

Parameters:

cfg (Config) – The run configuration.
is_train (bool) – Defines whether the dataset is used for training or evaluation. If True (training), means/stds for each feature are computed and stored to the run directory. If one-hot encoding is used, the mapping for the one-hot encoding is created and also stored to disk. If False, a scaler input is expected, as is the id_to_int input if one-hot encoding is used.
period ({'train', 'validation', 'test'}) – Defines the period for which the data will be loaded.
basin (str, optional) – If passed, the data for only this basin will be loaded. Otherwise the basin(s) are read from the appropriate basin file, corresponding to the period.
additional_features (List[Dict[str, pd.DataFrame]], optional) – List of dictionaries, each mapping from a basin id to a pandas DataFrame. Each DataFrame is added to the data loaded from the dataset, and all of its columns are available as ‘dynamic_inputs’, ‘evolving_attributes’ and ‘target_variables’.
id_to_int (Dict[str, int], optional) – If the config argument ‘use_basin_id_encoding’ is True and period is either ‘validation’ or ‘test’, this input is required. It is a dictionary mapping from basin id to an integer (the one-hot encoding).
scaler (Dict[str, Union[pd.Series, xarray.DataArray]], optional) – If period is either ‘validation’ or ‘test’, this input is required. It contains the centering and scaling for each feature and is stored to the run directory during training (train_data/train_data_scaler.yml).
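To illustrate why id_to_int and scaler are required for the ‘validation’ and ‘test’ periods, here is a minimal sketch in plain Python. The helpers build_id_to_int, fit_scaler and apply_scaler are hypothetical and not part of the library, and the basin ids and feature values are example inputs. The use of the mean and the population standard deviation is an assumption for illustration; the library's actual statistics and storage format may differ.

```python
from statistics import mean, pstdev


def build_id_to_int(basins):
    """Assign each basin id a distinct integer index (hypothetical helper)."""
    return {basin: index for index, basin in enumerate(basins)}


def fit_scaler(values):
    """Compute a center and a scale for one feature from training values."""
    # Assumption: mean and population standard deviation.
    return {"center": mean(values), "scale": pstdev(values)}


def apply_scaler(values, scaler):
    """Normalize values with statistics computed on the training period."""
    return [(v - scaler["center"]) / scaler["scale"] for v in values]


# Training period: derive the mapping and the statistics once.
id_to_int = build_id_to_int(["01013500", "01022500", "01030500"])
precip_scaler = fit_scaler([0.0, 2.0, 4.0])

# Validation/test period: reuse both instead of recomputing them,
# so that evaluation data is encoded and normalized like the training data.
normalized = apply_scaler([2.0, 6.0], precip_scaler)
```

Recomputing the statistics on the evaluation period would normalize the same physical value differently than during training, which is why the training-time scaler is passed in instead.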

load_hourly_data(basin: str, forcings: str) → pandas.DataFrame

Load a single set of hourly forcings and discharge. Loads from netCDF if available, otherwise from CSV.

Parameters:

basin (str) – Identifier of the basin for which to load data.
forcings (str) – Name of the forcings set to load.

Returns:

Time-indexed DataFrame with forcings and discharge values for the specified basin.

Return type:

pd.DataFrame
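The netCDF-first, CSV-fallback behaviour can be sketched as a simple file check. This is an illustration only: choose_hourly_source is a hypothetical helper, and the file name pattern ‘usgs-streamflow-{forcings}.nc’ is an assumption inferred from the single example ‘usgs-streamflow-nldas_hourly.nc’ given for load_hourly_us_netcdf.

```python
import tempfile
from pathlib import Path


def choose_hourly_source(data_dir: Path, forcings: str) -> str:
    """Return 'netcdf' if the combined file exists, else 'csv' (hypothetical helper)."""
    # Assumed file name pattern, inferred from 'usgs-streamflow-nldas_hourly.nc'.
    netcdf_file = data_dir / "hourly" / f"usgs-streamflow-{forcings}.nc"
    return "netcdf" if netcdf_file.is_file() else "csv"


# Demonstrate both branches on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "hourly").mkdir()

    # No netCDF file yet: fall back to the per-basin CSV files.
    before = choose_hourly_source(data_dir, "nldas_hourly")

    # After the preprocessed file appears, it takes precedence.
    (data_dir / "hourly" / "usgs-streamflow-nldas_hourly.nc").touch()
    after = choose_hourly_source(data_dir, "nldas_hourly")
```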

neuralhydrology.datasetzoo.hourlycamelsus.load_hourly_us_discharge(data_dir: Path, basin: str) → pandas.DataFrame

Load the hourly discharge data for a basin of the CAMELS US data set.

Parameters:

data_dir (Path) – Path to the CAMELS US directory. This folder must contain a folder called ‘hourly’ with a subdirectory ‘usgs_streamflow’ which contains the discharge files (.csv) for each basin. File names must contain the 8-digit basin id.
basin (str) – 8-digit USGS identifier of the basin.

Returns:

Time-indexed Series of the discharge values (mm/hour).

Return type:

pd.Series
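The expected directory layout (an ‘hourly’ folder with a ‘usgs_streamflow’ subdirectory whose .csv file names contain the 8-digit basin id) suggests a lookup along the following lines. This is a sketch under assumptions: find_basin_csv is a hypothetical helper, the file name ‘01022500-usgs-hourly.csv’ is made up for the demonstration, and the library's actual matching rule may be stricter.

```python
import tempfile
from pathlib import Path


def find_basin_csv(data_dir: Path, subdir: str, basin: str) -> Path:
    """Find the single .csv file under hourly/<subdir> whose name contains the basin id."""
    folder = data_dir / "hourly" / subdir
    matches = sorted(p for p in folder.rglob("*.csv") if basin in p.name)
    if len(matches) != 1:
        raise FileNotFoundError(
            f"Expected one file for basin {basin} in {folder}, found {len(matches)}"
        )
    return matches[0]


# Build a tiny example tree and look up one basin.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    folder = data_dir / "hourly" / "usgs_streamflow"
    folder.mkdir(parents=True)
    (folder / "01022500-usgs-hourly.csv").touch()  # hypothetical file name

    found = find_basin_csv(data_dir, "usgs_streamflow", "01022500").name
```

The same lookup would apply to the forcing and stage folders by passing a different subdir, for example ‘nldas_hourly’ or ‘usgs_stage’.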

neuralhydrology.datasetzoo.hourlycamelsus.load_hourly_us_forcings(data_dir: Path, basin: str, forcings: str) → pandas.DataFrame

Load the hourly forcing data for a basin of the CAMELS US data set.

The hourly forcings are not included in the original data set by Newman et al. (2017).

Parameters:

data_dir (Path) – Path to the CAMELS US directory. This folder must contain an ‘hourly’ folder with one subdirectory for each forcing product, which contains the forcing files (.csv) for each basin. File names must contain the 8-digit basin id.
basin (str) – 8-digit USGS identifier of the basin.
forcings (str) – Name of the forcing product. Must match a folder name in the ‘hourly’ directory, e.g. ‘nldas_hourly’.

Returns:

Time-indexed DataFrame, containing the forcing data.

Return type:

pd.DataFrame

neuralhydrology.datasetzoo.hourlycamelsus.load_hourly_us_netcdf(data_dir: Path, forcings: str) → xarray.Dataset

Load hourly forcing and discharge data from a preprocessed netCDF file.

Parameters:

data_dir (Path) – Path to the CAMELS US directory. This folder must contain a folder called ‘hourly’, containing the netCDF file.
forcings (str) – Name of the forcing product. Must match the ending of the netCDF file name, e.g. ‘nldas_hourly’ for ‘usgs-streamflow-nldas_hourly.nc’.

Returns:

Dataset containing the combined discharge and forcing data of all basins (as stored in the netCDF file).

Return type:

xarray.Dataset

neuralhydrology.datasetzoo.hourlycamelsus.load_hourly_us_stage(data_dir: Path, basin: str) → pandas.Series

Load the hourly stage data for a basin of the CAMELS US data set.

Parameters:

data_dir (Path) – Path to the CAMELS US directory. This folder must contain a folder called ‘hourly’ with a subdirectory ‘usgs_stage’ which contains the stage files (.csv) for each basin. File names must contain the 8-digit basin id.
basin (str) – 8-digit USGS identifier of the basin.

Returns:

Time-indexed Series of the stage values (m).

Return type:

pd.Series