utils

neuralhydrology.datautils.utils.attributes_sanity_check(df: pandas.DataFrame)

Utility function to check the suitability of the attributes for model training.

This utility function can be used to check if any attribute has a standard deviation of zero. A zero standard deviation would lead to NaNs when normalizing the features and thus to NaNs when training the model. The function also checks if any attribute for any basin contains a NaN, which would likewise cause NaNs during model training.

Parameters:

df (pd.DataFrame) – DataFrame of catchment attributes as columns.

Raises:

RuntimeError – If one or more attributes have a standard deviation of zero or any attribute for any basin is NaN.
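The two failure conditions can be reproduced with plain pandas. The following is only a sketch and does not call the library function: the attribute names and values are made up, and `find_problem_attributes` is a hypothetical helper.

```python
import pandas as pd

# Made-up catchment attributes: one row per basin, one column per attribute.
df = pd.DataFrame(
    {
        "area": [10.0, 25.0, 40.0],            # unproblematic
        "elevation": [500.0, 500.0, 500.0],    # constant, so the standard deviation is zero
        "aridity": [0.8, float("nan"), 1.2],   # contains a NaN
    },
    index=["basin_a", "basin_b", "basin_c"],
)


def find_problem_attributes(attributes: pd.DataFrame) -> dict:
    """Hypothetical helper: list columns with zero std and columns with NaNs."""
    std = attributes.std()
    zero_std = sorted(std[std == 0.0].index)
    has_nan = sorted(attributes.columns[attributes.isna().any()])
    return {"zero_std": zero_std, "has_nan": has_nan}


problems = find_problem_attributes(df)

# Standardizing a constant column divides 0 by 0, which yields NaN.
normalized = (df - df.mean()) / df.std()
```

Here `problems` flags `elevation` for its zero standard deviation and `aridity` for its NaN, and every value in `normalized["elevation"]` is NaN.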

neuralhydrology.datautils.utils.compare_frequencies(freq_one: str, freq_two: str) → int

Compare two frequencies.

Note that only frequencies that work with get_frequency_factor can be compared.

Parameters:
  • freq_one (str) – First frequency.

  • freq_two (str) – Second frequency.

Returns:

-1 if freq_one is lower than freq_two, +1 if it is higher, 0 if they are equal.

Return type:

int

Raises:

ValueError – If the two frequencies are not comparable via get_frequency_factor.

neuralhydrology.datautils.utils.get_frequency_factor(freq_one: str, freq_two: str) → float

Get relative factor between the two frequencies.

Parameters:
  • freq_one (str) – String representation of the first frequency.

  • freq_two (str) – String representation of the second frequency.

Returns:

Ratio of freq_one to freq_two.

Return type:

float

Raises:

ValueError – If the frequency factor cannot be determined. This can be the case if the frequencies do not represent a fixed time delta and are not directly comparable in some other way (e.g., because they have the same unit). For example, a month does not represent a fixed time delta, so 1D and 1M are not comparable. However, 1M and 2M are comparable since they have the same unit.
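The comparability rule can be sketched with the standard library alone. This is not the library's implementation: `frequency_factor_sketch`, `split_frequency`, and the small `FIXED_UNITS` table are hypothetical, and the sketch assumes that the "ratio" refers to the lengths of the two time steps.

```python
import re
from datetime import timedelta
from typing import Tuple

# Units with a fixed length, written as pandas-style aliases (illustrative subset).
FIXED_UNITS = {
    "min": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "D": timedelta(days=1),
    "W": timedelta(weeks=1),
}


def split_frequency(freq: str) -> Tuple[int, str]:
    """Split a string such as '12h' into its count (12) and unit ('h')."""
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", freq)
    if match is None:
        raise ValueError(f"Cannot parse frequency {freq!r}.")
    return int(match.group(1)), match.group(2)


def frequency_factor_sketch(freq_one: str, freq_two: str) -> float:
    """Hypothetical stand-in: time step of freq_one divided by that of freq_two."""
    n_one, unit_one = split_frequency(freq_one)
    n_two, unit_two = split_frequency(freq_two)
    if unit_one == unit_two:
        # Same unit: the counts alone determine the ratio, even for months.
        return n_one / n_two
    if unit_one in FIXED_UNITS and unit_two in FIXED_UNITS:
        # Different units with fixed lengths: compare the actual time deltas.
        return (n_one * FIXED_UNITS[unit_one]) / (n_two * FIXED_UNITS[unit_two])
    raise ValueError(f"Frequencies {freq_one} and {freq_two} are not comparable.")
```

Under these assumptions, `frequency_factor_sketch("1D", "1h")` gives 24.0 and `frequency_factor_sketch("1M", "2M")` gives 0.5, while `frequency_factor_sketch("1D", "1M")` raises a ValueError because a month has no fixed length.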

neuralhydrology.datautils.utils.infer_datetime_coord(xr: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) → str

Checks for a coordinate with ‘date’ in its name and returns the name.

Parameters:

xr (Union[DataArray, Dataset]) – Array to infer coordinate name of.

Returns:

Name of the datetime coordinate.

Return type:

str

Raises:

RuntimeError – If no coordinate or more than one coordinate with ‘date’ in its name is found.
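The "exactly one match" rule can be illustrated on a plain list of coordinate names, such as the keys of an xarray object's `coords`. `find_date_coord` is a hypothetical helper, and the case-sensitive substring match is an assumption of this sketch.

```python
from typing import Iterable


def find_date_coord(coord_names: Iterable[str]) -> str:
    """Hypothetical helper: return the single name that contains 'date'."""
    candidates = [name for name in coord_names if "date" in name]
    if len(candidates) != 1:
        # Zero matches and multiple matches are both treated as errors.
        raise RuntimeError(f"Expected exactly one date coordinate, found {candidates}.")
    return candidates[0]


coord_name = find_date_coord(["basin", "date"])
```

With these names, `coord_name` is `"date"`. Passing `["basin"]` or `["date", "forecast_date"]` raises a RuntimeError.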

neuralhydrology.datautils.utils.infer_frequency(index: pandas.DatetimeIndex | ndarray) → str

Infer the frequency of an index of a pandas DataFrame/Series or xarray DataArray.

Parameters:

index (Union[pd.DatetimeIndex, np.ndarray]) – DatetimeIndex of a DataFrame/Series or array of datetime values.

Returns:

Frequency of the index as a pandas frequency string.

Return type:

str

Raises:

ValueError – If the frequency cannot be inferred from the index or is zero.
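For orientation, pandas' own `pd.infer_freq` shows what a pandas frequency string looks like and when inference fails. Whether the function documented here relies on `pd.infer_freq` is not stated in this section, so treat the following as a sketch of the concept only.

```python
import pandas as pd

# A regular daily index: pandas can infer its frequency string.
daily_index = pd.date_range("2000-01-01", periods=5, freq="D")
daily_freq = pd.infer_freq(daily_index)

# Dropping a value makes the spacing irregular, so no frequency can be inferred.
irregular_index = daily_index.delete(2)
irregular_freq = pd.infer_freq(irregular_index)
```

Here `daily_freq` is `"D"` and `irregular_freq` is `None`. Note that `pd.infer_freq` signals failure by returning `None`, whereas the function documented above raises a ValueError.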

neuralhydrology.datautils.utils.load_basin_file(basin_file: Path) → List[str]

Load list of basins from text file.

Note: Basin names are not allowed to end with ‘_period*’.

Parameters:

basin_file (Path) – Path to a basin txt file. File has to contain one basin id per row, while empty rows are ignored.

Returns:

List of basin ids as strings.

Return type:

List[str]

Raises:

ValueError – In case of invalid basin names that would cause problems internally.
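The expected file layout and the name restriction can be sketched with the standard library. `read_basin_ids` is a hypothetical helper, the basin ids are made up, and the regular expression reflects one reading of ‘_period*’ (the name ends with ‘_period’ plus an arbitrary suffix); the library's exact check may differ.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def read_basin_ids(basin_file: Path) -> List[str]:
    """Hypothetical helper: read one basin id per row, skipping empty rows."""
    with basin_file.open("r") as fp:
        basins = [line.strip() for line in fp if line.strip()]
    # Assumed reading of '_period*': the name ends with '_period' plus any suffix.
    invalid = [basin for basin in basins if re.search(r"_period[^_]*$", basin)]
    if invalid:
        raise ValueError(f"Invalid basin names: {invalid}")
    return basins


with tempfile.TemporaryDirectory() as tmp_dir:
    example_file = Path(tmp_dir) / "basins.txt"
    # Two basin ids separated by an empty row.
    example_file.write_text("01013500\n\n01022500\n")
    basin_ids = read_basin_ids(example_file)
```

The ids are kept as strings, so leading zeros survive: `basin_ids` is `["01013500", "01022500"]`.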

neuralhydrology.datautils.utils.load_hydroatlas_attributes(data_dir: Path, basins: List[str] = []) → pandas.DataFrame

Load HydroATLAS attributes into a pandas DataFrame.

Parameters:
  • data_dir (Path) – Path to the root directory of the dataset. Must contain a folder called ‘hydroatlas_attributes’ with a file called attributes.csv. The attributes file is expected to have one column called basin_id.

  • basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.

Returns:

Basin-indexed DataFrame containing the HydroATLAS attributes.

Return type:

pd.DataFrame
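The directory layout described above can be reproduced in a temporary folder and read with plain pandas. This sketch does not call the library function: the attribute columns and values are made up, and reading `basin_id` as a string is a choice of this example.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir)
    attributes_dir = data_dir / "hydroatlas_attributes"
    attributes_dir.mkdir()
    # Made-up attribute columns and values, only to show the expected layout.
    (attributes_dir / "attributes.csv").write_text(
        "basin_id,area,forest_frac\n"
        "0001,12.5,0.5\n"
        "0002,80.0,0.25\n"
        "0003,33.25,0.75\n"
    )

    # Read ids as strings so that leading zeros survive, then index by basin.
    attributes = pd.read_csv(
        attributes_dir / "attributes.csv", dtype={"basin_id": str}
    ).set_index("basin_id")

# Keep only the requested basins, mirroring the optional `basins` argument.
subset = attributes.loc[["0001", "0003"]]
```

The result is a basin-indexed DataFrame; `subset` contains only the rows for `"0001"` and `"0003"`.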

neuralhydrology.datautils.utils.load_scaler(run_dir: Path) → Dict[str, pandas.Series | xarray.Dataset]

Load feature scaler from run directory.

Checks run directory for scaler file in yaml format (new) or pickle format (old).

Parameters:

run_dir (Path) – Run directory. Has to contain a folder ‘train_data’ that contains the ‘train_data_scaler’ file.

Returns:

Dictionary containing the feature scaler for static and dynamic features.

Raises:

FileNotFoundError – If neither a ‘train_data_scaler.yml’ nor a ‘train_data_scaler.p’ file is found in the ‘train_data’ folder of the run directory.
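The file lookup described above can be sketched as follows. `find_scaler_file` is a hypothetical helper that only locates the file; parsing the YAML or pickle content is left out, and trying the YAML file first is an assumption based on it being described as the newer format.

```python
import tempfile
from pathlib import Path


def find_scaler_file(run_dir: Path) -> Path:
    """Hypothetical helper: locate the scaler file, trying YAML before pickle."""
    train_data_dir = run_dir / "train_data"
    for file_name in ("train_data_scaler.yml", "train_data_scaler.p"):
        candidate = train_data_dir / file_name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No scaler file found in {train_data_dir}.")


with tempfile.TemporaryDirectory() as tmp_dir:
    run_dir = Path(tmp_dir)
    (run_dir / "train_data").mkdir()
    # Simulate an older run that only has the pickle file.
    (run_dir / "train_data" / "train_data_scaler.p").write_bytes(b"")
    found_name = find_scaler_file(run_dir).name
```

In this example only the pickle file exists, so `found_name` is `"train_data_scaler.p"`. With an empty ‘train_data’ folder the helper raises a FileNotFoundError.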

neuralhydrology.datautils.utils.sort_frequencies(frequencies: List[str]) → List[str]

Sort the passed frequencies from low to high.

Use pandas frequency strings to define frequencies. Note: The strings need to include values, e.g., ‘1D’ instead of ‘D’.

Parameters:

frequencies (List[str]) – List of pandas frequency identifiers to be sorted.

Returns:

Sorted list of pandas frequency identifiers.

Return type:

List[str]

Raises:

ValueError – If a pair of frequencies in frequencies is not comparable via compare_frequencies.
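Sorting with a three-way comparison can be sketched using `functools.cmp_to_key`. This is not the library's implementation: `step_length`, `compare_sketch`, `sort_sketch`, and the `UNIT_LENGTHS` table are hypothetical, and the sketch assumes that a lower frequency means a longer time step.

```python
import re
from datetime import timedelta
from functools import cmp_to_key
from typing import List

# Illustrative subset of units that have a fixed length.
UNIT_LENGTHS = {
    "min": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "D": timedelta(days=1),
    "W": timedelta(weeks=1),
}


def step_length(freq: str) -> timedelta:
    """Length of one time step; the string must include a value, e.g. '1D'."""
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", freq)
    if match is None or match.group(2) not in UNIT_LENGTHS:
        raise ValueError(f"Cannot handle frequency {freq!r}.")
    return int(match.group(1)) * UNIT_LENGTHS[match.group(2)]


def compare_sketch(freq_one: str, freq_two: str) -> int:
    """-1 if freq_one is the lower frequency (longer step), +1 if higher, 0 if equal."""
    one, two = step_length(freq_one), step_length(freq_two)
    if one > two:
        return -1
    if one < two:
        return 1
    return 0


def sort_sketch(frequencies: List[str]) -> List[str]:
    """Order from low to high frequency using the three-way comparison."""
    return sorted(frequencies, key=cmp_to_key(compare_sketch))


sorted_freqs = sort_sketch(["1h", "1W", "1D", "12h"])
```

With these inputs, `sorted_freqs` is `["1W", "1D", "12h", "1h"]`. A string without a value, such as `"D"`, is rejected with a ValueError in this sketch.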