utils
- neuralhydrology.datautils.utils.attributes_sanity_check(df: pandas.DataFrame)
Utility function to check the suitability of the attributes for model training.
This utility function can be used to check if any attribute has a standard deviation of zero. A zero standard deviation would lead to NaNs when normalizing the features and thus to NaNs when training the model. It also checks if any attribute for any basin contains a NaN, which would also cause NaNs during model training.
- Parameters:
df (pd.DataFrame) – DataFrame with catchment attributes as columns.
- Raises:
RuntimeError – If one or more attributes have a standard deviation of zero or any attribute for any basin is NaN.
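The two conditions described above can be illustrated without pandas. The following is a minimal sketch, not the library's implementation: `check_attributes` is a hypothetical helper, and a plain dict mapping attribute names to per-basin values stands in for the DataFrame columns.

```python
import math
import statistics
from typing import Dict, List


def check_attributes(attributes: Dict[str, List[float]]) -> None:
    """Raise RuntimeError if any attribute contains NaN or has zero spread.

    `attributes` maps an attribute name to its values, one per basin
    (an illustrative stand-in for the columns of the DataFrame).
    """
    # Attributes with at least one missing value.
    nan_attrs = [name for name, values in attributes.items()
                 if any(math.isnan(v) for v in values)]
    # Attributes whose values are identical across all basins. Dividing by
    # their standard deviation during normalization would produce NaN.
    constant_attrs = [name for name, values in attributes.items()
                      if name not in nan_attrs and statistics.pstdev(values) == 0]
    if nan_attrs or constant_attrs:
        raise RuntimeError(
            f"NaN values in: {nan_attrs}; zero standard deviation in: {constant_attrs}"
        )


# Two attributes that vary across basins and contain no NaN pass silently.
check_attributes({"area": [10.0, 25.0, 7.5], "elev": [300.0, 120.0, 980.0]})
```

Checking before training is cheaper than tracing NaN losses back to a constant or incomplete attribute column afterwards.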
- neuralhydrology.datautils.utils.compare_frequencies(freq_one: str, freq_two: str) → int
Compare two frequencies.
Note that only frequencies that work with get_frequency_factor can be compared.
- Parameters:
freq_one (str) – First frequency.
freq_two (str) – Second frequency.
- Returns:
-1 if freq_one is lower than freq_two, +1 if it is higher, 0 if they are equal.
- Return type:
int
- Raises:
ValueError – If the two frequencies are not comparable via get_frequency_factor.
- neuralhydrology.datautils.utils.get_frequency_factor(freq_one: str, freq_two: str) → float
Get relative factor between the two frequencies.
- Parameters:
freq_one (str) – String representation of the first frequency.
freq_two (str) – String representation of the second frequency.
- Returns:
Ratio of freq_one to freq_two.
- Return type:
float
- Raises:
ValueError – If the frequency factor cannot be determined. This can happen if the frequencies do not represent a fixed time delta and are not directly comparable in another way (e.g., by having the same unit). For example, a month does not represent a fixed time delta, so 1D and 1M are not comparable. However, 1M and 2M are comparable since they have the same unit.
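The comparability rule can be sketched in a few lines of standard-library Python. This is an illustration only, not the library's code: `frequency_factor`, `_parse`, and the small unit table are hypothetical, and the sketch assumes that the factor is the ratio of the time spans the two strings describe.

```python
import re
from typing import Tuple

# Seconds per unit for units that represent a fixed time delta.
# Assumption: this table is illustrative, not the full set of pandas units.
_FIXED_SECONDS = {"s": 1, "min": 60, "h": 3600, "D": 86400, "W": 604800}


def _parse(freq: str) -> Tuple[int, str]:
    """Split a string such as '12h' into (12, 'h'); the value must be positive."""
    match = re.fullmatch(r"([1-9]\d*)([A-Za-z]+)", freq)
    if match is None:
        raise ValueError(f"Cannot parse frequency {freq!r}")
    return int(match.group(1)), match.group(2)


def frequency_factor(freq_one: str, freq_two: str) -> float:
    """Ratio of the time span of freq_one to that of freq_two (hypothetical helper)."""
    n_one, unit_one = _parse(freq_one)
    n_two, unit_two = _parse(freq_two)
    if unit_one == unit_two:
        # Same unit: comparable even without a fixed time delta (e.g. months).
        return n_one / n_two
    if unit_one in _FIXED_SECONDS and unit_two in _FIXED_SECONDS:
        # Different units, but both are fixed time deltas: compare in seconds.
        return (n_one * _FIXED_SECONDS[unit_one]) / (n_two * _FIXED_SECONDS[unit_two])
    raise ValueError(f"Frequencies {freq_one} and {freq_two} are not comparable.")


print(frequency_factor("1D", "1h"))  # 24.0
print(frequency_factor("2M", "1M"))  # 2.0
```

Under these assumptions, `frequency_factor("1D", "1M")` raises a `ValueError`, because a month has no fixed length in seconds and the units differ.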
- neuralhydrology.datautils.utils.infer_datetime_coord(xr: xarray.core.dataarray.DataArray | xarray.core.dataset.Dataset) → str
Checks for a coordinate with ‘date’ in its name and returns that name.
- Parameters:
xr (Union[DataArray, Dataset]) – Array or dataset whose datetime coordinate name should be inferred.
- Returns:
Name of the datetime coordinate.
- Return type:
str
- Raises:
RuntimeError – If no coordinate or more than one coordinate with ‘date’ in its name is found.
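The lookup rule (exactly one coordinate name containing ‘date’) can be sketched on a plain list of names. `find_date_coord` is a hypothetical helper for illustration. With a real xarray object the names would come from its `coords` attribute.

```python
from typing import Iterable


def find_date_coord(coord_names: Iterable[str]) -> str:
    """Return the single name containing 'date'; raise RuntimeError otherwise."""
    candidates = [name for name in coord_names if "date" in name]
    if len(candidates) != 1:
        # Covers both failure modes: no match and ambiguous matches.
        raise RuntimeError(
            f"Expected exactly one coordinate with 'date' in its name, found {candidates}"
        )
    return candidates[0]


print(find_date_coord(["basin", "date"]))  # date
```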
- neuralhydrology.datautils.utils.infer_frequency(index: pandas.DatetimeIndex | ndarray) → str
Infer the frequency of an index of a pandas DataFrame/Series or xarray DataArray.
- Parameters:
index (Union[pd.DatetimeIndex, np.ndarray]) – DatetimeIndex of a DataFrame/Series or array of datetime values.
- Returns:
Frequency of the index as a pandas frequency string.
- Return type:
str
- Raises:
ValueError – If the frequency cannot be inferred from the index or is zero.
- neuralhydrology.datautils.utils.load_basin_file(basin_file: Path) → List[str]
Load list of basins from text file.
Note: Basin names are not allowed to end with ‘_period*’.
- Parameters:
basin_file (Path) – Path to a basin txt file. The file has to contain one basin id per row; empty rows are ignored.
- Returns:
List of basin ids as strings.
- Return type:
List[str]
- Raises:
ValueError – In case of invalid basin names that would cause problems internally.
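The file format and the name restriction can be sketched as follows. `read_basin_ids` is a hypothetical helper, not the library function, and it assumes that "ends with ‘_period*’" means the last underscore-separated token of the name starts with `period`.

```python
from pathlib import Path
from typing import List


def read_basin_ids(basin_file: Path) -> List[str]:
    """Read one basin id per row, skipping empty rows (illustrative sketch)."""
    with basin_file.open("r") as fp:
        basins = [line.strip() for line in fp if line.strip()]

    # Assumption: a name is invalid if its last '_'-separated token starts
    # with 'period', e.g. 'abc_period1'.
    invalid = [b for b in basins
               if "_" in b and b.rsplit("_", 1)[-1].startswith("period")]
    if invalid:
        raise ValueError(f"Basin names must not end with '_period*': {invalid}")
    return basins
```

A file containing the rows `01013500`, an empty row, and `01022500` would yield `['01013500', '01022500']` with this sketch.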
- neuralhydrology.datautils.utils.load_hydroatlas_attributes(data_dir: Path, basins: List[str] = []) → pandas.DataFrame
Load HydroATLAS attributes into a pandas DataFrame.
- Parameters:
data_dir (Path) – Path to the root directory of the dataset. Must contain a folder called ‘hydroatlas_attributes’ with a file called attributes.csv. The attributes file is expected to have one column called basin_id.
basins (List[str], optional) – If passed, return only attributes for the basins specified in this list. Otherwise, the attributes of all basins are returned.
- Returns:
Basin-indexed DataFrame containing the HydroATLAS attributes.
- Return type:
pd.DataFrame
- neuralhydrology.datautils.utils.load_scaler(run_dir: Path) → Dict[str, pandas.Series | xarray.Dataset]
Load feature scaler from run directory.
Checks run directory for scaler file in yaml format (new) or pickle format (old).
- Parameters:
run_dir (Path) – Run directory. Has to contain a folder ‘train_data’ that contains the ‘train_data_scaler’ file.
- Returns:
Dictionary containing the feature scaler for static and dynamic features.
- Raises:
FileNotFoundError – If neither a ‘train_data_scaler.yml’ nor a ‘train_data_scaler.p’ file is found in the ‘train_data’ folder of the run directory.
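The file lookup described above can be sketched with `pathlib`. `find_scaler_file` is a hypothetical helper that only locates the file and does not parse it. Checking the YAML file before the pickle file is an assumption based on the new/old distinction, not something the documentation states.

```python
from pathlib import Path


def find_scaler_file(run_dir: Path) -> Path:
    """Locate the scaler file inside '<run_dir>/train_data' (illustrative sketch)."""
    train_data = run_dir / "train_data"
    # Assumption: prefer the newer YAML file, fall back to the older pickle file.
    for name in ("train_data_scaler.yml", "train_data_scaler.p"):
        candidate = train_data / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No scaler file found in {train_data}")
```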
- neuralhydrology.datautils.utils.sort_frequencies(frequencies: List[str]) → List[str]
Sort the passed frequencies from low to high.
Use pandas frequency strings to define frequencies. Note: The strings need to include values, e.g., ‘1D’ instead of ‘D’.
- Parameters:
frequencies (List[str]) – List of pandas frequency identifiers to be sorted.
- Returns:
Sorted list of pandas frequency identifiers.
- Return type:
List[str]
- Raises:
ValueError – If a pair of frequencies in frequencies is not comparable via compare_frequencies.
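Sorting with a pairwise comparison can be sketched using `functools.cmp_to_key`. The helpers below are hypothetical and restricted to a handful of fixed-length units. The sketch assumes that a longer time span means a lower frequency, so `1D` sorts before `1h`.

```python
import re
from functools import cmp_to_key
from typing import List

# Assumption: only these fixed-length units are supported in this sketch.
_SECONDS = {"s": 1, "min": 60, "h": 3600, "D": 86400, "W": 604800}


def _span(freq: str) -> int:
    """Time span in seconds; the string must include a value, e.g. '1D', not 'D'."""
    match = re.fullmatch(r"([1-9]\d*)(s|min|h|D|W)", freq)
    if match is None:
        raise ValueError(f"Frequency {freq!r} is not comparable in this sketch")
    return int(match.group(1)) * _SECONDS[match.group(2)]


def compare(freq_one: str, freq_two: str) -> int:
    """-1 if freq_one is the lower frequency, +1 if it is higher, 0 if equal."""
    span_one, span_two = _span(freq_one), _span(freq_two)
    if span_one > span_two:
        return -1  # longer span, hence lower frequency
    if span_one < span_two:
        return 1
    return 0


def sort_low_to_high(frequencies: List[str]) -> List[str]:
    """Sort frequency strings from low to high frequency."""
    return sorted(frequencies, key=cmp_to_key(compare))


print(sort_low_to_high(["1h", "1D", "12h"]))  # ['1D', '12h', '1h']
```

A comparison function is used rather than a sort key because comparability is defined pairwise, so an incomparable pair surfaces as a `ValueError` during the sort.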