Data Prerequisites

All of our tutorials that train and evaluate a model use the CAMELS US dataset, either in its original form or with some extensions. This notebook guides you through downloading all essential pieces of the dataset and explains the folder structure that NeuralHydrology expects for the CAMELS US dataset, so that you can run all of the tutorials.

CAMELS US meteorological time series and streamflow data

In most of our tutorials, the meteorological time series serve as model inputs, while the streamflow time series are the target values. You can download both from the NCAR Homepage. Click on “Individual Files” under “Download Data and Documentation” and download the file “basin_timeseries_v1p2_metForcing_obsFlow.zip”, or use this direct link. The downloaded zip file contains two folders: basin_dataset_public (empty, 0 bytes) and basin_dataset_public_v1p2 (not empty, 14.9 GB). Extract the second one (basin_dataset_public_v1p2) to any location you like and consider renaming it to something more meaningful, such as CAMELS_US. This folder is referred to as the root directory of the CAMELS US dataset. Among others, it should contain the following subdirectories:

CAMELS_US/              # originally named basin_dataset_public_v1p2
- basin_mean_forcing/   # contains the meteorological time series data
- usgs_streamflow/      # contains the streamflow data
- ...
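
Before moving on, it can help to confirm that the extracted folder has the layout listed above. The following is a minimal sketch of such a check; missing_subdirs is a hypothetical helper written for this notebook and is not part of NeuralHydrology. It only looks for the two subdirectories named above and ignores everything else in the folder.

```python
from pathlib import Path

# Subdirectories named in the listing above; the real dataset contains more.
EXPECTED_SUBDIRS = ("basin_mean_forcing", "usgs_streamflow")


def missing_subdirs(root, expected=EXPECTED_SUBDIRS):
    """Return the names in `expected` that are not directories under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Placeholder path (assumption): replace with your CAMELS US root directory.
    data_dir = Path("/path/to/CAMELS_US")
    missing = missing_subdirs(data_dir)
    if missing:
        print(f"Missing in {data_dir}: {', '.join(missing)}")
    else:
        print(f"{data_dir} contains the expected subdirectories.")
```

An empty list means both folders were found. A non-empty list usually means that the wrong folder was extracted or that the path points one level too high or too low.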

NOTE: In the default configs of our tutorials, we assume that the data is stored in neuralhydrology/data/CAMELS_US. If you stored the data elsewhere, either create a symbolic link to this location or change the data_dir argument in the .yml configs of the corresponding tutorials to point to your local CAMELS US root directory.
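
If you choose the symbolic link, it can be created from Python as well as from a shell. The sketch below uses only the standard library; link_data_dir is a hypothetical helper, not a NeuralHydrology function, and it deliberately refuses to overwrite anything that already exists at the link location.

```python
from pathlib import Path


def link_data_dir(actual_root, expected_location):
    """Create a symbolic link at `expected_location` that points to `actual_root`."""
    actual_root = Path(actual_root).resolve()
    expected_location = Path(expected_location)
    if not actual_root.is_dir():
        raise FileNotFoundError(f"{actual_root} is not an existing directory")
    # is_symlink() also catches a dangling link, which exists() would miss.
    if expected_location.exists() or expected_location.is_symlink():
        raise FileExistsError(f"{expected_location} already exists")
    # Create neuralhydrology/data/ (or whatever the parent is) if needed.
    expected_location.parent.mkdir(parents=True, exist_ok=True)
    expected_location.symlink_to(actual_root, target_is_directory=True)
    return expected_location
```

A call could look like link_data_dir("/path/to/CAMELS_US", "neuralhydrology/data/CAMELS_US"), where the first path is a placeholder for wherever you extracted the data. On some systems, creating symbolic links requires elevated permissions; in that case, changing data_dir in the .yml configs is the simpler route.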

Hourly forcing and streamflow data for CAMELS US basins

(required for Tutorial 04 - Multi-Timescale Prediction)

To run this tutorial yourself, you will need to download the hourly NLDAS forcings and the hourly streamflow data. Within the CAMELS US root directory, place the nldas_hourly and usgs-streamflow folders into a directory called hourly (/path/to/CAMELS_US/hourly/{nldas_hourly,usgs-streamflow}). Alternatively, you can place the hourly netCDF file (usgs-streamflow-nldas_hourly.nc) from Zenodo inside the hourly/ folder instead of the NLDAS and streamflow csv files. Loading from netCDF is faster than loading from the csv files. With the first option (downloading the two folders), the CAMELS US folder structure from above extends to:

CAMELS_US/              # originally named basin_dataset_public_v1p2
- basin_mean_forcing/   # contains the meteorological time series data
- usgs_streamflow/      # contains the streamflow data
- hourly/               # newly created folder to store the hourly forcing and streamflow data
    - nldas_hourly/     # NLDAS hourly forcing data
    - usgs-streamflow/  # hourly streamflow data
- ...

If you downloaded usgs-streamflow-nldas_hourly.nc instead, the structure should look like this:

CAMELS_US/                              # originally named basin_dataset_public_v1p2
- basin_mean_forcing/                   # contains the meteorological time series data
- usgs_streamflow/                      # contains the streamflow data
- hourly/                               # newly created folder to store the hourly forcing and streamflow data
    - usgs-streamflow-nldas_hourly.nc   # netCDF file containing hourly forcing and streamflow data
- ...
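
Since either of the two layouts above is acceptable, a quick check can tell you which one is present in your hourly/ folder. This is a minimal sketch under the folder and file names given in this section; hourly_layout is a hypothetical helper and says nothing about how NeuralHydrology itself selects the files to load.

```python
from pathlib import Path

NETCDF_NAME = "usgs-streamflow-nldas_hourly.nc"
CSV_FOLDERS = ("nldas_hourly", "usgs-streamflow")


def hourly_layout(root):
    """Return 'netcdf', 'csv', or None depending on what <root>/hourly contains."""
    hourly = Path(root) / "hourly"
    # The netCDF file is reported first if both layouts happen to be present.
    if (hourly / NETCDF_NAME).is_file():
        return "netcdf"
    if all((hourly / name).is_dir() for name in CSV_FOLDERS):
        return "csv"
    return None
```

A return value of None means that neither the netCDF file nor both csv folders were found, which is worth fixing before starting Tutorial 04.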

CAMELS US catchment attributes

(required for Tutorial 06 - How-to Finetuning)

When training a deep learning model, such as an LSTM, on data from more than one basin, it is recommended to also use static catchment attributes as model inputs, alongside the meteorological forcings (see e.g. this paper). In Tutorial 06, we use the static catchment attributes that are part of the CAMELS US dataset. They are stored in 7 txt files (camels_clim.txt, camels_geol.txt, camels_hydro.txt, camels_name.txt, camels_soil.txt, camels_topo.txt, and camels_vege.txt) that can be downloaded from this page. Save the attribute files in a folder called camels_attributes_v2.0 and place this folder into the CAMELS US root directory (at the same level as basin_mean_forcing and usgs_streamflow). Your folder structure should then look at least like this:

CAMELS_US/                  # originally named basin_dataset_public_v1p2
- basin_mean_forcing/       # contains the meteorological time series data
- usgs_streamflow/          # contains the streamflow data
- camels_attributes_v2.0/   # extracted catchment attributes
- ...
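
Because the attributes are spread over seven separate downloads, it is easy to miss one. The sketch below lists any of the seven files named above that are absent from camels_attributes_v2.0; missing_attribute_files is a hypothetical helper, and it checks only that the files exist, not that their contents are complete.

```python
from pathlib import Path

# The seven attribute files named in the text above.
ATTRIBUTE_FILES = tuple(
    f"camels_{kind}.txt"
    for kind in ("clim", "geol", "hydro", "name", "soil", "topo", "vege")
)


def missing_attribute_files(root):
    """Return the attribute files not found in <root>/camels_attributes_v2.0."""
    folder = Path(root) / "camels_attributes_v2.0"
    return [name for name in ATTRIBUTE_FILES if not (folder / name).is_file()]
```

Calling missing_attribute_files("/path/to/CAMELS_US"), with the placeholder replaced by your root directory, should return an empty list once all files are in place.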