
Regridding gridded climate data to NHD HUC12 polygons

Keywords: hydrology; model; climate-forcing

Domain: Climate

Language: Python

Description:

Use the gdptools python package to aggregate the gridMET climate-forcing dataset to HUC12 polygons within a drainage basin.



ClimateR Catalog - GridMET data

In this example, the ClimateR-catalog (CRC) developed by Mike Johnson is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources along with the metadata required to simplify grid-to-polygon area-weighted intersections, using a json catalog schema with common metadata for geospatial analysis. Here we aggregate gridMET Daily Maximum Temperature data to the Delaware River Basin.

gdptools

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon intersections. It contains methods that use the json parameter and grid entries for a given dataset and variable to simplify the process, as demonstrated in this notebook. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray; however, some of the metadata encapsulated in the json entries will have to be supplied by the user. An example can be found here: X.
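For orientation, the user-defined path mentioned above looks roughly like the sketch below. This is an illustration only, not the linked example: the UserCatData class name comes from the discussion later in this notebook, but every keyword name, file name, and column name in the sketch is an assumption and should be checked against the gdptools documentation.

``` {.python .cell-code}
import geopandas as gpd
import xarray as xr
from gdptools import UserCatData

# Hypothetical inputs: any xarray-readable grid and any polygon layer.
ds = xr.open_dataset("my_gridded_data.nc")   # hypothetical file
polys = gpd.read_file("my_polygons.gpkg")    # hypothetical file

# Supply by hand the metadata a catalog entry would normally carry.
# All keyword names below are assumptions; consult the gdptools docs.
user_data = UserCatData(
    ds=ds,
    proj_ds=4326,               # CRS of the gridded data (assumed keyword)
    x_coord="lon",
    y_coord="lat",
    t_coord="time",
    var=["my_variable"],        # hypothetical variable name
    f_feature=polys,
    proj_feature=4326,          # CRS of the polygons (assumed keyword)
    id_feature="poly_id",       # hypothetical id column
    period=["1979-01-01", "1979-01-07"],
)
```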

NLDI

We use the U.S. Geological Survey’s Hydro Network-Linked Data Index (NLDI) through the pynhd python client, part of the HyRiver suite of packages.



Example grid-to-polygon interpolation. A) HUC12 basins for the Delaware River Watershed. B) Gridded monthly water evaporation amount (mm) from the TerraClimate dataset. C) Area-weighted-average interpolation of gridded TerraClimate data to HUC12 polygons.


``` {.python .cell-code}
import warnings

from osgeo import gdal
import pandas as pd
import numpy as np
from pynhd import NLDI
from pynhd import WaterData

from bokeh.io import export_svgs
import hvplot.xarray
import hvplot.pandas
import hvplot.dask
from gdptools import WeightGen
from gdptools import AggGen
from gdptools import ClimRCatData

warnings.filterwarnings("ignore")
```

For reference, list the version of gdptools used in this notebook:

``` {.python .cell-code}
!conda list gdptools
```

```
# packages in environment at /opt/homebrew/Caskroom/miniforge/base/envs/gdptools-examples:
#
# Name                    Version                   Build  Channel
gdptools                  0.2.7              pyhd8ed1ab_0    conda-forge
```

Use the HyRiver pynhd package to find the HUC12 basins comprising the Delaware River Basin upstream of gage #01482100 at the Delaware Memorial Bridge.

``` {.python .cell-code}
# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("wbd12").bygeom(del_basins.geometry[0])
```

Create a plot of the HUC12 Basins.

``` {.python .cell-code}
from holoviews.element.tiles import EsriTerrain

drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
    title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal'
)
EsriTerrain() * drb
```

Load the ClimateR catalog

In the following steps we read the catalog (a parquet file) into pandas, which lets us query it and return the desired json entries for the variables in the dataset of interest.

``` {.python .cell-code}
climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()
```
`cat.head()` displays the first five rows of the catalog (5 rows × 28 columns), with columns including id, asset, URL, type, varname, variable, description, units, model, ensemble, ..., Xn, Y1, Yn, resX, resY, ncols, nrows, crs, toptobottom, and tiled. The first rows shown are daily MACAv2-METDATA specific-humidity (huss) entries for several climate models (BNU-ESM, CNRM-CM5, CSIRO-Mk3-6-0, bcc-csm1-1, CanESM2), served over OPeNDAP from thredds.northwestknowledge.net.

Query the catalog for the gridMET variables tmmx (daily maximum temperature), tmmn (daily minimum temperature), and pr (precipitation)

``` {.python .cell-code}
# Create a dictionary of catalog parameter entries, one per variable
_id = "gridmet"
tvars = ["tmmn", "tmmx", "pr"]
cat_params = [
    cat.query(
        "id == @_id & variable == @_var"
    ).to_dict(orient="records")[0] for _var in tvars
]

cat_dict = dict(zip(tvars, cat_params))

# Output an example of the cat_param.json entry for "tmmn".
cat_dict.get("tmmn")
```
```
{'id': 'gridmet',
 'asset': 'agg_met_tmmn_1979_CurrentYear_CONUS',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_met_tmmn_1979_CurrentYear_CONUS.nc',
 'type': 'opendap',
 'varname': 'daily_minimum_temperature',
 'variable': 'tmmn',
 'description': 'tmmn',
 'units': 'K',
 'model': None,
 'ensemble': None,
 'scenario': None,
 'T_name': 'day',
 'duration': '1979-01-01/..',
 'interval': '1 days',
 'nT': nan,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -124.76666663333334,
 'Xn': -67.05833330000002,
 'Y1': 49.400000000000006,
 'Yn': 25.066666666666666,
 'resX': 0.041666666666666664,
 'resY': 0.04166666666666668,
 'ncols': 1386.0,
 'nrows': 585.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': False,
 'tiled': ''}
```

Calculate the weights that will be used in the area-weighted statistics

A note on the methods applied here:

There are currently two data classes: ClimRCatData, for input data derived from the ClimateR catalog (demonstrated here), and UserCatData, for user-defined input data. The non-catalog examples demonstrate the use of UserCatData.

WeightGen() is the tool for generating weights, with both a “serial” and a “parallel” method available. When generating weights at a scale similar to the DRB HUC12 basins here, the “serial” method is appropriate. If, for example, weights are generated for HUC12 basins over all of CONUS, then “parallel” provides a significant speed-up.

The calculate_weights() method has an option to retain the intersected polygon geometries on wght_gen, where they can be retrieved with the wght_gen.intersections property. Enabling it makes it possible to evaluate the calculated weights and the resulting aggregated values; by default, intersections is False. A sketch of that variant follows the weight-generation output below.

``` {.python .cell-code}
user_data = ClimRCatData(
    cat_dict=cat_dict,
    f_feature=huc12_basins,
    id_feature='huc12',
    period=["1979-01-01", "1979-01-07"]
)

wght_gen = WeightGen(
    user_data=user_data,
    method="serial",
    # output_file="test_gm_weights.csv",
    weight_gen_crs=6931
)

wghts = wght_gen.calculate_weights()
```
```
Using serial engine
grid cells generated in 0.3189 seconds
Data preparation finished in 2.1297 seconds
Reprojecting to epsg:EPSG:6931 finished in 0.04 second(s)
Validating polygons
     - validating source polygons
     - fixing 0 invalid polygons.
     - validating target polygons
     - fixing 0 invalid polygons.
Validate polygons finished in 0.0260 seconds
Intersections finished in 0.4715 seconds
Weight gen finished in 0.4986 seconds
```
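As described above, calculate_weights() can also retain the intersected geometries for inspection. The following is a minimal sketch of that variant; the keyword spelling intersections=True is assumed from the property name and should be verified against the gdptools documentation.

``` {.python .cell-code}
# Recompute the weights, this time retaining the intersected polygons.
wghts = wght_gen.calculate_weights(intersections=True)  # assumed keyword

# The intersected geometries are then available as a property and can be
# used to sanity-check the weights and the resulting aggregated values.
gdf_int = wght_gen.intersections
```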

Calculate the area-weighted averages

  • For each variable in the gridMET dataset and for each HUC12 in the Delaware River Basin.
  • Output the result as a netcdf file.
  • The data can also be output as a .csv file (see the sketch after the aggregation below).

Returned values

  • ngdf: The original huc12_basins geodataframe, sorted by user_data.id_feature and dissolved by user_data.id_feature. The returned geodataframe is thus a modified version of the original huc12_basins, returned for easy evaluation and plotting; a geopandas sketch of the equivalent operation follows this list.
  • vallist: A list of numpy arrays representing the aggregated values, dimensioned by (time, user_data.id_feature). Again, returned for evaluation and plotting purposes.
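For clarity, the sort-and-dissolve step described above corresponds roughly to the following standard geopandas operations. This is an illustration of the described behavior, not gdptools’ actual internals.

``` {.python .cell-code}
# Illustration only: roughly the sort-and-dissolve that produces ngdf.
ngdf_equiv = (
    huc12_basins
    .sort_values("huc12")      # sort by the id_feature column
    .dissolve(by="huc12")      # merge polygons sharing an id into one
)
```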

properties

  • agg_data: The user_data prepared for aggregating. Can be used to retrieve the gridded dataset subsetted by time and by the feature bounding box.
``` {.python .cell-code}
agg_gen = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="netcdf",
    weights=wghts,
    out_path='.',
    file_prefix="testing_gm_drb_huc12"
)
ngdf, ds_out = agg_gen.calculate_agg()
```
```
Processing: tmmn
    Data prepped for aggregation in 0.9020 seconds
    Data aggregated in 0.4716 seconds
Processing: tmmx
    Data prepped for aggregation in 0.7424 seconds
    Data aggregated in 0.4209 seconds
Processing: pr
    Data prepped for aggregation in 0.7779 seconds
    Data aggregated in 0.5393 seconds
Saving netcdf file to testing_gm_drb_huc12.nc
```
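As noted in the list above, the aggregated data can be written to a .csv file instead. A minimal sketch follows, assuming AggGen’s agg_writer parameter accepts "csv"; only "netcdf" is shown in this notebook, so verify the accepted values against the gdptools documentation.

``` {.python .cell-code}
# Same aggregation, but writing a .csv file instead of a netcdf file.
agg_gen_csv = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="csv",           # assumed value; "netcdf" is shown above
    weights=wghts,
    out_path='.',
    file_prefix="testing_gm_drb_huc12"
)
ngdf_csv, ds_csv = agg_gen_csv.calculate_agg()
```

Returning to the netcdf run, inspect the returned xarray dataset: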
``` {.python .cell-code}
ds_out
```

```
<xarray.Dataset> Size: 81kB
Dimensions:                    (time: 7, huc12: 460)
Coordinates:
  * time                       (time) datetime64[ns] 56B 1979-01-01 ... 1979-...
  * huc12                      (huc12) object 4kB '020200050104' ... '0206000...
Data variables:
    daily_minimum_temperature  (time, huc12) float64 26kB 275.3 275.9 ... 270.8
    daily_maximum_temperature  (time, huc12) float64 26kB 281.5 282.6 ... 279.8
    precipitation_amount       (time, huc12) float64 26kB 16.26 13.95 ... 33.77
Attributes:
    Conventions:  CF-1.8
    featureType:  timeSeries
    history:      2024_04_16_09_50_27 Original filec created  by gdptools pac...
```

agg_data

agg_data is a property of AggGen(). It is the user_data prepared for aggregation, and it includes the data arrays subsetted by the feature bounding box and the time period of the aggregation. Here we use it to generate a plot that illustrates the aggregated data.

tvar = "tmmx"
tvarname = "daily_maximum_temperature"
aggdata = agg_gen.agg_data
da = aggdata.get(tvar).da
tcoord = aggdata.get(tvar).cat_param.T_name
print(tcoord)
day: str = "1979-01-07"
tsel = {tcoord: day}
tmin = np.nanmin(da.sel(**tsel).values)
tmax = np.nanmax(da.sel(**tsel).values)

print(tmin, tmax)
day
273.1 286.0
ngdf[tvar] = ds_out[tvarname].sel(time=day)
gm_agg = ngdf.to_crs(4326).hvplot(
    c='tmmx', geo=True, coastline='50m', frame_width=300, alpha=1, cmap='viridis', clim=(tmin, tmax),
    xlabel="longitude", ylabel="latitude", clabel="Daily maximum temperature (K)", #xlim=(-76.8, -73.8),
    title="DRB HUC12 area-weighted average", aspect='equal'
)
gm_raw = da.sel(**tsel).hvplot.quadmesh(
    'lon', 'lat', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', frame_width=300, clim=(tmin, tmax),
    xlabel="longitude", ylabel="latitude", clabel="Daily maximum temperature (K)", #xlim=(-76.8, -73.8),
    title="gridMET", aspect='equal'
)
# Display data
from holoviews.element.tiles import EsriTerrain

(EsriTerrain() * gm_raw) + (EsriTerrain() * gm_agg)