


gdptools-climater-catalog

Keywords: Terraclimate; HUC12; gdptools; HyRiver; spatial-interpolation

Domain: Climate; Hydrology

Language: Python

Description:

In this notebook we will spatially interpolate a set of variables including actual evapotranspiration (aet), potential evapotranspiration (pet), and Palmer Drought Severity Index (PDSI) from the Terraclimate dataset to a fabric of HUC12s in the Delaware River Basin. The bulk of the processing is done with the Geo Data Processing Tools (GDPTools) python package. A highlight of this use-case is introducing the ClimateR-Catalog of common climate datasets.


Introduction


We define the gridded climate data as our source data; the target data are a set of hydrologic unit polygons representing HUC12s upstream of USGS gage 01482100 (Delaware River at Del Mem Bridge at Wilmington, DE). The GDPTools package provides tools for interpolating the source data to the target data using area-weighted statistics. The source-data requirements of GDPTools objects are supplied by entries from the ClimateR-Catalog.

Spatial interpolation of climate data, as demonstrated here, is a common and often challenging geo-data processing step in building hydrologic model applications. GDPTools was developed to address this common processing need.

Data:

  • No local data are required for this use case; all inputs are accessed remotely.

Getting started

  conda env create -f environment-examples.yml
  conda activate gdptools-examples

Import packages

The imports represent a selection of typical packages for geospatial work in Python, including GeoPandas and XArray among others. We also use the pynhd package from the HyRiver suite of Python packages to create a target set of HUC12 polygons for our use case. For generating plots of the source, the target, and the resulting spatial interpolation we use packages associated with hvPlot. The GDPTools classes are ClimRCatData for initializing the data sources, WeightGen for generating a table of weights, and AggGen, which uses those weights to generate the area-weighted interpolations. These are all discussed in more detail below.

# Typical geospatial python imports
import geopandas as gpd
import spatialpandas as spd
import pandas as pd
import numpy as np
import xarray as xr
import pyproj

# HyRiver imports
from pynhd import NLDI
from pynhd import WaterData

# High-level plotting tools
import hvplot
import hvplot.xarray
import hvplot.pandas

# GDPTools imports
from gdptools import WeightGen
from gdptools import AggGen
from gdptools import ClimRCatData

import warnings
warnings.filterwarnings("ignore")
hvplot.extension('bokeh')

Data requirements for GDPTools

The general data requirements for GDPTools are defined by the source data (gridded climate data), the target data (a GeoDataFrame of polygons), and some additional parameters discussed below. We begin by identifying our source data using the ClimateR-Catalog. The target data can be any set of polygon geometries readable by GeoPandas. The three classes used by GDPTools in this tutorial are:

Data Classes:

  • ClimRCatData: Class to load the source and target data

Process Classes:

Each of these classes accepts an instance of the ClimRCatData class.

  • WeightGen: Class to generate a table of weights. For every polygon in the target dataset, this table identifies every grid cell that intersects the polygon, along with each cell's weight: the fraction of the total polygon area that the cell covers. These weights are used to calculate the area-weighted statistics, which are the method of interpolation or aggregation used by the following class (a minimal numeric sketch follows this list).
  • AggGen: Class to perform the area-weighted statistics used to interpolate or aggregate the source data to the target data.
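
To make the weights concrete, here is a minimal numeric sketch (not from the original notebook; the weights and cell values are hypothetical) of how one polygon's rows in the weights table produce an area-weighted mean:

# Hypothetical weights for one target polygon: three grid cells intersect it,
# covering 60%, 30%, and 10% of its area, so the weights sum to 1.
import numpy as np

wghts = np.array([0.6, 0.3, 0.1])
# Hypothetical values of the gridded variable in those three cells
vals = np.array([10.0, 12.0, 8.0])
# The area-weighted mean is the sum of the cell values scaled by their weights
area_weighted_mean = np.sum(wghts * vals)
print(area_weighted_mean)  # 10.4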

Define the source data using the ClimateR-Catalog

Note

The GDPTools documentation provides a table of common datasets in the ClimateR-Catalog. The table provides a link to any available description/documentation of each dataset as well as a search id. Another way to gain familiarity with the catalog is to open the JSON version in a text editor and search for any keywords of interest.

  • Import the ClimateR-Catalog from the web into pandas using its Parquet version. Looking at the head of the catalog provides an initial view of what's included. Primarily, the catalog provides a URL for each asset that can be read by xarray to load that particular dataset.
climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()
         id                                              asset     type            varname variable                                description    units          model ensemble  ...
0  maca_day  agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic...  opendap  specific_humidity     huss  Daily Mean Near-Surface Specific Humidity  kg kg-1        BNU-ESM   r1i1p1  ...
1  maca_day  agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori...  opendap  specific_humidity     huss  Daily Mean Near-Surface Specific Humidity  kg kg-1       CNRM-CM5   r1i1p1  ...
2  maca_day  agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi...  opendap  specific_humidity     huss  Daily Mean Near-Surface Specific Humidity  kg kg-1  CSIRO-Mk3-6-0   r1i1p1  ...
3  maca_day  agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo...  opendap  specific_humidity     huss  Daily Mean Near-Surface Specific Humidity  kg kg-1     bcc-csm1-1   r1i1p1  ...
4  maca_day  agg_macav2metdata_huss_CanESM2_r1i1p1_historic...  opendap  specific_humidity     huss  Daily Mean Near-Surface Specific Humidity  kg kg-1        CanESM2   r1i1p1  ...

5 rows × 28 columns
(the URL and grid-geometry columns, such as Xn, Y1, Yn, resX, resY, ncols, nrows, crs, toptobottom, and tiled, are elided above for readability)

  • Here is a list of all the data columns in the catalog
cat.columns
Index(['id', 'asset', 'URL', 'type', 'varname', 'variable', 'description',
       'units', 'model', 'ensemble', 'scenario', 'T_name', 'duration',
       'interval', 'nT', 'X_name', 'Y_name', 'X1', 'Xn', 'Y1', 'Yn', 'resX',
       'resY', 'ncols', 'nrows', 'crs', 'toptobottom', 'tiled'],
      dtype='object')
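
Beyond exact matches with .query(), a simple substring search over the description column can also help locate datasets. This is a small sketch, not part of the original notebook, and the keyword is only an example:

# Search the catalog's description column for a keyword of interest
hits = cat[cat["description"].str.contains("evapotranspiration", case=False, na=False)]
hits[["id", "variable", "description"]].drop_duplicates().head()
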
  • Define search terms and create a catalog dictionary to use with ClimRCatData. All the columns can be searched using the pandas .query() method. Here we query for the id "terraclim" and several variables of that dataset: ["aet", "pet", "PDSI"]. The code then creates a dictionary with the variable names as keys and the resulting catalog record for each variable as the values. This dictionary is then used by GDPTools objects.
_id = "terraclim"
_vars = ["aet", "pet", "PDSI"]
# create list of catalog parameters for each variable
cat_params = [
    cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0]
    for _var in _vars
]
# create dictionary of variable names and catalog parameters
cat_dict = dict(zip(_vars, cat_params))

Example key: value pair

cat_dict.get("aet")
{'id': 'terraclim',
 'asset': 'agg_terraclimate_aet_1958_CurrentYear_GLOBE',
 'URL': 'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_aet_1958_CurrentYear_GLOBE.nc',
 'type': 'opendap',
 'varname': 'aet',
 'variable': 'aet',
 'description': 'water_evaporation_amount',
 'units': 'mm',
 'model': None,
 'ensemble': None,
 'scenario': 'total',
 'T_name': 'time',
 'duration': '1958-01-01/2021-12-01',
 'interval': '1 months',
 'nT': 768.0,
 'X_name': 'lon',
 'Y_name': 'lat',
 'X1': -179.97916666666666,
 'Xn': 179.97916666666666,
 'Y1': 89.97916666666667,
 'Yn': -89.97916666666664,
 'resX': 0.041666666666666664,
 'resY': 0.041666666666666664,
 'ncols': 8640.0,
 'nrows': 4320.0,
 'crs': '+proj=longlat +a=6378137 +f=0.00335281066474748 +pm=0 +no_defs',
 'toptobottom': False,
 'tiled': ''}
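
As noted above, each catalog entry provides a URL that xarray can read. As a quick check, not part of the original workflow and assuming the THREDDS endpoint is reachable, the entry for "aet" can be opened directly:

# Open the OPeNDAP endpoint recorded in the catalog entry
ds_aet = xr.open_dataset(cat_dict["aet"]["URL"])
ds_aet["aet"]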

Define the target data

In this cell we use clients for several https://labs.waterdata.usgs.gov tools:

NLDI

GeoServer

More examples of using the HyRiver pynhd package can be found here

Here NLDI is used to delineate the basin upstream of a USGS gage on the Delaware River. The geometry of that basin is then used to retrieve all the HUC12s it contains.

# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("wbd12").bygeom(del_basins.geometry[0])
spd_huc12_basins = spd.GeoDataFrame(huc12_basins)
  • Visualize the extracted HUC12s in the Delaware River Basin
drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='b', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal',
)

drb

Figure 1: Delaware River HUC12 basins.

Generate a weights file

ClimRCatData is initialized with the source and target data:

Source Data:

  • cat_dict=cat_dict: A dictionary with variable names as keys and ClimateR-Catalog records as the values.
  • period=["1980-01-01", "1980-12-31"]: The WeightGen process class uses only the first time step in the data to define the grid. In the next code cell, however, the data are aggregated over the period defined here.

Target Data:

  • f_feature=huc12_basins: The GeoDataFrame of HUC12 basins we defined above.
  • id_feature='huc12': The column used to identify the target polygons in the weights file and the interpolated results in the AggGen process class.

WeightGen is initialized as:

  • user_data=user_data: The ClimRCatData instance we defined above.
  • method='serial': One of three methods: "serial", "parallel", or "dask". When working with very large source and target data, the parallel or dask methods can speed up the calculation of weights.
  • output_file="terraclime_wghts.csv": Sets the name of the resulting weights file.
  • weight_gen_crs=6931: The common equal-area projection to which both the source and target data are reprojected before calculating the intersections of the grid-cell polygons with the target HUC12 polygons. EPSG:6931 is a choice we commonly use for the Northern Hemisphere (see the sketch below).
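
As a quick sanity check, not in the original notebook, pyproj (imported earlier) can confirm what EPSG:6931 is:

# Inspect the projection chosen for weight generation
crs6931 = pyproj.CRS.from_epsg(6931)
print(crs6931.name)                              # WGS 84 / NSIDC EASE-Grid 2.0 North
print(crs6931.coordinate_operation.method_name)  # Lambert Azimuthal Equal Area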

The calculate_weights() method calculates the weights, saving the terraclime_wghts.csv file and also returning a pandas DataFrame of the weights.

Note

Documentation of the ClimRCatData class can be found here and for the WeightGen class here.

user_data = ClimRCatData(
    cat_dict=cat_dict,
    f_feature=huc12_basins,
    id_feature='huc12',
    period=["1980-01-01", "1980-12-31"]
)

wght_gen = WeightGen(
    user_data=user_data,
    method="serial",
    output_file="terraclime_wghts.csv",
    weight_gen_crs=6931
)

wghts = wght_gen.calculate_weights()
Using serial engine
grid cells generated in 0.4124 seconds
Data preparation finished in 1.2177 seconds
Reprojecting to epsg:EPSG:6931 finished in 0.05 second(s)
Validating polygons
     - validating source polygons
     - fixing 0 invalid polygons.
     - validating target polygons
     - fixing 0 invalid polygons.
Validate polygons finished in 0.0290 seconds
Intersections finished in 0.4980 seconds
Weight gen finished in 0.5279 seconds

View the resulting weights file

wghts_df = pd.read_csv('terraclime_wghts.csv')
wghts_df.head()
         huc12   i   j      wght
0  20402050501  70  24  0.027234
1  20402050501  71  24  0.118993
2  20402050501  72  23  0.025754
3  20402050501  71  23  0.393885
4  20402050501  71  22  0.230850
  • Inspect the values for a given huc12 id. Note that the weights sum to 1. Each weight identifies a grid cell by its i,j index, and its value is the fraction of the huc12's total area that the intersected grid cell covers.
wghts_group = wghts_df.groupby('huc12')
wghts_group.get_group(20401030203)
            huc12   i   j      wght
3807  20401030203  24  34  0.010206
3808  20401030203  21  36  0.023844
3809  20401030203  22  35  0.023939
3810  20401030203  21  35  0.209007
3811  20401030203  20  35  0.066674
3812  20401030203  23  34  0.195367
3813  20401030203  22  34  0.243702
3814  20401030203  21  34  0.149858
3815  20401030203  20  34  0.011366
3816  20401030203  23  33  0.021598
3817  20401030203  22  33  0.044440
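
As a quick verification (a sketch, not in the original notebook), the weights for every HUC12 should sum to approximately 1, assuming the grid fully covers each polygon:

# The per-polygon weights should each sum to ~1
sums = wghts_df.groupby("huc12")["wght"].sum()
print(sums.min(), sums.max())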

Run an area-weighted interpolation of the mean values

Note

More information about the AggGen class can be found here.

Once the weights are calculated, we use the terraclime_wghts.csv file generated by the calculate_weights() method as input to the AggGen process class. The AggGen process class takes the following inputs:

  • user_data = user_data: The same data class defined above.
  • stat_method = "masked_mean": See the documentation for the other available statistics.
  • agg_engine = "serial": There are also parallel and dask engines available.
  • agg_writer = "netcdf": There are also csv, parquet, and json writers available.
  • weights = "terraclime_wghts.csv": The weights file generated above.
  • out_path = ".": Output will be written to this directory.
  • file_prefix = "testing_tc_drb_huc12": In this case the resulting file will be testing_tc_drb_huc12.nc, a NetCDF file.
agg_gen = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="netcdf",
    weights= "terraclime_wghts.csv",
    out_path=".",
    file_prefix="testing_tc_drb_huc12"
)
ngdf, ds_out = agg_gen.calculate_agg()
Processing: aet
    Data prepped for aggregation in 0.7537 seconds
    Data aggregated in 0.8322 seconds
Processing: pet
    Data prepped for aggregation in 1.1055 seconds
    Data aggregated in 0.9126 seconds
Processing: PDSI
    Data prepped for aggregation in 0.8593 seconds
    Data aggregated in 0.8631 seconds
Saving netcdf file to testing_tc_drb_huc12.nc
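
Because calculate_agg() also writes the NetCDF file named above, the result on disk can be read back with xarray as a quick check (a sketch, not part of the original notebook):

# Read the aggregated output back from disk
ds_check = xr.open_dataset("testing_tc_drb_huc12.nc")
ds_check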

Visualize the results for the variable “aet”

  • Return the subset of the TerraClimate variable "aet" bounded by the DRB HUC12s.
processed_data = agg_gen.agg_data
da = processed_data.get("aet").da
da
<xarray.DataArray 'aet' (time: 12, lat: 93, lon: 65)> Size: 290kB
dask.array<getitem, shape=(12, 93, 65), dtype=float32, chunksize=(12, 93, 65), chunktype=numpy.ndarray>
Coordinates:
  * lon      (lon) float64 520B -76.69 -76.65 -76.6 ... -74.1 -74.06 -74.02
  * lat      (lat) float64 744B 42.56 42.52 42.48 42.44 ... 38.81 38.77 38.73
  * time     (time) datetime64[ns] 96B 1980-01-01 1980-02-01 ... 1980-12-01
Attributes:
    units:              mm
    description:        Actual Evapotranspiration
    long_name:          water_evaporation_amount
    standard_name:      water_evaporation_amount
    dimensions:         lon lat time
    grid_mapping:       crs
    coordinate_system:  WGS84,EPSG:4326
    _ChunkSizes:        [   1  720 1440]
  • Plot the resulting interpolation of the mean values for the variable “aet” in the DRB HUC12s
spd_ngdf = spd.GeoDataFrame(ngdf.to_crs(4326))
var = 'aet'
time_step = "1980-08-01"
tmin = np.nanmin(da.sel(time=time_step).values)
tmax = np.nanmax(da.sel(time=time_step).values)
print(tmin, tmax)
spd_ngdf['aet'] = ds_out[var].sel(time=time_step)
terraclim_agg = spd_ngdf.hvplot(
    c='aet', geo=True, coastline='50m', alpha=1, cmap='viridis', clim=(tmin, tmax),
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
    title="DRB HUC12 area-weighted average", aspect='equal'
)
terraclim_raw = da.sel(time=time_step).hvplot.quadmesh(
    'lon', 'lat', 'aet', cmap='viridis', alpha=1, grid=True, geo=True, coastline='50m', clim=(tmin, tmax),
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)", xlim=(-76.8, -73.8),
    title="TerraClimate Monthly AET", aspect='equal',
)
(terraclim_raw + terraclim_agg) 

Figure 2: TerraClimate monthly AET (left) and the DRB HUC12 area-weighted averages (right).