Use Cases
Regridding gridded climate data to NHD HUC12 polygons
Keywords:
hydrology; model; climate-forcing
Domain:
Climate
Language:
Python
Description:
Use the gdptools python package to aggregate the gridMET climate-forcing dataset to HUC12 polygons within a drainage basin.
In this example, the ClimateR-catalog (CRC), developed by Mike Johnson, is used to discover a gridded dataset of interest. The CRC encompasses a wealth of data sources along with the metadata required to simplify the process of grid-to-polygon area-weighted intersections. The CRC uses a json catalog schema with common metadata for geospatial analysis. In this example we aggregate gridMET Daily Maximum Temperature data to HUC12 polygons in the Delaware River Basin.
gdptools
gdptools is a python package for calculating area-weighted statistics of grid-to-polygon intersections. gdptools contains methods that incorporate the json parameter and grid entries for a given dataset and variable, simplifying the process; that workflow is demonstrated in this notebook. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray; however, some of the metadata encapsulated in the json entries will then have to be supplied by the user. An example can be found here: X.
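The core computation gdptools performs is an area-weighted average: each grid cell's value is weighted by the fraction of the target polygon's area that the cell covers. A minimal numpy sketch with made-up values (three hypothetical grid cells overlapping one polygon; not gridMET data) illustrates the arithmetic:

```python
import numpy as np

# One target polygon overlapped by three grid cells. Each weight is the
# fraction of the polygon's area covered by that cell, so weights sum to 1.
# Values and weights are illustrative, not gridMET data.
cell_values = np.array([290.0, 292.0, 295.0])  # e.g. daily max temperature (K)
weights = np.array([0.5, 0.3, 0.2])            # hypothetical area fractions

# Area-weighted average for the polygon: sum of value * weight.
poly_mean = float(np.sum(cell_values * weights))
print(poly_mean)  # 291.6
```

gdptools applies the same weighted sum per polygon and per time step, with the weights computed from the actual grid-cell and polygon geometries.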
NLDI
We use pynhd, a python client for the U.S. Geological Survey’s Hydro Network-Linked Data Index (NLDI), available through the HyRiver suite of packages.
name: GridMET Daily Maximum Temperature aggregation
Example grid-to-polygon interpolation. A) Huc12 basins for Delaware River Watershed. B) Gridded monthly water evaporation amount (mm) from TerraClimate dataset. C) Area-weighted-average interpolation of gridded TerraClimate data to Huc12 polygons.
::: {#90cd65d2 .cell execution_count=1}
``` {.python .cell-code}
import warnings
from osgeo import gdal
import pandas as pd
import numpy as np
from pynhd import NLDI
from pynhd import WaterData
from bokeh.io import export_svgs
import hvplot.xarray
import hvplot.pandas
import hvplot.dask
from gdptools import WeightGen
from gdptools import AggGen
from gdptools import ClimRCatData
warnings.filterwarnings("ignore")
```
:::
For reference, list the version of gdptools used in this notebook.

``` {.python .cell-code}
!conda list gdptools
```

```
# packages in environment at /opt/homebrew/Caskroom/miniforge/base/envs/gdptools-examples:
#
# Name                    Version        Build           Channel
gdptools                  0.2.7          pyhd8ed1ab_0    conda-forge
```
Use the HyRiver pynhd package to find the HUC12 basins comprising the Delaware River Basin upstream of gage 01482100 at the Delaware Memorial Bridge.
``` {.python .cell-code}
# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("wbd12").bygeom(del_basins.geometry[0])
```
Query the catalog for the gridMET variables of interest
In the following steps we import the json catalog files into pandas so that the query function can return the desired json entries for the variables in the dataset of interest.
Query the catalog for variables tmmx (daily maximum temperature), tmmn (daily minimum temperature), and pr (precipitation)
``` {.python .cell-code}
# Create a dictionary of parameter dataframes for each variable
_id = "gridmet"
tvars = ["tmmn", "tmmx", "pr"]
cat_params = [
    cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0]
    for _var in tvars
]
cat_dict = dict(zip(tvars, cat_params))
# Output an example of the cat_param.json entry for "tmmn".
cat_dict.get("tmmn")
```
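The cell above assumes the catalog has already been read into a pandas DataFrame named `cat`. A toy stand-in for that DataFrame (the rows below are illustrative, not actual catalog entries) shows how the `query`/`to_dict` pattern pulls out one parameter record per variable; a plain loop is used here in place of the list comprehension, which behaves the same:

```python
import pandas as pd

# Tiny stand-in for the catalog DataFrame `cat`; illustrative rows only.
cat = pd.DataFrame(
    [
        {"id": "gridmet", "variable": "tmmx", "units": "K"},
        {"id": "gridmet", "variable": "tmmn", "units": "K"},
        {"id": "gridmet", "variable": "pr", "units": "mm"},
        {"id": "terraclim", "variable": "aet", "units": "mm"},
    ]
)

_id = "gridmet"
tvars = ["tmmn", "tmmx", "pr"]
cat_params = []
for _var in tvars:
    # query() resolves @_id and @_var from the surrounding scope;
    # to_dict(orient="records")[0] turns the single matching row into a dict.
    rec = cat.query("id == @_id & variable == @_var").to_dict(orient="records")[0]
    cat_params.append(rec)
cat_dict = dict(zip(tvars, cat_params))
print(cat_dict["tmmx"])  # {'id': 'gridmet', 'variable': 'tmmx', 'units': 'K'}
```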
Calculate the weights that will be used in the area-weighted statistics
A note on the methods applied here:
There are currently two data classes: ClimRCatData, for use with input data derived from the ClimateR-catalog, and UserCatData, for user-defined input data. The non-catalog examples demonstrate the use of UserCatData; here ClimRCatData is demonstrated.
WeightGen() is the tool for generating weights. Both a “serial” and a “parallel” method are available. When generating weights at a scale similar to the DRB HUC12 basins here, the serial method is appropriate. If, for example, weights are generated for HUC12 basins over all of CONUS, then “parallel” provides a significant speed-up.
The calculate_weights() method has the option intersections=True, which saves the intersected polygon geometries to wght_gen; they can then be retrieved with the wght_gen.intersections property. We set it to True here so we can use it to evaluate the calculated weights and the resulting aggregated values later in this notebook. Normally one can simply call calculate_weights(); the default value of intersections is False.
```
Using serial engine
grid cells generated in 0.3189 seconds
Data preparation finished in 2.1297 seconds
Reprojecting to epsg:EPSG:6931 finished in 0.04 second(s)
Validating polygons
- validating source polygons
- fixing 0 invalid polygons.
- validating target polygons
- fixing 0 invalid polygons.
Validate polygons finished in 0.0260 seconds
Intersections finished in 0.4715 seconds
Weight gen finished in 0.4986 seconds
```
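Conceptually, each weight produced above is the fraction of a target polygon's (projected) area covered by one grid cell. A self-contained toy, with axis-aligned rectangles standing in for the projected geometries (gdptools intersects real polygons; the names and values below are hypothetical), shows the definition:

```python
def rect_overlap_area(a, b):
    """Overlap area of two axis-aligned rectangles given as (xmin, ymin, xmax, ymax)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

# One target "polygon" (a rectangle for simplicity) and two grid cells.
target = (0.5, 0.0, 2.0, 1.0)  # area = 1.5
cells = {"c0": (0.0, 0.0, 1.0, 1.0), "c1": (1.0, 0.0, 2.0, 1.0)}

# Weight of each cell = intersection area / target polygon area.
target_area = (target[2] - target[0]) * (target[3] - target[1])
weights = {cid: rect_overlap_area(target, c) / target_area for cid, c in cells.items()}
print(weights)  # c0 is about 1/3, c1 about 2/3
```

Because the two cells fully cover the target, the weights sum to 1; cells that miss a polygon entirely get weight 0 and drop out of the aggregation.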
Calculate the area-weighted averages
For each variable in the gridMET dataset and for each HUC12 in the Delaware River Basin.
Output the result as a netcdf file.
The data can also be output as a .csv file.
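With the weights in hand, the aggregation step reduces to a matrix product of the (time, cell) value array with the transposed (polygon, cell) weight matrix. A small numpy sketch with hypothetical values (not gridMET data) illustrates:

```python
import numpy as np

# Hypothetical gridded values: 2 time steps x 2 grid cells.
values = np.array([[290.0, 292.0],
                   [291.0, 293.0]])
# weights[p, c] = fraction of polygon p covered by cell c; each row sums to 1.
weights = np.array([[1.0 / 3.0, 2.0 / 3.0]])  # one polygon

# Area-weighted average per polygon, per time step.
agg = values @ weights.T  # shape (time, n_polygons)
print(agg.ravel())  # approximately [291.33, 292.33]
```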
Returned values
ngdf: The original huc12_basins geodataframe, sorted by user_data.id_feature and dissolved by user_data.id_feature. The returned geodataframe is thus a modified version of the original huc12_basins, returned for easy evaluation and plotting.
vallist: A list of numpy arrays representing the aggregated values, dimensioned by (time, user_data.id_feature). Again, returned for evaluation and plotting purposes.
properties
agg_data: The user_data prepared for aggregation. It can be used to get the gridded dataset subsetted by time and by the feature bounding box.
```
Processing: tmmn
Data prepped for aggregation in 0.9020 seconds
Data aggregated in 0.4716 seconds
Processing: tmmx
Data prepped for aggregation in 0.7424 seconds
Data aggregated in 0.4209 seconds
Processing: pr
Data prepped for aggregation in 0.7779 seconds
Data aggregated in 0.5393 seconds
Saving netcdf file to testing_gm_drb_huc12.nc
2024_04_16_09_50_27 Original file created by gdptools package: https://code.usgs.gov/wma/nhgf/toolsteam/gdptools
```
agg_data
agg_data is a property of AggGen(). It is the user_data prepared for aggregation and includes the dataarrays that have been subsetted by the feature bounding box and by the time period of the aggregation. Here we use it to generate a plot that illustrates the aggregated data.