Beta

The current list of use cases is a snap shot of what this page will contain in the future. The team is working on improving the use case pages to provide meaningful examples of how active WMA projects are using data and tools. Tell us about your experience using the catalog!

Use Cases

gdptools-nlcd

Keywords: NLCD; gdptools; spatial-interpolation; zonal statistics

Domain: Land Cover; Hydrology

Language: Python

Description:

The 2019 Land Cover product from the National Land Cover Database is interpolated to the Delaware River Basin HUC12s using the Geo Data Processing Tools (gdptools) grid-to-polygon area-weighted statistics tools.

Additional Information:

NHGF Tools and applications demo: Interpolating NLCD data to Delaware HUC12s

Introduction

In this example, the 2019 Land Cover product from the National Land Cover Database is interpolated to the Delaware River Basin HUC12s using the Geo Data Processing Tools (gdptools) grid-to-polygon area-weighted statistics tools.

Geo Data Processing Tools (gdptools)

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.

Open Source Python Tools

xarray: Package to work with labelled multi-dimensional arrays.
geopandas: Package to work with geospatial data.
HyRiver: Hydroclimate Data Retriever, suite of libraries for retrieving geospatial data from various web services. pynhd library used to retrieve NHDplus water basin polygons and data from the Hydro Network-Linked Data Index (NLDI).
Holoviz: Python data visualization suite. Hvplot used to plot geospatial data.

Data

2019 NLCD Land Cover: 2019 conterminous U.S. land cover at a 30-meter spatial resolution with a 16-class legend based on a modified Anderson Level II classification system.

Maintenance Schedule

The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

Description

This notebook demonstrates usage of gdptools to interpolate NLCD 30m land cover data to HUC12s for the Delaware River Basin. There are three sections: 1) Data exploration and pre-processing 2) Calculating zonal statistics for gridded land cover on HUC12 watershed basins 3) Visualization of results

Requirements

Linux OS or Windows OS
Conda-based package manager such as miniconda or mamba

Create conda environment

Download environment-examples.yml
To create the conda environment, follow the steps below.

conda env create -f environment-examples.yml
conda activate gdptools-examples
jupyter lab

1) Data Exploration and data preparation

Import Libraries

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import pandas as pd
from shapely import box
import cartopy
import hvplot.pandas
import hvplot.xarray
import numpy as np
import warnings
import rasterio as rio
import xarray as xr
import rioxarray as rxr
from pynhd import NLDI
from pynhd import WaterData
import pygeohydro as gh
from pygeohydro import helpers

import hvplot.xarray
import hvplot.pandas

from gdptools import WeightGen
from gdptools import AggGen
from gdptools import UserTiffData
from gdptools import UserCatData

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.colheader_justify', 'center')
pd.set_option('display.precision', 3)

warnings.filterwarnings("ignore")

Create GeoDataFrame of Delaware River Basin HUC12s

# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("wbd12").bygeom(del_basins.geometry[0])
huc12_basins = huc12_basins[huc12_basins.huc12.str.match('020401')]
shp_crs = huc12_basins.crs.to_wkt()
# huc12_basins = huc12_basins.head(n=200)
len(huc12_basins)

huc12_basins.head()

	geometry	huc12	tohuc	areaacres	areasqkm	name	hutype	humod	states
0	MULTIPOLYGON (((-74.57376 40.20723, -74.57695 …	020401050802	020401050803	8470.653	34.280	Miry Run	S	TF	NJ
14	MULTIPOLYGON (((-74.81968 40.35140, -74.81962 …	020401050910	020401050911	23997.977	97.116	Jacobs Creek-Delaware River	S	TF	NJ,PA
24	MULTIPOLYGON (((-75.53251 40.49621, -75.53704 …	020401060702	020401060703	28624.195	115.838	Liebert Creek-Little Lehigh Creek	S	KA, NM	PA
25	MULTIPOLYGON (((-75.26173 40.61666, -75.25607 …	020401060811	020401060813	37057.205	149.965	Saucon Creek	S	KA, NM	PA
26	MULTIPOLYGON (((-75.00760 40.73668, -75.00801 …	020401050402	020401050605	22267.354	90.113	Lower Pohatcong Creek	S	NM	NJ

Create a plot of the HUC12 basins.

drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
    title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal',
    tiles='EsriTerrain',
)

drb

Pull NLCD raster data using ClimateR Catalog

The following dataset assets are available in the ClimateR catalog from the National Land Cover Database (NLCD) for the Contiguous United States (L48): - Land Cover - Tree Canopy - Impervious Surface

L48 Land Cover is available for 2001, 2004, 2006, 2008, 2011, 2016, and 2019 in the ClimateR catalog, with more limited temporal coverage for other variables at the time of writing in 2023, and more limited variables and years for Alaska (AK), Hawaii (HI), and Puerto Rico (PR).

Pandas is used to query the catalog based on the asset of interest, and the json metadata is returned for the asset.

#%% 
climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()

	id	asset	URL	type	varname	variable	description	units	model	ensemble	scenario	T_name	duration	interval	nT	X_name	Y_name	X1	Xn	Y1	Yn	resX	resY	ncols	nrows	crs	toptobottom	tiled
0	maca_day	agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic…	http://thredds.northwestknowledge.net:8080/thr…	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	BNU-ESM	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474…	True	T
1	maca_day	agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori…	http://thredds.northwestknowledge.net:8080/thr…	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	CNRM-CM5	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474…	True	T
2	maca_day	agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi…	http://thredds.northwestknowledge.net:8080/thr…	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	CSIRO-Mk3-6-0	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474…	True	T
3	maca_day	agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo…	http://thredds.northwestknowledge.net:8080/thr…	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	bcc-csm1-1	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474…	True	T
4	maca_day	agg_macav2metdata_huss_CanESM2_r1i1p1_historic…	http://thredds.northwestknowledge.net:8080/thr…	opendap	specific_humidity	huss	Daily Mean Near-Surface Specific Humidity	kg kg-1	CanESM2	r1i1p1	historical	time	1950-01-01/2005-12-31	1 days	20454.0	lon	lat	-124.772	-67.065	25.063	49.396	0.042	0.042	1386.0	585.0	+proj=longlat +a=6378137 +f=0.0033528106647474…	True	T

Query catalog for NLCD 2019 Land Cover L48

_id = "NLCD"
_asset = "2019 Land Cover L48"

# an example query returns a pandas dataframe.
pd.set_option('display.max_colwidth', 100)
tc = cat.query("id == @_id & asset == @_asset")
tc

	id	asset	URL	type	varname	variable	description	units	model	ensemble	scenario	T_name	duration	interval	nT	X_name	Y_name	X1	Xn	Y1	Yn	resX	resY	ncols	nrows	crs	toptobottom	tiled
6257	NLCD	2019 Land Cover L48	/vsicurl/https://storage.googleapis.com/feddata-r/nlcd/2019_Land_Cover_L48.tif	vrt	Land_Cover	Land_Cover	USGS NLCD Land Cover 2019 L48		None	None	None	None	None	None	NaN	None	None	-2.493e+06	2.343e+06	-2.493e+06	3.310e+06	30.0	30.0	161190.0	104424.0	+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs	None

Load queried NLCD catalog asset into Rioxarray

ds = rxr.open_rasterio(tc.URL.values[0])
data_crs = ds.spatial_ref.crs_wkt
ds

Set up input data

For climater-catalog entries that are of type vrt or for any rasters, gdptools provides the UserTiffData class. The UserTiffData class requires the user to define parameters for both the source and target data. Here the source data parameters are: - var: Not currently used so a dummy variable can be passed - ds: the xarray dataset - proj_ds: projection for the xarray dataset which can be retrieved from the crs_wkt attribute of the spatial_ref var, or it can be retrieved from the crs attribute of the climater-catalog query - x_coord, y_coord: Coordinate names determined from the xarray dataset - band, bname: Both determined from the xarray dataset

The target data parameters are: - f_feature: The Geopandas dataframe - id_feature: The string of the id used to associate the results of the zonal stats - proj_feature: The projection or crs of the source data.

x_coord = 'x'
y_coord = 'y'
band='band'
var = 'descriptor_2019'
shp_poly_idx = 'huc12'

user_data = UserTiffData(
    var=var,
    ds=ds,
    proj_ds=data_crs,
    x_coord=x_coord,
    y_coord=y_coord,
    band=1,
    bname=band,
    f_feature=huc12_basins,
    proj_feature=shp_crs,
    id_feature=shp_poly_idx,
)

user_data

<gdptools.data.user_data.UserTiffData at 0x7f5b10f373a0>

2) Calculating zonal statistics for gridded land cover on HUC12 watershed basins

Calculate zonal majority land cover classes

ZonalGen() class is the object to calculate the zonal mean using the calculate_zonal() method.

#| tags: []
from gdptools import ZonalGen

zonal_gen = ZonalGen(
    user_data=user_data,
    zonal_engine="serial",
    zonal_writer="csv",
    out_path=".",
    file_prefix="Delaware_NLCD_stats"
)

stats = zonal_gen.calculate_zonal(categorical=True)

data prepped for zonal in 0.0339 seconds
  converted tiff to points in 17.4888 seconds
       - fixing 0 invalid polygons.
  overlaps calculated in 16.5268 seconds
  categorical zonal stats calculated in 1.2142 seconds
  []
  number of missing values: 0
  fill missing values with nearest neighbors in 0.0048 seconds
  Total time for serial zonal stats calculation 40.7026 seconds

print(stats.head(n=20))

               count  unique  top   freq 
huc12                                    
020401010101   93767    15    41    39921
020401010102   72336    15    41    41343
020401010103   73422    15    41    44120
020401010104   78813    15    41    48206
020401010105  150143    15    41   101734
020401010106   98550    15    41    51262
020401010201   86184    15    41    52400
020401010202   86753    15    41    54007
020401010203   72031    15    41    45984
020401010204   64022    15    41    37792
020401010205  140925    15    41    85334
020401010206  142433    15    41    80717
020401010207  151789    14    41    92045
020401010301   88614    15    41    51577
020401010302  100324    15    41    55824
020401010303   93982    15    41    48143
020401010304   63607    15    41    27262
020401010305  105802    15    41    66762
020401010306   52155    14    41    32936
020401010307  104980    14    41    64376

Add majority land cover category to geodataframe

huc12_basins.sort_values(by=shp_poly_idx, inplace=True)
stats.sort_values(by=shp_poly_idx, inplace=True)
huc12_basins["TEXT"] = stats["top"].values
huc12_basins.head()

	geometry	huc12	tohuc	areaacres	areasqkm	name	hutype	humod	states	TEXT
417	MULTIPOLYGON (((-74.59898 42.46011, -74.59742 42.45643, -74.59778 42.45526, -74.59679 42.45407, …	020401010101	020401010102	20836.201	84.321	Town Brook-Headwaters West Brach Delaware River	S	NM	NY	41
416	MULTIPOLYGON (((-74.69614 42.42722, -74.68906 42.42808, -74.68111 42.42365, -74.67442 42.42411, …	020401010102	020401010103	16095.450	65.136	Betty Brook-Headwaters West Brach Delaware River	S	NM	NY	41
338	MULTIPOLYGON (((-74.77568 42.37921, -74.77194 42.37934, -74.77123 42.37737, -74.76888 42.37515, …	020401010103	020401010104	16327.168	66.074	Rose Brook-Headwaters West Brach Delaware River	S	NM	NY	41
355	MULTIPOLYGON (((-74.80026 42.40748, -74.79412 42.40771, -74.79004 42.40642, -74.78631 42.40798, …	020401010104	020401010106	17537.197	70.971	Elk Creek-Headwaters West Brach Delaware River	S	NM	NY	41
406	MULTIPOLYGON (((-74.73673 42.31516, -74.72832 42.31467, -74.72311 42.31544, -74.71862 42.31483, …	020401010105	020401010106	33390.297	135.126	Upper Little Delaware River	S	NM	NY	41

3) Visualization of results

Import NLCD land cover lookup information

nlcd_lookup = helpers.nlcd_helper()['classes']
del nlcd_lookup['127']
del nlcd_lookup['51']
del nlcd_lookup['72']
del nlcd_lookup['73']
del nlcd_lookup['74']
print(len(nlcd_lookup))
nlcd_lookup

{'11': 'Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil',
 '12': 'Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally greater than 25% of total cover.',
 '21': 'Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly vegetation in the form of lawn grasses.  Impervious surfaces account for less than 20 percent of total cover.  These areas most commonly include large-lot single-family housing units, parks, golf courses, and vegetation planted in developed settings for recreation, erosion control, or aesthetic purposes.',
 '22': 'Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation.  Impervious surfaces account for 20-49 percent of total cover.  These areas most commonly include single-family housing units.',
 '23': 'Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 50-79 percent of the total cover.  These areas most commonly include single-family housing units.',
 '24': 'Developed, High Intensity - Includes highly developed areas where people reside or work in high numbers. Examples include apartment complexes, row houses and commercial/industrial.  Impervious surfaces account for 80 to 100 percent of the total cover.',
 '31': 'Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, sand dunes, strip mines, gravel pits and other accumulations of earthen material. Generally, vegetation accounts for less than 15% of total cover.',
 '41': 'Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species shed foliage simultaneously in response to seasonal change.',
 '42': 'Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species maintain their leaves all year. Canopy is never without green foliage.',
 '43': 'Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. Neither deciduous nor evergreen species are greater than 75 percent of total tree cover.',
 '45': 'Shrub-Forest - Areas identified as currently shrub, but showing spectral properties of transitioning to future forest.',
 '46': 'Herbaceous-Forest - Areas identified as currently grass, but showing spectral properties of transitioning from being either a past forest or to future shrub-forest.',
 '52': 'Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically greater than 20% of total vegetation. This class includes true shrubs and perennially stunted trees from environmental conditions.',
 '71': 'Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater than 80% of total vegetation.  These areas are not subject to intensive management such as tilling, but can be utilized for grazing.',
 '81': 'Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops, typically on a perennial cycle. Pasture/hay vegetation accounts for greater than 20 percent of total vegetation.',
 '82': 'Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegetables, tobacco, and cotton, and also perennial woody crops such as orchards and vineyards. Crop vegetation accounts for greater than 20 percent of total vegetation. This class also includes all land being actively tilled.',
 '90': 'Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.',
 '95': 'Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater than 80 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.'}

Set colors for land cover classes

hex codes follow color scheme from us-land-cover.

len(nlcd_lookup)
nlcd_colors = ['#5475a8', '#ffffff', '#e8d1d1', '#e29e8c', '#ff0000', '#b50000', '#d2cdc0', '#85c77e', '#38814e', '#d4e7b0', '#dcca8f', '#d4e7b0', '#dcca8f', '#e2e2c1', '#fbf65d', '#ca9146', '#c8e6f8', '#64b3d5']
len(nlcd_colors)

Create dataframe from NLCD lookup dictionary

nlcd_lookup_df = pd.DataFrame.from_dict({'code': [int(i) for i in list(nlcd_lookup.keys())], 'category': list(nlcd_lookup.values()), 'colors': nlcd_colors})
nlcd_lookup_df['land cover'] = nlcd_lookup_df['category'].apply(lambda name: name.split(' -')[0])
nlcd_lookup_df

	code	category	colors	land cover
0	11	Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil	#5475a8	Open Water
1	12	Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally …	#ffffff	Perennial Ice/Snow
2	21	Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly …	#e8d1d1	Developed, Open Space
3	22	Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation….	#e29e8c	Developed, Low Intensity
4	23	Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetat…	#ff0000	Developed, Medium Intensity
5	24	Developed, High Intensity - Includes highly developed areas where people reside or work in high …	#b50000	Developed, High Intensity
6	31	Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, …	#d2cdc0	Barren Land (Rock/Sand/Clay)
7	41	Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#85c77e	Deciduous Forest
8	42	Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#38814e	Evergreen Forest
9	43	Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 2…	#d4e7b0	Mixed Forest
10	45	Shrub-Forest - Areas identified as currently shrub, but showing spectral properties of transitio…	#dcca8f	Shrub-Forest
11	46	Herbaceous-Forest - Areas identified as currently grass, but showing spectral properties of tran…	#d4e7b0	Herbaceous-Forest
12	52	Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically gre…	#dcca8f	Shrub/Scrub
13	71	Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater…	#e2e2c1	Grassland/Herbaceous
14	81	Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing …	#fbf65d	Pasture/Hay
15	82	Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegeta…	#ca9146	Cultivated Crops
16	90	Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percen…	#c8e6f8	Woody Wetlands
17	95	Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater …	#64b3d5	Emergent Herbaceous Wetlands

Create categorical colormap from lookup dataframe

color_lookup = {}
for i in range(len(nlcd_lookup_df)):
    key = nlcd_lookup_df['land cover'][i]
    color_lookup[key]=nlcd_lookup_df['colors'][i]

print(color_lookup)

{'Open Water': '#5475a8', 'Perennial Ice/Snow': '#ffffff', 'Developed, Open Space': '#e8d1d1', 'Developed, Low Intensity': '#e29e8c', 'Developed, Medium Intensity': '#ff0000', 'Developed, High Intensity': '#b50000', 'Barren Land (Rock/Sand/Clay)': '#d2cdc0', 'Deciduous Forest': '#85c77e', 'Evergreen Forest': '#38814e', 'Mixed Forest': '#d4e7b0', 'Shrub-Forest': '#dcca8f', 'Herbaceous-Forest': '#d4e7b0', 'Shrub/Scrub': '#dcca8f', 'Grassland/Herbaceous': '#e2e2c1', 'Pasture/Hay': '#fbf65d', 'Cultivated Crops': '#ca9146', 'Woody Wetlands': '#c8e6f8', 'Emergent Herbaceous Wetlands': '#64b3d5'}

Add land cover lookup information to geodataframe

huc12_basins = huc12_basins.merge(nlcd_lookup_df, left_on='TEXT', right_on='code', how='inner', copy=True)
huc12_basins['TEXT'] = huc12_basins['TEXT'].apply(str)
huc12_basins.head()

	geometry	huc12	tohuc	areaacres	areasqkm	name	hutype	humod	states	TEXT	code	category	colors	land cover
0	MULTIPOLYGON (((-74.59898 42.46011, -74.59742 42.45643, -74.59778 42.45526, -74.59679 42.45407, …	020401010101	020401010102	20836.201	84.321	Town Brook-Headwaters West Brach Delaware River	S	NM	NY	41	41	Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#85c77e	Deciduous Forest
1	MULTIPOLYGON (((-74.69614 42.42722, -74.68906 42.42808, -74.68111 42.42365, -74.67442 42.42411, …	020401010102	020401010103	16095.450	65.136	Betty Brook-Headwaters West Brach Delaware River	S	NM	NY	41	41	Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#85c77e	Deciduous Forest
2	MULTIPOLYGON (((-74.77568 42.37921, -74.77194 42.37934, -74.77123 42.37737, -74.76888 42.37515, …	020401010103	020401010104	16327.168	66.074	Rose Brook-Headwaters West Brach Delaware River	S	NM	NY	41	41	Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#85c77e	Deciduous Forest
3	MULTIPOLYGON (((-74.80026 42.40748, -74.79412 42.40771, -74.79004 42.40642, -74.78631 42.40798, …	020401010104	020401010106	17537.197	70.971	Elk Creek-Headwaters West Brach Delaware River	S	NM	NY	41	41	Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#85c77e	Deciduous Forest
4	MULTIPOLYGON (((-74.73673 42.31516, -74.72832 42.31467, -74.72311 42.31544, -74.71862 42.31483, …	020401010105	020401010106	33390.297	135.126	Upper Little Delaware River	S	NM	NY	41	41	Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…	#85c77e	Deciduous Forest

Plot zonal majority land cover classes

huc12_basins.hvplot(
    geo=True, 
    c='land cover', 
    cmap=color_lookup,
    coastline='50m',
    alpha=0.8, 
    frame_width=300,
    xlabel="longitude", 
    ylabel="latitude", 
    title="Test HUC12 basin ", 
    aspect='equal',
    tiles='OSM',
)

Validation

Clip land cover raster to a single HUC12 (first in the huc12_basins geodataframe) and reproject data
Determine the faction of each class in the clipped raster
Plot the resulting clipped raster and HUC12

Clip Raster

gdf_bounds = huc12_basins.head(n=1).to_crs(data_crs).total_bounds
minx, miny, maxx, maxy = gdf_bounds

clipped = ds.rio.clip_box(minx, miny, maxx, maxy)

clipped_reprojected = clipped.rio.reproject("EPSG:4326")
clipped_reprojected

Use categorical describe() method to determine top value in clipped raster

filtered_array = clipped_reprojected.values[0,:,:]
filtered_array = filtered_array[filtered_array != 255]
pd.DataFrame(filtered_array.ravel(), dtype="category").describe()
# pd.DataFrame(clipped_reprojected.values[0,:,:].ravel(), dtype="category").describe()

	0
count	199593
unique	15
top	41
freq	89769

Determine the fraction of each category in the clipped raster

Above, we see that the most common category is 41, Deciduous Forest. Clipping the land cover raster to the bounds of the HUC12 watershed polygon leaves in grid cells that fall outside the polygon, but we can now start to understand why 41 is the top land cover class in the zonal statistics results for this HUC12 polygon. Below, each of the land cover classes are listed along with the corresponding proportion of cells in the subset made up of that land cover class. We see that nearly half of the cells, 45.0%, are Deciduous Forest, while the next most-abundant land cover class is 81, Pasture/Hay, at 26.9%.

# Getting unique values and their counts
unique_values, counts = np.unique(filtered_array, return_counts=True)

# Calculating frequencies
frequencies = counts / filtered_array.size

list(zip(unique_values, frequencies))

[(11, 0.0045242067607581425),
 (21, 0.04165977764751269),
 (22, 0.015416372317666452),
 (23, 0.005325838080493805),
 (24, 0.0013577630478022777),
 (31, 0.0003056219406492212),
 (41, 0.44976026213344156),
 (42, 0.030216490558286114),
 (43, 0.1523951240774977),
 (52, 0.00445907421602962),
 (71, 0.002725546487101251),
 (81, 0.26870681837539395),
 (82, 0.004053248360413441),
 (90, 0.016759104778223684),
 (95, 0.002334751218730116)]

Plot single HUC12 and clipped gridded land cover data

newds = clipped_reprojected.to_dataset(name="nlcd").isel(band=0)
newds

In the figure below, note the prevalence of the medium green corresponding to code 41, Deciduous Forest, and yellow, corresponding to 81, Pasture/Hay.

import matplotlib.pyplot as plt
cmap, norm, levels = gh.plot.cover_legends()
fig, ax = plt.subplots(figsize=(10,8))
newds['nlcd'].plot(ax=ax, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]})
huc12_basins.head(n=1).plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2)

<Axes: title={'center': 'band = 1, spatial_ref = 0'}, xlabel='longitude [degrees_east]', ylabel='latitude [degrees_north]'>

Plot of the raw data and determining the fraction of each landcover type — Figure 3

Summary

In this notebook we have shown how to access NLCD data through the ClimateR-Catalog. In addition we’ve shown how to generate zonal statistics using GDPTools. The results of the top or most prevalent value in each polygon was added to the GeoDataFrame and plotted. As a simple verification, we plotted the raw data and determined the fraction of each landcover type. The dominant type in the clipped raster associated with the first geometry in the GeoDataFrame conincides with the zonal stats value for “top” or most prevalent value.

Note: we used the ClimateR-Catalog to provide a link to NLCD data, The complete NLCD data are not available in the Catalog. One could also download the NLCD data of interest and run a similar analysis. An Example of calculating raster statistics using GDPTools can be found here