Skip to main content

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you’ve safely connected to the .gov website. Share sensitive information only on official, secure websites.

Beta

The current list of use cases is a snap shot of what this page will contain in the future. The team is working on improving the use case pages to provide meaningful examples of how active WMA projects are using data and tools. Tell us about your experience using the catalog!

Use Cases


gdptools-nlcd

Keywords: NLCD; gdptools; spatial-interpolation; zonal statistics

Domain: Land Cover; Hydrology

Language: Python

Description:

The 2019 Land Cover product from the National Land Cover Database is interpolated to the Delaware River Basin HUC12s using the Geo Data Processing Tools (gdptools) grid-to-polygon area-weighted statistics tools.


NHGF Tools and applications demo: Interpolating NLCD data to Delaware HUC12s

Introduction

In this example, the 2019 Land Cover product from the National Land Cover Database is interpolated to the Delaware River Basin HUC12s using the Geo Data Processing Tools (gdptools) grid-to-polygon area-weighted statistics tools.

Geo Data Processing Tools (gdptools)

gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.

Open Source Python Tools

  • xarray: Package to work with labelled multi-dimensional arrays.
  • geopandas: Package to work with geospatial data.
  • HyRiver: Hydroclimate Data Retriever, suite of libraries for retrieving geospatial data from various web services. pynhd library used to retrieve NHDplus water basin polygons and data from the Hydro Network-Linked Data Index (NLDI).
  • Holoviz: Python data visualization suite. Hvplot used to plot geospatial data.

Data

  • 2019 NLCD Land Cover: 2019 conterminous U.S. land cover at a 30-meter spatial resolution with a 16-class legend based on a modified Anderson Level II classification system.

Maintenance Schedule

  • The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.

Description

This notebook demonstrates usage of gdptools to interpolate NLCD 30m land cover data to HUC12s for the Delaware River Basin. There are three sections: 1) Data exploration and pre-processing 2) Calculating zonal statistics for gridded land cover on HUC12 watershed basins 3) Visualization of results

Requirements

  • Linux OS or Windows OS
  • Conda-based package manager such as miniconda or mamba

Create conda environment

conda env create -f environment-examples.yml
conda activate gdptools-examples
jupyter lab

1) Data Exploration and data preparation

Import Libraries

import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import pandas as pd
from shapely import box
import cartopy
import hvplot.pandas
import hvplot.xarray
import numpy as np
import warnings
import rasterio as rio
import xarray as xr
import rioxarray as rxr
from pynhd import NLDI
from pynhd import WaterData
import pygeohydro as gh
from pygeohydro import helpers

import hvplot.xarray
import hvplot.pandas

from gdptools import WeightGen
from gdptools import AggGen
from gdptools import UserTiffData
from gdptools import UserCatData

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.colheader_justify', 'center')
pd.set_option('display.precision', 3)

warnings.filterwarnings("ignore")

Create GeoDataFrame of Delaware River Basin HUC12s

# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
gage_id = "01482100"
nldi = NLDI()
del_basins = nldi.get_basins(gage_id)
huc12_basins = WaterData("wbd12").bygeom(del_basins.geometry[0])
huc12_basins = huc12_basins[huc12_basins.huc12.str.match('020401')]
shp_crs = huc12_basins.crs.to_wkt()
# huc12_basins = huc12_basins.head(n=200)
len(huc12_basins)
215
huc12_basins.head()
geometryhuc12tohucareaacresareasqkmnamehutypehumodstatesnoncontrib
0MULTIPOLYGON (((-74.57376 40.20723, -74.57695 …0204010508020204010508038470.65334.280Miry RunSTFNJ0
14MULTIPOLYGON (((-74.81968 40.35140, -74.81962 …02040105091002040105091123997.97797.116Jacobs Creek-Delaware RiverSTFNJ,PA0
24MULTIPOLYGON (((-75.53251 40.49621, -75.53704 …02040106070202040106070328624.195115.838Liebert Creek-Little Lehigh CreekSKA, NMPA0
25MULTIPOLYGON (((-75.26173 40.61666, -75.25607 …02040106081102040106081337057.205149.965Saucon CreekSKA, NMPA0
26MULTIPOLYGON (((-75.00760 40.73668, -75.00801 …02040105040202040105060522267.35490.113Lower Pohatcong CreekSNMNJ0

Create a plot of the HUC12 basins.

drb = huc12_basins.hvplot(
    geo=True, coastline='50m', alpha=0.2, c='r', frame_width=300,
    xlabel="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
    title="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal',
    tiles='EsriTerrain',
)

drb

Figure 1

Pull NLCD raster data using ClimateR Catalog

The following dataset assets are available in the ClimateR catalog from the National Land Cover Database (NLCD) for the Contiguous United States (L48): - Land Cover - Tree Canopy - Impervious Surface

L48 Land Cover is available for 2001, 2004, 2006, 2008, 2011, 2016, and 2019 in the ClimateR catalog, with more limited temporal coverage for other variables at the time of writing in 2023, and more limited variables and years for Alaska (AK), Hawaii (HI), and Puerto Rico (PR).

Pandas is used to query the catalog based on the asset of interest, and the json metadata is returned for the asset.

#%% 
climater_cat = "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
cat = pd.read_parquet(climater_cat)
cat.head()
idassetURLtypevarnamevariabledescriptionunitsmodelensemblescenarioT_namedurationintervalnTX_nameY_nameX1XnY1YnresXresYncolsnrowscrstoptobottomtiled
0maca_dayagg_macav2metdata_huss_BNU-ESM_r1i1p1_historic…http://thredds.northwestknowledge.net:8080/thr…opendapspecific_humidityhussDaily Mean Near-Surface Specific Humiditykg kg-1BNU-ESMr1i1p1historicaltime1950-01-01/2005-12-311 days20454.0lonlat-124.772-67.06525.06349.3960.0420.0421386.0585.0+proj=longlat +a=6378137 +f=0.0033528106647474…TrueT
1maca_dayagg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori…http://thredds.northwestknowledge.net:8080/thr…opendapspecific_humidityhussDaily Mean Near-Surface Specific Humiditykg kg-1CNRM-CM5r1i1p1historicaltime1950-01-01/2005-12-311 days20454.0lonlat-124.772-67.06525.06349.3960.0420.0421386.0585.0+proj=longlat +a=6378137 +f=0.0033528106647474…TrueT
2maca_dayagg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi…http://thredds.northwestknowledge.net:8080/thr…opendapspecific_humidityhussDaily Mean Near-Surface Specific Humiditykg kg-1CSIRO-Mk3-6-0r1i1p1historicaltime1950-01-01/2005-12-311 days20454.0lonlat-124.772-67.06525.06349.3960.0420.0421386.0585.0+proj=longlat +a=6378137 +f=0.0033528106647474…TrueT
3maca_dayagg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo…http://thredds.northwestknowledge.net:8080/thr…opendapspecific_humidityhussDaily Mean Near-Surface Specific Humiditykg kg-1bcc-csm1-1r1i1p1historicaltime1950-01-01/2005-12-311 days20454.0lonlat-124.772-67.06525.06349.3960.0420.0421386.0585.0+proj=longlat +a=6378137 +f=0.0033528106647474…TrueT
4maca_dayagg_macav2metdata_huss_CanESM2_r1i1p1_historic…http://thredds.northwestknowledge.net:8080/thr…opendapspecific_humidityhussDaily Mean Near-Surface Specific Humiditykg kg-1CanESM2r1i1p1historicaltime1950-01-01/2005-12-311 days20454.0lonlat-124.772-67.06525.06349.3960.0420.0421386.0585.0+proj=longlat +a=6378137 +f=0.0033528106647474…TrueT

Query catalog for NLCD 2019 Land Cover L48

_id = "NLCD"
_asset = "2019 Land Cover L48"

# an example query returns a pandas dataframe.
pd.set_option('display.max_colwidth', 100)
tc = cat.query("id == @_id & asset == @_asset")
tc
idassetURLtypevarnamevariabledescriptionunitsmodelensemblescenarioT_namedurationintervalnTX_nameY_nameX1XnY1YnresXresYncolsnrowscrstoptobottomtiled
6257NLCD2019 Land Cover L48/vsicurl/https://storage.googleapis.com/feddata-r/nlcd/2019_Land_Cover_L48.tifvrtLand_CoverLand_CoverUSGS NLCD Land Cover 2019 L48NoneNoneNoneNoneNoneNoneNaNNoneNone-2.493e+062.343e+06-2.493e+063.310e+0630.030.0161190.0104424.0+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defsNone

Load queried NLCD catalog asset into Rioxarray

ds = rxr.open_rasterio(tc.URL.values[0])
data_crs = ds.spatial_ref.crs_wkt
ds
<xarray.DataArray (band: 1, y: 104424, x: 161190)>
[16832104560 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -2.493e+06 -2.493e+06 ... 2.343e+06 2.343e+06
  * y            (y) float64 3.31e+06 3.31e+06 3.31e+06 ... 1.773e+05 1.773e+05
    spatial_ref  int64 0
Attributes: (12/18)
    AREA_OR_POINT:              Area
    LAYER_TYPE:                 thematic
    OVERVIEWS_ALGORITHM:        IMAGINE Nearest Neighbor Resampling
    STATISTICS_HISTOBINVALUES:  7853863229|0|0|0|0|0|0|0|0|0|0|472399232|9624...
    STATISTICS_HISTOMAX:        255
    STATISTICS_HISTOMIN:        0
    ...                         ...
    STATISTICS_SKIPFACTORX:     1
    STATISTICS_SKIPFACTORY:     1
    STATISTICS_STDDEV:          32.689899151583
    scale_factor:               1.0
    add_offset:                 0.0
    long_name:                  Layer_1

Set up input data

For climater-catalog entries that are of type vrt or for any rasters, gdptools provides the UserTiffData class. The UserTiffData class requires the user to define parameters for both the source and target data. Here the source data parameters are: - var: Not currently used so a dummy variable can be passed - ds: the xarray dataset - proj_ds: projection for the xarray dataset which can be retrieved from the crs_wkt attribute of the spatial_ref var, or it can be retrieved from the crs attribute of the climater-catalog query - x_coord, y_coord: Coordinate names determined from the xarray dataset - band, bname: Both determined from the xarray dataset

The target data parameters are: - f_feature: The Geopandas dataframe - id_feature: The string of the id used to associate the results of the zonal stats - proj_feature: The projection or crs of the source data.

x_coord = 'x'
y_coord = 'y'
band='band'
var = 'descriptor_2019'
shp_poly_idx = 'huc12'

user_data = UserTiffData(
    var=var,
    ds=ds,
    proj_ds=data_crs,
    x_coord=x_coord,
    y_coord=y_coord,
    band=1,
    bname=band,
    f_feature=huc12_basins,
    proj_feature=shp_crs,
    id_feature=shp_poly_idx,
)

user_data
<gdptools.data.user_data.UserTiffData at 0x7f5b10f373a0>

2) Calculating zonal statistics for gridded land cover on HUC12 watershed basins

Calculate zonal majority land cover classes

ZonalGen() class is the object to calculate the zonal mean using the calculate_zonal() method.

#| tags: []
from gdptools import ZonalGen

zonal_gen = ZonalGen(
    user_data=user_data,
    zonal_engine="serial",
    zonal_writer="csv",
    out_path=".",
    file_prefix="Delaware_NLCD_stats"
)

stats = zonal_gen.calculate_zonal(categorical=True)
data prepped for zonal in 0.0339 seconds
  converted tiff to points in 17.4888 seconds
       - fixing 0 invalid polygons.
  overlaps calculated in 16.5268 seconds
  categorical zonal stats calculated in 1.2142 seconds
  []
  number of missing values: 0
  fill missing values with nearest neighbors in 0.0048 seconds
  Total time for serial zonal stats calculation 40.7026 seconds
print(stats.head(n=20))
               count  unique  top   freq 
huc12                                    
020401010101   93767    15    41    39921
020401010102   72336    15    41    41343
020401010103   73422    15    41    44120
020401010104   78813    15    41    48206
020401010105  150143    15    41   101734
020401010106   98550    15    41    51262
020401010201   86184    15    41    52400
020401010202   86753    15    41    54007
020401010203   72031    15    41    45984
020401010204   64022    15    41    37792
020401010205  140925    15    41    85334
020401010206  142433    15    41    80717
020401010207  151789    14    41    92045
020401010301   88614    15    41    51577
020401010302  100324    15    41    55824
020401010303   93982    15    41    48143
020401010304   63607    15    41    27262
020401010305  105802    15    41    66762
020401010306   52155    14    41    32936
020401010307  104980    14    41    64376

Add majority land cover category to geodataframe

huc12_basins.sort_values(by=shp_poly_idx, inplace=True)
stats.sort_values(by=shp_poly_idx, inplace=True)
huc12_basins["TEXT"] = stats["top"].values
huc12_basins.head()
geometryhuc12tohucareaacresareasqkmnamehutypehumodstatesnoncontribTEXT
417MULTIPOLYGON (((-74.59898 42.46011, -74.59742 42.45643, -74.59778 42.45526, -74.59679 42.45407, …02040101010102040101010220836.20184.321Town Brook-Headwaters West Brach Delaware RiverSNMNY041
416MULTIPOLYGON (((-74.69614 42.42722, -74.68906 42.42808, -74.68111 42.42365, -74.67442 42.42411, …02040101010202040101010316095.45065.136Betty Brook-Headwaters West Brach Delaware RiverSNMNY041
338MULTIPOLYGON (((-74.77568 42.37921, -74.77194 42.37934, -74.77123 42.37737, -74.76888 42.37515, …02040101010302040101010416327.16866.074Rose Brook-Headwaters West Brach Delaware RiverSNMNY041
355MULTIPOLYGON (((-74.80026 42.40748, -74.79412 42.40771, -74.79004 42.40642, -74.78631 42.40798, …02040101010402040101010617537.19770.971Elk Creek-Headwaters West Brach Delaware RiverSNMNY041
406MULTIPOLYGON (((-74.73673 42.31516, -74.72832 42.31467, -74.72311 42.31544, -74.71862 42.31483, …02040101010502040101010633390.297135.126Upper Little Delaware RiverSNMNY041

3) Visualization of results

Import NLCD land cover lookup information

nlcd_lookup = helpers.nlcd_helper()['classes']
del nlcd_lookup['127']
del nlcd_lookup['51']
del nlcd_lookup['72']
del nlcd_lookup['73']
del nlcd_lookup['74']
print(len(nlcd_lookup))
nlcd_lookup
18
{'11': 'Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil',
 '12': 'Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally greater than 25% of total cover.',
 '21': 'Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly vegetation in the form of lawn grasses.  Impervious surfaces account for less than 20 percent of total cover.  These areas most commonly include large-lot single-family housing units, parks, golf courses, and vegetation planted in developed settings for recreation, erosion control, or aesthetic purposes.',
 '22': 'Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation.  Impervious surfaces account for 20-49 percent of total cover.  These areas most commonly include single-family housing units.',
 '23': 'Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 50-79 percent of the total cover.  These areas most commonly include single-family housing units.',
 '24': 'Developed, High Intensity - Includes highly developed areas where people reside or work in high numbers. Examples include apartment complexes, row houses and commercial/industrial.  Impervious surfaces account for 80 to 100 percent of the total cover.',
 '31': 'Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, sand dunes, strip mines, gravel pits and other accumulations of earthen material. Generally, vegetation accounts for less than 15% of total cover.',
 '41': 'Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species shed foliage simultaneously in response to seasonal change.',
 '42': 'Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species maintain their leaves all year. Canopy is never without green foliage.',
 '43': 'Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. Neither deciduous nor evergreen species are greater than 75 percent of total tree cover.',
 '45': 'Shrub-Forest - Areas identified as currently shrub, but showing spectral properties of transitioning to future forest.',
 '46': 'Herbaceous-Forest - Areas identified as currently grass, but showing spectral properties of transitioning from being either a past forest or to future shrub-forest.',
 '52': 'Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically greater than 20% of total vegetation. This class includes true shrubs and perennially stunted trees from environmental conditions.',
 '71': 'Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater than 80% of total vegetation.  These areas are not subject to intensive management such as tilling, but can be utilized for grazing.',
 '81': 'Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops, typically on a perennial cycle. Pasture/hay vegetation accounts for greater than 20 percent of total vegetation.',
 '82': 'Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegetables, tobacco, and cotton, and also perennial woody crops such as orchards and vineyards. Crop vegetation accounts for greater than 20 percent of total vegetation. This class also includes all land being actively tilled.',
 '90': 'Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.',
 '95': 'Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater than 80 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.'}

Set colors for land cover classes

hex codes follow color scheme from us-land-cover.

len(nlcd_lookup)
nlcd_colors = ['#5475a8', '#ffffff', '#e8d1d1', '#e29e8c', '#ff0000', '#b50000', '#d2cdc0', '#85c77e', '#38814e', '#d4e7b0', '#dcca8f', '#d4e7b0', '#dcca8f', '#e2e2c1', '#fbf65d', '#ca9146', '#c8e6f8', '#64b3d5']
len(nlcd_colors)
18

Create dataframe from NLCD lookup dictionary

nlcd_lookup_df = pd.DataFrame.from_dict({'code': [int(i) for i in list(nlcd_lookup.keys())], 'category': list(nlcd_lookup.values()), 'colors': nlcd_colors})
nlcd_lookup_df['land cover'] = nlcd_lookup_df['category'].apply(lambda name: name.split(' -')[0])
nlcd_lookup_df
codecategorycolorsland cover
011Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil#5475a8Open Water
112Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally …#ffffffPerennial Ice/Snow
221Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly …#e8d1d1Developed, Open Space
322Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation….#e29e8cDeveloped, Low Intensity
423Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetat…#ff0000Developed, Medium Intensity
524Developed, High Intensity - Includes highly developed areas where people reside or work in high …#b50000Developed, High Intensity
631Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, …#d2cdc0Barren Land (Rock/Sand/Clay)
741Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#85c77eDeciduous Forest
842Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#38814eEvergreen Forest
943Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 2…#d4e7b0Mixed Forest
1045Shrub-Forest - Areas identified as currently shrub, but showing spectral properties of transitio…#dcca8fShrub-Forest
1146Herbaceous-Forest - Areas identified as currently grass, but showing spectral properties of tran…#d4e7b0Herbaceous-Forest
1252Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically gre…#dcca8fShrub/Scrub
1371Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater…#e2e2c1Grassland/Herbaceous
1481Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing …#fbf65dPasture/Hay
1582Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegeta…#ca9146Cultivated Crops
1690Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percen…#c8e6f8Woody Wetlands
1795Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater …#64b3d5Emergent Herbaceous Wetlands

Create categorical colormap from lookup dataframe

color_lookup = {}
for i in range(len(nlcd_lookup_df)):
    key = nlcd_lookup_df['land cover'][i]
    color_lookup[key]=nlcd_lookup_df['colors'][i]

print(color_lookup)
{'Open Water': '#5475a8', 'Perennial Ice/Snow': '#ffffff', 'Developed, Open Space': '#e8d1d1', 'Developed, Low Intensity': '#e29e8c', 'Developed, Medium Intensity': '#ff0000', 'Developed, High Intensity': '#b50000', 'Barren Land (Rock/Sand/Clay)': '#d2cdc0', 'Deciduous Forest': '#85c77e', 'Evergreen Forest': '#38814e', 'Mixed Forest': '#d4e7b0', 'Shrub-Forest': '#dcca8f', 'Herbaceous-Forest': '#d4e7b0', 'Shrub/Scrub': '#dcca8f', 'Grassland/Herbaceous': '#e2e2c1', 'Pasture/Hay': '#fbf65d', 'Cultivated Crops': '#ca9146', 'Woody Wetlands': '#c8e6f8', 'Emergent Herbaceous Wetlands': '#64b3d5'}

Add land cover lookup information to geodataframe

huc12_basins = huc12_basins.merge(nlcd_lookup_df, left_on='TEXT', right_on='code', how='inner', copy=True)
huc12_basins['TEXT'] = huc12_basins['TEXT'].apply(str)
huc12_basins.head()
geometryhuc12tohucareaacresareasqkmnamehutypehumodstatesnoncontribTEXTcodecategorycolorsland cover
0MULTIPOLYGON (((-74.59898 42.46011, -74.59742 42.45643, -74.59778 42.45526, -74.59679 42.45407, …02040101010102040101010220836.20184.321Town Brook-Headwaters West Brach Delaware RiverSNMNY04141Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#85c77eDeciduous Forest
1MULTIPOLYGON (((-74.69614 42.42722, -74.68906 42.42808, -74.68111 42.42365, -74.67442 42.42411, …02040101010202040101010316095.45065.136Betty Brook-Headwaters West Brach Delaware RiverSNMNY04141Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#85c77eDeciduous Forest
2MULTIPOLYGON (((-74.77568 42.37921, -74.77194 42.37934, -74.77123 42.37737, -74.76888 42.37515, …02040101010302040101010416327.16866.074Rose Brook-Headwaters West Brach Delaware RiverSNMNY04141Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#85c77eDeciduous Forest
3MULTIPOLYGON (((-74.80026 42.40748, -74.79412 42.40771, -74.79004 42.40642, -74.78631 42.40798, …02040101010402040101010617537.19770.971Elk Creek-Headwaters West Brach Delaware RiverSNMNY04141Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#85c77eDeciduous Forest
4MULTIPOLYGON (((-74.73673 42.31516, -74.72832 42.31467, -74.72311 42.31544, -74.71862 42.31483, …02040101010502040101010633390.297135.126Upper Little Delaware RiverSNMNY04141Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th…#85c77eDeciduous Forest

Plot zonal majority land cover classes

huc12_basins.hvplot(
    geo=True, 
    c='land cover', 
    cmap=color_lookup,
    coastline='50m',
    alpha=0.8, 
    frame_width=300,
    xlabel="longitude", 
    ylabel="latitude", 
    title="Test HUC12 basin ", 
    aspect='equal',
    tiles='OSM',
)

Figure 2

Validation

  • Clip land cover raster to a single HUC12 (first in the huc12_basins geodataframe) and reproject data
  • Determine the faction of each class in the clipped raster
  • Plot the resulting clipped raster and HUC12

Clip Raster

gdf_bounds = huc12_basins.head(n=1).to_crs(data_crs).total_bounds
minx, miny, maxx, maxy = gdf_bounds

clipped = ds.rio.clip_box(minx, miny, maxx, maxy)

clipped_reprojected = clipped.rio.reproject("EPSG:4326")
clipped_reprojected
<xarray.DataArray (band: 1, y: 506, x: 574)>
array([[[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]]], dtype=uint8)
Coordinates:
  * x            (x) float64 -74.7 -74.7 -74.7 -74.69 ... -74.53 -74.53 -74.53
  * y            (y) float64 42.47 42.47 42.47 42.47 ... 42.33 42.33 42.33 42.33
  * band         (band) int64 1
    spatial_ref  int64 0
Attributes: (12/19)
    AREA_OR_POINT:              Area
    LAYER_TYPE:                 thematic
    OVERVIEWS_ALGORITHM:        IMAGINE Nearest Neighbor Resampling
    STATISTICS_HISTOBINVALUES:  7853863229|0|0|0|0|0|0|0|0|0|0|472399232|9624...
    STATISTICS_HISTOMAX:        255
    STATISTICS_HISTOMIN:        0
    ...                         ...
    STATISTICS_SKIPFACTORY:     1
    STATISTICS_STDDEV:          32.689899151583
    scale_factor:               1.0
    add_offset:                 0.0
    long_name:                  Layer_1
    _FillValue:                 255

Use categorical describe() method to determine top value in clipped raster

filtered_array = clipped_reprojected.values[0,:,:]
filtered_array = filtered_array[filtered_array != 255]
pd.DataFrame(filtered_array.ravel(), dtype="category").describe()
# pd.DataFrame(clipped_reprojected.values[0,:,:].ravel(), dtype="category").describe()
0
count199593
unique15
top41
freq89769

Determine the fraction of each category in the clipped raster

Above, we see that the most common category is 41, Deciduous Forest. Clipping the land cover raster to the bounds of the HUC12 watershed polygon leaves in grid cells that fall outside the polygon, but we can now start to understand why 41 is the top land cover class in the zonal statistics results for this HUC12 polygon. Below, each of the land cover classes are listed along with the corresponding proportion of cells in the subset made up of that land cover class. We see that nearly half of the cells, 45.0%, are Deciduous Forest, while the next most-abundant land cover class is 81, Pasture/Hay, at 26.9%.

# Getting unique values and their counts
unique_values, counts = np.unique(filtered_array, return_counts=True)

# Calculating frequencies
frequencies = counts / filtered_array.size

list(zip(unique_values, frequencies))
[(11, 0.0045242067607581425),
 (21, 0.04165977764751269),
 (22, 0.015416372317666452),
 (23, 0.005325838080493805),
 (24, 0.0013577630478022777),
 (31, 0.0003056219406492212),
 (41, 0.44976026213344156),
 (42, 0.030216490558286114),
 (43, 0.1523951240774977),
 (52, 0.00445907421602962),
 (71, 0.002725546487101251),
 (81, 0.26870681837539395),
 (82, 0.004053248360413441),
 (90, 0.016759104778223684),
 (95, 0.002334751218730116)]

Plot single HUC12 and clipped gridded land cover data

newds = clipped_reprojected.to_dataset(name="nlcd").isel(band=0)
newds
<xarray.Dataset>
Dimensions:      (x: 574, y: 506)
Coordinates:
  * x            (x) float64 -74.7 -74.7 -74.7 -74.69 ... -74.53 -74.53 -74.53
  * y            (y) float64 42.47 42.47 42.47 42.47 ... 42.33 42.33 42.33 42.33
    band         int64 1
    spatial_ref  int64 0
Data variables:
    nlcd         (y, x) uint8 255 255 255 255 255 255 ... 255 255 255 255 255

In the figure below, note the prevalence of the medium green corresponding to code 41, Deciduous Forest, and yellow, corresponding to 81, Pasture/Hay.

import matplotlib.pyplot as plt
cmap, norm, levels = gh.plot.cover_legends()
fig, ax = plt.subplots(figsize=(10,8))
newds['nlcd'].plot(ax=ax, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]})
huc12_basins.head(n=1).plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2)
<Axes: title={'center': 'band = 1, spatial_ref = 0'}, xlabel='longitude [degrees_east]', ylabel='latitude [degrees_north]'>

Figure 3

Summary

In this notebook we have shown how to access NLCD data through the ClimateR-Catalog. In addition we’ve shown how to generate zonal statistics using GDPTools. The results of the top or most prevalent value in each polygon was added to the GeoDataFrame and plotted. As a simple verification, we plotted the raw data and determined the fraction of each landcover type. The dominant type in the clipped raster associated with the first geometry in the GeoDataFrame conincides with the zonal stats value for “top” or most prevalent value.

Note: we used the ClimateR-Catalog to provide a link to NLCD data, The complete NLCD data are not available in the Catalog. One could also download the NLCD data of interest and run a similar analysis. An Example of calculating raster statistics using GDPTools can be found here