import os
'USE_PYGEOS'] = '0'
os.environ[import geopandas as gpd
import pandas as pd
from shapely import box
import cartopy
import hvplot.pandas
import hvplot.xarray
import numpy as np
import warnings
import rasterio as rio
import xarray as xr
import rioxarray as rxr
from pynhd import NLDI
from pynhd import WaterData
import pygeohydro as gh
from pygeohydro import helpers
import hvplot.xarray
import hvplot.pandas
from gdptools import WeightGen
from gdptools import AggGen
from gdptools import UserTiffData
from gdptools import UserCatData
'display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.colheader_justify', 'center')
pd.set_option('display.precision', 3)
pd.set_option(
"ignore") warnings.filterwarnings(
Use Cases
gdptools-nlcd
Keywords: NLCD; gdptools; spatial-interpolation; zonal statistics
Domain: Land Cover; Hydrology
Language: Python
Description:
The 2019 Land Cover product from the National Land Cover Database is interpolated to the Delaware River Basin HUC12s using the Geo Data Processing Tools (gdptools) grid-to-polygon area-weighted statistics tools.
NHGF Tools and applications demo: Interpolating NLCD data to Delaware HUC12s
Introduction
In this example, the 2019 Land Cover product from the National Land Cover Database is interpolated to the Delaware River Basin HUC12s using the Geo Data Processing Tools (gdptools) grid-to-polygon area-weighted statistics tools.
Geo Data Processing Tools (gdptools)
gdptools is a python package for calculating area-weighted statistics of grid-to-polygon. grid-to-line, and polygon-to-polygon interpolations. gdptools utilizes metadata from the OPeNDAP catalog to aid generalizing gdptools processing capabilities. There are also methods for calculating area-weighted statistics for any gridded dataset that can be opened by xarray, however some of the metadata encapsulated in the json entries will have to be added by the user. Examples can be found in Non-Catalog Examples.
Open Source Python Tools
- xarray: Package to work with labelled multi-dimensional arrays.
- geopandas: Package to work with geospatial data.
- HyRiver: Hydroclimate Data Retriever, suite of libraries for retrieving geospatial data from various web services. pynhd library used to retrieve NHDplus water basin polygons and data from the Hydro Network-Linked Data Index (NLDI).
- Holoviz: Python data visualization suite. Hvplot used to plot geospatial data.
Data
- 2019 NLCD Land Cover: 2019 conterminous U.S. land cover at a 30-meter spatial resolution with a 16-class legend based on a modified Anderson Level II classification system.
Maintenance Schedule
- The intent of this notebook is simply a demonstration of an application or workflow. As we progress in the build-out of NGHF project tools, this notebook may be updated or replaced. There is no other planned maintenance at this time.
Description
This notebook demonstrates usage of gdptools to interpolate NLCD 30m land cover data to HUC12s for the Delaware River Basin. There are three sections: 1) Data exploration and pre-processing 2) Calculating zonal statistics for gridded land cover on HUC12 watershed basins 3) Visualization of results
Requirements
- Linux OS or Windows OS
- Conda-based package manager such as miniconda or mamba
Create conda environment
- Download environment-examples.yml
- To create the conda environment, follow the steps below.
conda env create -f environment-examples.yml
conda activate gdptools-examples
jupyter lab
1) Data Exploration and data preparation
Import Libraries
Create GeoDataFrame of Delaware River Basin HUC12s
# USGS gage 01482100 Delaware River at Del Mem Bridge at Wilmington De
= "01482100"
gage_id = NLDI()
nldi = nldi.get_basins(gage_id)
del_basins = WaterData("wbd12").bygeom(del_basins.geometry[0])
huc12_basins = huc12_basins[huc12_basins.huc12.str.match('020401')]
huc12_basins = huc12_basins.crs.to_wkt()
shp_crs # huc12_basins = huc12_basins.head(n=200)
len(huc12_basins)
215
huc12_basins.head()
geometry | huc12 | tohuc | areaacres | areasqkm | name | hutype | humod | states | noncontrib | |
---|---|---|---|---|---|---|---|---|---|---|
0 | MULTIPOLYGON (((-74.57376 40.20723, -74.57695 … | 020401050802 | 020401050803 | 8470.653 | 34.280 | Miry Run | S | TF | NJ | 0 |
14 | MULTIPOLYGON (((-74.81968 40.35140, -74.81962 … | 020401050910 | 020401050911 | 23997.977 | 97.116 | Jacobs Creek-Delaware River | S | TF | NJ,PA | 0 |
24 | MULTIPOLYGON (((-75.53251 40.49621, -75.53704 … | 020401060702 | 020401060703 | 28624.195 | 115.838 | Liebert Creek-Little Lehigh Creek | S | KA, NM | PA | 0 |
25 | MULTIPOLYGON (((-75.26173 40.61666, -75.25607 … | 020401060811 | 020401060813 | 37057.205 | 149.965 | Saucon Creek | S | KA, NM | PA | 0 |
26 | MULTIPOLYGON (((-75.00760 40.73668, -75.00801 … | 020401050402 | 020401050605 | 22267.354 | 90.113 | Lower Pohatcong Creek | S | NM | NJ | 0 |
Create a plot of the HUC12 basins.
= huc12_basins.hvplot(
drb =True, coastline='50m', alpha=0.2, c='r', frame_width=300,
geo="longitude", ylabel="latitude", clabel="monthly water evaporation (mm)",
xlabel="Delaware River HUC12 basins", xlim=(-76.8, -73.8), aspect='equal',
title='EsriTerrain',
tiles
)
drb
Pull NLCD raster data using ClimateR Catalog
The following dataset assets are available in the ClimateR catalog from the National Land Cover Database (NLCD) for the Contiguous United States (L48): - Land Cover - Tree Canopy - Impervious Surface
L48 Land Cover is available for 2001, 2004, 2006, 2008, 2011, 2016, and 2019 in the ClimateR catalog, with more limited temporal coverage for other variables at the time of writing in 2023, and more limited variables and years for Alaska (AK), Hawaii (HI), and Puerto Rico (PR).
Pandas is used to query the catalog based on the asset of interest, and the json metadata is returned for the asset.
#%%
= "https://mikejohnson51.github.io/climateR-catalogs/catalog.parquet"
climater_cat = pd.read_parquet(climater_cat)
cat cat.head()
id | asset | URL | type | varname | variable | description | units | model | ensemble | scenario | T_name | duration | interval | nT | X_name | Y_name | X1 | Xn | Y1 | Yn | resX | resY | ncols | nrows | crs | toptobottom | tiled | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | maca_day | agg_macav2metdata_huss_BNU-ESM_r1i1p1_historic… | http://thredds.northwestknowledge.net:8080/thr… | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | BNU-ESM | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474… | True | T |
1 | maca_day | agg_macav2metdata_huss_CNRM-CM5_r1i1p1_histori… | http://thredds.northwestknowledge.net:8080/thr… | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | CNRM-CM5 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474… | True | T |
2 | maca_day | agg_macav2metdata_huss_CSIRO-Mk3-6-0_r1i1p1_hi… | http://thredds.northwestknowledge.net:8080/thr… | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | CSIRO-Mk3-6-0 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474… | True | T |
3 | maca_day | agg_macav2metdata_huss_bcc-csm1-1_r1i1p1_histo… | http://thredds.northwestknowledge.net:8080/thr… | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | bcc-csm1-1 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474… | True | T |
4 | maca_day | agg_macav2metdata_huss_CanESM2_r1i1p1_historic… | http://thredds.northwestknowledge.net:8080/thr… | opendap | specific_humidity | huss | Daily Mean Near-Surface Specific Humidity | kg kg-1 | CanESM2 | r1i1p1 | historical | time | 1950-01-01/2005-12-31 | 1 days | 20454.0 | lon | lat | -124.772 | -67.065 | 25.063 | 49.396 | 0.042 | 0.042 | 1386.0 | 585.0 | +proj=longlat +a=6378137 +f=0.0033528106647474… | True | T |
Query catalog for NLCD 2019 Land Cover L48
= "NLCD"
_id = "2019 Land Cover L48"
_asset
# an example query returns a pandas dataframe.
'display.max_colwidth', 100)
pd.set_option(= cat.query("id == @_id & asset == @_asset")
tc tc
id | asset | URL | type | varname | variable | description | units | model | ensemble | scenario | T_name | duration | interval | nT | X_name | Y_name | X1 | Xn | Y1 | Yn | resX | resY | ncols | nrows | crs | toptobottom | tiled | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6257 | NLCD | 2019 Land Cover L48 | /vsicurl/https://storage.googleapis.com/feddata-r/nlcd/2019_Land_Cover_L48.tif | vrt | Land_Cover | Land_Cover | USGS NLCD Land Cover 2019 L48 | None | None | None | None | None | None | NaN | None | None | -2.493e+06 | 2.343e+06 | -2.493e+06 | 3.310e+06 | 30.0 | 30.0 | 161190.0 | 104424.0 | +proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs | None |
Load queried NLCD catalog asset into Rioxarray
= rxr.open_rasterio(tc.URL.values[0])
ds = ds.spatial_ref.crs_wkt
data_crs ds
<xarray.DataArray (band: 1, y: 104424, x: 161190)> [16832104560 values with dtype=uint8] Coordinates: * band (band) int64 1 * x (x) float64 -2.493e+06 -2.493e+06 ... 2.343e+06 2.343e+06 * y (y) float64 3.31e+06 3.31e+06 3.31e+06 ... 1.773e+05 1.773e+05 spatial_ref int64 0 Attributes: (12/18) AREA_OR_POINT: Area LAYER_TYPE: thematic OVERVIEWS_ALGORITHM: IMAGINE Nearest Neighbor Resampling STATISTICS_HISTOBINVALUES: 7853863229|0|0|0|0|0|0|0|0|0|0|472399232|9624... STATISTICS_HISTOMAX: 255 STATISTICS_HISTOMIN: 0 ... ... STATISTICS_SKIPFACTORX: 1 STATISTICS_SKIPFACTORY: 1 STATISTICS_STDDEV: 32.689899151583 scale_factor: 1.0 add_offset: 0.0 long_name: Layer_1
Set up input data
For climater-catalog entries that are of type vrt
or for any rasters, gdptools provides the UserTiffData
class. The UserTiffData
class requires the user to define parameters for both the source and target data. Here the source data parameters are: - var: Not currently used so a dummy variable can be passed - ds: the xarray dataset - proj_ds: projection for the xarray dataset which can be retrieved from the crs_wkt attribute of the spatial_ref var, or it can be retrieved from the crs attribute of the climater-catalog query - x_coord, y_coord: Coordinate names determined from the xarray dataset - band, bname: Both determined from the xarray dataset
The target data parameters are: - f_feature: The Geopandas dataframe - id_feature: The string of the id used to associate the results of the zonal stats - proj_feature: The projection or crs of the source data.
= 'x'
x_coord = 'y'
y_coord ='band'
band= 'descriptor_2019'
var = 'huc12'
shp_poly_idx
= UserTiffData(
user_data =var,
var=ds,
ds=data_crs,
proj_ds=x_coord,
x_coord=y_coord,
y_coord=1,
band=band,
bname=huc12_basins,
f_feature=shp_crs,
proj_feature=shp_poly_idx,
id_feature
)
user_data
<gdptools.data.user_data.UserTiffData at 0x7f5b10f373a0>
2) Calculating zonal statistics for gridded land cover on HUC12 watershed basins
Calculate zonal majority land cover classes
ZonalGen() class is the object to calculate the zonal mean using the calculate_zonal() method.
#| tags: []
from gdptools import ZonalGen
= ZonalGen(
zonal_gen =user_data,
user_data="serial",
zonal_engine="csv",
zonal_writer=".",
out_path="Delaware_NLCD_stats"
file_prefix
)
= zonal_gen.calculate_zonal(categorical=True) stats
data prepped for zonal in 0.0339 seconds
converted tiff to points in 17.4888 seconds
- fixing 0 invalid polygons.
overlaps calculated in 16.5268 seconds
categorical zonal stats calculated in 1.2142 seconds
[]
number of missing values: 0
fill missing values with nearest neighbors in 0.0048 seconds
Total time for serial zonal stats calculation 40.7026 seconds
print(stats.head(n=20))
count unique top freq
huc12
020401010101 93767 15 41 39921
020401010102 72336 15 41 41343
020401010103 73422 15 41 44120
020401010104 78813 15 41 48206
020401010105 150143 15 41 101734
020401010106 98550 15 41 51262
020401010201 86184 15 41 52400
020401010202 86753 15 41 54007
020401010203 72031 15 41 45984
020401010204 64022 15 41 37792
020401010205 140925 15 41 85334
020401010206 142433 15 41 80717
020401010207 151789 14 41 92045
020401010301 88614 15 41 51577
020401010302 100324 15 41 55824
020401010303 93982 15 41 48143
020401010304 63607 15 41 27262
020401010305 105802 15 41 66762
020401010306 52155 14 41 32936
020401010307 104980 14 41 64376
Add majority land cover category to geodataframe
=shp_poly_idx, inplace=True)
huc12_basins.sort_values(by=shp_poly_idx, inplace=True)
stats.sort_values(by"TEXT"] = stats["top"].values
huc12_basins[ huc12_basins.head()
geometry | huc12 | tohuc | areaacres | areasqkm | name | hutype | humod | states | noncontrib | TEXT | |
---|---|---|---|---|---|---|---|---|---|---|---|
417 | MULTIPOLYGON (((-74.59898 42.46011, -74.59742 42.45643, -74.59778 42.45526, -74.59679 42.45407, … | 020401010101 | 020401010102 | 20836.201 | 84.321 | Town Brook-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 |
416 | MULTIPOLYGON (((-74.69614 42.42722, -74.68906 42.42808, -74.68111 42.42365, -74.67442 42.42411, … | 020401010102 | 020401010103 | 16095.450 | 65.136 | Betty Brook-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 |
338 | MULTIPOLYGON (((-74.77568 42.37921, -74.77194 42.37934, -74.77123 42.37737, -74.76888 42.37515, … | 020401010103 | 020401010104 | 16327.168 | 66.074 | Rose Brook-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 |
355 | MULTIPOLYGON (((-74.80026 42.40748, -74.79412 42.40771, -74.79004 42.40642, -74.78631 42.40798, … | 020401010104 | 020401010106 | 17537.197 | 70.971 | Elk Creek-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 |
406 | MULTIPOLYGON (((-74.73673 42.31516, -74.72832 42.31467, -74.72311 42.31544, -74.71862 42.31483, … | 020401010105 | 020401010106 | 33390.297 | 135.126 | Upper Little Delaware River | S | NM | NY | 0 | 41 |
3) Visualization of results
Import NLCD land cover lookup information
= helpers.nlcd_helper()['classes']
nlcd_lookup del nlcd_lookup['127']
del nlcd_lookup['51']
del nlcd_lookup['72']
del nlcd_lookup['73']
del nlcd_lookup['74']
print(len(nlcd_lookup))
nlcd_lookup
18
{'11': 'Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil',
'12': 'Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally greater than 25% of total cover.',
'21': 'Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly vegetation in the form of lawn grasses. Impervious surfaces account for less than 20 percent of total cover. These areas most commonly include large-lot single-family housing units, parks, golf courses, and vegetation planted in developed settings for recreation, erosion control, or aesthetic purposes.',
'22': 'Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 20-49 percent of total cover. These areas most commonly include single-family housing units.',
'23': 'Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 50-79 percent of the total cover. These areas most commonly include single-family housing units.',
'24': 'Developed, High Intensity - Includes highly developed areas where people reside or work in high numbers. Examples include apartment complexes, row houses and commercial/industrial. Impervious surfaces account for 80 to 100 percent of the total cover.',
'31': 'Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, sand dunes, strip mines, gravel pits and other accumulations of earthen material. Generally, vegetation accounts for less than 15% of total cover.',
'41': 'Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species shed foliage simultaneously in response to seasonal change.',
'42': 'Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species maintain their leaves all year. Canopy is never without green foliage.',
'43': 'Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. Neither deciduous nor evergreen species are greater than 75 percent of total tree cover.',
'45': 'Shrub-Forest - Areas identified as currently shrub, but showing spectral properties of transitioning to future forest.',
'46': 'Herbaceous-Forest - Areas identified as currently grass, but showing spectral properties of transitioning from being either a past forest or to future shrub-forest.',
'52': 'Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically greater than 20% of total vegetation. This class includes true shrubs and perennially stunted trees from environmental conditions.',
'71': 'Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater than 80% of total vegetation. These areas are not subject to intensive management such as tilling, but can be utilized for grazing.',
'81': 'Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops, typically on a perennial cycle. Pasture/hay vegetation accounts for greater than 20 percent of total vegetation.',
'82': 'Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegetables, tobacco, and cotton, and also perennial woody crops such as orchards and vineyards. Crop vegetation accounts for greater than 20 percent of total vegetation. This class also includes all land being actively tilled.',
'90': 'Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.',
'95': 'Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater than 80 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.'}
Set colors for land cover classes
hex codes follow color scheme from us-land-cover.
len(nlcd_lookup)
= ['#5475a8', '#ffffff', '#e8d1d1', '#e29e8c', '#ff0000', '#b50000', '#d2cdc0', '#85c77e', '#38814e', '#d4e7b0', '#dcca8f', '#d4e7b0', '#dcca8f', '#e2e2c1', '#fbf65d', '#ca9146', '#c8e6f8', '#64b3d5']
nlcd_colors len(nlcd_colors)
18
Create dataframe from NLCD lookup dictionary
= pd.DataFrame.from_dict({'code': [int(i) for i in list(nlcd_lookup.keys())], 'category': list(nlcd_lookup.values()), 'colors': nlcd_colors})
nlcd_lookup_df 'land cover'] = nlcd_lookup_df['category'].apply(lambda name: name.split(' -')[0])
nlcd_lookup_df[ nlcd_lookup_df
code | category | colors | land cover | |
---|---|---|---|---|
0 | 11 | Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil | #5475a8 | Open Water |
1 | 12 | Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally … | #ffffff | Perennial Ice/Snow |
2 | 21 | Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly … | #e8d1d1 | Developed, Open Space |
3 | 22 | Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation…. | #e29e8c | Developed, Low Intensity |
4 | 23 | Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetat… | #ff0000 | Developed, Medium Intensity |
5 | 24 | Developed, High Intensity - Includes highly developed areas where people reside or work in high … | #b50000 | Developed, High Intensity |
6 | 31 | Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, … | #d2cdc0 | Barren Land (Rock/Sand/Clay) |
7 | 41 | Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #85c77e | Deciduous Forest |
8 | 42 | Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #38814e | Evergreen Forest |
9 | 43 | Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 2… | #d4e7b0 | Mixed Forest |
10 | 45 | Shrub-Forest - Areas identified as currently shrub, but showing spectral properties of transitio… | #dcca8f | Shrub-Forest |
11 | 46 | Herbaceous-Forest - Areas identified as currently grass, but showing spectral properties of tran… | #d4e7b0 | Herbaceous-Forest |
12 | 52 | Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically gre… | #dcca8f | Shrub/Scrub |
13 | 71 | Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater… | #e2e2c1 | Grassland/Herbaceous |
14 | 81 | Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing … | #fbf65d | Pasture/Hay |
15 | 82 | Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegeta… | #ca9146 | Cultivated Crops |
16 | 90 | Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percen… | #c8e6f8 | Woody Wetlands |
17 | 95 | Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater … | #64b3d5 | Emergent Herbaceous Wetlands |
Create categorical colormap from lookup dataframe
= {}
color_lookup for i in range(len(nlcd_lookup_df)):
= nlcd_lookup_df['land cover'][i]
key =nlcd_lookup_df['colors'][i]
color_lookup[key]
print(color_lookup)
{'Open Water': '#5475a8', 'Perennial Ice/Snow': '#ffffff', 'Developed, Open Space': '#e8d1d1', 'Developed, Low Intensity': '#e29e8c', 'Developed, Medium Intensity': '#ff0000', 'Developed, High Intensity': '#b50000', 'Barren Land (Rock/Sand/Clay)': '#d2cdc0', 'Deciduous Forest': '#85c77e', 'Evergreen Forest': '#38814e', 'Mixed Forest': '#d4e7b0', 'Shrub-Forest': '#dcca8f', 'Herbaceous-Forest': '#d4e7b0', 'Shrub/Scrub': '#dcca8f', 'Grassland/Herbaceous': '#e2e2c1', 'Pasture/Hay': '#fbf65d', 'Cultivated Crops': '#ca9146', 'Woody Wetlands': '#c8e6f8', 'Emergent Herbaceous Wetlands': '#64b3d5'}
Add land cover lookup information to geodataframe
= huc12_basins.merge(nlcd_lookup_df, left_on='TEXT', right_on='code', how='inner', copy=True)
huc12_basins 'TEXT'] = huc12_basins['TEXT'].apply(str)
huc12_basins[ huc12_basins.head()
geometry | huc12 | tohuc | areaacres | areasqkm | name | hutype | humod | states | noncontrib | TEXT | code | category | colors | land cover | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | MULTIPOLYGON (((-74.59898 42.46011, -74.59742 42.45643, -74.59778 42.45526, -74.59679 42.45407, … | 020401010101 | 020401010102 | 20836.201 | 84.321 | Town Brook-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 | 41 | Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #85c77e | Deciduous Forest |
1 | MULTIPOLYGON (((-74.69614 42.42722, -74.68906 42.42808, -74.68111 42.42365, -74.67442 42.42411, … | 020401010102 | 020401010103 | 16095.450 | 65.136 | Betty Brook-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 | 41 | Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #85c77e | Deciduous Forest |
2 | MULTIPOLYGON (((-74.77568 42.37921, -74.77194 42.37934, -74.77123 42.37737, -74.76888 42.37515, … | 020401010103 | 020401010104 | 16327.168 | 66.074 | Rose Brook-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 | 41 | Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #85c77e | Deciduous Forest |
3 | MULTIPOLYGON (((-74.80026 42.40748, -74.79412 42.40771, -74.79004 42.40642, -74.78631 42.40798, … | 020401010104 | 020401010106 | 17537.197 | 70.971 | Elk Creek-Headwaters West Brach Delaware River | S | NM | NY | 0 | 41 | 41 | Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #85c77e | Deciduous Forest |
4 | MULTIPOLYGON (((-74.73673 42.31516, -74.72832 42.31467, -74.72311 42.31544, -74.71862 42.31483, … | 020401010105 | 020401010106 | 33390.297 | 135.126 | Upper Little Delaware River | S | NM | NY | 0 | 41 | 41 | Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater th… | #85c77e | Deciduous Forest |
Plot zonal majority land cover classes
huc12_basins.hvplot(=True,
geo='land cover',
c=color_lookup,
cmap='50m',
coastline=0.8,
alpha=300,
frame_width="longitude",
xlabel="latitude",
ylabel="Test HUC12 basin ",
title='equal',
aspect='OSM',
tiles )
Validation
- Clip land cover raster to a single HUC12 (first in the huc12_basins geodataframe) and reproject data
- Determine the faction of each class in the clipped raster
- Plot the resulting clipped raster and HUC12
Clip Raster
= huc12_basins.head(n=1).to_crs(data_crs).total_bounds
gdf_bounds = gdf_bounds
minx, miny, maxx, maxy
= ds.rio.clip_box(minx, miny, maxx, maxy)
clipped
= clipped.rio.reproject("EPSG:4326")
clipped_reprojected clipped_reprojected
<xarray.DataArray (band: 1, y: 506, x: 574)> array([[[255, 255, 255, ..., 255, 255, 255], [255, 255, 255, ..., 255, 255, 255], [255, 255, 255, ..., 255, 255, 255], ..., [255, 255, 255, ..., 255, 255, 255], [255, 255, 255, ..., 255, 255, 255], [255, 255, 255, ..., 255, 255, 255]]], dtype=uint8) Coordinates: * x (x) float64 -74.7 -74.7 -74.7 -74.69 ... -74.53 -74.53 -74.53 * y (y) float64 42.47 42.47 42.47 42.47 ... 42.33 42.33 42.33 42.33 * band (band) int64 1 spatial_ref int64 0 Attributes: (12/19) AREA_OR_POINT: Area LAYER_TYPE: thematic OVERVIEWS_ALGORITHM: IMAGINE Nearest Neighbor Resampling STATISTICS_HISTOBINVALUES: 7853863229|0|0|0|0|0|0|0|0|0|0|472399232|9624... STATISTICS_HISTOMAX: 255 STATISTICS_HISTOMIN: 0 ... ... STATISTICS_SKIPFACTORY: 1 STATISTICS_STDDEV: 32.689899151583 scale_factor: 1.0 add_offset: 0.0 long_name: Layer_1 _FillValue: 255
Use categorical describe() method to determine top value in clipped raster
= clipped_reprojected.values[0,:,:]
filtered_array = filtered_array[filtered_array != 255]
filtered_array ="category").describe()
pd.DataFrame(filtered_array.ravel(), dtype# pd.DataFrame(clipped_reprojected.values[0,:,:].ravel(), dtype="category").describe()
0 | |
---|---|
count | 199593 |
unique | 15 |
top | 41 |
freq | 89769 |
Determine the fraction of each category in the clipped raster
Above, we see that the most common category is 41, Deciduous Forest. Clipping the land cover raster to the bounds of the HUC12 watershed polygon leaves in grid cells that fall outside the polygon, but we can now start to understand why 41 is the top land cover class in the zonal statistics results for this HUC12 polygon. Below, each of the land cover classes are listed along with the corresponding proportion of cells in the subset made up of that land cover class. We see that nearly half of the cells, 45.0%, are Deciduous Forest, while the next most-abundant land cover class is 81, Pasture/Hay, at 26.9%.
# Getting unique values and their counts
= np.unique(filtered_array, return_counts=True)
unique_values, counts
# Calculating frequencies
= counts / filtered_array.size
frequencies
list(zip(unique_values, frequencies))
[(11, 0.0045242067607581425),
(21, 0.04165977764751269),
(22, 0.015416372317666452),
(23, 0.005325838080493805),
(24, 0.0013577630478022777),
(31, 0.0003056219406492212),
(41, 0.44976026213344156),
(42, 0.030216490558286114),
(43, 0.1523951240774977),
(52, 0.00445907421602962),
(71, 0.002725546487101251),
(81, 0.26870681837539395),
(82, 0.004053248360413441),
(90, 0.016759104778223684),
(95, 0.002334751218730116)]
Plot single HUC12 and clipped gridded land cover data
= clipped_reprojected.to_dataset(name="nlcd").isel(band=0)
newds newds
<xarray.Dataset> Dimensions: (x: 574, y: 506) Coordinates: * x (x) float64 -74.7 -74.7 -74.7 -74.69 ... -74.53 -74.53 -74.53 * y (y) float64 42.47 42.47 42.47 42.47 ... 42.33 42.33 42.33 42.33 band int64 1 spatial_ref int64 0 Data variables: nlcd (y, x) uint8 255 255 255 255 255 255 ... 255 255 255 255 255
In the figure below, note the prevalence of the medium green corresponding to code 41, Deciduous Forest, and yellow, corresponding to 81, Pasture/Hay.
import matplotlib.pyplot as plt
= gh.plot.cover_legends()
cmap, norm, levels = plt.subplots(figsize=(10,8))
fig, ax 'nlcd'].plot(ax=ax, cmap=cmap, levels=levels, cbar_kwargs={"ticks": levels[:-1]})
newds[=1).plot(ax=ax, facecolor='none', edgecolor='black', linewidth=2) huc12_basins.head(n
<Axes: title={'center': 'band = 1, spatial_ref = 0'}, xlabel='longitude [degrees_east]', ylabel='latitude [degrees_north]'>
Summary
In this notebook we have shown how to access NLCD data through the ClimateR-Catalog. In addition we’ve shown how to generate zonal statistics using GDPTools. The results of the top or most prevalent value in each polygon was added to the GeoDataFrame and plotted. As a simple verification, we plotted the raw data and determined the fraction of each landcover type. The dominant type in the clipped raster associated with the first geometry in the GeoDataFrame conincides with the zonal stats value for “top” or most prevalent value.
Note: we used the ClimateR-Catalog to provide a link to NLCD data, The complete NLCD data are not available in the Catalog. One could also download the NLCD data of interest and run a similar analysis. An Example of calculating raster statistics using GDPTools can be found here