Use Cases
Open Storage Network (OSN) Pod Access
Keywords: osn; data-access; xarray; aws-cli
Domain: Domain Agnostic
Language: NA
Description:
This notebook describes how to access data on the Open Storage Network (OSN) Pod through the AWS command line interface and Python/xarray.
The USGS Water Mission Area is storing some of its spatio-temporal data holdings on an Open Storage Network (OSN) Pod set up by the USGS-led Hydro-Terrestrial Earth System Testbed (HyTEST) community. This pod provides 1 PB of usable Ceph Object Storage and is housed at the Massachusetts Green High Performance Computing Center on a high-speed (100+ GbE) network. The data stored on this pod are freely and publicly accessible: there are no egress fees, and no credentials are required to access them.
Access
Ceph object storage supports an API that is compatible with the basic data access model of the Amazon S3 API. That means you can access these data in the same way you would access any other data stored in an S3 bucket. You just need to set the endpoint-url parameter to https://usgs.osn.mghpcc.org/ when making the API call.
AWS Command Line Interface
To list the contents of a bucket with the AWS Command Line Interface, use a command such as the following. The --no-sign-request flag sends the request anonymously, so no AWS credentials are needed:
aws s3 ls s3://mdmf/gdp/ --endpoint-url https://usgs.osn.mghpcc.org/ --no-sign-request
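If you need to run this listing from a script rather than a terminal, the same command can be assembled in Python. The following is a minimal sketch, not part of the original notebook: the names build_ls_command and list_osn_path are hypothetical, and the sketch assumes the AWS CLI is installed and on your PATH. The flags are the ones shown in the command above, plus the CLI's --recursive option.

```python
import shlex
import subprocess

# Endpoint of the USGS OSN pod, as given in the text above.
OSN_ENDPOINT_URL = "https://usgs.osn.mghpcc.org/"


def build_ls_command(s3_path, endpoint_url=OSN_ENDPOINT_URL, recursive=False):
    """Return the argument list for an anonymous `aws s3 ls` call (hypothetical helper)."""
    cmd = [
        "aws", "s3", "ls", s3_path,
        "--endpoint-url", endpoint_url,
        "--no-sign-request",  # anonymous request; no credentials required
    ]
    if recursive:
        cmd.append("--recursive")  # also list everything under the prefix
    return cmd


def list_osn_path(s3_path, recursive=False):
    """Run the listing and return its output lines.

    Assumes the AWS CLI is installed and the pod is reachable over the network.
    """
    result = subprocess.run(
        build_ls_command(s3_path, recursive=recursive),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the CLI reports a failure
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Print the command that would be run, without contacting the pod.
    print(shlex.join(build_ls_command("s3://mdmf/gdp/")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a path contains spaces or other special characters.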
Python/xarray
If you want to open a dataset stored on the OSN Pod using Python, you can use the fsspec and xarray packages:
import fsspec
import xarray as xr
zarr_url = 's3://mdmf/gdp/LOCA_historical.zarr/'
fs = fsspec.filesystem('s3', anon=True, endpoint_url='https://usgs.osn.mghpcc.org/')
ds = xr.open_dataset(fs.get_mapper(zarr_url), engine='zarr',
                     backend_kwargs={'consolidated': True}, chunks={})
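In the call above, consolidated=True tells the Zarr backend to read the store's consolidated metadata in a single request, and chunks={} opens the variables lazily as dask arrays using the store's own chunking, so nothing large is downloaded until you compute on it. If you open several stores on the pod, the pattern can be wrapped in a small helper. This is a sketch under stated assumptions rather than part of the original notebook: osn_storage_options and open_osn_zarr are hypothetical names, and the helper assumes fsspec, s3fs, xarray, zarr, and dask are installed.

```python
# Endpoint of the USGS OSN pod, as given in the text above.
OSN_ENDPOINT_URL = "https://usgs.osn.mghpcc.org/"


def osn_storage_options(endpoint_url=OSN_ENDPOINT_URL):
    """Keyword arguments for anonymous access to the pod (hypothetical helper)."""
    return {"anon": True, "endpoint_url": endpoint_url}


def open_osn_zarr(zarr_url, endpoint_url=OSN_ENDPOINT_URL):
    """Lazily open a Zarr store on the pod as an xarray Dataset.

    Mirrors the fsspec/xarray calls shown above. Requires network access.
    """
    # Imported here so osn_storage_options() stays usable without these packages.
    import fsspec
    import xarray as xr

    fs = fsspec.filesystem("s3", **osn_storage_options(endpoint_url))
    return xr.open_dataset(
        fs.get_mapper(zarr_url),
        engine="zarr",
        backend_kwargs={"consolidated": True},  # read metadata in one request
        chunks={},  # lazy dask arrays using the store's own chunking
    )


# Example usage (contacts the pod, so it is left commented out):
# ds = open_osn_zarr("s3://mdmf/gdp/LOCA_historical.zarr/")
# print(ds)  # shows dimensions, coordinates, and variables without loading data
```

Printing the dataset is a cheap way to discover its variable and coordinate names before selecting a subset to load.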
The data holdings on the OSN Pod will be cataloged in the USGS Water Mission Area’s spatio-temporal asset catalog (STAC), which is currently under development in this public repository. Together, data storage on the OSN Pod and cataloging through the STAC will replace the USGS WMA THREDDS Catalog around Summer 2024, as described in the USGS Geo Data Portal migration to labs.waterdata.usgs.gov blog.