Beta

The current list of use cases is a snapshot of what this page will eventually contain. The team is improving the use case pages to provide meaningful examples of how active WMA projects use data and tools. Tell us about your experience using the catalog!

Use Cases


Open Storage Network (OSN) Pod Access

Keywords: osn; data-access; xarray; aws-cli

Domain: Domain Agnostic

Language: NA

Description:

This notebook describes how to access data on the Open Storage Network (OSN) Pod through the AWS command line interface and Python/xarray.



The USGS Water Mission Area is storing some of its spatio-temporal data holdings on an Open Storage Network (OSN) Pod set up by the USGS-led Hydro-Terrestrial Earth System Testbed (HyTEST) community. This pod provides 1 PB of usable Ceph Object Storage and is housed at the Massachusetts Green High Performance Computing Center on a high-speed (100+ GbE) network. The data stored on this pod are freely and publicly accessible: there are no egress fees, and no credentials are required to access them.

Access

Ceph object storage supports an API that is compatible with the basic data access model of the Amazon S3 API. That means you can access these data in the same way you would access any other data stored in an S3 bucket. The only difference is that you must point the client at the pod by setting the endpoint URL to https://usgs.osn.mghpcc.org/ when making the API call (the --endpoint-url option in the AWS CLI, or the endpoint_url argument in Python).
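To make the S3 compatibility concrete, the sketch below builds an anonymous bucket-listing request by hand using only the Python standard library. It assumes the pod accepts unsigned, path-style ListObjectsV2 requests (the endpoint followed by the bucket name), which the text above does not state explicitly. The helper names build_list_url, parse_listing, and list_bucket are illustrative, and only the first page of results is read.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint taken from the text above.
ENDPOINT_URL = "https://usgs.osn.mghpcc.org/"


def build_list_url(bucket, prefix="", endpoint_url=ENDPOINT_URL):
    """Build a path-style S3 ListObjectsV2 URL (assumed to be accepted unsigned)."""
    query = urlencode({"list-type": "2", "prefix": prefix, "delimiter": "/"})
    return f"{endpoint_url.rstrip('/')}/{bucket}?{query}"


def _local_name(tag):
    """Drop the XML namespace from a tag such as '{ns}Key'."""
    return tag.rsplit("}", 1)[-1]


def parse_listing(xml_data):
    """Return object keys and sub-prefixes from a ListBucketResult document."""
    wanted = {"Contents": "Key", "CommonPrefixes": "Prefix"}
    names = []
    for entry in ET.fromstring(xml_data):
        field = wanted.get(_local_name(entry.tag))
        if field is None:
            continue  # skip metadata elements such as Name or IsTruncated
        for item in entry:
            if _local_name(item.tag) == field and item.text:
                names.append(item.text)
    return names


def list_bucket(bucket, prefix=""):
    """Fetch one page of an anonymous listing (this makes a network request)."""
    with urllib.request.urlopen(build_list_url(bucket, prefix)) as response:
        return parse_listing(response.read())
```

Under these assumptions, list_bucket("mdmf", "gdp/") would request the same listing as the AWS CLI example on this page. In practice an S3 client library handles pagination and error responses, so this is mainly useful for seeing what the endpoint URL changes in the request.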

AWS Command Line Interface

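To list the contents of a bucket with the AWS Command Line Interface, use a command such as the following. The --no-sign-request flag sends the request anonymously, which works here because no credentials are required: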

aws s3 ls s3://mdmf/gdp/ --endpoint-url https://usgs.osn.mghpcc.org/ --no-sign-request

Python/xarray

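If you want to open a dataset stored on the OSN Pod using Python, you can use the fsspec and xarray packages. The 's3' filesystem in fsspec is provided by the s3fs package, and the zarr package is needed for the zarr engine: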

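import fsspec
import xarray as xr

# Zarr store to open on the OSN Pod
zarr_url = 's3://mdmf/gdp/LOCA_historical.zarr/'

# Anonymous (unsigned) S3 filesystem pointed at the OSN Pod endpoint
fs = fsspec.filesystem('s3', anon=True, endpoint_url='https://usgs.osn.mghpcc.org/')

# Open lazily: chunks={} returns dask-backed arrays that follow the store's own chunking
ds = xr.open_dataset(
    fs.get_mapper(zarr_url),
    engine='zarr',
    backend_kwargs={'consolidated': True},
    chunks={},
)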
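As a variation on the example above, xarray can also be given the filesystem options directly through storage_options, so the mapper does not have to be built by hand. This is a sketch rather than the notebook's method. The helper names osn_storage_options, list_osn_path, and open_osn_zarr are hypothetical, the s3fs, zarr, and dask packages are assumed to be installed, and the third-party imports sit inside the functions so the helpers can be defined without them.

```python
OSN_ENDPOINT = "https://usgs.osn.mghpcc.org/"


def osn_storage_options(endpoint_url=OSN_ENDPOINT):
    """Keyword arguments for anonymous access to an S3-compatible endpoint."""
    return {"anon": True, "endpoint_url": endpoint_url}


def list_osn_path(path):
    """List the entries under a bucket path such as 'mdmf/gdp/' (network request)."""
    import fsspec  # the 's3' protocol requires the s3fs package

    fs = fsspec.filesystem("s3", **osn_storage_options())
    return fs.ls(path)


def open_osn_zarr(zarr_url):
    """Lazily open a consolidated Zarr store on the pod (network request)."""
    import xarray as xr

    return xr.open_zarr(
        zarr_url,
        storage_options=osn_storage_options(),
        consolidated=True,
        chunks={},  # keep the store's own chunking; data are read on demand
    )
```

For example, list_osn_path('mdmf/gdp/') should return the same entries as the AWS CLI listing, and open_osn_zarr('s3://mdmf/gdp/LOCA_historical.zarr/') should return a dataset whose variables can be inspected with ds.data_vars before any array data are downloaded.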

The data holdings on the OSN Pod will be cataloged in the USGS Water Mission Area’s spatio-temporal asset catalog (STAC), which is currently under development in this public repository. This combination of tools, with data stored on the OSN Pod and cataloged for access through the STAC, will replace the USGS WMA THREDDS Catalog around Summer 2024, as further described in the USGS Geo Data Portal migration to labs.waterdata.usgs.gov blog post.