In Reply Refer To:
Mail Stop 440
December 13, 2012
WATER MISSION AREA POLICY MEMORANDUM NO. 13.03
Subject: Policy Prohibiting the Creation of Duplicate Sites and Data in the National Water Information System
The National Water Information System (NWIS) is a distributed database used for the storage and retrieval of USGS data for the Water Mission Area. All data in NWIS are associated with a site, which is identified by an agency code plus a station identification number. The purposes of this memorandum are to establish a policy prohibiting the creation of new duplicate sites in NWIS and to provide guidance on local policy and workflow changes that will allow more than one Water Science Center to share database access to a site. For this memorandum, a duplicate site is defined as a site that is assigned an identical agency code and station identification number in two or more NWIS hosts or NWIS databases.
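The definition above can be illustrated with a minimal sketch. It assumes site listings have already been exported from each NWIS host as (host, agency code, station identification number) tuples; that record layout, the host names, and the function name are hypothetical and are not part of NWIS.

```python
from collections import defaultdict


def find_duplicate_sites(records):
    """Return sites whose (agency code, station ID) appear on two or more hosts.

    `records` is an iterable of (host, agency_code, station_id) tuples.
    This layout is an assumption made for illustration only.
    """
    hosts_by_site = defaultdict(set)
    for host, agency_code, station_id in records:
        # A site is identified by agency code plus station identification number.
        hosts_by_site[(agency_code, station_id)].add(host)

    # A duplicate site is one that is present on more than one host.
    return {
        site: sorted(hosts)
        for site, hosts in hosts_by_site.items()
        if len(hosts) > 1
    }


if __name__ == "__main__":
    example = [
        ("host_a", "USGS", "01234567"),
        ("host_b", "USGS", "01234567"),
        ("host_a", "USGS", "07654321"),
    ]
    for site, hosts in find_duplicate_sites(example).items():
        print(site, "is stored on", ", ".join(hosts))
```

Grouping by the (agency code, station identification number) pair rather than by station number alone follows the definition in this memorandum: the same station number under a different agency code is a different site.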
BACKGROUND
As of 2007, there were 45 NWIS databases distributed across the Water Science Center servers. Prior to the implementation of the distributed system (which began in the early 1980s), all sites and their associated data were unique in the one national database, and any user with appropriate security access could edit or add sites or data to the database. After the initial distributed system was created, few users had access to NWIS data on databases beyond the user’s own Water Science Center. Increased focus on system security led to even further limits on the access of remote users to other NWIS databases. NWISWeb was launched in 2000 to provide public access to nationally aggregated data from NWIS, but it has never provided access to all USGS data.
In response to workflow barriers resulting from the distributed nature of NWIS servers, users create duplicate sites when more than one Water Science Center collects, enters, edits, or analyzes data for the same site. Copies of the data and information from the original site usually are included when the duplicate is created, but duplicate sites rarely remain exact duplicates for long. The site information and water data stored for such sites on the different NWIS hosts become inconsistent when one copy of the data is updated or changed and the other is not. The scope of the duplicate site problem has continued to grow over time. As of late 2011, there were more than 109,700 sites with the same agency code and station identification number that had data in more than one physical database.
Data from each NWIS database are aggregated routinely at the national level for display and retrieval via the NWISWeb system (http://waterdata.usgs.gov/nwis). Database consolidation is difficult when duplicate sites and data exist because there are no current tools for easily merging datasets. Some of the most common issues related to the duplicate sites and data are:
Long-term plans for NWIS (the NWIS Vision project) include a transition
from a distributed system to a single aggregated national database. One of
the critical steps in this process is the management of duplicate sites and
data in NWIS. NWIS and the NWIS user groups are analyzing the scope of the
problem and have begun to design automated tools and rules that will merge
duplicate datasets. Phase 1 of the project will result in a retrieve-only,
internal USGS NWIS database that will allow USGS users to retrieve the
preferred set of data from duplicated sites. Subsequent phases of the
project will create a USGS NWIS database that excludes duplicate data and
internal-use-only data. This aggregated national database will become the
data source for public USGS applications such as NWISWeb, the NAWQA data
warehouse, and other applications. The final phase of the project will
result in a transactional, centralized national NWIS database system that
will replace the current distributed NWIS system.
For these and other reasons, a policy is needed now to prevent further
escalation of the duplicate sites and data problems in NWIS. This
memorandum sets the policy and provides guidance for how to establish
shared sites in NWIS.
GUIDANCE
The creation of new duplicate sites in NWIS is no longer permitted. Water Science Centers must use due diligence to ensure that creating a new NWIS site in their NWIS host does not duplicate any other site existing in another NWIS host. As an exception, the creation of duplicate water-use sites to describe interstate transfers of water is allowed. The WaterSMART Public Supply database planners will supply guidance for using predetermined site numbers that will allow for aggregation to a national database. The NWIS host for any new site should be determined after discussion among the affected Water Science Centers, with an emphasis on minimizing the overall workflow required to maintain that particular site. If the workflow to maintain the site is approximately evenly distributed among two or more hosts, then the logical NWIS host will be that of the state in which the site is located.
Project personnel located outside the NWIS host area of the new site must
be allowed to share in the use of that site in NWIS. The new shared site
will exist on only one NWIS host, but access to update the information and
data must be shared by all Water Science Centers that contribute data and
information for the new site. Sharing access to a site does not violate
any NWIS or DOI security policies, nor does it change any existing USGS
policies for data processing, review, and publication.
Original records for a shared site, such as field forms, photos, site
folders with processed NWIS records and review comments, etc., may reside
in files of more than one Water Science Center only if all Water Science
Centers maintain a cross-reference identifying the location of additional
original records in their files. Otherwise, all original records for a
shared site will be stored with the Water Science Center hosting the NWIS
server with that site.
Communication and collaboration are key to the implementation of this
policy. Before a new site is created, the Database Administrator must
ensure that the site does not currently exist in any other NWIS server by
checking the national list of current NWIS sites located at
http://nwis.usgs.gov/site_check. If the site currently exists, then a duplicate site may not be created. Instead, the Database Administrator,
Project Chief, or Data Chief will begin communications with their
counterparts in the Water Science Center where the current site is hosted.
Guidelines on how to set up shared access to sites and data in NWIS are
located here:
http://nwis.usgs.gov/communications/2012news/121212WSC_access_Memo13.03.html
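The check required before creating a new site can be sketched as follows. This is an illustration under stated assumptions, not an NWIS tool: it assumes the national list of current sites has already been obtained and loaded into a mapping from (agency code, station identification number) to the hosting server. The mapping, the host names, and the function and class names are hypothetical. The water-use exception for interstate transfers is not modeled.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# (agency_code, station_id); an assumed key layout for illustration.
SiteKey = Tuple[str, str]


@dataclass(frozen=True)
class SiteCheckResult:
    may_create: bool
    existing_host: Optional[str]
    next_step: str


def check_before_create(
    agency_code: str,
    station_id: str,
    national_sites: Mapping[SiteKey, str],
) -> SiteCheckResult:
    """Decide whether a new site may be created on the local NWIS host.

    `national_sites` stands in for the national list of current NWIS sites;
    how that list is retrieved is outside the scope of this sketch.
    """
    key = (agency_code.strip(), station_id.strip())
    host = national_sites.get(key)
    if host is None:
        return SiteCheckResult(
            may_create=True,
            existing_host=None,
            next_step="No existing site found; the site may be created.",
        )
    # The site already exists, so a duplicate may not be created.
    return SiteCheckResult(
        may_create=False,
        existing_host=host,
        next_step=(
            f"Site exists on {host}; contact counterparts at the hosting "
            "Water Science Center to arrange shared access."
        ),
    )


if __name__ == "__main__":
    sites = {("USGS", "01234567"): "host_a"}
    print(check_before_create("USGS", "01234567", sites).next_step)
    print(check_before_create("USGS", "09999999", sites).next_step)
```

Returning the existing host alongside the decision reflects the policy's next step: when a site already exists, the outcome is a conversation with the hosting Water Science Center rather than a simple refusal.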
Please direct any questions about how to implement this policy to: GS-W_NWIS_SOS@usgs.gov
Terri "TJ" Moore //s// Terri "TJ" Moore
Acting Chief, Office of Water Information