Preservation of Original Digital Field-Recorded Time-Series Data

Date: Tue, 28 Sep 1999 15:24:37 -0400
From: Jane Rose 
To: "A  - Division Chief and Staff",
        "B  - Branch Chiefs and Offices",
        "FO - State, District, Subdistrict and other Field Offices"
        "PO - Project Offices"
CC: "  WRD Archive File,  ",
        "L. Jane Rose" 
Subject: WRD Memorandum No. 99.33, Preservation of Original Digital 
         Field-Recorded Time-Series Data

In Reply Refer To:
Mail Stop 415                                        September 27, 1999


SUBJECT:  Preservation of Original Digital Field-Recorded Time-Series Data

The purpose of this memorandum is to define U.S. Geological Survey (USGS)
Water Resources Division (WRD) policy for preservation of original field-
recorded digital time-series data. This memorandum reaffirms existing
policy requiring preservation of original unaltered field data, with
minor modifications to accommodate improved digital data-collection
technology.  The requirements of WRD Memorandum 92.59 and Open-File
Report 92-56 are still in effect, with two exceptions.  First, to ensure
that the preserved data are usable, a requirement that the data be
preserved in computer-readable (electronic) digital format is added.
Second, the requirement for production and preservation of printed paper
copies of digital field-recorded time-series data is rescinded. 
Additional guidance is provided on procedures to be followed for
archiving of data in electronic format.

It is WRD policy that original hydrologic time-series data
field-recorded in digital formats and used in automatic digital computation
of hydrologic records be preserved indefinitely in electronic digital format.  Use of
the electronic digital format is required for efficiency in data storage
and retrieval and to ensure that the data will be usable for computation
if needed in the future. This policy is effective immediately and applies 
retroactively to all of water year 1999 and prior years to the extent
that the necessary electronic digital data files are available for
archiving.  At the District's option, paper copies of original data may
be archived in addition to the electronic copies.  Paper-copy archives 
produced in prior water years should continue to be preserved.

This memorandum focuses primarily on the preservation of original data 
obtained from digital field recording instruments such as
punched-paper-tape automatic digital recorders (ADR), electronic data
loggers (EDL), satellite data-collection platforms (DCP), and other
telemetry systems. Original data are archived so that finalized
hydrologic records can be validated in case of dispute or re-analyzed if
new or improved information becomes available.

Original data are defined as the numerical data values received from the
field recording instruments at the first point where they are readily
accessible to USGS hydrologic personnel for inspection and verification,
and before any editing or application of corrections or adjustments is
performed.  The numerical values of the data items are the critical
characteristic, not the recording medium or data format.  If the data
are received in formats that are not readily interpretable by
data-collection personnel, the raw data should be translated or
converted to human-readable format (that is, plain text that can be displayed by the
"more" command in Unix or on a PC) and should be preserved in that
format.  The conversion of the data from special instrumentation formats
to human-readable format does not constitute a change to the numerical
values of the data and thus does not affect the status of the data as
original data.  Because the translated data set is the one actually
inspected, verified, and used in computation by USGS hydrologic
personnel, this is the principal original data record that should be
preserved.  After this original data record has been inspected,
verified, and archived, the raw data sets, including automatic digital
recorder (ADR) paper tapes and electronic data logger (EDL) diskettes,
need not be preserved permanently. However, the raw data sets should be
retained at least long enough to permit reprocessing if errors are
discovered during the records-preparation process or during the year
following publication of the final data.  The inspection and
verification process for DCP and EDL data should include checks to
ensure that no corrections or adjustments are applied by the DCP or EDL
or by the software used to process the data for input to ADAPS and to
ensure that format conversions have been performed correctly; the
process for ADR tapes should include checks of peak and minimum stages
to ensure no loss of data in the tape-translation process.  Old ADR
tapes from water years 1998 and before should continue to be retained
indefinitely.  Data not actually used in the computation of records, and
duplicate copies of data used in record computation, need not be
preserved indefinitely.

The National Water Information System (NWIS) does not at present have 
automated capabilities that fully implement the policy for preservation of 
original data.  The NWIS development team is undertaking a thorough
re-examination of ADAPS data-input procedures.  Archiving of original data 
is one of the issues being considered. It is expected that a future
release of ADAPS will include fully automated procedures for archiving
original data.  In the interim, measures will be adopted to make use of
existing ADAPS capabilities to achieve the best feasible level of
archiving.  These measures are described below.

Districts should use the "archive" functions provided by the ADAPS
system as the primary means of archiving original field-recorded digital
data.  ADAPS provides automated archiving functions as part of the
normal processing of ADR paper-tape data and DCP satellite-telemetry
data; these archiving functions are described in the ADAPS
Administrators' manual.  "Archiving" in this sense means copying the
data to a protected directory from which it can be re-copied by the
ADAPS administrator to permanent archival media.  In brief, the DCP data
are archived in the WRD Standard Format produced by the DECODES program,
which translates the raw DCP data and prepares it for input to the ADAPS
Unit Values (UV) file by program std_stor; the ADR data are archived in
the paper-tape-image format produced by program tp_read and input to the
UV file by tp_edit.  In addition, electronic data logger data and
non-DCP telemetry data that are processed through DECODES and std_stor
are archived in substantially the same way as the ADR data, but in the
WRD Standard Format. Both the WRD Standard Format and the ADR-tape-image
format are human readable and are expected to be computer readable
indefinitely; thus they are suitable for archiving.

Districts should have procedures in effect to ensure that data are moved 
from the ADAPS archive directories to permanent archives on a regular 
schedule.  Procedures should be established to ensure that the data 
automatically "archived" by ADAPS are not subsequently modified or
deleted; such procedures might include frequent transfer of the data
from the ADAPS "archive" directories to directories accessible only to
the data base administrator.  Districts should regularly save the
"archived" data to off-line permanent archival storage media.   The
choice of medium may be governed by District-specific circumstances,
but, in general, CD-ROM is an acceptable medium.  The organization of
the data on the archive medium also may be governed by District-specific
circumstances, but, in general, should be as simple as possible to
minimize the need for maintenance or conversion as computer hardware and
software evolve.  The original-data archive will not be heavily used, so
efficient access is of less concern than simplicity and durability. 
Therefore, a sequential tape-like organization such as is produced by
the Unix "tar" command is preferable to a direct-access file-system
organization. To ensure future usability of the data, data-compression
utilities should not be used.  The timing and frequency of transfers to
archival media generally may be governed by factors such as availability
of on-line file storage space and capacity of archival storage media; if
system backup procedures are effective, data security need not be the
overriding consideration.  For maintenance of a routine archiving
operation and for 
ease of retrieval, the data should be moved to permanent archival media at 
yearly intervals in water-year units that are organized by station number, 
recorder type (DCP, EDL, ADR, and any others), and date within the year. 
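One way to realize a yearly, water-year archive organized by station number, recorder type, and date is sketched below, using an uncompressed tar file as recommended above. It is an illustration only: the directory layout WY<year>/<station>/<recorder>/<YYYYMMDD>.txt, the function names, and the station number 01234567 are hypothetical choices. The water-year rule (October 1 through September 30, named for the calendar year in which it ends) is the standard USGS convention.

```python
import datetime
import tarfile


def water_year(day: datetime.date) -> int:
    """USGS water year: October 1 through September 30, named for the
    calendar year in which it ends."""
    return day.year + 1 if day.month >= 10 else day.year


def archive_member_name(station: str, recorder: str, day: datetime.date) -> str:
    """Hypothetical layout: WY<year>/<station>/<RECORDER>/<YYYYMMDD>.txt"""
    return f"WY{water_year(day)}/{station}/{recorder.upper()}/{day:%Y%m%d}.txt"


def write_year_archive(tar_path, wy, files):
    """files: iterable of (local_path, station, recorder, day) tuples.

    All files are validated before anything is written, then added in
    station / recorder / date order. Mode "w" produces a plain,
    uncompressed tar file, in keeping with the no-compression rule.
    """
    ordered = sorted(files, key=lambda f: (f[1], f[2].upper(), f[3]))
    for _, _, _, day in ordered:
        if water_year(day) != wy:
            raise ValueError(f"{day} is not in water year {wy}")
    with tarfile.open(tar_path, mode="w") as tar:
        for local_path, station, recorder, day in ordered:
            tar.add(str(local_path), arcname=archive_member_name(station, recorder, day))


print(archive_member_name("01234567", "dcp", datetime.date(1998, 11, 3)))
```

Keeping the member names self-describing means a retrieved file can be identified without any index or database, which matches the preference for simplicity over efficient access.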

Checking procedures should be established to ensure that the archive 
operations have been completed satisfactorily and that the archived data
are retrievable and usable.  For security against fire, flood, and other 
disasters, multiple copies of the archived data should be made and
redundant copies should be stored in multiple secure storage facilities,
at least one of which should be off site.  All procedures should be
documented in the District's quality-assurance (QA) plan and should be
followed rigorously.
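A checking procedure of the kind described above might read every file back out of each archive copy and compare checksums against the master copy. The sketch below assumes the archives are uncompressed tar files and uses SHA-256 digests; the function names manifest and verify_copy are hypothetical, and a District's actual QA procedure may differ.

```python
import hashlib
import tarfile


def manifest(tar_path):
    """Return {member name: SHA-256 hex digest} for every regular file in an
    uncompressed tar archive. Each member is read back in full, so this also
    confirms that the archived data are actually retrievable."""
    digests = {}
    with tarfile.open(tar_path, mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def verify_copy(master_tar, copy_tar):
    """A copy passes only if it holds exactly the same files with the same contents."""
    return manifest(master_tar) == manifest(copy_tar)
```

The same comparison could be run against each redundant copy, on site and off site, and repeated whenever data are copied to fresh media during archive maintenance.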

Districts should recognize that there is no truly permanent archival 
medium.  Therefore, all Districts should have electronic digital archive 
maintenance procedures in effect to ensure no loss of archived data as a 
result of deterioration of media, obsolescence of hardware or software, or 
other factors.  The procedures should include regular retrieval and
checking of data from archive media and copying to fresh media as
required.  The procedures should be documented in the District QA plan
and should be adhered to rigorously.

To ensure the security of data until it is written onto a permanent
archival medium, all Districts should have system-backup procedures in
effect to ensure no loss of irreplaceable data in the event of fire,
flood, computer system malfunction, or user error.  The procedures
should include checks to ensure that the backups have been completed
satisfactorily and that the backed-up data are usable.  Provision should
be made for storage of multiple copies of the backup data in secure
off-site storage facilities until permanent copies have been made.  An
absolutely reliable backup procedure is essential to the integrity of
any electronic archiving system that does not depend on hard copy
media.  The backup procedures should be documented in the District QA
plans and should be adhered to rigorously.

Districts should consult as necessary with their Regional computer
specialists in the development of adequate procedures for archive
creation and maintenance and system backup.

In addition to requiring preservation of electronic digital data, WRD
and USGS policies require archiving of other hydrologic records.  These
requirements are set forth in WRD Memorandum 92.59, Open-File Report 92-56, the
USGS records management handbook, and the WRD records disposition
schedule.  The WRD memorandum and the open-file report are available on
the World Wide Web; the records management handbook and the records
disposition schedule are available on the World Wide Web at the USGS
Office of Program Support home page.  These requirements, except for the
requirement to produce paper copies of electronically recorded digital
data, are still in effect.

Since copies of the raw data (ADR paper tapes, EDL files, and so forth)
need not be retained permanently, it is particularly important that
records be made of any changes made during input of the original data. 
Station analyses should contain a detailed description of any manual
editing or data estimation applied to the original data, in addition to
the usual discussions of ADAPS data corrections and shifts.  Any edits
of ADR tape data to correct for incomplete or dropped punches should be
carefully described; any values inserted for this purpose should be
clearly recognizable as artificial values (for example, 9999); any such
values should be further corrected as necessary in ADAPS using the
uv_edit or Hydra programs.   Any ADR tapes or raw data files saved in
water year 1998 or before should continue to be retained indefinitely.
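Because inserted values are meant to be clearly recognizable as artificial, they can be located mechanically before the corrections are made and documented. The sketch below is hypothetical: the function name is illustrative, and 9999 is used only because the memorandum cites it as an example placeholder.

```python
PLACEHOLDER = 9999  # example artificial value cited in the memorandum


def placeholder_positions(values, placeholder=PLACEHOLDER):
    """Return the indexes of readings that hold the artificial placeholder,
    so each one can be described in the station analysis and later
    corrected in ADAPS."""
    return [i for i, v in enumerate(values) if v == placeholder]


print(placeholder_positions([231, 235, 9999, 310, 9999]))
```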

If station analyses are produced by word processors or other computer 
procedures, it would be highly desirable to save the final versions of
these documents in digital format along with the original data on the
permanent archive medium.  The archival value of these documents would
be much enhanced by also saving copies of ADAPS listing files of the
final daily-values table, the rating descriptors, and the shift and
data-correction files for the water year.   To ensure that the documents
will be readable in the future despite changes in word-processor
formats, it is recommended that the documents be saved in plain ASCII
character format.
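Before a station analysis is written to the archive medium, a simple check can confirm that the file contains only 7-bit ASCII characters. This is a minimal sketch with a hypothetical function name; it does not detect word-processor markup that happens to be ASCII, so it supplements rather than replaces inspection of the saved file.

```python
def is_plain_ascii(path):
    """True if every byte in the file is a 7-bit ASCII character."""
    with open(path, "rb") as f:
        return f.read().isascii()
```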

In addition to preserving original field-recorded data, Districts should 
recognize that the computed unit values of discharge, corrected gage
height, and other parameters are valuable scientific resources, even
though they are not published in the annual data report.  All Districts
should store and preserve these unit values with the same care with
which the daily-values records and original data are preserved.   It is
recognized that these values are not always determined with the same
degree of refinement as the published daily discharge data; nonetheless,
these values can be released with suitable caveats to parties who make
specific requests.   The ADAPS programs "uv_archive" and "uv_restore"
should be used to preserve the unit values if it is desired to save them
off line.  The off-line storage media used for unit values should be
included in the District archive maintenance plan.

Finally, the data on old ADR tapes from water years 1998 and before are 
original data recorded in digital format.  These data have continuing 
scientific value as irreplaceable observations of hydrologic events.  We 
have a mandate to preserve such data.  Our technical ability to fulfill
this mandate will fade as the paper tape media and tape readers
deteriorate.  It is recognized that recovery of old ADR tape data is a
monumental task.  Nonetheless, Districts are encouraged to seek out
opportunities to recover and preserve the old ADR data in electronic
format.  Highest priority should be given to periods of record
containing extreme hydrologic events.

                                      Thomas H. Yorke
                                      Chief, Office of Surface Water

WRD Distribution:  A, B, FO, PO