In Reply Refer To:
Mail Stop 412 January 9, 2002
Office of Water Quality Technical Memorandum 2002.06
Subject: Data Bases--Electronic transmission of analytical laboratory data for entry into QWDATA Release 4_1
The Water Resources Discipline (WRD) of the U.S. Geological Survey (USGS) collects a vast array of samples from numerous media (water, sediment, biological tissue, etc.), which are analyzed at various laboratories. The Water-Quality System (QWDATA) of the National Water Information System (NWIS) is the primary, official repository of all discrete sample data collected by WRD. The National Water Quality Laboratory (NWQL) and the Ocala Water Quality and Research Laboratory (OWQRL) already allow for electronic transmission and loading of analytical data into QWDATA. Data transmissions for numerous other laboratories, including both internal USGS labs (Geologic Division, WRD research labs, District labs) and external (university, other government agency, and commercial) labs are in a variety of paper and electronic formats. The purpose of this memorandum is to provide guidance for the electronic transmission of analytical data into QWDATA for these other laboratories.
Prior to the QWDATA 4_1 Release, the batch format (1- and
*-cards) for QWDATA was difficult for laboratories to produce. This has had several undesirable
consequences. These include: (1) data
not being entered into QWDATA; (2) data being incompletely or incorrectly
entered into QWDATA (desired fields are not populated; transcription errors,
etc.); (3) District users and database administrators independently inventing
methods to manage data; and (4) District users and database administrators
spending an inordinate amount of time entering data into QWDATA by hand. The QWDATA 4_1 Release provides a new
tab-delimited batch format for electronic transmission of data. This data format should be easier for
laboratories to produce and hopefully will yield benefits, in terms of time
saved, consistency, completeness, and accuracy of data loading into
QWDATA. The “1- and *-card” format can
still be used for batch processing, however, its use is being phased out. The “1- and-* card” format is not addressed
in this memo, but the format is described in the User’s Manual. For additional information, users can
reference the QWDATA User’s Manual available on-line at: http://wwwnwis.er.usgs.gov/conversion/nwisdocs4_1/index.html.
Attachments to this memo provide information on the QWDATA 4_1 Release batch format. Attachment 1 is designed to be an independent document that can be provided directly to laboratories that will be producing the batch files. References to internal documents are not included in this attachment. Attachment 2 is for internal WRD users and provides more detail on
the batch processing.
It includes references to internal documents and QWDATA software
operations. Users will need to provide
information for selected attributes to laboratories since all documentation on
QWDATA processing is not available to the public. This memo and attachments are available on-line (internal USGS
only) at: http://wwwok.cr.usgs.gov/nawqa/phoenix/training/training.html
Questions can be emailed to phoenix@usgs.gov.
Stephen K. Sorenson
Acting Chief, Office of Water Quality
This memorandum does not supersede any other Office of Water Quality Technical Memorandum.
Distribution: All WRD Employees
Attachment 1. QWDATA tab-delimited batch-file format
Data transmittal for automated data loading into the U.S. Geological Survey’s Water-Quality System (QWDATA) is done through the use of batch files. The batch processing utilizes two files: a sample-level batch file and a result-level batch file. In general, the sample-level file contains information about the sample collection activity, such as the site identification number, sample collection date and time, and sample medium. A sample collection activity may produce multiple bottles that are sent to laboratories for different analytical requests. The result-level file contains information about the individual analytical results. One row in the result table is for the analytical result for one chemical constituent. The relationship between the two-files is one-to-many; that is, a row of sample-level data may be related to many rows of result-level data.
The batch-format files are tab-delimited ASCII text files that require specific information to be displayed in specific fields. Each attribute must be separated from the previous attribute with a <tab>. Missing attributes must have a <tab> inserted. All fields are left justified. The files are related through a surrogate primary key called the sample integer (SINT). The description of the format of the sample-level and result-level files follows.
Sample-level batch file:
The format of the sample-level batch file is defined in table 1. The tab-delimited format is designed to accommodate 19 attributes for a row of data. Many of these attributes are specific to field-collection activities and are not required for the batch processing of laboratory data. Therefore, these attributes do not need to be stored in a laboratory’s database and will not have values transmitted in the batch files. However, the positions must be held in the file, so a <tab> must be inserted to separate these fields. Mandatory fields that the laboratory must supply in the batch file are indicated.
Table 1: Sample-level batch file format
|
(SINT) |
Integer used to link sample and result information between the two batch files |
|||
|
|
|
|||
|
|
|
|||
|
|
||||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
|
|||
|
|
||||
|
|
Further information for the sample-level attributes follows:
Column 1: Sample integer (Mandatory, Primary Key)
The laboratory must generate the sample integer. This integer is critical for linking the sample-level data to the result-level data. The maximum length for this integer is 18 digits and they must be in sequential order in the file.
Column 2: User code
Nothing is required for this field, but it must be separated with a <tab>.
Column 3: Agency code
Nothing is required for this field, but it must be separated with a <tab>.
Column 4: Station number (Mandatory)
Station number is an 8-digit or 15-digit number assigned to a sampling site. The field contains only numeric characters. The USGS personnel must supply the station number to the laboratory when samples are submitted for analysis.
Column 5: Begin date and time (Mandatory)
Begin date and begin time describe when the sample was collected (or the begin time for a sample that is collected over a range of dates and/or times). The date format is yyyymmdd. The time format is hhmm. These fields are concatenated together (not separated by a <tab>). The USGS personnel must supply this information to the laboratory when samples are submitted for analysis.
Column 6: End date and end time
The end date is the end date of a sample that is collected or over a range of dates. The format is yyyymmdd. The end time is the end time of a sample that is collected over a range of dates and/or times. The format is hhmm. These fields are concatenated together (not separated by a <tab>). The end date and time are usually blank. If it is not blank, the USGS personnel must supply this information to the laboratory when samples are submitted for analysis.
Column 7: Medium code (Mandatory)
Medium code is the medium from which the sample was collected (i.e. surface water, ground water, sediment, tissue). This field has a valid domain in the QWDATA software. The USGS personnel must supply a valid value for the medium code to the laboratory when samples are submitted for analysis.
Column 8: Lab number
This field is reserved for laboratory use. Lab number is the analytical laboratory identification number given to a bottle or group of bottles for samples that are received together in a shipment. The field can contain numeric and alpha characters.
Column 9: Project number
Nothing is required for this field, but it must be separated with a <tab>.
Column 10: Aquifer code
Nothing is required for this field, but it must be separated with a <tab>.
Column 11: Sample type
Nothing is required for this field, but it must be separated with a <tab>.
Column 12: Analysis status
Nothing is required for this field, but it must be separated with a <tab>.
Column 13: Hydrologic condition
Nothing is required for this field, but it must be separated with a <tab>.
Column 14: Hydrologic event
Nothing is required for this field, but it must be separated with a <tab>.
Column 15: Analysis source
Nothing is required for this field, but it must be separated with a <tab>.
Column 16: Tissue_id
Nothing is required for this field, but it must be separated with a <tab>.
Column 17: Body part code
Nothing is required for this field, but it must be separated with a <tab>.
Column 18: Lab sample comment
Lab sample comment is a free-form text field to hold information from the lab about a sample that cannot be defined by the coded information in the system. This field is reserved for laboratory use. Laboratories can send text up to 300 characters in length. The lab sample comment is not mandatory.
Column 19: Field sample comment
Nothing is required for this field, but it must be separated with a <tab>.
Result-level batch file:
The result-level file includes analytical results and metadata for each analytical result. The tab-delimited format is designed to accommodate 18 attributes for a row of data. Mandatory fields that the laboratory must supply in the batch file are indicated. The format of the result-level batch file is defined in table 2.
Table 2. Result-level batch file format
|
Column name |
Description |
Format |
Mandatory field
for batch file |
|
|
(SINT) |
Integer used only to link sample and result information between the two batch files |
|||
|
|
|
|||
|
|
||||
|
|
|
|||
|
|
||||
|
|
||||
|
|
||||
|
|
||||
|
|
||||
|
|
||||
|
|
||||
|
|
|
Column 1: Sample integer (Mandatory)
The laboratory must generate the sample integer. This integer is critical for linking the result-level data to the sample-level data. The maximum length for this integer is 18 digits and the integers must be in sequential order in the file.
Column 2: Parameter code (Mandatory)
Each analytical result value must be reported with a parameter code that defines the constituent name, phase, reporting form, and reporting units. USGS personnel will provide a look-up table for use by the laboratory for the needed parameter codes.
Column 3: Result value (Mandatory)
The result value is a numeric value produced by an analytical method for a constituent. The field is stored as float. The ‘#’ sign can be used to tell the software that the value is “null” (missing). The laboratory must supply a null value qualifier code or a null value remark code (M, N, or U) if the laboratory sends a null value.
Column 4: Remark code (Only mandatory if result value is null and null-value qualifier code is
not supplied)
Remark codes provide additional information about the magnitude (or absence) of a value. A remark code is not required on every value; it is only required where it is needed to avoid misinterpretation of a value. If a remark is supplied, the remark code is viewed with the value for publication or data dissemination. If the laboratory chooses to transmit null values, the laboratory must also provide a null value remark code (M, N, U) or Null-value qualifier code to identify why the value is missing. The remark codes are case sensitive. The remark codes are listed in table 3.
Table 3: Remark codes
|
Code |
|
|
E |
|
|
< |
|
|
> |
|
|
M |
|
|
N |
|
|
U |
|
|
A |
|
|
Analyte was detected in both the environmental sample and the associated blanks. |
|
|
S |
Column 5: Quality-assurance code
Nothing is required for this field, but it must be separated with a <tab>.
Column 6: Method codes
An analytical method can be coded in the database by use of 1-character code in the method code field. The USGS personnel must supply the laboratory with a look-up table of the parameter code-method code pair that defines the chemical constituent and analytical method. A blank method code is allowed. When reported, method codes are all upper case.
Column 7: Result rounding code
Nothing is required for this field, but it must be separated with a <tab>.
Column 8: Value-qualifier codes
Value-qualifier codes provide additional qualifying information about the value. Up to three value qualifiers can be stored with any single result. The value qualifiers are case sensitive (all lower case). If a laboratory supplies more than one value qualifier code, the individual codes are concatenated (not separated by <tab>s). Value-qualifier codes are not mandatory. The value- qualifier codes are listed in table 4.
Table 4. Value-qualifier codes
|
Value-qualifier
codes |
Description |
|
d |
Diluted sample: method high range exceeded |
|
x |
Analyte interference from environmental sample matrix |
|
v |
Analyte detected in laboratory blank |
|
s |
Instrument sensitivity problem |
|
q |
Insufficient sample received |
|
m |
Highly variable compound using this method, questionable precision and (or) accuracy. Citation of OFR or NWQL Technical Memo in result comment. |
|
w |
High variability: questionable precision and (or) accuracy. Cause explained in result comment. |
|
f |
Sample field preparation problem. Problem described in result comment. |
|
l |
Sample lab preparation problem. Problem described in result comment. |
|
o |
Result determined by alternate method. Reason described in result comment. |
|
i |
Result may be affected by interference(s). |
|
a |
Value was extrapolated above highest calibration standard, method range, or instrument linear range. |
|
b |
Value was extrapolated below lowest calibration standard, method range, or instrument linear range |
|
n |
Below the NDV. |
|
t |
Below the long-term MDL. |
|
r |
Quantification verified by rerun using the same method |
|
z |
Quantification verified by rerun using a different method |
|
h |
Compound identification verified by rerun using a different method; Alternate method identified in result comments. |
|
p |
Value reported is preferred; explanation in result comments |
|
u |
Value reported not confirmable due to interference |
|
y |
Sample variability described in result comments. |
|
c |
See laboratory comments for this result |
|
k |
Results based upon colony counts outside the acceptable range |
|
g |
Biological organism count less than 0.5 percent; may be only observed. |
|
j |
Biological organism count greater than or equal to 15 percent (dominant) |
|
& |
Biological organism estimated as dominant |
Column 9: Report level (Only mandatory if report level type code is populated)
The report level is associated with the analytical method. This is usually the minimum value that is reported for the laboratory method at the time the sample is analyzed, based on some sort of statistical analysis of the method. If a value falls above this value, usually the value is reported without a remark code. If a report level is included in a batch file, a report level type code must also be present.
Column 10: Report level type code (Only mandatory if report level is populated
The report level type code identifies the type of report level used for the method that is associated with the report level. If a report level type code is included in a batch file, a report level must also be present. The domain for the report level type code is shown in table 5.
Table 5. Report level type codes
Column 11: Data-quality indicator code
Nothing is required for this field, but it must be separated with a <tab>. If unapproved methods are used, this field should be coded with a ‘T’ (in review).
Column 12: Null-value qualifier code (Only mandatory if result value is null and a null value
remark code is not supplied)
The laboratory has the ability to transmit null results for a given parameter code. A null result means that a requested analytical procedure was not performed or the result was not quantified. If the laboratory chooses to transmit null values, the laboratory must also provide a null-value qualifier code or a null value remark code (M, N, U) to identify why the value is missing. The one-character code is case sensitive (lower case). The domain for the null-value qualifier code is listed in table 6.
Table 6: Null-Value Qualifiers
Column 13: Preparation set identifier
Preparation set identifier is a field used to store the set identification code of the preparation set at the laboratory. The field can be variable in length, up to a maximum of 12 characters. The field can contain mixed numeric and alpha characters. The preparation set identifier is not mandatory.
Column 14: Analysis set identifier
Analysis set identifier is a field used to store the set identification code of the analysis set at the laboratory. The field can be variable length, up to a maximum of 12 characters. The field can contain mixed numeric and alpha characters. The analysis set identifier is not mandatory.
Column 15: Analysis date
The analysis date field is used to identify the date when an analysis was performed at the laboratory. The format of the date is yyyymmdd. The analysis date is not mandatory.
Column 16: Preparation date
The preparation date field is used to identify the date when an analysis was performed at the laboratory. The format of the date is yyyymmdd. The preparation date is not mandatory.
Column 17: Laboratory result comment
Result lab comment is a free-form text field to hold information from the lab about a result that cannot be defined by the coded information in the system. This field is reserved for laboratory use. Laboratories can send text up to 300 characters in length. The result lab comment is not mandatory.
Column 18: Field result comment
Nothing is required for this field, but it must be separated with a <tab>.
The following examples are provided to illustrate how the sample-level batch file (table 7) and the result-level batch file (table 8) look with test data populated in the fields.
Table 7: Sample-level batch file format example
|
|
||||||||||||||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||||||
|
|
|
|
|
|
|
|
|
|
|
|
|
Table 8: Result-level batch file format example
|
0200100376 |
00028 |
4015 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|||||||||||
|
|
|
|
|
|
|
||||||||||||
|
|
|
|
|
|
|
||||||||||||
|
|
|
|
|
|
|
||||||||||||
|
|
|
|
|
||||||||||||||
|
|
|
|
|
|
|
||||||||||||
|
|
|
|
|
|
|||||||||||||
|
|
|
|
|
||||||||||||||
|
|
|
|
|
|
|
This attachment describes additional information on batch processing of files that WRD users should be familiar with when requesting transmission of electronic data from laboratories. Certain fields are required in QWDATA that are not required for the laboratory batch file. If users log samples into QWDATA prior to the batch entry of laboratory results, then the user can properly code all the required QWDATA fields. If samples are not logged into QWDATA prior to batch entry of laboratory results, the QWDATA software will set default values for selected attributes since laboratories will not be storing or transmitting these attributes. This section further describes the mandatory fields identified in Attachment 1 and the defaults for selected attributes that will be set during batch processing for samples not logged in prior to batch entry of laboratory data.
Sample-level batch file:
Sample integer
This field is used only for batch processing and is not stored in QWDATA.
Agency code
This field will default to ‘USGS’ for samples not logged in prior to batch processing of laboratory data.
Station number
Station number is the unique 8-digit or 15-digit number that is assigned to a sampling site. The field contains only numeric characters. The 8-digit numbers are generally downstream order numbers used for surface-water locations on streams or rivers. The 15-digit numbers are generally a combination of the latitude, longitude, and sequence number of the location, most often used for wells. A sequence number is used to separate locations that are very close to each other. This field must be supplied to the laboratory when samples are submitted for analysis.
Begin date and time
Begin date and begin time describe when the sample was collected (or the begin time for a sample that is collected over a range of dates and/or times). This field must be supplied to the laboratory when samples are submitted for analysis.
End date and time
The end date is the end date and end time of a sample that is collected over a range of dates and/or times. The end date and time are usually blank. If it is not blank, this field must be supplied to the laboratory when samples are submitted for analysis.
Medium code
Medium code is the medium from which the sample was collected (i.e. surface water, ground water, sediment, tissue). This field must be supplied to the laboratory when samples are submitted for analysis. Medium codes 0-9 and A-P are used for environmental samples. Medium codes Q-Z are used for quality-control samples. The most commonly used codes are 6 for ground water and 9 for surface water. A complete list of the valid medium codes can be found in Appendix A of the User Manual.
Project number
This field will default to blank for samples not logged in prior to batch processing of laboratory data.
Aquifer code
This field will default to blank for samples not logged in prior to batch processing of laboratory data.
Sample type
This field will default to ‘9’ (regular) for samples not logged in prior to batch processing of laboratory data.
Analysis status
This field will default to ‘H’ (initial entry) for samples not logged in prior to batch processing of laboratory data.
Hydrologic condition
This field will default to ‘A’ (not determined) or ‘X’ (not applicable for medium codes 6 or S) for samples not logged in prior to batch processing of laboratory data.
Hydrologic event
This field will default to ‘9’ (routine) for samples not logged in prior to batch processing of laboratory data.
Analysis source
This field will default to ‘A’ (not determined) for samples not logged in prior to batch processing of laboratory data.
Tissue_id
This field will default to ‘0’ (unknown) for tissue samples (medium codes C, D, X, or Y) not logged in prior to batch processing of laboratory data.
Body part code
This field will default to ‘94’ (unknown) for tissue samples (medium codes C, D, X, or Y) not logged in prior to batch processing of laboratory data.
Field sample comment
No sample field comment is created during batch processing of laboratory data.
Sample integer
This field is used only for batch processing and is not stored in QWDATA.
Parameter code
Each analytical result must be reported QWDATA with a parameter code that defines the constituent name, phase, reporting form, and reporting units. The Parameter Code Dictionary (PCD) should be used for looking-up valid parameter codes in QWDATA. The PCD is available through the QWDATA menu “Support Files” option. The District users must work with the laboratory to assign the proper parameter codes for the analytical data being produced. The District user should produce a look-up table for use by the laboratory for the required parameter codes.
New parameter codes may need to be requested from the Office of Water Quality for constituents that do not fit within an existing parameter-code definition (new chemical, different units, etc.). The PCD should be searched prior to requesting new parameter codes. Some lead-time is necessary to obtain new parameter codes.
A specific parameter code (00028) is used to identify the laboratory that performed the chemical analysis. The value assigned to this parameter is a numeric code (fixed value) assigned by the NWIS Program Office. The listing of fixed-value codes for laboratories can be found in Appendix B of the User Manual. The USGS user must enter this code when logging in samples or add it to a record that is logged in through batch entry. To request a new laboratory fixed-value code, please contact Ken Pearsall (pearsall@usgs.gov) (518-285-5669). As with new parameter codes, some lead-time is necessary to obtain new codes.
Quality-assurance code
This field will default to ‘A’ (not determined) during batch processing of laboratory data.
Method codes
Result rounding code
This field will default to a rounding code obtained from the PCD during batch processing of laboratory data for laboratories other than NWQL and Ocala.
Data-quality indicator code
Data-quality indicator (DQI) code indicates the review status of a result, controls the ability of a batch input program to overwrite a value, and affects the inclusion of a result in output. This field will default to ‘S’ during batch processing of laboratory data. The DQI for data for analytes from methods that are under development (i.e. methods that have not been approved by the USGS or the USEPA) should be coded from the laboratory as an ‘I’ (in review).
Field result comment
No field result comment is created during batch processing of laboratory data.