USGS - science for a changing world

USGS Groundwater Information


Documenting, Archiving, and Public Release of Numerical Groundwater Flow and Transport Models: Setup Archive

Modeler Workflow - Setup


This page describes the archive structure, organization, and naming conventions modelers must use in preparation for the model archive review.

For consistency in directory structure among all USGS groundwater model archives released online, the archive should use the folder structure and naming conventions outlined on this page.

Directory structure

The directory structure allows the model to be archived in a well-documented manner to ensure continued availability. Not all subdirectories will be necessary for all models.

Diagram of directory structure



Explanation of directory structure

The outline below provides an overview of the folder structure to be used in the archive, as well as minimum required components.

Note:

  • The directory and file names cannot contain spaces or any special characters other than a period (.), underscore (_), or hyphen (-). En dash (–) and em dash (—) characters are not equivalent to a hyphen and should not be used in file or directory names. If you copy a report number from SPN, be sure to convert any em dash in the report number to a hyphen before using it in a file name.
  • Once posted online, directory and file names are case sensitive.
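The naming rules in the note above can be checked mechanically. The following is a minimal sketch, not an official USGS tool; it assumes that "no special characters" means ASCII letters and digits plus the three permitted punctuation characters, and the function names are hypothetical.

```python
import re

# Assumption: only ASCII letters, digits, period, underscore, and hyphen are allowed.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_archive_name(name: str) -> bool:
    """Return True if name contains only the permitted characters (no spaces)."""
    return _ALLOWED.fullmatch(name) is not None


def hyphenate_dashes(name: str) -> str:
    """Replace en dashes (U+2013) and em dashes (U+2014) with plain hyphens."""
    return name.replace("\u2013", "-").replace("\u2014", "-")
```

Because names are case sensitive once posted online, a check like this is best run against the exact names that will be uploaded.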

The archive directory should contain:

  • readme.txt - text file that provides a description of the archive folder structure and of the files contained in each subfolder, instructions for running the model(s), instructions for reconstructing the model archive from the downloaded model files, and a brief description of each model run. (see example)
  • modelgeoref.txt - text file of geo-reference information, including data release title and DOI, publication title and DOI, datum, latitude and longitude in decimal degrees of the corners of a rectangle outlining the model study area. See below for explanation and example.
  • bin/ - This directory contains the compiled executable(s) for the numerical model(s) used in the study and any executables for utility programs that process the model data in a way that cannot be done, or would be difficult to do, with other publicly available tools.
  • georef/ - This directory contains one shapefile with polygons showing the active and inactive extent of the model domain. The active extent will be a polygon of the union of all of the active areas in each model layer. If the model study has multiple model domains, such as a regional and local model, only the shapefile of the local model would be included. If multiple model domains were included in the study, a shapefile of a bounding polygon would be included.
    • Use a standardized name based on the report number, with a .shp extension (for example, sir2016_5022.shp).
    • Add an item named 'Area' to the shapefile and use it to define the active and inactive areas of the model domain.
  • model/ - This directory contains model input files.
    • model1/ (subdirectory)
      • usgs.model.reference - Required file that contains information specific to the model contained in the subdirectory. See below for explanation and example.
      • model1.nam
      • model1.dis
      • model1.bas
      • model1.oc
      • etc
    • modeln/ (optional) - Continue with as many model directories as needed, one for each model simulation described in the documenting publication.
    • externalfiles/ (optional) - This subdirectory should contain common files used by all of the models (for example, horizontal hydraulic conductivity arrays). The model name file (.nam) should refer to these files by relative pathnames so that the model(s) can be run from the model/ directories without moving or copying files from this directory.
  • output/ - This directory contains model output files for the model files included in the model directory.
    • output.model1/ (subdirectory)
      • model1.lst
      • model1.hds
      • model1.ddn
      • model1.cbc
      • etc
    • output.modeln/ (optional) - Continue with as many output directories as needed, one for each model simulation described in the documenting publication.
  • source/ - This directory contains the complete source code for the models used in the study and any source code that was developed as part of the project to facilitate an analysis or data processing step that could not be done with other publicly available software. For example, Python scripts that were used to pre- or post-process model results, but whose work could also be done in Excel or ArcGIS, would not have to be included in the source/ directory. Authors should feel free to include non-essential software in the ancillary/ directory as a convenience for end users of the data release, along with a brief explanation of how to use the software in the readme.txt included with the data release. The source code should be organized by compiled executables. Proprietary code should be placed in the nonpublic/ directory.
  • webrelease/ - This directory contains the FGDC XML metadata file, which describes the whole model archive, and the browse graphic used for display when the data release is posted to data catalog sites. Refer to the metadata page for instructions on preparation and requirements for both the metadata and browse graphic.

  • ancillary/ (optional) - This directory contains any additional data that may be of use for people using this archive.
  • nonpublic/ (optional) - This directory is optional and would contain model archive content that should not be released to the public, such as PII, proprietary data, or proprietary code. Information that may be beneficial for future USGS projects, but has not been thoroughly documented, may be placed in this nonpublic folder. THIS DIRECTORY MUST BE REMOVED FROM THE COPY OF THE ARCHIVE BEFORE PUBLIC RELEASE.


How to archive other model types

Modeling studies commonly include additional analyses that are run with separate models or postprocessors that use results from a flow model. The approaches for archiving commonly used models and postprocessors are described below.

Local Grid Refinement (MODFLOW-LGR) models

Like MODFLOW-based flow models, MODFLOW-LGR models and associated output files should be archived in the model/ and output/ directories. A naming convention comparable to the one used for the flow model(s) should be used. For example, if there is a parent and two child models for a calibration run, then it would be appropriate to have parent.calibration, child1.calibration, and child2.calibration subdirectories in the model/ directory and output.parent.calibration, output.child1.calibration, and output.child2.calibration subdirectories in the output/ directory. The MODFLOW-LGR control file (typically having a .lgr file extension) can be placed in a separate directory in the model/ directory (for example, lgr.calibration) or within the parent directory if there is a problem using relative paths in the MODFLOW-LGR control file. There would be no corresponding subdirectory in the output/ directory for the MODFLOW-LGR control file directory, since results for the MODFLOW-LGR simulation would be written in the parent and child directories. Similar to standard MODFLOW models, the MODFLOW-LGR model output can be written to the appropriate subdirectory in the output/ directory or created in the MODFLOW-LGR parent and child model input subdirectories and moved to the appropriate subdirectory in the output/ directory.

MODPATH models

Like MODFLOW-based flow models, MODPATH models and associated output files should be archived in the model/ and output/ directories. A naming convention comparable to the one used for the flow model(s) should be used. For example, if there is a steady-state flow model called steady, then it would be appropriate to have a steady-modpath subdirectory in the model/ directory and an output.steady-modpath subdirectory in the output/ directory. Similar to the flow models, the MODPATH model output can be written to the appropriate subdirectory in the output/ directory or created in the MODPATH model input subdirectory and moved to the appropriate subdirectory in the output/ directory. An example archive directory structure with MODPATH models and associated output files is shown below.

Example of an archive with MODFLOW and MODPATH models


MT3D models

Like MODFLOW-based flow models, MT3D models and associated output files should be archived in the model/ and output/ directories. A naming convention comparable to the one used for the flow model(s) should be used. For example, if there is a steady-state flow model called steady, then it would be appropriate to have a steady-mt3d subdirectory in the model/ directory and an output.steady-mt3d subdirectory in the output/ directory. Similar to the flow models, the MT3D model output can be written to the appropriate subdirectory in the output/ directory or created in the MT3D model input subdirectory and moved to the appropriate subdirectory in the output/ directory. The MT3D models would have a comparable archive directory structure to the MODPATH example shown above.
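The pairing used in the MODFLOW-LGR, MODPATH, and MT3D examples (a subdirectory in model/ matched by a subdirectory of the same name prefixed with "output." in output/) can be verified with a short script. This is an illustrative sketch only; the function names and the `skip` parameter are hypothetical. Directories that legitimately have no output counterpart, such as externalfiles/ or a separate MODFLOW-LGR control file directory, are passed in `skip`.

```python
from pathlib import Path


def expected_output_dir(model_subdir: str) -> str:
    """Return the output subdirectory name that pairs with a model subdirectory."""
    return f"output.{model_subdir}"


def unmatched_model_dirs(archive_root, skip=("externalfiles",)):
    """List model/ subdirectories that lack a matching output/output.<name> directory."""
    root = Path(archive_root)
    names = sorted(p.name for p in (root / "model").iterdir() if p.is_dir())
    return [
        name
        for name in names
        if name not in skip
        and not (root / "output" / expected_output_dir(name)).is_dir()
    ]
```

For an archive with a separate control file directory, a call might look like `unmatched_model_dirs(root, skip=("externalfiles", "lgr.calibration"))`.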

ZONEBUDGET models

For ZONEBUDGET analyses, it is recommended that input and output files be included in the appropriate subdirectories in the model/ and output/ directories. ZONEBUDGET input and output files could also be included in subdirectories in the ancillary/ directory (for example, in ancillary/zonebudget/). ZONEBUDGET input files should be set up so that ZONEBUDGET can be run without moving or copying files in the archive. How to run ZONEBUDGET with these input files should be described in the readme.txt file.

PEST and UCODE models

PEST and UCODE are commonly used to calibrate flow and transport models and can also be used to perform subsequent analyses (for example, uncertainty or data worth analyses). If PEST or UCODE are used exclusively to develop the calibrated model, it is recommended that the PEST or UCODE input files be included in the ancillary/ or nonpublic/ directories. If PEST or UCODE are used to perform subsequent analyses that are used to support study findings, and are described in the corresponding information product that describes the model, then the PEST or UCODE input files should be included in the ancillary/ directory or an appropriately named subdirectory in the model/ directory.

If PEST or UCODE files are included in the model archive, the input files should be set up so that they can be executed without moving or copying files in the archive. Furthermore, if PEST or UCODE files are included in the model/ or ancillary/ directories, execution of PEST or UCODE should be described in the readme.txt file.


Specifying the modelgeoref.txt file

Each model archive must contain a 'modelgeoref.txt' file, which contains data to register the maximum spatial extent of the model(s) developed in the study. This file and the information it contains are required by Office of Groundwater Technical Memorandum 2015.02.

# Reference 1
# Include the reference to the model documentation report and doi.

# Reference 2
# Include the reference to the model data release and doi.
# Datum

upper_left  -80.605485 26.040685 
upper_right -80.097930 26.040685
lower_right -80.097930 25.185154
lower_left  -80.605485 25.185154

The latitude and longitude of the four corners of the bounding box that covers the study area should be provided to at least four decimal places. If the model is a 2-dimensional cross-section, geographic coordinates of the end points of the model should be given. If the model is a 1-dimensional model or describes processes at a single point in space, the single geographic coordinate pair should be repeated four times.
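A quick way to confirm that a modelgeoref.txt file lists all four corners with enough precision is sketched below. It is a hypothetical helper, not an official validator. It assumes the corner lines follow the layout of the example above (a corner keyword followed by two numbers) and that lines beginning with # are comments.

```python
CORNERS = ("upper_left", "upper_right", "lower_right", "lower_left")


def check_modelgeoref(text, min_decimals=4):
    """Return a list of problems found in the corner lines of modelgeoref.txt content."""
    found = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if parts[0] in CORNERS and len(parts) == 3:
            found[parts[0]] = parts[1:]

    problems = []
    for name in CORNERS:
        if name not in found:
            problems.append(f"missing {name}")
            continue
        for token in found[name]:
            # Count the digits written after the decimal point in the file itself.
            if len(token.partition(".")[2]) < min_decimals:
                problems.append(
                    f"{name}: {token} has fewer than {min_decimals} decimal places"
                )
    return problems
```

The check works on the text as written rather than on parsed floats, so a value such as 26.0400 keeps its trailing zeros and passes the four-decimal test.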

For hypothetical models, the four corners should be based on the area the model(s) represents. For example, if the model(s) represents conditions in New England, the four corners would be specified to cover all of New England. For conceptual hypothetical models that do not cover any specific area, it would be appropriate to specify the coordinates of the primary author's office location (for example, USGS Headquarters, the Austin office of the Texas Water Science Center, etc.).


Specifying usgs.model.reference files

Each subdirectory in the model directory must contain a 'usgs.model.reference' file which contains data to register the model in space and time. This information documents the area under study and allows for future map and (or) other displays of models developed by and available from the USGS.

xul 1157053.959        # upper left x-coordinate
yul 405727.084         # upper left y-coordinate
rotation -45.95796     # model grid rotation (degrees)
length_units feet      # model length units (feet, meters, etc.)
time_units days        # model time units (days, years, etc.)
start_date 1/1/1900    # start date of the model
start_time 00:00:00    # start time of the model
model modflow-2000     # MODFLOW model type (MODFLOW-NWT, etc.)
epsg 102733            # epsg code 
# proj4 'proj4 string' # or proj4 string      

The keywords (for example, xul, rotation, etc.) in each usgs.model.reference file can be specified in any order. The file can also have comment lines, delineated by a # in the first column, at the top of the file. Comments can also be added within the file and after the data on any line.

For MODFLOW-based models, the upper left coordinates in this file correspond to row = 1, column = 1 of the model grid. For other types of models with nodes/centroids with coordinates defined in real-world coordinates (for example, SUTRA), the upper left coordinates and rotation would be defined as xul = 0., yul = 0., and rotation = 0.

The start date may change depending on the model simulation (for example, transient simulations with different start dates). The starting date for a steady-state model can be the initial date of the data used to calibrate the model or of the hydrologic conditions represented in the model. For example, if a steady-state model uses average hydrologic conditions for the period from 1965 through 1990, the start date could be 1/1/1965.

The projection of the model in real-world coordinates is defined using the epsg or proj4 string keywords; only one of these is needed. The projection can be the native projection used for the study. The epsg code or proj4 string can be found either in the ArcGIS data definition or by using a web service (for example, spatialreference.org). Other web services can be found using an online search.

For hypothetical models, xul, yul, and rotation keywords should be specified to be 0 and the epsg code or proj4 string should be specified to be NA.
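The file format described in this section (keyword and value pairs in any order, with # starting a comment anywhere on a line) is simple to read programmatically. The sketch below is one possible reader, not the parser used by USGS. It assumes that every keyword shown in the example is expected in each file, which this page does not state explicitly, and it leaves any quotes around a proj4 string in place.

```python
# Assumption: the keywords shown in the example file are all expected to be present.
EXPECTED = (
    "xul", "yul", "rotation", "length_units",
    "time_units", "start_date", "start_time", "model",
)


def read_model_reference(text):
    """Parse usgs.model.reference content into a {keyword: value} dictionary."""
    ref = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not data:
            continue
        parts = data.split(None, 1)  # keyword, then the rest of the line as the value
        ref[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return ref


def missing_keywords(ref):
    """List expected keywords absent from ref; epsg and proj4 are alternatives."""
    missing = [key for key in EXPECTED if key not in ref]
    if "epsg" not in ref and "proj4" not in ref:
        missing.append("epsg or proj4")
    return missing
```

Values are kept as strings so that entries such as NA (used for the projection of hypothetical models) are preserved; callers can convert xul, yul, and rotation to floats as needed.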


Preparing a model archive for review and upload to the NSDI Water Node

Once the model archive is organized and structured according to OGW policy, several directories should be compressed. The model archive directories will be compressed for review and for release on the NSDI Water Node. Compressing the directories will reduce the total size of the directories stored on the NSDI Water Node and reduce upload and download times.

Within the main archive directory, the bin, georef, model, output, source, and ancillary (optional) directories should be compressed into separate compressed (zip) files. If the entire model directory or output directory exceeds 2.5 GB when compressed, it is recommended that its subdirectories be compressed as individual compressed files. The format of the archive directory after it is prepared for upload to the NSDI Water Node is shown below.


Organization of archive for upload to NSDI Water Node


The data release for SIR 2014-5162 is an example of the structure of a large archive with output subdirectories and individual output files split into separate zip files.
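The compression step described in this section could be scripted with the Python standard library. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the zip files are written to a separate destination directory, and 2.5 GB is interpreted as 2.5e9 bytes. It is not the procedure USGS reviewers use.

```python
import shutil
from pathlib import Path

# Top-level directories that this page says should each become a separate zip file.
TOP_LEVEL = ("bin", "georef", "model", "output", "source", "ancillary")


def compress_archive_dirs(archive_root, dest):
    """Zip each existing top-level directory into dest/<name>.zip; return the zip paths."""
    root = Path(archive_root).resolve()
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    created = []
    for name in TOP_LEVEL:
        if (root / name).is_dir():
            zip_path = shutil.make_archive(
                str(dest / name), "zip", root_dir=str(root), base_dir=name
            )
            created.append(Path(zip_path))
    return created


def oversized(zip_paths, limit_bytes=2.5e9):
    """Return the zip files larger than limit_bytes (assumed meaning of 2.5 GB)."""
    return [p for p in zip_paths if p.stat().st_size > limit_bytes]
```

Any file reported by `oversized` for model/ or output/ is a candidate for the per-subdirectory compression recommended above.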


Sample directory for download

For convenience, a sample archive directory structure [1.14MB ZIP], including a sample readme file template, has been developed that you can download and use as a starting point for your archive. You can also browse groundwater model data releases already online.


Special Considerations for Large Files

Because large output files can be difficult to upload and download, it may be necessary or beneficial to limit model output to only what is required to reproduce the results presented in the published report. For example, if water levels for the last time step are used to create a potentiometric surface presented in the report, it may be sufficient to save head data for only this time step rather than for every simulated time step.




U.S. Department of the Interior | U.S. Geological Survey
URL: http://water.usgs.gov/ogw/policy/gw-model/modelers-setup.html
Page Contact Information: Contact the USGS Office of Groundwater
Page Last Modified: Wednesday, 28-Dec-2016 01:48:22 EST