The REServoir SEDimentation (RESSED) Database
Perspectives on Spreadsheets vs a Relational Database
Jenifer Bracewell and Kevin Laurent, U.S. Geological Survey RESSED Programming Team, July 10, 2013
The RESSED (http://water.usgs.gov/osw/ressed/) Programming Team has fielded a number of queries regarding Excel spreadsheet data entry into the RESSED Filemaker Pro relational database management system. This explanation was composed as a general response to those queries.
Excel spreadsheets are not very efficient vehicles for getting data into a fully relational and complex database such as RESSED. Some problems posed by most spreadsheets include a lack of:
- a consistent format/structure - merged columns/rows, unlabeled columns, rows used for header/footer text, etc.
- primary keys - no unique identifier for every single record.
- relationship data - no cross referencing of related records between worksheets.
- a data organization - related data are mixed side-by-side rather than parsing similar data entities into separate spreadsheets.
In general the structure and references required for a fully relational database do not come naturally to spreadsheets. Additionally, those developing and populating the spreadsheets tend to 'customize' them along the way to suite their visual-display and -printing needs.
As an analogy, one may consider a relational database as a multi-storied building. Try to imagine the full set of blueprints required to construct that building. The ream of blueprints would detail various aspects of the building - structure, HV/AC, Lighting,... some in plan view, others in elevation, perspective, section... all cross referenced in order for the building contractors to be able to make sense of them.
That's what a set of spreadsheets would have to be like to be able to use them to input data into a complex, relational database. It’s no simple matter to go from a 2-D structure to a complex and ordered 3-D structure.
From a broader perspective, the key question is not whether data entry from spreadsheet to RESSED can be done, but one of efficiency and resources. For example, a cooperating Federal agency provided data for 83 reservoirs stored in semi-standardized and somewhat normalized Excel spreadsheets for entry into RESSED. A USGS programmer wrote a number of custom C# scripts to facilitate the process for about 60% of the data import. For the remaining 40% it was more efficient for the programmer to handle the data porting manually (the other 60% also required a good deal of manual intervention).
That data port was a one-off effort. The scripts developed for that port are only functional in conjunction with that exact spreadsheet template. They contain no capabilities for error handling, input validation, or other standard user interface niceties, and there are a number of places where the user has to manually enter data or it’s hard-coded, making the potential for user errors quite high. It would take a substantial effort to transform those scripts into a fully functional program to automate data entry for just one of the spreadsheet templates.
In summary, based on the preceding explanation, for technical and efficiency reasons the RESSED team cannot include as part of the on-going RESSED upgrade effort the capability for spreadsheet data-entry.