[link or image removed] indicates that a reference has been removed from this document in order to prevent the exposure of internal resources.
SW2016.07, WQ2016.10 Policy and guidance for approval of surrogate regression models for computation of time series suspended-sediment concentrations and loads
BACKGROUND
A surrogate measure is a measurement made to gain insight into a variable that is either impractical to measure directly or impossible to measure at the desired continuous time interval. When the causal relation is direct and uncomplicated, surrogate measurements can be nearly as useful as direct measurements, although the uncertainty associated with individual computed values generally is larger than that of discrete sample data. The increased temporal richness of computed data can compensate for this larger uncertainty relative to laboratory results from actual samples.
In-situ turbidity, acoustic, and streamflow data, combined with discrete sample data, can be used to compute a time series of suspended-sediment concentrations and loads at stream sites. Two standard surrogate methods for computing time-series suspended sediment have been documented in U.S. Geological Survey (USGS) formal series Techniques and Methods reports. Rasmussen and others (2009) describe methods for developing regression models using in-situ turbidity and streamflow data, along with discrete samples of suspended-sediment concentrations. Landers and others (2016) describe sediment acoustic index methods for computing suspended-sediment concentrations. Both reports include detailed descriptions of methods for data collection, quality assurance and quality control, and data computation.
This technical memorandum describes reliable methods to develop statistical models specifically for suspended-sediment concentrations and loads based on surrogate measurements. The technical information herein does not apply to surrogate models for concentrations or loads of other water-quality constituents. This policy will facilitate a more consistent and streamlined approach for developing, documenting, and approving surrogate suspended-sediment regression models. Data computed as described in this memorandum meet USGS requirements for noninterpretive information. Users can refer to the Office of Surface Water (OSW) Web site on Sediment [link or image removed] for updated model examples, R scripts, and other tools as they become available.
PURPOSE
The purpose of this memorandum is to provide policy and guidance for developing and approving regression models used to compute high-temporal-frequency time series of suspended-sediment concentrations and loads from continuous turbidity or acoustic backscatter data and continuous streamflow data, so that the computed data can be published without documentation in a Bureau-approved interpretive report. As such, computed data from a statistical surrogate model must meet the definition of data rather than interpreted data. This policy applies to surrogate approaches in which continuous (measurement interval of one hour or less) in-situ turbidity, acoustic, and/or streamflow data are used in combination with discrete suspended-sediment samples to develop regression models for computing similar continuous suspended-sediment concentration or load data.
POLICY
Computed suspended-sediment concentration and load data qualify as “non-interpretive” when the following conditions are met:
- Surrogate data and calibration samples are collected, and laboratory analyses are performed, using consistent sensor technologies (Rasmussen and others, 2009; Landers and others, 2016) and consistent USGS-approved and publicly available field methods for collection of suspended-sediment samples and continuous sensor data, including Edwards and Glysson (1999), U.S. Geological Survey (2006), and Wagner and others (2006). Samples are analyzed for suspended-sediment concentration using USGS-approved laboratory methods, such as Guy (1969). Surrogate and calibration data used to develop the model must be available in the National Water Information System (NWIS) database [link or image removed]. When guidance provided in this technical memorandum deviates from methods described in the previously published methods reports, the instructions in this memorandum should be followed.
- Computed data are derived from linear, log-linear, or log-log statistical models developed according to Ordinary Least Squares (OLS) regression methods described in published techniques and methods reports by Rasmussen and others (2009) and Landers and others (2016).
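The OLS model forms named above can be sketched briefly. The following is an illustrative outline, not the full procedure from the cited reports: the turbidity and concentration values are hypothetical, and the bias correction shown (a nonparametric smearing factor applied when retransforming from log space) is one common approach of the kind discussed by Rasmussen and others (2009).

```python
import numpy as np

# Hypothetical calibration data: turbidity (FNU) and suspended-sediment
# concentration (mg/L) from concurrent discrete samples.
turb = np.array([12.0, 25.0, 40.0, 80.0, 150.0, 300.0, 500.0, 900.0])
ssc = np.array([15.0, 30.0, 55.0, 95.0, 210.0, 380.0, 700.0, 1400.0])

# Fit a log-log OLS model: log10(SSC) = b0 + b1 * log10(turbidity).
x = np.log10(turb)
y = np.log10(ssc)
b1, b0 = np.polyfit(x, y, 1)

# Retransforming log-space predictions to original units introduces a
# negative bias; a nonparametric smearing factor (mean of the
# retransformed residuals) is one correction for it.
resid = y - (b0 + b1 * x)
bcf = np.mean(10.0 ** resid)

def predict_ssc(turbidity):
    """Bias-corrected SSC (mg/L) computed from a turbidity reading."""
    return bcf * 10.0 ** (b0 + b1 * np.log10(turbidity))
```

In practice the model form (linear, log-linear, or log-log) and the appropriateness of any bias correction must be justified with the diagnostics described in the Techniques and Methods reports.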
- Each model is documented in an electronic model archive summary (MAS) following guidance in this memorandum. The MAS meets model documentation requirements described in OSW Technical Memorandum (TM) 2015.01 [link or image removed] and Office of Water Quality (OWQ) TM 2015.01 [link or image removed] and is submitted for technical peer review, verification by the Water Science Field Team (WSFT) Specialist, and approval by the Center Director in lieu of the model archive contents described in Attachment 2 of TM 2015.01 [link or image removed]. The MAS is stored in a reliable and publicly available location such as ScienceBase, the National Real-Time Water Quality (NRTWQ) Web site [link or image removed], or a future centralized Water Mission Area archive or repository.
The recommended steps for review and approval of the model and calibration dataset are:
- The MAS, which includes the associated calibration dataset, is tracked in the Information Product Data System (IPDS) as a single information product designated as a data release.
- The MAS is reviewed by two technical peer reviewers, at least one of whom is from outside the originating Center.
- The reviewed MAS and Model Archive Verification and Approval Form (Attachment 1 of TM 2015.01 [link or image removed]) [link or image removed] are submitted to the WSFT Specialist for verification that models have been adequately reviewed and archiving requirements are met, and the approval form is uploaded to IPDS.
- The MAS is assigned a Digital Object Identifier (DOI) and stored in a reliable and publicly available location.
- The MAS is approved by the Center Director following Fundamental Science Practices and Survey Manual (SM) 502.4 [link or image removed] and SM 205.18 [link or image removed].
- Once the MAS has been approved and publicly released, the computed suspended-sediment data may be disseminated to the public along with a link to the MAS in a USGS-approved database such as the NRTWQ Web site [link or image removed] and NWIS using appropriate parameter codes as described in the Techniques and Methods reports (Rasmussen and others, 2009; Landers and others, 2016) without need for documentation in a Bureau-approved interpretive report.
- Continued sampling is required after model development to validate model performance if models are used to estimate suspended-sediment concentrations or loads on an ongoing basis beyond the period during which the model calibration samples were collected. Model validation is described in Attachment A of this memorandum and by Rasmussen and others (2009). Consistent with OWQ TM 2012.03 [link or image removed], suspended-sediment samples must be submitted for laboratory analysis as soon as possible after collection, and the resulting data should be reviewed and approved promptly for use in model validation.
- Data interpolation, defined as estimation between measured unit values, and extrapolation, defined as computation beyond the range of the model calibration dataset, are permitted to a limited extent as described for acoustic methods by Landers and others (2016) and as otherwise described in Attachment A of this memorandum.
- Surrogate regression models as described in this memorandum are used to compute suspended-sediment concentration or load on the basis of observed explanatory variable(s) and cannot be used alone to predict future suspended-sediment concentration in the absence of a Bureau-approved interpretive report.
- Surrogate models and applications of this policy are reviewed during triennial technical reviews.
- This policy describes the standard approach for surrogate regression models for suspended sediment, and it must be followed when methods described by Rasmussen and others (2009) and Landers and others (2016) are used. Models documented in interpretive reports published prior to this memorandum may continue to be used if validation sampling and ongoing model evaluation (steps 4-8 in Attachment A) are completed as described in this memorandum.
- A Bureau-approved interpretive report is required when conditions described in this memorandum are not met, including the use of alternative methods for collection of continuous and discrete data, sensor technologies, laboratory analyses, statistical model building, or data computation. When alternative methods are used and documented in a separate report, a MAS similar to that described in this memorandum is required to document the model. If the model will be used for ongoing data computation, model validation and ongoing evaluation as described in items 4-8 of Attachment A are required unless circumstances warrant another approach.
Robert R. Mason, Chief, Office of Surface Water
Donna N. Myers, Chief, Office of Water Quality
Distribution: All WMA Employees
ATTACHMENTS (For complete attachments, download the PDF version [link or image removed])
- Policy and Guidance for Surrogate Suspended-Sediment Models
- Example Model Archive Summary for a Turbidity Suspended-Sediment Model
- Example Model Archive Summary for an Acoustic Suspended-Sediment Model
REFERENCES
Edwards, T.K., and Glysson, G.D., 1999, Field methods for measurement of fluvial sediment: U.S. Geological Survey Techniques of Water-Resources Investigations, book 3, chap. C2, 89 p., accessed March 7, 2016, at https://pubs.er.usgs.gov/publication/twri03C2 [link or image removed].
Guy, H.P., 1969, Laboratory theory and methods for sediment analysis: U.S. Geological Survey Techniques of Water-Resources Investigations, book 5, chapter C1, 58 p.
Instructional Memorandum No. 2015.03, Review and approval of scientific data for release, accessed June 25, 2015, at http://www.usgs.gov/usgs-manual/im/IM-OSQI-2015-03.html [link or image removed]
Landers, M.N., Straub, T.D., Wood, M.S., and Domanski, M.M., 2016, Sediment acoustic index method for computing continuous suspended-sediment concentrations: U.S. Geological Survey Techniques and Methods, book 3, chap. C5, at https://pubs.er.usgs.gov/publication/tm3C5 [link or image removed]
Office of Surface Water Technical Memorandum No. 2015.01, Policy and guidelines for archival of surface-water, groundwater, and water-quality model applications, accessed June 25, 2015 at http://water.usgs.gov/admin/memo/SW/sw2015.01.pdf [link or image removed]
Office of Water Quality Technical Memorandum No. 2012.03, Update of policy on review and publication of discrete water data, accessed June 25, 2015 at http://water.usgs.gov/admin/memo/QW/qw12.03.pdf [link or image removed]
Office of Water Quality Technical Memorandum No. 2015.01, Policy and guidelines for archival of surface-water, groundwater, and water-quality model applications, accessed June 25, 2015 at http://water.usgs.gov/admin/memo/QW/qw2015.01.pdf [link or image removed]
Rasmussen, P.P., Gray, J.R., Glysson, G.D., and Ziegler, A.C., 2009, Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data: U.S. Geological Survey Techniques and Methods, book 3, chap. C4, 53 p., accessed June 25, 2016, at http://pubs.usgs.gov/tm/tm3c4/ [link or image removed]
Survey Manual 205.18, Authority to approve information products, accessed June 25, 2015 at http://www.usgs.gov/usgs-manual/200/205-18.html [link or image removed]
Survey Manual 502.4, Fundamental Science Practices: Review, Approval, and Release of Information Products, accessed May 17, 2016, at http://www2.usgs.gov/usgs-manual/500/502-4.html [link or image removed]
U.S. Geological Survey, 2006, Collection of water samples (ver. 2.0): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A4, September 2006, accessed June 25, 2016, at http://pubs.water.usgs.gov/twri9A4/ [link or image removed].
Wagner, R.J., Boulger, R.W., Jr., Oblinger, C.J., and Smith, B.A., 2006, Guidelines and standard procedures for continuous water-quality monitors—Station operation, record computation, and data reporting: U.S. Geological Survey Techniques and Methods 1-D3, 51 p. at https://pubs.usgs.gov/tm/2006/tm1D3/pdf/TM1D3.pdf [link or image removed]
Attachment A. Requirements and Guidance for Surrogate Suspended-Sediment Models
The following information supplements methods described by Rasmussen and others (2009) and Landers and others (2016) and provides additional policy implementation details. It includes a combination of requirements for meeting policy, emphasized using bold font, and guidance for best practices. In addition, the USGS National Training Center class “Environmental Statistics for Data Analysis,” QW1075, is recommended for data analysts, as is experience or training in the R programming language. The Surrogate Analysis and Index Developer Tool (SAID) is available for developing surrogate models, particularly using acoustic methods, and SAID produces all of the information needed for the MAS (but not in the MAS format). The MAS must follow the format described in this memorandum. An R script is available for this purpose at http://water.usgs.gov/osw/techniques/sediment.html [link or image removed], or the MAS can be prepared using tools of the user’s choice. Additional information may be appended at the end of the MAS if needed.
- The model development process is documented in a model archive summary (MAS) that includes a written summary of the decisions made during the model-development process, the model form, several diagnostic statistics and graphs that indicate adequate model fit, predictive ability and uncertainty, list and explanation of outliers and how they were handled in model development, and a link to the complete model calibration data set. An example MAS for turbidity and streamflow regression models is provided in Attachment B. An example MAS for acoustic methods is provided in Attachment C.
- A statistically valid number of samples is needed to develop, validate, and verify surrogate regression models over time, and samples must be collected over a range of conditions. In the model archive summary, the modeler must provide specific, detailed justification of why the number of samples used for surrogate model development is sufficient to represent the population of data for which predictions are being made. Such justification must address the sufficiency of the data to describe seasonal, hydrologic, particle-size, or any other factors that could affect the surrogate relation.
- Model builders need to be cognizant of the possibility of overfitting their surrogate model. Overfitted models adhere too closely to idiosyncrasies of a particular data set that do not actually appear in the population of data being modeled (Babyak, 2004). One way to guard against overfitting is to have an adequate number of sample observations per explanatory variable in the model. A long-used rule of thumb is 10-15 observations per explanatory variable (Babyak, 2004); Harrell (2001, p. 61) recommends 10-20, so for a simple linear regression model, 20-40 observations are necessary (10-20 each for the intercept and the single explanatory variable). Green (1991) recommended 50 observations plus 8 additional observations for each explanatory variable. The number of samples also needs to represent all seasonal, hydrologic, and other conditions potentially affecting the model, and to allow for evaluating the predictive performance of the model. Model fit statistics, such as the coefficient of determination (R2) and mean square error, are not necessarily good measures of how well a model will predict outside the calibration data set; cross-validation is a better measure. For example, the cross-validation information in Attachment B indicates that when the model calibration data are randomly divided into subsets, the predictions from each subset regression model are very similar to those of the final surrogate model. Thus, for surrogate suspended-sediment models, a recommended minimum of 36 suspended-sediment samples is generally considered adequate for developing and validating a model with one explanatory variable, for example, turbidity. This number is based on the mid-range of sample sizes recommended in the literature, with an additional 20-percent increase to allow for an adequate cross-validation analysis to assess the predictive capability of the model. An additional 12 samples, for a minimum of 48 samples, are recommended for a model with two explanatory variables, for example, turbidity and streamflow.
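The cross-validation idea described above can be sketched as follows. This is a minimal illustration on synthetic data, not the specific procedure used in Attachment B: the calibration data are split into folds, the model is refit on each training subset, and stable coefficients and held-out errors across folds suggest the model is not overfit to particular samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration data set: 36 paired log-transformed turbidity (x)
# and SSC (y) observations following a known linear relation plus noise.
n = 36
x = rng.uniform(0.5, 3.0, n)
y = 0.3 + 1.1 * x + rng.normal(0.0, 0.1, n)

# Full-model fit for comparison.
slope_full, intercept_full = np.polyfit(x, y, 1)

# 4-fold cross-validation: refit on each training subset and score the
# held-out fold with its mean square prediction error.
idx = rng.permutation(n)
cv_errors = []
for fold in np.array_split(idx, 4):
    train = np.setdiff1d(idx, fold)
    s, b = np.polyfit(x[train], y[train], 1)
    cv_errors.append(np.mean((y[fold] - (b + s * x[fold])) ** 2))

cv_mse = float(np.mean(cv_errors))  # cross-validated mean square error
```

A cross-validated mean square error much larger than the calibration mean square error would be one warning sign of overfitting.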
- Samples must be representative of the stream cross section and can be equal-width-increment (EWI) samples, equal-discharge-increment (EDI) samples, or fixed-point pump samples that have been adjusted using concurrent cross-section samples (Landers and others, 2016, p. 13; Edwards and Glysson, 1999, p. 31). The relation of fixed-point to EWI/EDI samples must be documented in the analysis and model archive summary. Samples also must be reasonably statistically independent and should be tested for autocorrelation following methods described by Landers and others (2016). If the independence assumption is not met, then appropriate non-OLS statistical methods must be used to fit the model, and the model must be documented in a separate Bureau-approved interpretive report (Rasmussen and others, 2009, p. 11; Landers and others, 2016, p. 30). Particle-size analysis of samples is not required, but sand/silt-split data often are very helpful when evaluating dataset variability and outliers (Rasmussen and others, 2009).
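One common screen for lag-1 autocorrelation in model residuals is the Durbin-Watson statistic, sketched below; the specific autocorrelation test described by Landers and others (2016) may differ, so this is an illustrative assumption rather than the prescribed method.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest little lag-1
    autocorrelation; values near 0 suggest strong positive
    autocorrelation (independence assumption violated)."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(r) ** 2) / np.sum(r ** 2))

rng = np.random.default_rng(1)
independent = rng.normal(size=300)       # roughly independent residuals
autocorrelated = np.cumsum(independent)  # strongly autocorrelated series
dw_ok = durbin_watson(independent)       # near 2
dw_bad = durbin_watson(autocorrelated)   # near 0
```

If such a screen flags dependence, the memorandum requires non-OLS methods and a Bureau-approved interpretive report rather than the streamlined MAS path.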
- The sampling design must ensure that samples are representative of the system being modeled. It is recommended that about half of the samples are collected at fixed intervals (monthly), and half of the samples are collected during runoff events or targeting other sources of variability.
- The recommended period for collecting model-development data is 3 years, and it generally should range from 2 to 6 years. If data are collected over just 1 year for model development, additional validation samples are needed in subsequent years if hydrologic conditions differ. Data collected over more than 6 years should be tested for violation of stationarity assumptions, and appropriate actions should be taken to address any violations.
- The data analyst submits the MAS to two qualified technical peer reviewers. Qualified technical reviewers are those with scientific and technical expertise relevant to sound regression model development. The reviewers evaluate the model and recommend the MAS be either approved or rejected. If a model is rejected, the data analyst has the option to redevelop the model and MAS based on the reviewer’s suggestions, and then submit it for additional technical review. Before the surrogate regression model is used to compute data to be delivered to the public, the Center Director must approve the MAS as described in the Policy section of this technical memorandum.
- Data to validate model performance are required for any model used to estimate constituent values outside of the period of data collection used for model calibration. Validation data must be available in the National Water Information System (NWIS) (http://dx.doi.org/10.5066/F7P55KJN).
- A minimum of 8 validation samples per year, on average, must be collected, with no fewer than 6 in any given year. This is more than recommended by Rasmussen and others (2009) and Landers and others (2016), to better assess departures from the model and to provide sufficient data to refit models if needed. Validation samples are flow-weighted composite samples collected from the channel cross section (EWI/EDI when possible) using approved field protocols (U.S. Geological Survey, 2006; Edwards and Glysson, 1999) or fixed-point pump samples as previously described. Four of these samples are to be collected at a quarterly interval, that is, about 3 months apart, and four samples are to be targeted toward time periods of variability in the system.
Exceptions to this guidance may exist, including:
- For many situations, variations in streamflow present the greatest source of variability and these four additional samples should be targeted to capture high flows that are often characterized by high suspended-sediment concentrations, and should include rising, peak, and falling limbs of the hydrograph.
- Targeted samples should be spread throughout the year unless the variability is limited to shorter periods in which case the targeted samples should be spread throughout the period of variability. For example, for streams affected by storms throughout the year, the samples targeting storm runoff would be spread throughout the year. However, for streams where flow is affected mainly by snowmelt or pronounced wet and dry (or ephemeral) periods, the targeted samples may be spread only throughout the snowmelt or the wet and dry periods.
- Because validation samples are collected at least quarterly, models can be used for data estimation no longer than about 3 months after the last validation sample has been collected.
5. As soon as practicable after each validation sample is collected, performance of the model with respect to that sample should be assessed. The residual from the predicted value associated with a validation sample should be assessed in the same manner as a residual from the model calibration data set.
- The recommended approach for ongoing model validation is conceptually based on the runs test for randomness in residuals (Hipel and McLeod, 1994, p. 942). A run is a consecutive sequence of positive or negative residuals, and the runs test estimates the probability that the observed number of runs happened solely due to chance. For surrogate models, insufficient data will be available to compute a formal runs test for several years (given the recommended number of validation samples collected per year). Thus, an approach based on consecutive large positive or negative residuals, computed from the predicted values and observed concentrations of the validation samples, should be used instead. The consecutive occurrence of even a few large residuals with the same sign has a very low probability of occurring due to chance, so the presence of such residuals is an early warning of possible considerable bias in the values predicted by the surrogate model that warrants investigation.
- If the residual from one validation sample has a value of about 2 to 3 standard errors from the predicted value, the operation of sensors and equipment providing input to the surrogate model should be checked to make sure nothing is malfunctioning. Anomalies in the watershed should also be investigated. The probability of a residual falling within this range from a well-fit model is about 0.01 to 0.05.
- If there are two consecutive validation samples that have a residual value of about 2 to 3 standard errors from the predicted value with both having the same sign either positive or negative, check equipment again for malfunctions and check for anomalies in the watershed. If all equipment is operating correctly and no anomalies are found, then collect an additional validation sample at the next site visit or within 30 days, if possible during conditions similar to those when samples with large residuals were collected. The probability of two independent consecutive residuals falling within the range of 2 to 3 standard errors from the predicted value from a well-fit model is about 0.0001 to 0.0025.
- If there is a third consecutive validation sample that has a residual value of 2 to 3 standard errors from the predicted value (with all three having the same sign, that is all three residuals being either positive or negative), then the model must be assumed to be flawed and must be refit using all data collected since the release of the previous model including the 3 recent validation samples with large residuals. The probability of three independent consecutive residuals falling within the range of 2 to 3 standard errors from the predicted value from a well-fit model is about 0.000001 to 0.000125.
- If the residual has a value greater than 3 standard errors from the predicted value and sensors and equipment are not malfunctioning and no watershed anomalies are found, then collect an additional validation sample at the next site visit or within 30 days. The probability of a residual being greater than 3 standard errors from the predicted value from a well-fit model is less than about 0.01.
- If there are two consecutive validation samples that have a residual value greater than 3 standard errors from the predicted value, regardless of the sign of the residual, then the model must be assumed to be flawed and must be refit using all data collected since the release of the previous model including the 2 recent validation samples with large residuals. The probability of two independent consecutive residuals both being greater than 3 standard errors from the predicted value from a well-fit model is less than about 0.0001.
- If the refit model is deemed adequate by following the same process used to fit the original model, the refit model will replace the existing model and predicted values from the first validation sample whose residual exceeds the 2 standard error threshold will be re-estimated using the refit model. A new model archive package must be prepared and approved following the same process as the original model prior to use of the refit model.
- If the refit model is deemed inadequate, use of the model must be discontinued completely or until additional sample data are collected and an adequate model can be developed by making adjustments that are consistent with the Techniques and Methods reports. For example, the original model might use turbidity as a single explanatory variable, and the refit model might use both turbidity and streamflow as explanatory variables, which is consistent with the Techniques and Methods report.
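The decision rules above can be expressed as a simple check applied after each validation sample. The sketch below is an illustrative restatement of the memorandum's thresholds, with residuals expressed in multiples of the model standard error; the action strings are shorthand, not official terminology.

```python
def assess_validation_residuals(scaled_residuals):
    """Return a shorthand action for the latest validation residual.

    scaled_residuals: sequence of (residual / standard error) values,
    ordered oldest to newest.
    """
    r = list(scaled_residuals)
    if not r:
        return "no data"
    last = r[-1]
    if abs(last) > 3:
        # Two consecutive |residuals| > 3 SE (any sign) -> refit model.
        if len(r) >= 2 and abs(r[-2]) > 3:
            return "refit model"
        return "check equipment; collect extra validation sample"
    if 2 <= abs(last) <= 3:
        # Count the run of consecutive 2-3 SE residuals of the same sign.
        run = 1
        for prev in reversed(r[:-1]):
            if 2 <= abs(prev) <= 3 and (prev > 0) == (last > 0):
                run += 1
            else:
                break
        if run >= 3:
            return "refit model"
        if run == 2:
            return "check equipment; collect extra validation sample"
        return "check equipment and watershed"
    return "no action"
```

For example, three consecutive same-sign residuals in the 2-3 standard-error range trigger a refit, while a single such residual triggers only an equipment and watershed check.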
- Surrogate sediment models must be reviewed annually, typically after the continuous data used as a surrogate have been approved and the discrete sample results used in the model have been reviewed and approved. This review occurs even when no validation samples exceed the large residual thresholds discussed in the previous section of this guidance. The annual review should include all of the following:
- Review plots of all validation sample residuals collected during the year against time, predicted values, and explanatory variables. Compare residuals from samples collected during the current year to those collected in previous years. Look for patterns in the current year residuals that might indicate a change or shift in the relation between the response and explanatory variables used in the model.
- Review a boxplot of the validation sample residuals by year. Compare the distribution of residuals from samples collected during the current year to those collected in previous years.
- Review a boxplot of validation sample residuals by seasonal period. Note that this can only be done after several years of additional data collection if just the minimum of eight validation samples per year is being collected. Also, fewer than four seasons per year can be used if necessary to obtain enough data to create boxplots; for example, three 4-month periods might be used rather than four 3-month periods. Look for seasonal patterns in the residuals that might indicate a temporal change or shift in the relation between the response and explanatory variables used in the model.
- A narrative must be written for the annual model review describing why the modeler believes the existing surrogate model remains valid or, if a problem was identified, how it was addressed. This narrative is analogous to the annual gaging-station analysis prepared for each streamgage. The annual model review is documented in the Records Management System (RMS).
- Best practice usually is to start a new model within 6 months of the data collection period. When a revised model is developed, best practice is usually to start applying the model when it is approved; however, users may choose to apply the model beginning at the time that samples indicated deviation from the previous model.
- Even if no problems are identified during the annual model review, surrogate models must still be refit every 3 years with the additional validation samples collected during the ongoing 3-year period and documented in a new MAS. Routine model updates take advantage of additional collected data to ensure models are current and reduce model uncertainty and likelihood of stationarity issues.
- The initial refitting of the model should use the same form as the model being updated. However, the adequacy of the model form should be examined and if necessary alternative forms of the model should be explored.
- The data analyst will need to pay particular attention to the residual-versus-time plots for the refit model. If residuals early in the time series show patterns or otherwise depart from random noise about the zero reference line, that is an indication of a shift in the underlying processes driving the model. Sample observations from the earlier period of the sample time series should then be removed from the calibration dataset, removing as few as possible to address the lack of fit, to see whether an adequate model can be obtained. However, if removing sample observations reduces the calibration dataset below the recommended minimum number of samples or to a time span of less than 2 years, additional data will be needed until those criteria are again met. Decisions and reasoning related to the sample time series used in model development must be documented in the MAS.
- A new model archive summary package is to be prepared and approved following the same process described for the original model including approval in IPDS before the updated model is used.
- Centers must use good judgment in deciding when a new model becomes effective. The general recommendation is that the model becomes effective as soon after approval as practical.
- Extrapolation for acoustic models is allowed as described by Landers and others (2016). Limited extrapolation is allowed as described below for approved models developed following methods described by Rasmussen and others (2009).
- Approved models developed following Rasmussen and others (2009) may be used to extrapolate no more than 10 percent (calculated using retransformed units rather than log units) outside of the range of the sample data used to fit the model with no additional sample collection required. For models following Rasmussen and others (2009), extrapolation must not exceed the manufacturer’s specifications for the optimal performance range of the turbidity sensor. Approved models following Landers and others (2016) may be used to extrapolate no more than 20 percent outside the range of sample data used to fit the model.
- Approved models may be used to extrapolate more than 10 percent (Rasmussen and others, 2009) or more than 20 percent (Landers and others, 2016) if an additional validation sample in the extrapolated range can be collected during the same season and within about 90 days, provided that channel conditions have not changed.
- If the validation sample confirms that model predictions are accurate in the extrapolation range, then predicted values may be kept and accepted as final data for display in NWISWeb or NRTWQ.
- If the validation sample does not confirm that model predictions are accurate in the extrapolation range, then predicted values must either be:
- censored at greater than the predicted value if the validation sample confirms that the direction of the model bias is indeed positive (for example, if the model prediction is 100 but the validation sample is 200, the predicted values may be set to >100), or
- removed or blocked in NWIS using thresholds if the validation sample does not confirm the direction of the model bias (for example, the model prediction is 100 but the validation sample is 50).
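The disposition rules above can be summarized as a small decision function. This is a sketch only: the policy does not define a numeric criterion for "accurate," so the `rel_tol` parameter here is a placeholder that a Center would replace with its model's uncertainty bounds, and the function and return labels are hypothetical.

```python
def validation_decision(predicted, validation, rel_tol=0.30):
    """Disposition of an extrapolated prediction given a validation sample.

    rel_tol is a placeholder acceptance criterion (illustrative only).
    Returns "accept", "censor_greater_than", or "block".
    """
    if abs(validation - predicted) <= rel_tol * predicted:
        return "accept"               # prediction confirmed; keep as final data
    if validation > predicted:
        # Bias direction confirmed positive: report as greater than predicted.
        return "censor_greater_than"
    # Bias direction not confirmed: remove or block the values in NWIS.
    return "block"
```

With the memo's own examples, a prediction of 100 against a validation sample of 200 would be censored as >100, while a validation sample of 50 would lead to blocking.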
- Approved models may not be used to extrapolate more than 20 percent outside the range of the sample data used to fit the model until additional data are collected in that range and model and sensor performance are validated, or the model is refit using the new data. A minimum of one independent sample (at least two are recommended) must be collected outside the range of the existing data to confirm that the model is performing adequately. Model predictions made outside the two allowable areas of extrapolation described above may not be served in the real-time water-quality system and must be removed or blocked in NWIS until the existing model is validated with the new data or a new model is fit that includes data in that range. Once a new model is approved, it may be used to make predictions for any data previously blocked in NWIS.
- Approved models shall not be used for interpolating between time intervals for which the surrogate data are collected. That is, if surrogate data are collected at 30-minute intervals, the models may not be used to estimate at 15-minute intervals by averaging or otherwise interpolating between values of the surrogates collected at the longer time step.
- Approved models can be disseminated and archived on the NRTWQ Web site (http://nrtwq.usgs.gov/). These time series are displayed in plots, tables, statistical summaries, and duration curves. Site and model information are also displayed. Computed values also can be displayed using NWIS or another approved data-release method. As new models are employed, older models are archived along with the computed data. Models are numbered sequentially, and identifiers include the station number, constituent, year the model was approved, and model version number if needed (for example, 06892350.SSC.WY15.ver1).
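The identifier convention in the example (06892350.SSC.WY15.ver1) can be assembled mechanically. The helper below is a hypothetical convenience, not a USGS tool; it assumes the year component is the two-digit water year of approval.

```python
def model_id(station, constituent, water_year, version=None):
    """Assemble a model identifier such as 06892350.SSC.WY15.ver1.

    water_year is the four-digit year of approval; version is optional
    and appended only when a version number is needed.
    """
    parts = [station, constituent, f"WY{water_year % 100:02d}"]
    if version is not None:
        parts.append(f"ver{version}")
    return ".".join(parts)
```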
- Surrogate turbidity and streamflow data for computed concentrations and loads are considered category 1 data and are approved following the Continuous Records Processing (CRP) policy of the Water Mission Area (WRD Policy Numbered Memorandum 2010.02). Methods of collecting the surrogate measurements are documented in formal publication series such as USGS Techniques and Methods or the National Field Manual. Computed concentrations and loads are approved following CRP policy under category 3, which indicates that approval is to be completed within a year in most cases. All surrogate data must be stored in NWIS.
- Circumstances may arise in which historical data need revision. For example, if errors are discovered in explanatory data such as turbidity, then errors also exist in the computed suspended-sediment data. Until further guidance is provided, Centers must use good judgment in deciding which circumstances justify correction of historical data.

References
Babyak, M.A., 2004, What you see may not be what you get–A brief, nontechnical introduction to overfitting in regression-type models: Psychosomatic Medicine, v. 66, no. 3, p. 411–421.
Edwards, T.K., and Glysson, G.D., 1999, Field methods for measurement of fluvial sediment: U.S. Geological Survey Techniques of Water-Resources Investigations, book 3, chap. C2, 89 p., accessed March 7, 2016, at https://pubs.er.usgs.gov/publication/twri03C2.
Green, S.B., 1991, How many subjects does it take to do a regression analysis: Multivariate Behavioral Research, v. 26, no. 3, p. 499–510.
Harrell, F.E., Jr., 2001, Regression modeling strategies–With applications to linear models, logistic regression, and survival analysis: New York, Springer, 568 p.
Hipel, K.W., and McLeod, A.I., 1994, Modelling of Water Resources and Environmental Systems: Amsterdam, Elsevier, 1013 p.
Landers, M.N., Straub, T.D., Wood, M.S., and Domanski, M.M., 2016, Sediment acoustic index method for computing continuous suspended-sediment concentrations: U.S. Geological Survey Techniques and Methods, book 3, chap. C5.
Rasmussen, P.P., Gray, J.R., Glysson, G.D., and Ziegler, A.C., 2009, Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data: U.S. Geological Survey Techniques and Methods, book 3, chap. C4, 53 p.
U.S. Geological Survey, 2006, Collection of water samples (ver. 2.0): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A4, September 2006, accessed June 25, 2016, at http://pubs.water.usgs.gov/twri9A4/.
WRD Policy Numbered Memorandum No. 2010.02, Continuous Records Processing of all Water Time Series Data, accessed July 2016, at http://water.usgs.gov/admin/memo/policy/wrdpolicy10.02.html.