[link or image removed] indicates that a reference has been removed from this document in order to prevent the exposure of internal resources.

SW2016.07, WQ2016.10 Policy and guidance for approval of surrogate regression models for computation of time series suspended-sediment concentrations and loads

BACKGROUND



A surrogate measure is a measurement taken to gain insight into a variable that is either impractical to measure directly or impossible to measure at the desired continuous time interval. Where the causal relation is direct and uncomplicated, surrogate measurements can be nearly as useful as direct measurements, although the uncertainty associated with individual computed values generally is larger than that of discrete sample data. Increased temporal data richness could compensate for the larger uncertainty of computed data relative to laboratory results from actual samples.



In-situ turbidity, acoustic, and streamflow data, combined with discrete sample data, can be used to compute a time series of suspended-sediment concentrations and loads at stream sites. Two standard surrogate methods for computing time-series suspended sediment have been documented in U.S. Geological Survey (USGS) formal series Techniques and Methods reports. Rasmussen and others (2009) describe methods for developing regression models using in-situ turbidity and streamflow data, along with discrete samples of suspended-sediment concentrations. Landers and others (2016) describe sediment acoustic index methods for computing suspended-sediment concentrations. Both reports include detailed descriptions of methods for data collection, quality assurance and quality control, and data computation.
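As a minimal sketch of the kind of surrogate regression described above, the following fits a simple linear model in log10 space between turbidity and suspended-sediment concentration (SSC). The function names, the single-predictor model form, and the use of Duan's smearing estimator as the retransformation bias correction are assumptions made for illustration; the cited Techniques and Methods reports remain the authority on model form and diagnostics.

```python
import math


def fit_log10_regression(turbidity, ssc):
    """Fit log10(SSC) = b0 + b1 * log10(turbidity) by ordinary least squares."""
    x = [math.log10(t) for t in turbidity]
    y = [math.log10(c) for c in ssc]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard error in log10 units, with n - 2 degrees of freedom.
    se = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    # Assumed bias correction: Duan's smearing factor for a log10 model.
    bcf = sum(10 ** r for r in residuals) / n
    return b0, b1, se, bcf


def predict_ssc(turbidity_value, b0, b1, bcf):
    """Retransform a log10 prediction to concentration units."""
    return bcf * 10 ** (b0 + b1 * math.log10(turbidity_value))
```

Applied to a continuous turbidity record, `predict_ssc` would yield one computed concentration per turbidity reading; the calibration inputs are the paired discrete samples and concurrent turbidity values.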



This technical memorandum describes reliable methods to develop statistical models specifically for suspended-sediment concentrations and loads based on surrogate measurements. The technical information herein does not apply to surrogate models for concentrations or loads of other water-quality constituents. This policy will facilitate a more consistent and streamlined approach for developing, documenting, and approving surrogate suspended-sediment regression models. Data computed as described in this memorandum meet USGS requirements for noninterpretive information. Users can refer to the Office of Surface Water (OSW) Web site on Sediment for updated model examples, R scripts, and other tools as they become available.



PURPOSE



The purpose of this memorandum is to provide policy and guidance for developing and approving regression models used to compute high-temporal-frequency time series of suspended-sediment concentrations and loads from continuous turbidity or acoustic backscatter data and continuous streamflow data, so that the computed data can be published without the need for documentation in a Bureau-approved interpretive report. As such, computed data from a statistical surrogate model must meet the definition of data rather than interpreted data. This policy applies to surrogate approaches in which continuous (measurement interval of 1 hour or less) in-situ turbidity, acoustic, and/or streamflow data are used in combination with discrete suspended-sediment samples to develop regression models for computing similarly continuous suspended-sediment concentration or load data.
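The step from a computed concentration time series to a load time series can be sketched as follows. The sketch assumes concentration in milligrams per liter, streamflow in cubic feet per second, and the conventional factor 0.0027 for converting their product to short tons per day; the function names and the 15-minute default interval are illustrative and not part of this policy.

```python
SECONDS_PER_DAY = 86400
# Converts (mg/L) x (ft3/s) to short tons per day.
TONS_PER_DAY_FACTOR = 0.0027


def interval_loads(ssc_mg_per_l, streamflow_cfs, interval_seconds=900):
    """Sediment load, in tons, carried during each measurement interval."""
    fraction_of_day = interval_seconds / SECONDS_PER_DAY
    return [
        c * q * TONS_PER_DAY_FACTOR * fraction_of_day
        for c, q in zip(ssc_mg_per_l, streamflow_cfs)
    ]


def total_load(ssc_mg_per_l, streamflow_cfs, interval_seconds=900):
    """Total load, in tons, over the whole record."""
    return sum(interval_loads(ssc_mg_per_l, streamflow_cfs, interval_seconds))
```

For example, a constant 100 mg/L at 1,000 ft3/s sustained for a full day corresponds to about 270 tons.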



POLICY



Computed suspended-sediment concentration and load data qualify as “non-interpretive” when the following conditions are met:





The recommended steps for review and approval of the model and calibration dataset are:







Robert R. Mason, Chief, Office of Surface Water

Donna N. Myers, Chief, Office of Water Quality



Distribution: All WMA Employees



ATTACHMENTS  (For complete attachments Download PDF version [link or image removed])



  1. Policy and Guidance for Surrogate Suspended-Sediment Models
  2. Example Model Archive Summary for a Turbidity Suspended-Sediment Model
  3. Example Model Archive Summary for an Acoustic Suspended-Sediment Model


REFERENCES



Edwards, T.K., and Glysson, G.D., 1999, Field methods for measurement of fluvial sediment: U.S. Geological Survey Techniques of Water-Resources Investigations, book 3, chap. C2, 89 p., accessed March 7, 2016, at https://pubs.er.usgs.gov/publication/twri03C2.



Guy, H.P., 1969, Laboratory theory and methods for sediment analysis: U.S. Geological Survey Techniques of Water-Resources Investigations, book 5, chap. C1, 58 p.



Instructional Memorandum No. 2015.03, Review and approval of scientific data for release, accessed June 25, 2015, at http://www.usgs.gov/usgs-manual/im/IM-OSQI-2015-03.html

Landers, M.N., Straub, T.D., Wood, M.S., and Domanski, M.M., 2016, Sediment acoustic index method for computing continuous suspended-sediment concentrations: U.S. Geological Survey Techniques and Methods, book 3, chap. C5, at https://pubs.er.usgs.gov/publication/tm3C5



Office of Surface Water Technical Memorandum No. 2015.01, Policy and guidelines for archival of surface-water, groundwater, and water-quality model applications, accessed June 25, 2015, at http://water.usgs.gov/admin/memo/SW/sw2015.01.pdf

Office of Water Quality Technical Memorandum No. 2012.03, Update of policy on review and publication of discrete water data, accessed June 25, 2015, at http://water.usgs.gov/admin/memo/QW/qw12.03.pdf

Office of Water Quality Technical Memorandum No. 2015.01, Policy and guidelines for archival of surface-water, groundwater, and water-quality model applications, accessed June 25, 2015, at http://water.usgs.gov/admin/memo/QW/qw2015.01.pdf



Rasmussen, P.P., Gray, J.R., Glysson, G.D., and Ziegler, A.C., 2009, Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data: U.S. Geological Survey Techniques and Methods, book 3, chap. C4, 53 p., accessed June 25, 2016, at http://pubs.usgs.gov/tm/tm3c4/

Survey Manual 205.18, Authority to approve information products, accessed June 25, 2015, at http://www.usgs.gov/usgs-manual/200/205-18.html



Survey Manual 502.4, Fundamental Science Practices: Review, Approval, and Release of Information Products, accessed May 17, 2016, at http://www2.usgs.gov/usgs-manual/500/502-4.html



U.S. Geological Survey, 2006, Collection of water samples (ver. 2.0): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A4, September 2006, accessed June 25, 2016, at http://pubs.water.usgs.gov/twri9A4/.



Wagner, R.J., Boulger, R.W., Jr., Oblinger, C.J., and Smith, B.A., 2006, Guidelines and standard procedures for continuous water-quality monitors—Station operation, record computation, and data reporting: U.S. Geological Survey Techniques and Methods 1-D3, 51 p. at https://pubs.usgs.gov/tm/2006/tm1D3/pdf/TM1D3.pdf [link or image removed]



Attachment A. Requirements and Guidance for Surrogate Suspended-Sediment Models



The following information supplements the methods described by Rasmussen and others (2009) and Landers and others (2016) and provides additional policy implementation details. It combines requirements for meeting policy, emphasized using bold font, with guidance for best practices. In addition, the USGS National Training Center class “Environmental Statistics for Data Analysis,” QW1075, is recommended for data analysts, as is experience or training in the R programming language. The Surrogate Analysis and Index Developer Tool (SAID) is available for developing surrogate models, particularly using acoustic methods. SAID produces all of the information needed for the model archive summary (MAS), but not in the required MAS format. The MAS must follow the format described in this memorandum. An R script is available for this purpose at http://water.usgs.gov/osw/techniques/sediment.html, or the MAS can be prepared using tools of the user’s choice. Additional information may be appended at the end of the MAS if needed.



  1. The model development process is documented in a model archive summary (MAS) that includes a written summary of the decisions made during the model-development process; the model form; several diagnostic statistics and graphs that indicate adequate model fit, predictive ability, and uncertainty; a list and explanation of outliers and how they were handled in model development; and a link to the complete model calibration data set. An example MAS for turbidity and streamflow regression models is provided in Attachment B. An example MAS for acoustic methods is provided in Attachment C.
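The MAS contents listed above can be sketched as a simple record with a completeness check. This is only an illustrative checklist: the class name, field names, and the choice to treat an empty outlier list as acceptable are assumptions, and the sketch does not reproduce the required MAS format shown in Attachments B and C.

```python
from dataclasses import dataclass, field


@dataclass
class ModelArchiveSummary:
    decision_summary: str = ""       # decisions made during model development
    model_form: str = ""             # the fitted model equation
    diagnostics: dict = field(default_factory=dict)  # statistics and graphs
    outliers: list = field(default_factory=list)     # explained outliers; may be empty
    calibration_data_link: str = ""  # link to the complete calibration data set

    def missing_sections(self):
        """Return the names of required sections that are still empty."""
        required = {
            "decision_summary": self.decision_summary,
            "model_form": self.model_form,
            "diagnostics": self.diagnostics,
            "calibration_data_link": self.calibration_data_link,
        }
        return [name for name, value in required.items() if not value]
```

A reviewer could call `missing_sections()` before approval to confirm that no required section was left blank.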














Exceptions to this guidance may exist, including:



  1. In many situations, variations in streamflow are the greatest source of variability. These four additional samples should target high flows, which are often characterized by high suspended-sediment concentrations, and should include the rising limb, peak, and falling limb of the hydrograph.
    1. Targeted samples should be spread throughout the year unless the variability is limited to shorter periods, in which case the targeted samples should be spread throughout the period of variability. For example, for streams affected by storms throughout the year, the samples targeting storm runoff would be spread throughout the year. However, for streams where flow is affected mainly by snowmelt or pronounced wet and dry (or ephemeral) periods, the targeted samples may be spread only throughout the snowmelt or the wet and dry periods.
  2. Because validation samples are collected at least quarterly, models can be used for data estimation no longer than about 3 months after the last validation sample has been collected.
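The time limit on model use in the last item above can be sketched as a date check. The memorandum says "about 3 months"; the 92-day threshold and the function name below are assumptions chosen only for illustration.

```python
from datetime import date

# Assumption: "about 3 months" approximated as 92 days.
MAX_DAYS_SINCE_VALIDATION = 92


def model_is_current(last_validation_date, estimate_date,
                     max_days=MAX_DAYS_SINCE_VALIDATION):
    """True if estimate_date falls within max_days after the last validation sample."""
    elapsed_days = (estimate_date - last_validation_date).days
    return 0 <= elapsed_days <= max_days
```

For instance, with a validation sample collected on January 15, 2016, an estimate for March 1, 2016 would pass this check, while one for June 1, 2016 would not.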


5. As soon as practicable after each validation sample is collected, performance of the model with respect to that sample should be assessed. The residual from the predicted value associated with a validation sample should be assessed in the same manner as a residual from the model calibration data set. 





  1. If the residual from one validation sample has a value of about 2 to 3 standard errors from the predicted value, the operation of sensors and equipment providing input to the surrogate model should be checked to make sure nothing is malfunctioning. Anomalies in the watershed should also be investigated. The probability of a residual falling within this range from a well-fit model is about 0.01 to 0.05.
    1. If two consecutive validation samples each have a residual of about 2 to 3 standard errors from the predicted value, and both residuals have the same sign (both positive or both negative), check equipment again for malfunctions and check for anomalies in the watershed. If all equipment is operating correctly and no anomalies are found, then collect an additional validation sample at the next site visit or within 30 days, if possible under conditions similar to those when the samples with large residuals were collected. The probability of two independent consecutive residuals falling within the range of 2 to 3 standard errors from the predicted value from a well-fit model is about 0.0001 to 0.0025.
    1. If there is a third consecutive validation sample that has a residual value of 2 to 3 standard errors from the predicted value (with all three having the same sign, that is all three residuals being either positive or negative), then the model must be assumed to be flawed and must be refit using all data collected since the release of the previous model including the 3 recent validation samples with large residuals. The probability of three independent consecutive residuals falling within the range of 2 to 3 standard errors from the predicted value from a well-fit model is about 0.000001 to 0.000125.
    1. If the residual has a value greater than 3 standard errors from the predicted value and sensors and equipment are not malfunctioning and no watershed anomalies are found, then collect an additional validation sample at the next site visit or within 30 days. The probability of a residual being greater than 3 standard errors from the predicted value from a well-fit model is less than about 0.01.
    1. If there are two consecutive validation samples that have a residual value greater than 3 standard errors from the predicted value, regardless of the sign of the residual, then the model must be assumed to be flawed and must be refit using all data collected since the release of the previous model including the 2 recent validation samples with large residuals. The probability of two independent consecutive residuals both being greater than 3 standard errors from the predicted value from a well-fit model is less than about 0.0001.
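The validation rules above can be sketched as a small decision function operating on residuals expressed in standard errors. The names, the exact cutoffs of 2 and 3 (the text says "about"), and the handling of sequences the rules do not address (for example, a residual above 3 followed by one between 2 and 3, which is treated here as a single large residual) are assumptions for illustration only.

```python
NO_ACTION = "no action"
CHECK = "check sensors, equipment, and watershed"
CHECK_AND_RESAMPLE = "check, then collect another validation sample within 30 days"
REFIT = "refit model"


def standardized_residual(observed, predicted, standard_error):
    """Residual expressed in model standard errors (all in the model's units)."""
    return (observed - predicted) / standard_error


def band(z):
    """Classify a residual: 'ok' (below 2), 'large' (2 to 3), 'extreme' (above 3)."""
    size = abs(z)
    if size > 3:
        return "extreme"
    if size >= 2:
        return "large"
    return "ok"


def recommended_action(z_scores):
    """z_scores: standardized validation residuals, oldest first."""
    if not z_scores or band(z_scores[-1]) == "ok":
        return NO_ACTION
    latest = z_scores[-1]
    if band(latest) == "extreme":
        # Two consecutive extreme residuals trigger a refit regardless of sign.
        previous_extreme = len(z_scores) > 1 and band(z_scores[-2]) == "extreme"
        return REFIT if previous_extreme else CHECK_AND_RESAMPLE
    # Latest residual is 'large': count the unbroken run of large residuals
    # sharing its sign, walking back from the most recent sample.
    run = 0
    for z in reversed(z_scores):
        if band(z) == "large" and (z > 0) == (latest > 0):
            run += 1
        else:
            break
    if run >= 3:
        return REFIT
    return CHECK_AND_RESAMPLE if run == 2 else CHECK
```

The quoted probability ranges for consecutive residuals follow from multiplying the single-sample range by itself under independence: 0.01 to 0.05 squared gives 0.0001 to 0.0025, and cubed gives 0.000001 to 0.000125.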








































suspended sediment data. Until further guidance is provided, Centers must use good judgment in deciding which circumstances justify correction of historical data.

References



Babyak, M.A., 2004, What you see may not be what you get-A brief, nontechnical introduction to overfitting in regression-type models: Psychosomatic Medicine, v. 66, no. 3, p. 411-421.



Edwards, T.K., and Glysson, G.D., 1999, Field methods for measurement of fluvial sediment: U.S. Geological Survey Techniques of Water-Resources Investigations, book 3, chap. C2, 89 p., accessed March 7, 2016, at https://pubs.er.usgs.gov/publication/twri03C2.

Green, S.B., 1991, How many subjects does it take to do a regression analysis: Multivariate Behavioral Research, v. 26, no. 3, p. 499-510.



Harrell, F.E., Jr., 2001, Regression Modeling Strategies-With Applications to Linear Models, Logistic Regression, and Survival Analysis: Springer, New York, 568 p.



Hipel, K.W., and McLeod, A.I., 1994, Modelling of Water Resources and Environmental Systems: Amsterdam, Elsevier, 1013 p.



Landers, M.N., Straub, T.D., Wood, M.S., and Domanski, M.M., 2016, Sediment acoustic index method for computing continuous suspended-sediment concentrations: U.S. Geological Survey Techniques and Methods, book 3, chap. C5.

Rasmussen, P.P., Gray, J.R., Glysson, G.D., and Ziegler, A.C., 2009, Guidelines and procedures for computing time-series suspended-sediment concentrations and loads from in-stream turbidity-sensor and streamflow data: U.S. Geological Survey Techniques and Methods, book 3, chap. C4, 53 p.

U.S. Geological Survey, 2006, Collection of water samples (ver. 2.0): U.S. Geological Survey Techniques of Water-Resources Investigations, book 9, chap. A4, September 2006, accessed June 25, 2016, at http://pubs.water.usgs.gov/twri9A4/

WRD Policy Numbered Memorandum No. 2010.02, Continuous Records Processing of all Water Time Series Data, accessed July 2016, at http://water.usgs.gov/admin/memo/policy/wrdpolicy10.02.html


