Institute: Pennsylvania
Year Established: 2018
Start Date: 2018-03-01
End Date: 2019-02-28
Total Federal Funds: $21,657
Total Non-Federal Funds: $44,817
Principal Investigators: Zhenhui Li
Project Summary: Over the last decade and a half, improvements in hydrocarbon extraction such as high-volume hydraulic fracturing (HVHF) and horizontal drilling have increased shale gas production in several northeastern states. In the Marcellus gas play, which underlies eight populous northeastern states, shale gas development has been particularly rapid. This fast-paced development has occasionally impacted water quality, leading to increased public scrutiny of shale gas development and water quality in Pennsylvania (PA). Detecting contamination is difficult given variations in the background chemistry of both surface water and groundwater. At the same time, many state and federal agencies are collecting and publishing water chemistry data, including the PA Department of Environmental Protection (PA DEP), the U.S. Geological Survey (USGS), and the U.S. Environmental Protection Agency (EPA). Investigating such large datasets to distinguish contamination from background variation requires innovative methods. We propose to develop data mining techniques to analyze water chemistry data and shed light on contaminants such as methane (CH4) and heavy metals. In particular, we will build a data-driven machine learning model to describe correlations in water chemistry. Our key innovation is to interpret the models by proposing techniques to select features, to test clustering, and to explore embedding. As shown by our recent publications, our data-driven approach will demonstrate both natural and anthropogenic controls on subsurface water chemistry and can be generalized to other areas of environmental data analysis. Two types of deliverables are expected: 1) a data mining algorithm will be released for others to use; and 2) papers will be published summarizing the findings and detailing the data mining techniques and inferences.
These interactive approaches will benefit geological and computer scientists and will improve the public's understanding of water quality, even beyond issues related to shale gas development.