No. 522: Further development of the model for metals in stream water, MetalStat

Sørensen, P.B., Damgaard, C.F., Bjerg, P.L., Andersen, H.E., Holm, P.E., Bak, J.L., Rasmussen, D., Kjeldgaard, A., 2023. Aarhus Universitet, DCE – Nationalt Center for Miljø og Energi, 107 pp. Scientific Report No. 522
http://dce2.au.dk/pub/SR522.pdf

Summary

This report is part of a larger body of work for the Danish Environmental Protection Agency to develop models for estimating concentrations of environmentally hazardous pollutants in surface water. The starting point is a pilot version of MetalStat, a statistical model developed for the Danish Environmental Protection Agency to describe the nationwide occurrence of dissolved metals in running waters. The report thus builds on the description of MetalStat’s principles given by Sørensen et al., 2022. The model distributes the metal concentrations in running waters nationwide across the ID15 catchments in which the measuring stations are located; an ID15 catchment varies in area, typically 10-15 km². The task for 2022 was to validate and improve the pilot version of MetalStat so that it can be included in the Environmental Administration’s management. For computational reasons, in 2021 it was only possible to complete calculations for three metals (lead, cadmium and nickel), although extensive data also exist for copper and zinc. This was because the structure of the model was being adjusted right up to the end of 2021, and five metals require more calculation time than three. In this report, MetalStat has therefore been expanded with copper and zinc, so that it handles five metals in a multivariate statistical model that exploits the correlations between the metals to make optimal use of the monitoring data.

For the catchment areas where the metal concentration has been measured, the mean of the concentration on a log scale is estimated as a latent variable. As the mean value on the log scale corresponds to the median concentration, the main outcome from MetalStat is predictions of median concentrations. For those catchment areas that do not have measurements, MetalStat makes predictions of the median concentration, with an associated uncertainty interval, and is, thus, a model that covers the entire country.
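
The link between the log-scale mean and the median can be illustrated with a small sketch. It assumes that concentrations are log-normally distributed, and the sample, the log-scale mean of 0.5 and the log-scale standard deviation of 1.0 are hypothetical values chosen only for illustration; none of it is taken from MetalStat itself.

```python
import math
import random
import statistics

def median_from_log_mean(concentrations):
    """Back-transform the mean of the log-concentrations, exp(mean(log c))."""
    logs = [math.log(c) for c in concentrations]
    return math.exp(statistics.fmean(logs))

# Hypothetical log-normal sample (assumed log-scale mean 0.5 and sd 1.0).
rng = random.Random(1)
sample = [rng.lognormvariate(0.5, 1.0) for _ in range(20000)]

estimated_median = median_from_log_mean(sample)   # close to exp(0.5)
empirical_median = statistics.median(sample)      # also close to exp(0.5)
arithmetic_mean = statistics.fmean(sample)        # larger, pulled up by the tail
```

Under this assumption the back-transformed log-scale mean lands near the sample median, while the arithmetic mean is noticeably higher because of the long right tail.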

The estimated latent variables are fitted to a multivariate normal model, in which the effect of year is quantified. All estimates and predictions are calculated as probability distributions for every ID15 catchment. The uncertainty of a prediction can therefore be given as a 95% uncertainty interval (there is a 95% probability that the correct value lies within the range). It is also possible to predict the probability that concentrations in an ID15 catchment exceed environmental quality standards. The size of the model’s uncertainty depends on the amount of monitoring data applied in the model. MetalStat is thus expected to predict with significantly less uncertainty once all data from the period 2020-2022, which are not included in the model in this report, are applied in the next version of the model. This will mean that the upper limits of the predicted 95% uncertainty intervals will be reduced. However, it has not been possible within the time frame of this report to update MetalStat with all measurements from the period 2011-2022.
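
As a minimal sketch of how such summaries can be read off a probability distribution, the following assumes that the prediction for one catchment is available as a set of draws of the log-scale median. The function name, the simulated draws and the quality standard of 1.2 (in arbitrary concentration units) are hypothetical and only serve the illustration; the report does not specify how MetalStat stores or summarizes its distributions.

```python
import math
import random

def summarize_median_draws(log_median_draws, quality_standard):
    """Return (lower, upper, p_exceed) for one catchment.

    lower and upper are the empirical 2.5% and 97.5% quantiles of the
    median concentration; p_exceed is the fraction of draws in which the
    median concentration is above the quality standard.
    """
    medians = sorted(math.exp(x) for x in log_median_draws)
    n = len(medians)
    lower = medians[int(0.025 * (n - 1))]
    upper = medians[int(0.975 * (n - 1))]
    p_exceed = sum(m > quality_standard for m in medians) / n
    return lower, upper, p_exceed

# Hypothetical draws: log-scale median centred on log(1.0) with sd 0.4.
rng = random.Random(7)
draws = [rng.gauss(0.0, 0.4) for _ in range(10000)]
lower, upper, p_exceed = summarize_median_draws(draws, quality_standard=1.2)
```

With more monitoring data the draws would be expected to concentrate, which narrows the interval and lowers its upper limit.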

MetalStat can document the importance of different metal sources in relation to the total share of metal concentrations in watercourses, which can be explained by the attributes of each ID15 catchment. MetalStat will also be able to support identification of the proportion of natural mineralogical sources that contribute to the metal concentration.

If an ID15 catchment contains no measuring stations, and therefore appears as unmeasured, but exchanges water with another ID15 catchment in which there are measurements, then those measurements contribute knowledge about the concentration level in the unmeasured catchment, because some of the water in the measured catchment also occurs in the unmeasured one. MetalStat exploits this through a mass balance correction that improves the certainty of the model’s predictions, so that measurements in measured ID15 catchments can contribute knowledge in other ID15 catchments without measurements.
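
This summary does not give the form of the mass balance correction, so the following is only a generic illustration of the underlying idea: when water from one catchment flows into another, the resulting concentration is a flow-weighted mix of the incoming water and the local contribution. The function name, the flows and the concentrations are hypothetical.

```python
def mixed_concentration(q_upstream, c_upstream, q_local, c_local):
    """Flow-weighted concentration after mixing upstream and local water.

    Assumes conservative mixing (no retention or release of metal), which
    is a simplification made for this sketch only.
    """
    total_flow = q_upstream + q_local
    if total_flow <= 0:
        raise ValueError("total flow must be positive")
    return (q_upstream * c_upstream + q_local * c_local) / total_flow

# Hypothetical example: 3 flow units at concentration 2.0 from a measured
# catchment mix with 1 flow unit at concentration 4.0 generated locally.
downstream = mixed_concentration(3.0, 2.0, 1.0, 4.0)  # (6 + 4) / 4 = 2.5
```

The larger the share of shared water, the more a measurement in one catchment constrains the concentration in the other.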

MetalStat includes an X matrix that collects information from GIS about each catchment area (see Sørensen et al., 2022). As also described in Sørensen et al., 2022, it will be useful to continuously evaluate and improve the X matrix as more measurements and new GIS data sources become available. A coefficient of determination, R², is calculated on log-transformed concentrations based on how well the X matrix, together with seasonal variation, can estimate the latent concentration in the measured catchments. For four of the five metals, R² was in the range 0.19-0.3, while nickel stood out with a low value of 0.06. Thus, there is clearly room for improvement, of the X matrix in particular. It is the project group’s assessment that the first step towards an improvement is to include the groundwater contribution in more detail. A small value of R² means that MetalStat describes most of the variation in concentration values on a logarithmic scale partly as random effects between catchments and years, and partly as residuals that vary randomly from one measurement to the next.
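
For orientation, a coefficient of determination on log-transformed concentrations can be computed as in the sketch below. It uses the standard definition, one minus the ratio of residual to total sum of squares; whether MetalStat uses exactly this definition is an assumption, and the function name and the example values are hypothetical.

```python
import math

def r_squared_log(observed, predicted):
    """R² between observed and predicted concentrations on the log scale."""
    log_obs = [math.log(v) for v in observed]
    log_pred = [math.log(v) for v in predicted]
    mean_obs = sum(log_obs) / len(log_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(log_obs, log_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in log_obs)
    if ss_tot == 0:
        raise ValueError("observed values show no variation on the log scale")
    return 1.0 - ss_res / ss_tot

# Hypothetical concentrations for four catchments.
observed = [1.0, 2.0, 4.0, 8.0]
geometric_mean = math.exp(sum(math.log(v) for v in observed) / len(observed))

r2_perfect = r_squared_log(observed, observed)                            # 1.0
r2_constant = r_squared_log(observed, [geometric_mean] * len(observed))   # about 0
```

A predictor that only returns the overall level explains none of the variation between catchments, which is the situation a low R² points towards.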

New measurements from the period 2020-2022 have been used to evaluate MetalStat for the five metals. There was an intensive measurement program during this period, with approximately twice as many samples collected in the period 2020-2022 as in the entire period 2011-2019. These measurements cover approximately 400 ID15 catchments. A descriptive statistical analysis of the measurements from the period 2020-2022 that were above the detection limit generally indicates a lower concentration level than in the measurements for the period 2011-2019, especially for zinc. MetalStat is not necessarily challenged by annual shifts in concentration levels, provided that these are not much greater than those occurring in the period 2011-2019. However, the subsequent analysis of the individual metals shows that the annual shift challenges MetalStat for zinc, but not for the other four metals.

From MetalStat, a probability density function was generated to map the realistic interval of the median concentration for each ID15 catchment that was measured during the period 2020-2022. Only in very few cases were the newly measured concentrations considerably outside the interval predicted by MetalStat. A closer examination of the individual major deviations, especially for lead, shows that in a few cases the selected locations of NOVANA stations may not be fully representative of an ID15 catchment area of 10-15 km². For zinc, MetalStat generally overpredicts the uncertainty interval because of the sharp general decrease in concentration levels between the period 2011-2019 and the period 2020-2022. When MetalStat is updated with the measurements from the period 2020-2022, it will therefore be advantageous to estimate the variables for zinc from the year 2020 onwards as a new latent variable, so that the model contains two versions of the latent variables for zinc, one before 2020 and another from 2020.

Additional technical improvements have been made in the present version of MetalStat relative to the version reported in Sørensen et al., 2022. In particular, the use of a local likelihood function in each catchment for the latent concentrations has reduced the calculation time. A calculation step that tests a value of a latent concentration now only needs to calculate on the ID15 catchment area in which that latent concentration is estimated, without recalculating all the other catchment areas. In the effort to optimize the X matrix, so-called “horseshoe priors” are being tested; these are expected to identify the most important factors with less calculation time, but this test had not been completed at the release of this report.
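
Why a local likelihood saves work can be sketched as follows. The sketch assumes that, given its own latent value, each catchment’s log-measurements are independent of all other catchments and follow a normal distribution with a common residual standard deviation. This is a simplification for illustration (it ignores the prior and any coupling between catchments), and all names and numbers are hypothetical.

```python
import math

def local_log_lik(latent, log_measurements, sigma):
    """Gaussian log-likelihood of one catchment's log-measurements."""
    norm_const = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((y - latent) / sigma) ** 2 - norm_const
               for y in log_measurements)

def total_log_lik(latents, data, sigma):
    """Sum of the local log-likelihoods over all catchments."""
    return sum(local_log_lik(latents[k], data[k], sigma) for k in data)

def delta_log_lik(k, proposal, latents, data, sigma):
    """Change in the total when only catchment k's latent value changes."""
    return (local_log_lik(proposal, data[k], sigma)
            - local_log_lik(latents[k], data[k], sigma))

# Hypothetical log-measurements and current latent values for two catchments.
data = {"a": [0.1, 0.3], "b": [1.0, 1.2, 0.9]}
latents = {"a": 0.2, "b": 1.0}
sigma = 0.5

proposed = dict(latents, a=0.35)
delta_full = total_log_lik(proposed, data, sigma) - total_log_lik(latents, data, sigma)
delta_local = delta_log_lik("a", 0.35, latents, data, sigma)
```

Both differences agree up to rounding, but the local version touches only catchment "a", so its cost does not grow with the number of catchments.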

During the preparation of this report, it became clear that an operating version of MetalStat would not be finalized for administrative use in 2022, and that this work will instead be attempted in 2023. However, the coding of the operating version has been planned in order to speed up the work. During this planning, a clear plan was drawn up to parallelize MetalStat so that computation is accelerated by using multiple CPU cores. Thus, ongoing development is minimizing the calculation time, partly by calculating smarter, and thereby reducing the need for calculations, and partly by increasing the available computing power.

The most obvious next improvement of the X matrix is a better description of the groundwater contribution, in order to better capture the geogenic contribution that is not due to human activity. Such an improvement should be made as part of the development of an operating version of MetalStat. It is recommended that existing studies that relate metals in groundwater to geology, pH and redox conditions are taken as a starting point for extending the X matrix. Since the relationship between these attributes and the metal concentration in groundwater is highly non-linear, it is best to use a model that uses cut-off values rather than assuming a continuous function between pH and redox on one side and the metal concentration on the other. Therefore, it is recommended that an initial analysis identifies correlations between metal concentration in groundwater, pH, redox and geology, which will constitute input to the description with the X matrix in MetalStat. The first step will be to ensure access to relevant GIS layers in cooperation with relevant institutions.
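
One way cut-off values could enter an X matrix is as indicator columns, one per class, instead of a single continuous column. The sketch below shows this for pH; the function name and the cut-offs 5.5 and 7.0 are hypothetical placeholders, not values proposed in the report.

```python
from bisect import bisect_right

def cutoff_indicators(value, cutoffs):
    """Encode a value as 0/1 indicators for the classes split by `cutoffs`.

    `cutoffs` must be sorted in ascending order. A value equal to a
    cut-off is assigned to the class above it.
    """
    class_index = bisect_right(cutoffs, value)
    return [1 if i == class_index else 0 for i in range(len(cutoffs) + 1)]

# Hypothetical pH cut-offs giving three classes: below 5.5, 5.5 to 7.0, 7.0 and above.
ph_cutoffs = [5.5, 7.0]
row_acidic = cutoff_indicators(5.0, ph_cutoffs)   # [1, 0, 0]
row_middle = cutoff_indicators(6.2, ph_cutoffs)   # [0, 1, 0]
row_neutral = cutoff_indicators(7.0, ph_cutoffs)  # [0, 0, 1]
```

Each class then gets its own coefficient, so the model can represent abrupt changes in metal concentration across a threshold without assuming a smooth relationship.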

Contaminated sites could also be included in the X matrix, but this is not considered as important as the groundwater contribution for the five metals in MetalStat, which is why this activity should be prioritized after the inclusion of groundwater. With these reservations, this report proposes a model for contaminated sites that could be built into MetalStat, based on an existing screening model for the potential impact of contaminated sites on water flows.