Prior to publishing analysis and interpretation of water quality data, we will ensure that all data meets the QA/QC standards outlined in the current project Quality Assurance Project Plan (QAPP) and is accessible in the appropriate public repository.
Water quality data from this project is ultimately destined for the Environmental Protection Agency’s Water Quality Exchange (EPA WQX).
The Quality Assurance Project Plan (QAPP) for this project describes data management details and responsible parties for each step of the data pipeline from observation to repository. The 2021 data preparation and review process is published in this appendix as an example of the process applied annually to each year’s data. QA/QC for other project years may be accessed at https://kenai-watershed-forum.github.io/kenai-river-wqx-qaqc/.
In this appendix we will collate 2021 laboratory data from several sources into a single spreadsheet document with a consistent format. The desired end format is a spreadsheet template provided by the EPA Water Quality Exchange. These template files are available to download from the EPA at https://www.epa.gov/waterdata/water-quality-exchange-web-template-files.
Once the data is collated, it will be evaluated according to a Quality Assurance Checklist (template example provided by the Alaska Department of Environmental Conservation Soldotna office). Field observations that do not meet the quality assurance standards described in the evaluation checklist will be flagged before being uploaded to the EPA WQX.
Data that has been uploaded to the EPA WQX is evaluated every two years by the Alaska Department of Environmental Conservation (ADEC) in their Integrated Water Quality Monitoring and Assessment Report (https://dec.alaska.gov/water/water-quality/integrated-report/). The integrated report evaluates available water quality data from the previous five years against Alaska water quality standards and regulations (ADEC 2020).
A.1.1.1 2021 Water Quality Data AQWMS Formatting
The code scripts in this document assemble water quality data from the three analytical laboratories that partnered with Kenai Watershed Forum for this project in 2021:
SGS Laboratories, Anchorage, AK
Soldotna Wastewater Treatment Plant, Soldotna, AK
Tauriainen Engineering & Testing, Soldotna, AK
A.1.2 2021 Provisional Results, Prior to QA/QC Review
Results last updated 2024-11-05
The above data sources have been collated into a single .csv file (available for download) in a format compatible with the EPA Water Quality Exchange. These data have not yet been evaluated against QA/QC standards following guidance in the current project Quality Assurance Project Plan.
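As an illustration, a minimal sketch of the collation step, assuming each laboratory’s results have already been read in and standardized to a shared column layout (object, column, and file names here are hypothetical):

```r
# Minimal sketch: combine standardized per-laboratory tables into one csv.
# `sgs`, `swwtp`, and `tauriainen` are assumed to be data frames already
# cleaned to shared column names (hypothetical).
library(dplyr)
library(readr)

collated <- bind_rows(sgs, swwtp, tauriainen) %>%
  # retain the columns used by the EPA WQX results template
  select(monitoring_location_id, activity_start_date, activity_start_time,
         characteristic_name, result_value, result_unit, result_qualifier)

write_csv(collated, "other/output/2021_collated_results.csv")
```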
Prior to upload to the EPA WQX, all water quality data must be checked against a series of standard questions in order to evaluate whether quality assurance / quality control (QA/QC) requirements are met. The draft Data Evaluation Checklist Template (available for download below) outlines these questions:
Note: the sequence of questions is structured such that data will be gradually flagged at each step based on the criteria described.
1.) Were the appropriate analytical methods used for all parameters?
Yes. Analytical methods from the approved 2020 QAPP were employed.
2.) Were there any deviations from the sampling plan?
All sites were visited as planned on 5/11/2021 and 7/27/2021. Most intrinsic water quality parameters measured with instruments (pH, dissolved oxygen, conductivity, turbidity) were not measured, which is a deviation from the QAPP.
3.) Were field duplicates, blanks, and/or other QC samples collected as planned?
To see a table comparing planned vs actual results for 2021, download and view the csv file linked above.
From the above table we can see that there are deviations between planned results and the actual results available. The reasons for the deviations are known and are attributable to two causes:
Cause 1: The Spring 2021 Chain of Custody (COC) from KWF to SGS was completed erroneously. The COC specified EPA Method 200.8 analyses (dissolved metals) for all sites (when they should have stopped upstream of Morgan’s Landing RM31), and it specified EPA Method 200.7 analyses (total metals) to stop upstream of Morgan’s Landing (when they should have been performed for all sites in the project).
As a result, Spring 2021 total metals data are unavailable for sites upstream of the Morgan’s Landing RM31 site.
Cause 2: For Summer 2021, the SGS laboratory performed the EPA Method 200.8 analyses (dissolved metals) for all 27 analytes available for the method, rather than only the smaller subset of analytes requested. (In effect, KWF received extra data at no cost; in this case there are no consequences of deviating from the planned analyses.)
4.) Do the laboratory reports provide results for all sites and parameters?
The laboratory reports provide results for all sites and all parameters, with the exceptions outlined above in question #3.
5.) Is a copy of the Chain of Custody included with the laboratory reports?
We worked with three separate laboratories in 2021:
SGS Laboratories, Anchorage, AK
Chain of Custody documents are included within the PDF laboratory reports linked earlier in this appendix.
Soldotna Wastewater Treatment Plant, Soldotna, AK
Chain of Custody documents are on file with Kenai Watershed Forum for fecal coliform and total suspended solids for 5/11/2021, and for total suspended solids on 7/27/2021.
Tauriainen Engineering & Testing, Soldotna, AK
An individual document for each sample reports the time and date of delivery and analysis. These documents are included with the PDF laboratory reports linked earlier in this appendix.
6.) Do the laboratory reports match the Chain of Custody and requested methods throughout?
The laboratory reports match the Chain of Custody and requested methods, with the one exception discussed in question #3.
For summer 2021, the SGS laboratory performed the EPA Method 200.8 analyses for all 27 analytes available for the method, rather than only the smaller subset of analytes requested. (In effect, KWF received extra data at no cost; in this case there are no consequences of deviating from the planned analyses.)
7.) Are the number of samples on the laboratory reports the same as on the Chain of Custody?
The quantity of sample bottles sent to the laboratories matches the number of analyzed samples for samples collected and delivered on 5/11/2021 and 7/27/2021.
8.) Was all supporting info provided in the laboratory report, such as reporting limits for all analyses and definitions?
We worked with three separate laboratories in 2021:
SGS Laboratories, Anchorage, AK
SGS provided data as PDFs which included reporting limits, as well as in the form of an Electronic Data Deliverable where this information is also included in column format.
Soldotna Wastewater Treatment Plant (SWWTP), Soldotna, AK
SWWTP provided data in the form of .xls files. Reporting limits are described in this project’s current Quality Assurance Project Plan.
Tauriainen Engineering & Testing, Soldotna, AK
Tauriainen provided data in the form of PDF documents. Reporting limits are described in this project’s current Quality Assurance Project Plan.
9.) Are site names, dates, and times correct and as expected?
Yes, after post-season correction documented in this report. Notes: In 2021 Kenai Watershed Forum used pre-printed waterproof labels on all sample bottles, reducing opportunity for field and lab transcription errors. Remaining site name transcription errors from laboratories were corrected in post-season data review, using scripted code that did not modify the original report.
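A minimal sketch of that scripted correction, assuming a manually built matching table pairing each laboratory spelling with the canonical site name (file, object, and column names hypothetical):

```r
library(dplyr)
library(readr)

# Hypothetical matching table: one row per laboratory spelling, paired with
# the canonical monitoring location name and ID.
site_match <- read_csv("other/input/site_names_matching_table.csv")
# columns: lab_site_name, monitoring_location_name, monitoring_location_id

# Join the corrected names onto a working copy of the results;
# the original laboratory report files are never modified.
results_corrected <- results %>%
  left_join(site_match, by = c("sample" = "lab_site_name"))
```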
10.) Were there any issues with instrument calibration?
Instruments to measure intrinsic water quality parameters (sondes; to measure pH, dissolved oxygen, conductivity, turbidity) were not employed in 2021.
Teams did use hand-held probes to record water temperature on-site. Prior to field use, the hand-held probes were verified as measuring within the accuracy level defined in the QAPP using an ice bath in the Kenai Watershed Forum laboratory.
11.) Did the instruments perform as expected?
The hand-held water temperature probes performed as expected in 2021.
12.) Was instrument calibration performed according to the QAPP and instrument recommendations?
Water temperature is a parameter that is “verified” rather than calibrated. The hand-held water temperature probes were verified as measuring within the accuracy level defined in the QAPP using an ice bath in the laboratory, according to instrument recommendations.
13.) Was instrument verification during the field season performed according to the QAPP and instrument recommendations?
The hand-held water temperature probes were verified as measuring within the accuracy level defined in the QAPP using an ice bath in the laboratory, according to instrument recommendations.
14.) Were instrument calibration verification logs or records kept?
Yes. These records are held at Kenai Watershed Forum, 44129 Sterling Hwy, Soldotna, AK.
15.) Do the instrument data files’ site IDs, time stamps, and file names match?
Instrument files were not employed in 2021. Measurements from hand-held probes were recorded on waterproof paper field forms.
16.) Is any in-situ field data rejected, and why?
No in-situ data from 5/11/2021 or 7/27/2021 is rejected.
17.) Were preservation, hold time and temperature requirements met?
Yes. Summer and Spring 2021 holding time requirements were met for all samples. See downloadable files below. Laboratory result documents indicated no compromises of preservation and temperature requirements.
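A minimal sketch of a holding-time check, assuming the collated table carries collection and analysis dates and that maximum holding times are tabulated by method (all names and limits here are illustrative assumptions, not QAPP values):

```r
library(dplyr)
library(tibble)

# Illustrative lookup of maximum holding times (days) by analytical method
hold_limits <- tribble(
  ~analytical_method, ~max_hold_days,
  "200.8",            180,
  "SM21 4500P-B,E",    28
)

hold_check <- results %>%
  left_join(hold_limits, by = "analytical_method") %>%
  mutate(hold_days = as.numeric(run_date - collect_date),
         hold_met  = hold_days <= max_hold_days)

# rows exceeding the holding time (none expected for 2021)
filter(hold_check, !hold_met)
```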
18.) Are dissolved metal quantities less than total metals quantities?
Dissolved metals results are not consistently less than total metals results, as would be logically anticipated (see the excerpt from the Flowlink Environmental website below for description and rationale):
“Dissolved metal concentration is determined by filtering a water sample through 0.45 uM filter. Water that passes through the filter is analyzed for metals and the result is reported as dissolved metal concentration. Total metal concentration is determined by analyzing the unfiltered sample.
“Total metal concentration = Dissolved metal concentration + Particulate (insoluble) metal concentration.”
In 2021, results are available for both dissolved and total metals for three elements (Ca, Fe, and Mg) at sites at and downstream of RM 31.
Dissolved metals results are available for 7/27/2021 (summer) only, and not for 5/11/2021 (spring); see question #3 in this sequence.
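A minimal sketch of this consistency check, assuming a long-format results table in which a “result sample fraction” column distinguishes “Dissolved” from “Total” (object and column names hypothetical):

```r
library(dplyr)
library(tidyr)

metals_check <- results %>%
  filter(characteristic_name %in% c("Calcium", "Iron", "Magnesium")) %>%
  select(monitoring_location_id, activity_start_date,
         characteristic_name, result_sample_fraction, result_value) %>%
  # one column per sample fraction, per site/date/element
  pivot_wider(names_from  = result_sample_fraction,
              values_from = result_value) %>%
  # flag the logically unexpected case: dissolved exceeding total
  mutate(dissolved_exceeds_total = Dissolved > Total)

filter(metals_check, dissolved_exceeds_total)
```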
Results:
Calcium: all available observations show dissolved metals > total metals
Iron: all available observations show dissolved metals < total metals
Magnesium: nearly all available observations show dissolved metals < total metals, with the exception of three sites, all near the Kenai River mouth
As of Fall 2023 it is unclear why the measured dissolved metals values are greater than the total metals values in the cases described above.
See discussion XX in “Summary” section of this report for plans in future years to address the issue of dissolved metals values being greater than total metals values in some cases.
19.) Are the duplicate sample(s) RPD (Relative Percent Difference) within range described in QAPP?
In 2021, field duplicate samples were collected at two sites on both 5/11/2021 and 7/27/2021:
No Name Creek (Tributary, RM0)
Funny River (Tributary, RM30)
Following the example in the Alaska Department of Environmental Conservation 2021 Kenai River Metals Field Report (Apsens and Petitt 2022): “… a set of paired samples was evaluated for RPD only if:
a.) one or both of the samples were above the Limit of Quantitation (LOQ); AND if
b.) one or both of the samples were at least two times the LOQ.”
A csv file of Relative Percent Difference values organized by site, date, and parameter may be downloaded below:
From the above downloadable table, we can see that a limited subset of paired values from field duplicates are eligible to be used for RPD calculations for 2021. Pairs of measurements must meet LOQ criteria described above (“a” and “b”) in order to be used. For 2021 data, of 69 paired field duplicate observations, 24 of them meet these criteria and thus may be used for RPD calculations.
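For a duplicate pair, RPD is the absolute difference divided by the pair mean, expressed as a percent: RPD = |x1 − x2| / ((x1 + x2)/2) × 100. A minimal sketch of the calculation and the LOQ eligibility screen described above (table and column names hypothetical):

```r
library(dplyr)

# `rpd_pairs` is assumed to hold one row per field duplicate pair, with the
# primary result, the duplicate result, and the limit of quantitation (loq).
rpd <- rpd_pairs %>%
  mutate(
    # eligibility criteria "a" and "b" quoted above
    eligible = (primary > loq | duplicate > loq) &
               (primary >= 2 * loq | duplicate >= 2 * loq),
    # relative percent difference, only for eligible pairs
    rpd_pct = if_else(eligible,
                      abs(primary - duplicate) /
                        ((primary + duplicate) / 2) * 100,
                      NA_real_)
  )
```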
RPD values range from 0% to 165.22%.
In the 2020 Quality Assurance Project Plan, Relative Percent Difference goals for most parameters are set at 20%, with the exceptions of “Phosphorus, total” and “Nitrates (NO2+NO3)” at 25%, and “Suspended solids, total” at 5%.
Using the available RPD values against the above criteria, four paired observations exceed project precision goals in 2021. Three of these four observations are for fecal coliform, and one is for total phosphorus.
Most other parameters have at least one available RPD value that is within project precision goals defined in the QAPP.
The parameters that do not have at least one RPD calculation available, due to the LOQ constraints described above, are the following:
Arsenic
Cadmium
Chromium
Copper
Lead
Zinc
Other notes
In 2021 the full suite of dissolved metals was run for method 200.8 on samples from 7/27/2021. However, we intend to submit data only for parameters that had a pre-defined QA plan as outlined in the QAPP, which is the case for the following dissolved metals: Mg, As, Cd, Cr, Cu, Pb, and Zn.
Summary of Discussion on RPD (Relative Percent Difference) Values for 2021
We attempted to calculate RPD values for all parameters collected in 2021 at two field sites on two dates. For each pair of duplicate observations, one of the following descriptions applies:
It was possible to calculate RPD values because:
a.) one or both of the samples were above the LOQ; AND one or both of the samples were at least two times the LOQ
Project data for this parameter will be submitted unflagged to EPA WQX if RPD values meet project goals outlined in the QAPP
It was not possible to calculate RPD values because:
a.) one or both of the samples were not above the LOQ; OR one or both of the samples were not at least two times the LOQ
Project data for this parameter will be submitted unflagged to EPA WQX if both paired values are non-detect
Best practices for interpreting and applying RPD data in order to flag or not flag the overall dataset are currently being finalized. KWF communicated with the Alaska Department of Environmental Conservation on August 22, 2023 via an inquiry on this topic. The resulting memo includes a visual flow chart of the logic for when data is flagged, and is available for download below.
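Pending that final guidance, a minimal sketch of the flagging logic described above, written with dplyr::case_when (object and column names hypothetical; `rpd_goal` stands in for the per-parameter precision goal from the QAPP):

```r
library(dplyr)

results_flagged <- results %>%
  left_join(rpd, by = c("activity_start_date", "characteristic_name")) %>%
  mutate(flag = case_when(
    # RPD calculable and within the QAPP precision goal: unflagged
    eligible & rpd_pct <= rpd_goal        ~ FALSE,
    # RPD calculable but exceeding the precision goal: flagged
    eligible & rpd_pct > rpd_goal         ~ TRUE,
    # RPD not calculable, but both paired values non-detect: unflagged
    !eligible & primary_nd & duplicate_nd ~ FALSE,
    # all remaining cases flagged pending further guidance
    TRUE                                  ~ TRUE
  ))
```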
Final determinations on flagging / not flagging 2021 data submitted to the EPA WQX
All 2021 field results will be submitted unflagged to the EPA WQX unless discussed below.
Fecal Coliform
Given that 3 out of 4 RPD values for fecal coliform are >60%, fecal coliform field data values in the 2021 dataset may vary from actual environmental conditions. The 2020 KWF QAPP does not specify an RPD value for fecal coliform; it only specifies control checks for sterility and temperature. We used Method 9222D (drinking water) for fecal coliform analysis, which does not specify an RPD value. However, EPA general recommendations for pathogens in sewage sludge recommend an RPD of 20% (link).
Given the above considerations, KWF will submit 2021 fecal coliform data to the EPA WQX as “flagged.” Fecal coliform data will be evaluated in this report and assessed against state standards, but given the high RPD values among duplicate samples, 2021 data will likely not be considered in the context of potential regulatory exceedances.
TO DO: Calculate and visualize RPD values for 2014-2022 and compare; also compare to RPD values in the previous comprehensive report.
Dissolved Metals
Because the dissolved metals parameters listed in question #19 do not have an associated RPD value from 2021 field data, they will be submitted to EPA WQX as “flagged.”
Note: beginning in Summer 2022, KWF initiated a series of changes to the QAPP that will better facilitate QA/QC decisions regarding total metals and dissolved metals results. These changes include use of a dissolved metals field blank, and a change from field filtration to lab filtration as standard practice.
20.) Were there any laboratory discrepancies, errors, data qualifiers, or QC failures (review laboratory duplicates, matrix spikes and blanks)?
Laboratory discrepancies and QC failures from the SGS lab are described on page 2 of the spring and summer PDF document reports, which are available for download below.
The following discrepancies were identified in the results from SGS laboratories:
Matrix Spike Recoveries
A total of 6 matrix spike or matrix spike duplicate samples are outside of QC criteria. The allowable recovery range for the analyte “Total Nitrate/Nitrite-N” is 90% - 110%; for the matrix spike measurements outside of the QC criteria, recovery levels range from 112% - 118%. The subset of sample values that exceed the threshold may be downloaded as a table below:
No additional laboratory QA/QC anomalies for any other parameters were noted by any laboratories contracted in 2021. Additional details are available upon request at hydrology@kenaiwatershed.org
As of 2024-11-05, “Total Nitrate/Nitrite-N” values will be flagged for the spring 2021 dataset. (Spring only and not summer, because no matrix spike recovery exceedances were observed in the summer 2021 lab results.) The matrix spike recovery exceedances indicate that reported lab values may be 2% - 7% above actual field values. At a later date we will evaluate these results in the context of year 2000 - present data to help evaluate the relevancy of this flag.
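A minimal sketch of how such recoveries can be screened against QC limits, assuming the segregated laboratory QA table carries a sample type code and a percent recovery column (object and column names hypothetical):

```r
library(dplyr)

spike_exceedances <- lab_qc %>%
  # matrix spikes ("MS") and matrix spike duplicates ("MSD")
  filter(sample_type %in% c("MS", "MSD"),
         analyte == "Total Nitrate/Nitrite-N",
         # recoveries outside the 90% - 110% acceptance window
         percent_recovered < 90 | percent_recovered > 110)
```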
21.) Is any laboratory data rejected and why?
No additional 2021 laboratory data currently indicates a need for flagging or rejection.
22.) During the field season, review raw data files (EDDs [Electronic Data Deliverables], instrument records) as they are received. Document changes and corrections to methods as needed.
In 2021, KWF made it standard practice to receive machine-readable EDD files from contracted laboratories, in addition to PDF results. The EDD files are being reviewed as part of this QA/QC process, which was initiated in Winter 2022.
23.) Is the dataset complete and did you receive the expected number of results?
From the discussion and results in question #3, deviations are evident between planned and actual reported results. The reasons for the deviations are known and are attributable to the two factors discussed in question #3, and thus are not concerning to project integrity.
24.) Was the data collected representative of environmental conditions?
The data collected in this project is representative of seasonal hydrological conditions on two fieldwork days in 2021. Conditions vary from moment to moment; thus this dataset serves to provide representative longitudinal snapshots of the watershed across space on the two dates.
Flagged data indicates that lab results may not be representative of environmental conditions, and thus may require additional interpretation or exclusion from analysis.
25.) Does project meet Completeness Measure A criteria?
From the QAPP, Completeness Measure A: “the primary number of samples collected divided by the usable number of samples submitted to ADEC, with a goal of 85% completeness.” (Note that as of 2023 results are now submitted directly to EPA rather than DEC, per request of DEC).
(Note: metals parameters not specified in the QAPP are excluded from these analyses. In 2021, SGS laboratories provided dissolved metals results for parameters beyond what was originally requested.)
We calculated Completeness Measure A at the following project scales:
For individual parameters (analysis methods)
For individual sites
As a project whole
Based on the data flagging decisions discussed in the previous questions, the 2021 dataset did not achieve all Completeness Measure A goals.
When completeness results were summarized by parameter, all dissolved metals had a value of 0% for Completeness Measure A since the decision was made to flag these as a result of an inability to calculate Relative Percent Difference values.
When completeness results were summarized by site, Completeness Measure A values ranged from 44.4% - 71.4%, depending on the site’s sampling plan.
When completeness results were summarized for the overall 2021 dataset, Completeness Measure A was calculated at 50.1%, indicating that 49.9% of the overall result values were flagged.
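A minimal sketch of the Completeness Measure A calculation at the three scales, treating “usable” as “unflagged” (object and column names hypothetical):

```r
library(dplyr)

# overall: share of unflagged results across the whole 2021 dataset
completeness_overall <- results_flagged %>%
  summarise(measure_a = mean(!flag) * 100)

# by parameter
completeness_by_parameter <- results_flagged %>%
  group_by(characteristic_name) %>%
  summarise(measure_a = mean(!flag) * 100)

# by site
completeness_by_site <- results_flagged %>%
  group_by(monitoring_location_id) %>%
  summarise(measure_a = mean(!flag) * 100)
```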
26.) Does project meet Completeness Measure B criteria?
From the QAPP, Completeness Measure B: “the planned number of samples divided by the useable samples (submitted to ADEC) with a goal of 60%.”
For these calculations, we will consider “usable” as “unflagged,” and will calculate at the following scales:
For individual parameters (analysis methods)
For individual sites
As a project whole
Work is still in progress on calculating Completeness Measure B as of 2024-11-05.
27.) Was the QA officer consulted for any data concerns?
Note: the 2012 and 2019 approved Quality Assurance Project Plan (QAPP) documents list Alaska Dept of Environmental Conservation staff as the project’s QA officer (Kenai Watershed Forum 2012, 2019), whereas the 2023 approved QAPP lists Kenai Watershed Forum staff as the project QA officer (Kenai Watershed Forum 2023).
Kenai Watershed Forum consulted regularly with staff from the Alaska Department of Environmental Conservation and EPA Region 10 Office in order to prepare 2021 field data and confirm field methods and data QA/QC procedures.
28.) Are the correct monitoring locations associated with the project? Are latitude and longitude filled out in a consistent format?
We used the leaflet package to generate a map of unique sites from the 2021 dataset:
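A minimal sketch of the mapping code, assuming site coordinates are carried in the collated results table (object and column names hypothetical):

```r
library(dplyr)
library(leaflet)

# unique 2021 sites with their coordinates
sites_2021 <- results %>%
  distinct(monitoring_location_id, latitude, longitude)

leaflet(sites_2021) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude,
                   label = ~monitoring_location_id)
```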
In the above map we can see that coordinates for the 2021 field sites correspond to the locations listed in the Quality Assurance Project Plan, thus they are filled out in a consistent format.
29.) Are the QAPP and other supporting documents attached?
Answer TBD. More detail will be added here once the upload is complete.
31.) Is the organization ID correct?
Kenai Watershed Forum’s Organization ID in the EPA Water Quality Exchange is “Kenai_WQX.” This value is manually confirmed during the upload process.
32.) Are the time zones consistent and correct (AKDT in summer)?
All time zones are listed as AKDT.
33.) Are all media types included? Media types appropriate to Characteristic?
“Water” is the only media type considered in this project. All “Characteristics” (parameters) are measured from water samples or in-situ water conditions.
34.) Check Sample Collection, Preparation and Preservation Methods, Thermal Preservative, Equipment ID, and Activity Media. Is supporting information included and correct?
[1] "Water Bottle"
“Sample collection method”: all sample values have “Equipment ID” listed as “Water Bottle.”
“Preparation”: this information corresponds to the “result sample fraction” column.
“Preservation Methods”: this information corresponds to the “preservative” and “thermal preservative” columns.
35.) Are all expected activity types present and are QC samples correctly identified?
[1] "Field Msr/Obs"
[2] "Quality Control Field Replicate Msr/Obs"
[3] "Quality Control Sample-Trip Blank"
The activity types present are listed above. All three types were planned and anticipated. QC samples are described earlier in Question #3.
36.) Is the Activity media subdivision filled in (if relevant)?
[1] "Surface Water"
All “Activity Media Subdivision” names are listed as “Surface Water.”
37.) For Water activity media, is the relative depth filled in?
Yes, “activity depth height measure” and “activity depth height measure unit” are included as columns.
38.) Is the number of results for each Characteristic correct?
Questions #2 and #3 address this topic in detail. A summary table of the total number of observations for each characteristic is available in the downloadable table below.
Prior to upload to the EPA WQX, the range of results will be evaluated visually and numerically in each characteristic-specific section of the comprehensive report, relative to past results. Likely outliers will be identified and recorded in a dedicated spreadsheet (download below). Each likely outlier will have accompanying notes discussing the choice, and a decision as to whether a more detailed statistical investigation is warranted.
We observe that equivalent units, “mg/L” and “mg/l”, have slightly different notation among parameters. We will correct this at this step to make terms consistent.
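A minimal sketch of that correction (object and column names hypothetical):

```r
library(dplyr)

results <- results %>%
  # standardize equivalent spellings of the same unit
  mutate(result_unit = if_else(result_unit == "mg/l", "mg/L", result_unit))
```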
41.) Are detection limits and laboratory qualifiers included for analytical results?
Laboratory qualifiers, including “J” (below limit of quantitation; value is an estimate), “=” (observed value), “U” (below detection limit), and “ND” (Not Detected), are included in the column titled “result_qualifier.”
Detection limits and their corresponding units are included in the columns titled “result_detection_limit_type_1” and “result_detection_limit_unit_1,” corresponding to the Limit of Quantitation, and in the columns titled “result_detection_limit_type_2” and “result_detection_limit_unit_2,” corresponding to the Method Detection Level.
42.) Are results for trip blanks and/or field blanks above detection limits, and is any related data flagged as a result?
In summer 2021, trip blanks were included with the field crews tasked with collecting water samples for hydrocarbon analyses. Sites visited by these crews include RM 1.5 - Kenai City Dock, RM 6.5 - Cunningham Park, RM 40 - Bing’s Landing, and RM 43 - Upstream of Dow Island.
On the summer 2021 SGS chain of custody documents, BTEX trip blanks are specified for the above sites. However, in the reported results a single sample named “Trip Blank” appears. This error resulted from a miscommunication with SGS staff.
In this case, all hydrocarbon results in the Trip Blank are reported as below detection limit, suggesting there is no need to flag summer 2021 hydrocarbon results.
A.1.4 Uploading 2021 data to the EPA Water Quality Exchange (EPA WQX)
Each water quality dataset that is housed in the EPA WQX consists of three uploads:
Project description
Monitoring locations
Results and activities
We will prepare and finalize each of these uploads here.
A.1.4.1 Project Description
On 12/5/2023 we downloaded the KWF Agency Baseline project description datafile from the EPA Water Quality Portal (EPA WQP). The file can be directly accessed at the link below:
We found that the project file listed the QAPP (Quality Assurance Project Plan) as “not yet approved.” It is possible that this designation was applied because previously approved versions of the QAPP (versions 1.0 and 2.0) do not include signatures from EPA staff on the Title & Approval page, but were instead approved by Alaska Department of Environmental Conservation staff. For the QAPP approved on 5/9/2023 (version 3.0), two Region 10 EPA staff signed as approvers on the Title & Approval page.
To apply this correction, in the “Project” spreadsheet we modified “QAPP Approval Status” from “N” to “Y” and re-uploaded to the EPA Water Quality Exchange in Winter 2023.
A.1.4.2 Monitoring locations
On 12/5/2023 we downloaded the KWF Agency Baseline project monitoring locations datafile from the EPA Water Quality Portal (EPA WQP). The file can be directly accessed at the link below:
We retained all sites whose “MonitoringLocationName” value contained the prefix “KBL” (Kenai Baseline).
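A minimal sketch of that filter, using the “MonitoringLocationName” column from the WQP download (the object name is hypothetical):

```r
library(dplyr)
library(stringr)

# keep only Kenai Baseline ("KBL"-prefixed) monitoring locations
kbl_sites <- monitoring_locations %>%
  filter(str_starts(MonitoringLocationName, "KBL"))
```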
We plotted locations on a map using the leaflet package, and confirmed that existing site names in the database matched those used in the 2021 dataset.
In the EPA WQX Central Data Exchange, we created a custom import configuration to accommodate importing the list of existing sites.
A.1.4.3 Results and Activities
Within Kenai Watershed Forum’s EPA Water Quality Portal web account, we developed an import configuration that accommodated all relevant columns and data types for this project. We based the configuration primarily on the template provided by the Alaska Dept of Environmental Conservation.
For this export, we identified as “Rejected” all observations that had been “flagged” throughout the QA/QC process described above. Flagged observations will be uploaded to the EPA WQX labeled as “Rejected” under the column “Result Status ID.”
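A minimal sketch of that relabeling step (object and column names hypothetical; the “Final” label for unflagged results is an assumption, as only “Rejected” is confirmed above):

```r
library(dplyr)

wqx_export <- results_flagged %>%
  mutate(result_status_id = if_else(flag,
                                    "Rejected",  # flagged during QA/QC review
                                    "Final"))    # assumed label for unflagged
```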
Phosphorus from EPA Method 200.8 (Dissolved Metals by ICP/MS) requires a “method speciation.” (This does not apply to “Total Phosphorus,” method SM21 4500P-B,E.) However, this method does not distinguish speciation, so the method speciation was designated “as P” for this test/parameter. Regardless, this parameter was not included in the QAPP, so it is designated as “Rejected” in “Result Status ID.”
---execute: echo: falsedate: "`r Sys.Date()`"format: html: code-fold: true code-tools: true code-summary: "Show the code"---# Appendix: Data Review and Upload```{r echo = F, message = F}# notes 1/24/2023# Current plan: # 1.) follow DEC data prep / evaluation steps per QAPP# 2.) format data to match WQX web template, as seen on youtube at https://www.youtube.com/watch?v=elsHENWlU5w```## IntroductionPrior to publishing analysis and interpretation of water quality data, we will ensure that all data that meets QA/QC standards outlined in the current project [Quality Assurance Project Plan (QAPP)](https://paperpile.com/app/p/7703451b-460d-00b4-82a0-1086ea2554c3) and is accessible in the appropriate public repository.Water quality data from this project is ultimately destined for the Environmental Protection Agency's Water Quality Exchange (EPA WQX).The Quality Assurance Project Plan (QAPP) for this project describes data management details and responsible parties for each step of the data pipeline from observation to repository. The 2021 data preparation and review process is published here as an appendix as an example of the process applied annually to each year's data.Year 2021 QA/QC is included here as an example. Other project years QA/QC may be accessed at <https://kenai-watershed-forum.github.io/kenai-river-wqx-qaqc/>. ### Year 2021 Water Quality DataIn this appendix we will collate 2021 laboratory data from several sources into a single spreadsheet document with a consistent format. The desired end format is a spreadsheet template provided by the EPA Water Quality Exchange. These template files are available to download from the EPA at <https://www.epa.gov/waterdata/water-quality-exchange-web-template-files>.Once the data is collated, it will be evaluated according to a Quality Assurance Checklist (template example provided by the Alaska Department of Environmental Conservation Soldotna office). Field observations that do not meet the quality assurance standards described in the evaluation checklist will be flagged before being uploaded to the EPA WQX.Data that has been uploaded to the EPA WQX is evaluated biannually by the Alaska Department of Environmental Conservation (ADEC) in their [Integrated Water Quality Monitoring and Assessment Report](https://dec.alaska.gov/water/water-quality/integrated-report/)[^appendix_a-1]. 
The integrated report evaluates available water quality data from the previous five years against Alaska water quality standards and regulations [@adec2020].[^appendix_a-1]: https://dec.alaska.gov/water/water-quality/integrated-report/#### 2021 Water Quality Data AQWMS FormattingThe code scripts in this document assemble water quality data from the three analytical laboratories that partnered with Kenai Watershed Forum for this project in 2021:- SGS Laboratories (Anchorage, AK)- Soldotna Wastewater Treatment Plant (Soldotna, AK)- Taurianen Engineering and Testing (Soldotna, AK)<br>------------------------------------------------------------------------##### 2021 Metals/Nutrients Lab Results (SGS Labs)```{r, echo = F, message = F}library("xfun")xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SGS/spring_2021_sgs_batch_info.csv', text = "Download Original Spring 2021 Metals/Nutrients Lab Results from SGS - Electronic Data Delivery file")xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SGS/Spring 2021 Results from SGS.pdf', text = "Download Original Spring 2021 Metals/Nutrients Lab Results from SGS - PDF file")``````{r, echo = F}xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SGS/summer_2021_sgs_batch_info.csv', text = "Download Original Summer 2021 Metals/Nutrients Lab Results from SGS - Electronic Data Delivery file")xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SGS/SGS_results_Summer2021.pdf', text = "Download Original Summer 2021 Metals/Nutrients Lab Results from SGS - PDF file")```\**Note: the chain of custody documents for SGS Laboratories are integrated into the above downloadable PDF files.*\newpage```{r, 2021 AQWMS formatting for SGS, echo = F, message = F}#| warning: false#| message: false# clear environmentrm(list=ls())# load packageslibrary(tidyverse)library(readxl)library(openxlsx)library(data.table)library(stringr)library(magrittr)library(janitor)library(hms)library(lubridate)library(anytime)xfun::pkg_load2(c("htmltools", "mime"))# Assign 2021 Field Sample Dates # Spring 2021 sampling datespring21_sample_date <- "5/11/2021"# Summer 2021 Sampling Datesummer21_sample_date <-"7/27/2021"``````{r message = FALSE, echo = F, include = F}#| warning: false#| message: false######################################################################################################################################################### Read in and Clean SGS/ALS Data ################################################################################################################################################################################## Part A: SGS Data Read In ############################### Reformat SGS data downloaded from their server client (SGS Engage, full EDD files) to match AQWMS template# read inspring_batch_sgs21 <- read.csv("other/input/2021_wqx_data/spring_2021_wqx_data/SGS/spring_2021_sgs_batch_info.csv")summer_batch_sgs21 <- read.csv("other/input/2021_wqx_data/summer_2021_wqx_data/SGS/summer_2021_sgs_batch_info.csv")# clean up and retain only useful columnssgs21 <- bind_rows(spring_batch_sgs21,summer_batch_sgs21) %>% clean_names() %>% remove_empty() %>% # remove unneeded columns select(-project_id)%>% rename(sample = sample_id, lab_sample = lab_sample_id, detection_limit = dl) %>% transform(lab_sample = as.character(lab_sample), sample_rpd = as.character(sample_rpd)) %>% # add lab name mutate(lab_name = "SGS North America, Anchorage, Alaska", matrix = "Water") %>% # split a.) lab sample run & b.) 
collect time and date in prep for future join with ALS data ##### NOTE: SGS data has date and time, ALS has date only. transform(collect_date_time = mdy_hm(collect_date), rec_date_time = mdy_hm(rec_date), run_date_time = mdy_hm(run_date_time), extracted_date_time = mdy_hm(extracted_date)) %>% mutate(collect_time = as_hms(collect_date_time), collect_date = date(collect_date_time), rec_date = date(rec_date_time), rec_time = as_hms(rec_date_time), run_time = as_hms(run_date_time), run_date = date(run_date_time), extracted_time = as_hms(extracted_date_time), extracted_date = date(extracted_date_time)) %>% select(-collect_date_time,-rec_date_time,-run_date_time) %>% rename(sample = sample)rm(spring_batch_sgs21,summer_batch_sgs21)###################### Part B: ALS Data Read In ############################### SGS subcontracted analyses of Ca, Fe, and Mg to ALS laboratories (Kelso, WA). These results are not included in the spreadsheet download from SGS engage and were entered manually in to seperate spring and summer "ALS" named spreadsheets#### read in spring 2021 results from ALS spring_als21 <- read.csv("other/input/2021_wqx_data/spring_2021_wqx_data/SGS/spring_2021_als_batch_info.csv") %>% clean_names() summer_als21 <- read.csv("other/input/2021_wqx_data/summer_2021_wqx_data/SGS/summer_2021_als_batch_info.csv") %>% clean_names()# bind spring and summerals21 <- bind_rows(spring_als21,summer_als21) %>% remove_empty() %>% # proceed left to right of existing ALS dataframe to make its naming structure match the sgs21 dataframe. Add, remove, modify column names as needed select(-client, -project, -service_request) %>% rename(lab_sample = lab_code) %>% rename( collect_date = date_collected, collect_time = time_collected, rec_date = date_received, rec_time = time_received, # sample_type ::: not sure where to match with sgs data yet or where to put in aqwms, but is important for qa/qc extracted_date = date_extracted, extracted_time = time_extracted, extraction_code = extraction_method, run_date = date_analyzed, run_time = time_analyzed, analytical_method = method, #units = units, analyte = component, resultflag = result_notes, amount_spiked = spike_concentration, percent_recovered = percent_recovery, allowable_limit = acceptance_limits, sample_rpd = rpd, # change report/detection limit terminology See SGS document, "SGS DL, LOD, LOQ Interpretation" loq = reporting_limit) %>% mutate(lab_name = "ALS Environmental, Kelso, Washington" #, #run_time = "" ) %>% # prep column classes to bind with sgs dataframe transform(analytical_method = as.character(analytical_method), run_date = mdy(run_date), run_time = as_hms(as.POSIXct(run_time, format = "%H:%M")), #run_time = as_hms(run_time), collect_date = mdy(collect_date), rec_date = mdy(rec_date), rec_time = as_hms(as.POSIXct(rec_time, format = "%H:%M")), extracted_date = mdy(extracted_date), extracted_time = as_hms(as.POSIXct(extracted_time, format = "%H:%M")), result = as.double(result), collect_time = as_hms(as.POSIXct(collect_time, format = "%H:%M")))# join SGS data with ALS datadat <- bind_rows(sgs21,als21) # remove old dataframesrm(als21,sgs21,spring_als21,summer_als21)# export table of sample types## assign sample type acronyms just like with sgs21 samples. see excel file for full definitions# --> make sure doesn't conflict with other sample_type designations in rest of document. 
use same acronyms### export table of sample types, then manually translate their abbreviationssample_types <- dat %>% select(sample_type,lab_name) %>% distinct()# remove old version and write new oneunlink("other/input/AQWMS/sample_type_abbreviations.xlsx")write.xlsx(sample_types, "other/input/AQWMS/sample_type_abbreviations.xlsx")# manually created a translation of all the acronyms in an accompanying file. removed inconsistencies in sample type abbreviations into one consistent schema between SGS and ALS labs############### Part C: Address spelling/format issues and inconsistent sample/site names ####################### Upon visual inspection of site names, we can see that the location names in the AQWMS template differ slightly from the place names in the SGS report (spelling and name inconsistencies).# 3/28/2022 - A note on addressing "Duplicate" designations. # In QA/QC data review in March 2022 the following was clarified through trial and error: we must make a careful distinction between "Field Duplicates" and "Lab duplicates" when preparing this data. The sample names contain info about whether a result is from a "Field Duplicate," e.g., two field collections made at the same location/day/time. However the ALS lab also created "Lab Duplicates," which are not from the same sites as field duplicates, and designates these as "DUP1" in the "sample_type" column.# See AQWMS Activity Type column assign distinctions# Decision - we will designate the field duplicates simply as a Field Duplicate# move info about duplicate sample and/or sample blank status into separate new column, "sample_condition"dat %<>% mutate(sample_condition = case_when( grepl("Method Blank",sample) ~ "Method Blank", grepl("Trip Blank",sample) ~ "Trip Blank", grepl("DUP",sample) ~ "Field Duplicate", grepl("Dup",sample) ~ "Field Duplicate")) # %>% # remove "DUP" designation from "sample" column # mutate(sample = str_replace(sample, "DUP|Dup", "")) #z <- dat %>% # filter(sample_condition == "Field Duplicate") %>% # select(sample, sample_condition)# remove from "sample" names the text containing the suffixes Diss/Dis (Dissolved metals sample) since we only want location info in this column. (Solution for this step was found at https://stackoverflow.com/questions/29271549/replace-all-occurrences-of-a-string-in-a-data-frame)dat %<>% mutate(sample = (str_replace(sample, "Diss|Dis|DUP|Dup",""))) %>% # remove "Diss" suffix and "EP" prefix from "analytical_method" column mutate(analytical_method = str_replace(analytical_method, "Diss", "")) %>% # note trailing space after "EP200.8 " mutate(analytical_method = str_replace(analytical_method,"EP200.8 ","200.8")) %>% # address the one stubborn site name still containing "Diss" mutate(sample = case_when( sample == "RM0-No Name Creek Diss" ~ "RM0-No Name Creek", TRUE ~ sample)) # Sample name clean up# We need to remove white spaces, apostrophes, and dashes; because join functions such as "left_join" are often uncooperative with these types of string characters. 
We will need to use joins with site names in next steps.dat %<>% # remove excess white spaces mutate(sample = str_trim(sample,"both")) %>% mutate(sample = str_squish(sample)) %>% # make remaining white spaces underscores mutate(sample = gsub("\\s+","_",sample)) %>% # remove apostrophes mutate(sample = gsub("\\'","",sample)) %>% # replace dashes with underscores mutate(sample = gsub("\\-","_",sample)) %>% # replace multiple underscores with single mutate(sample = gsub("\\__","_",sample)) %>% mutate(sample = gsub("\\___","_",sample)) %>% # again replace multiple underscores with single mutate(sample = gsub("\\__","_",sample)) # apply note regarding trip blanks (for BTEX organics)# assigned in sequence as encountered on chain of custodydat %<>% mutate(note = case_when( grepl("Trip_Blank_1", sample) ~ "KWF Crew, RM1.5_Kenai_City_Dock", grepl("Trip_Blank_2", sample) ~ "USFWS Crew, RM6.5_Cunningham_Park", grepl("Trip_Blank_3", sample) ~ "DEC Crew, RM40_Bings_Landing", grepl("Trip_Blank_4", sample) ~ "DEC Crew, RM43_Upstream_of_Dow_Island"))# seperate result qualifiers (U, J, B) in to a new column#sgs21 %<>% # account for fact that als data already has qualifier column and has characters in results column # mutate(qualifier = case_when( # result == "ND" ~ qualifier, # result != "ND" ~ str_extract(result,"[aA-zZ]+"))) %>%# mutate(result = str_remove(result,"[aA-zZ]+")) ############## Part D: Prepare SGS/ALS Location/Site Names ########################### NOTE: The SGS and ALS 2021 sample name results have a variety of misspelling and typos. For 2022, we should provide labs with a csv file of site names that they can use# In preparation for a join to AQWMS table, we will manually generate a match table csv file that we can use ## generate list of unique site names from 2021 SGS datasgs21_sitenames <- data.table(unique(dat$sample)) %>% arrange(V1)# generate list of unique site names from 2021 AQWMS template. These are the names we want in the final productaqwms21_sitenames <- read_excel("other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx", sheet = "Monitoring Locations") %>% select("Monitoring Location Name", "Monitoring Location ID") %>% distinct()# write 2021 sgs site names to an excel filesite_match_table_path <- "other/input/AQWMS/sgs_site_names_matching_table.xlsx"write.xlsx(sgs21_sitenames, site_match_table_path) # create an excel file with two sheets: a.) SGS site names, and b.) AQWMS site nameswb <- loadWorkbook(site_match_table_path)addWorksheet(wb,"Sheet2")writeData(wb,"Sheet2",aqwms21_sitenames)saveWorkbook(wb,site_match_table_path,overwrite = TRUE)# Using these two tables, we will manually create a new file titled "sgs_site_names_matching_table_manual_edit.xlsx" and manually match up the two disparate naming systems. 
# Site name matching performed manually by B Meyer, March 18, 2022.# append "Monitoring Location Name" and "Monitoring Location ID" info from WQX to spring 2021 SGS data## read in site names join tablesitenames21_match <- read_excel("other/input/AQWMS/sgs_site_names_matching_table_manual_edit.xlsx") %>% select(`Monitoring Location Name`,`Monitoring Location ID`,sgs_sitenames) %>% rename(sample = sgs_sitenames) %>% filter(!is.na(`Monitoring Location ID`))# append monitoring location namesdat %<>% left_join(sitenames21_match, by = "sample") %>% clean_names()# remove extraneous dataframesrm(sgs21_sitenames,aqwms21_sitenames,sitenames21_match)######################## Part E: "Result Analytical Method Context" name rectification ####################### In the AQWMS template, the EPA names for chemical analyses that will go in the column "Result Analytical Method ID" do not exactly match the names provided by the laboratory (SGS). After communicating with SGS and ADEC on 2/8/2022, we are able to cross-walk between the two naming systems. These matches are documented in the excel file "analysis_code_matching_table.xlsx."# assign "Result Analytical Method ID" and "Result Analytical Method Context" to dataset using matching table# read in matching tableanalysis_code_matching_table <- read_excel("other/input/AQWMS/analysis_code_matching_table.xlsx") %>% select(-Comments,-`EPA Name`) %>% clean_names() %>% rename(analytical_method = sgs_analysis_code) %>% # remove "EP" prefix from method "EP200.8" mutate(analytical_method = str_replace(analytical_method,"EP200.8","200.8"))# read in AQWMS Analytical Methods listaqwms_analytical_methods <- read_excel("other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx", sheet = "Analytical Methods") %>% select("ID","Context Code") %>% clean_names() %>% rename(epa_analysis_id = id) %>% distinct()# join two tables aboveepa_analysis_codes <- inner_join(aqwms_analytical_methods,analysis_code_matching_table, by = "epa_analysis_id") %>% filter(!context_code %in% c("USEPA Rev 5.4", "APHA (1997)", "APHA (1999)")) # join EPA analysis IDs and context codes to overall datasetdat %<>% mutate(analytical_method = str_replace(analytical_method,"EP200.8","200.8")) %>% left_join(epa_analysis_codes, by = "analytical_method") # remove unneeded dfsrm(analysis_code_matching_table,aqwms_analytical_methods,epa_analysis_codes)``````{r message = FALSE, echo = F, include = F}#| warning: false#| message: false########################## Miscellaneous Steps for SGS values #################################################### Address Non-Detect values ########################## Note on non-detect values# Non-detect values should be left blank. A non-detect does not necessarily mean there was a zero observation of an analyte in a sample, it could be just be present at a level lower than the method detection level (lower than what the lab equipment can detect). # Instead of putting 0 in the results, we’ll leave it blank. The Practical Quantitation Limit (or Limit of Quantitation) is presented alongside the result in each row. When DEC evaluates a waterbody, they’ll use ½ the PQL as a stand-in for a non-detect. 
# See explanatory document at "other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf" for more details.# modify non-detect values from "0" to "NA" if resultflag = U or NDdat %<>% mutate(result1 = na_if(result,0)) %>% select(-result) %>% rename(result = result1)###### Segregate laboratory QA/QC data from field data ######### These lab-only data will be evaluated at a later step of QA/QC evaluation. See excel file "other/input/AQWMS/sample_type_abbreviations_manual_edit.xlsx" for sample_type naming schema.# Within this data evaluation, we will create two dataframes. One containing the full output of both lab QA data and field data, and the other containing field data only.sgs21_als21_qaqc_dat <- dat %>% # retain only results not from field sampling program (project samples and trip blanks) # also filter out hydrocarbon surrogate results ("surr"). Surrogate standards are compounds spiked into all samples, blanks, Laboratory Control Samples, and matrix spikes to monitor the efficacy of sample extraction, chromatographic, and calibration systems. They do not represent environmental observations. filter(!sample_type %in% c("PS","SMPL","TB") | grepl("(surr)",analyte))write.csv(sgs21_als21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/sgs21_als21_qaqc_dat.csv")# for this dataframe, retain only non-field sample results for AQWMS exportdat %<>% filter(sample_type %in% c("PS","SMPL","TB")) %>% filter(!grepl("(surr)",analyte))rm(sgs21_als21_qaqc_dat)```<br>##### 2021 Fecal Coliform Lab Results (Soldotna Wastewater Treatment Plant (SWWTP)/Taurianen Engineering)```{r, echo = F}xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF Fecal 05-11-21.xls', text = "Download Original Spring 2021 Fecal Coliform Lab Results from SWWTP")``````{r, echo = F}xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/Taurianen/FecalColiform_Results_Summer2021.pdf', text = "Download Original Summer 2021 Fecal Coliform Lab Results from Taurianen")``````{r, echo = F}xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/SWWTP_Spring_2021_TCC_FC.jpg', text = "Download Spring 2021 Fecal Coliform Chain of Custody")``````{r, echo = F}xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/Taurianen/FC_CoC_Taurianen_Summer2021.pdf', text = "Download Summer 2021 Fecal Coliform Chain of Custody")``````{r include = F}################################################################################################################################################# Read in and Clean SWWTP / Taurianen FC Data ################################################################################################################################################################ Part A: SWWTP FC Data Read In ##################################################swwtp_spring21 <- read_excel("other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF Fecal 05-11-21.xls", skip = 11) %>% clean_names() %>%## fix site naming and terminology# move info about duplicate sample and/or sample blank status into separate column # sample type abbreviations mutate(sample_type = case_when( grepl("BLANK",sample_location_rm) ~ "MB", # method blank grepl("POSITIVE",sample_location_rm) ~ "LCS")) %>% # laboratory control sample # assign all other samples as "PS" (project sample) mutate_at(vars(sample_type),~replace_na(.,"PS")) %>% # field dup designation mutate(sample_condition = case_when( grepl("DUP",sample_location_rm) ~ "Field Duplicate")) %>% # remove "BLANK", and "POSITIVE designation from 
sample_location column mutate(sample_location_rm = (str_replace(sample_location_rm, "BLANK|POSITIVE", ""))) # remove "DUP" from site name column and trim white spaces in site name columnswwtp_spring21 %<>% mutate(sample_location_rm = str_remove(sample_location_rm,"DUP")) %>% mutate(sample_location_rm = str_trim(sample_location_rm,"right"))# address different site naming systems# use manually generated matching table# read in matching table and matchswwtp_spring21_site_matching <- read_excel("other/input/AQWMS/swwtp_site_names_matching_table_manual_edit.xlsx") # join# modify join table to contain "sample" column w/ format RM_0_No_Name_Creekswwtp_spring21 %<>% full_join(swwtp_spring21_site_matching) %>% select(-sample_location_rm)rm(swwtp_spring21_site_matching)## fix lab analysis times and datesswwtp_spring21 %<>% # lab processing time/date mutate(analysis_time_in = as_hms(time_in), analysis_date_in = mdy(spring21_sample_date), analysis_time_out = as_hms(time_out), # see file "other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF Fecal 05-11-21.xls for out analysis date analysis_date_out = ymd("2021-05-12")) %>% select(-time_in,-time_out) %>% transform(time_sampled = as_hms(time_sampled)) %>% # field sample date and time mutate(time_sampled = as_hms(time_sampled), sample_date = mdy(spring21_sample_date))## assign time/date received at lab. info from chain of custodyswwtp_spring21_rec_time <- "13:31:00"swwtp_spring21_rec_date <- "2021-05-11"swwtp_spring21 %<>% mutate(rec_date = ymd(swwtp_spring21_rec_date), rec_time = as_hms(swwtp_spring21_rec_time))## rename existing column names and create new ones to match sgs21 data format at end of prior code chunkswwtp_spring21 %<>% rename(lab_sample = dish_number, result = colony_count_100m_l, collect_time = time_sampled, run_time = analysis_time_in, run_date = analysis_date_in, # = analysis_time_out, # = analysis_date_out, collect_date = sample_date) %>% mutate(note = paste0("Lab analysis volume = ",ml," mL"), matrix = "Water (Surface, Eff., Ground)", analytical_method = "9222 D ~ Membrane filtration test for fecal coliforms", analyte = "Fecal Coliform", units = "cfu/100ml", # reporting limit ("loq") value from 2019 QAPP, pg 17 loq = 1.0, lab_name = "Soldotna Wastewater Treatment Plant, Soldotna, Alaska", units = "cfu/100ml", epa_analysis_id = "9222D", context_code = "APHA", analyst = "AW") %>% clean_names() %>% select(-ml,-colony_count) %>% # transform to prep for bind with sgs21 transform(lab_sample = as.character(lab_sample), result = as.double(result)) %>% # apply correction to the one "TNTC" result (Too Numerous To Count), since we can't have characters and integers in same column mutate(note = case_when( lab_sample == "30" ~ paste("Lab analysis volume = 0.5 mL, result = TNTC"), TRUE ~ note))########### Address Non-Detect values in SWWTP Fecal Coliform Data ########################## Non-detect values should be left blank. A non-detect does not necessarily mean there was a zero observation of an analyte in a sample, it could be just be present at a level lower than the method detection level (lower than what the lab equipment can detect). Instead of 0, we’ll leave it blank. The Practical Quantitation Limit (or Limit of Quantitation) is presented alongside the result. When DEC evaluates a waterbody, they’ll use ½ the PQL as a stand-in for a non-detect. See explanatory document at "other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf" for more details.# Assign a resultflag column. 
Use "=" if result > 1.0 cfu and "U" if result < 1.0 cfu. See pg 20 of 2023 QAPP at "other/documents/QAPP/qapp_draft_v4.2.pdf"swwtp_spring21 %<>% mutate(resultflag = case_when( result < 1 ~ "U", TRUE ~ "="))# modify non-detect values from "0" to "NA" if resultflag = U or NDswwtp_spring21 %<>% mutate(result1 = na_if(result,0)) %>% select(-result) %>% rename(result = result1)# segregate lab results from field results, and write lab qa/qc results to external csvswwtp_spring21_qaqc_dat <- swwtp_spring21 %>% filter(sample_type %in% c("MB","LCS"))write.csv(swwtp_spring21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/swwtp_spring21_fc_qaqc_dat.csv", row.names = F)swwtp_spring21 %<>% filter(!sample_type %in% c("MB","LCS"))## join SGS 2021 data to Spring 2021 Fecal Coliform data from SWWTPdat <- bind_rows(dat,swwtp_spring21) %>% select(-location)rm(swwtp_spring21)## NOTE - after this join, there is no content in the "sample" column for FC samples. This column does not exist## If we need this column to exist, do a table_join similar to what was done for the SGS and ALS data. May not need it though, since we have monitoring location name & monitoring location id columns already.``````{r}########################### Part B: Taurianen FC Data Read In (Summer 2021) ##############################################taur_summer21_rec_date <-"2021-07-27"taur_summer21_rec_time <-"13:37:00"## read in taurianen summer 2021 resultstaur_summer21 <-read_excel("other/input/2021_wqx_data/summer_2021_wqx_data/Taurianen/Fecal_Coliform_Results_Spreadsheet.xlsx", skip =3) %>%clean_names() %>%select(-qc1,-data_entry,-qc2) %>%## move info about duplicate sample and/or sample blank status into separate columnmutate(sample_condition =case_when(grepl("DUP",sample_location) ~"Field Duplicate")) %>%# remove "DUP" designation from sample_location columnmutate(sample_location = (str_replace(sample_location, "_DUP", ""))) %>%# trim white spaces in site name columnmutate(sample_location =str_trim(sample_location,"right")) %>%## add known info about times/dates, correct formats and column namesmutate(collect_date =mdy(summer21_sample_date),run_date =mdy(summer21_sample_date),run_time =as_hms(time_relinquished),analysis_date_out =mdy("7/28/2021"),analysis_time_out =as_hms(time_tested), # time/date received at lab from chain of custodyrec_date =ymd(taur_summer21_rec_date),rec_time =as_hms(taur_summer21_rec_time),# drop old columns.keep ="unused") %>%select(-date_of_testing,-neg_pos) %>%transform(time_sampled =as_hms(time_sampled)) %>%## add lab namemutate(lab_name ="Taurianen Engineering and Testing, Soldotna, Alaska") %>%# rename columnsrename(sample = sample_location,collect_time = time_sampled)# NOTE: for Taurianan QA/QC practices, see email from from Taurianen at "other/documents/references/Taurianen QA Technique (Email march 2022).docx" (folder in this project repo)## fix site naming and terminology# generate spreadsheet of unique site names from taurianen dataset taur_summer21_sites <-data.frame(unique(taur_summer21$sample)) %>%rename(sample = unique.taur_summer21.sample.)# export site names list to spreadsheetwrite.xlsx(taur_summer21_sites, "other/input/AQWMS/taurianen_site_names_matching_table.xlsx")# manually edit a new spreadsheet such that taurianen site names are paired iwth AWQMS site names# read in manually edited site names sheettaur_summer21_sites <-read_excel("other/input/AQWMS/taurianen_site_names_matching_table_manual_edit.xlsx") # join AWQMS site names to taurianen datataur_summer21 
taur_summer21 <- left_join(taur_summer21,taur_summer21_sites,by = "sample")
## add and/or rename other columns to match SWWTP dataframe structure
taur_summer21 %<>%
  clean_names() %>%
  select(-direct_count) %>%
  rename(result = number_of_colonies) %>%
  mutate(note = "",
         matrix = "Water",
         analytical_method = "9222 D ~ Membrane filtration test for fecal coliforms",
         analyte = "Fecal Coliform",
         units = "cfu/100ml",
         # loq = reporting limit
         loq = 1,
         epa_analysis_id = "9222D",
         context_code = "APHA") %>%
  transform(result = as.double(result))
# assign "sample_type"
taur_summer21 %<>%
  mutate(sample_type = case_when(
    sample_condition == "Lab Blank" ~ "MB", # method blank
    sample_condition == "Positive Control" ~ "LCS", # laboratory control sample
    TRUE ~ "PS" ))
########### Address Non-Detect values ##########################
# Non-detect values should be left blank. A non-detect does not necessarily mean there was a zero observation of an analyte in a sample; it could just be present at a level lower than the method detection level (lower than what the lab equipment can detect). Instead of 0, we’ll leave it blank. The Practical Quantitation Limit (or Limit of Quantitation) is presented alongside the result. When DEC evaluates a waterbody, they’ll use ½ the PQL as a stand-in for a non-detect. See explanatory document at other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf for more details.
# Assign resultflag column. Use "=" if result >= 1 and "U" if result < 1. See pg 17 of 2020 QAPP at "other/documents/QAPP/QAPP-2020-KenaiRiverWatershed_ZnCu.pdf"
taur_summer21 %<>%
  mutate(resultflag = case_when(
    result < 1 ~ "U",
    TRUE ~ "="))
# modify non-detect values from "0" to "NA" if resultflag = U or ND
taur_summer21 %<>%
  mutate(result1 = na_if(result,0)) %>%
  select(-result) %>%
  rename(result = result1)
# segregate lab results from field results, and write lab qa/qc results to external csv
taur_summer21_qaqc_dat <- taur_summer21 %>%
  filter(sample_type %in% c("MB","LCS"))
write.csv(taur_summer21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/taur_summer21_qaqc_dat.csv", row.names = F)
# join 2021 Taurianen Fecal Coliform data into overall dataframe so far
dat <- bind_rows(dat,taur_summer21)
rm(taur_summer21,taur_summer21_sites,taur_summer21_qaqc_dat,swwtp_spring21_qaqc_dat)
```

<br>

##### 2021 Total Suspended Solids Lab Results (Soldotna Wastewater Treatment Plant (SWWTP))

```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF TSS MONITORING 05-11-21.xlsx', text = "Download Original Spring 2021 Total Suspended Solids Results from SWWTP.xlsx")
```

```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SWWTP/KRWF TSS MONITORING 07-28-21.xlsx', text = "Download Original Summer 2021 Total Suspended Solids Results from SWWTP.xlsx")
```

```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/SWWTP_Spring_2021_TCC_FC.jpg', text = "Download Spring 2021 Total Suspended Solids Chain of Custody")
```

```{r, echo = F}
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SWWTP/TSS_CoC_SWWTP_Summer2021.pdf', text = "Download Summer 2021 Total Suspended Solids Chain of Custody")
```

<br>

```{r include = F}
# SWWTP Spring 2021 TSS data
## Reformat TSS data to match AQWMS template
# read in
swwtp_tss_spring21 <- read_excel('other/input/2021_wqx_data/spring_2021_wqx_data/SWWTP/KRWF TSS MONITORING 05-11-21.xlsx', skip = 1, sheet = "Updated_Formatting") %>%
  clean_names() %>%
  transform(date_of_analysis = anydate(date_of_analysis)) %>%
  # add info from lab COC
  mutate(rec_date = ymd_hms("2021-05-11 14:00:00"))
14:00:00"))swwtp_tss_summer21 <- read_excel('other/input/2021_wqx_data/summer_2021_wqx_data/SWWTP/KRWF TSS MONITORING 07-28-21.xlsx', skip = 1, sheet = "Updated_Formatting") %>% clean_names() %>% transform(sample_time = anytime(sample_time)) %>% # add info from lab COC mutate(rec_date = ymd_hms("2021-07-27 14:00:00"))# combine spring & summerswwtp_tss21 <- bind_rows(swwtp_tss_spring21,swwtp_tss_summer21) %>% remove_empty() rm(swwtp_tss_spring21,swwtp_tss_summer21)# prepare and format to match larger dataset## miscellaneous stepsswwtp_tss21 %<>% select(-qc1,-data_entry,-x8) %>% rename(analysis_time = time) %>% transform(sample_time = as_hms(sample_time), analysis_time = as_hms(analysis_time)) %>% # move info about duplicate sample and/or sample blank status into separate column mutate(sample_condition = case_when( grepl("DUP",sample_location) ~ "Field Duplicate")) %>% # remove "DUP" designation from locations column mutate(sample_location = str_replace(sample_location, "_DUP", "")) %>% # replace "O" with zeros in location column mutate(sample_location = str_replace(sample_location, "RM_O", "RM_0")) %>% # add units of suspended solids mutate(units = "mg/l") %>% rename(result = s_s_mg_l) %>% transform(result = as.numeric(result)) %>% # add info about EPA analysis type from AWQMS template mutate(epa_analysis_id = "2540-D", analytical_method = "SM21-2540-+D", context_code = "APHA", note = "") %>% # remove tare and paper weight values select(-dried_wt,-paper_wt,-tare_wt_kg, -ml) %>% # modify date/time formats mutate(collect_date = as.character(paste(field_sample_date,sample_time)), run_date_time = as.character(paste(date_of_analysis,analysis_time)), .keep = "unused") %>% mutate(collect_time = as_hms(as.POSIXct(collect_date))) %>% mutate(collect_date = date(as.POSIXct(collect_date)), run_time = as_hms(ymd_hms(run_date_time)), run_date = date(ymd_hms(run_date_time)),.keep = "unused") %>% # rename rename(analyst = signature) %>% # miscellaneous mutate(lab_sample = "", matrix = "Water", analyte = "Total suspended solids", # loq = reporting limit loq = 1.0, # lod = sensitivity, or method detection limit lod = 0.31, lab_name = "Soldotna Wastewater Treatment Plant, Soldotna, Alaska")# assign "sample_type"swwtp_tss21 %<>% mutate(sample_type = case_when( sample_condition == "Lab Blank" ~ "MB", # method blank sample_condition == "Positive Control" ~ "LCS", # laboratory control sample TRUE ~ "PS" ))# get site names consistent with AWQMS formatswwtp_tss_sitenames <- data.frame(unique(swwtp_tss21$sample_location))# delete existing csv if presentunlink("other/input/AQWMS/swwtp_tss_sitenames.csv")# export csv of swwtp_tss site nameswrite.csv(swwtp_tss_sitenames,"other/input/AQWMS/swwtp_tss_sitenames.csv",row.names = F)# use this list to create manually edited file, matched to AWQMS template names# read in manually edited fileswwtp_tss_sitenames <- read_excel("other/input/AQWMS/swwtp_tss_site_names_matching_table_manual_edit.xlsx")# join correct site names to overall 2021 TSS datasetswwtp_tss21 <- left_join(swwtp_tss21,swwtp_tss_sitenames) %>% clean_names() %>% rename(sample = sample_location)########### Address Non-Detect values ########################## Non-detect values should be left blank. A non-detect does not necessarily mean there was a zero observation of an analyte in a sample, it could be just be present at a level lower than the method detection level (lower than what the lab equipment can detect). Instead of 0, we’ll leave it blank. 
# The Practical Quantitation Limit (or Limit of Quantitation) is presented alongside the result. When DEC evaluates a waterbody, they’ll use ½ the PQL as a stand-in for a non-detect. See explanatory document at other/documents/references/SGS DL, LOD, LOQ Interpretation.pdf for more details.
# Assign resultflag column. Use "=" if result >= 1, "J" if result is between the MDL (0.31) and the LOQ (1), and "U" if result is below the MDL. See pg 20 of 2023 QAPP at "other/documents/QAPP/qapp_draft_v4.2.pdf"
swwtp_tss21 %<>%
  mutate(resultflag = case_when(
    result < 1 & result >= 0.31 ~ "J",
    result < 0.31 ~ "U",
    TRUE ~ "="))
# modify non-detect values from "0" to "NA" if resultflag = U or ND
swwtp_tss21 %<>%
  mutate(result1 = na_if(result,0)) %>%
  select(-result) %>%
  rename(result = result1)
# in future scripts need to ensure that non-detect "0" results (resultflag = "U") are shown as "NA" rather than zero.
###### This next step is where we would normally export lab QA/QC sample data as a separate csv. However, I have discovered that at least for 2021-2022 this lab QA data was not reported by SWWTP for Total Suspended Solids ######
#### The SWWTP TSS Standard Operating Procedure is at "other\documents\references". The SOP does not describe the kind of QA we need for baseline, as described in our QAPP ####
#### The QAPP specifies the need for the following QA measurements for TSS: lab blank, lab duplicate sample, external QC check sample. These QA results were not reported in 2021 and 2022, likely not produced either. Check previous years ####
#### The SWWTP SOP for TSS specifies the following for QA practices: "Every year the Soldotna WWTP must participate in the DMR QA as part of our NPDES permit. Part of the DMR QA test set is the TSS PE sample. This test must be performed and we must pass the sample to be able to report data for the year." Further details on the DMR QA test at https://www.epa.gov/compliance/discharge-monitoring-report-quality-assurance-study-program ####
#### Bottom line: need to reinstate recommended QA for TSS for 2023. This topic has been added to the agenda of the technical advisory committee meeting March 20th, 2023. ####
#### Update code here for 2023 and/or other previous years 2014-2020
# segregate lab results from field results, and write lab qa/qc results to external csv
#swwtp_tss21_qaqc_dat <- swwtp_tss21 %>%
#  filter(sample_type %in% c("MB","LCS"))
#write.csv(swwtp_tss21_qaqc_dat, "other/output/lab_qaqc_data/2021_lab_qaqc_data/swwtp_tss21_qaqc_dat.csv", row.names = F)
#rm(swwtp_tss21_qaqc_dat)
# join TSS data with overall dataset
dat <- bind_rows(dat,swwtp_tss21)
# (swwtp_tss_spring21 was already removed above)
rm(swwtp_tss_sitenames,swwtp_tss21)
# 2/14/23 go back to FC section and see if sample name is already bad up there
# (unnecessary??)
```
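The three-way qualifier split above can be illustrated compactly. A sketch on made-up TSS values (using the MDL of 0.31 mg/l and LOQ of 1.0 mg/l noted above; not part of the export pipeline):

```{r include = F, eval = F}
# Toy TSS values (mg/l): below the MDL -> "U" (non-detect);
# between the MDL and LOQ -> "J" (estimated); at/above the LOQ -> "="
demo_tss <- c(0.10, 0.50, 3.20)
dplyr::case_when(demo_tss < 0.31 ~ "U",
                 demo_tss < 1.0  ~ "J",
                 TRUE            ~ "=")  # returns "U" "J" "="
```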
```{r include = F}
############### Miscellaneous Steps for Overall Field Results Dataframe #############################
############ a.) filter out lab blanks and positive lab controls ###################
# dat %<>%
#   filter(!sample_condition %in% c("Lab Blank","Positive Control")) # should already be gone now 3/4/22
############ b.) match latitude and longitude coordinates to sites ###################
# read in coordinates
site_coords <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "site_coordinates") %>%
  remove_empty()
## join coords to overall df
dat <- left_join(dat,site_coords)
############ c.) assign "result sample fraction" (e.g. filtered, dissolved, etc.) ##############
# read in manually organized table that pairs "result sample fraction" with "analytical method"
result_sample_fraction <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "result_sample_fraction") %>%
  filter(!is.na(analytical_method)) %>%
  select(-description)
## join to table
dat <- left_join(dat,result_sample_fraction)
################ d.) assign "result detection condition" ###############
# read in manually assigned join table for "result detection condition"
result_detection_condition <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "result_detection_condition") %>%
  clean_names() %>%
  filter(!is.na(resultflag))
## join to table
## z <- dat %>% left_join(dat,result_detection_condition, by = "resultflag")
## the attempt at left_join above produces a nonsensical result because it pipes dat into left_join while also naming dat as the first argument, joining dat to itself. For now, define programmatically instead:
dat %<>%
  mutate(result_detection_condition = case_when(
    resultflag == "U" | resultflag == "ND" ~ "Not Detected",
    resultflag == "J" ~ "Present Below Quantification Limit"))
############### e.) assign chemical preservative type #################
## read in table
chemical_preservative <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "chemical_preservative") %>%
  filter(!is.na(preservative)) %>%
  select(-description)
## join to overall dataframe
dat <- left_join(dat,chemical_preservative)
############# f.) assign bottle type and color #######################
## read in external table
bottle_type_color <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "sample_container_type_color") %>%
  select(-description) %>%
  filter(!is.na(sample_container_type))
## join to overall dataframe
dat <- left_join(dat,bottle_type_color)
rm(site_coords, result_sample_fraction, result_detection_condition, chemical_preservative, bottle_type_color)
############ g.) assign "Statistical Base Code" column ##############
# this value is not applicable to most results here, usually left blank. will assign "Count" to fecal coliform though
dat %<>%
  mutate(stat_base_code = case_when(
    analyte == "Fecal Coliform" ~ "Count" ))
########### h.) "Activity ID" code shortening #######################
# The Activity ID column consists of concatenating several columns (`Monitoring Location ID`,"-",collect_date,"-",analyte,"-", sample_condition). This field is permitted to be no longer than 55 characters. With full names of analytes included, the entries sometimes exceed 55 characters.
# To address this issue, we will do two things:
## a.) use analyte abbreviations: for single elements, we will use their periodic table abbreviation. TBD for other analytes.
## b.) use abbreviations for sample_condition (designations of field blank or field dup)
## a.) analyte abbreviations
# export list of unique analytes from 2021 data
write.csv(data.frame(unique(dat$analyte)),"other/input/AQWMS/analytes_list.csv", row.names = F)
# manually assign abbreviations for each analyte in the sister file "analytes_list_manual_edit.csv"
# re-import edited list of analyte abbreviation names, then append to dataframe. Use in later step when creating Activity IDs.
analyte_abbrev <- read.csv("other/input/AQWMS/analytes_list_manual_edit.csv")
colnames(analyte_abbrev) <- c("analyte","analyte_abbreviation")
analyte_abbrev %<>% select(analyte, analyte_abbreviation)
dat %<>% left_join(analyte_abbrev)
```
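To illustrate the 55-character constraint that motivates these abbreviations (hypothetical ID components, not project data):

```{r include = F, eval = F}
# A fully spelled-out Activity ID exceeds the 55-character cap; the abbreviated form does not
id_full  <- paste0("RM43_Kenai_River", "-", "2021-07-27", "-", "Magnesium, Dissolved", "-", "Field Duplicate")
id_abbrv <- paste0("RM43_Kenai_River", "-", "2021-07-27", "-", "Mg", "-", "DUP")
nchar(id_full)   # 64 characters: over the cap
nchar(id_abbrv)  # 34 characters: within the cap
```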
```{r include = F}
## b.) sample_condition abbreviations
dat %<>%
  mutate(sample_condition_abbrv = case_when(
    sample_condition == "Field Duplicate" ~ "DUP",
    sample_condition == "Trip Blank" ~ "Blank"))
# temporarily reduce DF width to visually examine errors described below; see "temp" file in AQWMS folder
#%>% select(sample,collect_date,sample_type,sample_condition,sample_condition_abbrv,analyte,lab_name,result)
# write temporary inspection file
#write.csv(z, "other/input/AQWMS/temp.csv")
########################## TO DO
# issues to resolve:
# some samples from 5/11/2021 no name creek have "PS" for sample type but have "Field Duplicate" for sample_condition. need to fix.
# 3/16/23:: this is a correct description; field duplicates ARE project samples
# also need to disambiguate "PS" from "SMPL" (the latter is the ALS lab designation) (exploring around line 474 to diagnose)
## Note: at this stage there are lots of format issues with conventions in the "sample" column, but this column is not retained in the final data export so it is not an urgent issue. For reference, these issues include:
# inconsistency in sample_name formats too (with _)?
# a few FC results have no sample name (possibly these are method blanks)
# a few TSS results have only partial sample names
# Rm10.1_Kenai_River has lowercase
# some No Name Creek project sample names have a trailing _
# steps to rectify:
### 1.) rename SMPL Ca/Mg results as PS in line 1045. (Do so here rather than later on so as not to interfere with other previous global edits/mutates that may refer to "SMPL")
### 2.)
```

```{r include = F, eval = F}
############ i.) prepare final format ###############
############# create column structure from example in AQWMS template. Use existing input from SGS results if applicable, specify value from "Permitted Values" tab if SGS input not applicable or not yet specified
# 6/21/23:: check to see if format will still work for EPA WQX template
# 10/12/2023 - there might not be a point to doing 'all' of this formatting at this stage, if we don't have it flagged and ready for export yet
# 12/4/2023 - confirmed that it was unnecessary to have the column names formatted with capital letters & spaces at this stage.
# Current task - take the dataframe at the end of this long script and make the column names/structure conform to the ADEC example. In the QA/QC process, it went through the janitor::clean_names process, thus names need to be fixed again at the end.
dat %<>%
  # Proceeding left to right across columns of AWQMS template
  # Mutate new column or rename existing column as needed
  mutate(
    `Monitoring Location ID` = monitoring_location_id,
    `Activity Media Name` = "Water",
    `Activity Media Subdivision Name` = "Surface Water",
    # create activity ID name conditionally if condition(s) present
    `Activity ID` = case_when(
      is.na(sample_condition) ~ paste0(`Monitoring Location ID`,"-",collect_date,"-",analyte_abbreviation),
      !is.na(sample_condition) ~ paste0(`Monitoring Location ID`,"-",collect_date,"-",analyte_abbreviation,"-",sample_condition_abbrv)),
    `Activity Start Date` = collect_date,
    `Activity Start Time` = collect_time,
    `Activity End Date` = "",
    `Activity End Time` = "",
    `Activity Latitude` = latitude,
    `Activity Longitude` = longitude,
    `Activity Source Map Scale` = "",
    `Activity Type` = case_when(
      sample_condition == "Field Duplicate" ~ "Quality Control Field Replicate Msr/Obs",
      sample_condition == "Blank" ~ "Quality Control Sample-Trip Blank",
      sample_type == "TB" ~ "Quality Control Sample-Trip Blank",
      TRUE ~ "Field Msr/Obs"),
    # All samples are surface grab samples.
    # Depths are assigned across the board here as 6 inches (~15 cm)
    `Activity Depth/Height Measure` = 15,
    `Activity Depth/Height Unit` = "cm",
    # The following depth-range columns are not applicable for surface grab samples
    `Activity Top Depth/Height Measure` = "",
    `Activity Top Depth/Height Unit` = "",
    `Activity Bottom Depth/Height Measure` = "",
    `Activity Bottom Depth/Height Unit` = "",
    `Activity Relative Depth Name` = "",
    `Activity Comment` = note,
    `Characteristic Name` = analyte,
    `Result Analytical Method ID` = epa_analysis_id,
    `Result Analytical Method Context` = context_code,
    `Method Speciation` = "",
    `Result Value` = result,
    `Result Unit` = units,
    `Result Qualifier` = resultflag,
    `Result Weight Basis` = "Sampled",
    `Statistical Base Code` = stat_base_code,
    `Result Sample Fraction` = result_sample_fraction,
    `Result Value Type` = "Actual",
    `Result Comment` = "",
    `Sample Collection Method ID` = "",
    `Equipment ID` = "Water Bottle",
    `Result Detection Condition` = result_detection_condition,
    `Result Detection Limit Type 1` = "Limit of Quantitation",
    `Result Detection Limit Value 1` = loq,
    `Result Detection Limit Unit 1` = units,
    # note: lod = "limit of detection"; equivalent to "method detection level"
    `Result Detection Limit Type 2` = "Method Detection Level",
    `Result Detection Limit Value 2` = lod,
    `Result Detection Limit Unit 2` = units,
    `Laboratory Accreditation Indicator` = "",
    `Laboratory Name` = lab_name,
    `Laboratory Sample ID` = lab_sample,
    `Analysis Start Date` = run_date,
    `Analysis Start Time` = run_time,
    `Biological Intent` = "",
    `Subject Taxonomic Name` = "",
    `Thermal Preservative` = "Cold packs",
    `Sample Container Type` = sample_container_type,
    `Sample Container Color` = sample_container_color,
    `Chemical Preservative` = preservative
    # remove columns that were mutated to a new name
    ,.keep = "unused")
# Save a copy of the whole dataframe with all results and parameters prior to reducing it to just the column subset. We will use this later in QA/QC analyses.
all_dat <- dat
# next, for our AQWMS export, we want to retain just those columns listed in the AQWMS template
# found solution to this problem here: https://gist.github.com/djhocking/62c76e63543ba9e94ebe
# get all column names from AQWMS template
aqwms_colnames <- read_excel("other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx", sheet = "KWF Baseline AWQMS Template") %>%
  colnames()
# select the subset of columns in the AQWMS template
dat %<>% select(one_of(aqwms_colnames))
# export final formatted AQWMS results to external csv
write.csv(dat,"other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv",row.names = F)
```

<br>

### 2021 Provisional Results, Prior to QA/QC Review

*Results last updated `r Sys.Date()`*

The above data sources have been collated into a single .csv file (available for download) in a format compatible with the EPA Water Quality Exchange.
***These data have not yet been evaluated against QA/QC standards following guidance in the current project Quality Assurance Project Plan.***

```{r, echo = F}
xfun::embed_file('other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv', text = "Download All Provisional 2021 Kenai River Baseline Water Quality Monitoring Results, Formatted for EPA WQX import")
# check unique activity types
#z <- read.csv("other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv")
#z1 <- data.frame(unique(z$Characteristic.Name))
```

<br>

------------------------------------------------------------------------

### 2021 Data QA/QC Evaluation

Prior to upload to the EPA WQX, all water quality data must be checked against a series of standard questions in order to evaluate how quality assurance / quality control (QA/QC) requirements are met. The draft Data Evaluation Checklist Template (available for download below) outlines these questions:

```{r, echo = F}
xfun::embed_file('other/documents/AQWMS_documents/Kenai_Baseline_Data_Evaluation_Checklist_20230331.xlsx', text = "Download 2021 Draft Kenai Baseline Data Evaluation Checklist Template")
```

#### Pre-Database

##### Overall Project Success

Note: the sequence of questions is structured such that data will be gradually flagged at each step based on the criteria described.

**1.) Were the appropriate analytical methods used for all parameters?**

Yes. Analytical methods from the approved 2020 QAPP were employed.

<br>

**2.) Were there any deviations from the sampling plan?**

All sites were visited as planned on 5/11/2021 and 7/27/2021. Most intrinsic water quality parameters normally measured with instruments (pH, dissolved oxygen, conductivity, turbidity) were not measured, which is a deviation from the QAPP.

<br>

**3.) Were field duplicates, blanks, and/or other QC samples collected as planned?**

```{r echo = F, message = F}
### First: get number of samples (results) actually collected from formatted, collated results created from the above code chunks
### report for each parameter and date
total_samples_collected_2021_summary_param <- read.csv('other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv') %>%
  clean_names() %>%
  group_by(result_analytical_method_id,characteristic_name,activity_start_date,activity_type) %>%
  count() %>%
  rename(actual_results_n = n) %>%
  transform(activity_start_date = ymd(activity_start_date))
### Second: use kit request data to count "required" (planned) samples
## read in SPRING 2021 planned samples from the packing list
spring21_planned <- read_excel("other/input/2021_wqx_data/spring_2021_wqx_data/SGS/KRBWQM_SGS_bottle_order_Spring_2021.xlsx", sheet = "bottles_analyses_spring_2021") %>%
  mutate(activity_start_date = spring21_sample_date)
## read in SUMMER 2021 planned samples from the packing list
summer21_planned <- read_excel("other/input/2021_wqx_data/summer_2021_wqx_data/SGS/KRBWQM_SGS_bottle_order_Summer_2021.xlsx", sheet = "bottles_analyses_summer_2021") %>%
  mutate(activity_start_date = summer21_sample_date)
## combine spring and summer 2021 planned samples
planned_samples_2021 <- bind_rows(spring21_planned,summer21_planned) %>%
  ## prepare and summarise table of samples collected and tests performed
  clean_names() %>%
  filter(item == "Bottle") %>%
  select(monitoring_location_id,site,container_size,quantity,preservative,analysis_1,analysis_2,activity_start_date) %>%
  fill(site, .direction = "down") %>%
  fill(monitoring_location_id, .direction = "down") %>%
  filter(site != "Spare Kit") %>%
  # make sample type names consistent
mutate(sample_type = case_when( grepl("DUPLICATE",site, ignore.case = TRUE) ~ "FIELD DUPLICATE", grepl("blank",analysis_1, ignore.case = TRUE) ~ "TRIP BLANK" )) %>% pivot_longer(values_to = "analysis", cols = c("analysis_1","analysis_2")) %>% filter(!is.na(analysis)) %>% # consolidate BTEX analysis types ("blank" analysis is same as analysis regular samples; difference is only that no sample was collected for the trip blanks) # distinction for trip blanks (for BTEX) listed in "activity type" column mutate(analysis = str_replace(analysis,"624 - BTEX trip blank","624 - BTEX")) %>% # rename analysis type names to match between planned and actual mutate(analysis = case_when( analysis == "SM4500-NO3E - Nitrogen (Nitrate+Nitrite)" ~ "4500-NO3(F)", analysis == "SM4500-PE - Total Phosphorus" ~ "4500-P-E", analysis == "200.7 - Total Metals" ~ "200.7", analysis == "200.8 - Dissolved Metals" ~ "200.8", analysis == "1 liter Total Suspended Solids" ~ "2540-D", analysis == "Fecal Coliform" ~ "9222D", analysis == "624 - BTEX" ~ "8260D")) %>% # assign number of expected result types per analysis mutate(expected_results = case_when( analysis == "4500-NO3(F)" ~ 1, # Nitrate/Nitrite analysis == "4500-P-E" ~ 1, # total phosphorus analysis == "200.7" ~ 3, # total metals analysis == "200.8" ~ 6, # custom short list; note that lab accidentally ran full list of 27 analytes in 2021 analysis == "2540-D" ~ 1, # TSS analysis == "9222D" ~ 1, analysis == "8260D" ~ 6)) %>% # BTEX analyses mutate(activity_type = case_when( sample_type == "FIELD DUPLICATE" ~ "Field Duplicate", sample_type == "TRIP BLANK" ~ "Trip Blank", TRUE ~ "Field Msr/Obs")) %>% select(-sample_type)# clear intermediary dataframes steps# rm(spring21_planned,summer21_planned)# create summary table of PLANNED samplesplanned_samples_2021_summary <- planned_samples_2021 %>% group_by(analysis,expected_results,activity_start_date,activity_type) %>% summarise(expected_ct = sum(expected_results)) %>% ungroup() %>% select(-expected_results) %>% rename(expected_results_n = expected_ct, result_analytical_method_id = analysis) %>% transform(activity_start_date = mdy(activity_start_date), expected_results_n = as.numeric(expected_results_n)) %>% # replace naming conventions in "activity_type" column to match join dataframe mutate(activity_type = case_when( activity_type == "Field Duplicate" ~ "Quality Control Field Replicate Msr/Obs", activity_type == "Field Msr/Obs" ~ "Field Msr/Obs", activity_type == "Trip Blank" ~ "Quality Control Sample-Trip Blank" ))# BY SITE# create summary table of PLANNED samples BY SITEplanned_samples_2021_summary_by_site <- planned_samples_2021 %>% group_by(monitoring_location_id,analysis,expected_results,activity_start_date,activity_type) %>% summarise(expected_ct = sum(expected_results)) %>% ungroup() %>% select(-expected_results) %>% rename(expected_results_n = expected_ct, result_analytical_method_id = analysis) %>% transform(activity_start_date = mdy(activity_start_date), expected_results_n = as.numeric(expected_results_n)) %>% # replace naming conventions in "activity_type" column to match join dataframe mutate(activity_type = case_when( activity_type == "Field Duplicate" ~ "Quality Control Field Replicate Msr/Obs", activity_type == "Field Msr/Obs" ~ "Field Msr/Obs", activity_type == "Trip Blank" ~ "Quality Control Sample-Trip Blank" ))### create table of quantity of samples (results) actually collected; not grouped by parameter, but instead by analysis typetotal_samples_collected_2021_summary <- 
read.csv('other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv') %>%
  clean_names() %>%
  group_by(result_analytical_method_id,activity_start_date,activity_type) %>%
  count() %>%
  rename(actual_results_n = n) %>%
  transform(activity_start_date = ymd(activity_start_date))
# create table of planned vs actual ANALYSES
planned_actual_analyses_2021 <- left_join(planned_samples_2021_summary,total_samples_collected_2021_summary, by = c("result_analytical_method_id", "activity_start_date", "activity_type")) %>%
  mutate(pct_diff = ((actual_results_n - expected_results_n) / (actual_results_n + expected_results_n)) * 100)
# write table to download
write.csv(planned_actual_analyses_2021,"other/output/field_qa_qc_data/planned_actual_analyses_2021.csv", row.names = F)
# TO DO::: once data that does not pass QC muster is flagged and removed, we will append an additional column(s) to the tables generated in these questions
```

```{r, echo = F}
xfun::embed_file('other/output/field_qa_qc_data/planned_actual_analyses_2021.csv', text = "Download Planned vs. Actual Analysis Results for 2021, Kenai Baseline")
```

<br>

To see a table comparing planned vs actual results for 2021, download and view the csv file linked above.

From the above table we can see that there are deviations between planned results and actual results available. The reasons for these deviations are known and are attributable to two causes:

***Cause 1:*** The Spring 2021 Chain of Custody (COC) from KWF to SGS was completed erroneously. The COC specified that EPA Method 200.8 analyses (dissolved metals) be completed for all sites (when they should have stopped upstream of Morgan's Landing RM31), and it also specified that EPA Method 200.7 analyses (total metals) stop upstream of Morgan's Landing (when they should have been performed for all sites in the project).

As a result, Spring 2021 total metals data will be unavailable for sites upstream of the Morgan's Landing RM31 site.

***Cause 2:*** For Summer 2021, the SGS laboratory performed the EPA Method 200.8 analyses (dissolved metals) for all 27 analytes available for the method, instead of just the smaller subset of analytes requested. (I.e., KWF received extra data for free; in this case, there are no consequences of deviating from the planned analyses.)

<br>

**4.) Do the laboratory reports provide results for all sites and parameters?**

The laboratory reports provide results for all sites and all parameters, with the exceptions outlined above in question #3.

<br>

**5.) Is a copy of the Chain of Custody included with the laboratory reports?**

We worked with three separate laboratories in 2021:

- SGS Laboratories, Anchorage, AK - Chain of Custody documents are included within the PDF laboratory reports linked above earlier in this appendix.
- Soldotna Wastewater Treatment Plant, Soldotna, AK - Chain of Custody documents are on file with Kenai Watershed Forum for fecal coliform and total suspended solids for 5/11/2021, and for total suspended solids on 7/27/2021.
- Tauriainen Engineering & Testing, Soldotna, AK - An individual document for each sample reports the time and date of delivery and analysis for each sample. These documents are included with the PDF laboratory reports linked above earlier in this appendix.

<br>
**6.) Do the laboratory reports match the Chain of Custody and requested methods throughout?**

The laboratory reports match the Chain of Custody and requested methods, with the one exception discussed in question #3.

For summer 2021, the SGS laboratory performed the EPA Method 200.8 analyses for all 27 analytes available for the method, instead of just the smaller subset of analytes requested. (I.e., KWF received extra data for free; in this case, there are no consequences of deviating from the planned analyses.)

<br>

**7.) Are the number of samples on the laboratory reports the same as on the Chain of Custody?**

The quantity of sample bottles sent to the laboratories matches the number of analyzed samples for samples collected and delivered on 5/11/2021 and 7/27/2021.

<br>

**8.) Was all supporting info provided in the laboratory report, such as reporting limits for all analyses and definitions?**

We worked with three separate laboratories in 2021:

- SGS Laboratories, Anchorage, AK - SGS provided data as PDFs which included reporting limits, as well as in the form of an Electronic Data Deliverable where this information is also included in column format.
- Soldotna Wastewater Treatment Plant (SWWTP), Soldotna, AK - SWWTP provided data in the form of .xls files. Reporting limits are described in this project's current Quality Assurance Project Plan.
- Tauriainen Engineering & Testing, Soldotna, AK - Tauriainen provided data in the form of PDF documents. Reporting limits are described in this project's current Quality Assurance Project Plan.

<br>

**9.) Are site names, dates, and times correct and as expected?**

- *Yes, after post-season correction documented in this report.* Notes: In 2021 Kenai Watershed Forum used pre-printed waterproof labels on all sample bottles, reducing opportunity for field and lab transcription errors. Remaining site name transcription errors from laboratories were corrected in post-season data review, using scripted code that did not modify the original report.

<br>

**10.) Were there any issues with instrument calibration?**

- Instruments to measure intrinsic water quality parameters (sondes; to measure pH, dissolved oxygen, conductivity, turbidity) were not employed in 2021.
- Teams did use hand-held probes to record water temperature on-site. Prior to field use, the hand-held probes were verified as measuring within the accuracy level defined in the QAPP using an ice bath in the Kenai Watershed Forum laboratory.

<br>

**11.) Did the instruments perform as expected?**

- The hand-held water temperature probes performed as expected in 2021.

<br>

**12.) Was instrument calibration performed according to the QAPP and instrument recommendations?**

- Water temperature is a parameter that is "verified" rather than calibrated. The hand-held water temperature probes were verified as measuring within the accuracy level defined in the QAPP using an ice bath in the laboratory, according to instrument recommendations.

<br>

**13.) Was instrument verification during the field season performed according to the QAPP and instrument recommendations?**

- The hand-held water temperature probes were verified as measuring within the accuracy level defined in the QAPP using an ice bath in the laboratory, according to instrument recommendations.

<br>

**14.) Were instrument calibration verification logs or records kept?**

- Yes. These records are held at Kenai Watershed Forum, 44129 Sterling Hwy, Soldotna, AK.

<br>
**15.) Do the instrument data files' site IDs, time stamps, and file names match?**

- Instrument files were not employed in 2021. Measurements from hand-held probes were recorded on waterproof paper field forms.

<br>

**16.) Is any in situ field data rejected and why?**

- No in situ data from 5/11/2021 or 7/27/2021 was rejected.

<br>

**17.) Were preservation, hold time and temperature requirements met?**

- Yes. Summer and Spring 2021 holding time requirements were met for all samples. See downloadable files below. Laboratory result documents indicated no compromises of preservation and temperature requirements.

```{r hold-times, echo = F, messages = F, warnings = F}
# embed for download
xfun::embed_file("other/input/AQWMS/sample_holding_times.csv", text = "Download Table of Maximum Holding Times for Each Sample Type")
```

```{r hold-times-1, echo = F, messages = F, warnings = F}
# define columns to retain
col_names <- c( "sample", "epa_analysis_id", "analyte", "collect_date", "collect_time", "rec_date", "rec_time", "lab_name")
# read in data
total_samples_collected_2021_summary_param <- read.csv('other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv')
# read in columns relevant to holding time calculations
holdtime_dat <- dat %>%
  select(all_of(col_names)) %>%
  # some observations have date/time in separate columns, while some have them in the same column. remedy this inconsistency
  mutate(rec_date1 = as.Date(rec_date),
         rec_time1 = case_when(
           !is.na(rec_time) ~ as_hms(rec_time),
           is.na(rec_time) ~ as_hms(rec_date))) %>%
  select(-rec_time,-rec_date) %>%
  rename(rec_time = rec_time1,
         rec_date = rec_date1) %>%
  # create single date/time columns for activity and lab receipt
  mutate(activity_datetime = ymd_hms(paste(collect_date,collect_time)),
         rec_datetime = ymd_hms(paste(rec_date,rec_time))) %>%
  # calculate actual holding time period in hours; units are requested explicitly so difftime does not auto-select them
  mutate(hold_time_hours = as.numeric(difftime(rec_datetime, activity_datetime, units = "hours"))) %>%
  select(-collect_date,-collect_time,-rec_date,-rec_time) %>%
  mutate(analytical_method = str_trim(dat$analytical_method, side = "left"))
# join the (manually created) dataframe with maximum allowed sample holding times
# read in max holding times
max_holding_times <- read.csv("other/input/AQWMS/sample_holding_times.csv") %>%
  transform(max_holding_time_hours = as.numeric(max_holding_time_hours))
# join holding time data to maximum holding times table
# calculate Y/N column for pass/fail
holdtime_dat <- left_join(holdtime_dat,max_holding_times) %>%
  mutate(hold_time_pass = case_when(
    hold_time_hours > max_holding_time_hours ~ "N",
    TRUE ~ "Y"))
```

```{r echo = F}
# save hold time calculation results as external csv
write.csv(holdtime_dat,"other/output/field_qa_qc_data/holding_time_calcs.csv", row.names = F)
# embed for download
xfun::embed_file("other/output/field_qa_qc_data/holding_time_calcs.csv", text = "Download Holding Time Calculations for Spring and Summer 2021 Field Samples")
```

<br>
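As a sanity check on the holding-time arithmetic, note that subtracting two POSIXct values directly lets R auto-select the difftime unit, which is why the units are requested explicitly above. A toy illustration (made-up timestamps, not project data):

```{r echo = F, eval = F}
# Toy holding-time calculation with explicit units
collected <- ymd_hms("2021-05-11 09:15:00")
received  <- ymd_hms("2021-05-11 13:31:00")
as.numeric(difftime(received, collected, units = "hours"))  # 4.27 hours
```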
**18.) Are dissolved metal quantities less than total metals quantities?**

- Dissolved metals results are ***not*** consistently less than total metals results, as would be logically anticipated (see post from [Flowlink Environmental](https://www.flowlink.ca/post/2018/12/15/dissolved-total-metals-whats-the-difference-and-why-it-matters) website for description and rationale):
  - ["**Dissolved metal concentration** is determined by filtering a water sample through 0.45 uM filter. Water that passes through the filter is analyzed for metals and the result is reported as dissolved metal concentration. **Total metal concentration** is determined by analyzing the unfiltered sample."]{.underline} [**"Total metal concentration = Dissolved metal concentration + Particulate (insoluble) metal concentration."**]{.underline}
- In 2021, there are results available for both dissolved and total metals for three elements: Ca, Fe, and Mg; for sites at and downstream of RM 31.
- Dissolved metals results are available for 7/27/2021 (summer) only, and not for 5/11/2021 (spring); see question #3 in this sequence.
- Results:
  - Calcium: all available observations show dissolved metals \> total metals
  - Iron: all available observations show dissolved metals \< total metals
  - Magnesium: nearly all available observations show dissolved metals \< total metals, with the exception of three sites, all near the Kenai River mouth

```{r echo = F, message = F, warning = F}
# define elements to compare
diss_total_analytes <- c("Calcium, Total","Iron, Total","Magnesium, Total", "Calcium","Iron","Magnesium")
# prep dataframe
diss_total_metals_dat <- dat %>%
  select(analyte, collect_date, collect_time, monitoring_location_name, result, units, result_detection_condition, sample_condition) %>%
  filter(analyte %in% diss_total_analytes) %>%
  # make units consistent between different labels ("mg/L", "ug/L", "mg/l")
  mutate(result = case_when(
    units == "mg/L" | units == "mg/l" ~ result*1000,
    units == "ug/L" ~ result)) %>%
  # match case-insensitively so the lowercase "mg/l" labels are relabeled as well
  mutate(units = str_replace(units, regex("mg/l", ignore_case = TRUE), "ug/L"))
# choose location to save csvs
dir <- "other/output/field_qa_qc_data/metals_total_diss/"
# calcium
ca <- diss_total_metals_dat %>%
  filter(analyte %in% c("Calcium","Calcium, Total")) %>%
  pivot_wider(names_from = "analyte", values_from = "result") %>%
  mutate(particulate_conc = `Calcium, Total` - Calcium) %>%
  # mutate column to designate if total metals > dissolved metals
  mutate(total_greater_diss = ifelse(particulate_conc > 0,"Y","N"))
ca_dir <- paste0(dir,"ca",".csv")
write.csv(ca,ca_dir)
xfun::embed_file(paste0(dir,"ca",".csv"), text = "Download 2021 Calcium data, total vs dissolved")
# iron
fe <- diss_total_metals_dat %>%
  filter(analyte %in% c("Iron","Iron, Total")) %>%
  pivot_wider(names_from = "analyte", values_from = "result") %>%
  mutate(particulate_conc = `Iron, Total` - Iron) %>%
  # mutate column to designate if total metals > dissolved metals
  mutate(total_greater_diss = ifelse(particulate_conc > 0,"Y","N"))
fe_dir <- paste0(dir,"fe",".csv")
write.csv(fe,fe_dir)
xfun::embed_file(paste0(dir,"fe",".csv"), text = "Download 2021 Iron data, total vs dissolved")
# magnesium
mg <- diss_total_metals_dat %>%
  filter(analyte %in% c("Magnesium","Magnesium, Total")) %>%
  pivot_wider(names_from = "analyte", values_from = "result") %>%
  mutate(particulate_conc = `Magnesium, Total` - Magnesium) %>%
  # mutate column to designate if total metals > dissolved metals
  mutate(total_greater_diss = ifelse(particulate_conc > 0,"Y","N"))
mg_dir <- paste0(dir,"mg",".csv")
write.csv(mg,mg_dir)
xfun::embed_file(paste0(dir,"mg",".csv"), text = "Download 2021 Magnesium data, total vs dissolved")
rm(ca,fe,mg)
# NOTE: make the above 3 dataframe results into a function! Note that in 2023 we are getting total metals for Cu and Zn as well...!
```
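Responding to the NOTE above, one possible refactor of the three repeated blocks into a single helper (a sketch that assumes the column layout of `diss_total_metals_dat`; not yet applied to the pipeline):

```{r echo = F, eval = F}
# Sketch: generalize the repeated Ca/Fe/Mg comparisons so 2023 additions (e.g. Cu, Zn) reuse the logic
compare_diss_total <- function(dat, analyte_diss, analyte_total, out_dir) {
  out <- dat %>%
    filter(analyte %in% c(analyte_diss, analyte_total)) %>%
    pivot_wider(names_from = "analyte", values_from = "result") %>%
    mutate(particulate_conc = .data[[analyte_total]] - .data[[analyte_diss]],
           # flag whether total exceeds dissolved, as logically expected
           total_greater_diss = ifelse(particulate_conc > 0, "Y", "N"))
  write.csv(out, paste0(out_dir, tolower(analyte_diss), ".csv"))
  out
}
# e.g.: compare_diss_total(diss_total_metals_dat, "Calcium", "Calcium, Total", dir)
```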
<br>

As of Fall 2023 it is unclear why the values measured for dissolved metals are greater than the values measured for total metals in the cases described above.

See discussion XX in "Summary" section of this report for plans in future years to address the issue of dissolved metals values being greater than total metals values in some cases.

<br>

**19.) Are the duplicate sample(s) RPD (Relative Percent Difference) values within the range described in the QAPP?**

In 2021, field duplicate samples were collected at two sites on both 5/11/2021 and 7/27/2021:

- No Name Creek (Tributary, RM0)
- Funny River (Tributary, RM30)

Following the example in the Alaska Department of Environmental Conservation 2021 Kenai River Metals Field Report [@apsens2022]: "... a set of paired samples was evaluated for RPD only if:

- a.) one or both of the samples were above the Limit of Quantitation (LOQ); AND if
- b.) one or both of the samples were at least two times the LOQ."

```{r echo = F, message = F, warning = F}
# generate table of rpd values and make csv downloadable
# general approach:
# create table with RPD goals from QAPP
# calculate observed 2021 RPD values
# compare to goals
# subset field data from sites with field duplicates in 2021
# for 2021, we have the same sites for both dates. In future years we will need to assign sites by dates as well (join by date).
qaqc_sites_2021 <- read_excel("other/input/AQWMS/aqwms_qaqc/aqwms_qaqc_info.xlsx", sheet = "field_dup_sites_2021") %>%
  remove_empty()
# read in data formatted for export
export_dat <- read.csv('other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv') %>%
  clean_names()
# inner join
field_dup_dat_2021 <- inner_join(export_dat,qaqc_sites_2021)
# specify and retain columns needed for RPD analysis
rpd_cols_1 <- c("monitoring_location_id", "activity_type", "activity_start_date", "characteristic_name", "result_detection_condition", "result_value", "result_unit", "result_detection_limit_type_1", "result_detection_limit_value_1", "result_detection_limit_unit_1" )
# prepare dataframe to evaluate if pairs of values can be evaluated for RPD
rpd_check_dat <- field_dup_dat_2021 %>%
  select(all_of(rpd_cols_1)) %>%
  pivot_wider(names_from = "activity_type", values_from = c("result_value")) %>%
  clean_names() %>%
  # assign columns to check if data can be evaluated for RPD
  # field samples
  # is field sample above LOQ?
  mutate(field_above_loq = case_when(
    field_msr_obs > result_detection_limit_value_1 ~ "Y",
    TRUE ~ "N")) %>%
  # is field sample 2x LOQ ?
  mutate(field_loq_2x = case_when(
    field_msr_obs > 2*result_detection_limit_value_1 ~ "Y",
    TRUE ~ "N")) %>%
  # qc samples
  # is qc sample above LOQ?
  mutate(qc_above_loq = case_when(
    quality_control_field_replicate_msr_obs > result_detection_limit_value_1 ~ "Y",
    TRUE ~ "N")) %>%
  # is qc sample 2x LOQ ?
  mutate(qc_loq_2x = case_when(
    quality_control_field_replicate_msr_obs > 2*result_detection_limit_value_1 ~ "Y",
    TRUE ~ "N"))
```

```{r echo = F}
# execute 2-stage winnowing process to determine which data can be used for RPD calculations
# evaluate if observations can be used for RPD calculations
rpd_check_dat %<>%
  # does this pair of measurements meet criteria that allow them to be used for RPD calculations?
  # 1.) one or both of the measurements must be above the parameter's limit of quantitation (LOQ), AND
  # 2.) one or both of the measurements must be at least two times the LOQ."
  mutate(loq_pass = case_when(
    field_above_loq == "Y" | qc_above_loq == "Y" ~ "Y",
    TRUE ~ "N")) %>%
  mutate(loq_2x_pass = case_when(
    field_loq_2x == "Y" | qc_loq_2x == "Y" ~ "Y",
    TRUE ~ "N")) %>%
  # pair of values is eligible for RPD calcs if above criteria met, AND both values are present
  mutate(rpd_eligible = case_when(
    loq_pass == "Y" & loq_2x_pass == "Y" & !is.na(field_msr_obs) & !is.na(quality_control_field_replicate_msr_obs) ~ "Y" ,
    TRUE ~ "N"))
# calculate RPD values
rpd_check_dat %<>%
  # calc absolute percent diff between field and qc sample value
  mutate(rpd_pct = ((field_msr_obs - quality_control_field_replicate_msr_obs) / ((field_msr_obs + quality_control_field_replicate_msr_obs)/2))*100) %>%
  mutate(rpd_pct = abs(round(rpd_pct,2)))
# apply result flags where appropriate according to RPD logic described below
#z <- rpd_check_dat %>%
#  mutate(resultFlag = case_when(
#    field
#  ))
# check 2016 report to see how this was executed exactly .... seems like a ton of data
# write csv
write.csv(rpd_check_dat,"other/output/field_qa_qc_data/rpd_check_dat.csv", row.names = F)
```
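For reference, the RPD computed above is the difference between the paired results divided by their mean; a worked toy example (not project data):

```{r echo = F, eval = F}
# RPD = |a - b| / ((a + b) / 2) * 100
field_val <- 10
dup_val   <- 8
abs(field_val - dup_val) / ((field_val + dup_val) / 2) * 100  # 22.22, which would exceed a 20% goal
```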
A csv file of Relative Percent Difference values organized by site, date, and parameter may be downloaded below:

```{r echo = F}
# export csv of rpd results
xfun::embed_file("other/output/field_qa_qc_data/rpd_check_dat.csv", text = "Download 2021 Duplicate Sample Relative Percent Difference Data")
# note: code in the next chunk references an object named "rpd_check_dat," do not change the name of the csv
#z <- rpd_check_dat %>%
#  select(monitoring_location_id,characteristic_name,activity_start_date,rpd_pct) %>%
#  arrange(characteristic_name)
```

<br>

From the above downloadable table, we can see that a limited subset of paired values from field duplicates are eligible to be used for RPD calculations for 2021. Pairs of measurements must meet the LOQ criteria described above ("a" and "b") in order to be used. For 2021 data, of 69 paired field duplicate observations, 24 meet these criteria and thus may be used for RPD calculations.

RPD values range from `r min(rpd_check_dat$rpd_pct, na.rm = T)` % to `r max(rpd_check_dat$rpd_pct, na.rm = T)` %.

In the 2020 Quality Assurance Project Plan, Relative Percent Difference goals for most parameters are set at 20%, with the exception of "Phosphorus, total" and "Nitrates (NO2+NO3)" at 25%, and "Suspended solids, total" at 5%.

Using the available RPD values against the above criteria, four paired observations exceed project precision goals in 2021. Three of these four observations are for fecal coliform, and one is for total phosphorus.

Most other parameters have at least one available RPD value that is within project precision goals defined in the QAPP.

The parameters that do *not* have at least one RPD calculation available, due to the LOQ constraints described above, are the following:

- Arsenic
- Cadmium
- Chromium
- Copper
- Lead
- Zinc

*Other notes*

In 2021 the full suite of dissolved metals was run for method 200.8 on samples from 7/27/2021. However, we intend to only submit data for parameters that had a pre-defined QA plan as outlined in the QAPP, which is the case for the following dissolved metals: Mg, As, Cd, Cr, Cu, Pb, and Zn.

**Summary of Discussion on RPD (Relative Percent Difference) Values for 2021**

We attempted to calculate RPD values for all parameters collected in 2021 at two field sites on two dates. For each pair of duplicate observations, one of the following descriptions applies:

- It ***was possible*** to calculate RPD values because:
  - a.) one or both of the samples **were above** the LOQ; AND one or both of the samples **were** at least two times the LOQ
  - Project data for this parameter will be submitted unflagged to EPA WQX if RPD values meet project goals outlined in the QAPP
- It ***was not possible*** to calculate RPD values because:
  - a.) one or both of the samples **were not above** the LOQ; OR one or both of the samples **were not** at least two times the LOQ
  - Project data for this parameter will be submitted unflagged to EPA WQX if both paired values are non-detect

Best practices for interpreting and applying RPD data in order to flag/not flag the overall dataset are currently being finalized. KWF communicated with the Alaska Department of Environmental Conservation on August 22, 2023 via an inquiry on this topic. The memo includes a visual flow chart of the logic for when data is flagged, and is available for download below.

```{r echo = F}
# export csv of rpd results
xfun::embed_file("other/documents/QAPP/memos/KWF Memo to ADEC August 2023.pdf", text = "Download August 2023 Memo to ADEC regarding RPD Interpretation")
```

<br>

*Final determinations on flagging / not flagging 2021 data submitted to the EPA WQX*

***All 2021 field results will be submitted unflagged to the EPA WQX unless discussed below.***

- *Fecal Coliform*
  - Given that 3 out of 4 RPD values for fecal coliform are \>60%, field data values for the 2021 dataset may vary from actual environmental conditions. The 2020 KWF QAPP does not specify an RPD value for fecal coliform; it only specifies control checks for sterility and temperature. We used EPA Method 9222D (drinking water) for fecal coliform analysis methods, which does not specify an RPD value. However, EPA general recommendations for pathogens in sewage sludge recommend an RPD of 20% [(link)](https://www.epa.gov/biosolids/quality-assurance-project-plan-qapp).
  - Given the above considerations, KWF will submit 2021 fecal coliform data to the EPA WQX as "flagged". Fecal coliform data will be evaluated in this report and assessed against state standards, but given the high RPD values among duplicate samples, 2021 data will likely not be considered in the context of potential regulatory exceedances.
  - TO DO: Calculate and visualize RPD values for 2014-2022 and compare, also compare to RPD values in previous comprehensive report.
- *Selected Dissolved Metals (As, Cd, Cr, Cu, Pb, Zn)*
  - Because these parameters do not have an associated RPD value from 2021 field data, they will be submitted to EPA WQX as "flagged."
  - Note: beginning in Summer 2022, KWF initiated a series of changes to the QAPP that will better facilitate QA/QC decisions regarding Total Metals and Dissolved Metals results. These changes include use of a dissolved metals field blank, and changing from field filtration to lab filtration as standard practice.

```{r echo = F}
# MOVE THIS STEP TO LAST AFTER ALL QUESTIONS THAT COULD PRODUCE A FLAG
# doing now: 12/4/23 11:20 - check to see if any other questions generate flags
# notes on flagging data
# based on above description, use case_when logic to apply flag Y/N as appropriate.
# code should remain agnostic of parameter and year
# source the results file, re-save as "flagged file" in same folder; do so for each additional future step that involves flagging
# read in data formatted for export
export_dat <- read.csv('other/output/aqwms_formatted_results/2021_kwf_baseline_results_aqwms.csv') %>%
  clean_names() %>%
  transform(activity_start_date = ymd(activity_start_date))
# apply manually-chosen data flags based on discussion in question 19
# NOTE: this csv also has a column for summary rationale for each flagging decision
flags_2021 <- read.csv('other/input/AQWMS/2021_data_flag_decisions.csv') %>%
  select(-notes) %>%
  transform(activity_start_date = mdy(activity_start_date))
export_dat <- full_join(export_dat,flags_2021) %>%
  # fill in "flag" column with "N" if not already "Y"
  mutate(flag = case_when(
    is.na(flag) ~ "N",
    TRUE ~ "Y" )) %>%
  filter(!is.na(activity_id))
# do we need to distinguish between RPD flag and MS flag?
# re-export results w/ rpd flag column
write.csv(export_dat,"other/output/aqwms_formatted_results/2021_export_data_flagged.csv", row.names = F)
# remove existing "export_dat" dataframe
rm(export_dat)
# Coordinates for JC site are still correct at this step.
# is RM 43 missing longitude?
```

<br>

**20.) Were there any laboratory discrepancies, errors, data qualifiers, or QC failures (review laboratory duplicates, matrix spikes and blanks)?**

Laboratory discrepancies and QC failures from the SGS lab are described on page 2 of the spring and summer PDF document reports, which are available for download below.

```{r echo = F}
xfun::embed_file('other/input/2021_wqx_data/spring_2021_wqx_data/SGS/Spring 2021 Results from SGS.pdf', text = "Download Original Spring 2021 Metals/Nutrients Lab Results from SGS - PDF file")
# note: the draft linked the spring report twice; the summer path below is assumed to mirror the spring file naming convention
xfun::embed_file('other/input/2021_wqx_data/summer_2021_wqx_data/SGS/Summer 2021 Results from SGS.pdf', text = "Download Original Summer 2021 Metals/Nutrients Lab Results from SGS - PDF file")
```

```{r echo = F, message = F, warning = F}
# calculate qa/qc discrepancies as detailed in PDF reports; check QAPP for guidance, or check w/ qa officer...
matrix_spike_limits <- read.csv("other/output/lab_qaqc_data/2021_lab_qaqc_data/sgs21_als21_qaqc_dat.csv") %>%
  select(lab_name,sample,collect_date,extracted_date,sample_type,result,analyte,dissolved,analytical_method, resultflag,percent_recovered,rec_limit_low,rec_limit_high,sample_rpd,rpd_limit_low,rpd_limit_high, loq,lod,detection_limit,sample_condition) %>%
  # Matrix Spike Recovery
  # calculate if matrix spike recovery limits pass requirements
  mutate(rec_limit_pass = case_when(
    is.na(rec_limit_low) ~ "",
    percent_recovered > rec_limit_high | percent_recovered < rec_limit_low ~ "N",
    TRUE ~ "Y" ))
# calculate total number of matrix spike cases where recovery exceeded limits
matrix_spike_recovery_fails <- matrix_spike_limits %>%
  filter(rec_limit_pass == "N") %>%
  nrow() %>%
  as.numeric()
# generate table of specific failed samples
matrix_spike_recovery_fails_tbl <- matrix_spike_limits %>%
  filter(rec_limit_pass == "N") %>%
  select(-extracted_date,-dissolved)
write.csv(matrix_spike_recovery_fails_tbl,"other/output/lab_qaqc_data/matrix_spike_recovery_fails_2021.csv")
```

<br>

The following discrepancies were identified in the results from SGS laboratories:

**Matrix Spike Recoveries**

A total of `r matrix_spike_recovery_fails` matrix spike or matrix spike duplicate samples are outside of QC criteria. The limit of recovery range for the analyte "Total Nitrate/Nitrite-N" is 90% - 110%. For these matrix spike measurements outside of the QC criteria, recovery levels range from 112% - 118%. The subset of sample values that exceed the threshold may be downloaded as a table below:

```{r matrix-spikes, echo = F, messages = F, warnings = F}
# embed for download
xfun::embed_file("other/output/lab_qaqc_data/matrix_spike_recovery_fails_2021.csv", text = "Download Matrix Spike Recovery Values Exceeding Threshold for 2021 Sample Analyses")
```

<br>
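For context on the recovery figures above, percent recovery compares the spiked sample result to the known amount added; a toy calculation (made-up concentrations, not the SGS values):

```{r echo = F, eval = F}
# Percent recovery = (spiked result - unspiked result) / amount spiked * 100
unspiked    <- 0.20  # toy concentration
spiked      <- 1.36  # toy concentration measured after spiking
spike_added <- 1.00  # toy amount added
(spiked - unspiked) / spike_added * 100  # 116, outside a 90% - 110% acceptance window
```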
No additional laboratory QA/QC anomalies for any other parameters were noted by any laboratories contracted in 2021. Additional details are available upon request at [hydrology\@kenaiwatershed.org](mailto:hydrology@kenaiwatershed.org){.email}.

As of `r Sys.Date()`, "Total Nitrate/Nitrite-N" values will be flagged for the spring 2021 dataset. (Spring only and not summer, because no matrix spike recovery exceedances were observed from the summer 2021 lab results.) The matrix spike recovery exceedances indicate that reported lab values may be 2% - 7% above actual field values. At a later date we will evaluate these results in the context of year 2000 - present data to help evaluate the relevancy of this flag.

```{r echo = F}
# apply "flag = Y" to all "Total Nitrate/Nitrite-N" values for Spring 2021
# this step is already accomplished in a previous code block, which imports and joins "other/input/AQWMS/2021_data_flag_decisions.csv" to the parameters chosen in this discussion.
```

<br>

**21.) Is any laboratory data rejected and why?**

No additional 2021 laboratory data currently indicates a need for being flagged or rejected.

<br>

**22.) During the field season, review raw data files (EDDs \[Electronic Data Deliverables\], instrument records) as they are received. Document changes and corrections to methods as needed.**

In 2021, KWF initiated as standard practice the receipt of machine-readable EDD files from contracted laboratories, in addition to PDF results. The EDD files are being reviewed as part of this QA/QC process, which was initiated in Winter 2022.

<br>

**23.) Is the dataset complete and did you receive the expected number of results?**

From the discussion and results in question #3, deviations are evident between planned and actual results reported. The reasons for the deviations are known and are attributable to the two factors discussed in question #3, and thus are not a concern for project integrity.

<br>

**24.) Was the data collected representative of environmental conditions?**

The data collected in this project is representative of seasonal hydrological conditions on two fieldwork days in 2021. Conditions vary instantaneously; thus this dataset serves to provide representative longitudinal snapshots of the watershed across space on the two dates.

Flagged data indicates that lab results may not be representative of environmental conditions, and thus may require additional interpretation or exclusion from analysis.

<br>

**25.) Does the project meet Completeness Measure A criteria?**

From the QAPP, Completeness Measure A: "the primary number of samples collected divided by the usable number of samples submitted to ADEC, with a goal of 85% completeness." (Note that as of 2023 results are now submitted directly to EPA rather than DEC, per request of DEC.)

(Note: metals parameters not specified in the QAPP are excluded from these analyses. In 2021, SGS laboratories provided dissolved metals results for parameters beyond what was originally requested.)

We calculated Completeness Measure A at the following project scales:

- For individual parameters
- For individual sites
- As a project whole
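The ratio itself is straightforward; a toy illustration with made-up counts (the chunk that follows computes the project values from the flagged export):

```{r echo = F, eval = F}
# Completeness Measure A = usable (unflagged) results / results collected
collected_n <- 20
flagged_n   <- 9
round(100 * (collected_n - flagged_n) / collected_n, 1)  # 55% usable, short of the 85% goal
```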
We calculated Completeness Measure A at the following project scales:

- For individual parameters
- For individual sites
- As a project whole

```{r echo = F, message=FALSE}
# read in list of analytes specified in QAPP
analytes_2021 <- read.csv("other/input/AQWMS/analytes_list_manual_edit.csv")
colnames(analytes_2021) <- c("characteristic_name", "a", "param_in_qapp")
analytes_2021 %<>% select(-a)

# BY PARAMETER
# summarise flags by parameter
completeness_summary_param <- read.csv('other/output/aqwms_formatted_results/2021_export_data_flagged.csv') %>%
  clean_names() %>%
  # reduce the list of analytes to only those specified in QAPP
  left_join(analytes_2021) %>%
  filter(param_in_qapp == "Y") %>%
  # summarise
  group_by(result_analytical_method_id, characteristic_name, flag) %>%
  count() %>%
  rename(actual_results_n = n) %>%
  # transform(activity_start_date = ymd(activity_start_date))
  pivot_wider(names_from = "flag", values_from = "actual_results_n") %>%
  rename(flag_Y = Y, flag_N = N) %>%
  rowwise() %>%
  mutate(total_samples = sum(flag_Y, flag_N, na.rm = T)) %>%
  # calculate Completeness Measure A for each parameter
  mutate(cma = flag_N / total_samples) %>%
  mutate(cma = replace_na(cma, 0)) %>%
  adorn_pct_formatting(,,, cma)

write.csv(completeness_summary_param, "other/output/field_qa_qc_data/completeness_measures/cma_parameter.csv")

# BY SITE
# summarise flags by site
completeness_summary_site <- read.csv('other/output/aqwms_formatted_results/2021_export_data_flagged.csv') %>%
  clean_names() %>%
  # reduce the list of analytes to only those specified in QAPP
  left_join(analytes_2021) %>%
  filter(param_in_qapp == "Y") %>%
  # summarise
  group_by(monitoring_location_id, flag) %>%
  count() %>%
  rename(actual_results_n = n) %>%
  # transform(activity_start_date = ymd(activity_start_date))
  pivot_wider(names_from = "flag", values_from = "actual_results_n") %>%
  rename(flag_Y = Y, flag_N = N) %>%
  rowwise() %>%
  mutate(total_samples = sum(flag_Y, flag_N, na.rm = T)) %>%
  # calculate Completeness Measure A for each site
  mutate(cma = flag_N / total_samples) %>%
  mutate(cma = replace_na(cma, 0)) %>%
  adorn_pct_formatting(,,, cma)

write.csv(completeness_summary_site, "other/output/field_qa_qc_data/completeness_measures/cma_site.csv")

# OVERALL
# summarise flags for overall project
completeness_summary_project <- read.csv('other/output/aqwms_formatted_results/2021_export_data_flagged.csv') %>%
  clean_names() %>%
  # reduce the list of analytes to only those specified in QAPP
  left_join(analytes_2021) %>%
  filter(param_in_qapp == "Y") %>%
  # summarise
  group_by(flag) %>%
  count() %>%
  rename(actual_results_n = n) %>%
  pivot_wider(names_from = "flag", values_from = "actual_results_n") %>%
  rename(flag_Y = Y, flag_N = N) %>%
  rowwise() %>%
  mutate(total_samples = sum(flag_Y, flag_N, na.rm = T)) %>%
  # calculate Completeness Measure A for the overall project
  mutate(cma = flag_N / total_samples) %>%
  mutate(cma = replace_na(cma, 0)) %>%
  adorn_pct_formatting(,,, cma)

write.csv(completeness_summary_project, "other/output/field_qa_qc_data/completeness_measures/cma_project.csv")

# procedure: based on discussion in above questions, decide what is flagged as "unusable,"
# THEN return to this question to calculate Completeness Measure A.
# adapt from existing code in "appendix_a_temp.qmd"
```
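To make the metric concrete, here is a toy illustration of the Completeness Measure A arithmetic as implemented above (unflagged results divided by total results); the values below are hypothetical, not project data:

```{r}
# Toy illustration of Completeness Measure A (hypothetical values):
toy_flags <- c("N", "N", "N", "Y")               # three usable results, one flagged
cma <- sum(toy_flags == "N") / length(toy_flags)
cma                                              # 0.75, i.e. below the 85% goal
```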
```{r complete-measures, echo = F, messages = F, warnings = F}
# embed for download
xfun::embed_file("other/output/field_qa_qc_data/completeness_measures/cma_parameter.csv", text = "Download Completeness Measure A Results, By Parameter")
xfun::embed_file("other/output/field_qa_qc_data/completeness_measures/cma_site.csv", text = "Download Completeness Measure A Results, By Site")
xfun::embed_file("other/output/field_qa_qc_data/completeness_measures/cma_project.csv", text = "Download Completeness Measure A Results, Overall")
```

<br>

*Discussion on "Completeness Measure A" Results*

Based on the data flagging decisions discussed in the previous questions, the 2021 dataset did not achieve all Completeness Measure A goals.

When completeness results were summarized by parameter, all dissolved metals had a value of 0% for Completeness Measure A, since the decision was made to flag these as a result of an inability to calculate Relative Percent Difference values.

When completeness results were summarized by site, Completeness Measure A values ranged from 44.4% - 71.4%, depending on the site's sampling plan.

When completeness results are summarized for the overall 2021 dataset, Completeness Measure A is calculated at 50.1%, indicating that 49.9% of the overall result values were flagged.

<br>

**26.) Does project meet Completeness Measure B criteria?**

From the QAPP, Completeness Measure B: "the planned number of samples divided by the useable samples (submitted to ADEC) with a goal of 60%."

For these calculations, we will consider "usable" as "unflagged," and will calculate at the following scales:

- For individual parameters (analysis methods)
- For individual sites
- As a project whole

```{r echo = F, message=FALSE, eval= F}
# NOTE: if the number/type of planned samples is identical to the samples actually
# collected, then Completeness Measure A = Completeness Measure B (?!)

# BY PARAMETER (actually by method)
# unflagged (usable) vs expected results
cmb_by_method <- read.csv('other/output/aqwms_formatted_results/2021_export_data_flagged.csv') %>%
  clean_names() %>%
  # reduce the list of analytes to only those specified in QAPP
  left_join(analytes_2021) %>%
  filter(param_in_qapp == "Y") %>%
  group_by(result_analytical_method_id, activity_start_date, activity_type, flag) %>%
  count() %>%
  rename(unflagged_results_n = n) %>%
  pivot_wider(names_from = "flag", values_from = "unflagged_results_n") %>%
  rename(flag_Y = Y, flag_N = N) %>%
  transform(activity_start_date = ymd(activity_start_date)) %>%
  left_join(planned_samples_2021_summary) %>%
  # calculate Completeness Measure B for each parameter
  mutate(cmb = flag_N / expected_results_n) %>%
  mutate(cmb = replace_na(cmb, 0)) %>%
  adorn_pct_formatting(,,, cmb)

write.csv(cmb_by_method, "other/output/field_qa_qc_data/completeness_measures/cmb_parameter.csv")

# BY SITE
# unflagged (usable) vs expected results
# prepare table of planned sample summary that includes site names
# repeat methods as above for parameter
cmb_by_site <- read.csv('other/output/aqwms_formatted_results/2021_export_data_flagged.csv') %>%
  clean_names() %>%
  # reduce the list of analytes to only those specified in QAPP
  left_join(analytes_2021) %>%
  filter(param_in_qapp == "Y") %>%
  group_by(monitoring_location_id, flag) %>%
  count() %>%
  rename(unflagged_results_n = n) %>%
  pivot_wider(names_from = "flag", values_from = "unflagged_results_n") %>%
  rename(flag_Y = Y, flag_N = N) %>%
  left_join(planned_samples_2021_summary_by_site) %>%
  # calculate Completeness Measure B for each site
  mutate(cmb = flag_N / expected_ct) %>%
  mutate(cmb = replace_na(cmb, 0)) %>%
  adorn_pct_formatting(,,, cmb)

# calculated results here are nonsensical; re-assess what we need in the tables being joined, by site
write.csv(cmb_by_site, "other/output/field_qa_qc_data/completeness_measures/cmb_site.csv")

# BY PROJECT WHOLE
# unflagged (usable) vs expected results
```
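The project-whole calculation above is still unwritten. As a starting point, here is a minimal sketch of one way to fill it in, assuming the `planned_samples_2021_summary` table used above carries an `expected_results_n` count per method and date (treat the column names as assumptions to verify against the actual tables):

```{r eval = F}
# Sketch only: project-whole Completeness Measure B (unflagged results / planned results).
# Assumes planned_samples_2021_summary includes an expected_results_n column.
total_planned <- sum(planned_samples_2021_summary$expected_results_n, na.rm = TRUE)

total_unflagged <- read.csv('other/output/aqwms_formatted_results/2021_export_data_flagged.csv') %>%
  clean_names() %>%
  left_join(analytes_2021) %>%
  filter(param_in_qapp == "Y", flag == "N") %>%
  nrow()

cmb_project <- total_unflagged / total_planned   # goal: >= 60%
```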
Work is still in progress on calculating Completeness Measure B as of `r Sys.Date()`.

<br>

**27.) Was the QA officer consulted for any data concerns?**

*Note: the 2012 and 2019 approved Quality Assurance Project Plan (QAPP) documents list Alaska Dept of Environmental Conservation staff as the project's QA officer* [@kenaiwatershedforum2012; @kenaiwatershedforum2019]*, whereas the 2023 approved QAPP lists Kenai Watershed Forum staff as the project QA officer* [@kenaiwatershedforum2023]*.*

Kenai Watershed Forum consulted regularly with staff from the Alaska Department of Environmental Conservation and the EPA Region 10 Office in order to prepare 2021 field data and confirm field methods and data QA/QC procedures.

<br>

**28.) Are the correct monitoring locations associated with the project? Are latitude and longitude filled out in a consistent format?**

We used the leaflet package to generate a map of unique sites from the 2021 dataset:

```{r echo = F, message=FALSE}
# read in locations from exported data (export_dat)
export_dat <- read.csv("other/output/aqwms_formatted_results/2021_export_data_flagged.csv")

site_check <- export_dat %>%
  distinct(monitoring_location_id, activity_latitude, activity_longitude) %>%
  transform(activity_longitude = as.numeric(iconv(activity_longitude, 'utf-8', 'ascii', sub = ''))) %>%
  # The row for KR RM 43 longitude likely contains a non-ASCII character, thus the iconv()
  # call above; otherwise the value becomes NA. The transform also leaves that row with
  # longitude > 0, thus the case_when() step below.
  mutate(activity_longitude = case_when(
    monitoring_location_id == "KR RM 43" ~ activity_longitude * -1,
    TRUE ~ activity_longitude))

# generate map
library(leaflet)
leaflet(site_check) %>%
  addTiles() %>%
  addMarkers(~activity_longitude, ~activity_latitude,
             popup = ~as.character(monitoring_location_id),
             label = ~as.character(monitoring_location_id))
```

In the above map we can see that the coordinates for the 2021 field sites correspond to the locations listed in the Quality Assurance Project Plan, and thus are filled out in a consistent format.

<br>

**29.) Are the QAPP and other supporting documents attached?**

The QAPP will be included and attached in the data submission to the EPA WQX. Find the current approved QAPP posted on Kenai Watershed Forum's website at <https://www.kenaiwatershed.org/news-media/qapp-revisions-completed-2023/>.

<br>

**30.) Is all project metadata correct?**

Answer TBD. More detail will be added here once the upload is complete.

<br>

**31.) Is the organization ID correct?**

Kenai Watershed Forum's Organization ID in the EPA Water Quality Exchange is "Kenai_WQX." This value is manually confirmed during the upload process.

<br>

**32.) Are the time zones consistent and correct (AKDT in summer)?**

```{r echo = F}
# The existing format does not include an AKDT time zone column.
# read in locations from exported data (export_dat)
export_dat <- read.csv("other/output/aqwms_formatted_results/2021_export_data_flagged.csv") %>%
  mutate(time_zone = "AKDT")

# re-write export_dat
write.csv(export_dat, "other/output/aqwms_formatted_results/2021_export_data_flagged.csv", row.names = F)
```

All time zones are listed as AKDT.
<br>

**33.) Are all media types included? Media types appropriate to Characteristic?**

"Water" is the only media type considered in this project. All "Characteristics" (parameters) are measured from water samples or in-situ water conditions.

<br>

**34.) Check Sample Collection, Preparation and Preservation Methods, Thermal Preservative, Equipment ID, Activity Media. Is supporting information included and correct?**

```{r echo = F}
# sample collection method
unique(export_dat$equipment_id)
```

"Sample collection method": all sample values have "Equipment ID" listed as "Water Bottle."

"Preparation": this information corresponds to the "result sample fraction" column.

"Preservation Methods": this information corresponds to the "preservative" and "thermal preservative" columns.

<br>

**35.) Are all expected activity types present and are QC samples correctly identified?**

```{r echo = F}
# identify unique activity types
unique(export_dat$activity_type)
```

<br>

The activity types present are listed above. All three types were planned and anticipated. QC samples are described earlier in Question #3.

<br>

**36.) Is the Activity media subdivision filled in (if relevant)?**

```{r echo = F}
unique(export_dat$activity_media_subdivision_name)
```

All "Activity Media Subdivision" names are listed as "Surface Water."

<br>

**37.) For Water activity media, is the relative depth filled in?**

Yes, "activity depth height measure" and "activity depth height measure unit" are included as columns.

<br>

**38.) Is the number of results for each Characteristic correct?**

Questions #2 and #3 address this topic in detail. A summary table of the total number of observations for each characteristic is available in the downloadable table below.

```{r echo = F}
# summarise
chr_num <- export_dat %>%
  group_by(characteristic_name) %>%
  tally()

# save
dir <- "other/output/misc/characteristic_count_2021.csv"
write.csv(chr_num, dir, row.names = F)
```

```{r echo = F}
# embed for download
xfun::embed_file(dir, text = "Download Observation Count by Characteristic, 2021 Kenai River Baseline Water Quality")
```

<br>

**39.) Do the range of result values make sense?**

Prior to upload to the EPA WQX, the range of results will be evaluated visually and numerically in each characteristic-specific section of the comprehensive report, relative to past results. Likely outliers will be identified and recorded in a dedicated spreadsheet (download below). Each likely outlier will have accompanying notes discussing the choice, and a decision as to whether a more detailed statistical investigation is warranted.

```{r echo = F}
xfun::embed_file("other/input/outliers/baseline_outliers.xlsx", text = "Download identified potential outliers for 2021 Kenai River Baseline Water Quality data")
```

<br>

**40.) Are units correct and consistent for each parameter?**

```{r echo = F}
z <- export_dat %>%
  select(characteristic_name, result_unit) %>%
  distinct()

data.frame(unique(z$result_unit))

# apply correction to make units consistent; a different solution may be needed for other years
export_dat[export_dat == "mg/l"] <- 'mg/L'
```

We observe that equivalent units, "mg/L" and "mg/l," have slightly different notation among parameters. We correct this here to make terms consistent.

<br>

**41.) Are detection limits and laboratory qualifiers included for analytical results?**

Laboratory qualifiers, including "J" (below limit of quantitation; value is an estimate), "=" (observed value), "U" (below detection limit), and "ND" (not detected), are included in the column titled "result_qualifier."

Detection limits and their corresponding units are included in the columns titled "result_detection_limit_type_1" and "result_detection_limit_unit_1," corresponding to the Limit of Quantitation, and in the columns titled "result_detection_limit_type_2" and "result_detection_limit_unit_2," corresponding to the Method Detection Level.
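As a quick spot check, the qualifiers actually present in the export can be tabulated; a minimal sketch, assuming `export_dat` is still in memory from the steps above:

```{r eval = F}
# Sketch: tabulate laboratory qualifiers present in the flagged export,
# including any missing values
table(export_dat$result_qualifier, useNA = "ifany")
```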
<br>

**42.) Are results for trip blanks and/or field blanks above detection limits, and is any related data flagged as a result?**

In summer 2021, trip blanks were included with those field crews tasked with collecting water samples for hydrocarbon analyses. Sites visited by these crews include RM 1.5 - Kenai City Dock, RM 6.5 - Cunningham Park, RM 40 - Bing's Landing, and RM 43 - Upstream of Dow Island.

On the summer 2021 SGS chain of custody documents, BTEX trip blanks are specified for the above sites. However, in the reported results, a single sample named "Trip Blank" is reported. This error resulted from a miscommunication with SGS staff.

In this case, all hydrocarbon results in the Trip Blank are reported as below the detection limit, suggesting there is no need to flag summer 2021 hydrocarbon results.

<br>

### Uploading 2021 data to the EPA Water Quality Exchange (EPA WQX)

Each water quality dataset that is housed in the EPA WQX consists of three uploads:

- Project description
- Monitoring locations
- Results and activities

We will prepare and finalize each of these uploads here.

#### Project Description

On 12/5/2023 we downloaded the KWF Agency Baseline project description datafile from the EPA Water Quality Portal ([EPA WQP](https://www.waterqualitydata.us/#countrycode=US&statecode=US%3A02&countycode=US%3A02%3A122&within=65&lat=60.550309&long=-151.098437&organization=KENAI_WQX&mimeType=csv&providers=NWIS&providers=STEWARDS&providers=STORET)). The file can be directly accessed at the link below:

```{r echo = F}
xfun::embed_file("other/input/WQX_downloads/project/project.csv", text = "Kenai River Baseline Water Quality Monitoring project details for EPA WQP")
```

<br>

We found that the project file listed the QAPP (Quality Assurance Project Plan) as "not yet approved." It is possible that this designation is listed as such because previously approved versions of the QAPP (versions 1.0 and 2.0) do not include signatures from EPA staff on the Title & Approval page, but were instead approved by Alaska Department of Environmental Conservation staff. For the [QAPP approved on 5/9/2023](https://www.kenaiwatershed.org/news-media/qapp-revisions-completed-2023/) (version 3.0), two Region 10 EPA staff signed as approvers on the Title & Approval page.

To apply this correction, in the "Project" spreadsheet we modified "QAPP Approval Status" from "N" to "Y" and re-uploaded to the EPA Water Quality Exchange in Winter 2023.
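For reproducibility, the same one-field correction could be scripted rather than made by hand. A minimal sketch, assuming the downloaded file uses the WQP column name `QAPPApprovedIndicator` (verify the column name against the actual file before use):

```{r eval = F}
# Sketch: script the QAPP approval status correction on the downloaded project file.
# The column name QAPPApprovedIndicator is an assumption to verify.
project <- read.csv("other/input/WQX_downloads/project/project.csv")
project$QAPPApprovedIndicator <- "Y"
write.csv(project, "other/output/aqwms_formatted_results/project.csv", row.names = F)
```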
#### Monitoring locations

On 12/5/2023 we downloaded the KWF Agency Baseline project monitoring locations datafile from the EPA Water Quality Portal ([EPA WQP](https://www.waterqualitydata.us/#advanced=true)). The file can be directly accessed at the link below:

```{r echo = F}
xfun::embed_file("other/input/WQX_downloads/station/station.csv", text = "Download Kenai River Baseline Water Quality Monitoring Sites 2021")
```

<br>

We retained all sites whose "MonitoringLocationName" value contains the prefix "KBL" (Kenai Baseline).

We plotted locations on a map using the leaflet package, and confirmed that existing site names in the database matched those used in the 2021 dataset.

In the EPA WQX Central Data Exchange, we created a custom import configuration to accommodate importing the list of existing sites.

```{r echo = F, message=FALSE}
# import csv of sites downloaded from EPA Water Quality Portal
wqp_sites <- read.csv("other/input/WQX_downloads/station/station.csv")

# --> TO DO: develop import config to accommodate this csv

# read in locations from exported data (export_dat)
wqp_sites %<>%
  distinct(MonitoringLocationName, MonitoringLocationDescriptionText, LatitudeMeasure, LongitudeMeasure) %>%
  filter(str_detect(MonitoringLocationName, "^KBL"))

# generate map
leaflet(wqp_sites) %>%
  addTiles() %>%
  addMarkers(~LongitudeMeasure, ~LatitudeMeasure,
             popup = ~as.character(MonitoringLocationName),
             label = ~as.character(MonitoringLocationName))

# Notes on ADEC Site Names
# It appears that the "monitoring location id" site names provided by ADEC are not in the
# existing project dataset on the EPA WQX. (Example: the ADEC template calls the No Name Creek
# site "KR RM 0 NNC", whereas the existing name in the EPA database for this site is
# "KBL_t_00.0", indicating "Kenai River Baseline, tributary, river mile 0.00".)

# For the next step, "Results & Activities," we need to use the existing EPA site names for
# MonitoringLocationID, integrate these into the Activity ID names, and make sure other
# relevant columns like MonitoringLocationDescriptionText are included.

# We can perform the join by lat/long if we convert to character and have the same number of
# decimal places in both dataframes. This issue will be avoidable in the future!

# Notes on "station.csv" download from EPA WQX
# The download contains extensive column info about well characteristics. No wells are included
# as part of this study, but we kept the columns in the import configuration for consistency's
# sake. TBD if this is necessary.
```
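As noted in the chunk comments above, matching ADEC-named records to the existing EPA station names can be done by joining on coordinates rounded to a common precision. A minimal sketch of that approach, assuming the `wqp_sites` and `export_dat` dataframes as built above (the four-decimal precision is an arbitrary choice, not a project standard):

```{r eval = F}
# Sketch: join ADEC-named records to EPA-named stations on rounded coordinates,
# so minor decimal-place differences do not break the match
wqp_keys <- wqp_sites %>%
  mutate(lat_key = round(LatitudeMeasure, 4),
         lon_key = round(LongitudeMeasure, 4))

export_keys <- export_dat %>%
  mutate(lat_key = round(activity_latitude, 4),
         lon_key = round(activity_longitude, 4)) %>%
  left_join(wqp_keys, by = c("lat_key", "lon_key"))
```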
<br>

#### Results and Activities

Within Kenai Watershed Forum's EPA Water Quality Portal web account, we developed an import configuration that accommodates all relevant columns and data types for this project. We based the configuration primarily on the template provided by the Alaska Dept of Environmental Conservation.

For this export, we identified as "Rejected" all observations that had been "flagged" throughout the QA/QC process described above. Flagged observations will be uploaded to the EPA WQX labeled as "Rejected" under the column "Result Status ID."

```{r echo = F}
# label all flagged observations as "Rejected" and non-flagged observations as "Accepted"
export_dat %<>%
  mutate(`Result Status ID` = case_when(
    flag == "Y" ~ "Rejected",
    flag == "N" ~ "Accepted"))
```

```{r echo = F}
# Here we re-develop the "export_dat" dataframe to match the desired import configuration
# for the EPA WQP; modify the import config as needed.

# Before we prepare the dataframe, we need to re-prepare some columns in order to correctly
# create the "Activity ID". In the future we can avoid this step by using the desired site
# names in an earlier step.

# --> site names
# rejoin WQP site location info (site names) to export_dat
# use matched data at "other/input/AQWMS/AQWMS_template_matching_table.xlsx"
site_name_match <- read_excel("other/input/AQWMS/AQWMS_template_matching_table.xlsx", sheet = "adec_site_names") %>%
  remove_empty()

export_dat <- left_join(export_dat, site_name_match)

# --> characteristic names
# rejoin edited list of analyte abbreviation names, then append to dataframe;
# used in creating Activity IDs
analyte_abbrev <- read.csv("other/input/AQWMS/analytes_list_manual_edit.csv")
colnames(analyte_abbrev) <- c("characteristic_name", "analyte_abbreviation")
analyte_abbrev %<>% select(characteristic_name, analyte_abbreviation)

export_dat %<>% left_join(analyte_abbrev)

# --> sample_condition names (dup or field blank)
# this column is not present, so we extract the info from the existing activity_id column
export_dat %<>%
  mutate(sample_condition_abbrv = case_when(
    grepl("DUP", activity_id) ~ "DUP",
    grepl("Blank", activity_id) ~ "Blank"))

# working here 12/6/23 - modify/adapt/delete code below to accomplish the above two goals
# rename as "export_dat" once the script is successful & complete
```
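The chunk that follows assembles the "Activity ID" from the site name, sampling date, analyte abbreviation, and (when present) a QC-condition suffix. A hypothetical illustration of the resulting pattern (the site name and abbreviation values here are invented for illustration, not taken from the project data):

```{r}
# Hypothetical Activity ID pattern (values invented for illustration):
paste0("KBL_m_01.5", "-", "2021-07-27", "-", "NO3", "-", "DUP")
# "KBL_m_01.5-2021-07-27-NO3-DUP"
```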
```{r echo = F}
# --> finalize dataframe
# Proceeding left to right across the columns of the AWQMS template,
# mutate a new column or rename an existing column as needed.
export_dat %<>%
  mutate(
    `Project ID` = 10000007,
    `Monitoring Location ID` = MonitoringLocationIdentifier,
    `Activity Media Name` = activity_media_name,
    `Activity Media Subdivision Name` = activity_media_subdivision_name,
    # create Activity ID name conditionally if condition(s) present
    `Activity ID` = case_when(
      is.na(sample_condition_abbrv) ~ paste0(MonitoringLocationName, "-", activity_start_date, "-", analyte_abbreviation),
      !is.na(sample_condition_abbrv) ~ paste0(MonitoringLocationName, "-", activity_start_date, "-", analyte_abbreviation, "-", sample_condition_abbrv)),
    `Activity Start Date` = activity_start_date,
    `Activity Start Time` = activity_start_time,
    `Activity Start Time Zone` = "AKDT",
    `Activity End Date` = activity_end_date,
    `Activity End Time` = activity_end_time,
    `Activity End Time Zone` = "AKDT",
    `Activity Latitude` = activity_latitude,
    `Activity Longitude` = activity_longitude,
    `Activity Source Map Scale` = activity_source_map_scale,
    `Activity Type` = activity_type,
    # All samples are surface grab samples. Depths are assigned across the board
    # here as 6 inches (~15 cm).
    `Activity Depth/Height Measure` = activity_depth_height_measure,
    `Activity Depth/Height Unit` = activity_depth_height_unit,
    # Top/bottom depth columns below are not applicable for surface grab samples
    `Activity Top Depth/Height Measure` = activity_top_depth_height_measure,
    `Activity Top Depth/Height Unit` = activity_top_depth_height_unit,
    `Activity Bottom Depth/Height Measure` = activity_bottom_depth_height_measure,
    `Activity Bottom Depth/Height Unit` = activity_bottom_depth_height_unit,
    `Activity Relative Depth Name` = activity_relative_depth_name,
    `Activity Comment` = activity_comment,
    `Characteristic Name` = characteristic_name,
    `Result Analytical Method ID` = result_analytical_method_id,
    `Result Analytical Method Context` = result_analytical_method_context,
    `Method Speciation` = method_speciation,
    `Result Value` = result_value,
    `Result Unit` = result_unit,
    `Result Qualifier` = result_qualifier,
    `Result Weight Basis` = result_weight_basis,
    `Statistical Base Code` = statistical_base_code,
    `Result Sample Fraction` = result_sample_fraction,
    `Result Value Type` = result_value_type,
    `Result Comment` = result_comment,
    `Sample Collection Method ID` = sample_collection_method_id,
    `Equipment ID` = equipment_id,
    `Result Detection Condition` = result_detection_condition,
    # answer in progress here 12/15/23
    # question - can we have more than one result detection condition column?
    # In the CDX import configuration, it appears that multiple columns can have an identical
    # name. This will not work in R. Try: in export_dat, create detection limit columns
    # designated as "1" and "2", then remove these designations for the import/upload.
    `Result Detection Limit Type 1` = result_detection_limit_type_1,
    `Result Detection Limit Value 1` = result_detection_limit_value_1,
    `Result Detection Limit Unit 1` = result_detection_limit_unit_1,
    # note: lod = "limit of detection", equivalent to "method detection level"
    `Result Detection Limit Type 2` = result_detection_limit_type_2,
    `Result Detection Limit Value 2` = result_detection_limit_value_2,
    `Result Detection Limit Unit 2` = result_detection_limit_unit_2,
    `Laboratory Accreditation Indicator` = laboratory_accreditation_indicator,
    `Laboratory Name` = laboratory_name,
    `Laboratory Sample ID` = laboratory_sample_id,
    `Analysis Start Date` = analysis_start_date,
    `Analysis Start Time` = analysis_start_time,
    `Biological Intent` = biological_intent,
    `Subject Taxonomic Name` = subject_taxonomic_name,
    `Thermal Preservative` = thermal_preservative,
    `Sample Container Type` = sample_container_type,
    `Sample Container Color` = sample_container_color,
    `Chemical Preservative` = chemical_preservative,
    # remove columns that were mutated to a new name
    .keep = "unused")

# next, for our AQWMS export, we want to retain just those columns listed in the AQWMS template
# solution found here: https://gist.github.com/djhocking/62c76e63543ba9e94ebe

# get all column names from AQWMS template
# need to modify this to include MonitoringLocationIdentifier
aqwms_colnames <- read_excel("other/input/AQWMS/AWQMS_KWF_Baseline_2021.xlsx", sheet = "KWF Baseline AWQMS Template") %>%
  colnames()

# select the subset of columns in the AQWMS template
export_dat %<>% select(one_of(aqwms_colnames))

# export final formatted AQWMS results to external csv
write.csv(export_dat, "other/output/aqwms_formatted_results/results_activities.csv", row.names = F)
```
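Before uploading, a brief sanity check on the exported file can catch empty required fields. A minimal sketch of some checks one might run (these are suggestions, not part of the established upload workflow; the column names assume the template retained `Activity ID` and `Result Status ID`):

```{r eval = F}
# Sketch: pre-upload checks on the final results file
final <- read.csv("other/output/aqwms_formatted_results/results_activities.csv", check.names = FALSE)

# every record should carry an Activity ID and a Result Status ID
stopifnot(!any(is.na(final$`Activity ID`)))
table(final$`Result Status ID`, useNA = "ifany")
```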
```{r echo = F}
# assemble all components of the upload in one folder location
# renew this code for future years; it is manually edited for now

# project (commented out for now because we modified the csv to prep for upload;
# changed QAPP status to "Y" and added website)
#project <- read.csv("other/input/WQX_downloads/project/project.csv")
#write.csv(project, "other/output/aqwms_formatted_results/project.csv", row.names = F)

# stations
#station <- read.csv("other/input/WQX_downloads/station/station.csv") %>%
#  # remove all non-KWF sites
#  filter(grepl("KBL", MonitoringLocationName))
#write.csv(station, "other/output/aqwms_formatted_results/station.csv", row.names = F)
```

Other notes for the final 2021 data upload to the EPA CDX:

- Phosphorus from EPA Method 200.8 (Dissolved Metals by ICP/MS) requires a "method speciation." (This does not apply to "Total Phosphorus," method SM21 4500P-B,E.) However, this method does not distinguish speciation. Method speciation was designated "as P" for this test/parameter. Regardless, this parameter was not included in the QAPP, so it is designated as "Rejected" in "Result Status ID."

Work in progress here as of `r Sys.Date()`.

```{r echo = F}
# 1/4/24
# next step: re-start (delete old) EPA CDX import config for results_activities. Somehow a
# bunch of column names got duplicated. Consider removing all irrelevant columns with no info
# (all NA) from results_activities, and make the import config reflect this.

# next steps for future years: take the dataframe at the end of this long script and make the
# column names/structure conform to the ADEC example. In the 2021 QA/QC process, the data went
# through janitor::clean_names(), so names needed to be fixed again at the end.
#   1.) see that the column name structure in the EPA WQX matches that of the ADEC AQWMS example
#   2.) ensure that the dataframe exported from here matches EPA WQX column names:
#       - projects
#       - monitoring locations
#       - results & activities

knitr::knit_exit()
```

\newpage