May 20, 2020

Data Integrity in a Regulatory World: SCADA Systems

This article originally appeared in the WWD May 2020 issue as "Data Integrity in a Regulatory World: SCADA Systems."

Jon Watson, VP of Emarosa Engineering Inc.

The safe delivery of drinking water and the safe treatment of wastewater depend on municipalities, cities and private utilities. Increasingly, regulatory authorities and consumers are asking these organizations to demonstrate the effectiveness of their services. Data collected and stored by Supervisory Control and Data Acquisition (SCADA) systems can only start to meet these requirements. Progressive service providers are now seeking to determine whether their data is accurate, reliable, relevant and useful.

Several years ago, simply collecting and storing water treatment facility data was considered good practice. Generating reports based on that data was an afterthought. Today, storing data is just the first step an organization takes before it can confidently use that data for day-to-day decisions and long-term planning. With advanced reporting software tools to augment SCADA systems, data in the water and wastewater sector can now work in ways previously only imagined.

Data Resiliency

In the face of climate change, municipalities have started to take on the issue of resiliency. Data collection must be considered in overall resiliency planning. Data gaps caused by network or server failure are a significant source of unreliable data. Traditionally, SCADA systems poll remote processing units (RPUs) or programmable logic controllers (PLCs) for data at specified intervals. But if there is a network communication failure at any point, the data for that period is lost. Many organizations have tried to solve this issue with redundant communication links. However, this adds complexity to the network and comes at a cost. In addition, redundant communication links do not solve the problem of a lost SCADA server.

A possible solution is implementing SCADA systems and PLCs that are capable of storing and forwarding data during a communication loss. Store-and-forward protocols require specific hardware and software. Another approach is setting up PLCs to log time-stamped data and store this information outside of the SCADA system in a secondary database. The secondary data set can backfill the primary data set when required. Several modern PLC platforms now feature this capability. These systems leverage already installed PLC hardware and avoid stand-alone data loggers, modified instrumentation wiring and an increased number of components. Both approaches address the challenge of network communication outages.
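The backfill step can be illustrated with a minimal Python sketch. It is not taken from any particular SCADA or PLC product: the function name `backfill`, the dictionary layout and the sample readings are all assumptions made for illustration. The primary data set stands in for the SCADA historian, and the secondary data set stands in for the time-stamped log kept by the PLC.

```python
from datetime import datetime, timedelta


def backfill(primary, secondary):
    """Return the primary readings with gaps filled from the secondary set.

    Both arguments map timestamp -> value (hypothetical layout). A missing
    timestamp or a value of None in the primary set marks a sample lost
    during a communication outage.
    """
    merged = {}
    for ts in sorted(set(primary) | set(secondary)):
        value = primary.get(ts)
        if value is None:
            # Primary sample is missing, so fall back to the PLC log.
            value = secondary.get(ts)
        merged[ts] = value
    return merged


# Illustrative readings only: one-minute samples with a two-minute outage.
t0 = datetime(2020, 5, 20, 8, 0)
step = timedelta(minutes=1)
primary = {t0: 1.2, t0 + step: None, t0 + 3 * step: 1.5}
secondary = {t0: 1.2, t0 + step: 1.3, t0 + 2 * step: 1.4, t0 + 3 * step: 1.5}

filled = backfill(primary, secondary)
for ts, value in filled.items():
    print(ts.isoformat(), value)
```

In this sketch the primary value always wins when it exists, so the secondary log is consulted only for the outage window. A real system would also need to reconcile clock differences between the PLC and the SCADA server, which the sketch ignores.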

Data Validation

Traditional reporting systems generate reports directly from SCADA historian data. Determining the accuracy of the data is often a manual review process subject to human error and interpretation. Issues are often found only after the data has been used, prompting follow-up questions and investigation. These challenges multiply as SCADA systems grow in size and complexity. Modern reporting systems can analyze critical process data with algorithms that apply business rules specific to a given utility. For example, chlorine spikes recorded when there is no system flow can be removed from data sets, leaving a more useful picture of chlorine levels while the system is operating. Data validation processes also detect data gaps and other conditions specific to organizational needs. The validation process flags questionable data for manual approval and approves sound data automatically. This reduces the time it takes compliance staff to verify data. In these advanced systems, data can be commented on electronically, and records are kept of the situation or condition that led to the issue.
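A rule-based validation pass of this kind might look like the following Python sketch. The thresholds (`FLOW_MIN`, `CL_MAX`, `MAX_GAP`), the status labels and the sample values are hypothetical and chosen only for illustration. Actual business rules would come from the utility and its regulatory requirements.

```python
from datetime import datetime, timedelta

# All three limits are assumed values for illustration only.
FLOW_MIN = 0.1                  # flow (L/s) below which the system is treated as idle
CL_MAX = 4.0                    # chlorine residual (mg/L) above which a reading is questionable
MAX_GAP = timedelta(minutes=5)  # longest acceptable interval between samples


def validate(samples):
    """Classify (timestamp, chlorine, flow) samples and report data gaps.

    Returns (results, gaps). Each result is (timestamp, chlorine, status),
    where status is 'approved', 'excluded' (no system flow) or 'flagged'
    (needs manual review). Each gap is a (start, end) pair of timestamps
    separated by more than MAX_GAP.
    """
    results, gaps = [], []
    previous = None
    for ts, chlorine, flow in sorted(samples):
        if previous is not None and ts - previous > MAX_GAP:
            gaps.append((previous, ts))
        previous = ts
        if flow < FLOW_MIN:
            status = "excluded"   # reading taken with no flow, left out of reports
        elif chlorine > CL_MAX:
            status = "flagged"    # questionable reading, held for manual approval
        else:
            status = "approved"   # sound reading, approved automatically
        results.append((ts, chlorine, status))
    return results, gaps


# Illustrative samples only.
t0 = datetime(2020, 5, 20, 8, 0)
step = timedelta(minutes=1)
samples = [
    (t0, 1.1, 12.0),
    (t0 + step, 6.5, 0.0),        # spike while the system is idle
    (t0 + 2 * step, 5.2, 11.5),   # spike while the system is running
    (t0 + 12 * step, 1.0, 12.2),  # arrives after a ten-minute gap
]

results, gaps = validate(samples)
for ts, chlorine, status in results:
    print(ts.isoformat(), chlorine, status)
print("gaps:", gaps)
```

Keeping each rule as a simple, named threshold makes it easier for compliance staff to see why a reading was excluded or flagged, which supports the record keeping described above.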


Enhancing operations and long-term planning with dynamic reporting should be a priority for every utility. Organizations can combine redundant data sets into a common data set or use a reporting system with an “either/or” approach to enhance data reliability. Running data through an automated validation process gives utilities confidence that their reports are accurate and meaningful, so they can make impactful decisions more quickly than ever before. When multiple data sets are combined with a validation process, the flow of data is simplified.


A multi-layered approach to data reliability means that instead of questioning validity and accuracy, utilities can enact change in their organizations. Making data collection and validation part of a resiliency strategy helps providers improve how they manage process data.

About the author

Jon Watson, C.E.T., is vice president of Emarosa Engineering Inc. Watson can be reached at [email protected].