STATION CLUSTERING FOR DETECTING DATA INSTABILITY IN AN AIR QUALITY MONITORING NETWORK

Authors

D. Shevchenko, B. Holub, I. Borodkina

DOI:

https://doi.org/10.28925/2663-4023.2026.32.1198

Keywords:

Data Mining, K-Means, environmental monitoring, atmospheric air monitoring, information and analytical system, intelligent technology, information technologies, data reliability and validity

Abstract

This paper proposes and validates an approach for automatically detecting data instability in an atmospheric air quality monitoring network using data mining methods. Unlike traditional threshold-based checks, the approach relies on behavioral features that describe the “quality of the measurement stream” (completeness, missingness rate, signal variability, and sensor “sticking” indicators), computed from hourly aggregates. The clustering objects are “station–sensor” pairs, which makes it possible to localize issues both at the station level and at the level of individual measurement channels. K-Means clustering is applied after feature scaling; the number of clusters is selected using the elbow method and the silhouette coefficient. To interpret the clusters, the data are projected onto two principal components, which correspond to a data availability/incompleteness index and a signal dynamics index (variability versus “sticking”). Experiments on real-world data reveal stable measurement degradation profiles and make it possible to distinguish reliable stations from problematic channels (in particular, sensors with a high missingness rate or near-zero hourly variation). The practical value of the study lies in the possibility of integrating the proposed method into environmental information-and-analytical systems as a data quality control module, and of using its results for selecting reference sensors, calibration, and building predictive models.
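The abstract describes a pipeline of behavioral features per station–sensor pair, feature scaling, K-Means with the number of clusters chosen by the elbow method and silhouette coefficient, and a two-component PCA projection for interpretation. The sketch below illustrates that pipeline with scikit-learn under stated assumptions; it is not the authors' implementation. The abstract does not define the features, so all of the following are hypothetical choices:

- `completeness` is the share of expected hours with a valid value.
- `gap_rate` is the number of missing-hour runs per day, standing in for the missingness rate.
- `variability` is the mean absolute hour-to-hour change.
- `sticking_share` is the share of near-zero hourly changes below `stick_eps`.

The column names, the synthetic data with injected faults, and picking k by maximum mean silhouette are also assumptions. Inertia values are only returned for an elbow plot; no automatic elbow detection is attempted.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def synthetic_hourly(n_hours=240, seed=0):
    """Hypothetical demo data: 6 stations x 2 sensors, one row per expected hour (NaN = missing)."""
    rng = np.random.default_rng(seed)
    frames = []
    for station in range(6):
        for sensor in ("PM2.5", "NO2"):
            values = 20.0 + np.cumsum(rng.normal(0.0, 1.5, n_hours))  # healthy drifting signal
            if station == 4:  # injected fault: frequent dropouts (~60% of hours missing)
                values[rng.random(n_hours) < 0.6] = np.nan
            if station == 5 and sensor == "NO2":  # injected fault: sensor stuck at one value
                values[:] = 17.0
            frames.append(pd.DataFrame({"station": f"S{station}", "sensor": sensor,
                                        "hour": np.arange(n_hours), "value": values}))
    return pd.concat(frames, ignore_index=True)


def stream_quality_features(hourly, stick_eps=1e-6):
    """Assumed behavioral features for each (station, sensor) pair; definitions are hypothetical."""
    rows = []
    for (station, sensor), g in hourly.groupby(["station", "sensor"]):
        values = g.sort_values("hour")["value"].to_numpy(dtype=float)
        missing = np.isnan(values)
        # A gap starts at a missing hour whose previous hour was not missing (or at the first hour).
        gap_starts = missing & ~np.concatenate(([False], missing[:-1]))
        diffs = np.diff(values)
        diffs = diffs[~np.isnan(diffs)]  # keep only changes between two adjacent valid hours
        rows.append({
            "station": station,
            "sensor": sensor,
            "completeness": 1.0 - missing.mean(),
            "gap_rate": gap_starts.sum() * 24.0 / len(values),  # gaps per day
            "variability": float(np.abs(diffs).mean()) if diffs.size else 0.0,
            "sticking_share": float((np.abs(diffs) < stick_eps).mean()) if diffs.size else 0.0,
        })
    return pd.DataFrame(rows).set_index(["station", "sensor"])


def cluster_pairs(features, k_values=range(2, 7), seed=0):
    """Scale features, run K-Means for each k, pick k by mean silhouette, project to 2-D with PCA."""
    X = StandardScaler().fit_transform(features.to_numpy())
    inertia, silhouette, labels_by_k = {}, {}, {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertia[k] = model.inertia_  # plot against k to inspect the elbow visually
        silhouette[k] = silhouette_score(X, model.labels_)
        labels_by_k[k] = model.labels_
    best_k = max(silhouette, key=silhouette.get)  # k with the highest mean silhouette
    coords = PCA(n_components=2).fit_transform(X)  # 2-D projection for interpreting clusters
    return labels_by_k[best_k], coords, inertia, silhouette, best_k


if __name__ == "__main__":
    feats = stream_quality_features(synthetic_hourly())
    labels, coords, inertia, silhouette, best_k = cluster_pairs(feats)
    print(feats.assign(cluster=labels).round(3))
    print("k chosen by silhouette:", best_k)
```

Scaling comes before clustering because the assumed features live on different scales (gaps per day versus shares in [0, 1]). Without it, K-Means distances would be dominated by the largest-range feature. In the synthetic data, the stuck NO2 channel at S5 (zero hourly change) and the dropout-heavy S4 channels are the kind of degradation profiles this clustering is intended to separate from healthy channels.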


References

European Environment Agency. (2022). Air quality in Europe 2022. https://doi.org/10.2800/488115

Agbo, B., Al-Aqrabi, H., Hill, R., & Alsboui, T. (2022). Missing data imputation in the Internet of Things sensor networks. Future Internet, 14(5), Article 143. https://doi.org/10.3390/fi14050143

Jiao, W., Hagler, G., Williams, R., Sharpe, R., Brown, R., Garver, D., Judge, R., Caudill, M., Rickard, J., Davis, M., Weinstock, L., Zimmer-Dauphinee, S., & Buckley, K. (2016). Community Air Sensor Network (CAIRSENSE) project: Evaluation of low-cost sensor performance in a suburban environment in the southeastern United States. Atmospheric Measurement Techniques, 9(11), 5281–5292. https://doi.org/10.5194/amt-9-5281-2016

U.S. Environmental Protection Agency. (2025, May 1). How to use air sensors: Air sensor guidebook. https://www.epa.gov/air-sensor-toolbox/how-use-air-sensors-air-sensor-guidebook

Buelvas, J., Múnera, D., Tobón V., D. P., Aguirre, J., & Gaviria, N. (2023). Data quality in IoT-based air quality monitoring systems: A systematic mapping study. Water, Air, & Soil Pollution, 234(4), Article 248. https://doi.org/10.1007/s11270-023-06127-9

Chen, M., Zhu, H., Chen, Y., & Wang, Y. (2022). A novel missing data imputation approach for time series air quality data based on logistic regression. Atmosphere, 13(7), Article 1044. https://doi.org/10.3390/atmos13071044

International Organization for Standardization. (2008). ISO/IEC 25012:2008. Software engineering—Software product quality requirements and evaluation (SQuaRE)—Data quality model. https://www.iso.org/standard/35736.html

Wand, Y., & Wang, R. Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11), 86–95. https://doi.org/10.1145/240455.240479

Scikit-learn developers. (n.d.). StandardScaler. In Scikit-learn. Retrieved March 2026, from https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler

Scikit-learn developers. (n.d.). KMeans. In Scikit-learn. Retrieved March 2026, from https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans

Shevchenko, D. V., & Holub, B. L. (2025). Air quality monitoring in real time. Mathematical Machines and Systems, (1), 103–112.

Shevchenko, D. V., & Holub, B. L. (2025). Application of data mining methods for multidimensional analysis of atmospheric air quality based on environmental data. Science and Technology Today. Series: Engineering, 8(49), 1801–1810.

Shevchenko, D. V., & Holub, B. L. (2025). Multidimensional analytics of environmental data: Application of OLAP in monitoring systems. Mathematical Machines and Systems, (3–4), 54–65.


Published

2026-03-26

How to Cite

Shevchenko, D., Holub, B., & Borodkina, I. (2026). STATION CLUSTERING FOR DETECTING DATA INSTABILITY IN AN AIR QUALITY MONITORING NETWORK. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 4(32), 1054–1064. https://doi.org/10.28925/2663-4023.2026.32.1198