STATION CLUSTERING FOR DETECTING DATA INSTABILITY IN AN AIR QUALITY MONITORING NETWORK
DOI:
https://doi.org/10.28925/2663-4023.2026.32.1198
Keywords:
Data Mining, K-Means, environmental monitoring, atmospheric air monitoring, information and analytical system, intelligent technology, information technologies, data reliability and validity
Abstract
This paper proposes and validates an approach for automated detection of data instability in an atmospheric air quality monitoring network using data mining methods. Unlike traditional threshold-based checks, the approach focuses on behavioral features describing the “measurement stream quality” (completeness, missingness rate, signal variability, and sensor “sticking” indicators) computed from hourly aggregates. The clustering objects are “station–sensor” pairs, which enables localization of issues both at the station level and at the level of individual measurement channels. K-Means clustering is applied with prior feature scaling; the optimal number of clusters is selected using the elbow method and the silhouette coefficient. For cluster interpretation, a projection onto two principal components is used, reflecting a data availability/incompleteness index and a signal dynamics index (variability versus “sticking”). Experiments on real-world data reveal stable degradation profiles of measurements and allow identification of reliable stations and problematic channels (in particular, sensors with a high missingness rate or near-zero hourly variation). The practical value of the study lies in the ability to integrate the proposed method into environmental information-and-analytical systems as a data quality control module, and to further use the results for selecting reference sensors, calibration, and building predictive models.
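To make the "measurement stream quality" features concrete, the following is a minimal sketch of how such a feature vector might be computed for one station–sensor pair from hourly aggregates. The function name `stream_quality_features`, the parameter names, and the exact feature definitions (completeness as the share of expected hours with a record, missingness as the share of received records without a usable value, variability as the population standard deviation, sticking as the share of near-zero hour-to-hour changes) are illustrative assumptions, not the formulas used in the paper. In the approach described above, vectors of this kind would then be scaled and passed to K-Means.

```python
from statistics import pstdev
from typing import Dict, Optional


def stream_quality_features(
    records: Dict[int, Optional[float]],
    expected_hours: int,
    eps: float = 1e-9,
) -> Dict[str, float]:
    """Illustrative quality features for one station-sensor pair.

    Assumed input layout (hypothetical, not from the paper):
    `records` maps an hour index to the hourly aggregate. A missing key
    means no record arrived for that hour; a None value means a record
    arrived without a usable measurement.
    """
    received = len(records)
    valid = {h: v for h, v in records.items() if v is not None}

    # Share of expected hours for which any record arrived.
    completeness = received / expected_hours
    # Share of received records that carry no usable value.
    missing_rate = (received - len(valid)) / received if received else 1.0

    # Signal variability over the valid hourly values.
    values = [valid[h] for h in sorted(valid)]
    variability = pstdev(values) if len(values) > 1 else 0.0

    # Hour-to-hour changes between adjacent hours that both hold valid values.
    diffs = [abs(valid[h + 1] - valid[h]) for h in valid if h + 1 in valid]
    # "Sticking": share of those changes that are (near) zero.
    stuck_share = sum(d <= eps for d in diffs) / len(diffs) if diffs else 0.0

    return {
        "completeness": completeness,
        "missing_rate": missing_rate,
        "variability": variability,
        "stuck_share": stuck_share,
    }


# Two toy channels: one behaving normally, one with gaps and a frozen reading.
healthy = stream_quality_features({0: 12.0, 1: 14.5, 2: 13.0, 3: 15.5}, expected_hours=4)
stuck = stream_quality_features({0: 7.0, 1: 7.0, 2: None, 3: 7.0}, expected_hours=6)
```

In this toy example the second channel combines lower completeness, a nonzero missingness rate, zero variability, and a sticking share of 1, which is the kind of profile the abstract describes as a problematic channel.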
License
Copyright (c) 2026 Dmytro Shevchenko, Bella Holub, Iryna Borodkina

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.