NEURAL NETWORK MODEL FOR ASSESSING THE SECURITY LEVEL OF HEAVILY NOISE-MASKED SPEECH INFORMATION BASED ON THE RII STRUCTURAL FRAMEWORK

Authors

Nuzhnyi, S.
DOI:

https://doi.org/10.28925/2663-4023.2025.30.970

Keywords:

heavily noise-masked speech; Residual Intelligibility Index (RII); triphone model; emission neural network; BayesianNN.

Abstract

This study addresses the problem of determining the security level of speech information under complex acoustic and vibroacoustic interference, where traditional speech-quality metrics (SNR, STI, SII, PESQ, STOI) fail to reflect the actual capability of modern reconstruction algorithms (HMM, DNN, BayesianNN, GAN) to recover the semantic content of intercepted signals. At low signal-to-noise ratios, and even after active vibroacoustic masking has been applied, a substantial portion of speech structures remains recoverable, creating a risk of information leakage. The absence of a quantitative criterion that unifies the physical, linguistic, and neural-network parameters of the signal and allows evaluation of the real possibility of semantic reconstruction defines the central scientific and technical challenge of this work. To address it, an integral model is proposed for estimating the residual speech informativeness of heavily noise-masked speech, taking into account spectral structure, contextual linguistic dependencies, and the capabilities of modern reconstruction systems. The multi-level architecture of the model comprises an analytical spectral description of the signal (Aᵢ, μᵢ, σᵢ*, Z₀(f), s(f)), a Bayesian-Markov triphone structure, and a multilayer emission neural network built from CNN, MLP, and BayesianNN components. The spectral level provides a formal description of energy peaks and adaptive smoothing; the linguistic level captures the probabilistic regularities of transitions between triphones; the neural-network level integrates all feature types and models the uncertainty of the emission probabilities. From the synthesis of these levels, the Residual Intelligibility Index (RII) is formulated as a criterion that quantitatively characterizes the ability of a potential interceptor to recover message content after interference and filtering. A threshold value RII* is defined and interpreted as a conditional boundary between information-dangerous and information-insufficient reconstruction regimes. The proposed model can be applied in technical information-protection systems, testing laboratories, and evaluation complexes for active vibroacoustic interference. The results establish the scientific and technical foundations for instrumental methods that determine the level of residual speech informativeness and strengthen the protection of speech information.
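To make the spectral-level notation concrete, the equations below are a minimal sketch, assuming the energy peaks (Aᵢ, μᵢ, σᵢ*) enter as Gaussian components over the baseline Z₀(f) and that s(f) acts as a smoothing kernel; the abstract only names these symbols, so the functional forms here are assumptions, and the actual definitions (including the calibration of RII*) are given in the full paper.

```latex
% Assumed spectral level: baseline plus Gaussian energy peaks,
% followed by adaptive smoothing with the kernel s(f).
Z(f) = Z_0(f) + \sum_{i=1}^{N} A_i
       \exp\!\left( -\frac{(f - \mu_i)^2}{2\,(\sigma_i^{*})^{2}} \right),
\qquad
\tilde{Z}(f) = \int Z(\varphi)\, s(f - \varphi)\, d\varphi .

% Decision rule implied by the threshold RII*:
\text{regime}(\mathrm{RII}) =
\begin{cases}
\text{information-dangerous},    & \mathrm{RII} \ge \mathrm{RII}^{*},\\
\text{information-insufficient}, & \mathrm{RII} <  \mathrm{RII}^{*}.
\end{cases}
```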
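The way the levels combine into a single RII score can likewise be sketched in code. The sketch below is hypothetical: the names (triphone_forward, rii_score, RII_THRESHOLD), the toy dimensions, and the threshold value are illustrative assumptions, with a plain forward pass over a Markov triphone chain standing in for the Bayesian-Markov structure and random emission probabilities standing in for the CNN/MLP/BayesianNN emission network.

```python
import numpy as np

# Hypothetical sketch of the RII pipeline; names and the threshold value
# are illustrative assumptions, not the paper's actual API or calibration.

RII_THRESHOLD = 0.35  # placeholder for the paper's threshold RII*


def triphone_forward(emissions, transitions, prior):
    """Forward pass over a Markov triphone chain (linguistic level).

    emissions:   (T, S) per-frame emission probabilities, which the paper
                 obtains from the CNN/MLP/BayesianNN emission network
    transitions: (S, S) row-stochastic triphone transition matrix
    prior:       (S,)   initial triphone distribution
    Returns (T, S) normalised forward probabilities (state posteriors).
    """
    T, S = emissions.shape
    alpha = np.empty((T, S))
    alpha[0] = prior * emissions[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transitions) * emissions[t]
        alpha[t] /= alpha[t].sum()  # per-frame normalisation avoids underflow
    return alpha


def rii_score(alpha):
    """Collapse posteriors to a scalar in [1/S, 1]: the mean confidence of
    the best triphone per frame. Higher values mean the masked signal still
    supports confident decoding, i.e. the content is likely recoverable."""
    return float(alpha.max(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, S = 120, 8  # toy sizes: frames x triphone states
    emissions = rng.dirichlet(np.ones(S), size=T)    # stand-in for the network
    transitions = rng.dirichlet(np.ones(S), size=S)  # stand-in triphone model
    prior = np.full(S, 1.0 / S)

    rii = rii_score(triphone_forward(emissions, transitions, prior))
    regime = ("information-dangerous" if rii >= RII_THRESHOLD
              else "information-insufficient")
    print(f"RII = {rii:.3f} -> {regime}")
```

In the full model the emission matrix would come from the Bayesian emission network (for example, via sampling over its weights), so the RII would carry an uncertainty interval rather than the single point value this toy pipeline prints.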




Published

2025-10-26

How to Cite

Nuzhnyi, S. (2025). NEURAL NETWORK MODEL FOR ASSESSING THE SECURITY LEVEL OF HEAVILY NOISE-MASKED SPEECH INFORMATION BASED ON THE RII STRUCTURAL FRAMEWORK. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 2(30), 645–661. https://doi.org/10.28925/2663-4023.2025.30.970