METHOD OF ADAPTIVE SELECTION AND WEIGHTING OF FALSE INFORMATION INDICATORS TO IMPROVE DETECTION EFFICIENCY UNDER HYBRID WARFARE CONDITIONS
DOI:
https://doi.org/10.28925/2663-4023.2026.33.1166
Keywords:
false information, fake news detection, hybrid warfare, feature selection, TF-IDF, logistic regression, risk triage, information security
Abstract
The article addresses the problem of improving false information detection in text messages under hybrid warfare conditions. The study is relevant because disinformation campaigns in the modern information environment are used to undermine trust in public institutions, distort the perception of events, destabilize public attitudes, and put additional pressure on decision-making systems. The introduction substantiates the need for an interpretable and resource-efficient method suitable for large, dynamic, and imbalanced message streams. The problem statement shows that using the full feature space increases computational complexity, reduces interpretability, and does not support prioritizing message verification by risk level. The review of recent studies and publications summarizes current approaches to false information detection, including content-based, fact-based, behavioral, and hybrid models, as well as approaches to disinformation risk assessment in wartime conditions. The theoretical foundations systematize the main concepts of feature selection, text feature space construction, term weighting, classification quality assessment, and the use of PR/ROC representations under class imbalance. On this basis, the conceptual framework of the proposed method of adaptive selection and weighting of false information indicators is formed.
The methodology section describes an experimental evaluation on an open Ukrainian-language news corpus related to the events of Russia’s war against Ukraine. After cleaning the data, removing empty records and duplicates, and applying a minimum length filter of 200 characters, a dataset of 29,372 messages was obtained: 353 false messages and 29,019 true messages. Texts were represented with TF-IDF features over unigrams and bigrams, and logistic regression was used as the baseline classifier. Features were selected by chi-square scoring followed by top-K selection, with several values of K compared on the validation set; the final working configuration was K=5000. For practical decision support, a three-level risk triage scheme was introduced, with thresholds at the 80th and 95th percentiles of the validation score distribution.
The results section shows that reducing the feature space from 30,000 to 5,000 features does not substantially reduce classification quality: on the test set, F1 decreases only from 0.768 to 0.760, and PR-AUC decreases from 0.819 to 0.805. The proposed triage procedure also demonstrates practical value: the high-risk group covered 256 of the 4,406 messages in the test set and contained 50 of its 53 false messages, whereas the low-risk group contained only 1 false message. The conclusions argue that the proposed method can serve as an interpretable and resource-efficient component of information space monitoring systems. Further research should focus on extending the feature set with semantic, source-based, and behavioral indicators, and on testing the method on additional Ukrainian-language corpora.
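The three-level triage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `triage_thresholds` and `triage`, the use of inclusive percentiles, the `>=` boundary handling, and the toy validation scores are all assumptions made for illustration. Only the test-set counts at the end are taken from the abstract.

```python
from statistics import quantiles


def triage_thresholds(val_scores, medium_pct=80, high_pct=95):
    """Return (medium, high) thresholds as percentiles of validation scores."""
    # quantiles(..., n=100) yields 99 cut points; index p-1 is the p-th percentile.
    cuts = quantiles(val_scores, n=100, method="inclusive")
    return cuts[medium_pct - 1], cuts[high_pct - 1]


def triage(score, t_medium, t_high):
    """Map a classifier score to a risk level (boundary handling is an assumption)."""
    if score >= t_high:
        return "high"
    if score >= t_medium:
        return "medium"
    return "low"


# Toy validation scores, illustrative only (not the article's data).
val_scores = [i / 100 for i in range(101)]
t_medium, t_high = triage_thresholds(val_scores)
levels = [triage(s, t_medium, t_high) for s in (0.10, 0.85, 0.99)]

# Test-set counts reported in the abstract.
n_test, n_false = 4406, 53
n_high, false_in_high = 256, 50

review_share = n_high / n_test            # fraction of messages flagged high-risk
recall_high = false_in_high / n_false     # false messages captured by the high-risk group
precision_high = false_in_high / n_high   # high-risk messages that are actually false
```

On the reported counts, the high-risk group amounts to about 5.8% of the test messages, captures about 94% of the false ones, and roughly one in five of its messages is false.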
References
Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211-236. https://doi.org/10.1257/jep.31.2.211
Lazer, D. M. J., Baum, M. A., Benkler, Y., Berinsky, A. J., Greenhill, K. M., Menczer, F., Metzger, M. J., Nyhan, B., Pennycook, G., Rothschild, D., Schudson, M., Sloman, S. A., Sunstein, C. R., Thorson, E. A., Watts, D. J., & Zittrain, J. L. (2018). The science of fake news. Science, 359(6380), 1094-1096. https://doi.org/10.1126/science.aao2998
Wardle, C., & Derakhshan, H. (2017). Information disorder: Toward an interdisciplinary framework for research and policy making. Council of Europe. https://edoc.coe.int/en/media/7495-information-disorder-toward-an-interdisciplinary-framework-for-research-and-policy-making.html
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151. https://doi.org/10.1126/science.aap9559
Zhou, X., & Zafarani, R. (2021). A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys, 53(5), Article 109, 1-40. https://doi.org/10.1145/3395046
Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter, 19(1), 22-36. https://doi.org/10.1145/3137597.3137600
Ruchansky, N., Seo, S., & Liu, Y. (2017). CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 797-806). ACM. https://doi.org/10.1145/3132847.3132877
Maschmeyer, L., Abrahams, A., Pomerantsev, P., & Yermolenko, V. (2025). Donetsk don’t tell – “hybrid war” in Ukraine and the limits of social media influence operations. Journal of Information Technology & Politics, 22(1), 49-64. https://doi.org/10.1080/19331681.2023.2211969
Bachmann, S.-D. D., Putter, D., & Duczynski, G. (2023). Hybrid warfare and disinformation: A Ukraine war perspective. Global Policy, 14(5), 858-869. https://doi.org/10.1111/1758-5899.13257
Tyshchenko, V. S., & Muzhanova, T. M. (2022). Dezinformatsiia i feikovi novyny: Oznaky ta metody vyiavlennia v merezhi Internet [Disinformation and fake news: Features and methods of detection on the Internet]. Kiberbezpeka: osvita, nauka, tekhnika, 2(18), 175-186. https://doi.org/10.28925/2663-4023.2022.18.175186
Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324. https://doi.org/10.1016/S0004-3702(97)00043-X
Hall, M. A. (1999). Correlation-based feature selection for machine learning (Doctoral dissertation, University of Waikato). https://www.cs.waikato.ac.nz/ml/publications/1999/99MH-Thesis.pdf
Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3, 1289-1305. https://www.jmlr.org/papers/v3/forman03a.html
Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513-523. https://doi.org/10.1016/0306-4573(88)90021-0
Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System Technical Journal, 30(1), 50-64. https://doi.org/10.1002/j.1538-7305.1951.tb01366.x
Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112-1130. https://doi.org/10.3758/s13423-014-0585-6
Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4), 427-437. https://doi.org/10.1016/j.ipm.2009.03.002
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE, 10(3), e0118432. https://doi.org/10.1371/journal.pone.0118432
Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning (pp. 233-240). ACM. https://doi.org/10.1145/1143844.1143874
Zepopo. (n.d.). Ukrainian fake and true news [Data set]. Kaggle. https://www.kaggle.com/datasets/zepopo/ukrainian-fake-and-true-news
License
Copyright (c) 2026 Дмитро Дженджеро, Володимир Наконечний

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.