SECURITY OF LARGE LANGUAGE MODELS: RISKS, THREATS, AND SECURITY APPROACHES

Authors

DOI:

https://doi.org/10.28925/2663-4023.2025.29.918

Keywords:

large language models; Generative AI; cybersecurity; AI firewall; prompt injection; guardrails; watermarking; LLM vulnerability.

Abstract

The article provides a comprehensive analysis of current security challenges related to Large Language Models (LLMs), which have become a key element of digital transformation across multiple sectors. It examines typical threats arising both from targeted attacks on models and from their malicious use in cybercrime. The main risk vectors are identified, including prompt injection - embedding hidden instructions in user queries to alter model logic, and jailbreaking - crafting prompts that bypass built-in restrictions and trigger undesirable behavior. Special attention is given to the risks of confidential data leakage from training datasets, generation of vulnerable or malicious code that can enter production environments, and the dissemination of disinformation, including multimedia deepfakes. Based on the analysis, a conceptual LLM security model is proposed, combining technical, architectural, and regulatory elements of protection. Particular emphasis is placed on assessing and applying mechanisms such as AI firewalls - intermediary systems that filter model inputs and outputs; built-in security modules integrated into model architectures; and guardrails - restrictions on outputs without altering parameters. Additionally, watermarking methods for synthetic content identification and AI-generated content detection tools are considered. Regulatory measures are highlighted as an essential component to establish usage frameworks for powerful models and mitigate misuse risks. It is concluded that traditional cybersecurity measures, focused on static or signature-based detection, are insufficient for generative systems operating in dynamic natural language environments. Enhancing security requires multilayered strategies covering all stages of the LLM lifecycle - from design and training to deployment and regulatory oversight.

Downloads

Download data is not yet available.

References

OpenAI. (2025). GPT-5 system card. OpenAI. https://openai.com/index/gpt-5-system-card/

Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., … & Song, D. (2021). Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium (pp. 2633–2650). USENIX Association.

Weidinger, L., Uesato, J., Rauh, M., Griffin, C., Huang, P. S., Mellor, J., … & Gabriel, I. (2021). Ethical and social risks of harm from language models. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT) (pp. 214–229). ACM.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT) (pp. 610–623). ACM.

European Commission. (2024). Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). EUR-Lex. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:52021PC0206

Microsoft. (2023). Responsible AI standard v2. Microsoft. https://www.microsoft.com/ai/responsible-ai

Anthropic. (2023). Constitutional AI: Harmlessness from AI feedback. arXiv Preprint. arXiv:2212.08073. https://arxiv.org/abs/2212.08073

National Institute of Standards and Technology. (2023). AI risk management framework (AI RMF 1.0). U.S. Department of Commerce. https://www.nist.gov/itl/ai-risk-management-framework

Anthropic. (2023). Model card and evaluations for Claude models. Anthropic. https://www-cdn.anthropic.com/bd2a28d2535bfb0494cc8e2a3bf135d2e7523226/Model-Card-Claude-2.pdf

Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., … & Metzler, D. (2023). Emergent abilities of large language models. arXiv Preprint. arXiv:2307.02483. https://arxiv.org/pdf/2307.02483

Azaria, A., & Mitchell, T. (2023). The internal state of an LLM knows when it’s lying. arXiv Preprint. arXiv:2305.07243. https://arxiv.org/pdf/2305.07243

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. arXiv Preprint. arXiv:1909.08593. https://arxiv.org/pdf/1909.08593

Stiennon, N., Ouyang, L., Wu, J., Ziegler, D. M., Lowe, R., Voss, C., Radford, A., Amodei, D., & Christiano, P. (2020). Learning to summarize with human feedback. arXiv Preprint. arXiv:2104.05218. https://arxiv.org/pdf/2104.05218

Liu, A., Pang, R. Y., Zeng, K., Wang, A., Xie, T., Chen, X., & Zhou, D. (2023). Trustworthy AI: A computational perspective. arXiv Preprint. arXiv:2301.10226. https://arxiv.org/pdf/2301.10226

Glukhov, D., Wiggers, K., & Young, T. (2023). The malicious use of generative AI for cyberattacks: A survey. Journal of Information Security and Applications, 75, 103666. Elsevier. https://doi.org/10.1016/j.jisa.2023.103666

Downloads


Abstract views: 11

Published

2025-09-26

How to Cite

Haydur, H., Vlasenko, V., & Petrova, O. (2025). SECURITY OF LARGE LANGUAGE MODELS: RISKS, THREATS, AND SECURITY APPROACHES. Electronic Professional Scientific Journal «Cybersecurity: Education, Science, Technique», 1(29), 664–675. https://doi.org/10.28925/2663-4023.2025.29.918