EFFICIENCY OF THE NVIDIA CUDA PLATFORM FOR SYSTEMATIC PRIME NUMBER SEARCH

Pavlo Chykunov; Serhii Manakov; Olena Trofymenko

doi:10.28925/2663-4023.2026.33.1215

Authors

Pavlo Chykunov National University "Odessa Law Academy" https://orcid.org/0000-0003-4959-7744
Serhii Manakov National University "Odessa Law Academy" https://orcid.org/0000-0001-5930-4592
Olena Trofymenko National University "Odessa Law Academy" https://orcid.org/0000-0001-7626-0886

Keywords:

high-performance computing; prime numbers; GPGPU; CUDA; C++; pseudorandom number generation; sieving algorithms; parallel computing; profiling.

Abstract

This article studies the efficiency of the NVIDIA CUDA platform for the systematic search for prime numbers in the range from 1 to 10⁹. The aim of the study is to optimize the search for and testing of prime numbers using GPGPU. The tasks are to implement sieving algorithms (sieve of Eratosthenes, wheel factorization, sieve of Atkin, sieve of Sundaram) and primality tests (Miller-Rabin, Fermat, Solovay-Strassen, Lucas-Lehmer) for both the CPU and CUDA, to compare their speed and scalability, and to determine the advantages and limitations of GPGPU. The sieve of Eratosthenes, the sieve of Sundaram, wheel factorization, and the Miller-Rabin and Lucas-Lehmer tests were implemented for CPU and GPU using Visual Studio and the NVIDIA CUDA Toolkit. Experiments were conducted on ranges from one million to one billion numbers, recording the search time, the number of primes found, and the largest prime found. Microsoft Concurrency Visualizer was used to assess system characteristics, which made it possible to analyze CPU resource consumption, the level of synchronization, and the efficiency of load distribution. The authors developed an integral performance indicator for search algorithms that takes into account throughput, parallelization efficiency, and synchronization losses. The results showed a significant reduction in search time for CUDA implementations of highly parallel algorithms: the sieve of Eratosthenes gives a stable speedup of 2 to 8 times, wheel factorization up to 32 times, and the Miller-Rabin test up to 3 times. At the same time, the GPU approach revealed limitations for sequential algorithms: on the GPU the sieve of Sundaram runs up to 3 times slower and the Lucas-Lehmer test up to 4 times slower. Profiling revealed a high level of synchronization (over 80%), which reduces efficiency and indicates room for further optimization.
The results also showed an average decrease in CPU load of 20%, which opens prospects for hybrid computing systems that combine CPU and GPU resources. Based on the study, recommendations were formulated for selecting algorithms and approaches to their implementation on GPUs for high-performance computing.
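
For orientation, the two algorithms with the clearest GPU gains in the abstract, the sieve of Eratosthenes and the Miller-Rabin test, can be sketched on the CPU as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`sieve_of_eratosthenes`, `count_primes`, `pow_mod`, `miller_rabin`) are hypothetical, and the Miller-Rabin variant assumes the fixed bases 2, 3, 5, 7, which make the test deterministic for n < 3,215,031,751 and therefore cover the 1 to 10⁹ range studied here.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Sieve of Eratosthenes over [0, limit]: flag[i] == 1 exactly when i is prime.
std::vector<std::uint8_t> sieve_of_eratosthenes(std::uint32_t limit) {
    std::vector<std::uint8_t> is_prime(static_cast<std::size_t>(limit) + 1, 1);
    is_prime[0] = 0;
    if (limit >= 1) is_prime[1] = 0;
    for (std::uint64_t p = 2; p * p <= limit; ++p) {
        if (!is_prime[p]) continue;
        // Multiples below p*p were already crossed out by smaller primes.
        for (std::uint64_t m = p * p; m <= limit; m += p) is_prime[m] = 0;
    }
    return is_prime;
}

// Number of primes in [0, limit], counted from the sieve flags.
std::size_t count_primes(std::uint32_t limit) {
    std::size_t count = 0;
    for (std::uint8_t flag : sieve_of_eratosthenes(limit)) count += flag;
    return count;
}

// Modular exponentiation; operands stay below 2^32, so products fit in 64 bits.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Miller-Rabin with fixed bases 2, 3, 5, 7 (assumption of this sketch).
// Only valid for n < 3,215,031,751; larger inputs need more bases.
bool miller_rabin(std::uint32_t n) {
    if (n < 2) return false;
    for (std::uint32_t p : {2u, 3u, 5u, 7u}) {
        if (n % p == 0) return n == p;
    }
    // Write n - 1 as d * 2^r with d odd.
    std::uint32_t d = n - 1;
    int r = 0;
    while (d % 2 == 0) { d /= 2; ++r; }
    for (std::uint64_t a : {2ull, 3ull, 5ull, 7ull}) {
        std::uint64_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int i = 1; i < r; ++i) {
            x = x * x % n;
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}
```

Calling `count_primes(1000000)` counts the primes up to one million, the smallest range mentioned in the abstract, and `miller_rabin(n)` tests a single candidate. The two differ in how they parallelize: in the sieve, the inner loops for different primes are independent of each other, whereas each Miller-Rabin call is independent of every other call, so one candidate per thread is a natural mapping.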
