EFFICIENCY OF THE NVIDIA CUDA PLATFORM FOR SYSTEMATIC PRIME NUMBER SEARCH
DOI:
https://doi.org/10.28925/2663-4023.2026.33.1215
Keywords:
high-performance computing; prime numbers; GPGPU; CUDA; C++; pseudorandom number generation; sieving algorithms; parallel computing; profiling.
Abstract
This article studies the efficiency of the NVIDIA CUDA platform for the systematic search for prime numbers in the range from 1 to 10⁹. The aim is to optimize prime search and primality testing using GPGPU. The tasks are to implement, on CPU and in CUDA, sieving algorithms (sieve of Eratosthenes, wheel factorization, sieve of Atkin, sieve of Sundaram) and primality tests (Miller-Rabin, Fermat, Solovay-Strassen, Lucas-Lehmer), to compare their speed and scalability, and to determine the advantages and limitations of GPGPU. The sieve of Eratosthenes, the sieve of Sundaram, wheel factorization, and the Miller-Rabin and Lucas-Lehmer tests were implemented for CPU and GPU using Visual Studio and the NVIDIA CUDA Toolkit. Experiments were conducted on ranges from one million to one billion numbers, recording the search time, the number of primes found, and the maximum value found. Microsoft Concurrency Visualizer was used to assess system characteristics, which made it possible to analyze CPU resource consumption, the level of synchronization, and the efficiency of load distribution. The authors developed an integral performance indicator for search algorithms that accounts for bandwidth, parallelization efficiency, and synchronization losses. The results showed a significant reduction in search time in the CUDA implementations of highly parallel algorithms: the sieve of Eratosthenes achieves a stable speedup of 2 to 8 times, wheel factorization up to 32 times, and the Miller-Rabin test up to 3 times. At the same time, the GPU approach revealed limitations for sequential algorithms: the sieve of Sundaram runs up to 3 times slower on the GPU, and the Lucas-Lehmer test up to 4 times slower. Profiling revealed a high level of synchronization (over 80%), which reduces efficiency and indicates room for further optimization.
The results also showed an average decrease in CPU load of 20%, which opens prospects for hybrid computing systems that combine CPU and GPU resources. Based on the study, recommendations were formulated for selecting algorithms and approaches to their implementation on GPUs for high-performance computing.
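For readers unfamiliar with the two algorithm families compared above, the following is a minimal CPU-side sketch in C++ of a sieve of Eratosthenes and a Miller-Rabin test. It is not the authors' implementation, which is not shown on this page. The function names (`sieve_eratosthenes`, `miller_rabin_u32`, `count_primes_cross_checked`) are hypothetical. The sketch assumes 32-bit inputs, for which the witness set {2, 7, 61} is known to make Miller-Rabin deterministic for all n < 4,759,123,141, which covers the 10⁹ range studied here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sieve of Eratosthenes: flags[i] is true iff i is prime, for 0 <= i <= limit.
std::vector<bool> sieve_eratosthenes(std::uint32_t limit) {
    std::vector<bool> flags(static_cast<std::size_t>(limit) + 1, true);
    flags[0] = false;
    if (limit >= 1) flags[1] = false;
    for (std::uint64_t p = 2; p * p <= limit; ++p) {
        if (!flags[p]) continue;
        // Each multiple is marked independently of the others; this inner
        // loop is the data-parallel part a GPU port would typically target.
        for (std::uint64_t m = p * p; m <= limit; m += p) flags[m] = false;
    }
    return flags;
}

// Modular exponentiation; safe from overflow because mod < 2^32.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Miller-Rabin with witnesses {2, 7, 61}: deterministic for 32-bit n.
bool miller_rabin_u32(std::uint32_t n) {
    if (n < 2) return false;
    const std::uint32_t small_primes[] = {2, 3, 5, 7, 61};
    for (std::uint32_t p : small_primes) {
        if (n % p == 0) return n == p;
    }
    // Write n - 1 = d * 2^r with d odd.
    std::uint64_t d = n - 1;
    int r = 0;
    while ((d & 1) == 0) { d >>= 1; ++r; }
    const std::uint64_t witnesses[] = {2, 7, 61};
    for (std::uint64_t a : witnesses) {
        std::uint64_t x = pow_mod(a, d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int i = 1; i < r; ++i) {
            x = x * x % n;
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}

// Counts primes up to limit with the sieve and asserts that the
// Miller-Rabin test agrees with the sieve on every value in the range.
std::uint32_t count_primes_cross_checked(std::uint32_t limit) {
    const std::vector<bool> flags = sieve_eratosthenes(limit);
    std::uint32_t count = 0;
    for (std::uint64_t n = 0; n <= limit; ++n) {
        assert(flags[n] == miller_rabin_u32(static_cast<std::uint32_t>(n)));
        if (flags[n]) ++count;
    }
    return count;
}
```

Calling `count_primes_cross_checked(1000000)` returns 78498, the number of primes below one million. The two routines illustrate the contrast drawn in the abstract: the sieve marks many independent array entries per prime, while Miller-Rabin tests each candidate separately through a chain of dependent modular multiplications.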
License
Copyright (c) 2026 Pavlo Chykunov, Serhii Manakov, Olena Trofymenko

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.