A. E. Maslov, A. A. Zorin Performance Analysis of Vectorized Algorithms

This paper evaluates the efficiency of vectorization for algorithms that are used in a variety of tasks to improve performance. It identifies the cases in which using SIMD extensions is justified and examines whether the declared theoretical limit of performance gain can be reached. The SSE and AVX extensions are compared for several data types: float, double, complex float, and complex double.
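As a rough illustration of what a theoretical limit could mean here, the sketch below counts how many elements of each data type fit into one vector register, using the architectural register widths of 128 bits for SSE and 256 bits for AVX. Treating this lane count as the upper bound on speedup over scalar code is an assumption made for illustration, not a definition taken from the paper, and the function names are hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Architectural register widths in bits: SSE uses 128-bit XMM registers,
   AVX uses 256-bit YMM registers. */
enum { SSE_BITS = 128, AVX_BITS = 256 };

/* Number of elements of the given size that fit into one vector register.
   Assumption (not from the paper): this lane count is taken as the
   idealized upper bound on speedup over scalar code. */
size_t simd_lanes(size_t register_bits, size_t element_bytes)
{
    assert(element_bytes > 0);
    return register_bits / (8 * element_bytes);
}

/* Hypothetical convenience wrappers for the four data types compared. */
size_t lanes_float(size_t bits)          { return simd_lanes(bits, sizeof(float)); }
size_t lanes_double(size_t bits)         { return simd_lanes(bits, sizeof(double)); }
size_t lanes_complex_float(size_t bits)  { return simd_lanes(bits, sizeof(float complex)); }
size_t lanes_complex_double(size_t bits) { return simd_lanes(bits, sizeof(double complex)); }
```

For example, `lanes_float(AVX_BITS)` counts eight single-precision lanes, while `lanes_complex_double(SSE_BITS)` counts only one, so under this assumption an SSE version of a complex double kernel has no lane-level headroom over scalar code.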


vectorization; SIMD; SSE; AVX; dot product; convolution; correlation.

PP. 50-61.

DOI 10.14357/20718632220405


