Journal «Information Technologies and Computing Systems» - M. G. Lobanov, D. L. Sholomov On the Acceleration of the Convolutional Neural Network Architecture Based on Resnet in the Task of Road Scene Objects Recognition

Issue 2019 / 03

IMAGE PROCESSING METHODS

K. O. Sorokina, V. A. Fedorenko, P. V. Giverts Evaluation of the Similarity of Images of Breech Face Marks Using the Method of Correlation Cells

A. V. Maltsev Methods for Real-Time Distributed Imitation of Indirect Shading in Virtual Environment on GPU

S. S. Magazov Image Recovery on Defective Pixels of a CMOS and CCD Arrays

PATTERN RECOGNITION

Yu. A. Kotov Comparative Analysis of Four Methods for Identifying Letters of Texts

M. G. Lobanov, D. L. Sholomov On the Acceleration of the Convolutional Neural Network Architecture Based on Resnet in the Task of Road Scene Objects Recognition

M. A. Povolotskiy, D. V. Tropin, T. S. Chernov, B. I. Savelyev Dynamic Programming Approach to Textual Structured Objects Segmentation in Images

MATHEMATICAL MODELING

G. P. Akimova, A. V. Solovyev, I. A. Tarkhanov Modeling the Reliability of Distributed Information Systems

Yu. G. Phylippov, V. F. Nikitin, E. V. Mikhalchenko, L. I. Stamov Numerical Three-Dimensional Modeling of Detonation Wave Rotation in a Detonation Engine

A. V. Ilyin, V. D. Ilyin Solving Situationally Definable Linear Problems of Resource Planning: a Review of Updated Technology


	M. G. Lobanov, D. L. Sholomov On the Acceleration of the Convolutional Neural Network Architecture Based on Resnet in the Task of Road Scene Objects Recognition
Abstract. Recent approaches to road scene object detection based on convolutional neural networks have reached a level of quality acceptable for use in autonomous vehicle control and ADAS. However, the best modern network architectures are rather heavy and cannot be integrated into real-time systems. The most pressing problem is therefore to accelerate these networks and to find the optimal balance between their quality and performance. This paper proposes a method for lightening the architecture of the Deformable Convolutional Network with a ResNet backbone that provides a threefold increase in inference performance, while the quality of road scene object detection is not reduced significantly. In addition, the paper compares the quality of a network of this architecture trained on different open datasets: BDD and MS-COCO.

Keywords: object detection, road scene objects, deformable convolutional network, ResNet, ADAS, convolutional network acceleration, BDD, MS-COCO, pedestrian detection, vehicle detection.

PP. 57-65.

DOI 10.14357/20718632190305
