E. I. Andreeva, V. V. Arlazarov, A. V. Gayer, E. P. Dorokhov, A. V. Sheshkus, O. A. Slavin
Document Recognition Method Based on Convolutional Neural Network Invariant to 180 Degree Rotation Angle


This work addresses the recognition of printed documents captured by scanners and mobile phones. Known approaches to recognizing document images rotated by 180 degrees first detect the image orientation, then rotate the image if necessary, and only then recognize the document in the correct orientation. The proposed approach, based on a convolutional neural network that is invariant to rotation by 180 degrees, eliminates the orientation detection and image rotation steps. This speeds up recognition on mobile platforms, whose performance currently lags behind that of server and desktop platforms. Two datasets were considered: scanned images of structured national documents and the public SmartDoc dataset, which contains images captured by mobile phones. Document recognition accuracy was estimated for these datasets. The orientation detection accuracy of the proposed method on the considered test sets is 100%, which exceeds the accuracy of the orientation detection methods described in the works from the list of references.
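The property named in the abstract, invariance to rotation by 180 degrees, can be illustrated with a minimal sketch. It is not the authors' network: it uses a generic symmetrization trick that averages the scores of an arbitrary scoring function over both orientations. The names `rot180`, `toy_scores`, and `symmetrized` are hypothetical, and `toy_scores` is only a stand-in for a real classifier.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]
ScoreFn = Callable[[Sequence[Sequence[float]]], List[float]]


def rot180(image: Sequence[Sequence[float]]) -> Image:
    """Rotate a 2-D image by 180 degrees: reverse the row order and each row."""
    return [list(reversed(row)) for row in reversed(image)]


def toy_scores(image: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for a classifier; NOT rotation-invariant by itself.

    The first score weights pixels near the top more heavily, the second
    weights pixels near the left edge more heavily.
    """
    height, width = len(image), len(image[0])
    top = sum((height - r) * v for r, row in enumerate(image) for v in row)
    left = sum((width - c) * v for row in image for c, v in enumerate(row))
    return [float(top), float(left)]


def symmetrized(score_fn: ScoreFn) -> ScoreFn:
    """Wrap score_fn so that its output is the same for x and rot180(x).

    Because rot180(rot180(x)) == x, averaging the scores over both
    orientations gives an identical result for either input orientation.
    """
    def invariant(image: Sequence[Sequence[float]]) -> List[float]:
        upright = score_fn(image)
        flipped = score_fn(rot180(image))
        return [(a + b) / 2.0 for a, b in zip(upright, flipped)]
    return invariant


# A small asymmetric glyph used only for demonstration.
glyph = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
invariant_scores = symmetrized(toy_scores)
```

Here `invariant_scores(glyph)` and `invariant_scores(rot180(glyph))` return the same list, while `toy_scores` alone does not. Note that this wrapper evaluates the scoring function twice per image, so it only demonstrates the invariance property and not the speed-up that the abstract attributes to the proposed network.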


document image recognition; orientation detection; rotation-invariant; image processing; mobile platforms.

PP. 87-93.

DOI 10.14357/20718632190408


1. D. S. Bloomberg, G. E. Kopec, and L. Dasari, “Measuring document image skew and orientation,” in Proc. SPIE Document Recognition II, pp. 302–316, (San Jose, CA, USA), Feb. 1995.
2. R. S. Caprari, “Algorithm for text page up/down orientation determination,” Pattern Recognition Letters 21(4), pp. 311–317, 2001.
3. B. T. Avila and R. D. Lins, “A fast orientation and skew detection algorithm for monochromatic document images,” in DocEng ’05: Proc. ACM Symposium on Document Engineering, 2005, pp. 118–126. doi:10.1145/1096601.1096631
4. J. van Beusekom, F. Shafait, T. M. Breuel, "Resolution independent skew and orientation detection for document images", Proc. SPIE 7247, Document Recognition and Retrieval XVI, 72470K (19 January 2009); doi: 10.1117/12.807735
5. S. Lu, C. L. Tan, “Automatic document orientation detection and categorization through document vectorization”. In: K. Nahrstedt, M. Turk, Y. Rui, W. Klas, K. Mayer-Patel (eds.) Proc. 14th ACM International Conference on Multimedia October 23-27, 2006, Santa Barbara, CA, USA. pp. 113-116.
6. Y. Rangoni, F. Shafait, J. van Beusekom & T. M Breuel, “Recognition driven page orientation detection”. 2009 16th IEEE International Conference on Image Processing (ICIP). doi:10.1109/icip.2009.5413722
7. V. Konya, S. Eickeler & C. Seibert, “Fast seamless skew and orientation detection in document images”. 20th International Conference on Pattern Recognition, 2010. doi:10.1109/icpr.2010.474
8. URL:
9. E. Limonova, D. Ilin, D. Nikolaev, “Improving Neural Network Performance on SIMD Architectures”. Proc. SPIE 9875, Eighth International Conference on Machine Vision (ICMV 2015), 98750L (8 December 2015); doi:10.1117/12.2228594
10. V. Gayer, A. V. Sheshkus, Y. S. Chernyshova, “Effective real-time augmentation of training dataset for the neural networks learning,” ICMV 2018.
11. L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus, “Regularization of Neural Networks using DropConnect,” Proc. ICML'13, 30th International Conference on Machine Learning, vol. 28, 2013, pp. 1058-1066.
12. B. Graham, "Fractional max-pooling." arXiv preprint arXiv:1412.6071 (2014).
13. D.-A. Clevert, T. Unterthiner, and S. Hochreiter, “Fast and accurate deep network learning by exponential linear units (ELUs),” arXiv:1511.07289, 2015.
14. J. Zhao, M. Mathieu, R. Goroshin, and Y. LeCun, “Stacked What-Where Auto-encoders,” arXiv:1506.02351.
15. C.-Y. Lee, P. W. Gallagher, and Z. Tu, “Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree,” arXiv:1509.08985.
16. J.-C. Burie, J. Chazalon, M. Coustaty, S. Eskenazi, M. M. Luqman, M. Mehri, N. Nayef, J.-M. Ogier, S. Prum and M. Rusinol, “ICDAR2015 Competition on Smartphone Document Capture and OCR (SmartDoc),” in 13th International Conference on Document Analysis and Recognition (ICDAR), 2015.
17. L. Blando, J. Kanai, and T. Nartker, “Prediction of OCR accuracy using simple image features,” in International Conference on Document Analysis and Recognition, vol. 1, 1995, pp. 319–322.
18. Chen, L., Wang, S., Fan, W., Sun, J., & Satoshi, N, “Deep learning based language and orientation recognition in document analysis”. 13th International Conference on Document Analysis and Recognition (ICDAR). 2015. doi:10.1109/icdar.2015.7333799
19. R. Wang, S. Wang, & J. Sun, “Offset Neural Network for Document Orientation Identification”. 13th IAPR International Workshop on Document Analysis Systems (DAS). 2018. doi:10.1109/das.2018.12
20. K. Bulatov, V. Arlazarov, T. Chernov, O. Slavin, and D. Nikolaev, “Smart IDReader: Document recognition in video stream” The 14th IAPR International Conference on Document Analysis and Recognition (ICDAR 2017), Workshops and Tutorials: November 9-12, Kyoto, Japan, 2017 – p. 39-44. ISSN: 2379-2140

