Journal "Information Technologies and Computing Systems" - V. V. Arlazarov, "Analysis of the Use of Problem-Oriented Datasets in Scientific Research"

Issue 2022 / 03

This paper examines the problems of creating and using open problem-oriented datasets for experimental research with verifiable and reproducible results, drawing on the experience of building the MIDV family of datasets, which contain images and video sequences of identity documents. It analyzes published work in computer vision, image processing, and computational linguistics that uses these datasets, describes the main problems the research groups encountered, and identifies general patterns and principles that can guide both the creation of new datasets of this class and the extension of existing ones.

Keywords: text recognition, document analysis, datasets, research reproducibility, OCR, image processing.

DOI 10.14357/20718632220302
