P.V.Bezmaternyh, E.L.Pliskin and V.V.Farsobina Information system for structured documents OCR quality control


Computational experiments remain a routine, daily procedure in the development of machine learning (ML) based software such as optical character recognition (OCR). The well-known practice of continuous integration (CI) is a natural fit for developing ML software. CI involves frequent centralized program builds and the execution of bench tests. This generates a large volume of test results, which should be readily available to developers for error analysis and for comparing software versions. This article proposes an architecture for an automatic quality control system for OCR of structured documents, covering the collection, storage and display of bench test results. The results of all software tests are loaded into a database. Builds and bench tests can run on virtual servers under various operating systems (OS). For stability, the web server and the database run on hardware separate from the build server. Web technologies are used both for automatically uploading test results to the database and for serving user queries.
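
The storage and comparison steps described in the abstract could look roughly like the following minimal sketch. It is not the authors' implementation: the table layout, the function names (`open_store`, `upload`, `regressions`) and the per-field accuracy metric are all assumptions made for illustration, and a local SQLite database stands in for the separate database server and web upload path that the article describes.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the results database (hypothetical schema, not from the article)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bench_results (
               build   TEXT    NOT NULL,            -- build identifier from the CI server
               dataset TEXT    NOT NULL,            -- bench test dataset name
               field   TEXT    NOT NULL,            -- document field being recognized
               total   INTEGER NOT NULL CHECK (total > 0),
               correct INTEGER NOT NULL,
               PRIMARY KEY (build, dataset, field)
           )"""
    )
    return conn


def upload(conn, build, dataset, per_field):
    """Store one bench test run; per_field maps field -> (total, correct)."""
    rows = [(build, dataset, f, t, c) for f, (t, c) in per_field.items()]
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO bench_results VALUES (?, ?, ?, ?, ?)", rows
        )


def regressions(conn, old_build, new_build):
    """Return (dataset, field, old_acc, new_acc) where accuracy dropped."""
    query = """
        SELECT o.dataset, o.field,
               1.0 * o.correct / o.total AS old_acc,
               1.0 * n.correct / n.total AS new_acc
        FROM bench_results AS o
        JOIN bench_results AS n
          ON n.dataset = o.dataset AND n.field = o.field
        WHERE o.build = ? AND n.build = ?
          AND 1.0 * n.correct / n.total < 1.0 * o.correct / o.total
        ORDER BY o.dataset, o.field
    """
    return conn.execute(query, (old_build, new_build)).fetchall()
```

In a setup like the one the abstract outlines, `upload` would be called by the build server after each bench test run, and a query such as `regressions` would back the page where developers compare two software versions.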


computer experiment, machine learning, data processing, web applications, regression testing, continuous integration, quality control.

PP. 94-102.



