D.L. Sholomov, A.G. Volkov, D.V. Polevoy
Document identification in terms of linear programming
Abstract. The paper presents a method for describing document templates by rules on the relative location of primitive elements. Such a description reduces the problem of identifying a weakly structured document to an integer linear programming problem: the maximized functional describes the document template matching rate, and the relative-location rules are transformed into a set of linear inequalities.

Keywords: document recognition, template description, template matching, flexible forms, document identification, linear programming, mass document input, graphical primitives, text recognition, invoice recognition.

PP. 74-80.

References
1. Postnikov V.V. Automatic identification and recognition of structured documents. // Dissertation for the degree of candidate of technical sciences. Moscow, 2001.
2. Cesarini F., Gori M., Marinai S., Soda G. INFORMys: A Flexible Invoice-Like Form-Reader System. // IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 20, No. 7, pp. 730-745, July 1998.
3. Cracknell C., Downton A.C., Du L. An Object-Oriented Form Description Language and Approach to Handwritten Form Processing. // ICDAR’97, IEEE, 1997.
4. Peng H., Long F., Chi Z., Siu W.-C. Document image template matching based on component block list. // Pattern Recognition Letters, 2001.
5. Taha H.A. Operations research: An introduction. // Moscow: Williams, 6th ed., 2001.
6. Shevchenko V.N., Zolotykh N.Yu. Linear and integer linear programming. // Ed. Nizhny Novgorod State University, 2002.
7. Sholomov D.L. Syntactic methods of contextual processing in problems of text recognition. // Dissertation for the degree of candidate of technical sciences. Moscow, 2007.
8. Sholomov D.L., Postnikov V.V., Marchenko A.A., Uskov A.V. Post-processing of OCR Results Using Automatically Constructed Partially Defined Syntax. // Proceedings of the Institute for System Analysis RAS, Vol. 16, pp. 146-163, 2005.
9. Sholomov D.L. Correction of recognized text using classification methods. // Proceedings of the Institute for System Analysis RAS, Vol. 29, pp. 356-380, 2007.
10. Arlazarov V.V., Malykh V.A., Sholomov D.L. Recognition of the document images with the usage of “Roulette” algorithm. // Proceedings of the Institute for System Analysis RAS, Vol. 63, No. 4, pp. 35-38, 2013.
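
Illustration. The reduction outlined in the abstract can be sketched on a toy example. The code below is not the authors' formulation: the variable layout, the big-M relaxation, the sample coordinates, and the scores are all assumptions made for illustration, and the integer program is solved by exhaustive enumeration rather than by a real ILP solver. Binary variables x[i][j] state that template element i is matched to candidate primitive j; the objective is the total match score; each relative-location rule becomes one linear inequality in x.

```python
from itertools import product

# Hypothetical candidate primitives found on a page (y grows downward).
candidates = [
    {"text": "Invoice", "x": 40, "y": 20},
    {"text": "Total", "x": 40, "y": 300},
    {"text": "Date", "x": 300, "y": 20},
    {"text": "Total", "x": 40, "y": 10},  # spurious match above the header
]

# Template elements and assumed match scores scores[i][j] in [0, 1].
elements = ["header", "date", "total"]
scores = [
    [1.0, 0.0, 0.0, 0.0],  # header
    [0.0, 0.0, 0.9, 0.0],  # date
    [0.0, 0.8, 0.0, 0.9],  # total
]

# Relative-location rules: (a, b, axis, gap) means coord(a) + gap <= coord(b).
rules = [
    ("header", "total", "y", 50),  # header lies above total
    ("header", "date", "x", 50),   # header lies to the left of date
]

BIG_M = 10_000  # switches a rule off when one of its elements is unmatched


def satisfies(x):
    """Check all linear constraints for a 0/1 matrix x[i][j]."""
    n, m = len(elements), len(candidates)
    # Each template element takes at most one candidate.
    if any(sum(x[i]) > 1 for i in range(n)):
        return False
    # Each candidate serves at most one template element.
    if any(sum(x[i][j] for i in range(n)) > 1 for j in range(m)):
        return False
    for a, b, axis, gap in rules:
        ia, ib = elements.index(a), elements.index(b)
        pos_a = sum(candidates[j][axis] * x[ia][j] for j in range(m))
        pos_b = sum(candidates[j][axis] * x[ib][j] for j in range(m))
        slack = BIG_M * (2 - sum(x[ia]) - sum(x[ib]))
        # Linear in x: pos_a + gap <= pos_b + slack.
        if pos_a + gap > pos_b + slack:
            return False
    return True


def solve():
    """Maximize the total match score by enumerating every 0/1 assignment."""
    n, m = len(elements), len(candidates)
    best_value, best_x = -1.0, None
    for bits in product((0, 1), repeat=n * m):  # toy sizes only
        x = [list(bits[i * m:(i + 1) * m]) for i in range(n)]
        if not satisfies(x):
            continue
        value = sum(scores[i][j] * x[i][j] for i in range(n) for j in range(m))
        if value > best_value:
            best_value, best_x = value, x
    return best_value, best_x
```

Calling `solve()` on this data matches "total" to the lower "Total" candidate even though the spurious one scores higher, because the rule "header above total" excludes the higher-scoring candidate. For realistic document sizes the enumeration would have to be replaced by a proper integer programming solver.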