O.A. Slavin, V.L. Arlazarov. Method for classifying recognized pages of administrative documents on the basis of text key points
Abstract. The paper considers the problem of classifying recognized pages of business documents. Administrative documents used in document circulation, including the exchange of documents between organizations, are standardized to some degree and can be either structured or unstructured. Banks and insurance companies often require documents such as a power of attorney, a contract, a card with sample signatures and seals, a charter, an account, and registration certificates. When electronic archives are created and maintained, paper documents are digitized, and the digital page images (page scans) can be recognized and analyzed. One of the analysis tasks is classification of the page image, which consists in verifying that the page image belongs to a particular class. A simple method for classifying administrative documents that yields acceptable results is proposed. Keywords: classification of texts; recognition of documents; OCR; recognition error; template matching. Pp. 32-42.