MACHINE LEARNING
V.A. Malykh, V.A. Lyalin. On Classification of Noisy Texts

Abstract.

The classic task of text classification has been studied in many works, but current approaches are mostly devoted to improving classification quality on what we call clean corpora, i.e. corpora that do not contain typos. In this work we present the results of testing modern classification models in the presence of noise for two languages: English and Russian.
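The abstract does not state the noise model used in the experiments. Below is a minimal sketch of one plausible setup, assuming character-level typo injection (swaps, drops, replacements, insertions) applied with a fixed per-character probability; the function name, the noise_level parameter, and the mix of operations are illustrative assumptions, not the authors' procedure.

import random
import string

def add_typo_noise(text: str, noise_level: float = 0.1, seed: int = 0) -> str:
    """Inject character-level typos into `text`.

    `noise_level` is the per-character probability of a typo.
    Hypothetical noise model for illustration only; the paper's
    exact noising procedure is not given in the abstract.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < noise_level:
            op = rng.choice(["swap", "drop", "replace", "insert"])
            if op == "swap" and i + 1 < len(chars):
                # transpose this character with the next one
                out.append(chars[i + 1])
                out.append(c)
                i += 2
                continue
            if op == "drop":
                # delete the character
                i += 1
                continue
            if op == "replace":
                # substitute a random lowercase letter
                out.append(rng.choice(string.ascii_lowercase))
            else:
                # insert a random letter after the character
                # (a "swap" on the final character also lands here)
                out.append(c)
                out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(c)
        i += 1
    return "".join(out)

# Example: corrupt a sentence at a 15% per-character typo rate.
print(add_typo_noise("text classification is robust to noise", 0.15))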

Keywords:

neural networks; text classification; noise robustness.

Pp. 174-182.

DOI: 10.14357/20790279180520

 

