Computer analysis of texts
A.O. Shelmanov, M.A. Kamenskaya Training semantic role labeler for Russian using automatically annotated corpus

Abstract.

This paper investigates methods for semantic role labeling based on semi-supervised machine learning. We present a method for training a semantic role labeler on a corpus automatically annotated by a baseline dictionary-based (rule-based) semantic parser; the resulting labeler improves on the performance of the baseline. We also propose a method for labeling the arguments of “unknown” predicates, that is, predicates not present in the semantic dictionary of the baseline parser. Finally, we present a hybrid semantic parser that uses two models, one for “known” and one for “unknown” predicates, together with the dictionary-based parser. Experiments on a manually labeled test corpus in Russian show that the proposed modifications improve the recall and overall performance of semantic role labeling.

Keywords:

semantic role labeling, semi-supervised machine learning, semantic parsing, word embedding.

PP. 104-120.
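
The hybrid parser described in the abstract can be pictured as a dispatch over predicates. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `hybrid_label`, the `Labeler` interface, and the rule for combining the dictionary-based parser with the model for “known” predicates (the model only fills in arguments the parser left unlabeled) are all hypothetical, since the abstract does not specify them.

```python
from typing import Callable, Dict, List, Set

# All names here are hypothetical; the abstract does not define any interfaces.
Roles = Dict[str, str]                          # argument -> semantic role label
Labeler = Callable[[str, List[str]], Roles]     # (predicate, arguments) -> roles


def hybrid_label(
    predicate: str,
    arguments: List[str],
    dictionary: Set[str],
    dictionary_parser: Labeler,
    known_model: Labeler,
    unknown_model: Labeler,
) -> Roles:
    """Route a predicate to the labeler(s) that can handle it.

    Assumed combination rule (not taken from the paper): for a predicate in
    the semantic dictionary, keep the roles assigned by the dictionary-based
    parser and let the trained model label only the remaining arguments; for
    a predicate outside the dictionary, use the second model alone.
    """
    if predicate not in dictionary:
        return unknown_model(predicate, arguments)

    roles = dict(dictionary_parser(predicate, arguments))
    for argument, role in known_model(predicate, arguments).items():
        # Do not overwrite a role the dictionary-based parser already assigned.
        roles.setdefault(argument, role)
    return roles
```

Under this assumed rule the trained model can only add labels for a “known” predicate, which is one simple way a hybrid could raise recall; other combination strategies are equally compatible with the abstract.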

 
