System analysis in medicine and biology
V.V. Donitova, D.A. Kireev, E.V. Titova, A.A. Akimova. Natural language processing models for extraction of stroke risk factors from electronic health records
Abstract. 

The high social impact of stroke makes early detection of stroke risk factors crucial for its prevention. To improve the quality of preventive medical care, it is important to use the most efficient natural language processing (NLP) methods for automatic extraction of information about risk factors from electronic health records (EHRs). The authors have developed methods for extracting information about patients' diseases and health status, based on manually created rules, statistical machine learning, and deep learning, to solve the named entity recognition (NER) problem in clinical records. Comparative experimental studies of the developed methods were conducted on an annotated corpus of clinical records, and conclusions are drawn about their effectiveness.
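
As an illustration of the statistical machine learning variant of NER mentioned above, the following is a minimal sketch of a CRF sequence tagger built with sklearn-crfsuite (refs. 17-21). The feature set, the BIO label scheme (B-Disease/I-Disease/O) and the toy sentences are illustrative assumptions and do not reproduce the authors' actual configuration or data.

# Minimal CRF tagger sketch (sklearn-crfsuite); labels and features are
# illustrative assumptions, not the authors' actual setup.
import sklearn_crfsuite
from sklearn_crfsuite import metrics

def token_features(sent, i):
    """Basic per-token features; real systems add morphology, dictionaries, wider context."""
    word = sent[i][0]
    feats = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        feats["prev_lower"] = sent[i - 1][0].lower()
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next_lower"] = sent[i + 1][0].lower()
    else:
        feats["EOS"] = True
    return feats

# Toy training data: sentences as lists of (token, BIO-label) pairs.
train_sents = [
    [("Patient", "O"), ("has", "O"), ("arterial", "B-Disease"),
     ("hypertension", "I-Disease"), (".", "O")],
    [("No", "O"), ("diabetes", "B-Disease"), ("reported", "O"), (".", "O")],
]

X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
y_train = [[label for _, label in s] for s in train_sents]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",            # L-BFGS training with elastic-net regularization
    c1=0.1, c2=0.1,
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)

# Predicting on the toy data only demonstrates the API; a real study would
# evaluate on a held-out part of the annotated corpus.
y_pred = crf.predict(X_train)
print(metrics.flat_f1_score(y_train, y_pred, average="weighted"))
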

Keywords: 

risk factors, natural language processing, named entity recognition, machine learning, deep learning.

PP. 93-101.

DOI: 10.14357/20790279210410
 
References

1. Johnson W., Onuma O., Owolabi M. and Sachdev S. Sep. 2016. Stroke: a global response is needed. Bull. World Health Organ., vol. 94, no. 9, pp. 634-634A.
2. Thrift A.G. et al. Jan. 2014. Global stroke statistics. Int. J. Stroke Off. J. Int. Stroke Soc., vol. 9, no. 1, pp. 6–18.
3. Boehme A.K., Esenwa C. and Elkind M.S.V. Feb. 2017. Stroke Risk Factors, Genetics, and Prevention. Circ. Res., vol. 120, no. 3, pp. 472–495.
4. Blagosklonov N.A. et al. 2020. Linguistic analysis of disease history for identifying stroke risk factors. Trudy Instituta sistemnogo analiza Rossiyskoy akademii nauk (Proceedings of the Institute for Systems Analysis of the Russian Academy of Sciences), vol. 70, no. 3, pp. 75-85.
5. Devlin J., Chang M.-W., Lee K. and Toutanova K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
6. Neamatullah I. et al. 2008. Automated de-identification of free-text medical records. BMC Med. Inform. Decis. Mak., vol. 8, no. 1, p. 32.
7. Sondhi P., Gupta M., Zhai C. and Hockenmaier J. 2010. Shallow Information Extraction from Medical Forum Data. Coling 2010: Posters, pp. 1158–1166. Available at: https://www.aclweb.org/anthology/C10-2133.
8. Nayel H. and Shashirekha H.L. 2017. Improving NER for Clinical Texts by Ensemble Approach using Segment Representations. Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), pp. 197–204. Available at: https://www.aclweb.org/anthology/W17-7525.
9. Arbabi A., Adams D.R., Fidler S. and Brudno M. May 2019. Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning. JMIR Med. Informatics, vol. 7, no. 2, p. e12596.
10. PubMed. Available at: https://pubmed.ncbi.nlm.nih.gov/ (accessed 15.04.2021).
11. Hahn U. and Oleynik M. Aug. 2020. Medical Information Extraction in the Age of Deep Learning. Yearb. Med. Inform., vol. 29, no. 1, pp. 208–220.
12. Shelmanov A. et al. 2019. Active Learning with Deep Pre-trained Models for Sequence Tagging of Clinical and Biomedical Texts. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 482–489.
13. Lee J. et al. Feb. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, vol. 36, no. 4, pp. 1234–1240.
14. Gligic L., Kormilitzin A., Goldberg P. and Nevado-Holgado A. Jan. 2020. Named entity recognition in electronic health records using transfer learning bootstrapped Neural Networks. Neural Netw., vol. 121, pp. 132–139.
15. Stenetorp P. et al. 2012. BRAT: a web-based tool for NLP-assisted text annotation. Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 102-107.
16. Yargy: Rule-based facts extraction for Russian language. Available at: https://github.com/natasha/yargy (accessed 15.04.2021).
17. Lafferty J.D., McCallum A. and Pereira F.C.N. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289.
18. Pedregosa F. et al. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., vol. 12, pp. 2825–2830.
19. Sklearn-crfsuite: scikit-learn inspired API for CRFsuite. Available at: https://github.com/TeamHG-Memex/sklearn-crfsuite (accessed 15.04.2021).
20. Python-crfsuite: a python binding for CRFsuite. Available at: https://github.com/scrapinghub/python-crfsuite (accessed 15.04.2021).
21. Okazaki N. 2007. CRFsuite: a fast implementation of conditional random fields (CRFs). Available at: http://www.chokkan.org/software/crfsuite/.
22. Kuratov Y. and Arkhipov M. 2019. Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language.
23. Wolf T. et al. 2020. HuggingFace’s Transformers: State-of-the-art Natural Language Processing.
