System analysis in medicine and biology
V.V. Donitova, D.A. Kireev, E.V. Titova, A.A. Akimova. Natural language processing models for extraction of stroke risk factors from electronic health records
Abstract. 

The high social impact of stroke makes early detection of stroke risk factors crucial for its prevention. To improve the quality of preventive medical care, it is important to use the most efficient natural language processing (NLP) methods for automatic extraction of information about risk factors from electronic health records (EHRs). The authors have developed methods for extracting information about patients' diseases and health status, based on manually created rules, statistical machine learning, and deep learning, to solve the named entity recognition (NER) problem in clinical records. Comparative experimental studies of the developed methods were conducted on an annotated corpus of clinical records, and conclusions are drawn about their effectiveness.
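
As an illustration of the statistical machine learning variant of NER mentioned above, the following is a minimal sketch of a CRF sequence tagger built with sklearn-crfsuite (refs. 17-21). The feature set, the BIO label scheme (B-Disease/I-Disease/O) and the toy sentences are illustrative assumptions and do not reproduce the authors' actual configuration or data.

# Minimal CRF tagger sketch (sklearn-crfsuite); labels and features are
# illustrative assumptions, not the authors' actual setup.
import sklearn_crfsuite
from sklearn_crfsuite import metrics

def token_features(sent, i):
    """Basic per-token features; real systems add morphology, dictionaries, wider context."""
    word = sent[i][0]
    feats = {
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        feats["prev_lower"] = sent[i - 1][0].lower()
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next_lower"] = sent[i + 1][0].lower()
    else:
        feats["EOS"] = True
    return feats

# Toy training data: sentences as lists of (token, BIO-label) pairs.
train_sents = [
    [("Patient", "O"), ("has", "O"), ("arterial", "B-Disease"),
     ("hypertension", "I-Disease"), (".", "O")],
    [("No", "O"), ("diabetes", "B-Disease"), ("reported", "O"), (".", "O")],
]

X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
y_train = [[label for _, label in s] for s in train_sents]

crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",            # L-BFGS training with elastic-net regularization
    c1=0.1, c2=0.1,
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)

# Predicting on the toy data only demonstrates the API; a real study would
# evaluate on a held-out part of the annotated corpus.
y_pred = crf.predict(X_train)
print(metrics.flat_f1_score(y_train, y_pred, average="weighted"))
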

Keywords: 

risk factors, natural language processing, named entity recognition, machine learning, deep learning.

PP. 93-101.

DOI: 10.14357/20790279210410
 
References

1. Johnson W., Onuma O., Owolabi M. and Sachdev S. Sep. 2016. Stroke: a global response is needed. Bull. World Health Organ., vol. 94, no. 9, pp. 634-634A.
2. Thrift A.G. et al. Jan. 2014. Global stroke statistics. Int. J. Stroke Off. J. Int. Stroke Soc., vol. 9, no. 1, pp. 6–18.
3. Boehme A.K., Esenwa C. and Elkind M.S.V. Feb. 2017. Stroke Risk Factors, Genetics, and Prevention. Circ. Res., vol. 120, no. 3, pp. 472–495.
4. Blagosklonov N.A. et al. 2020. Linguistic analysis of disease history for identifying stroke risk factors. Trudy Instituta sistemnogo analiza Rossiyskoy akademii nauk (Proceedings of the Institute for Systems Analysis of the Russian Academy of Sciences), vol. 70, no. 3, pp. 75-85.
5. Devlin J., Chang M.-W., Lee K. and Toutanova K. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
6. Neamatullah I. et al. 2008. Automated de-identification of free-text medical records. BMC Med. Inform. Decis. Mak., vol. 8, no. 1, p. 32.
7. Sondhi P., Gupta M., Zhai C. and Hockenmaier J. 2010. Shallow Information Extraction from Medical Forum Data. Coling 2010: Posters, pp. 1158–1166. Available at: https://www.aclweb.org/anthology/C10-2133.
8. Nayel H. and Shashirekha H.L. 2017. Improving NER for Clinical Texts by Ensemble Approach using Segment Representations. Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), pp. 197–204. Available at: https://www.aclweb.org/anthology/W17-7525.
9. Arbabi A., Adams D.R., Fidler S. and Brudno M. May 2019. Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning. JMIR Med. Informatics, vol. 7, no. 2, p. e12596.
10. PubMed. Available at: https://pubmed.ncbi.nlm.nih.gov/ (accessed 15.04.2021).
11. Hahn U. and Oleynik M. Aug. 2020. Medical Information Extraction in the Age of Deep Learning. Yearb. Med. Inform., vol. 29, no. 1, pp. 208–220.
12. Shelmanov A. et al. 2019. Active Learning with Deep Pre-trained Models for Sequence Tagging of Clinical and Biomedical Texts. 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 482–489.
13. Lee J. et al. Feb. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, vol. 36, no. 4, pp. 1234–1240.
14. Gligic L., Kormilitzin A., Goldberg P. and Nevado-Holgado A. Jan. 2020. Named entity recognition in electronic health records using transfer learning bootstrapped Neural Networks. Neural Netw., vol. 121, pp. 132–139.
15. Stenetorp P. et al. 2012. BRAT: a web-based tool for NLP-assisted text annotation. Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 102-107.
16. Yargy: Rule-based facts extraction for Russian language. Available at: https://github.com/natasha/yargy (accessed 15.04.2021).
17. Lafferty J.D., McCallum A. and Pereira F.C.N. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proceedings of the Eighteenth International Conference on Machine Learning, pp. 282–289.
18. Pedregosa F. et al. 2011. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., vol. 12, pp. 2825–2830.
19. Sklearn-crfsuite: scikit-learn inspired API for CRFsuite. Available at: https://github.com/TeamHG-Memex/sklearn-crfsuite (accessed 15.04.2021).
20. Python-crfsuite: a python binding for CRFsuite. Available at: https://github.com/scrapinghub/python-crfsuite (accessed 15.04.2021).
21. Okazaki N. 2007. CRFsuite: a fast implementation of conditional random fields (CRFs). Available at: http://www.chokkan.org/software/crfsuite/.
22. Kuratov Y. and Arkhipov M. 2019. Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language.
23. Wolf T. et al. 2020. HuggingFace’s Transformers: State-of-the-art Natural Language Processing.
