A. A. Baranov, L. S. Namazova-Baranova, I. V. Smirnov, D. A. Deviatkin, A. O. Shelmanov, E. A. Vishneva, E. V. Antonova, V. I. Smirnov, A. V. Latyshev. "Methods and systems for data and text mining in healthcare." Pages 81–93.

Abstract. The paper reviews methods and systems for data mining in healthcare and systems for natural language processing of clinical texts. We analyze the typical composition of the data held by a multidisciplinary pediatric center and identify application areas and tasks for comprehensive mining of these data. We also propose an architecture for a system for comprehensive mining of medical data and texts and select software platforms for its implementation.

Keywords: data mining in healthcare, natural language processing of clinical texts, automatic processing of medical texts, hospital information system, big data, grid systems.