Computer analysis of texts
Ananyeva M., Devyatkin D., Kobozeva M., Smirnov I., Solovyev F., Chepovskiy A. The study of extremist texts features


This article presents methods for identifying extremist activity by violent groups and individuals on the Internet. We describe our training and testing datasets in Russian and Tatar, as well as a study of the characteristics of Russian extremist texts. The study yielded a feature set for extremist texts. The applicability of these features to the detection of extremist messages was shown empirically.


extremist texts, psycholinguistic features, separating features, text classification.

PP. 86-97.



