Information Technology
Data Mining
N.D. Moskin, K.A. Kulakov, A.A. Rogov, R.V. Abramov "Research the Stability of Decision Trees Using Distances on Graphs"
Methods and Models in Natural Sciences
Computer analysis of texts
The article examines the stability of decision-tree classifiers in the task of text attribution. This task arises, for example, when studying the authorship of articles from the pre-revolutionary journals “Time” (1861–1863) and “Epoch” (1864–1865) and the weekly “Citizen” (1873–1874). The texts were split into fragments of different sizes using the sliding window method, and the frequencies of n-grams (encoded sequences of parts of speech) were computed for each fragment. These frequencies were then used to build various classifiers. The resulting decision trees were compared with one another using the tree edit distance. For this purpose, a procedure for processing, comparing and visualizing graphs was implemented in the SMALT software package. Experiments with different weights for the edit operations revealed patterns relating the parameters used to construct the text fragments to the decision trees built from them.
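To make the comparison step concrete, the following is a minimal sketch of a weighted tree edit distance between two small decision trees. It is an illustration under stated assumptions, not the procedure implemented in SMALT: the trees are treated as ordered and labeled, the nested-tuple representation, the function name tree_edit_distance, the weight parameters w_del, w_ins and w_rel, and the node labels are all hypothetical, and the plain memoized forest recursion is used in place of any optimized algorithm.

```python
from functools import lru_cache

# Assumed representation: a tree is a nested tuple (label, (child, child, ...)).
# Labels are hypothetical split features (POS n-grams) and author classes.


def tree_edit_distance(t1, t2, w_del=1.0, w_ins=1.0, w_rel=1.0):
    """Ordered tree edit distance with separate weights for
    node deletion, node insertion and node relabeling."""

    @lru_cache(maxsize=None)
    def forest(f, g):
        # f and g are tuples of trees (ordered forests).
        if not f and not g:
            return 0.0
        if not f:
            # Only insertions remain: insert the root of the last tree in g.
            _, kids = g[-1]
            return forest(f, g[:-1] + kids) + w_ins
        if not g:
            # Only deletions remain: delete the root of the last tree in f.
            _, kids = f[-1]
            return forest(f[:-1] + kids, g) + w_del
        (label1, kids1), (label2, kids2) = f[-1], g[-1]
        relabel = 0.0 if label1 == label2 else w_rel
        return min(
            forest(f[:-1] + kids1, g) + w_del,   # delete root of last tree in f
            forest(f, g[:-1] + kids2) + w_ins,   # insert root of last tree in g
            forest(f[:-1], g[:-1]) + forest(kids1, kids2) + relabel,  # match roots
        )

    return forest((t1,), (t2,))


# Two toy decision trees; leaves "A" and "B" stand for candidate authors.
tree_a = ("NOUN-VERB", (("ADJ-NOUN", (("A", ()), ("B", ()))), ("B", ())))
tree_b = ("NOUN-VERB", (("A", ()), ("B", ())))

unit_distance = tree_edit_distance(tree_a, tree_b)
heavy_delete_distance = tree_edit_distance(tree_a, tree_b, w_del=2.0)
```

Changing the weights changes which structural differences dominate the distance, which is the kind of sensitivity the experiments with different operation weights explore.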


text attribution, n-gram, decision tree, graph matching, tree edit distance, software package “SMALT”.

Pp. 94-100.

DOI: 10.14357/20790279230111

