Statistical Classification of Scientific Publications
Volume 21, Issue 4 (2010), pp. 471–486
Pub. online: 1 January 2010
Type: Research Article
Received
1 July 2009
Accepted
1 September 2010
Published
1 January 2010
Abstract
The problem of automatic classification of scientific texts is considered. Methods based on statistical analysis of the probability distributions of scientific terms in texts are discussed. Procedures for selecting the most informative terms and a method for exploiting auxiliary information about term positions are presented. The results of an experimental evaluation of the proposed algorithms and procedures on real-world data are reported.