New Technology of Library and Information Service  2013, Vol. 29 Issue (7/8): 55-62    DOI: 10.11925/infotech.1003-3513.2013.07-08.08
Model Construction and Experiment Analysis of Automatic Indexing for Chinese Books
Wang Hao, Zou Jieli, Deng Sanhong
School of Information Management, Nanjing University, Nanjing 210093, China
Abstract  To address automatic keyword indexing for Chinese books, this paper applies the Conditional Random Fields (CRFs) machine learning algorithm. The method trains on a large body of existing, manually indexed keyword data for Chinese books to generate an annotation model that captures semantic relations and rule features among sequence entities, then uses that model for machine prediction to extract book keywords automatically. The paper addresses two main problems. First, because the choice of CRF parameters affects indexing performance, the authors run comparative tests from several angles to identify the optimal CRF parameter set for keyword indexing of Chinese books. Second, the authors discuss the effect of different observed features on keyword indexing and, through experimental analysis, demonstrate four observed features that effectively improve indexing performance. Finally, the optimal keyword indexing model for Chinese books is constructed.
Key words  Conditional Random Fields      Keywords indexing      Feature template      Word length of window      Feature function      Soft boundary parameter      Observed feature roles
Received: 27 May 2013      Published: 02 September 2013
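
The abstract describes keyword indexing as sequence annotation driven by a feature template with a window length. As a rough illustration only, the sketch below shows one way such training data might be prepared: each token gets a label and a set of observation features drawn from a window of neighbouring tokens. The B/I/O label scheme, the function names window_features and bio_labels, the "<PAD>" marker, and the sample tokens are all assumptions made for this example; the abstract does not state the authors' actual label set, template, or toolkit.

```python
from typing import Dict, List


def window_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Collect the tokens within +/- `window` positions of index i.

    Hypothetical helper: stands in for what a CRF feature template
    with a given window length would expose to the model.
    """
    feats: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Positions outside the sequence get a padding marker (an assumption).
        feats[f"tok[{offset}]"] = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
    return feats


def bio_labels(tokens: List[str], keywords: List[List[str]]) -> List[str]:
    """Label tokens B (keyword start), I (inside a keyword), or O (outside).

    Assumed scheme: the manually indexed keywords are matched against the
    token sequence, and spans that are already labelled are left alone.
    """
    labels = ["O"] * len(tokens)
    for kw in keywords:
        n = len(kw)
        for start in range(len(tokens) - n + 1):
            span_free = all(lab == "O" for lab in labels[start:start + n])
            if tokens[start:start + n] == kw and span_free:
                labels[start] = "B"
                for k in range(start + 1, start + n):
                    labels[k] = "I"
    return labels


# Illustrative, made-up input: a segmented title and one manual keyword.
tokens = ["中文", "图书", "自动", "标引", "模型"]
keywords = [["自动", "标引"]]

labels = bio_labels(tokens, keywords)
first_token_feats = window_features(tokens, 0, window=1)
```

Pairs of (features, label) built this way could then be passed to a CRF toolkit for training. Widening or narrowing `window` is the kind of parameter comparison the abstract refers to, although the values the authors found optimal are not given here.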



Wang Hao, Zou Jieli, Deng Sanhong. Model Construction and Experiment Analysis of Automatic Indexing for Chinese Books. New Technology of Library and Information Service, 2013, 29(7/8): 55-62.

