New Technology of Library and Information Service  2011, Vol. 27 Issue (12): 39-45    DOI: 10.11925/infotech.1003-3513.2011.12.06
Research on Chinese Keywords Extraction Based on Characters Sequence Annotation
Wang Hao, Deng Sanhong, Su Xinning
Department of Information Management, Nanjing University, Nanjing 210093, China
Abstract  Drawing on the complete Chinese book catalogue of a university library and an analysis of its book indexing information, the paper summarizes the features of Chinese keywords and the rules governing their extraction. It then establishes a Chinese keyword extraction model based on character sequence annotation and sets out the basic idea and implementation scheme for extracting keywords. Large-scale experiments verify the feasibility, rationality and practicality of the model. The model largely solves the problem of Chinese keyword extraction without performing word segmentation, which indicates that character sequence annotation outperforms word sequence annotation.
Key words  Sequence annotation      Conditional random fields      Keywords extraction      Machine learning      Characters sequence      Words sequence
Received: 08 October 2011      Published: 02 February 2012
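
The abstract frames keyword extraction as the labeling of individual characters, so no word segmentation is required. The following is a minimal sketch of that data representation only, not the authors' implementation. The B/I/O tag set, the function names annotate_characters and decode_keywords, and the sample title are all assumptions made for illustration; the abstract does not state the tag set used, and the conditional random field that would predict the labels is omitted.

```python
# Illustrative sketch: the B/I/O tag set and all names here are assumptions,
# not details taken from the paper.

def annotate_characters(text, keywords):
    """Label every character of `text` as B (keyword start),
    I (keyword continuation) or O (outside any keyword)."""
    labels = ["O"] * len(text)
    # Process longer keywords first so a shorter keyword cannot
    # overwrite part of a longer one that contains it.
    for kw in sorted(keywords, key=len, reverse=True):
        if not kw:
            continue
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            # Only label the span if it is still entirely unlabeled.
            if all(lab == "O" for lab in labels[start:end]):
                labels[start] = "B"
                for i in range(start + 1, end):
                    labels[i] = "I"
            start = text.find(kw, start + 1)
    return labels


def decode_keywords(text, labels):
    """Recover keyword strings from a character label sequence,
    e.g. one predicted by a trained sequence model."""
    keywords, current = [], ""
    for ch, lab in zip(text, labels):
        if lab == "B":
            if current:
                keywords.append(current)
            current = ch
        elif lab == "I" and current:
            current += ch
        else:
            if current:
                keywords.append(current)
            current = ""
    if current:
        keywords.append(current)
    return keywords


# Hypothetical book title and indexing keyword, for illustration only.
title = "信息检索技术研究"
labels = annotate_characters(title, ["信息检索"])
recovered = decode_keywords(title, labels)
```

In a full pipeline, label sequences produced this way from indexed titles would serve as training data for a sequence model such as a conditional random field, and decode_keywords would be applied to the labels the model predicts for an unseen title.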



Wang Hao, Deng Sanhong, Su Xinning. Research on Chinese Keywords Extraction Based on Characters Sequence Annotation. New Technology of Library and Information Service, 2011, 27(12): 39-45.

