New Technology of Library and Information Service, 2012, Vol. 28, Issue (2): 41-47. DOI: 10.11925/infotech.1003-3513.2012.02.07
Research on Chinese New Word Recognition in Specialized Field Based on N-Gram
Duan Yufeng, Ju Fei
Business School, East China Normal University, Shanghai 200241, China
Abstract: This paper studies automatic new word recognition in a specialized field, taking phytology (botany) as the representative domain. A sample set of 200 plant-description documents is drawn at random from “Flora of China”. First, the text is segmented with ICTCLAS, and new word candidates are extracted from the segmented words with the N-Gram method. All candidates are then ranked separately by term frequency (TF), document frequency (D) and average term frequency (TF/D), and in each ranking the candidates above a chosen cut-off are taken as true new words. Experiments show that recognition based on TF performs best, with an F-measure of 0.65. The method can automatically produce a user dictionary for a specialized field and is highly portable.
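The pipeline in the abstract (join adjacent segmented words into N-gram candidates, score them by TF, D and TF/D, keep the top of a ranking, evaluate with the F-measure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is assumed to be already segmented (ICTCLAS itself is not called), a candidate is assumed to be the plain concatenation of 2 to 3 adjacent words, the cut-off is modelled as a hypothetical top-`k`, and the function names, the optional `exclude` dictionary filter and the toy documents are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

def ngram_candidates(tokens: Sequence[str], min_n: int = 2, max_n: int = 3) -> List[str]:
    """Hypothetical helper: join every run of min_n..max_n adjacent segmented words."""
    cands = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            cands.append("".join(tokens[i:i + n]))
    return cands

def score_candidates(docs: Iterable[Sequence[str]], min_n: int = 2,
                     max_n: int = 3) -> Dict[str, Tuple[int, int, float]]:
    """Return {candidate: (TF, D, TF/D)} over a corpus of segmented documents."""
    tf: Counter = Counter()  # total occurrences across all documents
    df: Counter = Counter()  # number of documents containing the candidate
    for tokens in docs:
        cands = ngram_candidates(tokens, min_n, max_n)
        tf.update(cands)
        df.update(set(cands))  # count each candidate at most once per document
    return {c: (tf[c], df[c], tf[c] / df[c]) for c in tf}

def select_top(scores: Dict[str, Tuple[int, int, float]], key: str = "tf", k: int = 10,
               exclude: Optional[Set[str]] = None) -> List[str]:
    """Rank by 'tf', 'd' or 'tf/d' and keep the top k (hypothetical cut-off)."""
    idx = {"tf": 0, "d": 1, "tf/d": 2}[key]
    pool = [c for c in scores if not exclude or c not in exclude]  # drop known words
    ranked = sorted(pool, key=lambda c: (-scores[c][idx], c))  # ties broken by string
    return ranked[:k]

def f_measure(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Harmonic mean of precision and recall against a hand-labelled gold set."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    if hits == 0:
        return 0.0
    p, r = hits / len(pred), hits / len(ref)
    return 2 * p * r / (p + r)

# Toy corpus: a segmenter is assumed to have split 蔷薇科 and 落叶灌木 into pieces.
docs = [
    ["蔷薇", "科", "落叶", "灌木"],
    ["蔷薇", "科", "常绿", "乔木"],
    ["落叶", "灌木", "蔷薇", "科"],
]
scores = score_candidates(docs, min_n=2, max_n=2)
top = select_top(scores, key="tf", k=2)
print(top, f_measure(top, {"蔷薇科", "落叶灌木", "常绿乔木"}))
```

On this toy input the TF ranking keeps 蔷薇科 (TF 3) and 落叶灌木 (TF 2), giving precision 1.0, recall 2/3 and an F-measure of 0.8 against the three-word gold set. Ranking by TF/D here would not separate the candidates at all (every value is 1.0), which loosely illustrates why an averaged score can be a weaker signal than raw TF.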
Key words: N-Gram; New word recognition; Term frequency
Received: 12 December 2011; Published: 23 March 2012



Duan Yufeng, Ju Fei. Research on Chinese New Word Recognition in Specialized Field Based on N-Gram. New Technology of Library and Information Service, 2012, 28(2): 41-47.

