%A Ni Weijian,Sun Haohao,Liu Tong,Zeng Qingtian %T An Unsupervised Approach to Optimize Chinese Word Segmentation on Domain Literature %0 Journal Article %D 2018 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.2096-3467.2017.0990 %P 96-104 %V 2 %N 2 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4479.shtml} %8 2018-02-25 %X

[Objective] This paper aims to improve the performance of Chinese word segmentation on domain literature by optimizing the results of existing approaches. [Methods] First, based on an analysis of segmentation errors, we proposed a new criterion, Term Frequency Deviation (TFD), to capture the word-formation characteristics of domain literature. Then, we developed an unsupervised approach that uses TFD to refine segmentation results. [Results] We evaluated the proposed approach on agricultural literature. It improved the segmentation results of three popular Chinese word segmentation tools (ICTCLAS, THULAC and LTP) by 2% to 3% in F1 score. The proposed approach was easy to use and robust to parameter settings. [Limitations] The recall of the proposed approach still needs improvement. [Conclusions] The new approach, which improves the performance of traditional segmentation methods on domain literature, could be applied to other fields because it does not depend on domain-specific vocabularies or annotated corpora.
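The abstract reports gains in F1 but does not describe the evaluation script used. As a minimal sketch, assuming the common convention for word segmentation, a predicted word counts as correct only when both of its boundaries match a gold-standard word, and precision, recall and F1 are pooled over all sentences. The function names `to_spans` and `segmentation_prf` are hypothetical, and the sample words are illustrative rather than taken from the paper's agricultural corpus.

```python
def to_spans(words):
    """Convert a segmented sentence into a set of (start, end) character offsets."""
    spans = set()
    start = 0
    for w in words:
        end = start + len(w)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold_sents, pred_sents):
    """Corpus-level word precision, recall and F1 (hypothetical helper).

    Each argument is a list of sentences; each sentence is a list of words.
    Counts are pooled across sentences before computing the ratios.
    """
    if len(gold_sents) != len(pred_sents):
        raise ValueError("gold and predicted corpora differ in sentence count")
    correct = gold_total = pred_total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        # Both segmentations must cover exactly the same character string.
        if "".join(gold) != "".join(pred):
            raise ValueError("a sentence pair covers different text")
        gold_spans, pred_spans = to_spans(gold), to_spans(pred)
        correct += len(gold_spans & pred_spans)  # exact boundary matches
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative example: a segmenter splits the domain term "纹枯病" into two pieces.
gold = [["水稻", "纹枯病", "防治"]]
pred = [["水稻", "纹枯", "病", "防治"]]
print(segmentation_prf(gold, pred))
```

Here 2 of the 4 predicted words match gold boundaries, giving precision 0.5, recall 2/3 and F1 4/7 (about 0.571). Over-splitting a domain term lowers both scores, which is the kind of error a post-processing refinement step aims to repair.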