%A Xianlai Chen, Chaopeng Han, Ying An, Li Liu, Zhongmin Li, Rong Yang
%T Extracting New Words with Mutual Information and Logistic Regression
%0 Journal Article
%D 2019
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.2096-3467.2018.1445
%P 105-113
%V 3
%N 8
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4691.shtml}
%8 2019-08-25
%X

[Objective] This paper modifies a method for new word extraction in order to improve the performance of medical text segmentation models. [Methods] Using the traditional mutual information model, we obtained statistics on words and strings. We then trained a logistic regression classification model on these statistics and built an algorithm for new word identification. [Results] We ran a series of experiments on electronic medical record texts from the Dermatology Department of Xiangya Hospital. Compared with PMI, PMI^2, and PMI^3, our logistic regression model achieved the highest accuracy in new word extraction (0.803). [Limitations] Building the logistic regression classifier requires manually judging whether each training string is a word. [Conclusions] The proposed model and algorithm can effectively identify new words in medical records.
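
The pipeline in [Methods] can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact features or formulas, so the code assumes the commonly used definition PMI^k(x, y) = log(p(xy)^k / (p(x) p(y))), uses the three PMI variants as the only features, and trains a small logistic regression by plain gradient descent. The toy corpus, the hand-assigned labels, and every function name (`count_ngrams`, `pmi_k`, `train_logistic`, `word_probability`) are hypothetical.

```python
import math
from collections import Counter

def count_ngrams(texts, max_n=2):
    """Count every character n-gram of length 1..max_n in the corpus."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts

def pmi_k(candidate, counts, total, k=1):
    """Lowest PMI^k over all two-way splits of the candidate string.

    Assumed definition: PMI^k(x, y) = log(p(xy)^k / (p(x) * p(y))).
    The candidate (length >= 2) and both halves must occur in the corpus.
    """
    p_xy = counts[candidate] / total
    scores = []
    for i in range(1, len(candidate)):
        p_x = counts[candidate[:i]] / total
        p_y = counts[candidate[i:]] / total
        scores.append(math.log(p_xy ** k / (p_x * p_y)))
    return min(scores)

def features(candidate, counts, total):
    """Feature vector [PMI, PMI^2, PMI^3] (an assumed feature set)."""
    return [pmi_k(candidate, counts, total, k) for k in (1, 2, 3)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by full-batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def word_probability(candidate, counts, total, w, b):
    """Estimated probability that the candidate string is a word."""
    x = features(candidate, counts, total)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy corpus (hypothetical): "ab" recurs as a unit, other pairs occur once.
texts = ["abcab", "abdab", "eabf", "cdef"]
counts = count_ngrams(texts)
total = sum(len(t) for t in texts)

# Manual word / non-word labels, mirroring the step noted in [Limitations].
labeled = {"ab": 1, "bc": 0, "cd": 0, "ca": 0}
X = [features(s, counts, total) for s in labeled]
y = list(labeled.values())
w, b = train_logistic(X, y)

for s in ("ab", "cd", "ef"):
    print(s, round(word_probability(s, counts, total, w, b), 3))
```

In this toy corpus plain PMI ranks the one-off pair "cd" above the recurring "ab", because PMI favors rare events, while PMI^2 and PMI^3 reverse that order. Feeding several such statistics to a classifier, rather than thresholding a single score, is the general idea the abstract describes.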