[Objective] This study aims to improve the clustering of policy texts using the LDA topic model. [Methods] First, we pre-processed the policy texts with the LDA model to generate the training data set. Then, we used a weighted algorithm to determine the optimal number of topics and clustered the policy texts accordingly. [Results] The G value of the weighted clustering results peaked at k = 4. The clustering results were consistent with the manual classification and achieved higher purity and F values, which indicates that the proposed method is effective. [Limitations] The outcome of each processing step affects the accuracy of the final policy text clustering. [Conclusions] The proposed method could offer guidance for the making of new policies, the evaluation of current policies, and the mechanism of two-way interactions.
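The purity and F values reported above compare a clustering against a manual classification. The abstract does not give the formulas used, so the following is only a minimal sketch under assumed definitions: purity as the share of documents that belong to the majority manual class of their cluster, and F as the class-size-weighted best F1 over clusters. The function names and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Share of documents in the majority manual class of their cluster."""
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def f_measure(labels_true, labels_pred):
    """Assumed definition: for each manual class, take the best F1 over all
    clusters, then average those values weighted by class size."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue  # no overlap, F1 is zero for this pair
            precision = n_ij / n_clu
            recall = n_ij / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total


# Toy data (hypothetical): six documents, three manual classes, three clusters.
manual = ["a", "a", "a", "b", "b", "c"]
clustered = [0, 0, 1, 1, 1, 2]

print(round(purity(manual, clustered), 4))
print(round(f_measure(manual, clustered), 4))
```

Both scores lie between 0 and 1, and a clustering that reproduces the manual classes exactly scores 1 on each, which is why higher values are read as closer agreement with the manual classification.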