Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (9): 59-65    DOI: 10.11925/infotech.2096-3467.2018.0273
Clustering Policy Texts Based on LDA Topic Model
Tao Zhang1, Haiqun Ma2
1Information and Network Center, Heilongjiang University, Harbin 150080, China
2Research Center of Information Resource Management, Heilongjiang University, Harbin 150080, China
[Objective] This study aims to improve the clustering of policy texts using the LDA topic model. [Methods] First, we pre-processed the policy texts with the LDA model to generate the training data set. Then, we used a weighted algorithm to determine the optimal number of topics and clustered the policy texts. [Results] The G value of the weighted clustering results peaked at k = 4. The clustering results were consistent with the manual classification and achieved higher purity and F values, which indicates that the proposed method is effective. [Limitations] The result of each step in the procedure affects the accuracy of the final policy text clustering. [Conclusions] The proposed method could provide guidance for the making of new policies, the evaluation of current policies, and the mechanism of two-way interactions.
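
The abstract reports purity and F values against a manual classification but does not give their formulas. As a minimal sketch, assuming the commonly used definitions (purity as the share of documents in the majority class of their cluster, and a class-size-weighted best-match F-measure), the two scores could be computed as follows. The function names and the toy labels are hypothetical and are not taken from the paper; the G value is not defined in the abstract and is not covered here.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Share of documents that belong to the majority class of their cluster."""
    clusters = {}
    for truth, cluster in zip(labels_true, labels_pred):
        clusters.setdefault(cluster, []).append(truth)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted best F1 of each manual class against any cluster.

    Assumption: this is the standard best-match clustering F-measure; the
    paper may define its F value differently.
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    overlap = Counter(zip(labels_true, labels_pred))
    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            hits = overlap[(cls, clu)]
            if hits == 0:
                continue
            precision = hits / clu_size
            recall = hits / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        total += cls_size / n * best
    return total


# Hypothetical toy data: manual policy categories vs. assigned cluster ids.
manual = ["tax", "tax", "tax", "edu", "edu", "env"]
predicted = [0, 0, 1, 1, 1, 2]
print(purity(manual, predicted))
print(clustering_f_measure(manual, predicted))
```

Both scores lie in (0, 1] and equal 1 only when every cluster matches exactly one manual class, which is why they suit a comparison with a manual classification.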

Key words: Policy Text; LDA; Topic Model; Text Clustering
Received: 12 March 2018      Published: 25 October 2018

Tao Zhang, Haiqun Ma. Clustering Policy Texts Based on LDA Topic Model. Data Analysis and Knowledge Discovery, 2018, 2(9): 59-65.

