Topic Recognition Research on Topic Imbalanced News Text Data Set
Wang Hongbin, Wang Jianxiong, Zhang Yafei, Yang Heng
(Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
(Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, China)
(YUN NAN WEI HENG JI YE Co., Ltd., Kunming 650000, China)
Abstract
[Objective] The traditional LDA model recognizes text topics inaccurately when the number of texts per topic in a news text dataset is unbalanced. [Methods] This paper proposes a topic recognition method for unbalanced news text datasets that builds on the traditional LDA model and combines three feature detection methods: independence detection, variance detection, and information entropy detection. [Results] In experiments on 10,000 news texts, the proposed method improves recall by 0.2121, precision by 0.0407, and F1 by 0.152 compared with the traditional LDA topic recognition method. [Limitations] News text contains many new words, which lowers the accuracy of the word segmentation tools used in the experiments; because topic recognition depends on segmentation accuracy, its results are affected as well. [Conclusions] The experimental results show that the proposed method can, to a certain extent, solve the problem LDA topic recognition faces when the number of texts is unbalanced across topics in a news text dataset.
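The abstract names variance detection and information entropy detection but does not define them. As a rough illustration only, the sketch below scores each word by the entropy and variance of its weight across topics, on the assumption that a word concentrated in one topic (low entropy, high variance) is more useful for telling topics apart. The function names, the toy words, and the weights are hypothetical and are not taken from the paper; independence detection is not covered.

```python
import math


def topic_entropy(weights):
    """Shannon entropy (in bits) of one word's normalized weights over topics."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in normalized if p > 0)


def topic_variance(weights):
    """Population variance of one word's normalized weights over topics."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    mean = sum(normalized) / len(normalized)
    return sum((p - mean) ** 2 for p in normalized) / len(normalized)


# Hypothetical word-by-topic weights, e.g. rows of a fitted LDA
# topic-word matrix. The words and numbers are made up for illustration.
word_topic_weights = {
    "election": [0.90, 0.05, 0.05],  # concentrated in one topic
    "today": [0.34, 0.33, 0.33],     # spread evenly over all topics
}

# Map each word to an (entropy, variance) pair.
scores = {
    word: (topic_entropy(w), topic_variance(w))
    for word, w in word_topic_weights.items()
}
```

Under these assumptions, "election" gets a lower entropy and a higher variance than "today", so a filter based on either score would keep the former and drop the latter. How the paper actually combines the three detections with LDA is not stated in the abstract.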
Published: 11 November 2020