%A Yan Qiang,Zhang Xiaoyan,Zhou Simin %T Extracting Keywords Based on Sememe Similarity %0 Journal Article %D 2021 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.2096-3467.2020.0748 %P 80-89 %V 5 %N 4 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4950.shtml} %8 2021-04-25 %X

[Objective] This study introduces word semantics into the TextRank algorithm to improve keyword extraction. [Methods] First, we used the sememe information in HowNet to calculate the similarity between words. Then, we constructed a semantic graph and matrix from the word pairs whose similarity exceeded a threshold. Finally, we weighted and combined the semantic matrix and the word co-occurrence matrix to obtain the transition probability matrix. [Results] On short texts, the improved algorithm outperformed TextRank, TF-IDF and LDA, increasing the F-score by 6.6%, 9.0% and 10.3%, respectively. On long texts, its results were inferior to those of TF-IDF but close to those of TextRank. [Limitations] The word segmentation program could not reliably identify compound words, new words and entities. As a result, some extracted keywords were incomplete and the F-scores dropped. The semantic similarity algorithm also leaves room for improvement. [Conclusions] By combining the co-occurrence and semantic relations of words, the proposed method effectively extracts keywords from short texts.
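The Methods pipeline (similarity, thresholded semantic matrix, weighted transition matrix, TextRank ranking) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The sememe sets in `SEMEMES` are hypothetical toy data, not real HowNet entries. Jaccard overlap of sememe sets stands in for the paper's HowNet-based similarity measure. The window size, the threshold (0.3), the semantic weight `alpha` (0.5) and the damping factor `d` (0.85, the common TextRank/PageRank default) are illustrative choices. The sketch also row-normalizes each matrix before the weighted sum, which is an assumption about how the weighting is done.

```python
from itertools import combinations

# Hypothetical sememe annotations standing in for HowNet entries;
# a real system would look these up in HowNet instead.
SEMEMES = {
    "keyword": {"word", "important"},
    "term": {"word"},
    "extraction": {"take", "information"},
    "retrieval": {"take", "information", "seek"},
    "text": {"document", "information"},
}

# Toy token stream, assumed already segmented and stop-word filtered.
TOKENS = ["keyword", "extraction", "from", "text", "uses", "term",
          "retrieval", "keyword", "ranking"]

def sememe_similarity(w1, w2, sememes):
    """Jaccard overlap of two words' sememe sets (a stand-in measure)."""
    a, b = sememes.get(w1, set()), sememes.get(w2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def cooccurrence_matrix(tokens, vocab, window=2):
    """Symmetric counts of word pairs inside a sliding window of `window` tokens."""
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u != w:
                m[idx[w]][idx[u]] += 1
                m[idx[u]][idx[w]] += 1
    return m

def semantic_matrix(vocab, sememes, threshold=0.3):
    """Keep only word pairs whose similarity reaches the threshold."""
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = sememe_similarity(vocab[i], vocab[j], sememes)
        if s >= threshold:
            m[i][j] = m[j][i] = s
    return m

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row] if total else [0.0] * len(row))
    return out

def transition_matrix(co, sem, alpha=0.5):
    """Weighted sum of the row-normalized matrices (hypothetical `alpha`), renormalized."""
    c, s = row_normalize(co), row_normalize(sem)
    n = len(co)
    mixed = [[alpha * s[i][j] + (1 - alpha) * c[i][j] for j in range(n)]
             for i in range(n)]
    return row_normalize(mixed)

def textrank(p, d=0.85, iters=100, tol=1e-8):
    """Power iteration: score_i = (1-d)/n + d * sum_j score_j * p[j][i]."""
    n = len(p)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n + d * sum(scores[j] * p[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores

def extract_keywords(tokens, sememes, k=3, window=2, threshold=0.3, alpha=0.5):
    """Rank the distinct tokens and return the top-k (word, score) pairs."""
    vocab = sorted(set(tokens))
    p = transition_matrix(cooccurrence_matrix(tokens, vocab, window),
                          semantic_matrix(vocab, sememes, threshold), alpha)
    ranked = sorted(zip(vocab, textrank(p)), key=lambda t: -t[1])
    return ranked[:k]

if __name__ == "__main__":
    for word, score in extract_keywords(TOKENS, SEMEMES):
        print(f"{word}\t{score:.4f}")
```

Renormalizing the mixed rows means that a word with no semantic neighbours above the threshold falls back to its co-occurrence transitions alone. Every word that co-occurs with at least one other word therefore keeps a proper probability distribution over its outgoing edges.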