%A Yu Yan
%A Zhao Naixuan
%T Choosing Stopwords for Patent Topic Analysis Based on Auxiliary Set
%0 Journal Article
%D 2018
%J Data Analysis and Knowledge Discovery
%R 10.11925/infotech.2096-3467.2018.0240
%P 95-103
%V 2
%N 11
%U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4582.shtml}
%8 2018-11-25
%X

[Objective] This paper proposes a method for automatically choosing domain-specific stopwords, with the aim of improving patent topic analysis. [Methods] First, we introduced an auxiliary set and defined two indexes on it: document frequency and entropy across categories. Then, we used these indexes to measure the distribution of words in the auxiliary set and selected the domain-specific stopwords automatically. [Results] The proposed method improved the quality of the identified patent topics. [Limitations] The types and members of the auxiliary set need further study. [Conclusions] The proposed stopword selection method measures the distributional characteristics of words, which helps identify domain-specific stopwords for patent analysis more effectively.
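
The two indexes named under [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the normalized Shannon entropy, the per-category rate normalization, the function name `candidate_stopwords`, and the thresholds `min_df` and `min_entropy` are all assumptions made for illustration. The idea shown is that a word appearing in many auxiliary-set documents and spread evenly across categories carries little topical signal and is a plausible stopword candidate.

```python
import math
from collections import Counter, defaultdict


def candidate_stopwords(docs, min_df=0.5, min_entropy=0.9):
    """Return {word: (document_frequency, normalized_entropy)} for candidates.

    docs is a list of (category, tokens) pairs drawn from a hypothetical
    auxiliary set. Both thresholds are illustrative assumptions.
    """
    categories = sorted({cat for cat, _ in docs})
    docs_per_cat = Counter(cat for cat, _ in docs)
    doc_count = Counter()             # word -> number of documents containing it
    cat_count = defaultdict(Counter)  # word -> category -> number of documents
    for cat, tokens in docs:
        for word in set(tokens):      # count each word once per document
            doc_count[word] += 1
            cat_count[word][cat] += 1

    # Entropy is divided by log(number of categories) so it lies in [0, 1].
    max_entropy = math.log(len(categories)) if len(categories) > 1 else 1.0

    result = {}
    for word, n in doc_count.items():
        df = n / len(docs)
        # Per-category rates keep large categories from dominating.
        rates = [cat_count[word][c] / docs_per_cat[c] for c in categories]
        total = sum(rates)
        probs = [r / total for r in rates if r > 0]
        entropy = -sum(p * math.log(p) for p in probs) / max_entropy
        if df >= min_df and entropy >= min_entropy:
            result[word] = (df, entropy)
    return result


# Toy auxiliary set with two made-up categories, "A" and "B".
AUX = [
    ("A", ["method", "device", "battery"]),
    ("A", ["method", "battery", "cell"]),
    ("B", ["method", "device", "antenna"]),
    ("B", ["method", "antenna", "signal"]),
]

if __name__ == "__main__":
    for word, (df, ent) in sorted(candidate_stopwords(AUX).items()):
        print(f"{word}: df={df:.2f}, entropy={ent:.2f}")
```

In this toy data, "battery" occurs in half of the documents but only in category "A", so its entropy is zero and it is kept as a topical term. Raising `min_df` narrows the candidate list to words that are frequent as well as evenly spread.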