1. Information Service Department, Nanjing Tech University, Nanjing 210009, China; 2. Computer Science Department, Southeast University Chengxian College, Nanjing 211816, China
[Objective] This study addresses two problems that topic models face in patent text analysis: a bias toward high-frequency words and low discrimination between topics. [Methods] We propose a word weighting method for the traditional topic model. The modified model assigns a different weight to each word, which changes the probability with which words are generated. [Results] Compared with traditional methods, the weighted patent topic model identified topics more effectively. [Limitations] The weighting algorithm needs to be validated and optimized on more datasets. [Conclusions] The proposed model can effectively analyze patent texts.
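The idea in [Methods] can be illustrated with a minimal sketch of a term-weighted topic model. The abstract does not specify the weighting formula or the inference procedure, so everything below is an assumption made for illustration: the smoothed IDF weights in `idf_weights`, the collapsed Gibbs sampler in `weighted_lda`, and the hyperparameters `alpha` and `beta` are hypothetical choices, not the authors' method. The only point it demonstrates is that adding each word's weight to the count tables, instead of 1, shifts the word-generation probabilities away from frequent, uninformative words.

```python
import math
import random
from collections import Counter


def idf_weights(docs):
    """Hypothetical weighting: smoothed IDF, so words found in fewer documents weigh more."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def weighted_lda(docs, n_topics, weights, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling where every token contributes weights[w] instead of 1."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v_size = len(vocab)

    # Weighted count tables (floats, because weights are not integers).
    doc_topic = [[0.0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
    topic_total = [0.0] * n_topics

    def update(d, k, w, sign):
        # Add or remove one token's weight; clamp to guard against float round-off.
        delta = sign * weights[w]
        doc_topic[d][k] = max(0.0, doc_topic[d][k] + delta)
        topic_word[k][w] = max(0.0, topic_word[k][w] + delta)
        topic_total[k] = max(0.0, topic_total[k] + delta)

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z_doc):
            update(d, k, w, +1)
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, assign[d][i], w, -1)
                # Unnormalized conditional probability of each topic for this token.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + v_size * beta)
                    for k in range(n_topics)
                ]
                k_new = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = k_new
                update(d, k_new, w, +1)

    # Topic-word distributions estimated from the weighted counts.
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + v_size * beta) for w in vocab}
        for k in range(n_topics)
    ]


if __name__ == "__main__":
    toy_docs = [
        "device battery electrode lithium".split(),
        "device antenna signal wireless".split(),
        "device battery lithium cell".split(),
    ]
    w = idf_weights(toy_docs)
    for topic in weighted_lda(toy_docs, n_topics=2, weights=w):
        print(sorted(topic, key=topic.get, reverse=True)[:3])
```

In this toy corpus the word "device" occurs in every document and therefore receives the smallest weight, so it contributes less to the topic-word counts than rarer terms such as "antenna". Any other weighting function that returns a positive weight per word could be passed in place of `idf_weights`.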
Yu Yan, Zhao Naixuan. Weighted Topic Model for Patent Text Analysis[J]. Data Analysis and Knowledge Discovery, 2018, 2(4): 81-89.