Weighted Topic Model for Patent Text Analysis
Yu Yan¹,², Zhao Naixuan¹
¹ Information Service Department, Nanjing Tech University, Nanjing 210009, China
² Computer Science Department, Southeast University Chengxian College, Nanjing 211816, China

Abstract [Objective] This study addresses two problems that topic models face in patent text analysis: a bias toward high-frequency words and low topic discrimination. [Methods] We propose a word weighting method for the traditional topic model. The modified model assigns different weights to words, which changes the probability with which words are generated. [Results] Compared with traditional methods, the weighted patent topic model identified topics more effectively. [Limitations] The weighting algorithm needs to be validated and optimized on more datasets. [Conclusions] The proposed model can effectively analyze patent texts.
Received: 26 October 2017
Published: 11 May 2018
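
The abstract does not state the weighting formula, so the following is only a minimal sketch of one common way to introduce word weights into an LDA-style topic model: in collapsed Gibbs sampling, each token adds its word's weight to the count tables instead of adding 1, so down-weighted high-frequency words influence the topic-word probabilities less. The function name `weighted_lda_gibbs`, the example weights, and the hyperparameters `alpha` and `beta` are assumptions made for illustration and are not taken from the paper.

```python
import random


def weighted_lda_gibbs(docs, weights, num_topics, alpha=0.1, beta=0.01,
                       iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with per-word weights (illustrative).

    docs:    list of documents, each a list of integer word ids
    weights: list of positive floats, weights[w] is the weight of word id w
             (hypothetical values; the paper's weighting scheme is not shown)
    Returns (phi, assign): topic-word distributions and token topic labels.
    """
    rng = random.Random(seed)
    vocab_size = len(weights)

    # Count tables hold floats because a token contributes weights[w], not 1.
    doc_topic = [[0.0] * num_topics for _ in docs]
    topic_word = [[0.0] * vocab_size for _ in range(num_topics)]
    topic_total = [0.0] * num_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        labels = []
        for w in doc:
            z = rng.randrange(num_topics)
            labels.append(z)
            doc_topic[d][z] += weights[w]
            topic_word[z][w] += weights[w]
            topic_total[z] += weights[w]
        assign.append(labels)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wt = weights[w]
                z = assign[d][i]
                # Remove the current token's weighted contribution.
                doc_topic[d][z] -= wt
                topic_word[z][w] -= wt
                topic_total[z] -= wt
                # Unnormalised conditional probability of each topic.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=probs)[0]
                # Add the token back under the newly sampled topic.
                assign[d][i] = z
                doc_topic[d][z] += wt
                topic_word[z][w] += wt
                topic_total[z] += wt

    # Smoothed topic-word distributions estimated from the weighted counts.
    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in range(vocab_size)]
        for k in range(num_topics)
    ]
    return phi, assign
```

With all weights set to 1.0 this reduces to the standard collapsed Gibbs sampler for LDA; giving a very frequent word a weight below 1.0 lowers its share of each topic's probability mass.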