Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (4): 81-89    DOI: 10.11925/infotech.2096-3467.2017.1068
Original Article
Weighted Topic Model for Patent Text Analysis
Yan Yu1,2, Naixuan Zhao1
1Information Service Department, Nanjing Tech University, Nanjing 210009, China
2Computer Science Department, Southeast University Chengxian College, Nanjing 211816, China
Abstract  

[Objective] This study addresses two problems that topic models face in patent text analysis: a bias toward high-frequency words and low discriminative power. [Methods] We propose a word-weighting method for the traditional topic model. The modified model assigns different weights to words and thereby changes the probability with which words are generated. [Results] Compared with traditional methods, the weighted patent topic model identifies subjects more effectively. [Limitations] The weighting algorithm needs to be validated and optimized on more datasets. [Conclusions] The proposed model can analyze patent texts effectively.
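
The abstract does not state the weighting formula or the inference procedure, so the following is only a minimal sketch of the general idea of a word-weighted topic model. It assumes LDA with collapsed Gibbs sampling, in which each token adds its word weight (rather than 1) to the topic counts, and it uses a hypothetical smoothed-IDF weight so that words found in many patents count less. The function names `idf_weights` and `weighted_lda`, the hyperparameters, and the toy documents are illustrative assumptions, not the authors' method.

```python
import math
import random


def idf_weights(docs):
    """Hypothetical weighting (an assumption): smoothed IDF per word."""
    n_docs = len(docs)
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    # Words that occur in every document get weight 1.0; rarer words get more.
    return {w: math.log((1 + n_docs) / (1 + d)) + 1.0 for w, d in df.items()}


def weighted_lda(docs, n_topics, weights, alpha=0.1, beta=0.01,
                 n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with weighted counts.

    Each token contributes weights[w] instead of 1 to the count tables,
    which changes the probability of a word under each topic.
    Returns one word distribution (dict) per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0.0] * n_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
    topic_total = [0.0] * n_topics

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += weights[w]
            topic_word[z][w] += weights[w]
            topic_total[z] += weights[w]
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z, wt = assign[d][i], weights[w]
                # Remove the token's weighted contribution.
                doc_topic[d][z] -= wt
                topic_word[z][w] -= wt
                topic_total[z] -= wt
                # Unnormalized conditional probability of each topic.
                probs = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assign[d][i] = z
                doc_topic[d][z] += wt
                topic_word[z][w] += wt
                topic_total[z] += wt

    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta)
         for w in vocab}
        for k in range(n_topics)
    ]


# Toy "patent" documents (made up for illustration).
docs = [
    "method battery electrode lithium method".split(),
    "method battery cell lithium charge".split(),
    "method antenna signal wireless method".split(),
    "method antenna wireless receiver signal".split(),
]
weights = idf_weights(docs)
phi = weighted_lda(docs, n_topics=2, weights=weights)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(k, top)
```

Setting every weight to 1.0 reduces this sketch to standard collapsed Gibbs sampling for LDA, which makes it easy to compare weighted and unweighted runs on the same corpus.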

Key words: Text Analysis; Patent; Weighted Topic Model
Received: 26 October 2017      Published: 11 May 2018

Cite this article:

Yan Yu, Naixuan Zhao. Weighted Topic Model for Patent Text Analysis. Data Analysis and Knowledge Discovery, 2018, 2(4): 81-89.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1068     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I4/81
