Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (12): 63-73    DOI: 10.11925/infotech.2096-3467.2017.0820
Original Article
Hierarchical Classification Model for Invention Patents
Dongsheng Zhai, Dengjin Hu, Jie Zhang, Xijun He, He Liu
School of Economics and Management, Beijing University of Technology, Beijing 100124, China
Abstract  

[Objective] This paper proposes a machine learning classification model that processes patent information to determine a patent's level of invention. [Methods] First, we extracted technology feature words from the patent texts. Second, we built a technology feature vector for each patent from word vectors trained with Word2Vec. Third, we calculated patent text indicators and backward-reference indicators to build the training set. Finally, we constructed the model with a machine learning classification algorithm. [Results] We evaluated the proposed model on patents retrieved from the field of speech recognition technology. The ratio of advanced-level to entry-level patents was about 1:4, which is consistent with the actual situation. [Limitations] Feature-word extraction relies on the WordNet dictionary, which limits the extraction results. [Conclusions] The proposed model can effectively identify advanced patents and recommend them to business owners.
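
The steps listed under [Methods] can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the three-dimensional toy word vectors stand in for vectors trained with Word2Vec, averaging is one possible way to pool word vectors into a patent feature vector, and a nearest-centroid rule stands in for the unspecified machine learning classifier. The patent text indicators and backward-reference indicators mentioned in the abstract are omitted.

```python
from math import sqrt

# Hypothetical toy word vectors; in practice these would come from a
# Word2Vec model trained on patent text.
TOY_VECTORS = {
    "acoustic": [0.9, 0.1, 0.0],
    "model": [0.8, 0.2, 0.1],
    "neural": [0.7, 0.3, 0.0],
    "button": [0.1, 0.9, 0.2],
    "housing": [0.0, 0.8, 0.3],
    "cable": [0.1, 0.7, 0.4],
}


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def patent_vector(feature_words, vectors=TOY_VECTORS):
    """Pool the vectors of known feature words by averaging (assumed choice)."""
    known = [vectors[w] for w in feature_words if w in vectors]
    if not known:
        raise ValueError("no feature word has a vector")
    return mean_vector(known)


def distance(a, b):
    """Euclidean distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labelled):
    """Map each invention level to the centroid of its patents' vectors.

    labelled: list of (feature_words, level) pairs.
    """
    by_level = {}
    for words, level in labelled:
        by_level.setdefault(level, []).append(patent_vector(words))
    return {level: mean_vector(rows) for level, rows in by_level.items()}


def predict(centroids, feature_words):
    """Return the level whose centroid is closest to the patent's vector."""
    vec = patent_vector(feature_words)
    return min(centroids, key=lambda level: distance(centroids[level], vec))


# Hypothetical labelled training set: (feature words, invention level).
training_set = [
    (["acoustic", "model"], "advanced"),
    (["neural", "model"], "advanced"),
    (["button", "housing"], "entry"),
    (["cable", "housing"], "entry"),
]
centroids = train(training_set)
```

Calling `predict(centroids, ["acoustic", "neural"])` assigns a new patent to the level whose centroid lies closest to its pooled feature vector. A real pipeline would replace the toy vectors with trained embeddings and the centroid rule with whichever classifier performs best on the labelled set.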

Key words: Patent Invention Level; Technical Feature Vector; Word Vector; Machine Learning
Received: 15 August 2017      Published: 29 December 2017

Cite this article:

Dongsheng Zhai, Dengjin Hu, Jie Zhang, Xijun He, He Liu. Hierarchical Classification Model for Invention Patents. Data Analysis and Knowledge Discovery, 2017, 1(12): 63-73.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0820     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I12/63
