[Objective] This paper proposes a new model that processes patent information with a machine learning classification algorithm in order to determine a patent's level of invention. [Methods] First, we extracted technology feature words from the patent texts. Second, we built a technology feature vector for each patent from word vectors trained with Word2Vec. Third, we calculated patent text indicators and backward-reference indicators to build the training set. Finally, we trained a machine learning classification algorithm on this set to obtain the model. [Results] We applied the proposed model to patents retrieved in the field of speech recognition technology. The ratio of advanced-level to entry-level patents was around 1:4, which is in line with the actual situation. [Limitations] The coverage of the WordNet dictionary limits the feature words that can be extracted. [Conclusions] The proposed model can effectively identify advanced patents and recommend them to business owners.
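The four steps listed under [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy embedding table stands in for vectors that Word2Vec would learn, the two indicator values per patent are hypothetical placeholders for the text and backward-reference indicators, and a nearest-centroid rule stands in for the classification algorithm, which the abstract does not name. All function and variable names are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy embeddings standing in for vectors learned by Word2Vec.
TOY_EMBEDDINGS = {
    "acoustic": [0.9, 0.1],
    "decoder": [0.8, 0.3],
    "microphone": [0.2, 0.9],
    "housing": [0.1, 0.8],
}


def patent_vector(feature_words, embeddings):
    """Average the embeddings of the feature words that have a vector."""
    known = [embeddings[w] for w in feature_words if w in embeddings]
    if not known:
        raise ValueError("no feature word has an embedding")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def build_features(feature_words, indicators, embeddings):
    """Concatenate the averaged word vector with numeric patent indicators."""
    return patent_vector(feature_words, embeddings) + list(indicators)


def fit_centroids(samples, labels):
    """Compute one mean feature vector (centroid) per class label."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest in Euclidean distance."""
    def dist(a, b):
        return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Toy training set: feature words, two hypothetical indicators scaled to
# [0, 1] (a text indicator and a backward-reference indicator), and a label.
train = [
    (["acoustic", "decoder"], [0.9, 0.8], "advanced"),
    (["decoder"], [0.8, 0.9], "advanced"),
    (["microphone", "housing"], [0.2, 0.1], "entry"),
    (["housing"], [0.1, 0.2], "entry"),
]
X = [build_features(words, ind, TOY_EMBEDDINGS) for words, ind, _ in train]
y = [label for _, _, label in train]
centroids = fit_centroids(X, y)

# Classify an unseen patent described by one feature word and two indicators.
new_patent = build_features(["acoustic"], [0.7, 0.9], TOY_EMBEDDINGS)
predicted_level = predict(centroids, new_patent)
```

In practice the embedding table would come from a Word2Vec model trained on the patent corpus, and the nearest-centroid rule would be replaced by whichever classifier performs best on the labelled training set.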