Patent Classification Based on Multi-feature and Multi-classifier Integration
Jia Shanshan1, Liu Chang2, Sun Lianying3, Liu Xiaoan1, Peng Tao2
1 College of Intellectualized City, Beijing Union University, Beijing 100101, China
2 College of Robotics, Beijing Union University, Beijing 100101, China
3 College of Urban Rail Transit and Logistics, Beijing Union University, Beijing 100101, China
[Objective] This paper aims to automatically assign the correct International Patent Classification (IPC) codes to patent applications using a method that integrates multiple features and multiple classifiers.
[Methods] First, we extracted TF-IDF features over the full vocabulary and over terms selected by information gain, together with document-vector and topic-model features, from the patent applications. Then, we trained Naive Bayes (NB), Support Vector Machine (SVM), and AdaBoost classifiers on these features. Finally, we constructed a feature-class matrix and predicted the final IPC code using an F1-score weight matrix.
[Results] We evaluated the method on patents from 2014 to 2016 in 10 classes of the engine and pump field. The top-prediction, all-categories, and two-guesses accuracies were 78.9%, 80.1%, and 91.2%, respectively.
[Limitations] The training corpus is small, covering only three years of patent data.
[Conclusions] The proposed method can effectively improve the accuracy of patent classification in the engine and pump field.
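The integration and evaluation steps can be sketched in Python. The abstract does not specify how the F1 weight matrix combines the base models, so this is a minimal sketch under one assumed scheme: each (feature set, classifier) pair votes for the class it predicts, and the vote is weighted by that pair's validation F1 score on that class. The metric definitions are also assumptions, following common usage in patent classification: "top prediction" checks the first-ranked class against the main IPC class, "all categories" checks it against any IPC class assigned to the patent, and "two guesses" checks whether the main class is among the two highest-ranked classes. The function names, model names, IPC codes, and F1 values are all hypothetical.

```python
from collections import defaultdict


def f1_weighted_vote(predictions, f1_matrix, k=2):
    """Rank candidate classes by F1-weighted votes (hypothetical scheme).

    predictions: {model_name: predicted_class}, one entry per
        (feature set, classifier) pair.
    f1_matrix: {model_name: {class: validation F1 of that model on that class}}.
    Returns the k highest-scoring classes, best first.
    """
    scores = defaultdict(float)
    for model, predicted in predictions.items():
        # A model's vote counts as much as its F1 on the class it predicts.
        scores[predicted] += f1_matrix[model].get(predicted, 0.0)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]


def evaluate(ranked_lists, main_labels, all_labels):
    """Return (top prediction, all categories, two guesses) accuracies."""
    n = len(ranked_lists)
    # Top prediction: the first-ranked class equals the main IPC class.
    top = sum(r[0] == m for r, m in zip(ranked_lists, main_labels)) / n
    # All categories: the first-ranked class is any class assigned to the patent.
    all_cat = sum(r[0] in labels for r, labels in zip(ranked_lists, all_labels)) / n
    # Two guesses: the main IPC class is among the two highest-ranked classes.
    two = sum(m in r[:2] for r, m in zip(ranked_lists, main_labels)) / n
    return top, all_cat, two


if __name__ == "__main__":
    # Made-up validation F1 scores for three (feature set, classifier) pairs.
    f1 = {
        "tfidf_all/NB": {"F02B": 0.70, "F04B": 0.60, "F04D": 0.55},
        "tfidf_ig/SVM": {"F02B": 0.80, "F04B": 0.75, "F04D": 0.65},
        "doc2vec/AdaBoost": {"F02B": 0.60, "F04B": 0.70, "F04D": 0.50},
    }
    preds = {"tfidf_all/NB": "F04B", "tfidf_ig/SVM": "F02B", "doc2vec/AdaBoost": "F04B"}
    print(f1_weighted_vote(preds, f1))  # ['F04B', 'F02B']
    print(evaluate([["F04B", "F02B"], ["F02B", "F04D"]],
                   ["F04B", "F04D"],
                   [{"F04B"}, {"F02B", "F04D"}]))  # (0.5, 1.0, 1.0)
```

Weighting each vote by per-class F1 lets a base model that is reliable on some IPC classes but weak on others contribute unevenly across classes; summing predicted class probabilities would be an alternative fusion rule, not necessarily the one used in this paper.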