Patent Classification Based on Multi-feature and Multi-classifier Integration
Jia Shanshan1, Liu Chang2, Sun Lianying3, Liu Xiaoan1, Peng Tao2
1 College of Intellectualized City, Beijing Union University, Beijing 100101, China
2 College of Robotics, Beijing Union University, Beijing 100101, China
3 College of Urban Rail Transit and Logistics, Beijing Union University, Beijing 100101, China
[Objective] This paper aims to automatically assign the correct International Patent Classification (IPC) codes to patent applications using a method that integrates multiple features and multiple classifiers. [Methods] First, we extracted TF-IDF features based on the full dictionary and on an information-gain dictionary, as well as document vector and topic model vector features, from patent applications. Then, we trained Naive Bayes (NB), SVM, and AdaBoost classifiers on these features. Finally, we built a feature-class matrix and predicted the final IPC code with an F1 weight matrix. [Results] We evaluated the method on patents from 2014 to 2016 in 10 classes in the field of engines and pumps. The accuracies for top prediction, all categories, and two guesses were 78.9%, 80.1%, and 91.2%, respectively. [Limitations] The training corpus is small, covering only three years of patent data. [Conclusions] The proposed method can effectively improve the accuracy of patent classification in the field of engines and pumps.
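The abstract does not spell out how the F1 weight matrix combines the classifiers, so the following is only a minimal sketch under stated assumptions: each (feature, classifier) pair casts one vote for a class, and the vote is weighted by that pair's F1 score for the class on held-out data. The function names, model names, and toy scores are hypothetical and are not taken from the paper; the IPC subclass labels are used purely as example strings.

```python
from collections import defaultdict


def per_class_f1(y_true, y_pred, classes):
    """Return {class: F1} for one model from paired true/predicted labels."""
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def fuse_predictions(predictions, f1_weights):
    """Rank classes by the summed F1 weight of the models voting for them.

    predictions: {model_name: predicted_class} for a single document
    f1_weights:  {model_name: {class: F1}} estimated on held-out data
    (Assumed fusion rule; the paper may combine the scores differently.)
    """
    totals = defaultdict(float)
    for model, cls in predictions.items():
        totals[cls] += f1_weights[model].get(cls, 0.0)
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(totals, key=lambda c: (-totals[c], c))


# Toy illustration only: scores are made up, not results from the paper.
weights = {
    "tfidf_full+NB": {"F02M": 0.5, "F04B": 0.6},
    "doc_vector+SVM": {"F02M": 0.7, "F04B": 0.9},
    "topic_vector+AdaBoost": {"F02M": 0.3, "F04B": 0.4},
}
votes = {
    "tfidf_full+NB": "F02M",
    "doc_vector+SVM": "F04B",
    "topic_vector+AdaBoost": "F02M",
}
ranking = fuse_predictions(votes, weights)
print(ranking)
```

In this sketch, the first entry of the ranking plays the role of the top prediction, and the first two entries correspond to a two-guesses evaluation. A single model with a high F1 for a class can outvote two weaker models, which is the intended effect of weighting by per-class F1 rather than counting votes equally.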