MPMFC: A Traditional Chinese Medicine Patent Classification Model Integrating Network Neighborhood Structural Features and Patent Semantic Features
Deng Na1, He Xinyang1, Chen Weijie1, Chen Xu2
1 School of Computer Science, Hubei University of Technology, Wuhan 430068, China
2 School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China

Abstract [Objective] To address the low accuracy of Traditional Chinese Medicine (TCM) patent classification models, which stems from the complexity of TCM and from insufficient extraction of TCM patent features. [Methods] We proposed MPMFC (Medicine Patent Multi-feature Fusion Classifier), a classification model for TCM patents. First, we constructed a TCM patent similarity network based on the similarity of core patent fields. Next, we used the Node2Vec algorithm to capture the latent neighborhood structure of each patent from the global structure of this network and mapped it to a low-dimensional vector that serves as a supplementary feature. Finally, an attention mechanism fused each patent's semantic feature vector, produced by the pre-trained RoBERTa-Tiny model, with its supplementary feature vector to classify TCM patents automatically. [Results] We evaluated MPMFC on a corpus of 7,000 TCM patents. It achieved a precision, recall, and F1 score of 0.8436, 0.8017, and 0.8221, respectively, which were 1.58%, 2.59%, and 2.11% higher than those of the baseline classification model. [Limitations] The weights assigned when constructing the TCM patent similarity network are somewhat subjective, and patents labeled by non-TCM researchers may contain labeling errors. [Conclusions] The MPMFC model learns more comprehensive feature representations from multiple perspectives during TCM patent classification, which improves classification accuracy.
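The [Methods] steps above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation and rests on several assumptions: the four-patent graph and its edge weights are invented (the paper derives weights from core-field similarity); `biased_walk` only generates Node2Vec second-order random walks, omitting the skip-gram training that turns walks into vectors; and `attention_fuse` uses plain dot-product attention with a hypothetical query vector over two vectors already projected to a common dimension, because the abstract does not specify the exact attention formulation. `f1_score` checks that the reported F1 of 0.8221 is the harmonic mean of 0.8436 and 0.8017.

```python
import math
import random

# Hypothetical toy TCM patent similarity network (made-up IDs and weights).
# The paper derives edge weights from core-field similarity; this sketch does not.
GRAPH = {
    "P1": {"P2": 0.9, "P3": 0.4},
    "P2": {"P1": 0.9, "P3": 0.7, "P4": 0.2},
    "P3": {"P1": 0.4, "P2": 0.7},
    "P4": {"P2": 0.2},
}


def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """Generate one Node2Vec-style second-order random walk.

    Given the previous node `prev` and the current node `cur`, each neighbor x
    of `cur` gets the unnormalized weight
        w(cur, x) / p   if x == prev             (step back)
        w(cur, x)       if x is adjacent to prev  (stay close)
        w(cur, x) / q   otherwise                 (move outward)
    The first step has no `prev`, so it uses the raw edge weights.
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(graph[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            weights = [graph[cur][x] for x in neighbors]
        else:
            prev = walk[-2]
            weights = []
            for x in neighbors:
                w = graph[cur][x]
                if x == prev:
                    weights.append(w / p)
                elif x in graph[prev]:
                    weights.append(w)
                else:
                    weights.append(w / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse equal-length feature vectors with dot-product attention.

    score_i = <query, f_i>, alpha = softmax(score), fused = sum_i alpha_i * f_i
    """
    scores = [sum(a * b for a, b in zip(query, f)) for f in features]
    alphas = softmax(scores)
    dim = len(features[0])
    fused = [sum(a * f[d] for a, f in zip(alphas, features)) for d in range(dim)]
    return fused, alphas


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    walks = [biased_walk(GRAPH, node, length=5, p=1.0, q=2.0) for node in GRAPH]
    print(walks)  # in full Node2Vec these walks would train a skip-gram model

    semantic = [0.2, 0.8, 0.1]    # stand-in for a projected RoBERTa-Tiny vector
    structural = [0.5, 0.1, 0.4]  # stand-in for a projected Node2Vec vector
    query = [0.3, 0.6, 0.1]       # hypothetical learned attention query
    fused, alphas = attention_fuse([semantic, structural], query)
    print(fused, alphas)

    print(round(f1_score(0.8436, 0.8017), 4))  # 0.8221
```

In Node2Vec, setting q > 1 keeps walks near the starting patent (BFS-like), while q < 1 pushes them outward (DFS-like); the p and q values here are arbitrary and not taken from the paper.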

Received: 05 May 2022
Published: 07 June 2023

Fund: National Natural Science Foundation of China (61902116)
Corresponding Author: He Xinyang, ORCID: 0000-0002-0668-0276, E-mail: hexinyang@foxmail.com