[Objective] This paper fuses the artificial (manually constructed) features and machine features of scientific and technological literature with deep learning methods, aiming to improve the efficiency of knowledge element extraction. [Methods] We constructed 26 artificial features from the characteristics of this literature, mainly at the text, sentence, and word levels. We then combined these features with machine features such as Word2Vec and one-hot representations using LSTM, CNN, and BERT models, and extracted knowledge elements. [Results] With vertical feature merging, the accuracy of knowledge element extraction reached 0.91, which was 6 percentage points higher than that of most traditional methods. [Limitations] The deep learning model needs to be optimized to process larger amounts of data. [Conclusions] The proposed method can effectively improve the results of knowledge element extraction.
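The abstract does not spell out how the two feature families are merged, so the following is only a minimal sketch of one common fusion strategy: per-token concatenation of a machine feature vector with manually constructed features. It is not the authors' implementation. The helper names `one_hot` and `fuse_features`, the toy sentence, and the two manual features (relative position and token length) are hypothetical stand-ins for the 26 artificial features mentioned above.

```python
from typing import List


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def fuse_features(machine: List[List[float]],
                  manual: List[List[float]]) -> List[List[float]]:
    """Concatenate machine and manual feature vectors token by token.

    Assumption: fusion is plain concatenation; the paper may merge
    features differently.
    """
    if len(machine) != len(manual):
        raise ValueError("both feature sets must cover the same tokens")
    return [m + h for m, h in zip(machine, manual)]


# Hypothetical toy sentence and vocabulary.
tokens = ["deep", "learning", "model"]
vocab = {tok: i for i, tok in enumerate(tokens)}

# Machine features: one one-hot vector per token.
machine_feats = [one_hot(vocab[t], len(vocab)) for t in tokens]

# Manual features (hypothetical): relative position and token length.
manual_feats = [[i / len(tokens), float(len(t))] for i, t in enumerate(tokens)]

# Each fused vector has 3 one-hot dimensions followed by 2 manual ones.
fused = fuse_features(machine_feats, manual_feats)
```

In a full pipeline, fused vectors of this kind would be passed to a sequence model such as an LSTM or CNN. A dense embedding such as Word2Vec could replace the one-hot vector without changing the concatenation step.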
Chai Qingfeng, Shi Linyan, Mei Shan, Xiong Haitao, He Huixin. Extracting Knowledge Elements of Sci-Tech Literature Based on Artificial and Machine Features[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 132-144.