[Objective] This paper merged the artificial and machine features of scientific and technological literature using deep learning methods, aiming to improve the efficiency of knowledge element extraction. [Methods] We constructed 26 artificial features based on the characteristics of this literature, mainly at the text, sentence and word levels. Then, we combined these features with Word2Vec, one-hot and other machine features using LSTM, CNN and BERT models to extract knowledge elements. [Results] With vertical feature merging, the accuracy of knowledge element extraction reached 0.91, 6 percentage points higher than that of most traditional methods. [Limitations] The deep learning model needs to be optimized to process larger amounts of data. [Conclusions] The proposed method can effectively improve the results of knowledge element extraction.
Chai Qingfeng, Shi Linyan, Mei Shan, Xiong Haitao, He Huixin. Extracting Knowledge Elements of Sci-Tech Literature Based on Artificial and Machine Features[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 132-144.