Extracting Knowledge Elements of Sci-Tech Literature Based on Artificial and Machine Features
Chai Qingfeng1,2, Shi Linyan2, Mei Shan2, Xiong Haitao2, He Huixin1
1 College of Computer Science and Technology, Huaqiao University, Quanzhou 361021, China
2 Tongfang Knowledge Network Technology Co., Ltd. (Beijing), Beijing 100192, China
Abstract [Objective] This paper uses deep learning to merge the artificial and machine features of scientific and technological literature, aiming to improve the efficiency of knowledge element extraction. [Methods] We constructed 26 artificial features based on the characteristics of this literature, mainly at the text, sentence and word levels. We then combined these features with Word2Vec, one-hot and other machine features using LSTM, CNN and BERT models to extract knowledge elements. [Results] The accuracy of knowledge element extraction with vertical feature merging reached 0.91, which was 6 percentage points higher than that of most traditional methods. [Limitations] The deep learning model needs to be optimized to process larger amounts of data. [Conclusions] The proposed method could effectively improve the results of knowledge element extraction.
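The abstract does not spell out how the artificial and machine features are joined. One common way to combine a handcrafted feature vector with a learned representation is simple concatenation, and the sketch below illustrates only that idea. Whether it corresponds to the "vertical merging" reported above is an assumption. The three toy features, the embedding size, and the function names are all hypothetical, and a random vector stands in for a real Word2Vec or BERT representation.

```python
import numpy as np

N_HANDCRAFTED = 26  # number of artificial features reported in the abstract
EMBED_DIM = 8       # hypothetical embedding size, chosen only for illustration


def handcrafted_features(sentence: str) -> np.ndarray:
    """Return a 26-dim vector; only three toy features are filled in."""
    feats = np.zeros(N_HANDCRAFTED, dtype=np.float32)
    tokens = sentence.split()
    feats[0] = len(tokens)                                   # sentence length in tokens
    feats[1] = float(any(ch.isdigit() for ch in sentence))   # contains a digit
    feats[2] = float("method" in sentence.lower())           # hypothetical cue word
    return feats


def machine_features(sentence: str, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for a learned sentence vector (e.g. averaged Word2Vec)."""
    return rng.standard_normal(EMBED_DIM).astype(np.float32)


def merge_features(sentence: str, rng: np.random.Generator) -> np.ndarray:
    """Concatenate handcrafted and machine features into one input vector."""
    return np.concatenate(
        [handcrafted_features(sentence), machine_features(sentence, rng)]
    )


rng = np.random.default_rng(0)
merged = merge_features("We propose a method with 3 steps", rng)
print(merged.shape)
```

The merged vector would then be passed to a downstream classifier. In this sketch the handcrafted block always occupies the first 26 positions, so its contribution stays easy to inspect.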
Received: 06 December 2020
Published: 15 September 2021
Fund: National Social Science Fund of China (19BXW110)
Corresponding Authors:
He Huixin ORCID: 0000-0002-1764-6727
E-mail: huixinhe@qq.com