Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (3): 14-24    DOI: 10.11925/infotech.2096-3467.2018.0607
Review of Knowledge Extraction of Scientific Literature
Hongxia Xu, Chunwang Li
National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China

[Objective] This paper reviews research on knowledge extraction from scientific literature. [Coverage] We searched CNKI and Google Scholar and selected a total of 68 representative publications on knowledge extraction. [Methods] We used the literature survey method. First, we reviewed knowledge extraction research in Library & Information Science and in Computer Science. Then, we classified and summarized the key extraction techniques. [Results] Based on the current research status and technological system, this paper presents the pros and cons of knowledge extraction techniques and a roadmap for them. [Limitations] There is little comparative study of knowledge extraction in different subjects. [Conclusions] The research framework helps readers gain a thorough understanding of the present status and offers some advice for scholars.

Key words: Knowledge Extraction, Scientific Literature, Machine Learning
Received: 01 June 2018      Published: 17 April 2019

Cite this article:

Hongxia Xu, Chunwang Li. Review of Knowledge Extraction of Scientific Literature. Data Analysis and Knowledge Discovery, 2019, 3(3): 14-24.

