Review of Knowledge Extraction of Scientific Literature
Hongxia Xu, Chunwang Li
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This paper reviews research on knowledge extraction from scientific literature. [Coverage] We searched CNKI and Google Scholar and selected a total of 68 representative publications on knowledge extraction. [Methods] We used the literature survey method. First, we reviewed knowledge extraction research in Library & Information Science and in Computer Science. Then, we classified and summarized the key extraction technologies. [Results] Based on the current research status and technological system, this paper presents the pros and cons of knowledge extraction technologies and a roadmap for them. [Limitations] There is little comparative study of knowledge extraction in different subjects. [Conclusions] The research framework helps readers gain a thorough understanding of the present status of the field and offers suggestions for scholars.
Received: 01 June 2018
Published: 17 April 2019