Review of Knowledge Extraction of Scientific Literature
Hongxia Xu, Chunwang Li
National Science Library, Chinese Academy of Sciences, Beijing 100190, China; Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper reviews research on knowledge extraction from scientific literature. [Coverage] We searched CNKI and Google Scholar and selected a total of 68 representative publications on knowledge extraction. [Methods] We used the literature survey method. First, we reviewed knowledge extraction research in Library & Information Science and in Computer Science. Then, we classified and summarized the key extraction technologies. [Results] Based on the current research status and technical systems, this paper presents the pros and cons of knowledge extraction technologies and a roadmap for the field. [Limitations] There is little comparative study of knowledge extraction in different subjects. [Conclusions] The research framework helps readers gain a thorough understanding of the current state of the field and offers some suggestions for scholars.