Review of Knowledge Extraction of Scientific Literature (科技文献内容知识点抽取研究综述)

doi:10.11925/infotech.2096-3467.2018.0607

Data Analysis and Knowledge Discovery

2019, Vol. 3, Issue 3: 14-24, https://doi.org/10.11925/infotech.2096-3467.2018.0607

Review

Hongxia Xu (徐红霞), Chunwang Li (李春旺)

National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China

Abstract

[Objective] This paper reviews research, in China and abroad, on extracting knowledge points from the content of scientific literature. [Coverage] We searched CNKI and Google Scholar for papers on knowledge extraction and selected 68 representative papers for review. [Methods] Using a literature survey, we first reviewed progress in knowledge extraction research in Library and Information Science and in Computer Science. We then classified and summarized the key extraction techniques. [Results] Building on a summary of the current research status and the technical framework of knowledge extraction, this paper discusses the strengths and weaknesses of extraction techniques for scientific literature and outlines future research directions. [Limitations] Comparative studies of knowledge extraction from scientific literature across different disciplines are scarce. [Conclusions] The proposed research framework helps readers gain a comprehensive understanding of the current state of knowledge extraction research and offers a reference for other scholars undertaking new research.

Key words: Knowledge Extraction; Scientific Literature; Machine Learning

Received: 2018-06-01; Published: 2019-04-17

Cite this article:

Hongxia Xu, Chunwang Li. Review of Knowledge Extraction of Scientific Literature. Data Analysis and Knowledge Discovery, 2019, 3(3): 14-24.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0607 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I3/14

References

[1]	Hey T, Tansley S, Tolle K. The Fourth Paradigm[M]. Microsoft Press, 2009.
[2]	Liu Lijia, Guo Jianyi, Zhou Lanjiang, et al. Domain Concepts Entity Attribute Relation Extraction Based on LM Algorithm[J]. Journal of Chinese Information Processing, 2014, 28(6): 216-222. (in Chinese)
[3]	Wang Ning, Chen Yong, Guo Wei, et al. A Method for Emergency Case Information Extraction Based on Knowledge Element[J]. Systems Engineering, 2014, 32(12): 133-139. (in Chinese)
[4]	Demner-Fushman D, Few B, Hauser S E, et al. Automatically Identifying Health Outcome Information in Medline Records[J]. Journal of the American Medical Informatics Association, 2006, 13(1): 52-60.
[5]	Lenat D B. CYC: A Large-scale Investment in Knowledge Infrastructure[J]. Communications of the ACM, 1995, 38(11): 33-38.
[6]	Ernst P, Meng C, Siu A, et al. KnowLife: A Knowledge Graph for Health and Life Sciences[C]// Proceedings of the 30th International Conference on Data Engineering. 2014.
[7]	Zhang Liyuan, Ji Donghong. Biological Evidence Sentence Extraction with Combination of LS-SVM and Conditional Random Field[J]. Computer Engineering, 2015, 41(5): 207-212. (in Chinese)
[8]	Liu Zhiyuan, Sun Maosong, Lin Yankai, et al. Knowledge Representation Learning: A Review[J]. Journal of Computer Research and Development, 2016, 53(2): 247-261. (in Chinese)
[9]	Chambers N, Jurafsky D. Unsupervised Learning of Narrative Schemas and Their Participants[C]// Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. 2009: 602-610.
[10]	Wang Yangyang. Research on Knowledge Element Extraction Based on Massive Academic Resources[D]. Ningbo: Ningbo University, 2014. (in Chinese)
[11]	Rak R, Kurgan L, Reformat M. Use of OWL 2 to Facilitate a Biomedical Knowledge Base Extracted from the GENIA Corpus[C]// Proceedings of the 5th OWLED Workshop on OWL: Experiences and Directions, Collocated with the 7th International Semantic Web Conference. 2008.
[12]	Sun Jing, Yang Fan, Deng Wenping, et al. Construction of TCM Symptoms Knowledge Representation Model Based on Ontology[J]. Journal of Medical Informatics, 2017, 38(2): 52-56. (in Chinese)
[13]	Liu Shengbo, Ding Kun, Zhang Chunbo. New Stage of Citation Analysis: From Citation Description Analysis to Citation Context Analysis[J]. Documentation, Information & Knowledge, 2015(3): 25-34. (in Chinese)
[14]	Jeong Y K, Song M, Ding Y. Content-based Author Co-citation Analysis[J]. Journal of Informetrics, 2014, 8(1): 197-211.
[15]	Leng Fuhai, Bai Rujiang, Zhu Qingsong. A Hybrid Semantic Information Extraction Method for Scientific Research Papers[J]. Library and Information Service, 2013, 57(11): 112-119. (in Chinese)
[16]	Ge Bing, Li Fangfang, Li Fu, et al. Subject Sentence Extraction Based on Undirected Graph Construction[J]. Computer Science, 2011, 38(5): 181-185. (in Chinese)
[17]	Wen Hao, Wen Youkui, Wang Min. Approach to Text Knowledge Depth Mining Based on Pattern Recognition[J]. Computer Science, 2016, 43(3): 279-284. (in Chinese)
[18]	Luan Y, Ostendorf M, Hajishirzi H. Scientific Information Extraction with Semi-supervised Neural Tagging[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. USA: ACL, 2017: 2641-2651.
[19]	Girju R, Beamer B, Rozovskaya A, et al. A Knowledge-rich Approach to Identifying Semantic Relations Between Nominals[J]. Information Processing & Management, 2010, 46(5): 589-610.
[20]	Che Haiyan, Feng Tie, Zhang Jiachen, et al. Automatic Knowledge Extraction from Chinese Natural Language Documents[J]. Journal of Computer Research and Development, 2013, 50(4): 834-842. (in Chinese)
[21]	Ding Junjun, Zheng Yanning, Hua Bolin. Extraction of Academic Concept Attribute Based on Rules[J]. Information Studies: Theory & Application, 2011, 34(12): 10-14. (in Chinese)
[22]	Zhai Jie, Qiu Jiangnan. Research on the Rule-based Knowledge Unit Attributes Extraction Method[J]. Information Science, 2016, 34(4): 43-47. (in Chinese)
[23]	Xu Xukan, Fang Daowei, Jiang Xun, et al. Research on Knowledge Granularity Representation and Standardization During Knowledge Organization[J]. Documentation, Information & Knowledge, 2014(6): 101-106. (in Chinese)
[24]	Xu Zenglin, Sheng Yongpan, He Lirong, et al. Review on Knowledge Graph Techniques[J]. Journal of University of Electronic Science and Technology of China, 2016, 45(4): 589-606. (in Chinese)
[25]	Rau L F. Extracting Company Names from Text[C]// Proceedings of the 7th IEEE Conference on Artificial Intelligence Applications. IEEE, 1991: 29-32.
[26]	Lin Y F, Tsai T, Chou W C, et al. A Maximum Entropy Approach to Biomedical Named Entity Recognition[C]// Proceedings of the 4th International Conference on Data Mining in Bioinformatics. USA: ACM, 2008: 56-61.
[27]	Liu X H, Zhang S D, Wei F R, et al. Recognizing Named Entities in Tweets[C]// Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. ACL, 2011: 359-367.
[28]	Lample G, Ballesteros M, Subramanian S, et al. Neural Architectures for Named Entity Recognition[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. USA: ACL, 2016: 260-270.
[29]	Whitelaw C, Kehlenbeck A, Petrovic N, et al. Web-Scale Named Entity Recognition[C]// Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, 2008: 123-132.
[30]	Etzioni O, Cafarella M, Downey D, et al. Unsupervised Named-Entity Extraction from the Web: An Experimental Study[J]. Artificial Intelligence, 2005, 165: 91-134.
[31]	Brin S. Extracting Patterns and Relations from the World Wide Web[C]// Proceedings of the 6th International Conference on Extending Database Technology. 1998: 172-183.
[32]	Agichtein E, Gravano L. Snowball: Extracting Relations from Large Plain-text Collections[C]// Proceedings of the 5th ACM International Conference on Digital Libraries. ACM, 2000: 85-94.
[33]	Zhu J, Nie Z Q, Liu X J, et al. StatSnowball: A Statistical Approach to Extracting Entity Relationships[C]// Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain. New York, USA: ACM, 2009: 101-110.
[34]	Carlson A, Betteridge J, Wang R C, et al. Coupled Semi-Supervised Learning for Information Extraction[C]// Proceedings of the 3rd ACM International Conference on Web Search and Data Mining, New York, USA. USA: ACM, 2010: 101-110.
[35]	Roth B, Klakow D. Combining Generative and Discriminative Model Scores for Distant Supervision[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013: 24-29.
[36]	Roth B, Barth T, Wiegand M, et al. Effective Slot Filling Based on Shallow Distant Supervision Methods[OL]. arXiv Preprint, arXiv:1401.1158.
[37]	Kambhatla N. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations[C]// Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, Barcelona, Spain. USA: ACL, 2004.
[38]	Miao Q L, Zhang S, Zhang B, et al. Extracting and Visualizing Semantic Relationships from Chinese Biomedical Text[C]// Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation. 2012: 99-107.
[39]	Sun X, Dong L. Featured-Based Approach to Chinese Term Relation Extraction[C]// Proceedings of the 2009 International Conference on Signal Processing Systems. USA: ACM, 2009: 410-414.
[40]	Che Wanxiang, Liu Ting, Li Sheng. Automatic Entity Relation Extraction[J]. Journal of Chinese Information Processing, 2005, 19(2): 1-6. (in Chinese)
[41]	Culotta A, Sorensen J. Dependency Tree Kernels for Relation Extraction[C]// Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, Barcelona, Spain. USA: ACL, 2004.
[42]	Zelenko D, Aone C, Richardella A. Kernel Methods for Relation Extraction[J]. Journal of Machine Learning Research, 2003, 3: 1083-1106.
[43]	Nguyen T H, Grishman R. Relation Extraction: Perspective from Convolutional Neural Networks[C]// Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing. 2015: 39-48.
[44]	Nguyen T H, Grishman R. Combining Neural Networks and Log-linear Models to Improve Relation Extraction[OL]. arXiv Preprint, arXiv:1511.059026.
[45]	Yang Bo, Cai Dongfeng, Yang Hua. Progress in Open Information Extraction[J]. Journal of Chinese Information Processing, 2014, 28(4): 1-11. (in Chinese)
[46]	Wu F, Weld D S. Open Information Extraction Using Wikipedia[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. USA: ACL, 2010: 118-127.
[47]	Fader A, Soderland S, Etzioni O. Identifying Relations for Open Information Extraction[C]// Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 2011: 1535-1545.
[48]	Akbik A, Löser A. KrakeN: N-ary Facts in Open Information Extraction[C]// Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction. ACM, 2012: 52-56.
[49]	Zeng D, Liu K, Chen Y, et al. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. USA: ACL, 2015: 1753-1762.
[50]	Sahu S K, Anand A, Oruganty K, et al. Relation Extraction from Clinical Texts Using Domain Invariant Convolutional Neural Network[C]// Proceedings of the 15th Workshop on Biomedical Natural Language Processing, Berlin, Germany. USA: ACL, 2016: 206-215.
[51]	Katiyar A, Cardie C. Investigating LSTMs for Joint Extraction of Opinion Entities and Relations[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany. USA: ACL, 2016: 919-929.
[52]	Miwa M, Bansal M. End-to-End Relation Extraction Using LSTMs on Sequences and Tree Structures[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany. USA: ACL, 2016: 1105-1116.
[53]	Zheng S C, Hao Y X, Lu D Y, et al. Joint Entity and Relation Extraction Based on a Hybrid Neural Network[J]. Neurocomputing, 2017, 257: 59-66.
[54]	Guo Jianyi, Li Zhen, Yu Zhengtao, et al. Extraction and Relation Prediction of Domain Ontology Concept Instance, Attribute and Attribute Value[J]. Journal of Nanjing University: Natural Sciences, 2012, 48(4): 383-389. (in Chinese)
[55]	Zhang Y H, Zhong V, Chen D Q, et al. Position-aware Attention and Supervised Data Improve Slot Filling[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. USA: ACL, 2017: 35-45.
[56]	Huang L, Sil A, Ji H, et al. Improving Slot Filling Performance with Attention Neural Networks on Dependency Structures[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. USA: ACL, 2017: 2588-2597.
[57]	Zhang Fan, Le Xiaoqiu. Research on Innovation Points Extraction from Scientific Research Paper Based on Field Thesaurus[J]. New Technology of Library and Information Service, 2014(9): 15-21. (in Chinese)
[58]	Leskovec J, Milic-Frayling N, Grobelnik M. Extracting Summary Sentences Based on the Document Semantic Graph[R]. Microsoft Technical Report. Redmond: Microsoft Corporation, 2005.
[59]	Muratore D, Hagenbuchner M, Scarselli F, et al. Sentence Extraction by Graph Neural Networks[C]// Proceedings of the 20th International Conference on Artificial Neural Networks. 2010: 237-246.
[60]	Qin Yanxia, Zhang Min, Zheng Dequan. A Survey on Neural Network-based Methods for Event Extraction[J]. Intelligent Computer and Applications, 2018, 8(3): 1-5. (in Chinese)
[61]	Chen C, Ng V. Joint Modeling for Chinese Event Extraction with Rich Linguistic Features[C]// Proceedings of the 24th International Conference on Computational Linguistics. 2012: 529-544.
[62]	Li Q, Ji H, Huang L. Joint Event Extraction via Structured Prediction with Global Features[C]// Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. 2013: 73-82.
[63]	Nguyen T H, Grishman R. Event Detection and Domain Adaptation with Convolutional Neural Networks[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 2015: 365-371.
[64]	Nguyen T H, Cho K, Grishman R. Joint Event Extraction via Recurrent Neural Networks[C]// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016: 300-309.
[65]	Wei Xiaomei. The Study on Joint Models for Biomedical Event Extraction[D]. Wuhan: Wuhan University, 2016. (in Chinese)
[66]	Xiong C Y, Power R, Callan J. Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding[C]// Proceedings of the 26th International Conference on World Wide Web, Perth, Australia. USA: ACM, 2017: 1271-1279.
[67]	Lossio-Ventura J A, Hogan W, Modave F, et al. OC-2-KB: A Software Pipeline to Build an Evidence-based Obesity and Cancer Knowledge Base[C]// Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine. 2017: 1284-1287.
[68]	Gong Liqun, Sun Jieli. Introduction and Evaluation of Knowledge Extraction Projects Overseas[J]. Library Tribune, 2007, 27(4): 11-15. (in Chinese)
