Identifying Topic-Problem Instances Based on Syntactic Dependency Enhancement

doi:10.11925/infotech.2096-3467.2022.0087

Data Analysis and Knowledge Discovery

2022, Vol. 6, Issue (12): 13-22 https://doi.org/10.11925/infotech.2096-3467.2022.0087

Research Paper



Identifying Topic-Problem Instances Based on Syntactic Dependency Enhancement

Wang Lu¹,², Le Xiaoqiu¹,²

¹National Science Library, Chinese Academy of Sciences, Beijing 100190, China
²Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China


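Abstract

[Objective] To discover, from scientific and technical literature, problem instances such as defects, shortcomings, and difficulties that existing research on a given topic exhibits. [Methods] The extraction of topic-problem instance pairs is recast as a candidate phrase classification problem. Candidate phrases are extracted from problem sentences and syntactic dependency trees are built; a syntactic-dependency-enhanced classification model based on BiGCN and a Transformer interaction module then judges whether each candidate phrase is a problem instance corresponding to the given topic. [Results] Topic-oriented problem instance identification is achieved. The syntax-enhanced classification model attains an F1 score of 83.7% on the candidate phrase classification task, 2.8 percentage points higher than the baseline model. [Limitations] Coreference relations between sentences are not considered, so some problem instances may be missed, which lowers recall. [Conclusions] The syntactic-dependency-enhanced model learns the correspondence between topics and problem instances within a sentence well and improves the accuracy of identifying problem instances for a given topic.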


Keywords: Problem Identification, Syntactic Dependency, Transformer, Graph Neural Network

Abstract:

[Objective] This paper aims to identify the defects, deficiencies, and difficulties of existing research on a given topic. [Methods] First, we transformed the extraction of topic-problem instance pairs into a candidate phrase classification task. Second, we extracted candidate phrases from the problem sentences and constructed syntactic dependency trees. Third, we built a syntactic-dependency-enhanced classification model based on BiGCN and a Transformer interaction module. Fourth, we used this model to decide whether each candidate phrase is a problem instance corresponding to the given topic. [Results] The proposed model effectively identified the problem instances and topic-problem instances. Its F1 value reached 83.7%, which is 2.8 percentage points higher than that of the baseline model. [Limitations] We did not examine the referential relationships between sentences, so some problem instances may be missed and recall may be reduced. [Conclusions] The proposed model can effectively learn the correspondence between topics and problem instances and identify the problem instances of a given topic.
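The [Methods] step of classifying a candidate phrase with dependency-aware features can be illustrated with a minimal sketch. It assumes that "BiGCN" denotes a graph convolution applied along both edge directions of the dependency tree. The function names, toy head indices, random weights, mean pooling, and sigmoid scorer are all hypothetical choices for illustration and are not the authors' implementation. The Transformer interaction module is omitted.

```python
import numpy as np

def dependency_adjacency(heads):
    """Build directed adjacency matrices from head indices.

    heads[i] is the index of token i's syntactic head, or -1 for the root.
    Returns (head -> dependent, dependent -> head) matrices.
    """
    n = len(heads)
    a_out = np.zeros((n, n))
    for dep, head in enumerate(heads):
        if head >= 0:
            a_out[head, dep] = 1.0
    return a_out, a_out.T.copy()

def normalize(adj):
    """Add self-loops, then row-normalize so each row sums to 1."""
    adj = adj + np.eye(adj.shape[0])
    return adj / adj.sum(axis=1, keepdims=True)

def bigcn_layer(x, a_out, a_in, w_out, w_in):
    """One bidirectional GCN layer (assumed form): ReLU per direction, then concat."""
    h_out = np.maximum(normalize(a_out) @ x @ w_out, 0.0)
    h_in = np.maximum(normalize(a_in) @ x @ w_in, 0.0)
    return np.concatenate([h_out, h_in], axis=1)

def score_candidate(h, span, w, b):
    """Mean-pool token states in span [start, end) and apply a sigmoid scorer."""
    start, end = span
    pooled = h[start:end].mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(pooled @ w + b)))

# Toy example: 4 tokens, token 1 is the root (hypothetical parse).
rng = np.random.default_rng(0)
heads = [1, -1, 1, 2]
x = rng.normal(size=(4, 8))            # stand-in for word embeddings
a_out, a_in = dependency_adjacency(heads)
h = bigcn_layer(x, a_out, a_in, rng.normal(size=(8, 5)), rng.normal(size=(8, 5)))
prob = score_candidate(h, (2, 4), rng.normal(size=10), 0.0)
```

Here `prob` plays the role of the model's confidence that the phrase covering tokens 2 and 3 is a problem instance. In a trained model the weights would be learned rather than sampled at random.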

Key words: Problem Extraction, Syntactic Dependency, Transformer, Graph Neural Network

Received: 2022-01-29 Published: 2023-02-03

Chinese Library Classification (ZTFLH): TP391

Corresponding author: Le Xiaoqiu, ORCID: 0000-0002-7114-5544, E-mail: lexq@mail.las.ac.cn

Cite this article:

Wang Lu, Le Xiaoqiu. Identifying Topic-Problem Instances Based on Syntactic Dependency Enhancement[J]. Data Analysis and Knowledge Discovery, 2022, 6(12): 13-22.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0087 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I12/13

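Fig.1 Processing workflow

Table 1 Examples of candidate phrase extraction and annotation

Fig.2 Schematic of multi-word topic merging

Fig.3 Model framework

Fig.4 Interaction module between the flat representation and the dependency representation

Table 2 Results of the comparison experiments

Table 3 Results of the ablation experiments

Table 4 Examples of experimental results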


