Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (12): 13-22     https://doi.org/10.11925/infotech.2096-3467.2022.0087
Research Paper
Identifying Topic-Problem Instances Based on Syntactic Dependency Enhancement
Wang Lu1,2, Le Xiaoqiu1,2
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] To identify, from scientific and technical literature, the problem instances (defects, deficiencies, and difficulties) that existing research reports for a given topic. [Methods] We recast the extraction of topic-problem instance pairs as a candidate phrase classification task. Starting from problem sentences, we extract candidate phrases and build a syntactic dependency tree. A syntactic dependency enhanced classification model, built on a BiGCN and Transformer interaction module, then decides whether each candidate phrase is a problem instance of the given topic. [Results] The method identifies topic-oriented problem instances. On the candidate phrase classification task, the syntax-enhanced classification model reaches an F1 value of 83.7%, which is 2.8 percentage points higher than the baseline model. [Limitations] Referential relationships between sentences are not considered, so some problem instances may be missed, which lowers recall. [Conclusions] The syntactic dependency enhanced model learns the correspondence between topics and problem instances within a sentence well, and it improves the accuracy of identifying problem instances for a given topic.

Keywords: Problem Extraction; Syntactic Dependency; Transformer; Graph Neural Network
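The Methods description mentions building a syntactic dependency tree and feeding it to a BiGCN. As a rough illustration only, the sketch below turns a list of dependency heads into two adjacency matrices (one per arc direction) and runs a single parameter-free averaging step over each. It assumes that "BiGCN" refers to graph convolution over both arc directions; the page does not spell out the formulation, and the function names, the three-token example, and its parse are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def build_bidirectional_adjacency(heads: List[int]) -> Tuple[Matrix, Matrix]:
    """heads[i] is the index of token i's syntactic head, or -1 for the root.

    Returns two matrices with self-loops. In the aggregation below, row i
    gathers features from every column j where the entry is 1.
    """
    n = len(heads)
    # Self-loops let each token keep its own features during aggregation.
    gather_dependents = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    gather_head = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for dep, head in enumerate(heads):
        if head < 0:
            continue  # the root has no incoming arc
        gather_dependents[head][dep] = 1.0  # head collects from its dependents
        gather_head[dep][head] = 1.0        # dependent collects from its head
    return gather_dependents, gather_head


def aggregate(adjacency: Matrix, features: Matrix) -> Matrix:
    """One GCN-style step without weights: row-normalised neighbour averaging."""
    n, dim = len(features), len(features[0])
    out = []
    for i in range(n):
        degree = sum(adjacency[i])  # never zero because of the self-loop
        out.append([
            sum(adjacency[i][j] / degree * features[j][d] for j in range(n))
            for d in range(dim)
        ])
    return out


# Hypothetical parse of "approaches ignore attributes": "ignore" is the root.
heads = [1, -1, 1]
features = [[1.0], [2.0], [3.0]]  # toy one-dimensional token features
gather_dependents, gather_head = build_bidirectional_adjacency(heads)
from_dependents = aggregate(gather_dependents, features)
from_head = aggregate(gather_head, features)
```

In a trained model each direction would have its own weight matrix and nonlinearity, and the head indices would come from a dependency parser; both are omitted here to keep the graph construction visible.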
Received: 2022-01-29      Published: 2023-02-03
CLC number:  TP391
Corresponding author: Le Xiaoqiu, ORCID: 0000-0002-7114-5544     E-mail: lexq@mail.las.ac.cn
Cite this article:
Wang Lu, Le Xiaoqiu. Identifying Topic-Problem Instances Based on Syntactic Dependency Enhancement[J]. Data Analysis and Knowledge Discovery, 2022, 6(12): 13-22.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0087      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2022/V6/I12/13
Fig.1  Processing workflow
| Topic | Candidate phrase | Label |
|---|---|---|
| low-fidelity model | not very accurate | 1 |
| | may be quick to evaluate | 0 |
| | whereas a high-fidelity model may be computationally expensive to evaluate | 0 |
| | provides an accurate estimate of the true performance | 0 |
| high-fidelity model | not very accurate | 0 |
| | a low-fidelity model | 0 |
| | may be quick to evaluate | 0 |
| | may be computationally expensive to evaluate | 1 |
| | provides an accurate estimate of the true performance | 0 |

Table 1  Example of candidate phrase extraction and labeling
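Table 1 shows that the label depends on the topic as well as the phrase: the same candidate phrase can be positive for one topic and negative for another. The sketch below encodes a few of those rows as (topic, phrase, label) instances to make that visible. The class and function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CandidateInstance:
    topic: str
    phrase: str
    label: int  # 1 = the phrase is a problem instance of the topic, 0 = it is not


# A subset of the rows in Table 1.
INSTANCES: List[CandidateInstance] = [
    CandidateInstance("low-fidelity model", "not very accurate", 1),
    CandidateInstance("low-fidelity model", "may be quick to evaluate", 0),
    CandidateInstance("high-fidelity model", "not very accurate", 0),
    CandidateInstance("high-fidelity model", "may be computationally expensive to evaluate", 1),
]


def labels_by_topic(instances: List[CandidateInstance], phrase: str) -> Dict[str, int]:
    """Collect the label a given phrase receives under each topic."""
    return {inst.topic: inst.label for inst in instances if inst.phrase == phrase}


per_topic = labels_by_topic(INSTANCES, "not very accurate")
```

Here `per_topic` maps "low-fidelity model" to 1 and "high-fidelity model" to 0, which is why the classifier has to see the topic together with the phrase and cannot score the phrase in isolation.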
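Fig.2  Schematic of multi-word topic merging
Fig.3  Model framework
Fig.4  Interaction module between the flat representation and the dependency representation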
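| Model | Precision (%) | Recall (%) | F1 (%) | Sentence-level accuracy (%) |
|---|---|---|---|---|
| BiLSTM | 82.3 | 79.9 | 80.9 | 75.9 |
| Proposed model | 84.6 | 82.8 | 83.7 | 80.9 |

Table 2  Results of the comparison experiment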
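| Model | Precision (%) | Recall (%) | F1 (%) | Sentence-level accuracy (%) |
|---|---|---|---|---|
| Transformer | 82.7 | 81.2 | 81.9 | 77.9 |
| BiGCN | 83.3 | 81.3 | 82.3 | 78.4 |
| Transformer + BiGCN | 83.4 | 80.9 | 82.1 | 78.2 |
| Proposed model | 84.6 | 82.8 | 83.7 | 80.9 |

Table 3  Results of the ablation experiment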
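As a quick sanity check on Tables 2 and 3, the sketch below recomputes F1 as the harmonic mean of the first two numeric columns. It assumes the first column is precision and that each F1 value is derived from the precision and recall printed in the same row; the page does not state how the scores were averaged, so small differences are possible.

```python
from typing import Dict, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Tables 2 and 3.
ROWS: Dict[str, Tuple[float, float, float]] = {
    "BiLSTM": (82.3, 79.9, 80.9),
    "Transformer": (82.7, 81.2, 81.9),
    "BiGCN": (83.3, 81.3, 82.3),
    "Transformer + BiGCN": (83.4, 80.9, 82.1),
    "Proposed model": (84.6, 82.8, 83.7),
}

recomputed = {name: round(f1_score(p, r), 1) for name, (p, r, _) in ROWS.items()}
gain_over_baseline = round(ROWS["Proposed model"][2] - ROWS["BiLSTM"][2], 1)
```

Under these assumptions, four of the five rows reproduce the reported F1 to one decimal place. The BiLSTM row recomputes to 81.1 against a reported 80.9, a gap that could come from rounding or from averaging over runs. The difference between the reported F1 values of the proposed model and BiLSTM is 2.8 points, matching the improvement stated in the abstract.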
| Problem instance | Model | Identification result |
|---|---|---|
| 1. Frequent changes in the relationship of members towards a community make the task of community detection even more challenging. | Transformer | / |
| | BiGCN | {<community detection; frequent changes in the relationship of members towards a community>} |
| | Transformer + BiGCN | {<community detection; frequent changes in the relationship of members towards a community>} |
| | Proposed model | {<community detection; frequent changes in the relationship of members towards a community>} |
| 2. Most of the existing community detection approaches ignore node attributes information, which leads to poor results. | Transformer | {<community detection; ignore node attributes information>} |
| | BiGCN | {<community detection; ignore node attributes information, leads to poor results>} |
| | Transformer + BiGCN | {<community detection; ignore node attributes information>} |
| | Proposed model | {<community detection; ignore node attributes information, leads to poor results>} |
| 3. The most basic and significant issue in complex network analysis is community detection, which is a branch of machine learning. | Transformer | / |
| | BiGCN | {<community detection; is a branch of machine learning>} |
| | Transformer + BiGCN | {<community detection; is a branch of machine learning>} |
| | Proposed model | / |

Table 4  Examples of experimental results
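The results in Table 4 pair one topic with every candidate phrase a model classified as positive. A minimal sketch of that post-processing step follows. It assumes that "/" marks a sentence for which no problem instance was identified, which is inferred from the table and not stated on the page. The function name and the negative example phrase are hypothetical.

```python
from typing import List, Tuple


def format_topic_problem_pairs(topic: str, predictions: List[Tuple[str, int]]) -> str:
    """Render the positive candidate phrases of one sentence in Table 4's notation.

    predictions holds (candidate phrase, predicted label) pairs, where 1 means
    the classifier judged the phrase to be a problem instance of the topic.
    """
    positives = [phrase for phrase, label in predictions if label == 1]
    if not positives:
        return "/"  # assumed meaning: nothing identified for this sentence
    return "{<" + topic + "; " + ", ".join(positives) + ">}"


# Sentence 2 of Table 4 as classified by the proposed model; the first
# candidate phrase is a hypothetical negative added for illustration.
sentence_2 = [
    ("most of the existing community detection approaches", 0),
    ("ignore node attributes information", 1),
    ("leads to poor results", 1),
]
rendered = format_topic_problem_pairs("community detection", sentence_2)
empty = format_topic_problem_pairs(
    "community detection", [("is a branch of machine learning", 0)]
)
```

With these inputs, `rendered` matches the proposed model's result for sentence 2, and `empty` is "/", as in its result for sentence 3, where the distractor clause is correctly rejected.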