Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (2): 65-73     https://doi.org/10.11925/infotech.2096-3467.2022.1330
Research Paper
Constructing Automatic Structured Synthesis Tool for Sci-Tech Literature Based on Move Recognition*
Liu Yi1, Zhang Zhixiong1,2, Wang Yufei1,2, Li Xuesi1,2
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] Drawing on the idea of literature synthesis, this paper uses artificial intelligence techniques to build an automatic structured synthesis tool for sci-tech literature. The tool automatically organizes the research threads and research skeleton of a literature set in structured form and reveals its main points and highlights. [Methods] We propose an approach to building such a tool based on move recognition. Move recognition, together with the extraction of research question, research method, and research progress phrases, automatically reveals the key knowledge content of each individual paper. Hierarchical clustering and cluster label generation then organize and summarize the knowledge across multiple papers. Finally, a tree-shaped synthesis structure guides the structured output of the synthesis results. [Results] We developed a structured automatic synthesis tool that automatically synthesizes the content of a literature set and reveals its research threads and skeleton in a "research question - research method - research progress" tree structure. [Limitations] Owing to the limits of clustering techniques, problems such as insufficient clustering accuracy and difficulty in determining the number of clusters remain and reduce the quality of the automatic synthesis. [Conclusions] Based on move recognition, we built a structured automatic synthesis tool for practical use that supports literature retrieval, automatic synthesis, and evidence tracing of results, which verifies the feasibility and effectiveness of structured automatic synthesis based on move recognition.

Keywords: Scientific and Technological Literature; Move Recognition; Automatic Structured Synthesis; Phrase Extraction; Hierarchical Clustering; Cluster Label Generation
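Received: 2022-11-14      Published: 2023-04-28
CLC number (ZTFLH): TP391; G35
Funding: *Special Research Assistant Program of the Chinese Academy of Sciences (E1290905); National Science and Technology Library (NSTL) Special Project (2022XM28)
Corresponding author: Zhang Zhixiong, ORCID: 0000-0003-1596-7487, E-mail: zhangzx@mail.las.ac.cn.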
Cite this article:
Liu Yi, Zhang Zhixiong, Wang Yufei, Li Xuesi. Constructing Automatic Structured Synthesis Tool for Sci-Tech Literature Based on Move Recognition[J]. Data Analysis and Knowledge Discovery, 2024, 8(2): 65-73.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1330      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I2/65
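Fig.1  Overall technical framework of structured automatic synthesis
Fig.2  Algorithm for extracting research question, method, and progress phrases by making combined use of abstract information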
Field                Data type    Example
Research question    String       “coffee consumption on lung cancer in Thai population”
Research method      String       “proportional hazard models”
Research progress    String       “suggests coffee consumption may be a protective factor for lung cancer”
Table 1  Example of key phrase extraction results
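The record layout in Table 1 can be sketched as a small data class. This is only an illustration: the class name `KeyPhrases` and the English field names are hypothetical, and only the three string fields and the example values come from the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KeyPhrases:
    """Phrases extracted from one paper (hypothetical container, names assumed)."""
    research_question: str
    research_method: str
    research_progress: str


# Example values copied from Table 1.
example = KeyPhrases(
    research_question="coffee consumption on lung cancer in Thai population",
    research_method="proportional hazard models",
    research_progress=(
        "suggests coffee consumption may be a protective factor for lung cancer"
    ),
)

# asdict() turns the record into a plain dict, e.g. for JSON export.
record = asdict(example)
print(record["research_method"])
```

Keeping the three phrase types in one immutable record per paper makes it straightforward to cluster each field separately later while still knowing which paper a phrase came from.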
Number of phrases to cluster    Distance threshold
<250                            2.0
250~350                         2.3
350~450                         2.5
>450                            2.8
Table 2  Clustering parameter settings
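A minimal sketch of how the thresholds in Table 2 might drive a hierarchical clustering run. Several things here are assumptions rather than details given on this page: the function names, the handling of the boundary value 350 (assigned to the 2.5 band), the use of average linkage with Euclidean distance, and the toy 2-D vectors standing in for phrase representations.

```python
import math
from itertools import combinations


def distance_threshold(n_phrases):
    """Map the number of phrases to the distance threshold from Table 2.

    Assumption: the ambiguous boundary value 350 falls into the 2.5 band.
    """
    if n_phrases < 250:
        return 2.0
    if n_phrases < 350:
        return 2.3
    if n_phrases <= 450:
        return 2.5
    return 2.8


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering (assumed linkage and metric).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. Returns lists of vector indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy vectors: two tight pairs that are far apart from each other.
toy_vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
groups = sorted(agglomerative(toy_vectors, distance_threshold(len(toy_vectors))))
print(groups)
```

Using a distance threshold instead of a fixed cluster count lets the number of clusters follow the data, which is why the threshold is loosened as the number of phrases grows.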
Fig.3  Schematic of the cluster label generation method
Fig.4  Structured skeleton of the synthesis results (partial)
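The "research question - research method - research progress" skeleton can be sketched as a nested mapping. This is a hedged illustration, not the tool's actual data model: the function names, the tuple layout, the `paper_id` field used for tracing, and the placeholder labels are all hypothetical.

```python
from collections import defaultdict


def build_tree(records):
    """Group records into question -> method -> [(progress, paper_id)].

    Each record is a hypothetical tuple:
    (question_label, method_label, progress_phrase, paper_id).
    """
    tree = defaultdict(lambda: defaultdict(list))
    for question, method, progress, paper_id in records:
        tree[question][method].append((progress, paper_id))
    # Convert to plain dicts so the result is easy to compare and serialize.
    return {q: dict(methods) for q, methods in tree.items()}


def render(tree):
    """Render the tree as indented text, one level per indentation step."""
    lines = []
    for question, methods in tree.items():
        lines.append(question)
        for method, findings in methods.items():
            lines.append("  " + method)
            for progress, paper_id in findings:
                lines.append(f"    {progress} [{paper_id}]")
    return "\n".join(lines)


# Placeholder labels only; these are not results from the paper.
records = [
    ("question cluster A", "method cluster X", "progress phrase 1", "paper-1"),
    ("question cluster A", "method cluster X", "progress phrase 2", "paper-2"),
    ("question cluster A", "method cluster Y", "progress phrase 3", "paper-3"),
    ("question cluster B", "method cluster X", "progress phrase 4", "paper-4"),
]
tree = build_tree(records)
print(render(tree))
```

Keeping a paper identifier on every leaf is one simple way to let a reader trace each synthesized statement back to its source paper.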
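Fig.5  Display of the main points and highlights of the literature
Fig.6  Schematic of tracing and evidence verification for synthesis results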