Constructing Automatic Structured Synthesis Tool for Sci-Tech Literature Based on Move Recognition
Liu Yi1, Zhang Zhixiong1,2, Wang Yufei1,2, Li Xuesi1,2
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper uses AI techniques to build an automatic structured synthesis tool that organizes the research framework of a body of sci-tech literature and reveals its main points. [Methods] The tool is built on move recognition. First, we identify key phrases for the research question, methodology, and progress in each paper, extracting its most important knowledge points. Second, we synthesize these knowledge points with hierarchical clustering and cluster label generation. Finally, we organize the synthesis output as a tree. [Results] The tool automatically synthesizes literature content and presents its framework as a “research question, methodology, progress” tree. [Limitations] Clustering accuracy is still insufficient and the number of clusters is hard to determine, both of which reduce synthesis performance. [Conclusions] A synthesis tool based on move recognition can automatically obtain structured synthesis results from literature content.
Liu Yi, Zhang Zhixiong, Wang Yufei, Li Xuesi. Constructing Automatic Structured Synthesis Tool for Sci-Tech Literature Based on Move Recognition[J]. Data Analysis and Knowledge Discovery, 2024, 8(2): 65-73.
Table 1 Example of key phrase extraction results
Research question | String | “coffee consumption on lung cancer in Thai population”
Research method | String | “proportional hazard models”
Research progress | String | “suggests coffee consumption may be a protective factor for lung cancer”
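As a rough illustration of the record shape Table 1 suggests, the three extracted phrases for one paper could be held in a small typed structure. This is only a sketch: the class name `KeyPhrases` and the field names are hypothetical and do not come from the paper, and the first field is assumed to be the research question. The example values are the ones shown in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyPhrases:
    """Key phrases extracted from one paper (hypothetical field names)."""

    research_question: str   # assumed label for the first phrase in Table 1
    research_method: str     # "research method" row in Table 1
    research_progress: str   # "research progress" row in Table 1


# Example values copied from Table 1.
example = KeyPhrases(
    research_question="coffee consumption on lung cancer in Thai population",
    research_method="proportional hazard models",
    research_progress=(
        "suggests coffee consumption may be a protective factor for lung cancer"
    ),
)

print(example.research_method)
```

Keeping all three fields as plain strings mirrors the String type listed in the table; a real pipeline might also carry a paper identifier so that synthesized results can be traced back to their source.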
Table 2 Clustering parameter settings
Number of phrases to cluster | Distance threshold
<250 | 2.0
250~350 | 2.3
350~450 | 2.5
>450 | 2.8
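The settings in Table 2 can be read as a lookup from the number of phrases to a cut-off distance for hierarchical clustering. The sketch below shows one way this might work, under several assumptions that the table does not settle: the boundary values 350 and 450 are assigned to the lower band, the linkage is average linkage over Euclidean distance, and the toy two-dimensional vectors stand in for whatever phrase embeddings the tool actually uses. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def distance_threshold(n_phrases: int) -> float:
    """Look up the cut-off distance from Table 2 for a given phrase count."""
    if n_phrases < 250:
        return 2.0
    if n_phrases <= 350:  # assumption: 350 belongs to the 250~350 band
        return 2.3
    if n_phrases <= 450:  # assumption: 450 belongs to the 350~450 band
        return 2.5
    return 2.8


def cluster(vectors: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Naive average-linkage agglomerative clustering.

    Merges the two closest clusters until the closest pair is farther
    apart than `threshold`. Returns clusters as lists of vector indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def avg_dist(a: List[int], b: List[int]) -> float:
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (ties resolved by index order).
        d, i, j = min(
            (avg_dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy stand-ins for phrase embeddings: two tight, well-separated groups.
toy_vectors = [[0.0, 0.0], [0.0, 0.5], [10.0, 10.0], [10.0, 10.5]]
groups = cluster(toy_vectors, distance_threshold(len(toy_vectors)))
print(groups)  # [[0, 1], [2, 3]]
```

Cutting by distance rather than by a fixed cluster count avoids having to choose the number of clusters in advance, which fits the difficulty noted in the abstract. For realistic input sizes, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `fcluster(..., criterion="distance")` would be a more practical choice than this quadratic loop.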
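Fig.3 Schematic of the cluster label generation method
Fig.4 Structured skeleton of the synthesis results (partial)
Fig.5 Display of the key points and highlights of the literature
Fig.6 Schematic of provenance tracing and evidence support for the synthesis results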
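The “research question, methodology, progress” tree described in the abstract and pictured in Fig.4 could be represented as a nested mapping. The following is a minimal sketch, assuming each paper has already been assigned a question cluster label and a method cluster label; the function name `build_tree`, the cluster labels, and the placeholder phrases are all hypothetical, and only the first progress phrase is taken from Table 1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (question cluster label, method cluster label, progress phrase)
Record = Tuple[str, str, str]


def build_tree(records: List[Record]) -> Dict[str, Dict[str, List[str]]]:
    """Group progress phrases under method labels under question labels."""
    tree: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for question, method, progress in records:
        tree[question][method].append(progress)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {question: dict(methods) for question, methods in tree.items()}


# Hypothetical cluster labels; only the first progress phrase is from Table 1.
records = [
    ("question cluster A", "method cluster A1",
     "suggests coffee consumption may be a protective factor for lung cancer"),
    ("question cluster A", "method cluster A1", "placeholder progress phrase 2"),
    ("question cluster A", "method cluster A2", "placeholder progress phrase 3"),
    ("question cluster B", "method cluster B1", "placeholder progress phrase 4"),
]

tree = build_tree(records)
for question, methods in tree.items():
    print(question)
    for method, progress_list in methods.items():
        print("  " + method)
        for progress in progress_list:
            print("    " + progress)
```

Nesting methods under questions keeps each leaf attached to the branch it came from, which is one simple way to support the kind of tracing back to sources that Fig.6 refers to.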