Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (2): 65-73    DOI: 10.11925/infotech.2096-3467.2022.1330
Constructing Automatic Structured Synthesis Tool for Sci-Tech Literature Based on Move Recognition
Liu Yi1, Zhang Zhixiong1,2, Wang Yufei1,2, Li Xuesi1,2
1National Science Library, Chinese Academy of Sciences, Beijing 10090, China
2Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 10090, China
Abstract  

[Objective] This paper uses AI technology to build an automatic structured synthesis tool that organizes the research frameworks of sci-tech literature and reveals their main points. [Methods] The tool is built on move recognition. First, we identified research question, methodology, and progress key phrases to extract the most important knowledge points from each paper. Second, we applied hierarchical clustering and cluster label generation to synthesize this knowledge. Third, we designed a tree structure for the synthesis output. [Results] The proposed tool can automatically synthesize literature content and reveal its framework as a “research question, methodology, and progress” tree structure. [Limitations] Insufficient clustering accuracy and the difficulty of determining the number of clusters reduce the model's synthesis performance. [Conclusions] The synthesis tool based on move recognition can automatically retrieve structured literature content.

Key words: Scientific and Technological Literature; Move Recognition; Automatic Structured Synthesis; Phrase Extraction; Hierarchical Clustering; Label Generation
Received: 14 November 2022      Published: 28 April 2023
ZTFLH: TP391; G35
Fund: Special Research Assistant Program of Chinese Academy of Sciences (E1290905); National Science and Technology Library and Literature Center (NSTL) Project (2022XM28)
Corresponding Author: Zhang Zhixiong, ORCID: 0000-0003-1596-7487, E-mail: zhangzx@mail.las.ac.cn

Cite this article:

Liu Yi, Zhang Zhixiong, Wang Yufei, Li Xuesi. Constructing Automatic Structured Synthesis Tool for Sci-Tech Literature Based on Move Recognition. Data Analysis and Knowledge Discovery, 2024, 8(2): 65-73.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.1330     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I2/65

General Technical Framework of Automatic Structured Synthesis of Scientific & Technological Literature
Extraction of Research Question, Method, and Progress Phrases from Abstract Information
Field | Data type | Example
Research question | String | “coffee consumption on lung cancer in Thai population”
Research method | String | “proportional hazard models”
Research progress | String | “suggests coffee consumption may be a protective factor for lung cancer”
Example of Key Phrase Extraction Results
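
The extraction result above has three string fields per paper. A minimal sketch of how such a record might be represented follows; the class name `KeyPhraseRecord` and the field names are hypothetical and are not taken from the authors' tool.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class KeyPhraseRecord:
    """Key phrases extracted from one abstract (hypothetical field names)."""
    research_question: str
    research_method: str
    research_progress: str


# The example values are the ones listed in the table above.
record = KeyPhraseRecord(
    research_question="coffee consumption on lung cancer in Thai population",
    research_method="proportional hazard models",
    research_progress=(
        "suggests coffee consumption may be a protective factor for lung cancer"
    ),
)

# asdict() gives a plain dictionary that is easy to serialize or cluster on.
record_dict = asdict(record)
```

Keeping the three phrase types as separate fields lets each type be clustered on its own later.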
Number of phrases to cluster | Distance threshold
<250 | 2.0
250~350 | 2.3
350~450 | 2.5
>450 | 2.8
Cluster Parameter Settings
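
The parameter table can be read as a lookup from the number of phrases to a distance threshold at which hierarchical clustering stops merging. The sketch below illustrates that idea under several assumptions that do not come from the source: the boundary handling at 350 and 450 (the table's ranges touch there), average linkage, Euclidean distance, and the function names `distance_threshold` and `agglomerative_clusters`. The toy 2-D vectors stand in for phrase embeddings.

```python
import math
from itertools import combinations


def distance_threshold(n_phrases):
    """Look up the distance threshold for a given number of phrases.

    Assumption: 250-349 maps to 2.3 and 350-450 maps to 2.5, because the
    table does not say which row owns the shared boundary values.
    """
    if n_phrases < 250:
        return 2.0
    if n_phrases < 350:
        return 2.3
    if n_phrases <= 450:
        return 2.5
    return 2.8


def agglomerative_clusters(vectors, threshold):
    """Bottom-up clustering that stops once the closest pair of clusters is
    farther apart than `threshold` (average linkage, Euclidean distance;
    both are assumptions). Returns clusters as lists of vector indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise distance between members of the two clusters.
        total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        x, y = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        # x < y always holds, so deleting index y leaves index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy stand-ins for phrase embeddings: two tight groups far apart.
toy_vectors = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.0, 10.5)]
toy_clusters = agglomerative_clusters(toy_vectors, distance_threshold(len(toy_vectors)))
```

A threshold that grows with the number of phrases keeps larger inputs from fragmenting into too many small clusters, which relates to the cluster-number difficulty noted in the abstract's limitations.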
Diagram of Cluster Label Generation Method
The Structure Skeleton of Synthesis Result
The Key Points and Highlights of Synthesis Result
The Evidence of Synthesis Result
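
The abstract describes the synthesis output as a “research question, methodology, and progress” tree. The sketch below shows one way such a tree might be held in memory and printed. The nesting order (question, then method, then progress), the class name `SynthesisNode`, and the indented text rendering are assumptions made for illustration, not the authors' output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SynthesisNode:
    """One node of the synthesis tree (hypothetical structure)."""
    label: str
    level: str  # "question", "method", or "progress"
    children: List["SynthesisNode"] = field(default_factory=list)


def render(node, depth=0):
    """Return the tree as indented text lines, one line per node."""
    lines = ["  " * depth + "[" + node.level + "] " + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Phrases reused from the key phrase extraction example in this document.
progress = SynthesisNode(
    "suggests coffee consumption may be a protective factor for lung cancer",
    "progress",
)
method = SynthesisNode("proportional hazard models", "method", [progress])
root = SynthesisNode(
    "coffee consumption on lung cancer in Thai population", "question", [method]
)

tree_lines = render(root)
```

In a full synthesis, the question and method labels would be generated cluster labels rather than single phrases, with the supporting phrases attached beneath them as evidence.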