A Modified Hybrid Method to Identify Cited Spans |
Nie Weimin, Ou Shiyan |
School of Information Management, Nanjing University, Nanjing 210023, China |
|
|
Abstract [Objective] This paper proposes a new algorithm for identifying cited spans. It aims to address the shortcomings of existing unsupervised models and to extend the granularity of identification from a single sentence to several adjacent sentences. [Methods] First, we built a modified hybrid method that uses supervised ranking to select candidates from all sentences of the cited paper. Second, we used a regression technique to determine which candidate sentences contain the cited spans. Third, we grouped adjacent sentences of the cited paper into units called n-sent and used them as inputs to the modified hybrid method. Finally, we applied intraclass normalization to identify the cited spans. [Results] The modified hybrid method achieved a sentence-overlap F1 of 0.167 on the test set of CL-SciSumm 2019 and 2020. With 3-sent as input, intraclass Z-score normalization improved the sentence-overlap F1 of the modified hybrid method from 0.083 to 0.158. [Limitations] The modified hybrid method does not use the positions of sentences within the cited paper. In addition, the prospects for applying the proposed method to downstream tasks remain unclear. [Conclusions] The proposed method can effectively identify cited spans whose granularity ranges from a single sentence to multiple adjacent sentences.
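The n-sent grouping, intraclass Z-score normalization, and sentence-overlap F1 mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define "class", so the sketch assumes a class is the set of candidate spans sharing the same n, and the function names (`n_sent_windows`, `zscore_within_class`, `best_span`, `sentence_overlap_f1`) and the raw scores passed in are hypothetical.

```python
from statistics import mean, pstdev


def n_sent_windows(num_sentences, n):
    """Return every run of n adjacent sentence indices (one 'n-sent' each)."""
    return [tuple(range(i, i + n)) for i in range(num_sentences - n + 1)]


def zscore_within_class(scores):
    """Z-score normalize the scores of one class.

    Assumption: a class is the group of candidates with the same n.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are equal, so no candidate stands out in this class.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def best_span(raw_scores_by_n):
    """Pick the span with the highest normalized score across all classes.

    raw_scores_by_n maps n to {window: raw_score}; the raw scores stand in
    for whatever the ranking/regression model would output (hypothetical).
    """
    best = None
    for window_scores in raw_scores_by_n.values():
        if not window_scores:
            continue  # e.g. n is larger than the number of sentences
        windows = list(window_scores)
        normalized = zscore_within_class([window_scores[w] for w in windows])
        for window, score in zip(windows, normalized):
            if best is None or score > best[1]:
                best = (window, score)
    return best


def sentence_overlap_f1(predicted, gold):
    """F1 over the sets of predicted and gold sentence indices."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under this reading, normalizing within each class keeps 1-sent candidates from winning merely because their raw scores sit on a different scale from those of 2-sent or 3-sent candidates. For example, `best_span` can be called on a dictionary keyed by n whose inner keys come from `n_sent_windows`, and its output can be compared against annotated sentence indices with `sentence_overlap_f1`.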
|
Received: 26 April 2022
Published: 16 February 2023
|
|
Fund: National Social Science Fund of China (17ATQ001) |
Corresponding Author: Ou Shiyan, ORCID: 0000-0001-8617-6987, E-mail: oushiyan@nju.edu.cn
|