Review of Detection Methods for Scientific Data Citations
Zhou Jiayin, Qian Qing, Tang Mingkun, Wu Sizhu
Institute of Medical Information, Chinese Academy of Medical Sciences/Beijing Union Medical College, Beijing 100020, China

Abstract [Objective] This paper analyzes the characteristics of existing data citation practices, summarizes the methods for detecting data citations, and explores the current state of research and future development trends. [Methods] We divided the existing data citation detection methods into three categories: rule-based recognition, supervised machine learning algorithms, and semi-supervised machine learning algorithms. For each category, we reviewed its principles, characteristics, existing problems, performance, and applications. [Results] Existing techniques concentrate on supervised machine learning algorithms. Detecting data citations with the help of citing behaviors and extracting data citation elements are future directions. [Limitations] This paper summarizes the characteristics of data citations and the existing recognition algorithms but does not elaborate on the technical details of these algorithms. [Conclusions] Data citation detection still faces problems that need further optimization, such as limited research fields, a lack of methodological diversity, and insufficient consideration of data citation characteristics.

Received: 27 June 2022
Published: 09 August 2023

Fund: Medical and Health Science and Technology Innovation Project of Chinese Academy of Medical Sciences (2021-I2M-1-057)
Corresponding Author: Wu Sizhu, ORCID: 0000-0003-4540-9910, E-mail: wu.sizhu@imicams.ac.cn