Interdisciplinary Measurement Based on Automatic Classification of Text Content
Lv Qi 1, Shangguan Yanhong 1, Zhang Lin 2,3,4, Huang Ying 2,3,4
1 School of Management and Economics, North China University of Water Resources and Electric Power, Zhengzhou 450046, China
2 School of Information Management, Wuhan University, Wuhan 430072, China
3 Center for Science, Technology & Education Assessment (CSTEA), Wuhan University, Wuhan 430072, China
4 Department of MSI & ECOOM, KU Leuven, Leuven B-3000, Belgium
Abstract [Objective] This paper identifies the disciplines of individual papers from their content, to meet the need for interdisciplinarity measurement based on paper-level discipline classification. [Methods] Using the Leuven-Budapest subject classification system, we applied machine learning, deep learning, and pre-trained language models to classify abstracts from 15 primary disciplines. We then used the improved SCIBERT model to measure and analyze interdisciplinarity. [Results] The improved SCIBERT model achieved the best automatic classification performance, with an average F1 score of 81.45%; some individual categories achieved a classification performance of over 90%. Among the 15 primary disciplines, biomedical research had the highest degree of interdisciplinarity (0.38) and physics the lowest (0.08). [Limitations] This paper measures interdisciplinarity only from the perspective of text content and does not consider multi-dimensional measurement methods. [Conclusions] Pre-trained models perform best in automatically classifying journal articles, followed by deep learning models, while machine learning models perform worst. Measuring interdisciplinarity through automatic classification of literature content extends the current research framework and supports a multi-angle, in-depth understanding of interdisciplinary research.
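The two kinds of figures reported in the abstract, an average F1 score and a per-discipline interdisciplinarity degree, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say whether the average F1 is macro-averaged, nor which indicator produces the 0.38 and 0.08 values. The sketch uses macro-averaged F1 and the Gini-Simpson index (one common diversity measure) over the predicted disciplines of a set of papers; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of true and predicted labels."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def simpson_diversity(predicted_labels):
    """Gini-Simpson index 1 - sum(p_i^2) over predicted disciplines.

    Assumed stand-in for an interdisciplinarity degree: 0 when every
    paper falls into one discipline, larger when papers are spread out.
    """
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    y_true = ["physics", "physics", "biology", "biology"]
    y_pred = ["physics", "biology", "biology", "biology"]
    print(per_class_f1(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
    print(simpson_diversity(y_pred))
```

Under this reading, a discipline whose papers are almost all classified into a single category would score near 0, and one whose papers are spread across several categories would score higher; whether that matches the paper's indicator cannot be confirmed from the abstract alone.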
Received: 03 July 2022
Published: 09 November 2022
Fund: National Science Foundation of China (72004169); Humanities and Social Sciences Research in Henan Universities in 2023 (2023-ZZJH-176)
Corresponding Author: Huang Ying, ORCID: 0000-0003-0115-4581, E-mail: ying.huang@whu.edu.cn