Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (1): 102-112    DOI: 10.11925/infotech.2096-3467.2022.0358
Identifying Interdisciplinary Sci-Tech Literature Based on Multi-Label Classification
Wang Weijun1,2, Ning Zhiyuan1,2, Du Yi1,2, Zhou Yuanchun1,2
1Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
2University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper identifies interdisciplinary sci-tech literature, with the aim of finding emerging interdisciplinary issues. [Methods] We combined the discipline labels that specialists assigned to sci-tech literature with the labels predicted by text classification algorithms to find interdisciplinary studies. [Results] The F1 value of the proposed method reached 0.45, which is 0.22 higher than that of the purely model-based prediction. [Limitations] The model had low recall when identifying interdisciplinary sci-tech research. [Conclusions] The proposed method effectively addresses the classification of interdisciplinary sci-tech literature, a problem that merits further study.

Key words: Deep Learning; Multi-Label Text Classification; Interdisciplinary Research Recognition; Sci-Tech Literature
Received: 18 April 2022      Published: 16 February 2023
ZTFLH:  TP393 G250  
Fund: Strategic Priority Research Program of Chinese Academy of Sciences (XDA16021400); National Natural Science Foundation of China (61836013); Youth Innovation Promotion Association, CAS (2021166)
Corresponding Author: Du Yi, ORCID: 0000-0003-3121-8937, E-mail: duyi@cnic.cn

Cite this article:

Wang Weijun, Ning Zhiyuan, Du Yi, Zhou Yuanchun. Identifying Interdisciplinary Sci-Tech Literature Based on Multi-Label Classification. Data Analysis and Knowledge Discovery, 2023, 7(1): 102-112.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0358     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I1/102

Tree-Like Hierarchical Structure of the Disciplinary Classification System
Text Classification Flowchart
Identification Model for Interdisciplinary Research of Sci-Tech Literature
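
The abstract describes the identification step as combining specialist-provided discipline labels with model-predicted labels. A minimal sketch of one way such a combination could work is given below. The decision rule (flag a record when the two label sets together cover more than one top-level branch) and the helper names `top_level` and `is_interdisciplinary` are assumptions made for illustration; this page does not spell out the authors' actual rule.

```python
def top_level(code: str) -> str:
    """Return the top-level branch of a discipline code.

    Assumption: the branch is the leading letter, as in 'G0414' -> 'G'.
    """
    return code.strip()[0]


def is_interdisciplinary(expert_labels, predicted_labels) -> bool:
    """Flag a record whose expert and predicted labels span several branches.

    This rule is a hypothetical stand-in, not the paper's stated method.
    """
    combined = list(expert_labels) + list(predicted_labels)
    branches = {top_level(code) for code in combined if code.strip()}
    return len(branches) > 1


# Labels taken from one of the example projects listed on this page.
expert = ["G0414", "G04", "G"]
predicted = ["F06", "G0414", "G04", "F"]
print(is_interdisciplinary(expert, predicted))
```

Under this assumed rule, a record whose expert and predicted labels all sit in the same branch is not flagged, so only disagreement at the top level triggers a flag.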
| Encoder model | Micro-F1 | Macro-F1 |
|---|---|---|
| FastText | 0.7818 | 0.5351 |
| TextCNN | 0.7782 | 0.5246 |
| TextRNN | 0.7147 | 0.3004 |
| TextRCNN | 0.7802 | 0.5879 |
| TextDPCNN | 0.7690 | 0.5905 |
| TextCRNN | 0.7573 | 0.4858 |
| HAN | 0.7389 | 0.4875 |
| BERT | 0.7918 | 0.5976 |
Performance of Different Models
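
The table reports Micro-F1 and Macro-F1 for multi-label predictions. The sketch below shows how the two averages are commonly computed from per-label counts; the function names and the toy label sets are illustrative assumptions, and macro averaging here runs only over labels that occur in the toy data.

```python
from collections import Counter


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(true_sets, pred_sets):
    """Return (micro_f1, macro_f1) for parallel lists of label sets."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(true_sets, pred_sets):
        for label in pred & truth:
            tp[label] += 1
        for label in pred - truth:
            fp[label] += 1
        for label in truth - pred:
            fn[label] += 1
    labels = sorted(set(tp) | set(fp) | set(fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro


# Toy data, invented for illustration only.
true_sets = [{"F", "F06"}, {"G", "G01"}, {"F", "F06"}]
pred_sets = [{"F", "F06"}, {"G"}, {"F", "G"}]
micro, macro = micro_macro_f1(true_sets, pred_sets)
print(round(micro, 4), round(macro, 4))
```

Because macro averaging weights every label equally, a label that is rarely predicted correctly pulls Macro-F1 down more than Micro-F1, which is one plausible reading of the gap between the two columns.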
Distribution of the Projects Marked as Different Discipline Branches (2011-2018)
Distribution of the Projects Marked as Different Discipline Branches (2019)
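| Parameter | Setting |
|---|---|
| Encoder model | BERT |
| Number of Transformer layers | 4 |
| Batch size | 64 |
| Learning rate | 5e-5 |
| Epochs | 100 |
| Early-stopping parameter | 20 |
| Pre-trained model | chinese_L-12_H-768_A-12 |
| Label decision threshold | 0.5 |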
Classification Model Parameter Setting
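
The label decision threshold of 0.5 suggests that each label is accepted or rejected independently from a per-label score. A minimal sketch of that decoding step follows, assuming the classifier emits one logit per label and that a sigmoid turns it into a probability; both the sigmoid and the `>=` comparison are assumptions, since the table gives only the threshold value.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_labels(logits: dict, threshold: float = 0.5) -> list:
    """Keep every label whose probability reaches the threshold."""
    return sorted(label for label, z in logits.items() if sigmoid(z) >= threshold)


# Hypothetical logits for four discipline labels.
example_logits = {"F": 2.1, "F06": 1.3, "G": 0.2, "G01": -0.4}
print(decode_labels(example_logits))  # → ['F', 'F06', 'G']
```

With a threshold of 0.5, a label is kept exactly when its logit is non-negative, so a record can receive labels from several branches at once.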
| Data set | Macro-F1 | Micro-F1 |
|---|---|---|
| Keywords | 0.5644 | 0.7883 |
| Keywords + title | 0.5823 | 0.7867 |
| Keywords + title + abstract | 0.5976 | 0.7918 |
Classification Performance on Different Data Sets
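| Method | Class | Precision | Recall | F1 |
|---|---|---|---|---|
| S1 | 0 (431 records) | 0.90 | 1 | 0.95 |
| S1 | 1 (55 records) | 1 | 0.13 | 0.23 |
| S2 | 0 (431 records) | 0.92 | 0.99 | 0.95 |
| S2 | 1 (55 records) | 0.85 | 0.31 | 0.45 |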
The Effects of Interdisciplinary Research Identification Methods
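
The F1 values in the table follow from the listed precision and recall through the usual harmonic mean, F1 = 2PR / (P + R). The short check below recomputes each row; the tuple layout and the function name `f1_score` are choices made for this sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (method, class, precision, recall, reported F1) copied from the table.
rows = [
    ("S1", 0, 0.90, 1.00, 0.95),
    ("S1", 1, 1.00, 0.13, 0.23),
    ("S2", 0, 0.92, 0.99, 0.95),
    ("S2", 1, 0.85, 0.31, 0.45),
]

recomputed = [round(f1_score(p, r), 2) for _, _, p, r, _ in rows]
for (method, cls, _, _, reported), value in zip(rows, recomputed):
    print(method, cls, value, reported)
```

All four recomputed values match the reported ones after rounding to two decimals, and the class-1 rows give the 0.22 gain (0.45 versus 0.23) quoted in the abstract.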
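| No. | Title | Keywords | Actual disciplines | Predicted disciplines | Manual annotation |
|---|---|---|---|---|---|
| 0 | Analysis of the Mechanism of Users' "Information Narrowing" in Social Network Interaction: Data Mining Based on Weibo | online information content; information gap; social media mining; information narrowing; Weibo public opinion | G0414; G04; G | F06; G0414; G04; F | Interdisciplinary research |
| 1 | Attack-Context-Oriented Detection Methods for Harmful Information in Social Networks and Their Verification and Testing | verification and testing; adversarial attack; harmful information detection; social network information dissemination; artificial intelligence security | F06; F0608; F | F06; G01; F | Interdisciplinary research |
| 2 | Modeling and Application of Financial Cross-Market Coupling Relationships Based on Multimodal Deep Learning | crisis early warning; numerical-text bimodal information; financial cross-market coupling relationships; deep learning | G01; G0114; G | F06; G01; G0114; G; F | Interdisciplinary research |
| 3 | Game-Theoretic Methods for Optimizing Quality of Service in Edge Computing | Nash equilibrium; Pareto optimality; behavioral model; decision theory; mobile edge computing | F06; F; F0601 | F06; G01; G0114; G | Interdisciplinary research |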
Projects Identified as Interdisciplinary Research
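
The discipline labels in the table appear as paths through the tree-like classification system, for example G, G04 and G0414. A small sketch of expanding a leaf code into its ancestor path is shown below. It assumes a code is one letter followed by two digits per additional level, which matches the codes shown on this page but is not confirmed here as the general scheme; `ancestor_path` is a hypothetical helper.

```python
def ancestor_path(code: str) -> list:
    """Expand a code such as 'G0414' into ['G', 'G04', 'G0414'].

    Assumption: one leading letter, then two digits per extra level.
    """
    code = code.strip()
    digits = code[1:]
    if not code[:1].isalpha() or (digits and not digits.isdigit()) or len(digits) % 2:
        raise ValueError(f"unexpected discipline code: {code!r}")
    return [code[: 1 + 2 * depth] for depth in range(len(digits) // 2 + 1)]


# A label set from the table forms a complete path when it equals the
# ancestor path of its deepest code.
labels = {"G0414", "G04", "G"}
deepest = max(labels, key=len)
print(ancestor_path(deepest), set(ancestor_path(deepest)) == labels)
```

A check like this can separate records whose labels form a single path from records whose labels mix several paths, such as F06; G01; F.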
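| No. | Title | Keywords | Actual disciplines | Predicted disciplines | Manual annotation |
|---|---|---|---|---|---|
| 0 | Decision-Making Methods for Hierarchical Complex Problems in an Artificial Intelligence Environment | artificial intelligence; decision-making methods; hierarchical complex problems; cognitive representation; system modeling | F06; F; F0601 | G01; G0114; G | Cross-disciplinary |
| 1 | Exploring Service Mechanisms and Strategies for Joint Games among Multi-Cloud Users on Cloud Platforms | pricing mechanism; decision theory; mechanism design; Nash equilibrium | F06; F; F0601 | G01; G0114; G | Cross-disciplinary |
| 2 | Intelligent Security Decision Systems Based on Non-Deterministic Probabilistic Information | intelligent decision system; uncertain knowledge representation; security games; uncertain reasoning; decision-making under uncertainty | F06; F; F0601 | G01; G0114; G | Cross-disciplinary |
| 3 | City Profile Computation and Application Models Based on Tag Semantic Mining | knowledge discovery; semantic mining; social tags; city profile; information fusion | G0414; G04; G | F0607; F06; F | Cross-disciplinary |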
Projects Identified as Other Discipline