Classification of Academic Papers for Periodical Selection
Wang Xinyun, Wang Hao, Deng Sanhong, Zhang Baolong
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
[Objective] We constructed a hierarchical classification system for papers published in academic journals and proposed submission guidance based on the similarity between articles and journals. [Methods] We studied journals in the field of Library and Information Science and used hierarchical clustering to construct a two-layer architecture. We then used SVM, CNN, and RNN to classify the papers, compared the results of different feature combinations, and selected the most suitable algorithm. To optimize the classification results, we merged journals with similar coverage. [Results] When the feature combination better reflected article content, accuracy peaked at 81.84%. [Limitations] The dataset needs to be expanded. [Conclusions] The deep learning algorithms classified the papers better than the traditional machine learning algorithm. Merging journals with similar content improves the classification results.
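The idea of merging journals with similar coverage and then matching an article to journals by similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the features, the linkage criterion, or the similarity measure, so the bag-of-words vectors, cosine similarity, greedy pairwise merging, the 0.6 threshold, and the journal names and texts below are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words counts for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def journal_profiles(papers_by_journal):
    """Sum the term vectors of each journal's papers into one profile."""
    profiles = {}
    for journal, papers in papers_by_journal.items():
        profile = Counter()
        for text in papers:
            profile.update(term_vector(text))
        profiles[journal] = profile
    return profiles


def merge_similar(profiles, threshold):
    """Greedy agglomerative merge (assumed linkage: summed profiles).

    Repeatedly joins the most similar pair of journal groups and stops
    when no pair reaches the threshold.
    """
    groups = {(name,): Counter(vec) for name, vec in profiles.items()}
    while len(groups) > 1:
        keys = sorted(groups)
        score, a, b = max(
            ((cosine(groups[x], groups[y]), x, y)
             for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda item: item[0],
        )
        if score < threshold:
            break
        groups[tuple(sorted(a + b))] = groups.pop(a) + groups.pop(b)
    return groups


def recommend(article, groups):
    """Rank journal groups by similarity to the article, best first."""
    vec = term_vector(article)
    return sorted(
        ((cosine(vec, profile), names) for names, profile in groups.items()),
        reverse=True,
    )


# Hypothetical toy data: journal names and paper texts are invented.
papers_by_journal = {
    "Journal A": ["library service reading promotion",
                  "public library reader service"],
    "Journal B": ["library reader service evaluation",
                  "library reading promotion service"],
    "Journal C": ["text classification neural network",
                  "knowledge graph entity extraction network"],
}
groups = merge_similar(journal_profiles(papers_by_journal), threshold=0.6)
ranking = recommend("neural network for text classification", groups)
for score, names in ranking:
    print(f"{score:.3f}", ", ".join(names))
```

In this toy data, Journal A and Journal B share most of their vocabulary and are merged into one group, while Journal C stays separate and ranks first for the sample article. A real pipeline would replace the whitespace tokenizer with proper word segmentation and weighting, and would use the merged groups as class labels for a trained classifier such as the SVM, CNN, or RNN named in the abstract.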