Classification of Academic Papers for Periodical Selection
Wang Xinyun, Wang Hao, Deng Sanhong, Zhang Baolong
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
Abstract [Objective] We constructed a hierarchical classification system for papers published in academic journals and proposed submission guidance based on the similarity between articles and journals. [Methods] We took journals in the field of Library and Information Science as the sample and used hierarchical clustering to construct a two-layer architecture. We then used SVM, CNN, and RNN to classify the papers, compared the results obtained with different feature combinations, and selected the most suitable algorithm. To optimize the classification results, we merged journals with similar coverage. [Results] The highest accuracy, 81.84%, was obtained when the feature combination better reflected article content. [Limitations] The dataset needs to be expanded. [Conclusions] The deep learning algorithms classified the papers better than the traditional machine learning algorithm. Merging journals with similar content improves the classification results.
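The abstract does not say how the similarity between an article and a journal is measured. As an illustration only, the sketch below assumes a bag-of-words representation and cosine similarity, and ranks candidate journals for one article. The function names, journal names, and profile texts are hypothetical placeholders, not data or code from the study.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a bag-of-words count vector from lowercase, whitespace-split text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def rank_journals(article_text, journal_profiles):
    """Return (journal, score) pairs sorted by descending similarity."""
    article = term_vector(article_text)
    scores = [
        (name, cosine_similarity(article, term_vector(profile)))
        for name, profile in journal_profiles.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical journal profiles (assumed for illustration, not from the paper).
journal_profiles = {
    "Journal A": "information retrieval text classification knowledge organization",
    "Journal B": "library services reading promotion public library management",
}

ranking = rank_journals(
    "text classification of academic papers with deep learning",
    journal_profiles,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In practice the journal profile would be built from the papers a journal has already published, and the classifiers named in the abstract (SVM, CNN, RNN) would replace this simple ranking; the sketch only shows the similarity idea.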
Received: 22 March 2020
Published: 25 July 2020
Corresponding Authors:
Wang Hao
E-mail: ywhaowang@nju.edu.cn