Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (1): 46-54    DOI: 10.11925/infotech.2096-3467.2018.1365
Mining Innovative Topics Based on Deep Learning
Changlei Fu1, Li Qian1,2, Huaping Zhang3, Huaming Zhao1, Jing Xie1,2
1National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
3School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract  

[Objective] This paper aims to identify innovative topics in massive volumes of text. [Methods] First, we extracted the more heavily weighted knowledge points from a scholarly knowledge graph. Second, we labeled these knowledge points as innovative seeds according to their "popularity", "novelty" and "authority". Third, we computed the knowledge correlation among the innovative seeds. Finally, we fed the results into a sequence-to-sequence (Seq2Seq) deep learning model with a Bi-LSTM, trained on a large corpus of sci-tech papers, to generate innovative topics. [Results] Using Chinese research papers on artificial intelligence as the experimental data, the generated topics received an average innovation score of 6.52 in a manual evaluation by experts. [Limitations] The contents of the knowledge graph and the training datasets still need to be improved. [Conclusions] The proposed model identifies innovative topics from scholarly papers and could be further optimized in the future.
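
The seed-selection and correlation steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how "popularity", "novelty" and "authority" are measured or combined, nor how knowledge correlation is computed, so everything here is an assumption made for illustration: the `KnowledgePoint` fields, the equal-weight average in `seed_score`, the `min_weight` and `threshold` parameters, the cosine similarity over embedding vectors, and the sample data. The Seq2Seq generation step is not covered.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt


@dataclass(frozen=True)
class KnowledgePoint:
    """Hypothetical record for one knowledge point from the graph."""
    term: str
    weight: float      # assumed graph weight in [0, 1]
    popularity: float  # assumed score in [0, 1]
    novelty: float     # assumed score in [0, 1]
    authority: float   # assumed score in [0, 1]
    vector: tuple      # assumed embedding of the term


def seed_score(p):
    # Assumption: the three perspectives are averaged with equal weight.
    return (p.popularity + p.novelty + p.authority) / 3.0


def select_seeds(points, min_weight=0.5, top_k=3):
    # Keep the heavier-weighted points, then rank them by seed score.
    heavy = [p for p in points if p.weight >= min_weight]
    return sorted(heavy, key=seed_score, reverse=True)[:top_k]


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def correlated_pairs(seeds, threshold=0.5):
    # Assumption: "knowledge correlation" is approximated by cosine
    # similarity between seed vectors, kept when it reaches the threshold.
    pairs = []
    for p, q in combinations(seeds, 2):
        sim = cosine(p.vector, q.vector)
        if sim >= threshold:
            pairs.append((p.term, q.term, sim))
    return pairs


# Made-up sample data, for illustration only.
POINTS = [
    KnowledgePoint("graph neural network", 0.9, 0.8, 0.9, 0.7, (1.0, 0.0)),
    KnowledgePoint("knowledge graph", 0.8, 0.9, 0.5, 0.8, (1.0, 1.0)),
    KnowledgePoint("expert system", 0.7, 0.3, 0.1, 0.6, (0.0, 1.0)),
    KnowledgePoint("rare term", 0.2, 0.9, 0.9, 0.9, (1.0, 0.0)),
]

if __name__ == "__main__":
    seeds = select_seeds(POINTS, top_k=2)
    for left, right, sim in correlated_pairs(seeds):
        print(f"{left} <-> {right}: {sim:.3f}")
```

In a pipeline like the one the abstract outlines, the correlated seed pairs would then serve as input to the trained generation model; that hand-off is left out of this sketch.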

Key words: Innovative Topic; Deep Learning; Seq2Seq; Intelligent Mining
Received: 04 December 2018      Published: 04 March 2019

Cite this article:

Changlei Fu, Li Qian, Huaping Zhang, Huaming Zhao, Jing Xie. Mining Innovative Topics Based on Deep Learning. Data Analysis and Knowledge Discovery, 2019, 3(1): 46-54.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2018.1365     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2019/V3/I1/46
