Extracting Keywords with Topic Embedding and Network Structure Analysis |
Qingtian Zeng1,2, Xiaohui Hu2, Chao Li1,3 |
1(College of Electronic Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China) 2(College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China) 3(Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai 201804, China) |
|
|
Abstract [Objective] This paper proposes a new model for extracting topic keywords, with the aim of detecting low-frequency words that are highly relevant to a topic. [Methods] We designed a topic keyword extraction method that integrates topic embedding with network structure analysis. First, we extracted a preliminary set of topic keywords with the LDA model. Second, we trained word vectors with the Word2Vec model. Third, we built a network based on word vector similarity and identified the final topic keywords through network structure analysis. [Results] The new method improved the average similarity between topic keywords by 14.75%. It also extracted low-frequency keywords with high topic relevance more effectively than the LDA model. [Limitations] The sample size needs to be expanded, and the word segmentation process requires more manual adjustment. More research is needed to analyze the topic keywords quantitatively. [Conclusions] Our method improves abstracting and public opinion analysis.
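The third step of the method (a network built on word vector similarity, then ranked by a structural measure) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which similarity threshold or which network measure the authors use, so the sketch uses cosine similarity, a hypothetical cutoff of 0.8, and weighted degree as a stand-in centrality. The toy two-dimensional vectors are invented and take the place of Word2Vec output for LDA candidate keywords.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_similarity_network(vectors, threshold):
    """Connect two words when their cosine similarity reaches the threshold.

    Returns a dict mapping (word1, word2) pairs to edge weights.
    The threshold is an assumed parameter, not taken from the paper.
    """
    words = sorted(vectors)
    edges = {}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            sim = cosine(vectors[w1], vectors[w2])
            if sim >= threshold:
                edges[(w1, w2)] = sim
    return edges

def rank_by_weighted_degree(vectors, edges):
    """Rank words by the sum of their edge weights (an assumed measure)."""
    score = {w: 0.0 for w in vectors}
    for (w1, w2), sim in edges.items():
        score[w1] += sim
        score[w2] += sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score, key=lambda w: (-score[w], w))

# Invented toy vectors standing in for trained word embeddings.
toy_vectors = {
    "economy": [1.0, 0.0],
    "market": [0.9, 0.1],
    "trade": [0.8, 0.2],
    "weather": [0.0, 1.0],
}

edges = build_similarity_network(toy_vectors, threshold=0.8)
ranking = rank_by_weighted_degree(toy_vectors, edges)
print(ranking)
```

In this toy setting, a word that is well connected to the other candidates ranks highly regardless of how often it occurs in the corpus, which is the intuition behind recovering low-frequency but topic-relevant keywords. A word with no sufficiently similar neighbors, such as "weather" here, falls to the bottom of the ranking.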
|
Received: 19 August 2018
Published: 06 September 2019
|
|
Corresponding Authors:
Chao Li
E-mail: 1008lichao@163.com
|