Automatic Summarization of User-Generated Content in Academic Q&A Community Based on Word2Vec and MMR
Tao Xing¹, Zhang Xiangxian¹, Guo Shunli², Zhang Liman¹
¹ School of Management, Jilin University, Changchun 130022, China
² School of Communication, Qufu Normal University, Qufu 276826, China
|
Abstract [Objective] To address the knowledge aggregation problem posed by user-generated content (UGC) in current academic Q&A communities, this paper proposes an improved automatic summarization method that provides efficient and accurate knowledge aggregation services for researchers in these communities. [Methods] The proposed method, called W2V-MMR, combines Maximal Marginal Relevance (MMR) with the Word2Vec model. First, Word2Vec is used in the sentence scoring and similarity calculations to improve the information quality of the summary sentences. Then, MMR is applied to extract summaries of the UGC in the academic Q&A community. [Results] On the four groups of experimental data, the proposed method achieved information quality scores of 1.4228, 1.4476, 1.5921 and 3.4168, all higher than those of the MMR and TextRank baselines in the comparative experiments. [Limitations] The effect of the number of summary sentences on the results is not considered, and summary quality is not compared across different numbers of summary sentences. [Conclusions] The proposed method provides a useful reference for knowledge aggregation services in academic Q&A communities.
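The two-stage idea described under [Methods] (embedding-based similarity, then MMR selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' W2V-MMR implementation: the abstract does not give the scoring function, so the sketch assumes that sentence vectors are the mean of their word vectors, that relevance is the cosine similarity between a sentence and the centroid of all sentence vectors, and that the trade-off weight is lam = 0.7. The names `cosine`, `sentence_vector`, `mmr_select`, `toy_emb` and `answers` are hypothetical, and `toy_emb` is a hand-written 3-dimensional table standing in for a trained Word2Vec model. The selection rule itself is the standard greedy MMR criterion of Carbonell and Goldstein: repeatedly pick the candidate s maximising lam·Sim(s, D) - (1 - lam)·max over already selected s' of Sim(s, s').

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sentence_vector(tokens: List[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens (assumed sentence representation)."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def mmr_select(sentences: List[List[str]], emb: Dict[str, Vector],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: return indices of up to k sentences, trading relevance against redundancy."""
    dim = len(next(iter(emb.values())))
    vecs = [sentence_vector(s, emb, dim) for s in sentences]
    # Assumed relevance proxy: similarity to the centroid of all sentence vectors.
    centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
    relevance = [cosine(v, centroid) for v in vecs]

    selected: List[int] = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = highest similarity to any sentence already in the summary.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy embeddings and tokenized answer sentences.
toy_emb = {
    "word2vec": [0.9, 0.1, 0.0],
    "embedding": [0.8, 0.2, 0.1],
    "mmr": [0.1, 0.9, 0.0],
    "diversity": [0.2, 0.8, 0.1],
    "summary": [0.5, 0.5, 0.2],
}
answers = [
    ["word2vec", "embedding"],
    ["word2vec", "embedding", "summary"],
    ["mmr", "diversity"],
]
print(mmr_select(answers, toy_emb, k=2))  # prints [1, 2]
```

In this toy run, sentence 0 is more relevant than sentence 2 but nearly duplicates sentence 1, so the redundancy penalty makes MMR choose sentence 2 second; with lam=1.0 the penalty vanishes and the selection becomes [1, 0]. In practice the vectors would come from a Word2Vec model trained on the community's text, and the relevance term could instead compare each answer sentence with the question.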
Received: 20 May 2019
Published: 01 June 2020
|
Corresponding author: Tao Xing (E-mail: 459978415@qq.com)