[Objective] To address the problem of aggregating knowledge from user-generated content (UGC) in academic Q&A communities, this paper proposes an improved automatic summarization method that provides efficient and accurate knowledge aggregation services for researchers in these communities. [Methods] The proposed method, W2V-MMR, combines the idea of Maximal Marginal Relevance (MMR) with the Word2Vec model. First, Word2Vec is used in sentence scoring and similarity calculation to improve the information quality of the summary sentences. Then MMR is applied to extract summaries of the UGC in the academic Q&A community. [Results] On the four experimental datasets, the proposed method achieves information quality scores of 1.4228, 1.4476, 1.5921 and 3.4168, all higher than those of the MMR and TextRank baselines in the comparative experiments. [Limitations] The study does not consider how the number of summary sentences affects the results, and it does not compare summary quality across different numbers of summary sentences. [Conclusions] The proposed method provides a useful reference for knowledge aggregation services in academic Q&A communities.
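The Word2Vec-plus-MMR pipeline summarized under [Methods] can be illustrated with a small sketch. This is not the authors' implementation; the abstract does not give their scoring function, so several assumptions are made here. Each sentence vector is assumed to be the mean of its Word2Vec word vectors; relevance is assumed to be cosine similarity to the centroid of all sentence vectors; and a hand-written two-dimensional embedding table stands in for a trained Word2Vec model. Selection follows the standard greedy MMR rule of Carbonell and Goldstein: repeatedly pick the candidate s that maximizes λ·Sim(s, Q) − (1 − λ)·max over already-selected s' of Sim(s, s'). The function names `sentence_vector`, `cosine`, `mmr_select`, the parameter `lam` (λ), and the `TOY_*` data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def sentence_vector(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Mean of the word vectors of in-vocabulary tokens; zero vector if none are known."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(sentences: List[List[str]], emb: Dict[str, Vector],
               k: int, lam: float = 0.7) -> List[int]:
    """Greedy MMR: return up to k sentence indices, in selection order."""
    if not sentences or not emb:
        return []
    dim = len(next(iter(emb.values())))
    vecs = [sentence_vector(s, emb, dim) for s in sentences]
    # Assumption: relevance = similarity to the centroid of all sentence vectors.
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    relevance = [cosine(v, centroid) for v in vecs]

    selected: List[int] = []
    candidates = list(range(len(sentences)))

    def mmr_score(i: int) -> float:
        # Redundancy = highest similarity to any sentence already chosen.
        redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
        return lam * relevance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)  # ties go to the earlier sentence
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical 2-d "embeddings" standing in for a trained Word2Vec model.
TOY_EMB = {
    "word2vec": [1.0, 0.0],
    "embedding": [0.9, 0.1],
    "mmr": [0.0, 1.0],
    "diversity": [0.1, 0.9],
}
TOY_SENTENCES = [
    ["word2vec", "embedding"],  # sentence 0
    ["embedding", "word2vec"],  # sentence 1: same words as sentence 0
    ["mmr", "diversity"],       # sentence 2: a different topic
]

print(mmr_select(TOY_SENTENCES, TOY_EMB, k=2, lam=0.5))  # → [0, 2]
```

With `lam=0.5`, the second pick skips sentence 1, a near-duplicate of sentence 0, in favor of sentence 2. Raising `lam` shifts the balance toward relevance; with this toy data, `lam=0.7` already returns `[0, 1]`. In practice the word vectors would come from a trained Word2Vec model rather than a hand-written dictionary.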
Tao Xing, Zhang Xiangxian, Guo Shunli, Zhang Liman. Automatic Summarization of User-Generated Content in Academic Q&A Community Based on Word2Vec and MMR[J]. Data Analysis and Knowledge Discovery, 2020, 4(4): 109-118.
Li Yujia. Research on Information Service Mode and Service Quality Evaluation of Academic New Media[D]. Changchun: Jilin University, 2017.
Wang Baoxun. Research on the Semantic Mining of Question-Answer Pairs in Web Communities[D]. Harbin: Harbin Institute of Technology, 2013.
Rehurek R, Sojka P. Software Framework for Topic Modelling with Large Corpora[C]// Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. 2010.
Carbonell J, Goldstein J. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries[C]// Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1998: 335-336.
Xun Jing, Yang Yuzhen. Text Emotion Summarization Extraction Based on TextRank[J]. Computer Applications and Software, 2018, 35(10): 80-84.
Li A, Jiang T, Wang Q, et al. The Mixture of TextRank and LexRank Techniques of Single Document Automatic Summarization Research in Tibetan[C]// Proceedings of the 8th International Conference on Intelligent Human-Machine Systems & Cybernetics. IEEE, 2016.
Yasunaga M, Zhang R, Meelu K, et al. Graph-Based Neural Multi-Document Summarization[C]// Proceedings of the 21st Conference on Computational Natural Language Learning. 2017: 452-462.
Wang Shuai, Zhao Xiang, Li Bo, et al. TP-AS: A Two-phase Approach to Long Text Automatic Summarization[J]. Journal of Chinese Information Processing, 2018, 32(6): 71-79.
Bhargava R, Sharma Y, Sharma G. ATSSI: Abstractive Text Summarization Using Sentiment Infusion[J]. Procedia Computer Science, 2016, 89: 404-411.
Akhtar N. Hierarchical Summarization of News Tweets with Twitter-LDA[A]// Ali R, Beg M. Application of Soft Computing for the Web[M]. 2017: 83-98.
Zhang R, Li W, Gao D, et al. Automatic Twitter Topic Summarization with Speech Acts[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2013, 21(3): 649-658.
Madhawa P K K, Atukorale A S. A Robust Algorithm for Determining the Newsworthiness of Microblogs[C]// Proceedings of the 15th International Conference on Advances in ICT for Emerging Regions. IEEE, 2015: 135-139.
Su Fang, Wang Xiaoyu, Zhang Zhi. Review Summarization Generation Based on Attention Mechanism[J]. Journal of Beijing University of Posts and Telecommunications, 2018, 41(3): 7-13.
Chan W, Zhou X, Wang W, et al. Community Answer Summarization for Multi-Sentence Question with Group L1 Regularization[C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics. 2012: 582-591.
Tomasoni M, Huang M. Metadata-Aware Measures for Answer Summarization in Community Question Answering[C]// Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. 2010: 760-769.
Song H, Ren Z, Liang S, et al. Summarizing Answers in Non-Factoid Community Question-Answering[C]// Proceedings of the 10th ACM International Conference on Web Search and Data Mining. 2017: 405-414.
Omari A, Carmel D, Rokhlenko O, et al. Novelty Based Ranking of Human Answers for Community Questions[C]// Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2016: 215-224.
Mikolov T, Sutskever I, Chen K, et al. Distributed Representations of Words and Phrases and Their Compositionality[C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013: 3111-3119.
Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[OL]. arXiv Preprint, arXiv: 1301.3781.
Wang Renwu, Chen Chuanbao, Meng Xianru. Semantic Retrieval Technology of Academic Resources Based on Word Embedding Extension[J]. Library and Information Service, 2018, 62(19): 111-119.
Carbonell J, Goldstein J. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries[C]// Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 1998.
Liu Na, Lu Ying, Tang Xiaojun, et al. Multi-Document Summarization Algorithm Based on Significance Topic of LDA[J]. Journal of Frontiers of Computer Science and Technology, 2015, 9(2): 242-248.