Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 37-46     https://doi.org/10.11925/infotech.2096-3467.2019.1301
Research Paper
Improving Online Q&A Service with Deep Learning*
(Chinese title: Research on Query Suggestion for Q&A Platforms Based on Deep Learning)
Ding Heng, Li Yingxuan
School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract

[Objective] This paper builds a deep neural network model to improve query suggestion on community question-answering platforms. [Methods] We constructed an experimental dataset from Yahoo Answers and Yahoo! L6, and built a neural network model (CNMNN) from a semantic matching matrix, a variable-size convolutional layer, and a multilayer perceptron. We then compared the model with four baselines: MQ2QC, IBLM, DRMM, and MatchPyramid. [Results] Relative to the best result among the four baselines, CNMNN improved the relevance metrics nDCG@5, nDCG@10, nDCG@20, MRR, and MAP by 45.0%, 38.7%, 33.4%, 34.8%, and 52.9%, respectively. On the diversity metrics ERR-IA@5, ERR-IA@10, ERR-IA@20, α-nDCG@5, α-nDCG@10, and α-nDCG@20, it improved by 31.5%, 23.6%, 25.5%, 38.1%, 36.9%, and 30.7%, respectively. [Limitations] Although we analyzed the diversity metrics α-nDCG@k and ERR-IA@k, we did not propose a further method for diversifying the suggestion results. [Conclusions] The CNMNN model computes the semantic relevance between queries and natural language questions at the phrase level, and it avoids the feature-signal compression caused by hierarchical convolution operations.

Key words: Query Suggestion; Deep Learning; Community-based Question and Answering
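Received: 2019-12-04      Published: 2020-11-09
Chinese Library Classification (ZTFLH):  TP393
Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China Youth Science Fund project "Research on Automatic Survey Generation for Academic Literature Based on Deep Semantic Representation and Multi-Document Summarization" (71904058) and of the Fundamental Research Funds for the Central Universities project "Research on the Evolution Paths of Artificial Intelligence Algorithms Based on Dynamic Citation Networks" (KJ02072020-0200).
Corresponding author: Ding Heng     E-mail: me@gmail.com
Cite this article:
Ding Heng, Li Yingxuan. Improving Online Q&A Service with Deep Learning[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 37-46.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2019.1301      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2020/V4/I10/37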
Fig.1  Example of query suggestion on a Q&A platform
Fig.2  Structure of the CNMNN neural network model
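The abstract names three building blocks for CNMNN: a semantic matching matrix, a variable-size convolutional layer, and a multilayer perceptron. The following is only a minimal sketch of how such blocks could fit together, not the authors' implementation. It assumes cosine similarity for the matching matrix, square kernels of sizes 1, 2 and 3 applied in parallel to the raw matrix (rather than stacked on top of one another), ReLU activations, global max pooling, and random untrained weights. All function names, sizes and weights are hypothetical.

```python
import numpy as np

def matching_matrix(query_emb, question_emb):
    """Cosine similarity of every query term with every question term."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = question_emb / np.linalg.norm(question_emb, axis=1, keepdims=True)
    return q @ d.T  # shape: (query_len, question_len)

def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation; x is zero-padded if smaller than the kernel."""
    kh, kw = kernel.shape
    x = np.pad(x, ((0, max(0, kh - x.shape[0])), (0, max(0, kw - x.shape[1]))))
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def cnmnn_like_score(query_emb, question_emb, kernels, w1, b1, w2, b2):
    """Score one (query, question) pair; higher means a better match."""
    m = matching_matrix(query_emb, question_emb)
    # Assumption: each kernel size sees the raw matching matrix directly,
    # so an n-by-n kernel responds to n-term phrase matches.
    feats = np.array([np.maximum(conv2d_valid(m, k), 0.0).max() for k in kernels])
    hidden = np.maximum(feats @ w1 + b1, 0.0)  # one hidden MLP layer with ReLU
    return float(hidden @ w2 + b2)

# Hypothetical toy inputs: a 2-term query and a 6-term question, 8-dim embeddings.
rng = np.random.default_rng(0)
dim = 8
query_emb = rng.normal(size=(2, dim))
question_emb = rng.normal(size=(6, dim))
kernels = [rng.normal(size=(n, n)) for n in (1, 2, 3)]
w1, b1 = rng.normal(size=(len(kernels), 4)), np.zeros(4)
w2, b2 = rng.normal(size=4), 0.0

score = cnmnn_like_score(query_emb, question_emb, kernels, w1, b1, w2, b2)
print(score)
```

In this sketch the kernels run side by side on the same input, which is one plausible way to read the claim that the model avoids the signal compression of hierarchical convolution. A trained model would learn the kernels and MLP weights from labelled query-question pairs.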
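Score  Judgment rule
2      (C1) Users would likely use the query to search for this question
       (C2) Users would likely use a simple modification of the query (one word or phrase changed by synonym substitution) to search for this question
1      (C3) The question contains multiple asking intents and covers the intent of the query
       (C4) Users would likely use a complex modification of the query (two or more words or phrases changed by synonym substitution) to search for this question
0      (C5) The query is unrelated to the natural language question
Table 1  Scoring rules for manual relevance annotation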
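Graded labels such as the 0/1/2 scores in Table 1 are the usual input to the relevance metrics reported for this study (nDCG@k, MRR, MAP). The sketch below shows one common formulation of each metric for a single ranked list. It assumes an exponential gain of 2^rel - 1, a log2 rank discount, an ideal ranking built from the same list, and that any label above 0 counts as relevant for MRR and MAP. The exact variants used in the paper are not stated on this page, so these choices are assumptions.

```python
import math

def dcg_at_k(labels, k):
    """Discounted cumulative gain over the top-k graded labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(labels[:k], start=1))

def ndcg_at_k(labels, k):
    """DCG normalised by the DCG of the same labels sorted best-first."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(labels, k) / ideal if ideal > 0 else 0.0

def reciprocal_rank(labels):
    """1 / rank of the first item with a label above 0, else 0."""
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0

def average_precision(labels):
    """Mean of precision@rank taken at each relevant position."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean(values):
    """MRR and MAP are the means of the per-query values."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical labels for the top-5 suggestions of two queries.
runs = [[2, 0, 1, 0, 2], [0, 1, 0, 0, 0]]
print(mean(ndcg_at_k(r, 5) for r in runs))
print(mean(reciprocal_rank(r) for r in runs))
print(mean(average_precision(r) for r in runs))
```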
Method        nDCG@5  nDCG@10  nDCG@20  MRR    MAP
MQ2QC         0.448   0.458    0.502    0.468  0.310
IBLM          0.447   0.470    0.503    0.481  0.334
DRMM          0.382   0.429    0.506    0.379  0.269
MatchPyramid  0.484   0.517    0.572    0.528  0.326
CNMNN         0.702   0.717    0.763    0.712  0.511
Table 2  Comparison of relevance scores
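The improvement rates quoted in the abstract can be checked against Table 2. The sketch below assumes the rate is defined as (CNMNN - best baseline) / best baseline, taking the best baseline separately for each metric. That definition is an assumption, since this page does not spell it out. The scores are copied from Table 2.

```python
METRICS = ["nDCG@5", "nDCG@10", "nDCG@20", "MRR", "MAP"]

# Scores copied from Table 2, in the order of METRICS.
BASELINES = {
    "MQ2QC":        [0.448, 0.458, 0.502, 0.468, 0.310],
    "IBLM":         [0.447, 0.470, 0.503, 0.481, 0.334],
    "DRMM":         [0.382, 0.429, 0.506, 0.379, 0.269],
    "MatchPyramid": [0.484, 0.517, 0.572, 0.528, 0.326],
}
CNMNN = [0.702, 0.717, 0.763, 0.712, 0.511]

def improvement_over_best(baselines, ours):
    """Percentage gain of `ours` over the best baseline, metric by metric."""
    gains = []
    for i, value in enumerate(ours):
        best = max(scores[i] for scores in baselines.values())
        gains.append((value - best) / best * 100.0)
    return gains

gains = improvement_over_best(BASELINES, CNMNN)
for name, gain in zip(METRICS, gains):
    print(f"{name}: {gain:.2f}%")
```

Under this definition the computed gains agree with the rates in the abstract to within about 0.1 percentage point. The best baseline is MatchPyramid for every metric except MAP, where it is IBLM.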
Method       nDCG@5  nDCG@10  nDCG@20  MRR    MAP
CNMNN(SDF)   0.508   0.524    0.581    0.536  0.302
CNMNN(MF)    0.698   0.713    0.748    0.677  0.495
Table 3  Comparison of relevance scores
Method        ERR-IA@5  ERR-IA@10  ERR-IA@20  α-nDCG@5  α-nDCG@10  α-nDCG@20
MQ2QC         0.122     0.140      0.153      0.407     0.412      0.479
IBLM          0.124     0.144      0.151      0.417     0.447      0.464
DRMM          0.079     0.109      0.136      0.427     0.440      0.468
MatchPyramid  0.014     0.125      0.139      0.475     0.484      0.538
CNMNN         0.163     0.178      0.192      0.656     0.663      0.703
Table 4  Comparison of diversity scores
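To make the diversity scores in Table 4 more concrete, here is a minimal sketch of α-nDCG@k as it is commonly defined: an item's gain for an intent is discounted by (1 - α) for each earlier item that already covered that intent, and the total is normalised by a greedily built ideal ordering. The value α = 0.5, the greedy ideal built from the same list, and the intent annotations are all assumptions for illustration. The settings used in the paper are not given on this page.

```python
import math

def _gain(intents, seen, alpha):
    """Novelty-discounted gain of one item given intent counts seen so far."""
    return sum((1 - alpha) ** seen.get(i, 0) for i in intents)

def alpha_dcg(ranking, k, alpha=0.5):
    """ranking: list of sets; each set holds the intents an item covers."""
    seen, total = {}, 0.0
    for rank, intents in enumerate(ranking[:k], start=1):
        total += _gain(intents, seen, alpha) / math.log2(rank + 1)
        for i in intents:
            seen[i] = seen.get(i, 0) + 1
    return total

def greedy_ideal(items, k, alpha=0.5):
    """Greedy approximation of the best ordering (the exact one is hard to find)."""
    remaining, seen, ideal = list(items), {}, []
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda s: _gain(s, seen, alpha))
        remaining.remove(best)
        ideal.append(best)
        for i in best:
            seen[i] = seen.get(i, 0) + 1
    return ideal

def alpha_ndcg(ranking, k, alpha=0.5):
    ideal = alpha_dcg(greedy_ideal(ranking, k, alpha), k, alpha)
    return alpha_dcg(ranking, k, alpha) / ideal if ideal > 0 else 0.0

# Hypothetical intent labels: "def" = definition, "vio" = violations.
redundant = [{"def"}, {"def"}, {"vio"}]
diverse = [{"def"}, {"vio"}, {"def"}]
print(alpha_ndcg(redundant, 3), alpha_ndcg(diverse, 3))
```

The two lists contain the same items, but the one that covers a second intent earlier scores higher, which is the behaviour a diversity metric is meant to reward.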
Method        Query: 14th Amendment
MQ2QC         [1] what is the significance of the 14th amendment
              [2] was the 13th 14th amendments ratifed
              [3] how does the 14th amendment violate states rights
IBLM          [1] what things have violated the 14th amendment
              [2] does anyone know what the 14th amendment is
              [3] is the death penalty a violation of the 8th and 14th amendments
DRMM          [1] who made the 13th amendment
              [2] what is the importance of the third amendment
              [3] what is the 14th amendment
MatchPyramid  [1] what things have violated the 14th amendment
              [2] what exactly is the 14th amendment
              [3] who made the 13th amendment
CNMNN         [1] what is the 14th amendment
              [2] is abortion a violation of the 14th amendment to the constitution of the us
              [3] what is a 14th amendment citizen
Table 5  Top-3 query suggestion results for "14th Amendment"