Please wait a minute...
Advanced Search
数据分析与知识发现  2017, Vol. 1 Issue (12): 41-48     https://doi.org/10.11925/infotech.2096-3467.2017.0625
  研究论文 本期目录 | 过刊浏览 | 高级检索 |
基于随机游走模型的排序学习方法*
贺婉莹, 杨建林()
南京大学信息管理学院 南京 210023
江苏省数据工程与知识服务重点实验室 南京 210023
Ranking Learning Method Based on Random Walk Model
He Wanying, Yang Jianlin()
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service, Nanjing 210023, China
全文: PDF (947 KB)   HTML ( 3
输出: BibTeX | EndNote (RIS)      
摘要 

目的】通过引入随机游走模型, 解决有监督排序学习中训练数据的标记信息难以获取的问题。【方法】提出一种基于重启随机游走模型的排序学习方法, 通过游走模型完成训练数据的自动标注, 降低排序学习对标记数据的依赖性, 并在OHSUMED数据集上进行实验。【结果】当已标注样本在数据集中占比50%时, 该方法能有效完成排序学习任务, 与标注样本占比100%的排序学习算法相比, 其排序效果明显优于RankNet算法, 略低于ListNet算法。【局限】本文方法要求对每个查询单独进行随机游走, 这对实际应用中多样查询下的文档标注工作来说仍然需要花费较多精力来完成。【结论】本文方法有很好的排序学习效果, 能有效解决排序学习中训练数据的标注难题。

服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
贺婉莹
杨建林
关键词 排序学习随机游走模型半监督学习ListNet    
Abstract

[Objective] This paper tries to obtain the tagging data of training corpus for supervised ranking learning tasks. [Methods] First, we proposed a ranking learning method based on the random walk model. Then, we used this method to automatically tag the training data, which also reduced the dependency of ranking on the tags. Finally, we examined our method with the OHSUMED data set. [Results] We finished the ranking learning tasks with only half of samples tagged. Compared with algorithms based on all tagged samples, performance of the proposed method was better than the RankNet algorithm but not as good as the ListNet one. [Limitations] Our method requires a random walk for each query, which is time consuming in practice. [Conclusions] The proposed method can effectively rank the learning results of training data.

Key wordsRanking Learning    Random Walk Model    Semi-supervised Learning    ListNet
收稿日期: 2017-06-29      出版日期: 2017-12-29
ZTFLH:  G350  
基金资助:*本文系国家社会科学基金项目“社会化信息搜寻认知模型研究”(项目编号: 14BTQ049)和江苏省社会科学基金重大项目“习近平总书记构建中国特色哲学社会科学重大命题研究”(项目编号: 16ZD004)的研究成果之一
引用本文:   
贺婉莹, 杨建林. 基于随机游走模型的排序学习方法*[J]. 数据分析与知识发现, 2017, 1(12): 41-48.
He Wanying,Yang Jianlin. Ranking Learning Method Based on Random Walk Model. Data Analysis and Knowledge Discovery, 2017, 1(12): 41-48.
链接本文:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2017.0625      或      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2017/V1/I12/41
  基于RWR的半监督排序算法流程
相关性得分 查询编号 特征值1 ... 特征值44 特征值45 文档编号
0 qid:1 0.2000 ... 0.0535 0.1157 docid = 82929
1 qid:1 0.0000 ... 0.0469 0.0389 docid = 264238
2 qid:1 0.4000 ... 0.0001 0.0000 docid = 156875
0 qid:2 0.7500 ... 1.0000 1.0000 docid = 40092
... ... ... ... ... ... ...
  QueryLevelNorm的数据格式
数据集编号 p=10% p=20% p=30% p=40% p=50% p=60%
S1 0.7034±0.080 0.7490±0.0348 0.7985±0.0363 0.8638±0.0361 0.9080±0.0420 0.9408±0.0454
S2 0.6850±0.0611 0.7343±0.0353 0.7928±0.0320 0.8772±0.0355 0.9190±0.0405 0.9327±0.0479
S3 0.6436±0.0407 0.7601±0.0278 0.8272±0.0270 0.8724±0.0291 0.9026±0.0319 0.9149±0.0381
S4 0.6959±0.0425 0.7398±0.0270 0.8168±0.0264 0.8987±0.0298 0.9213±0.0358 0.9232±0.0420
S5 0.7035±0.0413 0.7469±0.0215 0.8189±0.0206 0.8503±0.0248 0.9204±0.0299 0.9216±0.0355
  不同训练样本比例下的标注准确率
文件夹 训练集 验证集 测试集
Folder1 {S1, S2, S3} S4 S5
Folder2 {S2, S3, S4} S5 S1
Folder3 {S3, S4, S5} S1 S2
Folder4 {S4, S5, S1} S2 S3
Folder5 {S5, S1, S2} S3 S4
  用于5-折交叉验证的数据划分
排序学习方法 Folder1 Folder2 Folder3 Folder4 Folder5 平均
RankNet 0.3159 0.4518 0.4316 0.4924 0.4597 0.4303
RWR+ListNet 0.2937 0.4364 0.4298 0.47 0.4563 0.4172
ListNet 0.3588 0.4536 0.4459 0.5065 0.4686 0.4467
  三种排序方法MAP值对比
  p=40%时三种排序算法在NDGG@n上的对比
  p=50%时三种排序算法在NDGG@n上的对比
[1] Tian W, Zhu F.Learning to Rank Using Semantic Features in Document Retrieval[C]//Proceedings of Wri Global Congress on Intelligent Systems. IEEE Computer Society, 2009: 500-504.
[2] Farzi S, Faili H. Improving Statistical Machine Translation Using Syntax-based Learning-to-Rank System[J]. International Journal of Electrical Power & Energy Systems, 2015, 54(11): fqv032.
doi: 10.1093/llc/fqv032
[3] Grady L.Random Walks for Image Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(11): 1768-1783.
doi: 10.1109/TPAMI.2006.233 pmid: 17063682
[4] Zhang X, He B, Luo T.Training Query Filtering for Semi-supervised Learning to Rank with Pseudo Labels[J]. World Wide Web-Internet & Web Information Systems, 2016, 19(5): 833-864.
doi: 10.1007/s11280-015-0363-z
[5] Duh K, Kirchhoff K.Learning to Rank with Partially-labeled Data[C]//Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2008: 251-258.
[6] Duh K, Kirchhoff K.Semi-supervised Ranking for Document Retrieval[J]. Computer Speech & Language, 2011, 25(2): 261-281.
doi: 10.1016/j.csl.2010.05.002
[7] Kim K H, Choi S.Incremental Learning to Rank with Partially-labeled Data[C]//Proceedings of the Workshop on Web Search Click Data. ACM, 2009: 20-27.
[8] Abdel Hady M F, Schwenker F. Combining Committee-Based Semi-Supervised Learning and Active Learning[J]. Journal of Computer Science & Technology(计算机科学技术学报(英文版)), 2010, 25(4): 681-698.
doi: 10.1007/s11390-010-1053-z
[9] Usunier N, Amini M-R, Goutte C.Multiview Semi-supervised Learning for Ranking Multilingual Documents[C]// Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, 2011: 443-458.
[10] 何海江, 龙跃进. 适应文档检索的半监督多样本排序学习算法[J]. 计算机应用, 2011, 31(11): 3108-3111.
doi: 10.3724/SP.J.1087.2011.03108
[10] (He Haijiang, Long Yuejin.Semi-supervised Learning Listwise Ranking Functions for Document Retrival[J]. Journal of Computer Applications, 2011, 31(11): 3108-3111.)
doi: 10.3724/SP.J.1087.2011.03108
[11] 高炜, 梁立, 徐天伟, 等. 半监督k-部排序算法及在本体中的应用[J]. 中北大学学报: 自然科学版, 2013, 34(2): 140-146.
[11] (Gao Wei, Liang Li, Xu Tianwei, et al.Semi-supervised k-Partite Ranking Algorithm and Application in Ontology[J]. Journal of North University of China: Natural Science Edition, 2013, 34(2): 140-146.)
[12] 缪志高. 半监督排序学习算法研究[D]. 合肥: 中国科学技术大学, 2014.
[12] (Miao Zhigao.Research on Semi-supervised Ranking Algorithms[D]. Hefei: University of Science and Technology of China, 2014.)
[13] 张新. 半监督排序学习研究[D]. 北京: 中国科学院大学, 2015.
[13] (Zhang Xin.Research on Semi-supervised Ranking Learning[D]. Beijing: University of Chinese Academy of Sciences, 2015.)
[14] 奚凌然, 王小平. 一种结合LPA 半监督学习的排序学习算法[J]. 计算机应用与软件, 2016, 33(1): 286-290.
[14] (Xi Lingran, Wang Xiaoping.A Ranking Learning Algorithm Combing Semi-supervised Learning of LPA[J]. Computer Applications and Software, 2016, 33(1): 286-290.)
[15] Cao Y, Xu J, Liu T Y, et al.Adapting Ranking SVM to Document Retrieval[C]//Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2006: 186-193.
[16] Cao Z, Qin T, Liu T Y, et al.Learning to Rank: From Pairwise Approach to Listwise Approach[C]// Proceedings of the 24th International Conference on Machine Learning. ACM, 2007: 129-136.
[17] Grady L.Random Walks for Image Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(11): 1768-1783.
doi: 10.1109/TPAMI.2006.233 pmid: 17063682
[18] Tong H, Faloutsos C, Pan J Y.Fast Random Walk with Restart and Its Applications[C]//Proceedings of the International Conference on Data Mining. IEEE Computer Society, 2006: 613-622.
[19] Wang H, Huang H, Ding C.Image Annotation Using Bi-relational Graph of Images and Semantic Labels[C]// Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2011: 793-800.
[20] Szummer M, Jaakkola T.Partially Labeled Classification with Markov Random Walks[C]// Proceedings of the International Conference on Neural Information Processing Systems: Natural and Synthetic. MIT Press, 2001:945-952.
[21] 郑伟, 王朝坤, 刘璋, 等. 一种基于随机游走模型的多标签分类算法[C]. 见: NDBC 2010中国数据库学术会议集. 2010: 1418-1426.
[21] (Zheng Wei, Wang Chaokun, Liu Zhang, et al.A Multi-label Classfication Algorithm Based on Random Walk Model[C]// Proceedings of the NDBC 2010 National Database Conference. 2010:1418-1426.)
[22] Belkin M, Niyogi P, Sindhwani V.On Manifold Regularization[C]// Proceedings of the AISTATS. 2005.
[23] Qin T, Liu T Y, Xu J, et al.LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval[J]. Information Retrieval, 2010, 13(4): 346-374.
doi: 10.1007/s10791-009-9123-y
[24] Burges C, Shaked T, Renshaw E, et al.Learning To Rank Using Gradient Descent[C]//Proceedings of International Conference on Machine Learning. ACM, 2005:89-96.
[1] 刘彤,刘琛,倪维健. 多层次数据增强的半监督中文情感分析方法*[J]. 数据分析与知识发现, 2021, 5(5): 51-58.
[2] 张倩, 刘怀亮. 一种基于半监督学习的短文本分类方法[J]. 现代图书情报技术, 2013, 29(2): 30-35.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 2015 《数据分析与知识发现》编辑部
地址:北京市海淀区中关村北四环西路33号 邮编:100190
电话/传真:(010)82626611-6626,82624938
E-mail:jishu@mail.las.ac.cn