Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 4: 13-24. https://doi.org/10.11925/infotech.2096-3467.2020.1164
Section: Reviews
Research Trends of Information Retrieval: Case Study of SIGIR Conference Papers
Li Yueyan, Wang Hao, Deng Sanhong, Wang Wei
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering & Knowledge Service, Nanjing 210023, China
Abstract

[Objective] This paper identifies the research hotspots and evolution trends of information retrieval (IR) over the past decade, aiming to give researchers in the field a timely reference and to promote interdisciplinary integration and the rapid application of IR technologies. [Methods] We used the papers accepted by the SIGIR Annual Conference from 2008 to 2019 as the data source. First, we applied the LDA model to identify and generate topics. Second, we filtered out marginal papers based on the similarity between documents and topics, and assigned papers to one or more topics by calculating their topic discrimination. Third, we constructed the evolution paths of domain topics over time, which show increasing, decreasing and stable patterns. Finally, we used modularity-based community detection to build the fine-grained evolution path of a single topic, revealing the dynamic evolution of knowledge units within the topic community. [Results] The proposed method prevents marginal documents from interfering with topic identification and evolution paths, and the multi-topic assignment of documents helps reveal the cross-fusion among topics. Current IR research is user-centric, keeps optimizing retrieval models, and focuses on filtering and recommendation and on Semantic Web technologies; deep learning methods are widely used, and application domains such as medical and health information are receiving growing attention. [Limitations] The thresholds used to filter marginal documents and to assign documents to multiple topics are set somewhat subjectively. [Conclusions] Intelligent information services are becoming the norm, and users' demand for information retrieval is becoming more prominent.
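The abstract does not give the formulas behind the filtering and multi-topic steps, so the following is only one plausible reading under stated assumptions. `theta` is assumed to be the LDA document-topic matrix; a document counts as marginal when its largest topic weight falls below a hypothetical threshold `min_sim`; and "topic discrimination" is approximated by a hypothetical rule that also assigns every topic whose weight reaches `ratio` times the dominant weight. Both thresholds are illustrative values, which is the kind of subjectivity the [Limitations] note refers to.

```python
from typing import Dict, List, Sequence


def assign_topics(
    theta: Sequence[Sequence[float]],
    min_sim: float = 0.2,  # hypothetical cut-off for "marginal" documents
    ratio: float = 0.5,    # hypothetical stand-in for the topic-discrimination rule
) -> Dict[int, List[int]]:
    """Return {document index: assigned topic indices} for non-marginal documents.

    theta[d][k] is the LDA probability of topic k in document d.
    """
    assignments: Dict[int, List[int]] = {}
    for d, weights in enumerate(theta):
        # Rank topics from strongest to weakest for this document.
        ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
        top = weights[ranked[0]]
        if top < min_sim:
            # No topic dominates: drop the document as marginal.
            continue
        # Keep the dominant topic plus any topic whose weight is close to it.
        assignments[d] = [k for k in ranked if weights[k] >= ratio * top]
    return assignments


theta = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],  # clearly about topic 0
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],  # shared between topics 0 and 1
    [0.18, 0.17, 0.17, 0.16, 0.16, 0.16],  # no dominant topic
]
print(assign_topics(theta))  # {0: [0], 1: [0, 1]}
```

Raising `min_sim` removes more borderline papers, while lowering `ratio` lets more papers count toward several topics at once, which is what makes cross-topic overlap visible.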

Keywords: Information Retrieval; LDA; Social Network Analysis; Topic Evolution
Received: 2020-11-25; Published: 2020-12-15
Chinese Library Classification (CLC) number: G250
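Funding: This work is supported by the General Program of the National Natural Science Foundation of China (No. 72074108) and is one of the research outcomes of a project funded by the Fundamental Research Funds for the Central Universities (No. 010814370113).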
Corresponding author: Wang Hao, E-mail: ywhaowang@nju.edu.cn
Cite this article:
Li Yueyan, Wang Hao, Deng Sanhong, Wang Wei. Research Trends of Information Retrieval: Case Study of SIGIR Conference Papers. Data Analysis and Knowledge Discovery, 2021, 5(4): 13-24.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2020.1164 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I4/13
Fig. 1 Topic perplexity
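Fig. 1 presumably plots perplexity against candidate numbers of topics, a common way to choose K for LDA. As a reminder of what such a curve measures, here is a minimal sketch of the standard held-out perplexity, exp(-Σ_d log p(w_d) / Σ_d N_d), where each token's probability is p(w | d) = Σ_k θ_dk φ_kw. The inputs (`docs` as lists of word ids, `theta`, `phi`) are assumptions for illustration, not the paper's data or code.

```python
import math
from typing import Sequence


def perplexity(
    docs: Sequence[Sequence[int]],
    theta: Sequence[Sequence[float]],
    phi: Sequence[Sequence[float]],
) -> float:
    """Perplexity = exp(-(sum of log p(w | d) over all tokens) / number of tokens).

    docs[d] is a list of word ids, theta[d][k] a document-topic weight and
    phi[k][w] a topic-word probability.
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Mixture probability of word w under document d's topic proportions.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p)
            n_tokens += 1
    return math.exp(-log_likelihood / n_tokens)


# A uniform model over a 4-word vocabulary is as uncertain as a 4-way choice.
uniform_phi = [[0.25] * 4, [0.25] * 4]
print(perplexity([[0, 1, 2, 3]], [[0.5, 0.5]], uniform_phi))  # ≈ 4.0
```

A model that spreads probability evenly over a vocabulary of V words has perplexity V, so lower values mean the model is less "surprised" by held-out text.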
Fig. 2 Topic distance
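The measure behind the topic distance in Fig. 2 is not stated here. One widely used option, assumed in this sketch, is the average pairwise cosine similarity between topic-word distributions: the lower it is, the more distinct the topics, so 1 minus that value can act as a topic distance. `phi` is assumed to be a list of topic-word probability vectors, and `mean_topic_distance` is a hypothetical helper name.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_topic_similarity(phi: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all pairs of topic-word vectors (needs >= 2 topics)."""
    pairs = list(combinations(range(len(phi)), 2))
    return sum(cosine(phi[i], phi[j]) for i, j in pairs) / len(pairs)


def mean_topic_distance(phi: Sequence[Sequence[float]]) -> float:
    """Hypothetical distance: 1 minus the average pairwise cosine similarity."""
    return 1.0 - mean_topic_similarity(phi)


print(mean_topic_distance([[1.0, 0.0], [0.0, 1.0]]))  # 1.0: fully distinct topics
print(mean_topic_distance([[0.5, 0.5], [0.5, 0.5]]))  # ≈ 0.0: identical topics
```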
Table 1 Topic-term distribution
No. | Topic ID | Topic label | Top terms
1 | topic1 | Mining and modeling search activity | search user web engine behaviour session
2 | topic2 | Learning to rank and ranking models | rank learning model feature train algorithm
3 | topic3 | Term representation | term model retrieval document query weight
4 | topic4 | Filtering and recommendation | recommendation user item model collaborative system
5 | topic5 | Interactive information retrieval | system tutorial user application interface interactive
6 | topic6 | Cross-language information retrieval | language cross natural translation processing wikipedia
7 | topic7 | Retrieval evaluation | collection test system evaluation performance effectiveness
8 | topic8 | Deep learning | network model neural learn representation embed
9 | topic9 | Web search | click model advertisement privacy rate online
10 | topic10 | Image search | image tag visual annotation video content
11 | topic11 | Social search | social user news medium twitter content
12 | topic12 | Question answering systems | question answer expert community expertise collaborative
13 | topic13 | Queries and query analysis | query suggestion completion context auto log
14 | topic14 | Classification | model text classification semantic representation document
15 | topic15 | Search result diversification | document diversity search aspect diversification rank
16 | topic16 | * | topic hash random similarity code walk
17 | topic17 | Efficiency and scalability of retrieval | search time engine efficiency algorithm index
18 | topic18 | Clustering | time cluster feedback temporal tweet pseudo
19 | topic19 | Semantic Web information retrieval | entity knowledge link graph base recognition
20 | topic20 | Evaluation metrics | metric measure evaluation gain framework discount
21 | topic21 | Document summarization | document sentence summarization summary multi level
22 | topic22 | Sentiment analysis | web sentiment location opinion review mining
23 | topic23 | Music retrieval | music passage sequence detection local similarity
24 | topic24 | Relevance assessment | judgment assessment crowdsourcing assessor label distribution
25 | topic25 | Medical information search | medical match content video keyword domain
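Each row of Table 1 lists six high-probability terms for one topic. Assuming a topic-word matrix `phi` (one probability row per topic) and a `vocab` list mapping word ids to strings, a minimal sketch of how such term lists are typically read off an LDA model:

```python
from typing import List, Sequence


def top_terms(phi: Sequence[Sequence[float]], vocab: Sequence[str], n: int = 6) -> List[List[str]]:
    """For each topic, return its n most probable terms, highest first."""
    result = []
    for row in phi:
        # Sort word ids by their probability in this topic, descending.
        ranked = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
        result.append([vocab[w] for w in ranked[:n]])
    return result


vocab = ["query", "user", "rank", "image"]
phi = [
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
]
print(top_terms(phi, vocab, n=2))  # [['query', 'user'], ['image', 'rank']]
```

This step only ranks terms; the descriptive labels in the third column of the table are not produced by it.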
Fig. 3 Topics with an increasing trend
Fig. 4 Topics with a decreasing trend
Fig. 5 Topics with a stable trend
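Figs. 3-5 group topics by whether their share rose, fell, or held steady over 2008-2019. The classification rule is not spelled out here, so this sketch makes two hypothetical choices: a topic's yearly strength is the mean document-topic weight of that year's papers, and the trend is the sign of a least-squares slope, with a hypothetical tolerance `tol` defining "stable".

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def yearly_strength(
    doc_years: Sequence[int], theta: Sequence[Sequence[float]], k: int
) -> Tuple[List[int], List[float]]:
    """Hypothetical topic strength: mean weight of topic k over each year's papers."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for year, weights in zip(doc_years, theta):
        totals[year] += weights[k]
        counts[year] += 1
    years = sorted(totals)
    return years, [totals[y] / counts[y] for y in years]


def trend(years: Sequence[int], strength: Sequence[float], tol: float = 0.002) -> str:
    """Label a series by the sign of its least-squares slope; tol is hypothetical."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(strength) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, strength))
    slope = sxy / sxx  # requires at least two distinct years
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stable"


years = [2008, 2009, 2010, 2011]
print(trend(years, [0.02, 0.04, 0.06, 0.08]))  # increasing
print(trend(years, [0.08, 0.06, 0.04, 0.02]))  # decreasing
print(trend(years, [0.05, 0.05, 0.05, 0.05]))  # stable
```

A regression slope is less sensitive to a single unusual year than comparing only the first and last values, but where to draw the "stable" band remains a judgment call.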
Fig. 6 Heat map of topic popularity
Fig. 7 Dynamic evolution of knowledge structure units within the "Filtering and recommendation" topic community
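Fig. 7 traces communities of knowledge units inside one topic, found through modularity-based community detection. Modularity (Newman, 2006) scores a partition of a network as Q = Σ_c [L_c/m - (d_c/2m)^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. This sketch assumes an unweighted, undirected term co-occurrence network given as an edge list without self-loops, and it only evaluates Q for a given partition; search algorithms such as Louvain look for the partition that maximizes it.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def modularity(edges: Sequence[Tuple[Hashable, Hashable]], community: Dict[Hashable, int]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted,
    undirected graph without self-loops, given as an edge list."""
    m = len(edges)
    internal: Dict[int, int] = defaultdict(int)  # L_c: edges inside community c
    degree: Dict[int, int] = defaultdict(int)    # d_c: summed degree of c's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles of co-occurring terms joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(edges, community))  # ≈ 0.357 (= 5/14)
```

Putting every node in one community gives Q = 0, so a clearly positive score indicates that the split into two term clusters captures real structure.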