A Method for Detecting the Hot Topic of Literature Based on Lifecycle: A Case Study of Neoplasm Field

New Technology of Library and Information Service (现代图书情报技术)

2012, Issue (11): 86-91 https://doi.org/10.11925/infotech.1003-3513.2012.11.14

Information Analysis and Research


Zhao Yingguang (赵迎光), An Xinying (安新颖), Li Yong (李勇), Jia Xiaofeng (贾晓峰)

Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China


Abstract: Existing methods for detecting hot topics in literature rely on a single indicator and filter high-frequency common words poorly. To address these problems, this paper applies lifecycle theory from the TDT (Topic Detection and Tracking) field, together with the TF*PDF method, to hot topic detection in literature. The method tracks each term's rate of change over time to find hot terms and to determine the specific time at which each hot topic appears. Experimental results show that the method filters out high-frequency common words effectively and achieves a high recognition rate for the research hot topics in each time window.

Key words: Lifecycle theory, Hot topic detection, Text mining
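The approach summarized in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so this is not the authors' implementation. It assumes the commonly cited TF*PDF form (normalized term frequency multiplied by the exponential of the share of documents containing the term), treats each time window as a single channel, and uses a simple relative growth rate between consecutive windows as a stand-in for the lifecycle model. The function names `tfpdf_weights` and `hot_terms` and the threshold `min_growth` are hypothetical.

```python
import math
from collections import Counter


def tfpdf_weights(docs):
    """TF*PDF-style weight of each term within one time window.

    docs: list of token lists, one per document in the window.
    Assumption: the window is treated as a single TF*PDF channel.
    """
    tf = Counter(tok for doc in docs for tok in doc)        # term frequency
    df = Counter(tok for doc in docs for tok in set(doc))   # document frequency
    norm = math.sqrt(sum(f * f for f in tf.values())) or 1.0
    n_docs = len(docs)
    return {t: (tf[t] / norm) * math.exp(df[t] / n_docs) for t in tf}


def hot_terms(windows, min_growth=1.0, eps=1e-9):
    """Flag terms whose weight grows sharply from one window to the next.

    windows: ordered list of (label, docs) pairs.
    min_growth: hypothetical threshold on relative growth (1.0 = doubling).
    Returns {label: sorted list of hot terms first flagged in that window}.
    """
    hot = {}
    prev = {}
    for i, (label, docs) in enumerate(windows):
        cur = tfpdf_weights(docs)
        if i > 0:
            hot[label] = sorted(
                t for t, w in cur.items()
                if (w - prev.get(t, 0.0)) / (prev.get(t, 0.0) + eps) >= min_growth
            )
        prev = cur
    return hot


# Toy data (invented for illustration, not from the paper).
windows = [
    ("2009", [["cancer", "therapy"], ["cancer", "gene"]]),
    ("2010", [["cancer", "microrna"], ["cancer", "microrna"], ["microrna", "gene"]]),
]
result = hot_terms(windows)
print(result)
```

In this toy input, "cancer" is frequent in both windows, so its weight does not grow and it is not flagged, while a term that first appears in the later window is. This mirrors, in a simplified way, the abstract's point that tracking change over time filters out high-frequency common words and dates the emergence of a hot term to a specific window.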

Received: 2012-10-29; Published: 2013-02-06

CLC number: G250

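Funding: This work is one of the research outputs of the National Science and Technology Support Program of the 12th Five-Year Plan project "STKOS-Based Application Demonstration of Science and Technology Monitoring" (No. 2011BAH10B06-02); the Basic Scientific Research Fund for Central Public-Interest Research Institutes project of the Institute of Medical Information, Chinese Academy of Medical Sciences, "Research on Hot Topic Identification Based on Automatic Threshold Setting" (No. 12R0118); and the Ministry of Education Humanities and Social Sciences Youth Fund project "Research on Monitoring New Topics in Scientific and Technical Literature Based on Knowledge Organization Systems" (No. 11YJC870001).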

Corresponding author: Zhao Yingguang E-mail: zhao.yingguang@imicams.ac.cn

Cite this article:

Zhao Yingguang, An Xinying, Li Yong, Jia Xiaofeng. A Method for Detecting the Hot Topic of Literature Based on Lifecycle: A Case Study of Neoplasm Field[J]. New Technology of Library and Information Service, 2012, (11): 86-91.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2012.11.14 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2012/V/I11/86

