A Method for Detecting the Hot Topic of Literature Based on Lifecycle: A Case Study of Neoplasm Field
Zhao Yingguang, An Xinying, Li Yong, Jia Xiaofeng
Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China
|
|
Abstract: Existing approaches to hot topic detection in literature have shortcomings, such as reliance on a single index and inefficient filtering of high-frequency common words. This paper applies lifecycle theory and the TF*PDF algorithm to literature detection: it finds hot words by tracking the variation of words over time, then locates the time at which those hot words appeared. The results of empirical tests show that this approach is effective in filtering high-frequency common terms and in identifying hot research topics within time windows.
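To make the TF*PDF weighting mentioned in the abstract concrete, the following is a minimal sketch of the weight as it is commonly stated for the original news-oriented algorithm: a term's normalized frequency within each channel, multiplied by the exponential of the share of that channel's documents containing the term, summed over channels. Treating each time window as a channel is an assumption made here for illustration, as are the function name `tf_pdf` and the toy data; the paper's own adaptation and its lifecycle step are not reproduced.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF*PDF-style weights.

    channels: dict mapping a channel label (assumed here to be a time
    window) to a list of documents, each document being a list of terms.
    Returns a dict mapping each term to its summed weight.
    """
    weights = Counter()
    for docs in channels.values():
        if not docs:
            continue
        term_freq = Counter()  # total occurrences of each term in the channel
        doc_freq = Counter()   # number of documents containing each term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize term frequencies by the channel's Euclidean norm.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        n_docs = len(docs)
        for term, freq in term_freq.items():
            # Normalized frequency times exp(proportional document frequency).
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Hypothetical toy corpus: two yearly windows of keyword lists.
toy = {
    "2010": [["mirna", "cancer"], ["cancer"]],
    "2011": [["mirna", "stem"], ["mirna"]],
}
w = tf_pdf(toy)
ranked = sorted(w, key=w.get, reverse=True)
```

In this toy corpus, `mirna` appears in both windows and so accumulates weight from each, which places it first in `ranked`. Computing the weights separately per window, rather than summing them, would give the kind of time series that a lifecycle analysis could then track.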
Received: 29 October 2012
Published: 06 February 2013