Detection Method of Latent Burst Word Based on the Clue of Energy Evolution
Hong Na1, Zhang Zhixiong2, Le Xiaoqiu2
1. Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China;
2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
This article analyzes the feasibility of detecting latent burst words by tracking clues in their energy evolution, and proposes a method based on word energy and energy evolution trends. First, it describes the life cycle and evolution process of words. Then, based on an analysis of energy accumulation and decay and of energy change trends, it sets out the basis for the model and establishes the EneTr model to detect latent burst words. In addition, it proposes corresponding solutions to the key problems of EneTr and implements the algorithm. Finally, the model is validated experimentally on two different document streams: Web news and scientific literature.
洪娜, 张智雄, 乐小虬. 基于能量演化线索的潜在爆发词探测方法[J]. 现代图书情报技术, 2010, 26(11): 45-52.
Hong Na, Zhang Zhixiong, Le Xiaoqiu. Detection Method of Latent Burst Word Based on the Clue of Energy Evolution. New Technology of Library and Information Service, 2010, 26(11): 45-52.