Detection Method of Latent Burst Word Based on the Clue of Energy Evolution
Hong Na¹, Zhang Zhixiong², Le Xiaoqiu²
1. Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China;
2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Abstract This article analyzes the feasibility of detecting latent burst words by tracking clues in their energy evolution, and proposes a method based on word energy and energy evolution trends. First, it describes the life cycle and evolution process of words. Then, based on an analysis of energy accumulation and decay and of energy change trends, it sets out the evidence supporting the model and establishes the EneTr model to detect latent burst words. It also proposes corresponding solutions to the key problems of EneTr and implements the algorithm. Finally, the model is validated experimentally on two different document streams: Web news and scientific literature.
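The abstract does not give the formulas of EneTr, so the following is only an illustrative sketch of the general idea of energy accumulation, decay, and trend tracking, not the authors' model. The function names, the linear gain and decay rule, and all parameter values (`gain`, `decay`, `window`, `min_growth`, `burst_level`) are assumptions made for this example.

```python
from typing import List


def energy_series(freqs: List[float], gain: float = 1.0,
                  decay: float = 0.5) -> List[float]:
    """Hypothetical energy curve for one word.

    In each period the previous energy decays by the fraction `decay`,
    and the word's frequency in that period adds `gain * f` new energy.
    """
    energy = 0.0
    series = []
    for f in freqs:
        energy = max(0.0, energy * (1.0 - decay) + gain * f)
        series.append(energy)
    return series


def is_latent_burst(freqs: List[float], window: int = 3,
                    min_growth: float = 0.5,
                    burst_level: float = 10.0) -> bool:
    """Flag a word whose energy keeps rising but has not yet burst.

    Assumed rule: energy must grow by more than `min_growth` in each of
    the last `window` periods while staying below `burst_level`.
    """
    energy = energy_series(freqs)
    if len(energy) < window + 1:
        return False  # not enough history to judge a trend
    recent = energy[-(window + 1):]
    rising = all(b - a > min_growth for a, b in zip(recent, recent[1:]))
    return rising and energy[-1] < burst_level


if __name__ == "__main__":
    # Per-period frequencies of three made-up words.
    print(is_latent_burst([1, 1, 2, 3, 5]))       # steadily growing word
    print(is_latent_burst([3, 3, 3, 3, 3]))       # stable word
    print(is_latent_burst([10, 20, 40, 80, 160]))  # word already bursting
```

Under these assumptions, a word with a flat frequency settles at a constant energy level and is not flagged, while a word whose frequency keeps growing from a low base is flagged before it crosses the burst threshold.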
Received: 12 October 2010
Published: 04 January 2011