New Technology of Library and Information Service (现代图书情报技术)  2010, Vol. 26 Issue (11): 45-52     https://doi.org/10.11925/infotech.1003-3513.2010.11.07
Knowledge Organization and Knowledge Management
Detection Method of Latent Burst Word Based on the Clue of Energy Evolution (基于能量演化线索的潜在爆发词探测方法)
Hong Na (洪娜)1, Zhang Zhixiong (张智雄)2, Le Xiaoqiu (乐小虬)2
1. Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing 100020, China;
2. National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Abstract

This article analyzes the feasibility of detecting latent burst words by tracking clues in the evolution of word energy, and proposes a detection method based on a word's energy and its energy growth trend. It first describes the life cycle of words and how they evolve. Then, building on a methodological analysis and on an analysis of energy accumulation and decay and of changes in the energy trend, it sets out the modeling basis and designs the EneTr model for detecting latent burst words. It also proposes solutions to the key problems in the EneTr model and implements the corresponding algorithm. Finally, the method is validated through analysis and experiments on two types of document streams: Web news and scientific literature.
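
The abstract does not give the EneTr formulas, so the following is only a minimal sketch of the general idea of scoring a word by accumulated energy and by the trend of that energy. Everything in it is an assumption made for illustration: the exponential decay update, the least-squares slope as the trend measure, the function names, and the parameters `decay`, `window`, `burst_energy`, and `min_slope`. It should not be read as the authors' model.

```python
from typing import Dict, List


def energy_series(freqs: List[float], decay: float = 0.8) -> List[float]:
    """Accumulate per-period frequency into energy; old energy decays each period.

    Assumed update rule (not from the paper): E_t = decay * E_{t-1} + f_t.
    """
    energy, out = 0.0, []
    for f in freqs:
        energy = decay * energy + f
        out.append(energy)
    return out


def trend(values: List[float], window: int = 3) -> float:
    """Least-squares slope of the last `window` values (0.0 if fewer than 2 points)."""
    tail = values[-window:]
    n = len(tail)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(tail) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(tail))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def latent_burst_candidates(
    counts: Dict[str, List[float]],
    decay: float = 0.8,
    window: int = 3,
    burst_energy: float = 10.0,  # hypothetical level above which a word counts as already bursting
    min_slope: float = 1.0,      # hypothetical minimum energy growth per period
) -> List[str]:
    """Return words whose energy is still below the burst level but is rising quickly."""
    candidates = []
    for word, freqs in counts.items():
        e = energy_series(freqs, decay)
        if e and e[-1] < burst_energy and trend(e, window) >= min_slope:
            candidates.append(word)
    return sorted(candidates)


# Toy per-period frequencies for three hypothetical words.
counts = {
    "rising": [0, 1, 2, 4],
    "flat": [1, 1, 1, 1],
    "bursting": [5, 10, 20, 40],
}
print(latent_burst_candidates(counts))
```

In this toy data only the word whose energy is still low but climbing is flagged. The steadily used word grows too slowly, and the word that has already exceeded the assumed burst level is no longer "latent". Real thresholds would have to be tuned per document stream.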

Key words: Time series; Burst words; Latent burst words; Energy
Received: 2010-10-12      Published: 2011-01-04
CLC number: G250.76

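Funding:

This work is one of the research results of the National Social Science Fund of China project "Research on Monitoring and Analysis Methods for Burst Topics in Web-based Scientific and Technical Information" (Grant No. 09BTQ035).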

Cite this article:
Hong Na, Zhang Zhixiong, Le Xiaoqiu. Detection Method of Latent Burst Word Based on the Clue of Energy Evolution. New Technology of Library and Information Service, 2010, 26(11): 45-52.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2010.11.07      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2010/V26/I11/45


