Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 200-206    DOI: 10.11925/infotech.2096-3467.2019.0634
Tracking Static Topics with Bayesian Network
Xu Jianmin, Zhang Liqing, Wang Miao
School of Cyber Security and Computer, Hebei University, Baoding 071002, China
Abstract  

[Objective] This paper analyzes the feasibility of using Bayesian networks for topic tracking and proposes a new method to improve tracking performance. [Methods] We constructed two topic tracking models, one based on a Bayesian network (BNTT) and the other on an extended Bayesian network (E_BNTT). Nodes in the models represent terms, events and topics, and arcs represent the relationships among nodes. We then calculated the similarity among topics, events and reports with the Propagation and Evaluation method. [Results] On the TDT4 data set, the DET curve of the Bayesian network model lay below that of the vector space topic model (VSM), which indicates better tracking performance. The extended Bayesian network topic tracking model performed 1.7% better than the Bayesian network model. [Limitations] The extended Bayesian network topic tracking model is a static topic model, whereas events are generated by the evolution of topics, so its performance improvement is limited. [Conclusions] The new models describe the structural relationships among topics, events and stories and support probabilistic inference, which effectively improves topic tracking performance.

Key words: Bayesian Network, Topic Tracking, Event, Static Topic Model
Received: 10 June 2019      Published: 26 April 2020
ZTFLH:  TP391.1  
Corresponding Authors: Jianmin Xu     E-mail: hbuxjm@hbu.edu.cn

Cite this article:

Xu Jianmin, Zhang Liqing, Wang Miao. Tracking Static Topics with Bayesian Network. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 200-206.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0634     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I2/3/200

Bayesian Network
BNTT Model
E_BNTT Model
                      Actually "Yes"    Actually "No"
Model judges "Yes"    a                 b
Model judges "No"     c                 d
Parameters Description
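
A minimal sketch of how the four counts in the table above are usually turned into error rates, assuming the common topic-tracking reading in which a miss is an on-topic story the model rejects (c) and a false alarm is an off-topic story the model accepts (b). The function name and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Tuple


def error_rates(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Return (P_miss, P_fa) from the contingency counts.

    a: on-topic stories the model accepts
    b: off-topic stories the model accepts (false alarms)
    c: on-topic stories the model rejects (misses)
    d: off-topic stories the model rejects
    """
    on_topic = a + c
    off_topic = b + d
    # Guard against empty classes so the sketch never divides by zero.
    p_miss = c / on_topic if on_topic else 0.0
    p_fa = b / off_topic if off_topic else 0.0
    return p_miss, p_fa


# Hypothetical counts, for illustration only.
p_miss, p_fa = error_rates(a=90, b=5, c=10, d=895)
print(f"P_miss={p_miss:.5f}  P_fa={p_fa:.5f}")
```

Both rates lie between 0 and 1, and lower values are better for each.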
δ       Pmiss      Pfa        optimal((Cdet)norm)
0.05    0.09346    0.01281    0.15621
0.10    0.07477    0.01315    0.13922
0.15    0.06542    0.01558    0.14174
0.20    0.06231    0.01800    0.15050
0.25    0.09657    0.01558    0.17290
0.30    0.09346    0.01661    0.17487
0.35    0.11526    0.02008    0.21364
Performance of E_BNTT Model with Different Values of Parameter δ
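
The cost column above can be approximately reproduced from Pmiss and Pfa. The sketch below assumes the standard TDT evaluation constants (C_miss = 1, C_fa = 0.1, P_target = 0.02), which this page does not state. Under that assumption the normalized cost reduces to Pmiss + 4.9 × Pfa. The function and variable names are illustrative.

```python
# Assumed TDT evaluation constants; they are not given on this page.
C_MISS = 1.0
C_FA = 0.1
P_TARGET = 0.02


def normalized_cost(p_miss: float, p_fa: float) -> float:
    """Normalized detection cost (Cdet)norm under the assumed constants."""
    cost = C_MISS * p_miss * P_TARGET + C_FA * p_fa * (1.0 - P_TARGET)
    # Normalize by the cheaper of the two trivial systems
    # (reject every story, or accept every story).
    return cost / min(C_MISS * P_TARGET, C_FA * (1.0 - P_TARGET))


# (Pmiss, Pfa) for each δ, copied from the table above.
rows = {
    0.05: (0.09346, 0.01281),
    0.10: (0.07477, 0.01315),
    0.15: (0.06542, 0.01558),
    0.20: (0.06231, 0.01800),
    0.25: (0.09657, 0.01558),
    0.30: (0.09346, 0.01661),
    0.35: (0.11526, 0.02008),
}

costs = {delta: normalized_cost(pm, pf) for delta, (pm, pf) in rows.items()}
best_delta = min(costs, key=costs.get)  # δ with the lowest cost

for delta, cost in costs.items():
    print(f"delta={delta:.2f}  cost={cost:.5f}")
print("best delta:", best_delta)
```

Lower cost is better, so under these assumptions the minimum falls at δ = 0.10, in line with the smallest value in the table's cost column.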
Performance of BNTT and VSM
Performance / Model    BNTT       E_BNTT
Pmiss                  0.09346    0.06542
Pfa                    0.01281    0.01558
optimal((Cdet)norm)    0.15621    0.13922
Performance of BNTT and E_BNTT