New Technology of Library and Information Service, 2016, Vol. 32, Issue 10: 33-41     https://doi.org/10.11925/infotech.1003-3513.2016.10.04
Research Paper
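Automatic Construction and Evolution of a “Feature Items Ontology” for Dynamic Trending Topics*
Ma Jing, He Xuefeng, Jian Xuwen
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China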
Automatically Building “Feature Items Ontology” for Trending Topics
Ma Jing, He Xuefeng, Jian Xuwen
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
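Abstract (translated from the Chinese version)

[Objective] To design an algorithm that automatically constructs and evolves a “Feature Items Ontology”. [Context] Trending topics emerge and evolve quickly and span many domains. Existing research on automatic ontology construction is limited to knowledge representation in specific domains, so it cannot provide effective ontology-based semantic support for such dynamic trending topics, nor can it track and optimize them effectively. [Methods] A trending topic is described by a “Feature Items Ontology”, which is composed of combinations of feature words obtained by analyzing the content of the topic's key events, and an algorithm is designed to generate this ontology quickly and automatically. Guided by the initial ontology, an evolution algorithm is then designed that uses topic tracking results to meet the continually changing needs of topic semantic representation. [Results] For the trending topic “Wei Zexi Baidu promotion incident”, a crawler was used to collect 11,174 Sina Weibo posts as the experimental corpus. The extracted initial ontology contained 7,421 feature items, 39 feature word nodes, and 781 feature word relations. Based on the topic tracking results, it evolved into an ontology with 24,564 feature items, 67 feature word nodes, and 1,818 feature word relations. Its miss rate, false alarm rate, and loss cost were 0.1261, 0.0964, and 0.5985, respectively, outperforming the TF-IDF algorithm. [Conclusions] Representing a topic with a “Feature Items Ontology” is markedly more accurate than a single-word ontology representation, makes semantic similarity easier to compute, and is well suited to fast semantic processing of dynamic trending topics.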

Keywords: Feature items; Ontology generation; Ontology evolution; Topic tracking
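The abstract describes the ontology only in terms of feature items, feature word nodes, and feature word relations, without giving their exact definitions. The following is a minimal sketch under stated assumptions, not the authors' algorithm: a feature item is taken to be the set of feature words found in one post, and a relation is taken to be a co-occurrence of two feature words within an item. The names `build_ontology` and `evolve`, the `min_words` threshold, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_ontology(posts, feature_words, min_words=2):
    """Toy 'feature items ontology': items, nodes and co-occurrence relations.

    Assumption: a feature item is the set of feature words found in one post,
    and a relation links two feature words that share an item. The paper's
    actual definitions are not given in the abstract.
    """
    items = Counter()
    relations = Counter()
    for tokens in posts:
        item = frozenset(t for t in tokens if t in feature_words)
        if len(item) < min_words:
            continue  # too few feature words to form a combination
        items[item] += 1
        for a, b in combinations(sorted(item), 2):
            relations[(a, b)] += 1
    nodes = {w for item in items for w in item}
    return {"items": items, "nodes": nodes, "relations": relations}


def evolve(ontology, tracked_posts, new_feature_words):
    """Hypothetical evolution step: fold posts accepted by topic tracking
    into the existing ontology, allowing new feature words to enter."""
    vocab = ontology["nodes"] | set(new_feature_words)
    update = build_ontology(tracked_posts, vocab)
    return {
        "items": ontology["items"] + update["items"],
        "nodes": ontology["nodes"] | update["nodes"],
        "relations": ontology["relations"] + update["relations"],
    }


initial = build_ontology(
    posts=[["baidu", "hospital", "ad"], ["baidu", "ad"], ["weather"]],
    feature_words={"baidu", "hospital", "ad"},
)
evolved = evolve(
    initial,
    tracked_posts=[["baidu", "regulator", "ad"]],
    new_feature_words={"regulator"},
)
print(len(initial["nodes"]), len(initial["relations"]))
print(len(evolved["nodes"]), len(evolved["relations"]))
```

In this toy run the evolved ontology gains one node and two relations, which mirrors in miniature the growth the abstract reports between the initial and evolved ontologies. How the paper selects feature words and decides which tracked posts to admit is not covered here.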
Abstract

[Objective] This paper proposes an algorithm to automatically build and evolve a “Feature Items Ontology”. [Context] Trending topics online change constantly and span many fields. Existing research on automatic ontology construction is limited to specific domains and cannot effectively handle dynamic trending topics. [Methods] First, we analyzed the content of major events in the trending topics. Second, we designed an algorithm that automatically generates the ontology. Third, guided by the initial ontology, we proposed an evolution algorithm that follows the changing topics. [Results] Using the case of “Wei Zexi and Baidu” as an example, we collected 11,174 Sina Weibo posts and conducted two rounds of experiments. The initial ontology contained 7,421 feature items, 39 key nodes, and 781 key relations. The evolved ontology contained 24,564 feature items, 67 key nodes, and 1,818 key relations. Its miss rate, false alarm rate, and loss cost were 0.1261, 0.0964, and 0.5985, respectively, all better than those of the TF-IDF algorithm. [Conclusions] The “Feature Items Ontology” describes topics more accurately than a single-word ontology and makes semantic similarity easier to calculate. It is an appropriate method for retrieving semantic information from dynamic trending topics.

Key words: Feature items; Ontology generation; Ontology evolution; Topic tracking
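The abstract does not state how the loss cost is computed. One plausible reading, offered here as an assumption, is the normalized detection cost used in Topic Detection and Tracking (TDT) evaluations. With the cost weights and topic prior commonly used there (C_miss = 1, C_fa = 0.1, P_target = 0.02), the reported miss rate and false alarm rate give a value that rounds to the reported 0.5985. The function name and default parameters below are illustrative choices, not taken from the paper.

```python
def normalized_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Normalized TDT detection cost (C_det)_norm.

    The cost weights and topic prior default to values commonly used in
    TDT evaluations; that the paper uses these is an assumption.
    """
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    # Normalize by the cheaper of the two trivial systems:
    # "always reject" costs c_miss * p_target,
    # "always accept" costs c_fa * (1 - p_target).
    baseline = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return cost / baseline


print(round(normalized_detection_cost(0.1261, 0.0964), 4))  # 0.5985
```

A normalized cost below 1 means the tracker does better than the cheaper trivial system, so lower values are better.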
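Received: 2016-06-12      Published: 2016-11-23
Funding: *This work is supported by the National Natural Science Foundation of China General Program “Research on Adaptive Topic Tracking Methods for Online Public Opinion Based on Evolving Ontology” (No. 71373123), the Key Project of Philosophy and Social Science Research in Jiangsu Universities “Research on a Multi-Opinion Evolution Model of Jiangsu Education Microblog Public Opinion Based on Supernetworks and Its Application” (No. 2015ZDIXM007), and the University Major Project Cultivation Fund “Research on Big Data Analysis Methods for Complex Social Network Behavior Based on ‘Model-Data Dual Drive’” (No. NP201630X).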
Cite this article:
Ma Jing, He Xuefeng, Jian Xuwen. Automatically Building “Feature Items Ontology” for Trending Topics[J]. New Technology of Library and Information Service, 2016, 32(10): 33-41.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.10.04      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I10/33