New Technology of Library and Information Service, 2014, Vol. 30, Issue 10: 25-32     https://doi.org/10.11925/infotech.1003-3513.2014.10.05
Knowledge Organization and Knowledge Management
Study on Text Visualization of Clustering Result for Domain Knowledge Base —— Take Knowledge Base of Chinese Cuisine Culture as the Object
Xu Xin, Hong Yunjia
Department of Information Science, Business School, East China Normal University, Shanghai 200241, China
Abstract

[Objective] To give users more intuitive navigation by visualizing the text resources in a domain knowledge base. [Methods] Starting from the resource partition produced by multi-level text clustering, the texts are prepared for visual navigation in three steps: topic discovery, dimensionality reduction, and visual display. [Results] A topic-term extraction algorithm named TF-ICF is proposed, and optimized treemaps and scatter plots are combined to visualize the knowledge base, helping users grasp an overview of the knowledge base, locate the topics of interest, and understand the relations among resources. [Limitations] The visualization process involves some manual intervention, and the interactivity of the display still needs improvement. [Conclusions] The proposed visualization method applies well to presenting the resources of a domain knowledge base and is useful for further improving its user experience.
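
The abstract names TF-ICF but does not define it. As a rough illustration only, the sketch below assumes it works by analogy with TF-IDF: a term's frequency within a cluster multiplied by the log-scaled inverse of the number of clusters containing that term. This formula, the function names `tf_icf` and `top_terms`, and the toy clusters are all assumptions made for the example, not the authors' algorithm or data.

```python
import math
from collections import Counter


def tf_icf(clusters):
    """Score terms per cluster as TF * ICF (assumed definition, see lead-in).

    clusters: dict mapping cluster name -> list of tokenized documents,
              where each document is a list of terms.
    TF(t, c) = occurrences of t in cluster c / total terms in cluster c
    ICF(t)   = log(N / cf(t)), with N clusters and cf(t) clusters containing t
    """
    # Pool all documents of a cluster into one term-count table.
    term_counts = {
        name: Counter(term for doc in docs for term in doc)
        for name, docs in clusters.items()
    }
    n_clusters = len(term_counts)

    # Count in how many clusters each term appears.
    cluster_freq = Counter()
    for counts in term_counts.values():
        cluster_freq.update(counts.keys())

    scores = {}
    for name, counts in term_counts.items():
        total = sum(counts.values())
        scores[name] = {
            term: (count / total) * math.log(n_clusters / cluster_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, k=2):
    """Return the k highest-scoring terms per cluster (ties broken by term)."""
    return {
        name: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for name, s in scores.items()
    }


# Toy, made-up clusters for illustration only.
clusters = {
    "tea": [["tea", "green", "culture"], ["tea", "ceremony", "culture"]],
    "noodles": [["noodles", "wheat", "culture"], ["noodles", "broth"]],
}
scores = tf_icf(clusters)
labels = top_terms(scores, k=1)
```

Under this assumed definition, a term that occurs in every cluster (here "culture") scores zero, so the labels chosen for each cluster are the terms that set it apart from the others. Such labels could then be used, for instance, as captions for treemap cells.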

Key words: Text visualization; Text clustering; Domain knowledge base; Chinese cuisine culture
Received: 2014-05-07      Published: 2014-11-28
CLC number: G250.7
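Funding:

This work is one of the research outcomes of the National Social Science Fund of China Youth Project "Research on the Knowledge Base of the Collaborative Virtual Reference System" (Grant No. 11CTQ003).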

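Corresponding author: Xu Xin     E-mail: xxu@infor.ecnu.edu.cn
Author contributions: Xu Xin proposed the research idea, designed the research plan, and revised the final version; Hong Yunjia carried out the experiments, collected the data, and drafted the paper.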
Cite this article:
Xu Xin, Hong Yunjia. Study on Text Visualization of Clustering Result for Domain Knowledge Base —— Take Knowledge Base of Chinese Cuisine Culture as the Object. New Technology of Library and Information Service, 2014, 30(10): 25-32.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.10.05      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I10/25

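Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn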