Data Analysis and Knowledge Discovery, 2019, Vol. 3, Issue 6: 83-91. DOI: 10.11925/infotech.2096-3467.2018.0887
Research Paper
Extracting Book Review Topics with Knowledge Base*
Ruihua Qi (1,2), Junyi Zhou (1,2), Xu Guo (2), Caihong Liu (2)
(1) Linguistics Research Center, Dalian University of Foreign Languages, Dalian 116044, China
(2) Research Center for Multilingual Big Data in Cyberspace, Dalian University of Foreign Languages, Dalian 116044, China
Abstract

[Objective] This paper introduces natural language semantic information into topic extraction from book reviews. [Methods] We apply the global semantic information of a common-sense knowledge base to topic word discovery and topic clustering for book reviews, automatically extracting both explicit and implicit topic words. [Results] Compared with the Double Propagation algorithm, the knowledge-base method achieved 30.8% higher sentence coverage and 0.36% higher topic lexical diversity. Based on the extracted topic words, we drew a co-word cluster map and used node centrality in the knowledge network to present the key topic words in each cluster. [Limitations] No mature domain knowledge base exists yet for book reviews, so the topic mining process did not use domain knowledge and has not reached the best achievable results. [Conclusions] The knowledge-base method helps improve the sentence coverage and topic lexical diversity of topic extraction from book reviews.

Key words: Knowledge Base; Book Review; Topic Extraction
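
The abstract reports sentence coverage, topic lexical diversity, and a co-word map, but this page does not give their formulas. The following is a minimal Python sketch under assumed definitions: coverage is taken as the share of review sentences containing at least one extracted topic word, diversity as a type-token ratio over topic-word occurrences, and co-word counts as same-sentence co-occurrence. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def sentence_coverage(sentences, topic_words):
    """Share of sentences with at least one topic word (assumed definition)."""
    topics = set(topic_words)
    if not sentences:
        return 0.0
    covered = sum(1 for tokens in sentences if topics & set(tokens))
    return covered / len(sentences)


def lexical_diversity(sentences, topic_words):
    """Distinct topic words found / all topic-word occurrences (assumed type-token ratio)."""
    topics = set(topic_words)
    hits = [tok for tokens in sentences for tok in tokens if tok in topics]
    return len(set(hits)) / len(hits) if hits else 0.0


def coword_counts(sentences, topic_words):
    """Count how often each pair of topic words shares a sentence."""
    topics = set(topic_words)
    counts = Counter()
    for tokens in sentences:
        present = sorted(topics & set(tokens))  # sorted so each pair has one key
        counts.update(combinations(present, 2))
    return counts


# Toy, already-tokenized review sentences (illustrative only).
reviews = [
    ["the", "plot", "is", "slow"],
    ["great", "translation", "and", "plot"],
    ["arrived", "late"],
    ["the", "translation", "reads", "well"],
]
topics = ["plot", "translation", "character"]

coverage = sentence_coverage(reviews, topics)
diversity = lexical_diversity(reviews, topics)
pairs = coword_counts(reviews, topics)
```

A pair-count table like `pairs` is the kind of input a co-word clustering tool could consume; the actual pipeline and metric definitions used in the study may differ.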
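Received: 2018-08-10
Funding: *This work is one of the research outcomes of the National Social Science Fund of China General Project "Opinion Mining of Overseas Readers' Online Reviews of English Translations of Chinese Classics" (No. 15BYY028), the Dalian University of Foreign Languages Research Innovation Team "Computational Linguistics and Artificial Intelligence Innovation Team" (No. 2016CXTD06), and the Education Department of Liaoning Province General Project "Research on a Mobile Context-Aware Recommender System Based on User Behavior Pattern Discovery" (No. 2016JYT01).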
Cite this article:
Ruihua Qi, Junyi Zhou, Xu Guo, Caihong Liu. Extracting Book Review Topics with Knowledge Base[J]. Data Analysis and Knowledge Discovery, 2019, 3(6): 83-91. DOI: 10.11925/infotech.2096-3467.2018.0887.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0887