Data Analysis and Knowledge Discovery  2019, Vol. 3 Issue (7): 14-22     https://doi.org/10.11925/infotech.2096-3467.2018.1098
  Research Paper
Analyzing Sentiment Distribution with Spatial-textual Data of Multi-dimensional Clustering
(Chinese title: 基于多维小波聚类的空间文本数据情感分布分析, literally "Sentiment Distribution Analysis of Spatial-Textual Data Based on Multi-dimensional Wavelet Clustering")
Ke Li (李柯)1, Yuya Sasaki (佐々木勇和)2
1(School of Information Management, Nanjing University, Nanjing 210046, China)
2(Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan)
Abstract

[Objective] This paper builds a sentiment analysis model for spatial-textual data based on multi-dimensional wavelet clustering (WaveCluster), so that text sentiment and spatial position can be analyzed together. [Methods] We integrated the Yelp datasets into a spatial-textual database and used lexicon-based sentiment analysis to construct feature vectors. We then proposed and analyzed two models that use multi-dimensional wavelet clustering: a Hybrid model and a Textual-Spatial model. [Results] Experiments showed that multi-dimensional wavelet clustering with the db2 or bior2.2 wavelet basis identifies clusters more precisely than DBSCAN and K-means in spatial-textual data mining, and that it was the fastest of the compared algorithms on datasets ranging from the hundred-thousand to the ten-million scale. [Limitations] The sentiment analysis uses a unigram language model and therefore lacks analysis of sentence-level meaning. [Conclusions] The proposed Textual-Spatial model can effectively mine the distribution of sentiment tendency in multi-dimensional spatial-textual data. The Hybrid model offers spatial-textual recommender systems an effective way to compute spatial proximity and sentiment similarity simultaneously.
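The lexicon-based feature construction described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the two sentiment categories, the function names, and the sample coordinates are all assumptions made for illustration, and a real run would load a published lexicon instead.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> sentiment category), for illustration only.
TOY_LEXICON = {
    "great": "positive", "friendly": "positive", "delicious": "positive",
    "slow": "negative", "rude": "negative", "bland": "negative",
}
CATEGORIES = ("positive", "negative")


def sentiment_counts(text, lexicon=TOY_LEXICON):
    """Count lexicon hits per category, one token at a time (unigram model)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def feature_vector(lat, lon, text, lexicon=TOY_LEXICON):
    """Return [lat, lon, share_positive, share_negative] for one review."""
    counts = sentiment_counts(text, lexicon)
    total = sum(counts.values())
    # Reviews without any lexicon word get zero sentiment shares.
    shares = [counts[c] / total if total else 0.0 for c in CATEGORIES]
    return [lat, lon] + shares


if __name__ == "__main__":
    # Made-up coordinates and review text.
    review = "Great food, friendly staff, but slow service."
    print(feature_vector(36.17, -115.14, review))
```

Because each token is scored independently, a phrase such as "not great" would still count as positive, which is the sentence-level limitation the abstract acknowledges.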

Key words: Spatial-Textual Data    Sentiment Distribution Analysis    Wavelet Transform    Clustering
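The wavelet-transform clustering named in the abstract and key words can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: it handles only two dimensions, applies a single decomposition level, hard-codes the db2 scaling (low-pass) coefficients, pads the border with zeros, and uses hypothetical parameters `bins` and `threshold`. The general idea is to quantize points into a grid, smooth the grid with the wavelet low-pass filter, keep dense cells, and join neighbouring dense cells into clusters.

```python
import math
from collections import deque

S3, NORM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
# Daubechies-2 (db2) scaling (low-pass) filter coefficients.
DB2_LOW = [(1 + S3) / NORM, (3 + S3) / NORM, (3 - S3) / NORM, (1 - S3) / NORM]


def lowpass_down(seq):
    """Filter a 1-D sequence with DB2_LOW, keep every second output (zero-padded)."""
    n = len(seq)
    return [sum(h * seq[2 * k + j] for j, h in enumerate(DB2_LOW) if 2 * k + j < n)
            for k in range(n // 2)]


def approximation(grid):
    """One-level separable low-pass: filter along rows, then along columns."""
    rows = [lowpass_down(r) for r in grid]
    cols = [lowpass_down(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def label_dense_cells(approx, threshold):
    """Label 8-connected components of cells whose value is >= threshold."""
    m, n = len(approx), len(approx[0])
    labels = [[-1] * n for _ in range(m)]
    next_label = 0
    for i in range(m):
        for j in range(n):
            if approx[i][j] < threshold or labels[i][j] != -1:
                continue
            labels[i][j] = next_label
            queue = deque([(i, j)])
            while queue:  # breadth-first flood fill over dense neighbours
                a, b = queue.popleft()
                for u in range(max(a - 1, 0), min(a + 2, m)):
                    for v in range(max(b - 1, 0), min(b + 2, n)):
                        if labels[u][v] == -1 and approx[u][v] >= threshold:
                            labels[u][v] = next_label
                            queue.append((u, v))
            next_label += 1
    return labels


def wavecluster_2d(points, bins=16, threshold=1.0):
    """Return one label per point (-1 = noise). Coordinates must lie in [0, 1)."""
    grid = [[0.0] * bins for _ in range(bins)]
    for x, y in points:  # quantize points into grid cells
        grid[int(x * bins)][int(y * bins)] += 1.0
    labels = label_dense_cells(approximation(grid), threshold)
    # Each original cell maps to the half-resolution cell that covers it.
    return [labels[int(x * bins) // 2][int(y * bins) // 2] for x, y in points]
```

The grid pass is linear in the number of points and the remaining work depends only on the grid size, which is consistent with the speed advantage the abstract reports for large datasets. In the article's setting the grid would also carry sentiment dimensions in addition to the two spatial ones.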
Received: 2018-10-08      Published: 2019-09-06
Chinese Library Classification (ZTFLH):  G35  
Corresponding author: Ke Li     E-mail: LIKE950905@163.com
Cite this article:   
Ke Li, Yuya Sasaki. Analyzing Sentiment Distribution with Spatial-textual Data of Multi-dimensional Clustering[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 14-22.
Link to this article:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.1098      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2019/V3/I7/14
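  Sentiment analysis of spatial-textual data based on multi-dimensional wavelet clustering
  Running-time comparison of clustering algorithms
  Comparison of the Textual-Spatial algorithm and the Hybrid algorithm using the Bing lexicon
  Comparison of the Textual-Spatial algorithm and the Hybrid algorithm using the NRC lexicon
  A practical application example of the Hybrid algorithm on spatial-textual feature vectors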